Find open-source science resources

G-quadruplexes (G4s) are unique nucleic acid secondary structures predominantly found in guanine-rich regions and have been shown to be involved in various biological regulatory processes. G4SNVHunter is an R package designed to rapidly identify genomic sequences with G4-forming propensity and to accurately screen user-provided single nucleotide variants—as well as other small-scale variants such as indels and MNVs—for their potential to destabilize these structures. This allows researchers to then screen these critical variants for deeper study, digging into how they might influence biological functions—think gene regulation, for instance—by impairing G4 formation propensity.

012 months ago

MIT + file LICENSE

TRESS

This package is devoted to analyzing MeRIP-seq data. Current functionalities include 1. detect transcriptome wide m6A methylation regions 2. detect transcriptome wide differential m6A methylation regions.

GPL-3 + file LICENSE

TCseq

Quantitative and differential analysis of epigenomic and transcriptomic time course sequencing data, clustering analysis and visualization of the temporal patterns of time course data.

GPL (>= 2)

MotifPeeker

MotifPeeker is used to compare and analyse datasets from epigenomic profiling methods with motif enrichment as the key benchmark. The package outputs an HTML report consisting of three sections: (1. General Metrics) Overview of peaks-related general metrics for the datasets (FRiP scores, peak widths and motif-summit distances). (2. Known Motif Enrichment Analysis) Statistics for the frequency of user-provided motifs enriched in the datasets. (3. Motif Discovery Enrichment Analysis) Statistics for the frequency of ab-initio discovered motifs enriched in the datasets and compared with known motifs.

GPL (>= 3)

mist

mist (Methylation Inference for Single-cell along Trajectory) is a hierarchical Bayesian framework for modeling DNA methylation trajectories and performing differential methylation (DM) analysis in single-cell DNA methylation (scDNAm) data. It estimates developmental-stage-specific variations, identifies genomic features with drastic changes along pseudotime, and, for two phenotypic groups, detects features with distinct temporal methylation patterns. mist uses Gibbs sampling to estimate parameters for temporal changes and stage-specific variations.

MIT + file LICENSE

MEAT

This package estimates epigenetic age in skeletal muscle, using DNA methylation data generated with the Illumina Infinium technology (HM27, HM450 and HMEPIC).

MIT + file LICENSE

knowYourCG

KnowYourCG (KYCG) is a supervised learning framework designed for the functional analysis of DNA methylation data. Unlike existing tools that focus on genes or genomic intervals, KnowYourCG directly targets CpG dinucleotides, featuring automated supervised screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait-epigenome associations. KnowYourCG addresses the challenges of data sparsity in various methylation datasets, including low-pass Nanopore sequencing, single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies and epigenetic clocks (<doi:10.1126/sciadv.adw3027>).

AGPL-3

InTAD

The package is focused on the detection of correlation between expressed genes and selected epigenomic signals (i.e. enhancers obtained from ChIP-seq data) either within topologically associated domains (TADs) or between chromatin contact loop anchors. Various parameters can be controlled to investigate the influence of external factors and visualization plots are available for each analysis step.

GPL (>=2)

HiCaptuRe

Capture Hi-C is a set of techniques that enable the detection of genomic interactions involving regions of interest, known as baits. By focusing on selected loci, these approaches reduce sequencing costs while maintaining high resolution at the level of restriction fragments. HiCaptuRe provides tools to import, annotate, manipulate, and export Capture Hi-C data. The package accounts for the specific structure of bait–otherEnd interactions, facilitates integration with other omics datasets, and enables comparison across samples and conditions.

fCCAC

Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomics, as it allows both to evaluate reproducibility of replicates, and to compare different datasets to identify potential correlations. fCCAC applies functional Canonical Correlation Analysis to allow the assessment of: (i) reproducibility of biological or technical replicates, analyzing their shared covariance in higher order components; and (ii) the associations between different datasets. fCCAC represents a more sophisticated approach that complements Pearson correlation of genomic coverage.

epiRomics

Integrates various levels of epigenomic information, including ChIP-seq, histone modification, ATAC-seq, and RNA-seq data. Regulatory network analysis uses combinatory approaches to infer regions of significance, such as enhancers. Downstream analysis identifies co-occurrence of epigenomic data at regions of interest. Visualization functions display multi-track genomic views with signal overlays. Please contact <ammawla@ucdavis.edu> for suggestions, feedback, or bug reporting.

EpipwR

A quasi-simulation based approach to performing power analysis for EWAS (Epigenome-wide association studies) with continuous or binary outcomes. 'EpipwR' relies on empirical EWAS datasets to determine power at specific sample sizes while keeping computational cost low. EpipwR can be run with a variety of standard statistical tests, controlling for either a false discovery rate or a family-wise type I error rate.

EpiCompare

EpiCompare is used to compare and analyse epigenetic datasets for quality control and benchmarking purposes. The package outputs an HTML report consisting of three sections: (1. General metrics) Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples, (2. Peak overlap) Percentage and statistical significance of overlapping and non-overlapping peaks. Also includes upset plot and (3. Functional annotation) functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks. Also includes peak enrichment around TSS.

COCOA

COCOA is a method for understanding epigenetic variation among samples. COCOA can be used with epigenetic data that includes genomic coordinates and an epigenetic signal, such as DNA methylation and chromatin accessibility data. To describe the method on a high level, COCOA quantifies inter-sample variation with either a supervised or unsupervised technique then uses a database of "region sets" to annotate the variation among samples. A region set is a set of genomic regions that share a biological annotation, for instance transcription factor (TF) binding regions, histone modification regions, or open chromatin regions. COCOA can identify region sets that are associated with epigenetic variation between samples and increase understanding of variation in your data.

Chicago

A pipeline for analysing Capture Hi-C data.

atacInferCnv

The package prepares input scATAC-seq data and adapts for copy number variance profiling with InferCNV package usage. It has also various paramters to control the analysis (e.g. external normal reference usage, meta-cells, bin size, etc) and custom plot visualizations.

GPL-3 + file LICENSE

AlphaBeta

AlphaBeta is a computational method for estimating epimutation rates and spectra from high-throughput DNA methylation data in plants. The method has been specifically designed to: 1. analyze 'germline' epimutations in the context of multi-generational mutation accumulation lines (MA-lines). 2. analyze 'somatic' epimutations in the context of plant development and aging.