Find open-source science resources

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

611 month ago

GOaGO

GO-a-GO annotates Gene Ontology terms that are enriched in a given set of gene pairs. The enrichment is calculated from a permutation test for overrepresentation of gene pairs that are associated with a shared term. Such gene pairs are counted for the original set of gene pairs and compared against randomized sets in which the structure of the pairs is preserved, but the gene identities (including the associated terms) are permuted.

12 weeks ago
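
The permutation scheme described for GO-a-GO can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the function names, the choice to permute only genes that appear in the pairs, and the "+1" p-value correction are all assumptions made for the example.

```python
import random

def shared_term_pairs(pairs, annotations, term):
    """Count gene pairs in which both genes are annotated with `term`."""
    return sum(
        1 for a, b in pairs
        if term in annotations.get(a, set()) and term in annotations.get(b, set())
    )

def permutation_p_value(pairs, annotations, term, n_perm=1000, seed=0):
    """Permutation test for overrepresentation of pairs sharing `term`.

    The pair structure is kept fixed; gene identities (and with them the
    associated terms) are shuffled among the genes present in `pairs`.
    """
    rng = random.Random(seed)
    genes = sorted({g for pair in pairs for g in pair})
    observed = shared_term_pairs(pairs, annotations, term)
    hits = 0
    for _ in range(n_perm):
        shuffled = genes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(genes, shuffled))
        # Each gene position now carries the annotations of a random gene.
        permuted = {g: annotations.get(relabel[g], set()) for g in genes}
        if shared_term_pairs(pairs, permuted, term) >= observed:
            hits += 1
    # Assumed convention: add-one correction so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: three pairs, one of which shares the term "T".
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
annotations = {"g1": {"T"}, "g2": {"T"}}
observed, p_value = permutation_p_value(pairs, annotations, "T")
```

Shuffling the labels rather than redrawing the pairs keeps the degree of every gene position unchanged, which is the property the description emphasises.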

CNVMetrics

BiologicalQuestion

The CNVMetrics package calculates similarity metrics to facilitate copy number variant comparison among samples and/or methods. Similarity metrics can be employed to compare CNV profiles of genetically unrelated samples as well as those with a common genetic background. Some metrics are based on the shared amplified/deleted regions while other metrics rely on the level of amplification/deletion. The data type used as input is a plain text file containing the genomic position of the copy number variations, as well as the status and/or the log2 ratio values. Finally, a visualization tool is provided to explore resulting metrics.

44 months ago
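
As an illustration of a metric based on shared amplified or deleted regions, the sketch below computes Sørensen and Jaccard overlap between two sets of genomic intervals. It assumes half-open `(start, end)` intervals on a single chromosome and is not taken from CNVMetrics; which metrics the package offers, and how it defines them, should be checked in its documentation.

```python
def merge(intervals):
    """Merge overlapping half-open intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_length(intervals):
    """Number of bases covered by a disjoint interval list."""
    return sum(end - start for start, end in intervals)

def intersection_length(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def sorensen(a, b):
    """2 * |A and B| / (|A| + |B|), measured in bases."""
    denom = total_length(merge(a)) + total_length(merge(b))
    return 2 * intersection_length(a, b) / denom if denom else float("nan")

def jaccard(a, b):
    """|A and B| / |A or B|, measured in bases."""
    inter = intersection_length(a, b)
    union = total_length(merge(a)) + total_length(merge(b)) - inter
    return inter / union if union else float("nan")

# Hypothetical amplified regions from two samples.
sample_a = [(100, 200)]
sample_b = [(150, 250)]
```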

VariantFiltering

Genetics

Filter genetic variants using different criteria such as inheritance model, amino acid change consequence, minor allele frequencies across human populations, splice site strength, conservation, etc.

47 months ago

fmcsR

Cheminformatics

The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, resulting in the identification of more common features in their source structures, as well as a higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities to use the FMCS algorithm for pairwise compound comparisons, structure similarity searching and clustering.

610 years ago

qsvaR

The qsvaR package contains functions for removing the effect of degradation in RNA-seq data from postmortem brain tissue. The package is equipped to help users generate principal components associated with degradation. The components can be used in differential expression analysis to remove the effects of degradation.

01 month ago

MsFeatures

The MsFeatures package defines functionality for mass spectrometry features. This includes functions to group (LC-MS) features based on some of their properties, such as retention time (coeluting features) or correlation of signals across samples. The package thus allows features to be grouped, and its results can be used as input for the `QFeatures` package, which aggregates abundance levels of features within each group. This package defines concepts and functions for base and common data types; implementations for more specific data types are expected to be provided in the respective packages (e.g. `xcms`). All functionality of this package is implemented in a modular way, which allows different grouping approaches to be combined and enables its re-use in other R packages.

73 months ago

ggseqalign

Alignment

Simple visualizations of alignments of DNA or AA sequences as well as arbitrary strings. Compatible with Biostrings and ggplot2. The plots are fully customizable using ggplot2 modifiers such as theme().

01 year ago

xenLite

Define a relatively light class for managing Xenium data using Bioconductor. Address use of parquet for coordinates, SpatialExperiment for assay and sample data. Address serialization and use of cloud storage.

11 year ago

rCGH

aCGH

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.

58 years ago

PSMatch

The PSMatch package helps proteomics practitioners to load, handle and manage Peptide Spectrum Matches. It provides functions to model peptide-protein relations as adjacency matrices and connected components, visualise these as graphs and make informed decisions about shared peptide filtering. The package also provides functions to calculate and visualise MS2 fragment ions.

64 days ago

MsBackendSql

SQL-based mass spectrometry (MS) data backend that also supports storage and handling of very large data sets. Objects from this package are intended to be used with the Spectra Bioconductor package. With its minimal memory footprint, MsBackendSql thus provides an alternative MS data representation for very large or remote MS data sets.

43 months ago

MetaboAnnotation

High-level functions to assist in the annotation of (metabolomics) data sets. These include functions to perform simple tentative annotations based on mass matching, as well as functions that consider m/z and retention times for annotation of LC-MS features, provided that the respective reference values are available. In addition, the package provides high-level functions to simplify matching of LC-MS/MS spectra against spectral libraries, and objects and functionality to represent and manage such matched data.

202 months ago

limpca

StatisticalMethod

This package aims to provide a method for fitting linear models to high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and, in contrast to MANOVA, can deal with multivariate datasets having more variables than observations. This method can handle unbalanced designs.

21 week ago

GWASTools

SNP

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

181 year ago

ggsc

DimensionReduction

Useful functions to visualize single cell and spatial data. It supports visualizing 'Seurat', 'SingleCellExperiment' and 'SpatialExperiment' objects through grammar of graphics syntax implemented in 'ggplot2'.

511 week ago

epiSeeker

Annotation

This package implements functions to analyze multi-omics epigenetic data. Data of fragment type and base type are supported by epiSeeker. It provides functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statistical methods to estimate the significance of overlap among peak data sets, and motif analysis. It incorporates the GEO database for users to compare their own dataset with those deposited in the database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, overlap of peaks or genes, and the single-base resolution epigenetic data by considering the strand, motif, and additional information.

03 weeks ago

demuxmix

SingleCell

A package for demultiplexing single-cell sequencing experiments of pooled cells labeled with barcode oligonucleotides. The package implements methods to fit regression mixture models for a probabilistic classification of cells, including multiplet detection. Demultiplexing error rates can be estimated, and methods for quality control are provided.

52 years ago

consensusSeekeR

BiologicalQuestion

This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region for that region to be tagged as a consensus region. In genomic analyses where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the reconciliation of the results into consensus regions.

17 months ago

Chromatograms

The Chromatograms package defines an efficient infrastructure for storing and handling chromatographic mass spectrometry data. It provides different implementations of *backends* to store and represent the data. Such backends can be optimized for small memory footprint or fast data access/processing. A lazy evaluation queue and chunk-wise processing capabilities ensure efficient analysis of even very large data sets.

22 days ago

BUSseq

ExperimentalDesign

The BUSseq R package fits an interpretable Bayesian hierarchical model, Batch Effects Correction with Unknown Subtypes for scRNA-seq Data (BUSseq), to correct batch effects in the presence of unknown cell types. BUSseq is able to simultaneously correct batch effects, cluster cell types, and account for the count nature of the data, the overdispersion, the dropout events, and the cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected values can be used for downstream analysis as if all cells were sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allows cell types to be measured in some but not all of the batches, as long as the experimental design fulfills the conditions listed in our manuscript.

14 years ago

RgnTX

AlternativeSplicing

RgnTX allows the integration of transcriptome annotations so as to model complex alternative splicing patterns. It supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. It includes functions that perform permutation tests for evaluating associations between features and transcriptome regions.

Organism.dplyr

Annotation

This package provides an alternative interface to Bioconductor 'annotation' resources, in particular the gene identifier mapping functionality of the 'org' packages (e.g., org.Hs.eg.db) and the genome coordinate functionality of the 'TxDb' packages (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).

cummeRbund

HighThroughputSequencing

Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.

zinbwave

ImmunoOncology

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representation of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need to pre-normalize the data.
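
For readers unfamiliar with the distribution, the sketch below evaluates a zero-inflated negative binomial probability mass function for a single count. The mean/dispersion parametrisation and the names `mu`, `theta` and `pi` are assumptions for illustration; zinbwave itself models these quantities through regression on covariates and latent factors, which is not shown here.

```python
import math

def nb_log_pmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )

def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero
    (a dropout); otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(y, mu, theta))
    if y == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb

# Hypothetical parameters: mean 2, dispersion 1, 30% structural zeros.
p_zero = zinb_pmf(0, mu=2.0, theta=1.0, pi=0.3)
```

The extra mass at zero is what lets the model separate dropouts from genuinely low expression, while `theta` absorbs the over-dispersion relative to a Poisson model.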

zenith

RNASeq

Zenith performs gene set analysis on the result of differential expression using linear (mixed) modeling with dream by considering the correlation between gene expression traits. This package implements the camera method from the limma package proposed by Wu and Smyth (2012). Zenith is a simple extension of camera to be compatible with linear mixed models implemented in variancePartition::dream().

ZarrArray

The ZarrArray package leverages the Rarr package to bring Zarr datasets into R as DelayedArray objects. The main class in the package is the ZarrArray class. A ZarrArray object is an array-like object that represents a Zarr dataset in R. ZarrArray objects are DelayedArray derivatives and therefore support all operations (delayed or block-processed) supported by DelayedArray objects.

yarn

Expedite large RNA-Seq analyses using a combination of previously developed tools. YARN is meant to make it easier for the user in performing basic mis-annotation quality control, filtering, and condition-aware normalization. YARN leverages many Bioconductor tools and statistical techniques to account for the large heterogeneity and sparsity found in very large RNA-seq experiments.

yamss

MassSpectrometry

Tools to analyze and visualize high-throughput metabolomics data acquired using chromatography-mass spectrometry. These tools preprocess data in a way that enables reliable and powerful differential analysis. At the core of these methods is a peak detection phase that pools information across all samples simultaneously. This is in contrast to other methods that detect peaks on a sample-by-sample basis.

XVector

Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).

XeniumIO

The package allows users to readily import spatial data obtained from the 10X Xenium Analyzer pipeline. Supported formats include 'parquet', 'h5', and 'mtx' files. The package mainly represents data as SpatialExperiment objects.

Wrench

Normalization

Wrench is a package for normalizing sparse genomic count data, such as that arising from 16S metagenomic surveys.

wpm

GUI

The Well-Plate Maker (WPM) is a shiny application deployed as an R package. Functions for command-line/script use are also available. The WPM allows users to generate well plate maps to carry out their experiments while improving the handling of batch effects. In particular, it helps control the "plate effect" thanks to its ability to randomize samples over multiple well plates. The algorithm for placing the samples is inspired by the backtracking algorithm: the samples are placed at random while respecting specific spatial constraints.

vtpnet

Network

Variant-transcription factor-phenotype networks, inspired by Maurano et al., Science (2012), PMID 22955828.

vsn

Microarray

The package implements a method for normalising microarray intensities from single- and multiple-color arrays. It can also be used for data from other technologies, as long as they have a similar format. The method uses a robust variant of the maximum-likelihood estimator for an additive-multiplicative error model and affine calibration. The model incorporates a data calibration step (a.k.a. normalization), a model for the dependence of the variance on the mean intensity, and a variance stabilizing data transformation. Differences between transformed intensities are analogous to "normalized log-ratios". However, in contrast to the latter, their variance is independent of the mean, and they are usually more sensitive and specific in detecting differential transcription.
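
The claim that differences between transformed intensities behave like log-ratios can be checked numerically with an arsinh-type transform, a common form of variance-stabilizing transformation for additive-multiplicative error models. In this sketch the calibration parameters `a` and `b` are fixed by hand purely for illustration; the package estimates them from the data, and its exact parametrisation is not reproduced here.

```python
import math

def glog(y, a=0.0, b=1.0):
    """Arsinh-based transform h(y) = arsinh(a + b * y).

    a (offset) and b (scale) are illustrative constants, not fitted values.
    """
    return math.asinh(a + b * y)

# High intensities: the difference of transformed values is close to the
# natural-log ratio, because arsinh(x) is approximately log(2x) for large x.
high_diff = glog(100000.0) - glog(10000.0)
log_ratio = math.log(100000.0 / 10000.0)

# Low intensities: the transform stays finite at zero and for negative
# background-corrected values, where a plain logarithm is undefined.
low_diff = glog(0.0) - glog(-5.0)
```

Near zero the transform is roughly linear, so the additive noise that dominates weak signals is not inflated the way it would be by taking logarithms.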

Voyager

GeneExpression

SpatialFeatureExperiment (SFE) is a new S4 class for working with spatial single-cell genomics data. The Voyager package implements basic exploratory spatial data analysis (ESDA) methods for SFE. Univariate methods include global spatial ESDA methods such as Moran's I, permutation testing for Moran's I, and correlograms. Bivariate methods include Lee's L and the cross variogram. Multivariate methods include MULTISPATI PCA and the multivariate local Geary's C recently developed by Anselin. The Voyager package also implements plotting functions to plot SFE data and ESDA results.
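
As a concrete reference for the univariate statistic mentioned above, here is a minimal sketch of global Moran's I using the textbook formula I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where W is the sum of all weights and m is the mean. The dense list-of-lists weight matrix and the function name are assumptions for the example and do not reflect Voyager's interface.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den

# Hypothetical layout: four spots on a line, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
```

A smooth gradient yields a positive value (neighbouring spots resemble each other), whereas an alternating pattern yields a negative one.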

visiumStitched

This package provides helper functions for working with multiple Visium capture areas that overlap each other. This package was developed along with the companion example use case data available from https://github.com/LieberInstitute/visiumStitched_brain. visiumStitched prepares SpaceRanger (10x Genomics) output files so you can stitch the images from groups of capture areas together with Fiji. Then visiumStitched builds a SpatialExperiment object with the stitched data and makes an artificial hexagonal grid enabling the seamless use of spatial clustering methods that rely on such grid to identify neighboring spots, such as PRECAST and BayesSpace. The SpatialExperiment objects created by visiumStitched are compatible with spatialLIBD, which can be used to build interactive websites for stitched SpatialExperiment objects. visiumStitched also enables casting SpatialExperiment objects as Seurat objects.

VisiumIO

The package allows users to readily import spatial data obtained from either the 10X website or from the Space Ranger pipeline. Supported formats include tar.gz, h5, and mtx files. Multiple files can be imported at once with *List type of functions. The package represents data mainly as SpatialExperiment objects.

VDJdive

This package provides functions for handling and analyzing immune receptor repertoire data, such as produced by the CellRanger V(D)J pipeline. This includes reading the data into R, merging it with paired single-cell data, quantifying clonotype abundances, calculating diversity metrics, and producing common plots. It implements the E-M Algorithm for clonotype assignment, along with other methods, which makes use of ambiguous cells for improved quantification.

VariantTools

Genetics

Explore, diagnose, and compare variant calls using filters.

VariantAnnotation

DataImport

Annotate variants, compute amino acid coding changes, predict coding outcomes.

VAExprs

A fundamental problem in biomedical research is the low number of observations, mostly due to a lack of available biosamples, prohibitive costs, or ethical reasons. Augmenting a few real observations with artificially generated samples could make their analysis more robust and more reproducible. One possible solution to the problem is the use of generative models, which are statistical models of data that attempt to capture the entire probability distribution from the observations. Using the variational autoencoder (VAE), a well-known deep generative model, this package aims to generate samples with gene expression data, especially for single-cell RNA-seq data. Furthermore, the VAE can use conditioning to produce specific cell types or subpopulations. The conditional VAE (CVAE) allows us to create targeted samples rather than completely random ones.

uSORT

ImmunoOncology

This package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data. It incorporates data pre-processing, preliminary PCA gene selection, preliminary cell ordering, feature selection, refined cell ordering, and post-analysis interpretation and visualization.

updateObject

A set of tools built around updateObject() to work with old serialized S4 instances. The package is primarily useful to package maintainers who want to update the serialized S4 instances included in their package. This is still work-in-progress.

Uniquorn

ImmunoOncology

'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. Identification is vital and, in this package, is based on the locations (loci) of somatic and germline mutations/variations. The input format is vcf/vcf.gz, and the files have to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).

UniProt.ws

Annotation

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. This package provides a collection of functions for retrieving, processing, and re-packaging UniProt web services. The package makes use of UniProt's modernized REST API and allows mapping of identifiers across different databases.

UMI4Cats

QualityControl

UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparison of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.

UCSC.utils

A set of low-level utilities to retrieve data from the UCSC Genome Browser. Most functions in the package access the data via the UCSC REST API but some of them query the UCSC MySQL server directly. Note that the primary purpose of the package is to support higher-level functionalities implemented in downstream packages like GenomeInfoDb or txdbmaker.

txdbmaker

A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.

TVTB

The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define new classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE).