TOP

Software
Maintenance light0updated 11 months ago
R
GPL-3.0

TOP constructs a transferable model across gene expression platforms for prospective experiments. Such a transferable model can be trained to make predictions on independent validation data with an accuracy that is similar to a re-substituted model. The TOP procedure also has the flexibility to be adapted to suit the most common clinical response variables, including linear response, binomial and Cox PH models.

README

TOP Due to data scale differences between multiple omics data, a model constructed from a training data tends to have poor prediction power on a validation data. While the usual bioinformatics approach is to re-normalise both the training and the validation data, this step may not be possible due to ethics constrains. TOP avoids re-normalisation of additional data through the use of log-ratio features and thus also enable prediction for single omics samples. The novelty of the TOP procedure…

Source attribution

  • GitHubgithub.com/harry25r/top
  • BioconductorTOP

Related resources

The ASURI (Analysis of SUrvival and patients RIsk prediction based on gene signatures) package discovers marker genes that are related to risk prediction capabilities and to a clinical variable of interest. It uses two main steps, including subsampling glmnet and unicox. The package implements robust functions to discover survival markers related to a clinical phenotype and to predict a risk score, allowing to study the patient's risk based on the gene signatures. Several plots are provided to visualise the relevance of the genes, the risk score, and patient stratification, as well as a robust version of the Kaplan-Meier curves.

03 weeks ago
R
LGPL-3.0

CPSM provides a comprehensive computational pipeline for predicting survival probability and risk groups in cancer patients. The package includes steps for data preprocessing, training/test split, and normalization. It enables feature selection using univariate survival analysis and computes a LASSO-based prognostic index (PI) score. CPSM supports the development of predictive models using various feature sets and offers a suite of visualization tools, including survival curves based on predicted probabilities, barplots for predicted mean and median survival times, KM plots overlaid with individual survival predictions, and nomograms for estimating 1-, 3-, 5-, and 10-year survival probabilities. This makes CPSM a versatile tool for survival analysis in cancer research.

The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

iPath is the Bioconductor package used for calculating personalized pathway score and test the association with survival outcomes. Abundant single-gene biomarkers have been identified and used in the clinics. However, hundreds of oncogenes or tumor-suppressor genes are involved during the process of tumorigenesis. We believe individual-level expression patterns of pre-defined pathways or gene sets are better biomarkers than single genes. In this study, we devised a computational method named iPath to identify prognostic biomarker pathways, one sample at a time. To test its utility, we conducted a pan-cancer analysis across 14 cancer types from The Cancer Genome Atlas and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications. We found that pathway-based biomarkers are more robust and effective than single genes.

Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed for discovering patient subtypes, associated with clinical data, especially for survival information. This package is aimed to identify subtypes that are both clinically relevant and biologically meaningful.

TENET identifies key transcription factors (TFs) and regulatory elements (REs) linked to a specific cell type by finding significantly correlated differences in gene expression and RE DNA methylation between case and control input datasets, and identifying the top genes by number of significant RE DNA methylation site links. It also includes many tools for visualization and analysis of the results, including plots displaying and comparing methylation and expression data and methylation site link counts, survival analysis, TF motif searching in the vicinity of linked RE DNA methylation sites, custom TAD and peak overlap analysis, and UCSC Genome Browser track file generation. A utility function is also provided to download methylation, expression, and patient survival data from The Cancer Genome Atlas (TCGA) for use in TENET or other analyses.