Package 'OncoSubtype' reference manual

Title:	Predict Cancer Subtypes Based on TCGA Data using Machine Learning Method
Description:	Provide functionality for cancer subtyping using nearest centroids or machine learning methods based on TCGA data.
Authors:	Dadong Zhang <[email protected]>
Maintainer:	Dadong Zhang <[email protected]>
License:	GPL-3
Version:	1.0.0
Built:	2025-03-19 04:33:19 UTC
Source:	https://github.com/dadongz/oncosubtype

Predict the subtypes of selected cancer type based published papers

Description

Predict the subtypes of selected cancer type based published papers

Usage

centroids_subtype(data, disease = "LUSC")
centroids_subtype(data, disease = "LUSC")

Arguments

`data`	data set to predict the subtypes which is a numeric matrix with row names of features and column names of samples
`disease`	character string of the disease to predict subtypes, currently support 'LUSC', 'LUAD'

Value

an object of class "SubtypeClass" with four slots: genes used for predictiong, predicted subtypes of samples, a matrix of predicting scores, and the method.

Examples

## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
centroids_subtype(data, disease = 'HNSC')

## End(Not run)
## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
centroids_subtype(data, disease = 'HNSC')

## End(Not run)

select highly variable genes from a expression matrix

Description

select highly variable genes from a expression matrix

Usage

get_hvg(data, top = 1000)
get_hvg(data, top = 1000)

Arguments

`data`	a (normalized) matrix with rownames of features and colnames of samples
`top`	number of top highly variable genes to output

Value

subset with top ranked genes by the variances

Examples

## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
get_hvg(data)

## End(Not run)
## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
get_hvg(data)

## End(Not run)

convert expression matrix to median-centered

Description

convert expression matrix to median-centered

Usage

get_median_centered(data, log2 = TRUE)
get_median_centered(data, log2 = TRUE)

Arguments

`data`	a numeric matrix or 'S4' object
`log2`	logical, if 'TRUE' $log2(x+1)$ will be applied before median centering. Defaut 'TRUE'.

Value

median-centered express matrix or an object with new slot "centered"

Examples

## Not run: 
get_median_centered(example_fpkm)

## End(Not run)

## Not run: 
get_median_centered(example_fpkm)

## End(Not run)

Predict the subtypes of selected cancer type

Description

Predict the subtypes of selected cancer type

Usage

get_rf_pred(train_set, test_set, method = "rf", seed = NULL)
get_rf_pred(train_set, test_set, method = "rf", seed = NULL)

Arguments

`train_set`	training set with rownames of samples, first column named 'mRNA_subtype' and the rest of features and expression values.
`test_set`	test set with rownames of features and colnames of samples.
`method`	character string of the method to use currently support 'rf'.
`seed`	integer seed to use.

Value

a matrix with column names of subtypes and predicted probabilities.

HNSC predictor centroids

Description

HNSC predictor centroids from https://www.nature.com/articles/nature14129

Usage

hnsc_centroids
hnsc_centroids

Format

A tibble with 728 features and four subtypes.

Load Dataset from GitHub Repository

Description

Downloads a specified dataset from a GitHub repository if it is not already present in the specified local directory, then loads the dataset into the global environment. This function is designed to help manage package size by storing data externally and loading it on-demand.

Usage

load_dataset_from_github(disease, local_dir = path.expand(getwd()))
load_dataset_from_github(disease, local_dir = path.expand(getwd()))

Arguments

`disease`	A character string specifying the disease, which corresponds to the name of the dataset to be loaded (e.g., "LUSC"). The function constructs the filename as `tolower(disease)_tcga.rda` and attempts to load this dataset.
`local_dir`	An optional character string specifying the path to the directory where datasets should be stored locally. If not provided, defaults to a subdirectory named `your_package_name_data` within the user's home directory. Users can specify their own directory path if they prefer to store data in a different location.

Value

Invisible NULL. The function is primarily used for its side effect of loading a dataset into the global environment. However, the function itself does not return the dataset directly.

Examples

## Not run: 
  load_dataset_from_github("LUSC")

## End(Not run)

## Not run: 
  load_dataset_from_github("LUSC")

## End(Not run)

LUAD predictor centroids

Description

LUAD predictor centroids from Wilkerson (2012)

Usage

luad_centroids
luad_centroids

Format

A tibble with 506 features and three subtypes bronchioid, magnoid, and squamoid.

LUSC predictor centroids

Description

LUSC predictor centroids from Wilkerson (2010)

Usage

lusc_centroids
lusc_centroids

Format

A tibble with 208 features and four subtypes: primitive, classical, secretory, and basal.

Predict the subtypes of selected cancer type using machine learning

Description

Predict the subtypes of selected cancer type using machine learning

Usage

ml_subtype(
  data,
  disease = "LUSC",
  method = "rf",
  removeBatch = TRUE,
  seed = NULL
)
ml_subtype(
  data,
  disease = "LUSC",
  method = "rf",
  removeBatch = TRUE,
  seed = NULL
)

Arguments

`data`	data set to predict the subtypes which is a numeric matrix with row names of features and column names of samples
`disease`	character string of the disease to predict subtypes, currently support 'LUSC', 'LUAD', and 'BLCA'.
`method`	character string of the method to use currently support 'rf'.
`removeBatch`	whether do batch effect correction using `limma::removeBatchEffect`, default TRUE.
`seed`	integer seed to use.

Value

An object of class "SubtypeClass" with four slots: genes used for predictiong, predicted subtypes of samples, a matrix of predicting scores, and the method.

References

Wilkerson MJ, Yin X, Hayes D, et al. (2010). “Lung Squamous Cell Carcinoma mRNA Expression Subtypes Are Reproducible, Clinically Important, and Correspond to Normal Cell Types.” Clin Cancer Res, 16(19), 4864-4875.
Wilkerson MJ, Yin X, Hayes D, et al. (2012). “Differential pathogenesis of lung adenocarcinoma subtypes involving sequence mutations, copy number, chromosomal instability, and methylation.” Plos One, 7(5), e36530.
Network TCGA (2015). “Comprehensive genomic characterization of head and neck squamous cell carcinomas.” Nature, 517, e36530.

Examples

## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
ml_subtype(data, disease = 'LUAD', method = 'rf', seed = 123)

## End(Not run)
## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
ml_subtype(data, disease = 'LUAD', method = 'rf', seed = 123)

## End(Not run)

Plot heatmap of the train set or test set

Description

Plot heatmap of the train set or test set

Usage

PlotHeat(object, set = "test", ...)
PlotHeat(object, set = "test", ...)

Arguments

`object`	a SubtypeClass object
`set`	options could be 'test', 'train' or 'both'. Default 'test'.
`...`	Parameters passed to `pheatmap`.

Value

a pheatmap object

Examples

## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
object <- MLSubtype(data, disease = 'LUSC')
PlotHeat(object, set = 'both', fontsize = 10, show_rownames = FALSE, show_colnames = FALSE)

## End(Not run)
## Not run: 
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
object <- MLSubtype(data, disease = 'LUSC')
PlotHeat(object, set = 'both', fontsize = 10, show_rownames = FALSE, show_colnames = FALSE)

## End(Not run)

Package 'OncoSubtype'

Help Index

Predict the subtypes of selected cancer type based published papers

Description

Usage

Arguments

Value

Examples

example FPKM data

Description

Usage

Format

select highly variable genes from a expression matrix

Description

Usage

Arguments

Value

Examples

convert expression matrix to median-centered

Description

Usage

Arguments

Value

Examples

Predict the subtypes of selected cancer type

Description

Usage

Arguments

Value

HNSC predictor centroids

Description

Usage

Format

Load Dataset from GitHub Repository

Description

Usage

Arguments

Value

Examples

LUAD predictor centroids

Description

Usage

Format

LUSC predictor centroids

Description

Usage

Format

Predict the subtypes of selected cancer type using machine learning

Description

Usage

Arguments

Value

References

Examples

Plot heatmap of the train set or test set

Description

Usage

Arguments

Value

Examples

Set the SubtypeClass

Description

Value