Clinical data
1. Cohort selection and view the data
A cohort can be selected at the top of the page. Once selected, its details are shown on the right. The data can be viewed as a spreadsheet-style table in the “View the data” section. Because the gene expression table is usually large, only the first 1000 lines are shown by default to avoid excessive memory usage. Users can choose to show all the data if they want.
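The preview behaviour described above can be sketched outside the interface as follows. This is a minimal illustration, not the tool's own code: the function name `preview_rows` and the tiny demo table are assumptions made for the example.

```python
import csv
import io
from itertools import islice


def preview_rows(handle, max_rows=1000):
    """Return the header and at most max_rows data rows of a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # islice stops reading after max_rows, so a large file is never fully loaded.
    rows = list(islice(reader, max_rows))
    return header, rows


# Hypothetical 3-gene table, for illustration only.
demo = io.StringIO(
    "gene\tS1\tS2\n"
    "TP53\t5.1\t4.8\n"
    "EGFR\t7.2\t6.9\n"
    "MYC\t3.3\t8.0\n"
)
header, rows = preview_rows(demo, max_rows=2)
```

Reading lazily and stopping early is what keeps memory use low for large expression tables.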
2. Survival analysis
This section examines the association between gene expression and survival outcomes within a selected patient cohort.
- Enter individual gene names line by line or select a custom gene set.
- Choose how to divide patients into high- and low-expression groups:
  - Using the median (default)
  - Using top 25% vs. bottom 25%
  - Using custom-defined thresholds (e.g., top X% vs. bottom Y%)
  - Manually specifying the two groups by entering sample names directly
- Select the event type
- Click the start button.
- It generates a results table showing p-values and hazard ratios for each gene, sorted by the hazard ratio.
- Click on any row in the results table to display the corresponding Kaplan–Meier curve.
- Use the histogram feature to visualise gene expression distributions, which can help determine appropriate sample splitting criteria.
The survival events available for analysis (such as overall survival or progression-free survival) depend on the metadata included in your cohort dataset. For more details, please refer to section 10.2, How to upload your own cohort.
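To make the splitting options and the reported p-value more concrete, here is a minimal pure-Python sketch. It assumes a percentile-based split and a standard two-group log-rank test; the interface's actual implementation is not documented here, and the function names and demo values are hypothetical. Hazard ratios, which need a Cox model, are not covered.

```python
import math


def split_high_low(expression, top_pct=50.0, bottom_pct=50.0):
    """Split samples into (high, low) groups by expression percentile.

    expression: dict mapping sample ID -> expression value.
    top_pct=50, bottom_pct=50 approximates a median split;
    top_pct=25, bottom_pct=25 compares the top and bottom quarters.
    """
    ranked = sorted(expression, key=expression.get)  # ascending expression
    n = len(ranked)
    n_low = int(n * bottom_pct / 100)
    n_high = int(n * top_pct / 100)
    return ranked[n - n_high:], ranked[:n_low]


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        at_risk = [d for d in data if d[0] >= t]
        n = len(at_risk)
        n_a = sum(1 for d in at_risk if d[2] == 0)
        d_total = sum(1 for d in data if d[0] == t and d[1] == 1)
        d_a = sum(1 for d in data if d[0] == t and d[1] == 1 and d[2] == 0)
        # Observed minus expected events in group A at this event time.
        obs_minus_exp += d_a - d_total * n_a / n
        if n > 1:
            variance += (d_total * (n_a / n) * (1 - n_a / n)
                         * (n - d_total) / (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical expression values and follow-up data, for illustration only.
expression = {"P1": 2.1, "P2": 8.4, "P3": 3.0, "P4": 7.7, "P5": 1.5, "P6": 9.2}
days = {"P1": 900, "P2": 200, "P3": 1100, "P4": 310, "P5": 1250, "P6": 150}
status = {"P1": 0, "P2": 1, "P3": 1, "P4": 1, "P5": 0, "P6": 1}

high, low = split_high_low(expression)
chi2, p_value = logrank_p(
    [days[s] for s in high], [status[s] for s in high],
    [days[s] for s in low], [status[s] for s in low],
)
```

With an odd number of samples, this version of the 50/50 split drops the middle sample; the tool may treat ties and the median sample differently.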
Adjustable graph parameters
- The size (width and height) of the figure.
- The font size of the X and Y axes and labels.
- The size of the legend title
- The colour for the high- and low-expression group.
Example Usage video
3. Gene correlation
This section allows you to analyse gene expression correlations within a selected cohort. Users can investigate how the expression level of a specific gene correlates with the expression levels of other genes.
- Enter the name of one target gene (it will be shown on the Y-axis of the scatter plots).
- Select an analysis mode under Explore type:
  - Explore one gene's correlation with specific genes (default):
    - In this case, enter the gene names or select a geneset from a Custom Geneset to investigate the correlation (will be shown on the X-axis)
  - Explore one gene's correlation with all the genes:
    - It calculates correlations with all genes in the cohort. This takes several minutes.
- Choose correlation method: Pearson (linear relationships) or Spearman (rank-based).
- Click Start to run analysis. It returns a table showing correlation coefficient and p-value for each gene.
- Clicking any gene in the table displays its scatter plot showing correlation with the target gene.
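The difference between the two correlation methods can be illustrated with a short pure-Python sketch. The helper names and demo values are assumptions for the example; the interface itself may rely on a statistics library and also reports p-values, which are omitted here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (strength of a linear relationship)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values: monotonic but not linear.
target_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
other_gene = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(target_gene, other_gene)
r_spearman = spearman(target_gene, other_gene)
```

For a monotonic but curved relationship like this one, Spearman reaches 1 while Pearson stays below 1, which is why the rank-based option is often preferred for skewed expression data.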
Adjustable graph parameters
- The size (width and height) of the figure.
- The font size of the X and Y axes and labels.
- Change the background from gray to white
- The colour of the dot
- Show the correlation line
Example Usage video
4. Mutation analysis
This section helps you analyse gene mutations in the selected cohort.
- Input the genes to investigate by:
- Entering gene names one per line in the text box ('Text input')
- Choosing to analyse all genes in the dataset ('Use all genes')
- Selecting a geneset from the Custom Geneset ('Select from custom genesets')
- Filter the samples if necessary
- By default, all cohort samples are included ('Use all samples')
- When choosing 'Use the selected samples by a specific category', you can filter samples using metadata (treatment group, cancer subtype, demographics) for targeted analysis
- Click the start button. A Results table will appear in the bottom left section. The following plots will be generated in the tabs:
- Frequency Plot tab: A bar plot showing the frequency or the counts of mutated genes. The Y-axis displays either mutation count or frequency (%). You can adjust the number of genes to display.
- Survival analysis tab: Select an event for survival analysis. Clicking a gene name displays a Kaplan-Meier curve comparing the wild-type patients and the mutant patients.
- Gene expression plot tab:
This comparison examines the expression levels between the wild-type and mutant groups.
- Click a gene in the mutation analysis results table
- Enter genes in the input field to generate a table listing these genes
- Click a gene from the Input genes table to generate a plot comparing its expression between the wild-type and mutant patients
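As a rough illustration of what the Frequency Plot summarises, the following sketch counts mutated samples per gene from (sample, gene) pairs. It assumes that a sample is counted once per gene even if several mutations are listed, which may differ from the tool's behaviour; the function name and demo data are hypothetical.

```python
from collections import defaultdict


def mutation_frequency(mutations, all_samples):
    """Return gene -> (mutated sample count, percentage of the cohort).

    mutations: iterable of (sample ID, gene name) pairs.
    all_samples: every sample ID in the (possibly filtered) cohort.
    """
    cohort = set(all_samples)
    mutated = defaultdict(set)
    for sample, gene in mutations:
        if sample in cohort:  # ignore samples removed by filtering
            mutated[gene].add(sample)
    total = len(cohort)
    table = {g: (len(s), 100.0 * len(s) / total) for g, s in mutated.items()}
    # Most frequently mutated genes first, as in a ranked bar plot.
    return dict(sorted(table.items(), key=lambda kv: kv[1][0], reverse=True))


# Hypothetical cohort and mutation calls, for illustration only.
samples = ["S1", "S2", "S3", "S4"]
calls = [("S1", "TP53"), ("S2", "TP53"), ("S2", "TP53"), ("S3", "KRAS")]
freq = mutation_frequency(calls, samples)
```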
Adjustable graph parameters
- Frequency Plot
- Plot size (width and height)
- X and Y label size
- Legend size
- Colours for highest and lowest mutant count/frequency
- Number of genes to display
- White background option
- Option to hide scores on each bar
- Survival analysis
- Figure size (width and height)
- X and Y axis/label font size
- Legend title size
- Colours for high and low expression groups
- Gene expression plot
- Plot size (width and height)
- X and Y label size
- Graph title and legend font size
- Colours for wild-type and mutant groups
Example Usage video
5. Gene expression across subtypes
When metadata for the cohort is provided and patients can be divided into subtypes, users can compare gene expression across patient subgroups.
- Enter the genes or select a custom geneset
- Select a category for subtype from the "Group by" drop-down menu
- Click "Start comparing" to compare gene expression across subtypes. Note that visualisation may be slow and cluttered when there are many subtypes in the selected group.
- A result table with statistical scores and p-values will be generated. Statistical scores include W values for two subtypes and H values for three or more subtypes.
- Clicking any row in the table displays a visualisation on the right. Available plot types include Box plot, Violin plot, Swarm plot, or Violin + Swarm plot
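The H value reported for three or more subtypes is presumably the Kruskal-Wallis statistic (and W a Wilcoxon rank-sum statistic); the text above does not name the tests, so treat that as an assumption. Under that assumption, H can be sketched in pure Python as below, without the correction for ties and without the p-value.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction).

    groups: dict mapping subtype name -> list of expression values.
    """
    pooled = [(v, name) for name, vals in groups.items() for v in vals]
    pooled.sort(key=lambda pair: pair[0])
    n_total = len(pooled)
    rank_sums = {name: 0.0 for name in groups}
    i = 0
    while i < n_total:
        j = i
        # Tied values share the average of the ranks they span.
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    weighted = sum(rank_sums[name] ** 2 / len(vals)
                   for name, vals in groups.items())
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical expression values for three subtypes.
h_value = kruskal_wallis_h({
    "SubtypeA": [1.0, 2.0, 3.0],
    "SubtypeB": [4.0, 5.0, 6.0],
    "SubtypeC": [7.0, 8.0, 9.0],
})
```

A larger H indicates that the rank distributions of the subtypes differ more strongly.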
Adjustable graph parameters
- Figure size (width and height)
- X and Y axis/label font size
- Graph title size
- Colour palette for each subtype
Example Usage video
6. Signature analysis
This section performs signature analysis on gene expression data from the selected cohort to evaluate the activity or presence of specific biological processes.
- Choose the input type by either:
- Selecting from custom gene sets
- Entering gene names line by line
- Select the calculation method:
  - GSVA (Gene Set Variation Analysis) or ssGSEA (single-sample Gene Set Enrichment Analysis) is available.
- Click the start button. This generates a result table with scores for each sample.
- Three plots are generated:
- Survival analysis plot tab:
- Generates a Kaplan-Meier plot.
- Allows selection of methods to split samples into high and low-score patients: either by median or comparing Top 25% vs Bottom 25%.
- Score comparison tab:
- Select the group to compare signature scores and click the start button.
- Four plot types are available (Box plot, Violin plot, Swarm plot, and Violin + Swarm plot).
- Distribution tab:
- Generates a histogram of signature scores to help determine appropriate sample splitting criteria.
What is GSVA and ssGSEA?
GSVA (Gene Set Variation Analysis) calculates an enrichment score for each gene set by transforming gene expression data into a pathway activity score across samples. It uses kernel-based density estimation to assess the relative enrichment of a gene set, comparing it to the overall expression distribution in the dataset.
ssGSEA (single-sample Gene Set Enrichment Analysis), on the other hand, ranks genes within each sample and calculates an enrichment score based on the ranked positions of genes in a gene set. It evaluates how consistently genes of a set are positioned at the top or bottom of the ranked gene list for each individual sample.
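To give a feel for the rank-based idea behind ssGSEA, here is a deliberately simplified sketch of a per-sample running-sum score. It is not the implementation used by the interface or by the published packages: the weighting exponent `alpha=0.25`, the absence of any normalisation, and the function name are all assumptions made for illustration.

```python
def rank_enrichment_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style score for one sample.

    expression: dict mapping gene -> expression value in a single sample.
    gene_set: collection of gene names forming the signature.
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    in_set = [g in gene_set for g in genes]
    # The highest-expressed gene gets rank n, the lowest gets rank 1.
    weights = [(n - i) ** alpha if hit else 0.0
               for i, hit in enumerate(in_set)]
    total_weight = sum(weights)
    n_miss = n - sum(in_set)
    if total_weight == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, genes")
    p_hit = p_miss = score = 0.0
    for w, hit in zip(weights, in_set):
        if hit:
            p_hit += w / total_weight
        else:
            p_miss += 1.0 / n_miss
        # Accumulate the gap between the two running distributions.
        score += p_hit - p_miss
    return score


# Hypothetical single-sample expression profile.
sample = {"G1": 9.0, "G2": 8.0, "G3": 5.0, "G4": 4.0, "G5": 2.0, "G6": 1.0}
score_top = rank_enrichment_score(sample, {"G1", "G2"})
score_bottom = rank_enrichment_score(sample, {"G5", "G6"})
```

A gene set whose members sit near the top of the ranked list gives a positive score, and one concentrated at the bottom gives a negative score.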
Adjustable graph parameters
- Survival analysis plot
- Figure size (width and height)
- X and Y axis/label font size
- Legend title size
- Colours for high and low expression groups
- Score comparison plot
- Figure size (width and height)
- X and Y axis/label font size
- Legend title size
- Colour palette
- Histogram
- Figure size (width and height)
- X and Y axis/label font size
- Legend title size
- Histogram bin colour
- Number of histogram bins
Example Usage video
7. Deconvolution analysis
This section provides deconvolution analysis from patients' gene expression data (typically bulk RNAseq). While several deconvolution tools exist, two are available here: MCPcounter and xCell.
- Select the cohort
- Choose either MCPcounter or xCell as your method, then click the start button. A deconvolution result table will appear on the right.
The interface provides two additional analysis tabs.
Example Usage video
7.1. Generating a heatmap/barplot
First, you can generate a heatmap or barplot to visualise cell type fractions across samples.
- Choose which samples to use:
- "All samples": Includes every sample in the dataset
- "Filter from metadata":
- Filter samples by subtypes using information from the metadata files
- The number of samples selected will be displayed after choosing a category
- "Text input": Enter sample IDs (patient IDs) line by line in the text box, ensuring no extra spaces
- Choose which cell types to include:
- "All cell types": Includes all available cell types
- "Select cell types":
- A table of available cell types will be displayed
- Click on specific cell types you want to include in the plot
- Click the "Show plots" button to generate a heatmap and barplot on the right
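The "Filter from metadata" option amounts to keeping the samples whose metadata value falls in the chosen categories. A minimal sketch, assuming the metadata is held as a dictionary per sample (the function name, column name and values are hypothetical):

```python
def filter_samples(metadata, column, allowed_values):
    """Return sample IDs whose metadata[column] is one of allowed_values.

    metadata: dict mapping sample ID -> dict of metadata fields.
    """
    allowed = set(allowed_values)
    return [sample for sample, fields in metadata.items()
            if fields.get(column) in allowed]


# Hypothetical metadata table, for illustration only.
metadata = {
    "P1": {"subtype": "Basal", "grade": "3"},
    "P2": {"subtype": "Luminal", "grade": "2"},
    "P3": {"subtype": "Basal", "grade": "2"},
}
selected = filter_samples(metadata, "subtype", ["Basal"])
n_selected = len(selected)  # the count the interface displays after filtering
```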
Adjustable graph parameters
- Figure size (width and height)
- X and Y axis font size
- Legend font size
- Colours for high and low deconvolution score (for the heatmap)
7.2. Exploring correlations between gene expression and cell type abundance
This feature allows you to analyse the relationship between specific gene expression levels and cell type abundance.
- Enter the gene names line by line (or choose a geneset)
- Select a cell type to investigate
- Choose a correlation method, either Pearson or Spearman
- Click the start button. This calculates the correlation between gene expression and cell type abundance, generating a table with correlation coefficients and p-values.
- Clicking any row in the result table generates a scatter plot.
Adjustable graph parameters
- Figure size (width and height)
- X and Y axis font size
- Legend font size
- Dot and correlation line colours
- Option to display or hide the correlation line
- Option to use a white background
8. Compare cohorts
In this section, you can compare gene expression or mutation frequency across different cohorts.
- (You do not have to select a cohort in this section)
- Enter gene names line by line, or choose a custom geneset
- The list of genes will appear. Click the gene you want to investigate.
- Select the cohorts you want to include.
- All cohorts stored in OmicsBridge will be listed.
- You can select multiple cohorts, but note that generating figures may take longer, especially for gene expression analysis (e.g., when selecting all TCGA cohorts, it takes ~30 min for the mutation frequency analysis and 2-3 minutes for the gene expression analysis).
- Click the start button in either tab ("Mutation Frequency" or "Gene expression")
- In "Mutation Frequency," a bar plot will display the number (or percentage) of patients with mutations in the selected gene across the chosen cohorts.
- In "Gene expression," a box plot is generated that compares gene expression levels across the selected cohorts
Adjustable graph parameters
- Figure size (width and height)
- Font size for x-axis, y-axis, and legend
- Colour scheme for the bar plot (mutation frequency analysis)
- Option to use a white background
Example Usage video
9. Cancer Gene Census (COSMIC)
OmicsBridge includes a database of cancer predisposition genes sourced from Cancer Gene Census from COSMIC. This feature helps you identify which genes from your input are known to be associated with cancer predisposition.
- Enter gene names line by line.
- If any of the genes you entered are associated with cancer predisposition, they will appear in the results table. If none match, the complete database will be displayed instead.
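The lookup rule described above (show matches, or the full database when nothing matches) can be sketched as follows. The census content here is a placeholder, not real Cancer Gene Census annotation, and the function name is hypothetical.

```python
def lookup_census(input_genes, census):
    """Return census entries for the entered genes, or the whole census if none match.

    input_genes: gene names as entered, one per line.
    census: dict mapping gene name -> annotation record.
    """
    wanted = {g.strip() for g in input_genes if g.strip()}
    hits = {gene: info for gene, info in census.items() if gene in wanted}
    return hits if hits else dict(census)


# Placeholder records; real entries would come from the bundled database.
census = {"TP53": "placeholder record", "BRCA1": "placeholder record"}
matched = lookup_census(["TP53 ", "ACTB"], census)
fallback = lookup_census(["ACTB"], census)
```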
Example Usage video
10. Manage the cohort database
Users can manage the cohort database and upload or delete datasets in the “Cohort database” tab.
10.1. Pre-installed cohort
TCGA data (34 cancer types, see the table below) is available as pre-installed cohorts. This includes mRNA sequencing results, clinical information, metadata and mutation data downloaded from UCSC Xena, with gene expression values transformed as log2(RSEM normalised count+1).
TCGA abbreviation
Abbreviation | Cancer type |
---|---|
TCGA_ACC | Adrenocortical carcinoma |
TCGA_BLCA | Bladder Urothelial Carcinoma |
TCGA_BRCA | Breast invasive carcinoma |
TCGA_CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma |
TCGA_CHOL | Cholangiocarcinoma |
TCGA_COAD | Colon adenocarcinoma |
TCGA_DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma |
TCGA_ESCA | Esophageal carcinoma |
TCGA_GBM | Glioblastoma multiforme |
TCGA_HNSC | Head and Neck squamous cell carcinoma |
TCGA_KICH | Kidney Chromophobe |
TCGA_KIRC | Kidney renal clear cell carcinoma |
TCGA_KIRP | Kidney renal papillary cell carcinoma |
TCGA_LAML | Acute Myeloid Leukemia |
TCGA_LGG | Brain Lower Grade Glioma |
TCGA_LIHC | Liver hepatocellular carcinoma |
TCGA_LUAD | Lung adenocarcinoma |
TCGA_LUSC | Lung squamous cell carcinoma |
TCGA_MESO | Mesothelioma |
TCGA_PAAD | Pancreatic adenocarcinoma |
TCGA_PCPG | Pheochromocytoma and Paraganglioma |
TCGA_PRAD | Prostate adenocarcinoma |
TCGA_READ | Rectum adenocarcinoma |
TCGA_SARC | Sarcoma |
TCGA_SKCM | Skin Cutaneous Melanoma |
TCGA_TGCT | Testicular Germ Cell Tumors |
TCGA_THCA | Thyroid carcinoma |
TCGA_THYM | Thymoma |
TCGA_UCEC | Uterine Corpus Endometrial Carcinoma |
TCGA_UCS | Uterine Carcinosarcoma |
TCGA_UVM | Uveal Melanoma |
TCGA_COADREAD | Colon and Rectal Cancer |
TCGA_GBMLGG | Lower grade glioma and glioblastoma |
TCGA_LUNG | Lung Cancer |
10.2. How to upload your own cohort
Users can upload their own cohort and analyse it here. Three files (Gene expression, Clinical data and Metadata) should be uploaded. Optionally, mutation data can be added. Each file must follow the data format described below.
10.2.1 Gene expression
A tab-delimited table of the gene expression of each sample (genes × samples(patients)) from bulk RNAseq (or microarray).
- Ensure the data is already normalised before uploading, as the interface does not perform normalisation automatically.
- Rows (index): gene names.
- Columns (headers): sample names that match those used in your clinical data and metadata.
Example
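As an illustration of this layout, the sketch below parses a tiny genes × samples table and checks that every row has one numeric value per sample. The gene names, sample IDs and values are invented for the example, and `read_expression` is a hypothetical helper rather than part of the interface.

```python
import csv
import io


def read_expression(handle):
    """Parse a tab-delimited genes x samples table.

    Returns (sample names, dict gene -> dict sample -> value).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell labels the gene column
    table = {}
    for row in reader:
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}")
        # float() raises ValueError for non-numeric cells.
        table[gene] = dict(zip(samples, map(float, values)))
    return samples, table


# Hypothetical, already-normalised values.
demo = io.StringIO(
    "gene\tP1\tP2\tP3\n"
    "TP53\t5.10\t4.82\t6.01\n"
    "EGFR\t7.25\t6.90\t3.14\n"
)
sample_names, expression = read_expression(demo)
```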
10.2.2. Patient survival information
A tab-delimited table containing overall survival, progression-free survival, and similar information needed for generating a Kaplan-Meier curve or running a survival analysis. Please follow these rules.
- The first column must contain sample IDs, and its header must be `sample` (all lowercase). All sample IDs should exactly match those used in your gene expression data and metadata.
- All other columns must represent pairs of event data:
  - One column for the event status (censoring), with binary values: 1 (event occurred) or 0 (censored).
  - One corresponding column for the event time (in days), labelled with the same event name followed by `.time`.
  - For example, if you have Overall Survival (OS) data, use one column named `OS` for the event status and another column named `OS.time` for the number of days until the event or censoring. Similarly, for other types of events (e.g., DSS, DFI, PFI), follow the same format: `DSS` and `DSS.time`, `DFI` and `DFI.time`, `PFI` and `PFI.time`, etc.
- You may include other columns in the dataset that do not follow the event/time pair format. These columns will be safely ignored and will not affect the analysis.
Example
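The rules for this file can be checked before uploading with a short script such as the one below. It is a sketch under stated assumptions: blank cells are treated as missing values (the interface may handle them differently), and the function names and demo rows are hypothetical.

```python
import csv
import io


def find_event_pairs(header):
    """Return event names that have both a status column and an '<event>.time' column."""
    columns = set(header)
    return [c for c in header
            if c != "sample" and not c.endswith(".time")
            and c + ".time" in columns]


def validate_survival(handle):
    """Check the survival table layout and return the usable event names."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    if not header or header[0] != "sample":
        raise ValueError("the first column header must be 'sample'")
    events = find_event_pairs(header)
    for row in reader:
        for event in events:
            status = row[event]
            duration = row[event + ".time"]
            # Assumption: an empty cell means the value is missing.
            if status not in ("", "0", "1"):
                raise ValueError(f"{row['sample']}: {event} must be 0 or 1")
            if duration not in ("", None):
                float(duration)  # raises ValueError if not numeric
    return events


# Hypothetical rows; 'age' is not an event/time pair and is ignored.
demo = io.StringIO(
    "sample\tOS\tOS.time\tPFI\tPFI.time\tage\n"
    "P1\t1\t320\t1\t150\t61\n"
    "P2\t0\t1500\t0\t1500\t47\n"
)
events = validate_survival(demo)
```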
10.2.3. Metadata
Please upload a tab-delimited (.tsv) table containing metadata for the samples (patients) in your cohort. This may include information such as treatment condition, gender, grade, or cancer subtype.
- The first column must contain the sample IDs, and the header for this column must be `sample` (all lowercase). All sample IDs should exactly match those used in your gene expression and clinical data.
If you do not have any metadata to include, please upload a .tsv file that contains only the sample IDs in the first column with the header `sample`. This ensures consistency and allows the interface to process the data correctly.
Example
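Because sample IDs must match exactly across the three files, it can help to compare them before uploading. The sketch below reports, for each file, the IDs that appear elsewhere but are missing from it; the function name and IDs are hypothetical.

```python
def compare_sample_ids(expression_ids, clinical_ids, metadata_ids):
    """Return, per file, the sample IDs seen in other files but missing from it."""
    files = {
        "expression": set(expression_ids),
        "clinical": set(clinical_ids),
        "metadata": set(metadata_ids),
    }
    everything = set().union(*files.values())
    return {name: sorted(everything - ids) for name, ids in files.items()}


# Hypothetical IDs; note the trailing space in the last metadata ID.
missing = compare_sample_ids(
    ["P1", "P2", "P3"],
    ["P1", "P2"],
    ["P1", "P2", "P3 "],
)
```

Here "P3 " (with a trailing space) is treated as a different ID from "P3", which is the kind of mismatch an exact comparison catches.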
10.2.4. Mutation data
If you have information about which genes are mutated in which patients, you can upload this as a TSV file.
- Similar to the patient survival information and metadata, the first column must contain the sample IDs, with the header `sample` (all lowercase).
- The second column contains gene names, with the header `id`.
These two columns are sufficient. Any additional columns will be ignored.
Example
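A minimal sketch of reading such a file is shown below. It checks the two required headers and keeps only the first two columns; the helper name, the extra `effect` column and the demo rows are assumptions for illustration.

```python
import csv
import io


def read_mutations(handle):
    """Return sorted unique (sample, gene) pairs from a mutation TSV."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:2] != ["sample", "id"]:
        raise ValueError("the first two headers must be 'sample' and 'id'")
    # Only the first two columns are used; anything else is ignored.
    return sorted({(row[0], row[1]) for row in reader if len(row) >= 2})


# Hypothetical mutation calls with one extra, ignored column.
demo = io.StringIO(
    "sample\tid\teffect\n"
    "P1\tTP53\tmissense\n"
    "P2\tKRAS\tmissense\n"
    "P1\tTP53\tnonsense\n"
)
pairs = read_mutations(demo)
```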
10.3. Edit or delete the cohort
10.3.1. Editing
- Go to the "Registered cohort" table in the Cohort database section.
- Edit the table by double-clicking on the desired field.
- After making your changes, click the "Save changes" button. When you see the message "saved!", your edits have been successfully applied.
10.3.2. Delete
- Go to the "Registered cohort" table in the Cohort database section.
- Select the row(s) you wish to delete.
- Click the "Delete selected data" button.