
Analysis Types

BioQuery supports multiple types of genomic analyses across six data sources. Learn when to use each one.

Data Sources

Source | Description | Data Types
TCGA | The Cancer Genome Atlas - 33 adult cancer types | Expression, Mutations, Survival
TARGET | Pediatric cancers - 7 cancer types | Expression, Mutations, Clinical
GTEx | Normal tissue reference - 54 tissues | Expression
CCLE | Cancer Cell Line Encyclopedia - ~1,000 cell lines | Expression, Mutations
CPTAC | Clinical Proteomics - 10 cancer types | Protein abundance, Phosphoproteomics
GENIE | Real-world clinical - ~40,000 patients | Clinical data

Differential Expression

Compare gene expression between two cancer types or conditions.

When to Use

  • Comparing expression between cancer subtypes
  • Investigating tissue-specific expression patterns
  • Validating gene signatures across cancer types

Example Queries

  • Is DDR1 expression higher in papillary RCC vs clear cell RCC?
  • Compare EGFR expression between LUAD and LUSC
  • How does HER2 expression differ between ER+ and ER- breast cancer?

Statistical Method

  • Test: Wilcoxon rank-sum test (non-parametric)
  • Metric: Fold change (log2 difference in medians)
  • Visualization: Boxplot with individual data points
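
As a rough illustration of the test listed above, the sketch below implements a two-sided Wilcoxon rank-sum test in plain Python using the normal approximation. It is not BioQuery's implementation: the function name rank_sum_test, the tie correction, and the absence of a continuity correction are assumptions made for this example, and the expression values are invented.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for group_a, z score, p-value).
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the average rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u_a, 0.0, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_value


# Invented log2(TPM+1) values for two groups of samples.
group_low = [5.1, 6.3, 7.2, 8.0, 8.4]
group_high = [9.0, 9.5, 10.1, 10.8, 11.2]
u_stat, z_score, p_value = rank_sum_test(group_low, group_high)
```

With only a handful of samples per group an exact test is usually preferable; the normal approximation is shown here because it fits in a few lines.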

Interpreting Results

Expression values are log2(TPM+1) normalized, so the difference in medians is on the log2 scale. A log2 difference of 1 corresponds to a fold change of about 2, meaning the gene is expressed ~2x higher in one group.
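
To make the relationship between the log2 scale and fold change concrete, here is a minimal sketch. The function names are hypothetical and the values are invented; because of the +1 pseudocount, 2 raised to the log2 difference is strictly the ratio of (TPM+1) values and only approximates the TPM ratio when expression is well above 1 TPM.

```python
from statistics import median


def log2_median_difference(group_a, group_b):
    """Difference in medians of log2(TPM+1) values (group_a minus group_b)."""
    return median(group_a) - median(group_b)


def approx_fold_change(log2_difference):
    """Convert a log2 difference to an approximate linear fold change."""
    return 2 ** log2_difference


# Invented log2(TPM+1) values.
group_a = [5.0, 6.0, 7.0]
group_b = [4.0, 5.0, 6.0]
diff = log2_median_difference(group_a, group_b)  # 1.0
fold = approx_fold_change(diff)                  # 2.0
```

A negative log2 difference gives a fold change below 1, which means the gene is lower in the first group.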

Tumor vs Normal

Compare gene expression in tumor tissue versus matched normal tissue.

When to Use

  • Identifying genes upregulated in cancer
  • Finding potential therapeutic targets
  • Validating known oncogenes/tumor suppressors

Example Queries

  • Is TP53 upregulated in breast cancer compared to normal?
  • What's the fold change of BRCA1 in ovarian tumors vs normal?
  • Is MYC overexpressed in liver cancer?

Data Sources

  • Tumor: TCGA tumor samples
  • Normal: TCGA matched normal + GTEx normal tissue

Statistical Method

  • Test: Wilcoxon rank-sum test
  • Metric: Fold change and log2 difference
  • Visualization: Grouped boxplot (Tumor vs Normal)

Not all cancer types have matched normal samples. Some comparisons use GTEx data as normal reference.

Mutation Frequency

Calculate how often a gene is mutated in a specific cancer type.

When to Use

  • Identifying driver mutations
  • Understanding mutation landscape of a cancer
  • Finding potential biomarkers

Example Queries

  • What percentage of glioblastoma has IDH1 mutations?
  • How common is BRAF V600E in melanoma?
  • What's the TP53 mutation rate in colorectal cancer?

Statistical Method

  • Metric: Percentage of samples with mutation
  • Data: TCGA somatic mutation calls (MC3)
  • Visualization: Bar chart with confidence intervals
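
The page does not say which confidence interval is drawn on the bar chart. As one plausible choice, the sketch below computes the mutation percentage with a Wilson score interval; the function name, the 95% level (z = 1.96), and the example counts are all assumptions for illustration.

```python
import math


def mutation_frequency(n_mutated, n_samples, z=1.96):
    """Return (percent mutated, CI lower %, CI upper %) using a Wilson interval."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    p = n_mutated / n_samples
    denom = 1 + z * z / n_samples
    center = (p + z * z / (2 * n_samples)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / n_samples + z * z / (4 * n_samples**2)) / denom
    )
    lower = max(0.0, center - half_width)
    upper = min(1.0, center + half_width)
    return 100 * p, 100 * lower, 100 * upper


# Invented counts: 50 mutated samples out of 100 profiled.
percent, ci_low, ci_high = mutation_frequency(50, 100)
```

The Wilson interval behaves better than the simple normal approximation when the mutation is rare or the cohort is small, which is common for individual cancer types.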

Mutation Types Included

Type | Description
Missense | Amino acid change
Nonsense | Premature stop codon
Frameshift | Insertion/deletion causing frame shift
Splice site | Affects mRNA splicing

Copy number alterations are analyzed separately and not included in mutation frequency calculations.
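
The counting rule implied above can be sketched as follows: a sample counts as mutated if it carries at least one call of an included type, and it is counted once no matter how many calls it has. The record layout (sample, gene, type) and the type labels are hypothetical; real mutation files use their own column names and classification labels.

```python
# Mutation types listed in the table above (labels are illustrative).
INCLUDED_TYPES = {"Missense", "Nonsense", "Frameshift", "Splice site"}


def mutated_samples(calls, gene):
    """Set of sample IDs with at least one included mutation in the gene."""
    return {
        call["sample"]
        for call in calls
        if call["gene"] == gene and call["type"] in INCLUDED_TYPES
    }


def mutation_percentage(calls, gene, all_samples):
    """Percentage of profiled samples carrying an included mutation in the gene."""
    if not all_samples:
        raise ValueError("all_samples must not be empty")
    hits = mutated_samples(calls, gene) & set(all_samples)
    return 100 * len(hits) / len(all_samples)


# Invented calls: S1 has two TP53 hits but is counted once; S2's call is silent.
calls = [
    {"sample": "S1", "gene": "TP53", "type": "Missense"},
    {"sample": "S1", "gene": "TP53", "type": "Nonsense"},
    {"sample": "S2", "gene": "TP53", "type": "Silent"},
    {"sample": "S3", "gene": "KRAS", "type": "Missense"},
]
tp53_percent = mutation_percentage(calls, "TP53", ["S1", "S2", "S3", "S4"])  # 25.0
```

The denominator is the set of profiled samples, not the set of samples with any mutation call, so samples with no calls at all still count toward the total.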

Survival Analysis

Examine how gene expression relates to patient outcomes.

When to Use

  • Identifying prognostic biomarkers
  • Validating therapeutic targets
  • Understanding disease progression

Example Queries

  • Does high DDR1 expression predict worse survival in kidney cancer?
  • Is BRCA1 expression associated with overall survival in ovarian cancer?
  • Do patients with high MYC have worse prognosis in lymphoma?

Statistical Method

  • Test: Log-rank test
  • Metric: Hazard ratio (Cox regression)
  • Stratification: Median expression split (high vs low)
  • Visualization: Kaplan-Meier curves
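
The stratification and test listed above can be sketched in plain Python. This is an illustration under stated assumptions, not BioQuery's code: the function names are hypothetical, samples exactly at the median are assigned to the low group, and the hazard ratio from Cox regression is not computed here.

```python
import math
from statistics import median


def median_split(expression):
    """Label each sample "high" or "low" relative to the median expression."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    events: 1 if the event (e.g. death) was observed, 0 if censored.
    groups: "high" or "low" for each sample.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    observed_high = 0.0
    expected_high = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n_high = at_risk.count("high")
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d = len(died)
        observed_high += died.count("high")
        expected_high += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_high - expected_high) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented data: expression, follow-up time in months, event indicator.
expression = [9.1, 8.7, 8.9, 4.2, 5.0, 4.8]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = median_split(expression)
chi2, p_value = logrank_test(times, events, groups)
```

A median split is easy to interpret but discards information; the cutoff is a property of the cohort, so the same patient could be "high" in one dataset and "low" in another.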

Survival Endpoints

Endpoint | Description
Overall Survival (OS) | Time to death from any cause
Progression-Free Survival (PFS) | Time to disease progression or death
Disease-Specific Survival (DSS) | Time to death from cancer

Interpreting Kaplan-Meier Curves

  • Y-axis: Probability of survival
  • X-axis: Time (usually months or years)
  • Curves: One per group (high/low expression)
  • Tick marks: Censored patients (lost to follow-up, or still event-free at last contact)
  • Shading: 95% confidence interval
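
To show where the steps in a Kaplan-Meier curve come from, here is a minimal sketch of the estimator for a single group. The function name and the data are invented for illustration, and the confidence band is not computed. Censored patients leave the at-risk set without causing a drop in the curve.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 if the event was observed at that time, 0 if censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Four invented patients; the second is censored at month 2.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
# curve == [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Note that the censored patient at month 2 does not lower the curve, but does shrink the at-risk group, so the next death at month 3 produces a larger step.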

Survival data has limitations: follow-up time varies, some patients are censored, and treatment effects are not accounted for.

Choosing the Right Analysis

Question Type | Analysis | Example
"Is gene X higher in cancer A vs B?" | Differential Expression | "Is EGFR higher in LUAD vs LUSC?"
"Is gene X upregulated in cancer?" | Tumor vs Normal | "Is MYC upregulated in breast cancer?"
"How often is gene X mutated?" | Mutation Frequency | "What's the TP53 mutation rate in GBM?"
"Does gene X predict survival?" | Survival Analysis | "Does high DDR1 predict worse survival?"
"Gene X in cell lines?" | Cell Line Expression | "DDR1 in lung cancer cell lines"
"Gene X protein levels?" | Protein Expression | "TP53 protein in glioblastoma"

Cell Line Expression (CCLE)

Analyze gene expression across ~1,000 cancer cell lines from the Cancer Cell Line Encyclopedia.

When to Use

  • Studying gene expression in in vitro models
  • Identifying cell lines for drug screening
  • Comparing tumor vs cell line expression
  • Pre-clinical target validation

Example Queries

  • What is DDR1 expression in lung cancer cell lines?
  • Compare TP53 expression across all CCLE cell lines
  • Which cell line sites have highest EGFR expression?

Data Details

  • Expression: RMA-normalized microarray data
  • Sites: ~30 primary sites (lung, breast, CNS, ovary, etc.)
  • Cell Lines: ~1,000 characterized cell lines
  • Visualization: Boxplot by primary site
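
A boxplot by primary site amounts to grouping cell lines by site and summarizing each group. The sketch below ranks sites by median expression; the function name, the (site, value) record layout, and the numbers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def median_by_site(records):
    """Group (primary_site, expression) pairs and rank sites by median expression."""
    by_site = defaultdict(list)
    for site, value in records:
        by_site[site].append(value)
    summary = [(site, median(values)) for site, values in by_site.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Invented RMA-normalized values for a handful of cell lines.
records = [("lung", 8.0), ("lung", 10.0), ("breast", 6.0), ("CNS", 9.5)]
ranking = median_by_site(records)
# ranking == [("CNS", 9.5), ("lung", 9.0), ("breast", 6.0)]
```

Sites with only one or two cell lines can top such a ranking by chance, so the number of lines per site is worth checking alongside the median.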

CCLE expression uses RMA normalization (not TPM), so values are not directly comparable to TCGA RNA-seq data.

Protein Expression (CPTAC)

Analyze protein abundance from mass spectrometry-based proteomics.

When to Use

  • Understanding post-transcriptional regulation
  • Comparing mRNA vs protein levels
  • Identifying proteins that don’t correlate with mRNA
  • Proteomics-based biomarker discovery

Example Queries

  • What is TP53 protein abundance in glioblastoma?
  • Compare DDR1 protein levels across CPTAC cancers
  • Does DDR1 mRNA correlate with protein in breast cancer?

Available Cancer Types

CPTAC Code | Cancer Type
CCRCC | Clear cell renal cell carcinoma
GBM | Glioblastoma
HNSCC | Head and neck squamous cell carcinoma
LSCC | Lung squamous cell carcinoma
LUAD | Lung adenocarcinoma
PDA | Pancreatic ductal adenocarcinoma
UCEC | Uterine corpus endometrial carcinoma
BRCA | Breast cancer
COAD | Colon adenocarcinoma
OV | Ovarian cancer

Data Details

  • Metric: Log2 ratio (sample vs reference)
  • Method: TMT-labeled mass spectrometry
  • Normalization: Median-centered log2 ratios

Protein abundance values are log2 ratios relative to a pooled reference, not absolute concentrations.

Combining Analyses

For comprehensive characterization of a gene, consider running multiple query types:

  1. Expression: Is the gene overexpressed in the cancer?
  2. Mutations: Is the gene frequently mutated?
  3. Survival: Does expression predict patient outcomes?
  4. Cell Lines: Is the gene expressed in relevant cell line models?
  5. Protein: Does protein abundance match mRNA expression?

This multi-angle approach provides stronger evidence for therapeutic relevance.