
Query Card Schema

Query Cards are the core data structure in BioQuery. Every query you run produces a Query Card that captures the full context: the original question, parsed interpretation, data sources, statistical methods, results, and visualizations.

Overview

A Query Card is designed for:

  • Reproducibility: Contains everything needed to recreate the analysis
  • Shareability: Unique URLs for each card
  • Transparency: Full visibility into methods and data sources
  • Extensibility: Schema supports future analysis types

Complete Schema

QueryCard

The top-level object returned by the API.

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (format: bq-YYYY-MM-DD-xxxxx) |
| version | string | Schema version (currently "1.0") |
| created_at | datetime | When the card was created |
| created_by | string | User who created the card |
| parent_id | string? | ID of parent card if this is a fork |
| query | Query | The user’s query and its interpretation |
| data_source | DataSource | Information about data used |
| cohort | Cohort | Sample/cohort statistics |
| analysis | Analysis | Statistical methods used |
| result | Result | Analysis results |
| figure | Figure? | Visualization data |
| citations | Citations? | Auto-generated citations |
| reproducibility | Reproducibility? | Full reproduction details |
| metadata | Metadata? | Additional metadata |
| analysis_params | object? | User-specified analysis parameters |
| input_cohort | CohortFilter? | User’s filter criteria for cohort builder |
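
As an illustration of the id format, the sketch below splits a card ID into its date and suffix. The helper name parse_card_id is hypothetical, and the suffix rule (five lowercase letters or digits) is an assumption inferred from the example ID bq-2025-12-07-a7f3x, not something the schema states.

```python
import re
from datetime import date

# Assumed pattern: "bq-" + YYYY-MM-DD + "-" + 5 lowercase letters/digits.
# The suffix alphabet and length are guesses based on "bq-2025-12-07-a7f3x".
CARD_ID_RE = re.compile(r"bq-(\d{4})-(\d{2})-(\d{2})-([a-z0-9]{5})")


def parse_card_id(card_id: str) -> tuple[date, str]:
    """Return (date embedded in the ID, suffix); raise ValueError if malformed."""
    match = CARD_ID_RE.fullmatch(card_id)
    if match is None:
        raise ValueError(f"not a Query Card ID: {card_id!r}")
    year, month, day, suffix = match.groups()
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(int(year), int(month), int(day)), suffix


print(parse_card_id("bq-2025-12-07-a7f3x"))
```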

Query

Contains the original query and its parsed interpretation.

| Field | Type | Description |
|---|---|---|
| natural_language | string | The original user query |
| parsed | ParsedQuery | Structured interpretation |

ParsedQuery

The structured interpretation of the user’s query.

| Field | Type | Description |
|---|---|---|
| data_type | string | Data type: expression, mutation, cnv, protein, unclear |
| analysis_type | string | Analysis type: differential_expression, mutation_frequency, survival_analysis, pan_cancer_expression, correlation, tumor_vs_normal, ccle_expression, cptac_protein_expression, mrna_protein_correlation, unclear |
| gene | string | Primary gene of interest |
| gene_id | string? | Ensembl gene ID |
| genes | string[]? | Multiple genes (for correlation queries) |
| cancer_type | string? | Single cancer type |
| cancer_types | string[] | List of cancer types |
| group_a_label | string? | Label for comparison group A |
| group_b_label | string? | Label for comparison group B |
| expression_metric | string | median or mean |
| survival_endpoint | string | OS, PFS, or DFS |
| needs_clarification | boolean | Whether clarification is needed |
| clarification_message | string | Message asking for clarification |
| confidence | float | Confidence in interpretation (0-1) |
| assumptions_made | string | Any assumptions made during parsing |
| original_intent | QueryIntent? | Raw intent extraction (debugging) |
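
A client may want to decide whether to trust a parsed interpretation before showing results. The sketch below is one possible approach, not BioQuery's own logic: the function name interpretation_status and the 0.8 confidence threshold are hypothetical choices made for illustration.

```python
def interpretation_status(parsed: dict, min_confidence: float = 0.8) -> str:
    """Classify a ParsedQuery dict as 'clarify', 'low_confidence', or 'ok'.

    min_confidence is an arbitrary illustrative threshold, not a BioQuery default.
    """
    if parsed.get("needs_clarification", False):
        return "clarify"
    # "unclear" is a documented value for both data_type and analysis_type.
    if "unclear" in (parsed.get("data_type"), parsed.get("analysis_type")):
        return "clarify"
    if parsed.get("confidence", 0.0) < min_confidence:
        return "low_confidence"
    return "ok"


parsed = {
    "data_type": "expression",
    "analysis_type": "differential_expression",
    "gene": "DDR1",
    "confidence": 1.0,
}
print(interpretation_status(parsed))  # prints "ok"
```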

Future-Proofing Fields

These fields are reserved for upcoming features and may be empty in current responses.

| Field | Type | Description |
|---|---|---|
| variant | string? | Specific mutation variant (e.g., "TP53 R175H", "BRAF V600E") |
| variant_type | string? | Variant type: missense, nonsense, frameshift, etc. |
| transcript_id | string? | Ensembl transcript ID (e.g., "ENST00000269305") |
| isoform_name | string? | Isoform name (e.g., "DDR1-201") |
| cell_types | string[]? | Cell types for single-cell analysis |
| dataset_id | string? | Single-cell dataset reference |
| signature_name | string? | Gene signature name (e.g., "MYC_TARGETS_V1") |
| signature_genes | string[]? | Genes in a multi-gene signature |

DataSource

Information about the data source used for the analysis.

| Field | Type | Description |
|---|---|---|
| name | string | Data source name: "TCGA", "TARGET", "GTEx", "CCLE", "CPTAC", "GENIE" |
| release | string | Release version (e.g., "GDC Release 39") |
| accessed_at | datetime | When data was accessed |
| genome_build | string | Reference genome (e.g., "hg38") |
| expression_type | string? | Expression data type |
| expression_normalization | string? | Normalization method |
| mutation_caller | string? | Mutation calling pipeline |
| bigquery_tables | string[] | Exact BigQuery tables used |
| bigquery_project | string | BigQuery project ID |

Future-Proofing Fields

| Field | Type | Description |
|---|---|---|
| data_version | string? | Data version for reproducibility (e.g., "GDC-39.0") |
| data_checksum | string? | Hash to detect if underlying data changed |
| single_cell_source | string? | Single-cell data source: "TISCH", "CELLxGENE", "GEO" |
| single_cell_dataset_id | string? | Dataset identifier in the source |
| cell_annotation_source | string? | Cell type annotation source |

Cohort

Sample and cohort information.

| Field | Type | Description |
|---|---|---|
| total_n | integer | Total number of samples |
| group_a | GroupStats? | Statistics for group A |
| group_b | GroupStats? | Statistics for group B |
| groups | GroupStats[]? | Statistics for multiple groups |

GroupStats

| Field | Type | Description |
|---|---|---|
| name | string | Group name |
| n | integer | Sample count |
| cancer_type | string? | Cancer type |
| sample_type | string? | Sample type (e.g., "Primary Tumor") |
| filters_applied | string[]? | Filters applied to this group |

CohortFilter

User-defined cohort filter criteria for custom analyses.

This schema supports the upcoming cohort builder feature.

| Field | Type | Description |
|---|---|---|
| cancer_types | string[]? | Filter by cancer types |
| stages | string[]? | Filter by stages (e.g., ["Stage I", "Stage II"]) |
| grades | string[]? | Filter by grades (e.g., ["G1", "G2"]) |
| mutation_status | object? | Filter by mutation status (e.g., {"TP53": true, "KRAS": false}) |
| age_range | [int, int]? | Filter by age range [min, max] |
| sex | string? | Filter by sex: "male", "female", "all" |
| sample_types | string[]? | Filter by sample types |
| custom_sql_filter | string? | Advanced: raw SQL WHERE clause |
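
Because every CohortFilter field is optional, a client may want to sanity-check a filter before submitting it. The following is a minimal sketch under stated assumptions: validate_cohort_filter is a hypothetical helper, and the rules it enforces are only those readable from the field descriptions above (min <= max for age_range, the three documented sex values, boolean mutation flags). The API may apply different or additional checks.

```python
ALLOWED_SEX = {"male", "female", "all"}  # values listed in the table above


def validate_cohort_filter(cohort_filter: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    age_range = cohort_filter.get("age_range")
    if age_range is not None:
        if len(age_range) != 2 or age_range[0] > age_range[1]:
            problems.append("age_range must be [min, max] with min <= max")

    sex = cohort_filter.get("sex")
    if sex is not None and sex not in ALLOWED_SEX:
        problems.append(f"sex must be one of {sorted(ALLOWED_SEX)}")

    # mutation_status maps gene symbol -> mutated (true) / wild type (false).
    for gene, mutated in (cohort_filter.get("mutation_status") or {}).items():
        if not isinstance(mutated, bool):
            problems.append(f"mutation_status[{gene!r}] must be true or false")

    return problems


example_filter = {
    "cancer_types": ["KIRP", "KIRC"],
    "stages": ["Stage I", "Stage II"],
    "mutation_status": {"TP53": True, "KRAS": False},
    "age_range": [40, 65],
    "sex": "female",
}
print(validate_cohort_filter(example_filter))  # prints []
```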

Analysis

Statistical analysis details.

| Field | Type | Description |
|---|---|---|
| method | string | Statistical method (e.g., "wilcoxon_rank_sum", "log_rank") |
| method_full_name | string? | Full method name |
| parameters | object? | Method-specific parameters |
| multiple_testing_correction | string | Correction method: "none", "bonferroni", "fdr_bh" |
| confidence_level | float | Confidence level (default: 0.95) |
| stratification_method | string? | Expression stratification method |
| stratification_threshold | float? | Actual threshold used |
| filters | string[]? | SQL filters applied |
| software_versions | object? | Software versions for reproducibility |

Result

Analysis results.

| Field | Type | Description |
|---|---|---|
| summary | string | Natural language summary |
| significant | boolean | Whether result is statistically significant |
| p_value | float? | P-value |
| p_value_adjusted | float? | Adjusted p-value |
| effect_size | float? | Effect size |
| effect_size_type | string? | Type of effect size |
| confidence_interval | [float, float]? | Confidence interval |
| group_a_stats | GroupResultStats? | Statistics for group A |
| group_b_stats | GroupResultStats? | Statistics for group B |
| pan_cancer_results | object[]? | Results for pan-cancer analysis |
| survival_stats | SurvivalStats? | Survival-specific statistics |
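
Both p_value and p_value_adjusted are optional, so client code has to handle either being absent. The sketch below shows one way to do that. It assumes the adjusted value should be preferred when present and that significance means p < 1 - confidence_level; this page does not document how BioQuery itself sets the significant flag, so treat both rules and both function names as hypothetical.

```python
from typing import Optional


def effective_p_value(result: dict) -> Optional[float]:
    """Prefer the multiple-testing-adjusted p-value when the card carries one."""
    adjusted = result.get("p_value_adjusted")
    return adjusted if adjusted is not None else result.get("p_value")


def is_significant_at(result: dict, confidence_level: float = 0.95) -> Optional[bool]:
    """Compare the effective p-value against alpha = 1 - confidence_level.

    Returns None when the Result carries no p-value at all.
    """
    p = effective_p_value(result)
    if p is None:
        return None
    return p < 1.0 - confidence_level


print(is_significant_at({"p_value": 2.3e-12}))  # prints True
```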

GroupResultStats

| Field | Type | Description |
|---|---|---|
| mean | float? | Mean value |
| median | float? | Median value |
| std | float? | Standard deviation |
| ci_lower | float? | Lower confidence interval |
| ci_upper | float? | Upper confidence interval |
| n | integer? | Sample count |
| frequency | float? | Frequency (for mutations) |
| count | integer? | Count (for mutations) |

SurvivalStats

| Field | Type | Description |
|---|---|---|
| median_survival_group_a | float? | Median survival for high expression |
| median_survival_group_b | float? | Median survival for low expression |
| hazard_ratio | float? | Hazard ratio |
| hr_ci_lower | float? | HR lower confidence interval |
| hr_ci_upper | float? | HR upper confidence interval |
| log_rank_p | float? | Log-rank p-value |
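
Since every SurvivalStats field is optional, display code needs to tolerate gaps. The sketch below formats a hazard ratio and checks whether its confidence interval excludes 1 (the usual reading of a hazard ratio that differs from no effect at the chosen confidence level). The function names and the input values are hypothetical and do not come from a real card.

```python
from typing import Optional


def hr_ci_excludes_one(stats: dict) -> Optional[bool]:
    """True if the HR confidence interval lies entirely above or below 1."""
    lower, upper = stats.get("hr_ci_lower"), stats.get("hr_ci_upper")
    if lower is None or upper is None:
        return None  # interval not reported
    return lower > 1.0 or upper < 1.0


def format_hazard_ratio(stats: dict) -> str:
    """Render 'HR x.xx (CI a.aa to b.bb)', omitting whatever is missing."""
    hr = stats.get("hazard_ratio")
    if hr is None:
        return "HR not available"
    text = f"HR {hr:.2f}"
    lower, upper = stats.get("hr_ci_lower"), stats.get("hr_ci_upper")
    if lower is not None and upper is not None:
        text += f" (CI {lower:.2f} to {upper:.2f})"
    return text


# Made-up values for illustration only.
stats = {"hazard_ratio": 1.5, "hr_ci_lower": 1.1, "hr_ci_upper": 2.0}
print(format_hazard_ratio(stats))  # prints "HR 1.50 (CI 1.10 to 2.00)"
print(hr_ci_excludes_one(stats))   # prints True
```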

Figure

Visualization information.

| Field | Type | Description |
|---|---|---|
| type | string | Figure type: boxplot, kaplan_meier, bar, scatter, heatmap |
| url | string? | URL to rendered figure |
| plotly_json | object? | Plotly.js figure specification |
| alt_text | string? | Alt text for accessibility |

Reproducibility

Everything needed to reproduce the analysis.

| Field | Type | Description |
|---|---|---|
| sql_query | string | Exact SQL query executed |
| bioquery_version | string | BioQuery version |
| python_version | string? | Python version |
| package_versions | object? | Package versions |
| bigquery_project | string | BigQuery project |
| bigquery_tables | string[] | Tables used |
| data_accessed_at | datetime | When data was accessed |
| reproduction_steps | string[]? | Step-by-step instructions |
| python_code | string? | Python code snippet |
| r_code | string? | R code snippet |

Citations

Auto-generated citations.

| Field | Type | Description |
|---|---|---|
| in_text | string? | In-text citation |
| methods | string? | Methods section text |
| bibtex | string? | BibTeX citation |

Metadata

Additional card metadata.

| Field | Type | Description |
|---|---|---|
| execution_time_ms | integer? | Query execution time |
| permalink | string? | Permanent URL |
| is_public | boolean | Whether card is public |
| fork_count | integer | Number of forks |
| view_count | integer | Number of views |
| export_formats | string[] | Available export formats |

Example Query Card

{
  "id": "bq-2025-12-07-a7f3x",
  "version": "1.0",
  "created_at": "2025-12-07T12:00:00Z",
  "created_by": "anonymous",
  "query": {
    "natural_language": "Is DDR1 expression higher in papillary RCC compared to clear cell RCC?",
    "parsed": {
      "data_type": "expression",
      "analysis_type": "differential_expression",
      "gene": "DDR1",
      "cancer_types": ["KIRP", "KIRC"],
      "group_a_label": "KIRP",
      "group_b_label": "KIRC",
      "expression_metric": "median",
      "confidence": 1.0
    }
  },
  "data_source": {
    "name": "TCGA",
    "release": "GDC Release 39",
    "accessed_at": "2025-12-07T12:00:00Z",
    "genome_build": "hg38",
    "expression_type": "RNA-seq TPM (STAR-RSEM)",
    "expression_normalization": "TPM (transcripts per million)",
    "bigquery_tables": ["isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current"],
    "bigquery_project": "isb-cgc-bq"
  },
  "cohort": {
    "total_n": 828,
    "group_a": {
      "name": "KIRP",
      "n": 290,
      "cancer_type": "KIRP",
      "sample_type": "Primary Tumor"
    },
    "group_b": {
      "name": "KIRC",
      "n": 538,
      "cancer_type": "KIRC",
      "sample_type": "Primary Tumor"
    }
  },
  "analysis": {
    "method": "wilcoxon_rank_sum",
    "method_full_name": "Wilcoxon rank-sum test (Mann-Whitney U)",
    "multiple_testing_correction": "none",
    "confidence_level": 0.95
  },
  "result": {
    "summary": "DDR1 expression is significantly higher in papillary RCC (KIRP) compared to clear cell RCC (KIRC), with a median fold change of 2.3 (p < 0.001).",
    "significant": true,
    "p_value": 2.3e-12,
    "effect_size": 0.85,
    "effect_size_type": "rank_biserial",
    "group_a_stats": { "median": 8.7, "mean": 8.9, "std": 1.2, "n": 290 },
    "group_b_stats": { "median": 6.4, "mean": 6.5, "std": 1.5, "n": 538 }
  },
  "figure": {
    "type": "boxplot",
    "plotly_json": { ... },
    "alt_text": "Box plot showing DDR1 expression in KIRP vs KIRC"
  }
}
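
To show how the nested objects in the example card relate to each other, the sketch below loads a trimmed copy of it (only the cohort and result sample counts) and cross-checks them. The helper check_cohort_counts is hypothetical. The rule that total_n equals group_a.n plus group_b.n holds for this example but is an assumption, not a documented guarantee of the schema.

```python
import json

# Trimmed copy of the example card: only the fields used by the check.
CARD_JSON = """
{
  "cohort": {
    "total_n": 828,
    "group_a": {"name": "KIRP", "n": 290},
    "group_b": {"name": "KIRC", "n": 538}
  },
  "result": {
    "group_a_stats": {"median": 8.7, "n": 290},
    "group_b_stats": {"median": 6.4, "n": 538}
  }
}
"""


def check_cohort_counts(card: dict) -> list[str]:
    """Return a list of inconsistencies between cohort and result sample counts."""
    problems = []
    cohort = card["cohort"]
    result = card.get("result", {})

    group_a, group_b = cohort.get("group_a"), cohort.get("group_b")
    if group_a and group_b and group_a["n"] + group_b["n"] != cohort["total_n"]:
        problems.append("group_a.n + group_b.n does not equal total_n")

    for key in ("group_a", "group_b"):
        group = cohort.get(key)
        stats = result.get(f"{key}_stats")
        if group and stats and stats.get("n") is not None and stats["n"] != group["n"]:
            problems.append(f"{key}: cohort n and result n disagree")
    return problems


card = json.loads(CARD_JSON)
print(check_cohort_counts(card))  # prints []
```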

Schema Versioning

The Query Card schema uses semantic versioning. The current version is 1.0.

  • Major version (1.x): Breaking changes
  • Minor version (x.1): New optional fields (backward compatible)

New fields are always added as optional with default values, so existing cards remain valid under later minor versions.
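
Under these rules, a reader built against version 1.0 should be able to consume any 1.x card by ignoring fields it does not recognize, while a 2.x card may break it. A minimal sketch of that check follows; the function names parse_version and can_read are hypothetical, and the version string is assumed to be "major.minor" as in the current "1.0".

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a 'major.minor' schema version string into integers."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def can_read(card_version: str, supported_version: str = "1.0") -> bool:
    """A reader can handle any card that shares its major version.

    A newer minor version only adds optional fields, which the reader can ignore.
    """
    return parse_version(card_version)[0] == parse_version(supported_version)[0]


print(can_read("1.0"))  # prints True
print(can_read("1.3"))  # prints True
print(can_read("2.0"))  # prints False
```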