Query Card Schema
Query Cards are the core data structure in BioQuery. Every query you run produces a Query Card that captures the full context: the original question, parsed interpretation, data sources, statistical methods, results, and visualizations.
Overview
A Query Card is designed for:
- Reproducibility: Contains everything needed to recreate the analysis
- Shareability: Unique URLs for each card
- Transparency: Full visibility into methods and data sources
- Extensibility: Schema supports future analysis types
Complete Schema
QueryCard
The top-level object returned by the API. Fields whose type ends in ? are optional.
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier (format: bq-YYYY-MM-DD-xxxxx) |
version | string | Schema version (currently "1.0") |
created_at | datetime | When the card was created |
created_by | string | User who created the card |
parent_id | string? | ID of parent card if this is a fork |
query | Query | The user’s query and its interpretation |
data_source | DataSource | Information about data used |
cohort | Cohort | Sample/cohort statistics |
analysis | Analysis | Statistical methods used |
result | Result | Analysis results |
figure | Figure? | Visualization data |
citations | Citations? | Auto-generated citations |
reproducibility | Reproducibility? | Full reproduction details |
metadata | Metadata? | Additional metadata |
analysis_params | object? | User-specified analysis parameters |
input_cohort | CohortFilter? | User’s filter criteria for cohort builder |
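The id format above can be checked mechanically, and parent_id tells you whether a card is a fork. The following is a minimal Python sketch. It assumes the five-character suffix uses only lowercase letters and digits, which is inferred from the example ID bq-2025-12-07-a7f3x rather than from a published specification. parse_card_id and is_fork are hypothetical helper names, not part of a BioQuery API.

```python
import re
from datetime import date

# Assumed pattern: "bq-" + ISO date + "-" + five lowercase alphanumerics.
# The suffix alphabet is inferred from the example ID, not a published spec.
CARD_ID_PATTERN = re.compile(r"bq-(\d{4})-(\d{2})-(\d{2})-([a-z0-9]{5})")


def parse_card_id(card_id: str) -> tuple[date, str]:
    """Split a Query Card ID into its creation date and random suffix."""
    match = CARD_ID_PATTERN.fullmatch(card_id)
    if match is None:
        raise ValueError(f"not a Query Card ID: {card_id!r}")
    year, month, day, suffix = match.groups()
    # date() raises ValueError for impossible dates such as 2025-02-30
    return date(int(year), int(month), int(day)), suffix


def is_fork(card: dict) -> bool:
    """A card is a fork when parent_id is present and not null."""
    return card.get("parent_id") is not None
```

For example, parse_card_id("bq-2025-12-07-a7f3x") returns the date 2025-12-07 and the suffix "a7f3x".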
Query
Contains the original query and its parsed interpretation.
| Field | Type | Description |
|---|---|---|
natural_language | string | The original user query |
parsed | ParsedQuery | Structured interpretation |
ParsedQuery
The structured interpretation of the user’s query.
| Field | Type | Description |
|---|---|---|
data_type | string | Data type: expression, mutation, cnv, protein, unclear |
analysis_type | string | Analysis type: differential_expression, mutation_frequency, survival_analysis, pan_cancer_expression, correlation, tumor_vs_normal, ccle_expression, cptac_protein_expression, mrna_protein_correlation, unclear |
gene | string | Primary gene of interest |
gene_id | string? | Ensembl gene ID |
genes | string[]? | Multiple genes (for correlation queries) |
cancer_type | string? | Single cancer type |
cancer_types | string[] | List of cancer types |
group_a_label | string? | Label for comparison group A |
group_b_label | string? | Label for comparison group B |
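expression_metric | string | Summary statistic: median or mean |
survival_endpoint | string | Survival endpoint: OS (overall survival), PFS (progression-free survival), or DFS (disease-free survival) |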
needs_clarification | boolean | Whether clarification is needed |
clarification_message | string | Message asking for clarification |
confidence | float | Confidence in interpretation (0-1) |
assumptions_made | string | Any assumptions made during parsing |
original_intent | QueryIntent? | Raw intent extraction (debugging) |
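The enumerated values in the ParsedQuery table lend themselves to a simple client-side check. This is a minimal Python sketch that copies the allowed values from the table above. Two points are assumptions. First, expression_metric and survival_endpoint are only checked when present, because the example card omits survival_endpoint. Second, a card that sets needs_clarification is expected to carry a clarification_message. validate_parsed_query is a hypothetical helper.

```python
# Allowed values copied from the ParsedQuery table.
DATA_TYPES = {"expression", "mutation", "cnv", "protein", "unclear"}
ANALYSIS_TYPES = {
    "differential_expression", "mutation_frequency", "survival_analysis",
    "pan_cancer_expression", "correlation", "tumor_vs_normal",
    "ccle_expression", "cptac_protein_expression",
    "mrna_protein_correlation", "unclear",
}
EXPRESSION_METRICS = {"median", "mean"}
SURVIVAL_ENDPOINTS = {"OS", "PFS", "DFS"}


def validate_parsed_query(parsed: dict) -> list[str]:
    """Return a list of problems found in a ParsedQuery dict (empty if none)."""
    problems = []
    if parsed.get("data_type") not in DATA_TYPES:
        problems.append(f"unknown data_type: {parsed.get('data_type')!r}")
    if parsed.get("analysis_type") not in ANALYSIS_TYPES:
        problems.append(f"unknown analysis_type: {parsed.get('analysis_type')!r}")
    # Assumption: these two fields are only validated when present.
    metric = parsed.get("expression_metric")
    if metric is not None and metric not in EXPRESSION_METRICS:
        problems.append(f"unknown expression_metric: {metric!r}")
    endpoint = parsed.get("survival_endpoint")
    if endpoint is not None and endpoint not in SURVIVAL_ENDPOINTS:
        problems.append(f"unknown survival_endpoint: {endpoint!r}")
    confidence = parsed.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range [0, 1]: {confidence!r}")
    # Assumed policy: a request for clarification should say what is unclear.
    if parsed.get("needs_clarification") and not parsed.get("clarification_message"):
        problems.append("needs_clarification is true but clarification_message is empty")
    return problems
```

The parsed block of the example card below passes with an empty list.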
Future-Proofing Fields
These fields are reserved for upcoming features and may be empty in current responses.
| Field | Type | Description |
|---|---|---|
variant | string? | Specific mutation variant (e.g., "TP53 R175H", "BRAF V600E") |
variant_type | string? | Variant type: missense, nonsense, frameshift, etc. |
transcript_id | string? | Ensembl transcript ID (e.g., "ENST00000269305") |
isoform_name | string? | Isoform name (e.g., "DDR1-201") |
cell_types | string[]? | Cell types for single-cell analysis |
dataset_id | string? | Single-cell dataset reference |
signature_name | string? | Gene signature name (e.g., "MYC_TARGETS_V1") |
signature_genes | string[]? | Genes in a multi-gene signature |
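The variant examples ("TP53 R175H", "BRAF V600E") follow a "GENE RefPosAlt" pattern using one-letter amino acid codes. This is a minimal Python sketch of how such a string could be split and given a rough variant_type. It assumes that exact format. Frameshift notations such as "fs" are not handled and raise ValueError. parse_variant and guess_variant_type are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Assumed format: gene symbol, a space, then ref amino acid, position, alt.
# "*" as the alt residue denotes a stop codon (nonsense change).
_AA = "ACDEFGHIKLMNPQRSTVWY"
_VARIANT = re.compile(rf"([A-Za-z0-9-]+) ([{_AA}])(\d+)([{_AA}*])")


class ProteinVariant(NamedTuple):
    gene: str
    ref_aa: str
    position: int
    alt_aa: str


def parse_variant(variant: str) -> ProteinVariant:
    """Parse strings like 'TP53 R175H' into their components."""
    match = _VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant!r}")
    gene, ref, pos, alt = match.groups()
    return ProteinVariant(gene, ref, int(pos), alt)


def guess_variant_type(v: ProteinVariant) -> str:
    """Rough classification from the protein change alone."""
    if v.alt_aa == "*":
        return "nonsense"
    if v.alt_aa == v.ref_aa:
        return "synonymous"
    return "missense"
```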
DataSource
Information about the data source used for the analysis.
| Field | Type | Description |
|---|---|---|
name | string | Data source name: "TCGA", "TARGET", "GTEx", "CCLE", "CPTAC", "GENIE" |
release | string | Release version (e.g., "GDC Release 39") |
accessed_at | datetime | When data was accessed |
genome_build | string | Reference genome (e.g., "hg38") |
expression_type | string? | Expression data type |
expression_normalization | string? | Normalization method |
mutation_caller | string? | Mutation calling pipeline |
bigquery_tables | string[] | Exact BigQuery tables used |
bigquery_project | string | BigQuery project ID |
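The entries in bigquery_tables are fully qualified project.dataset.table IDs, such as isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current in the example card. This is a minimal Python sketch that splits and quotes them. Backtick-quoting the whole ID is the safe choice in BigQuery Standard SQL when the project ID contains hyphens. The check that every table belongs to bigquery_project is an assumption that happens to hold in the example. It is not a documented rule, because a billing project can differ from the data project. All function names are hypothetical.

```python
def split_table_id(table_id: str) -> tuple[str, str, str]:
    """Split a fully qualified 'project.dataset.table' ID into its parts."""
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    project, dataset, table = parts
    return project, dataset, table


def quoted_table_ref(table_id: str) -> str:
    """Backtick-quote a table ID for use in a BigQuery Standard SQL query."""
    split_table_id(table_id)  # validate the shape before quoting
    return f"`{table_id}`"


def tables_outside_project(data_source: dict) -> list[str]:
    """List tables whose project differs from bigquery_project."""
    project = data_source["bigquery_project"]
    return [t for t in data_source["bigquery_tables"] if split_table_id(t)[0] != project]
```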
Future-Proofing Fields
| Field | Type | Description |
|---|---|---|
data_version | string? | Data version for reproducibility (e.g., "GDC-39.0") |
data_checksum | string? | Hash to detect if underlying data changed |
single_cell_source | string? | Single-cell data source: "TISCH", "CELLxGENE", "GEO" |
single_cell_dataset_id | string? | Dataset identifier in the source |
cell_annotation_source | string? | Cell type annotation source |
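The data_checksum field is described only as a hash for detecting changes in the underlying data. The hashing scheme is not specified. This minimal Python sketch shows one possible order-independent scheme: each row is serialized canonically as JSON with sorted keys, the serialized rows are sorted, and the result is fed to SHA-256. The "sha256:" prefix and the function name are hypothetical.

```python
import hashlib
import json


def data_checksum(rows: list[dict]) -> str:
    """Order-independent SHA-256 over query result rows (hypothetical scheme)."""
    # Canonical per-row serialization: sorted keys, no extra whitespace.
    serialized = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    digest = hashlib.sha256()
    for line in serialized:
        # json.dumps escapes newlines inside strings, so "\n" is a safe separator.
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return "sha256:" + digest.hexdigest()
```

Sorting the rows makes the checksum stable if BigQuery returns the same rows in a different order. Duplicate rows still count, because the sort keeps them.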
Cohort
Sample and cohort information.
| Field | Type | Description |
|---|---|---|
total_n | integer | Total number of samples |
group_a | GroupStats? | Statistics for group A |
group_b | GroupStats? | Statistics for group B |
groups | GroupStats[]? | Statistics for multiple groups |
GroupStats
| Field | Type | Description |
|---|---|---|
name | string | Group name |
n | integer | Sample count |
cancer_type | string? | Cancer type |
sample_type | string? | Sample type (e.g., "Primary Tumor") |
filters_applied | string[]? | Filters applied to this group |
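In the example card, total_n equals the sum of the group sizes (290 + 538 = 828). This is a minimal Python sketch of a consistency check built on that observation. It assumes groups are disjoint, and it assumes that when groups is present it replaces group_a and group_b instead of repeating them. Neither point is stated in the schema. check_cohort_counts is a hypothetical helper.

```python
def check_cohort_counts(cohort: dict) -> list[str]:
    """Return problems with Cohort sample counts (empty list if consistent)."""
    # Assumption: "groups" (multi-group analyses) replaces group_a/group_b.
    if cohort.get("groups"):
        groups = cohort["groups"]
    else:
        groups = [g for g in (cohort.get("group_a"), cohort.get("group_b")) if g is not None]
    problems = []
    for g in groups:
        if g["n"] < 0:
            problems.append(f"group {g['name']!r} has negative n")
    # Assumption: groups are disjoint, so their sizes add up to total_n.
    if groups:
        total = sum(g["n"] for g in groups)
        if total != cohort["total_n"]:
            problems.append(f"total_n is {cohort['total_n']} but groups sum to {total}")
    return problems
```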
CohortFilter
User-defined cohort filter criteria for custom analyses.
This schema supports the upcoming cohort builder feature.
| Field | Type | Description |
|---|---|---|
cancer_types | string[]? | Filter by cancer types |
stages | string[]? | Filter by stages (e.g., ["Stage I", "Stage II"]) |
grades | string[]? | Filter by grades (e.g., ["G1", "G2"]) |
mutation_status | object? | Filter by mutation status, mapping gene to mutated (true) or wild-type (false) (e.g., {"TP53": true, "KRAS": false}) |
age_range | [int, int]? | Filter by age range [min, max] |
sex | string? | Filter by sex: "male", "female", "all" |
sample_types | string[]? | Filter by sample types |
custom_sql_filter | string? | Advanced: raw SQL WHERE clause |
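This is a minimal Python sketch of how a CohortFilter could be turned into a parameterized BigQuery WHERE clause. Values are bound as named query parameters (@name) rather than pasted into the SQL text, and list filters use BigQuery's IN UNNEST(@array) form. The column names (cancer_type, stage, grade, sample_type, age, sex) are hypothetical. The sketch also assumes that age_range is inclusive at both ends. It deliberately skips mutation_status, which would need a join against mutation data, and custom_sql_filter, which is raw SQL.

```python
# Hypothetical column names: the real BioQuery tables may use different ones.
LIST_COLUMNS = {
    "cancer_types": "cancer_type",
    "stages": "stage",
    "grades": "grade",
    "sample_types": "sample_type",
}


def cohort_filter_to_where(cohort_filter: dict) -> tuple[str, dict]:
    """Turn a CohortFilter dict into a WHERE clause plus named parameters."""
    clauses: list[str] = []
    params: dict = {}
    for field, column in LIST_COLUMNS.items():
        values = cohort_filter.get(field)
        if values:
            # Membership test against an array-valued query parameter.
            clauses.append(f"{column} IN UNNEST(@{field})")
            params[field] = list(values)
    age_range = cohort_filter.get("age_range")
    if age_range is not None:
        low, high = age_range
        # Assumes the [min, max] range is inclusive at both ends.
        clauses.append("age BETWEEN @age_min AND @age_max")
        params["age_min"], params["age_max"] = low, high
    sex = cohort_filter.get("sex")
    if sex and sex != "all":
        clauses.append("LOWER(sex) = @sex")
        params["sex"] = sex.lower()
    # An empty filter matches every row.
    return (" AND ".join(clauses) if clauses else "TRUE"), params
```

For example, {"cancer_types": ["KIRP", "KIRC"], "age_range": [40, 60], "sex": "female"} produces three clauses joined by AND and four bound parameters.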
Analysis
Statistical analysis details.
| Field | Type | Description |
|---|---|---|
method | string | Statistical method (e.g., "wilcoxon_rank_sum", "log_rank") |
method_full_name | string? | Full method name |
parameters | object? | Method-specific parameters |
multiple_testing_correction | string | Correction method: "none", "bonferroni", "fdr_bh" |
confidence_level | float | Confidence level (default: 0.95) |
stratification_method | string? | Expression stratification method |
stratification_threshold | float? | Threshold value actually used for stratification |
filters | string[]? | SQL filters applied |
software_versions | object? | Software versions for reproducibility |
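The three values of multiple_testing_correction correspond to standard procedures. This minimal Python sketch implements them directly. Bonferroni multiplies each p-value by the number of tests. Benjamini-Hochberg (fdr_bh) scales the k-th smallest p-value by n/k and then enforces monotonicity from the largest p-value down. Both are capped at 1. adjust_p_values is a hypothetical helper, not BioQuery's internal implementation.

```python
def adjust_p_values(p_values: list[float], method: str = "fdr_bh") -> list[float]:
    """Adjust p-values for multiple testing; output order matches input order."""
    n = len(p_values)
    if method == "none":
        return list(p_values)
    if method == "bonferroni":
        return [min(1.0, p * n) for p in p_values]
    if method == "fdr_bh":
        # Indices sorted by ascending p-value; rank 1 is the smallest.
        order = sorted(range(n), key=lambda i: p_values[i])
        adjusted = [0.0] * n
        running_min = 1.0  # also caps adjusted values at 1
        # Walk from the largest p-value down so adjusted values never decrease with rank.
        for rank in range(n, 0, -1):
            i = order[rank - 1]
            running_min = min(running_min, p_values[i] * n / rank)
            adjusted[i] = running_min
        return adjusted
    raise ValueError(f"unsupported correction: {method!r}")
```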
Result
Analysis results.
| Field | Type | Description |
|---|---|---|
summary | string | Natural language summary |
significant | boolean | Whether result is statistically significant |
p_value | float? | P-value |
p_value_adjusted | float? | P-value after multiple-testing correction |
effect_size | float? | Effect size |
effect_size_type | string? | Type of effect size |
confidence_interval | [float, float]? | Confidence interval |
group_a_stats | GroupResultStats? | Statistics for group A |
group_b_stats | GroupResultStats? | Statistics for group B |
pan_cancer_results | object[]? | Results for pan-cancer analysis |
survival_stats | SurvivalStats? | Survival-specific statistics |
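The example card reports a rank-biserial effect size for a Wilcoxon rank-sum test. This minimal Python sketch computes it from the Mann-Whitney U statistic as r = 2·U_A / (n_A·n_B) − 1. Here U_A counts the pairs in which the group A value exceeds the group B value, and ties count as one half. The sign convention (positive means group A tends to be larger) is an assumption. The pairwise loop is O(n_A·n_B), which is fine for illustration. The sketch also shows one plausible rule for significant: compare the adjusted p-value, if present, with 1 − confidence_level. That rule is not documented. Both function names are hypothetical.

```python
def rank_biserial(group_a: list[float], group_b: list[float]) -> float:
    """Rank-biserial correlation; ranges from -1 to 1, positive if A tends larger."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    u_a = 0.0  # Mann-Whitney U for group A
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    return 2.0 * u_a / (len(group_a) * len(group_b)) - 1.0


def is_significant(result: dict, confidence_level: float = 0.95) -> bool:
    """Assumed rule: prefer the adjusted p-value, alpha = 1 - confidence_level."""
    p = result.get("p_value_adjusted")
    if p is None:
        p = result.get("p_value")
    return p is not None and p < 1.0 - confidence_level
```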
GroupResultStats
| Field | Type | Description |
|---|---|---|
mean | float? | Mean value |
median | float? | Median value |
std | float? | Standard deviation |
ci_lower | float? | Lower bound of the confidence interval |
ci_upper | float? | Upper bound of the confidence interval |
n | integer? | Sample count |
frequency | float? | Frequency (for mutations) |
count | integer? | Count (for mutations) |
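This minimal Python sketch shows how the GroupResultStats fields could be filled in. For expression it uses the sample standard deviation and a normal-approximation 95% confidence interval for the mean (mean ± 1.96·std/√n). BioQuery may instead use a t-interval or a bootstrap, so treat the interval method as an assumption. For mutations, frequency is taken as count / n. Both helpers are hypothetical.

```python
import math
import statistics


def expression_group_stats(values: list[float], z: float = 1.96) -> dict:
    """GroupResultStats for continuous values, with a normal-approximation CI."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a standard deviation")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = z * std / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "ci_lower": mean - half_width,
        "ci_upper": mean + half_width,
        "n": n,
    }


def mutation_group_stats(mutated: int, n: int) -> dict:
    """GroupResultStats for mutation counts: frequency is the mutated fraction."""
    if n <= 0 or not 0 <= mutated <= n:
        raise ValueError("need 0 <= mutated <= n and n > 0")
    return {"count": mutated, "frequency": mutated / n, "n": n}
```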
SurvivalStats
| Field | Type | Description |
|---|---|---|
median_survival_group_a | float? | Median survival for group A (high expression) |
median_survival_group_b | float? | Median survival for group B (low expression) |
hazard_ratio | float? | Hazard ratio |
hr_ci_lower | float? | Lower bound of the hazard ratio confidence interval |
hr_ci_upper | float? | Upper bound of the hazard ratio confidence interval |
log_rank_p | float? | Log-rank p-value |
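Median survival per group is conventionally read off a Kaplan-Meier curve as the earliest time at which estimated survival drops to 0.5 or below. This minimal pure-Python sketch implements the product-limit estimator for that purpose. Events are treated as occurring before censorings at the same time, which is the usual convention. Running it separately on the high- and low-expression groups would give the two median_survival fields. The time unit is whatever the input uses. km_median_survival is a hypothetical helper.

```python
from typing import Optional


def km_median_survival(times: list[float], events: list[bool]) -> Optional[float]:
    """Kaplan-Meier median survival.

    events[i] is True for an observed event and False for a censored sample.
    Returns the earliest time at which S(t) <= 0.5, or None if never reached.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit step: multiply by the conditional survival at time t.
            survival *= 1.0 - deaths / at_risk
            if survival <= 0.5:
                return t
        # Everyone with time t (events and censorings) leaves the risk set.
        at_risk -= leaving
    return None
```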
Figure
Visualization information.
| Field | Type | Description |
|---|---|---|
type | string | Figure type: boxplot, kaplan_meier, bar, scatter, heatmap |
url | string? | URL to rendered figure |
plotly_json | object? | Plotly.js figure specification |
alt_text | string? | Alt text for accessibility |
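A Figure for a two-group comparison could be assembled as a plain dictionary whose plotly_json follows the Plotly.js figure format, with a data list of traces and a layout object. This minimal Python sketch builds one box trace per group. The alt_text wording mirrors the example card. The function name and the y-axis label argument are hypothetical.

```python
def boxplot_figure(groups: dict[str, list[float]], gene: str, y_label: str) -> dict:
    """Build a Figure dict with a Plotly.js box-plot specification."""
    # One box trace per group; Plotly.js draws outliers as individual points.
    traces = [
        {"type": "box", "name": name, "y": list(values), "boxpoints": "outliers"}
        for name, values in groups.items()
    ]
    names = " vs ".join(groups)
    return {
        "type": "boxplot",
        "plotly_json": {
            "data": traces,
            "layout": {
                "title": {"text": f"{gene} expression: {names}"},
                "yaxis": {"title": {"text": y_label}},
            },
        },
        "alt_text": f"Box plot showing {gene} expression in {names}",
    }
```

Because the result contains only dictionaries, lists, strings, and numbers, it can be serialized with json.dumps and rendered by Plotly.js on the client.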
Reproducibility
Everything needed to reproduce the analysis.
| Field | Type | Description |
|---|---|---|
sql_query | string | Exact SQL query executed |
bioquery_version | string | BioQuery version |
python_version | string? | Python version |
package_versions | object? | Package versions |
bigquery_project | string | BigQuery project |
bigquery_tables | string[] | Tables used |
data_accessed_at | datetime | When data was accessed |
reproduction_steps | string[]? | Step-by-step instructions |
python_code | string? | Python code snippet |
r_code | string? | R code snippet |
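The python_version and package_versions fields can be captured from the running interpreter with the standard library. This is a minimal Python sketch. The list of packages to record is up to the caller; names such as scipy or pandas would only be illustrative, not a statement of BioQuery's dependencies. A package that is not installed is recorded as None. environment_versions is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_versions(packages: list[str]) -> dict:
    """Collect python_version and package_versions for a Reproducibility block."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python_version": platform.python_version(),
        "package_versions": versions,
    }
```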
Citations
Auto-generated citations.
| Field | Type | Description |
|---|---|---|
in_text | string? | In-text citation |
methods | string? | Methods section text |
bibtex | string? | BibTeX citation |
Metadata
Additional card metadata.
| Field | Type | Description |
|---|---|---|
execution_time_ms | integer? | Query execution time in milliseconds |
permalink | string? | Permanent URL |
is_public | boolean | Whether card is public |
fork_count | integer | Number of forks |
view_count | integer | Number of views |
export_formats | string[] | Available export formats |
Example Query Card
```json
{
  "id": "bq-2025-12-07-a7f3x",
  "version": "1.0",
  "created_at": "2025-12-07T12:00:00Z",
  "created_by": "anonymous",
  "query": {
    "natural_language": "Is DDR1 expression higher in papillary RCC compared to clear cell RCC?",
    "parsed": {
      "data_type": "expression",
      "analysis_type": "differential_expression",
      "gene": "DDR1",
      "cancer_types": ["KIRP", "KIRC"],
      "group_a_label": "KIRP",
      "group_b_label": "KIRC",
      "expression_metric": "median",
      "confidence": 1.0
    }
  },
  "data_source": {
    "name": "TCGA",
    "release": "GDC Release 39",
    "accessed_at": "2025-12-07T12:00:00Z",
    "genome_build": "hg38",
    "expression_type": "RNA-seq TPM (STAR-RSEM)",
    "expression_normalization": "TPM (transcripts per million)",
    "bigquery_tables": ["isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current"],
    "bigquery_project": "isb-cgc-bq"
  },
  "cohort": {
    "total_n": 828,
    "group_a": {
      "name": "KIRP",
      "n": 290,
      "cancer_type": "KIRP",
      "sample_type": "Primary Tumor"
    },
    "group_b": {
      "name": "KIRC",
      "n": 538,
      "cancer_type": "KIRC",
      "sample_type": "Primary Tumor"
    }
  },
  "analysis": {
    "method": "wilcoxon_rank_sum",
    "method_full_name": "Wilcoxon rank-sum test (Mann-Whitney U)",
    "multiple_testing_correction": "none",
    "confidence_level": 0.95
  },
  "result": {
    "summary": "DDR1 expression is significantly higher in papillary RCC (KIRP) compared to clear cell RCC (KIRC), with a difference in medians of 2.3 (p < 0.001).",
    "significant": true,
    "p_value": 2.3e-12,
    "effect_size": 0.85,
    "effect_size_type": "rank_biserial",
    "group_a_stats": {
      "median": 8.7,
      "mean": 8.9,
      "std": 1.2,
      "n": 290
    },
    "group_b_stats": {
      "median": 6.4,
      "mean": 6.5,
      "std": 1.5,
      "n": 538
    }
  },
  "figure": {
    "type": "boxplot",
    "plotly_json": { ... },
    "alt_text": "Box plot showing DDR1 expression in KIRP vs KIRC"
  }
}
```
Schema Versioning
The Query Card schema uses a MAJOR.MINOR form of semantic versioning. The current version is 1.0.
- Major version (the first number, as in 1.x): incremented for breaking changes
- Minor version (the second number, as in x.1): incremented when new optional fields are added (backward compatible)
Every field added in a minor version is optional and has a default, so existing cards remain valid.
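Under these rules, a client built for one major version can read any card with the same major version and ignore fields it does not recognize. This minimal Python sketch shows that check. It assumes the version string always has exactly two numeric parts, as "1.0" does. parse_schema_version and can_read are hypothetical helpers.

```python
SUPPORTED_MAJOR = 1  # major schema version this hypothetical client understands


def parse_schema_version(version: str) -> tuple[int, int]:
    """Parse a 'MAJOR.MINOR' schema version string."""
    parts = version.split(".")
    if len(parts) != 2:
        raise ValueError(f"expected MAJOR.MINOR, got {version!r}")
    return int(parts[0]), int(parts[1])


def can_read(card: dict, supported_major: int = SUPPORTED_MAJOR) -> bool:
    """Minor versions only add optional fields, so any matching major is readable."""
    major, _minor = parse_schema_version(card["version"])
    return major == supported_major
```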