CSR Research Instrument · Build GLN-2026 · In active development

GALEN

Built for the queries no current tool can answer. Cross-corpus retrieval over the full open biomedical literature, joined by canonical concepts, graded by evidence, tracked across time.

Foundation corpus: PMC OA · PubMed
Articles indexed: ~4.5M+ full-text
Architecture: 5-layer
Status: Pre-launch

The questions specialists ask
that nothing else can answer.

Most biomedical AI tools answer the easy questions. Definitions. Dosages. Summary symptoms. The hard questions, the ones that live across hundreds of papers and require evidence weighting and contradiction tracking, fall through every existing tool. GALEN is engineered for that gap.

Evidence synthesis
What is the current weight of evidence for intervention X, with study quality grading and contradictions surfaced?
Temporal consensus
How has the field's position on Y shifted from 2010 to today, and what triggered the shift?
Section-aware retrieval
Find every Methods section in pediatric populations using technique Z published since 2020.
Phenotype-driven differential
Given this constellation of phenotype terms, what conditions belong in the differential, ranked by overlap?
Citation centrality
Show the foundational papers any new researcher in field W should read, ranked by citation centrality.
Active controversy mapping
Map the active controversy around drug class A: who is on which side, with what evidence, since when?

A retrieval engine, not a chatbot.

GALEN ingests every open-access biomedical paper in NIH's PubMed Central, every abstract in the PubMed index, the expert-authored GeneReviews chapters on genetic conditions, and the canonical reference textbooks in the NCBI Bookshelf, all joined through the medical concept graph.

The data has been free for over a decade. NIH publishes the entire corpus on public infrastructure with explicit reuse licenses. The reason a working version of GALEN does not already exist is not data access. It is the integration: section-aware chunking, concept-graph joins, publication-type evidence weighting, temporal binning, and contradiction surfacing, all wired through a query orchestrator that decides which primitives apply.

Every answer GALEN returns carries the receipts back to the source paragraph. No invention. No ungrounded claims. If the corpus does not contain the answer, GALEN reports the gap honestly rather than hallucinating.
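A minimal sketch of that contract, assuming a hypothetical `Passage` record and an `answer_or_gap` helper (neither name comes from the GALEN build): every claim keeps a pointer to its source paragraph, and an empty retrieval yields an explicit gap instead of generated text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # placeholder identifier; a PMCID in a real corpus
    paragraph: int  # index of the paragraph inside the source
    text: str


@dataclass(frozen=True)
class Answer:
    status: str         # "answered" or "gap"
    claims: tuple = ()  # (claim text, source_id, paragraph) triples


def answer_or_gap(passages, min_support=1):
    """Return cited claims, or an explicit gap when support is missing."""
    if len(passages) < min_support:
        return Answer(status="gap")
    claims = tuple((p.text, p.source_id, p.paragraph) for p in passages)
    return Answer(status="answered", claims=claims)


# Usage: one supporting passage yields a cited claim; none yields a gap.
found = answer_or_gap([Passage("SRC-1", 4, "Example finding.")])
missing = answer_or_gap([])
```

The `min_support` threshold is an illustrative knob: raising it would make the sketch report a gap unless several independent passages back the answer.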

Build profile · GLN-2026.A
Status: Spec locked
Phase: In active build
PMC OA articles: ~4,500,000
PubMed records: ~36,000,000
Bookshelf titles: Hundreds
GeneReviews chapters: ~900
MedGen concepts: ~50,000
License posture: CC-BY-safe
Ground truth: 100% cited

Six retrieval primitives.
Composed, they answer the impossible.

Every "impossible" query decomposes into some combination of these six. Each has been attempted somewhere on its own. No existing tool integrates them into a single retrieval surface. GALEN composes all six.

P-01
Section-aware retrieval
Filters by section type at query time. Side-effect questions retrieve from Discussion sections. Study-design questions retrieve from Methods. Same corpus, different filter, different answer.
P-02
Concept-graph expansion
Resolves natural-language queries to canonical MedGen and MeSH concept IDs, then expands through ontology hierarchies. The goal is higher recall without a loss of precision.
P-03
Evidence-grade weighting
Publication-type metadata is structured and free. Randomized trial outranks case report outranks editorial. Weighting is deterministic, not LLM-judged.
P-04
Cross-corpus fusion
A single concept ID retrieves from PMC primary literature, GeneReviews expert review, and Bookshelf canonical reference simultaneously. Three layers, one query.
P-05
Temporal consensus tracking
Same query bucketed by publication year. Returns the position of the field over time, not a single timeless answer. Surfaces shifts and reversals that flat retrieval averages away.
P-06
Contradiction surfacing
When two retrieved sources disagree on a factual claim, GALEN flags the disagreement explicitly in synthesis. Currently this entire burden falls on the human reader.
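The six primitives can be sketched as small functions over tagged chunks. This is a toy illustration, not the GALEN implementation: the `Chunk` fields, the concept IDs, the numeric grades, the recency tie-break, and the `stance` label (assumed to come from an upstream claim-extraction step) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical concept hierarchy: parent ID -> child IDs (IDs are made up).
CHILDREN = {"C:fasting": ["C:time_restricted_eating", "C:alternate_day_fasting"]}

# Deterministic grades by publication type; the numbers are illustrative.
EVIDENCE_GRADE = {"rct": 3, "case-report": 2, "editorial": 1}


@dataclass(frozen=True)
class Chunk:
    source_id: str  # placeholder for a PMCID or Bookshelf ID
    corpus: str     # "pmc", "genereviews", or "bookshelf"
    section: str    # "methods", "results", "discussion", ...
    pub_type: str   # key into EVIDENCE_GRADE
    year: int
    concept: str    # canonical concept ID the chunk is tagged with
    stance: int     # +1 supports the claim, -1 disputes it, 0 neutral


def expand(concept):
    """P-02: the concept plus all of its descendants in the hierarchy."""
    seen, stack = set(), [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(CHILDREN.get(node, []))
    return seen


def retrieve(chunks, concept, section=None):
    """P-01, P-02, P-04: one concept query over every corpus, optional section filter."""
    wanted = expand(concept)
    return [c for c in chunks
            if c.concept in wanted and (section is None or c.section == section)]


def rank(chunks):
    """P-03: strongest evidence first; ties broken by recency (an assumption)."""
    return sorted(chunks, key=lambda c: (-EVIDENCE_GRADE.get(c.pub_type, 0), -c.year))


def by_period(chunks, width=5):
    """P-05: bucket chunks by the first year of a fixed-width period."""
    buckets = defaultdict(list)
    for c in chunks:
        buckets[c.year - c.year % width].append(c)
    return dict(sorted(buckets.items()))


def contradictions(chunks):
    """P-06: pairs of sources that take opposite stances."""
    pro = [c for c in chunks if c.stance > 0]
    con = [c for c in chunks if c.stance < 0]
    return [(p.source_id, n.source_id) for p in pro for n in con]


CORPUS = [
    Chunk("SRC-1", "pmc", "discussion", "rct", 2016, "C:time_restricted_eating", +1),
    Chunk("SRC-2", "pmc", "methods", "rct", 2021, "C:time_restricted_eating", -1),
    Chunk("SRC-3", "bookshelf", "discussion", "editorial", 2019, "C:fasting", 0),
    Chunk("SRC-4", "pmc", "discussion", "case-report", 2022, "C:unrelated", +1),
]

hits = retrieve(CORPUS, "C:fasting")
ranked_ids = [c.source_id for c in rank(hits)]       # ['SRC-2', 'SRC-1', 'SRC-3']
periods = {start: [c.source_id for c in group]
           for start, group in by_period(hits).items()}
conflicts = contradictions(hits)                     # [('SRC-1', 'SRC-2')]
```

Each function is a plain filter, sort, or grouping over the same chunk list, which is what lets the primitives compose in any order without a model in the loop.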

Five layers, stacked.

GALEN is a routing problem dressed as a retrieval problem. The orchestrator decides which corpora and primitives a query touches, then composes the answer with citations from every source it draws on.

LAYER 05 // INTERFACE
CLI · CHAT · API · STRUCTURED-OUTPUT EXPORT

LAYER 04 // QUERY ORCHESTRATOR
INTENT CLASSIFIER → CORPUS ROUTER → RETRIEVER → SYNTHESIZER → CONTRADICTION DETECTOR
decides which corpora and primitives a query touches; fuses results; grades evidence; surfaces disagreement; never invents

LAYER 03 // CONCEPT GRAPH + AUXILIARY INDEXES
MEDGEN · MeSH · HPO · OMIM · CITATION GRAPH · TEMPORAL BINS
resolves natural language to canonical IDs; carries the cross-corpus join keys; enables temporal and citation-graph primitives

LAYER 02 // TIER 02 // CLINICAL DEPTH
BOOKSHELF · STATPEARLS · GENEREVIEWS
canonical reference; expert review; teaching-register knowledge

LAYER 01 // TIER 01 // FOUNDATION
PMC OPEN ACCESS · PUBMED BASELINE · MeSH
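The routing decision in Layer 04 can be pictured as a lookup from query cues to primitives. The sketch below is a deliberately naive stand-in for the intent classifier: the cue words, the `ROUTES` table, and the choice to keep concept expansion always on are assumptions for illustration, not the shipped design.

```python
# Hypothetical routing table: cue word -> primitives that cue switches on.
ROUTES = {
    "shifted": ["P-05", "P-06"],
    "since": ["P-05"],
    "methods": ["P-01"],
    "evidence": ["P-03", "P-06"],
    "differential": ["P-02", "P-04"],
}

# Assumption: concept expansion runs on every query.
ALWAYS_ON = ["P-02"]


def plan(query):
    """Return the sorted list of primitive IDs a query should touch."""
    text = query.lower()
    selected = set(ALWAYS_ON)
    for cue, primitives in ROUTES.items():
        if cue in text:  # plain substring match; a real classifier would do more
            selected.update(primitives)
    return sorted(selected)


steps = plan("How has the field's position on intermittent fasting "
             "and insulin sensitivity shifted since 2015?")
# steps == ['P-02', 'P-05', 'P-06']
```

A table-driven router keeps the choice of primitives inspectable: the plan for any query can be logged and audited before retrieval runs.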

Every claim cites a source.

Below is a representative GALEN response. Every factual claim carries a PMCID citation. Contradictions across sources surface explicitly. If the corpus cannot answer, GALEN reports the gap honestly.

galen.cli // session #2026.04 · query #74
0.00s
$ How has the field's position on intermittent fasting and insulin sensitivity shifted since 2015?
P-02 concept expansion · P-05 temporal tracking · P-06 contradiction · retrieved 14 chunks across 11 papers · synth: claude-sonnet
Between 2015 and 2018, early metabolic studies suggested intermittent fasting (IF) improved insulin sensitivity independent of weight loss [PMC4783509][PMC5959807]. Subsequent randomized controlled trials with isocaloric controls challenged this, finding that observed insulin sensitivity gains were largely attributable to caloric deficit rather than the fasting protocol itself [PMC7445432][PMC8290554]. By 2022 to 2024, meta-analyses converged on a more modest position: IF is non-inferior to standard caloric restriction for insulin sensitivity, with no specific mechanistic advantage in most populations [PMC9384721][PMC10112387].
⚠ Contradiction surfaced: PMC8290554 (2021) reports no IF-specific benefit in time-restricted eating; PMC10455832 (2023) reports a small benefit in shift workers specifically. Sources disagree on whether population stratification reconciles the conflict.
Sources cited
PMC4783509 · Effects of 8-hour time restricted feeding on body weight and metabolic disease risk · RCT · 2017
PMC5959807 · Early time-restricted feeding improves insulin sensitivity · RCT · 2018
PMC7445432 · Effects of time-restricted eating on weight loss and other metabolic parameters · RCT · 2020
PMC9384721 · Intermittent fasting and metabolic health: a systematic review · Meta-analysis · 2022
PMC10112387 · Time-restricted eating versus caloric restriction · Meta-analysis · 2024

The example above illustrates the shape of a GALEN response. The specific PMCIDs and quoted findings are representative.
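A response in this shape can be checked mechanically. The sketch below assumes bracketed `[PMC…]` markers as in the example and treats each sentence as one claim, which is a simplification. The helper name `audit_citations` and the IDs in the usage lines are placeholders.

```python
import re

# Matches bracketed citation markers such as [PMC1234567].
PMCID = re.compile(r"\[(PMC\d+)\]")


def audit_citations(answer, sources):
    """Return (sentences lacking a citation, cited IDs absent from the source list)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not PMCID.search(s)]
    cited = set(PMCID.findall(answer))
    return uncited, sorted(cited - set(sources))


# Usage with placeholder IDs: one sentence lacks a citation, one ID is unlisted.
text = ("Claim one holds [PMC0000001]. "
        "Claim two is disputed [PMC0000002]. "
        "Claim three has no support.")
uncited, unlisted = audit_citations(text, {"PMC0000001"})
# uncited  == ['Claim three has no support.']
# unlisted == ['PMC0000002']
```

Running a check like this before an answer is returned is one way to enforce the rule that every claim cites a source and every cited source appears in the list.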

Six audiences. One retrieval surface.

GALEN serves anyone whose work requires synthesis across the biomedical literature. The query patterns differ by audience; the underlying primitives are the same.

Audience 01
Pharma & Biotech R&D
For target X in oncology, what mechanisms of resistance have been reported in the last twelve months, with the strongest preclinical evidence?
→ Enterprise / API tier
Audience 02
Academic Researchers
Show me papers from the last five years that contradict the standard view that condition A is monogenic.
→ Pro / Lab tier
Audience 03
Hospital Library & Research
For the diagnostic workup of suspected condition B, what tests are recommended in current guidelines and what is the evidence base for each?
→ Site license
Audience 04
Health & Science Journalists
This press release claims drug C reduced mortality by 40 percent. What did the underlying paper actually report?
→ Pro tier
Audience 05
Patient Advocacy Orgs
For ultra-rare condition D, what active research groups are publishing on it, and what therapeutic approaches have been tried?
→ Lab tier (discounted)
Audience 06
Regulatory & Policy
What does the published literature show about the public health effectiveness of intervention E across population subgroups?
→ Enterprise tier

No single tool composes
all six primitives. GALEN does.

The competitive landscape sorts into four categories, each strong in one or two areas and structurally weak in others. The defensibility argument is engineering, not data.

Primitive / Capability · UpToDate · PubMed · Generalist LLM · Elicit / Consensus · GALEN
Section-aware retrieval
Concept-graph expansion~~ MeSH only
Evidence-grade weighting Editorial~ Filter only~ Deterministic
Cross-corpus fusion
Temporal consensus tracking
Contradiction surfacing
Cited answers (no hallucination)n/a (no synth)
Up-to-date with literature~ Months lag Cutoff Daily refresh
Cost (individual seat / year) · UpToDate ~$499 · PubMed Free · Generalist LLM $240 · Elicit / Consensus $120-240 · GALEN $0 - $230

Five tiers.
One price per audience.

Pricing anchored to comparable biomedical information tools today, not to generalist consumer AI. Free tier non-negotiable for academic adoption.

Tier 01 · Open now
Researcher
Free · always
Individual academics, students, journalists, advocacy nonprofits
  • 25 queries per month
  • Full citations on every answer
  • Cross-corpus retrieval
  • Attribution required in published work
  • Non-commercial use only
Sign up free →
Tier 03
Lab / Team
$99 / seat / mo
Academic labs, small biotechs, advocacy organizations, policy shops · 5-seat min
  • Everything in Pro
  • Shared saved queries
  • Team annotations
  • Centralized billing
  • Priority support
Start Lab →
Tier 04 · Annual
Enterprise / API
$50k - $250k / year
Pharma, biotech, medical device, large hospital systems, regulatory agencies
  • API access with volume pricing
  • SSO + custom integrations
  • SLA + dedicated support engineer
  • On-premise deployment option
Get notified →
Tier 05 · Annual
Hospital Site License
$25k - $100k / year
Academic medical centers, hospital systems · Research, library, education staff
  • Building-wide access
  • Scoped to research and education use
  • Not direct clinical decision support
  • Procurement-friendly contract
Get notified →

Built by Celaya
Solutions Research.

CSR is an independent AI research lab built on a specific thesis: build research instruments that do not currently exist, optimize for coherence over speed, document the build openly. The lab has a portfolio of working systems demonstrating the methodology. GALEN is the next instrument in that portfolio, oriented for the first time toward direct external commercial use rather than purely internal research.

GALEN is engineering, not training. There is no GPU cluster requirement, no twelve-figure compute commitment. The work is parsing, indexing, retrieval orchestration, and prompt refinement. The plan is for one founder to ship v1 in approximately six months of focused work, building each phase on the validated output of the previous one.

The mission is not to maximize any single product line. It is to build an institution that produces this caliber of work consistently. GALEN's path to market funds the lab. The lab will produce more instruments. Some will become products. Some will remain research output. All will be documented openly.

Founder · CSR
Christopher Celaya
Eleven years across critical infrastructure, software architecture, and AI development. Founder of Celaya Solutions Research, an independent AI research lab based in El Paso, Texas. Track record of building instruments that bridge domains generalists cannot: electrical, software, cognitive, biomedical.
⚠ Pre-launch · build in progress

Be first in line
when GALEN ships.

Create an account now to claim your slot. The free Researcher tier opens first. We'll notify you when Pro and Lab tiers begin onboarding.

hello@celayasolutions.com