Curriculum
Data autoresearch for training SOTA agents.
Use Curriculum to author bespoke RL data generation pipelines that convert compute to dense, calibrated rubrics over a diverse task set.
Existence proof
Kos-1 Lite achieved medical SOTA trained entirely on Curriculum data. Curriculum also drove our work with Perplexity on DRACO.
Tasks for your production harness.
An agent is composed of a model and a harness. The harness enriches an LLM with prompts, tools, skills, and scaffolding so it can perform production tasks. Foundation models are trained to generalize to any harness. We instead build tasks scoped to your harness and domain, so you can RL-train a model that is cheaper and more performant than a generic foundation model.
Tasks are scoped to the capabilities your harness expects. They require the right domain knowledge, capabilities, and final output, but unlike common RL environments, they don’t prescribe a fixed toolset or MCP server. As your harness matures and your preferred tools change (e.g., from MCP to programmatic tool calls), the same taskset can still be used to train the weights of the model steering it. The harness is the training environment, and performance gains target the workflow you ship.
Instruction
Difficulty-aware task.
Your agent runtime
Prompts, tools, scaffolding, output contract.
Reward
Dense score computed by a rubric grader.
Replayable state
Per-rollout databases, files, and APIs the harness reads to enable deterministic replay.
For stateful agents, we replace production dependencies with resource sandboxes. The harness keeps its client code; only the resource bindings change, from production URLs to sandbox URLs. The agent workspace is initialized from a snapshotted filesystem for each rollout. Live systems stay untouched, and each attempt sees replayable state.
For example, a tool implementation that connects to postgres://prod.orders-db.internal instead connects to postgres://sandbox.orders-db.br_a1b2c3, and at the start of each rollout a snapshotted filesystem is mounted into the agent workspace.
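A minimal Python sketch of this rebinding, under stated assumptions: the naming scheme (prod.<service>.internal becoming sandbox.<service>.<rollout id>) is inferred from the single example above, and the helper names to_sandbox_url and fresh_workspace are hypothetical. Copying a directory stands in for mounting a snapshot; this is an illustration, not Curriculum's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def to_sandbox_url(prod_url: str, rollout_id: str) -> str:
    """Rewrite a production resource URL to a per-rollout sandbox URL.

    Assumed naming scheme (inferred from the single example in the text):
    prod.<service>.internal  ->  sandbox.<service>.<rollout_id>
    """
    parts = urlsplit(prod_url)
    labels = (parts.hostname or "").split(".")
    if len(labels) != 3 or labels[0] != "prod" or labels[2] != "internal":
        raise ValueError(f"not a recognised production host: {parts.hostname!r}")
    netloc = f"sandbox.{labels[1]}.{rollout_id}"
    if parts.port is not None:
        netloc += f":{parts.port}"  # keep an explicit port if one was given
    # Credentials are deliberately not carried over from production.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def fresh_workspace(snapshot_dir: str) -> Path:
    """Copy a snapshot into a new temporary directory for one rollout.

    Copying stands in for mounting; the snapshot itself is never modified.
    """
    workspace = Path(tempfile.mkdtemp(prefix="rollout_"))
    shutil.copytree(snapshot_dir, workspace, dirs_exist_ok=True)
    return workspace


print(to_sandbox_url("postgres://prod.orders-db.internal", "br_a1b2c3"))
# postgres://sandbox.orders-db.br_a1b2c3
```

Because only the binding changes, the harness's client code runs unmodified against the sandbox, and each rollout starts from the same workspace contents.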
Autoresearch
A single Curriculum task of O(100) tokens can take O(100,000) tokens to generate. Instead of traditional synthetic data methods that generate and verify samples, we use models to dynamically generate, verify, ablate, and diversify data pipelines.
Dense rubrics reward the non-verifiable.
Frontier-grade rubrics are binary and self-contained, and they generalize over the space of correct answers, so wrong responses measurably fail and diverse correct answers are rewarded.
Curriculum rubrics define a valid solution space instead of a myopic canonical answer. Criteria cover evidence, reasoning, constraints, and failure modes. Positive criteria reward necessary behavior, and negative criteria penalize active errors.
Most hand-written rubrics fail by confusing one good answer with the whole solution space. A model can solve the task correctly in a different way, or sound convincing while missing the real driver. Curriculum rubrics are written around the boundary of correctness: the facts that must be present, the ranges that allow legitimate variation, the reasoning links that separate good from shallow, and the active errors that should lower reward. They also avoid common grading failures: overlapping criteria that double-count the same behavior, subjective criteria that make pass/fail nondeterministic, and criteria that only work if the judge model has privileged information the rubric never states.
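To make the positive/negative distinction concrete, here is a small Python sketch of how binary verdicts on such criteria could be folded into one dense score. The aggregation rule (equal weights, penalties subtracted, result clamped at zero) is an assumption for illustration; the text does not specify how Curriculum weights criteria, and the names Criterion and dense_score are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    text: str
    negative: bool = False  # True: an active error that lowers reward when present


def dense_score(criteria: list[Criterion], verdicts: list[bool]) -> float:
    """Combine binary judge verdicts into a score in [0, 1].

    verdicts[i] is True when the judge found that criteria[i] applies to the
    response. Assumed rule: (positives met - negatives triggered) divided by
    the number of positive criteria, clamped at 0.
    """
    if len(criteria) != len(verdicts):
        raise ValueError("need exactly one verdict per criterion")
    n_positive = sum(1 for c in criteria if not c.negative)
    if n_positive == 0:
        raise ValueError("rubric needs at least one positive criterion")
    earned = sum(1 for c, v in zip(criteria, verdicts) if v and not c.negative)
    penalties = sum(1 for c, v in zip(criteria, verdicts) if v and c.negative)
    return max(0.0, (earned - penalties) / n_positive)


rubric = [
    Criterion("Recommends IV calcium as the first intervention"),
    Criterion("Recommends dextrose before or concurrent with insulin"),
    Criterion("Recommends continuing metformin", negative=True),
]
# Both positive criteria met, but the negative criterion was also triggered.
print(dense_score(rubric, [True, True, True]))  # 0.5
```

Since every criterion is a yes/no judgment, the score stays reproducible across judge runs, and a response that reaches a correct answer by a different route still earns every positive criterion it satisfies.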
Clinical Reasoning (18 criteria)
Identifies BRASH syndrome by name or describes the positive-feedback loop where bradycardia worsens renal perfusion which worsens hyperkalemia which worsens bradycardia
States trimethoprim blocks ENaC or potassium secretion in the collecting duct, causing hyperkalemia
States trimethoprim causes type 4 RTA or non-anion-gap metabolic acidosis via impaired distal H+ and K+ secretion
States sulfamethoxazole has sulfonylurea-like activity stimulating insulin release, contributing to hypoglycemia especially in renal failure
States carvedilol provides AV-nodal blockade that synergizes with hyperkalemia to produce profound bradycardia by blocking sympathetic compensation
States beta-blockade impairs beta-2-mediated intracellular potassium uptake, worsening hyperkalemia
States lisinopril suppresses aldosterone, further impairing renal potassium excretion
States ACE inhibitors impair GFR maintenance during low-perfusion states by reducing efferent arteriolar tone
States metformin accumulates in severe AKI and can cause or contribute to lactic acidosis (MALA) with lactate 4.8 and pH 7.18
States empagliflozin carries risk of euglycemic DKA and that beta-hydroxybutyrate must be checked given low glucose and anion gap acidosis
States combined SGLT2 inhibitor and loop diuretic use contributes to volume depletion and prerenal AKI
Identifies a mixed high-anion-gap and non-anion-gap metabolic acidosis based on the lab values provided
Calculates or states the anion gap as approximately 17 using Na 131 minus Cl 100 minus bicarb 14
States atropine is typically ineffective for bradycardia in this setting because the mechanism is AV-nodal blockade and hyperkalemia rather than vagal
States carvedilol impairs hepatic glycogenolysis or blunts counterregulatory adrenergic response, worsening or prolonging hypoglycemia
States the acidosis is solely from renal failure without recognizing the mixed high-AG and non-AG components or their distinct etiologies
Explains at least 3 specific reasons why treating only the potassium is insufficient (e.g., hypoglycemia risk with insulin, BRASH loop persistence, ongoing infection, offending drugs still on board)
Notes trimethoprim blocks tubular creatinine secretion causing potential pseudo-elevation of Cr, while acknowledging true AKI is also present given parallel BUN rise
Clinical Judgment (11 criteria)
States that standard 10-unit insulin dose is dangerous given baseline glucose of 62, severe AKI prolonging insulin clearance, and beta-blockade masking hypoglycemia
Recommends continuing metformin in a patient with Cr 5.9, lactate 4.8, and pH 7.18
Recommends 10 units regular insulin IV without specifying dextrose preloading or dose reduction for a patient with glucose 62 and severe AKI
Recommends sodium polystyrene sulfonate (Kayexalate) in a hypoperfused or shocked patient without acknowledging bowel necrosis risk
Dismisses glucose of 62 as clinically insignificant or states dextrose given with insulin adequately addresses it without recommending preloading or frequent monitoring
States that correcting potassium alone will resolve the shock and bradycardia without need for vasopressor support or breaking the BRASH loop
Recommends emergent nephrology consult for renal replacement therapy given K 7.8 with ECG changes, pH 7.18, Cr 5.9, and shock
Recommends cautious fluid resuscitation with small boluses (e.g., 250 mL) guided by bedside ultrasound or clinical reassessment given EF 35% and shock
Recommends ICU admission or continuous cardiac monitoring for this critically ill patient
Reassures that kidneys will likely recover without dialysis despite K 7.8 with ECG changes, pH 7.18, Cr 5.9, and hemodynamic instability
Advises against any fluid resuscitation or advises large unmonitored fluid boluses without acknowledging the competing risks of EF 35% and hypovolemic shock
Information Seeking (4 criteria)
Uses hedging language such as 'possible' or 'cannot confirm without' when discussing acute interstitial nephritis from TMP-SMX since urine eosinophils and biopsy are unavailable
Acknowledges uncertainty about relative contributions of type A (hypoperfusion) vs type B (metformin) lactic acidosis given that only a single lactate value of 4.8 is available without a trend
Uses hedging language when discussing euglycemic DKA from empagliflozin since beta-hydroxybutyrate result is pending and not yet available
Definitively diagnoses acute interstitial nephritis without hedging when no urine eosinophils, peripheral eosinophilia, or biopsy results are available
Care Delivery (22 criteria)
Recommends epinephrine as the vasopressor of choice because it provides chronotropy, inotropy, and beta-2-mediated potassium shift
Recommends IV calcium (calcium chloride or calcium gluconate) as the first intervention for cardiac membrane stabilization
Recommends dextrose administration (e.g., D50W bolus) before or concurrent with insulin to address baseline hypoglycemia of 62
Recommends reduced insulin dose (e.g., 5 units instead of standard 10 units) given hypoglycemia, AKI, and beta-blockade
Recommends point-of-care glucose monitoring at least every 15-30 minutes after insulin administration
Recommends placing transcutaneous pacing pads given QRS 142 ms and HR 52 with risk of progression to asystole
Recommends holding at least trimethoprim-sulfamethoxazole, lisinopril, carvedilol, metformin, and empagliflozin
Recommends checking serum beta-hydroxybutyrate to evaluate for euglycemic DKA from empagliflozin
Recommends chest X-ray or lung imaging to characterize the bibasilar crackles as pulmonary edema vs pneumonia vs other
Recommends serial lactate measurements with a specific interval (e.g., every 2-4 hours) to assess perfusion response and differentiate type A from type B lactic acidosis
Recommends empiric broad-spectrum antibiotics for suspected urosepsis such as ceftriaxone 2g IV or piperacillin-tazobactam renally adjusted
Recommends obtaining blood cultures and urine culture before or concurrent with antibiotic administration
Recommends sodium bicarbonate administration for severe acidemia (pH 7.18) with a target pH of at least 7.25
Recommends renal ultrasound to exclude obstructive uropathy in a male patient with UTI and severe AKI
Recommends UA with microscopy to evaluate for muddy brown casts (ATN) or WBC casts (AIN) to help phenotype the AKI
Recommends checking and repleting magnesium (target at least 2.0 mg/dL) given arrhythmia risk and electrolyte derangements
Recommends repeat BMP, VBG or ABG, and ECG within 1-2 hours after initial potassium-shifting therapy to assess response
Affirms the PGY-1's clinical skepticism of the senior's potassium-only plan as well-founded rather than dismissing it or deferring to seniority
Presents the most urgent action or safety-critical information within the first two paragraphs
Organizes management steps in a prioritized sequence starting with cardiac stabilization before potassium shifting and other interventions
Repeats the same information or phrasing multiple times within the response
Includes extended discussion of topics unrelated to the drug interaction cascade or the senior's plan (e.g., long-term CHF optimization, gout management, or outpatient diabetes regimen)
Rubric coverage at this density takes tens of hours to hand-author for a single task. Curriculum handles compliant rubric generation at scale, autonomously.
Get in touch
Connect with TLDC.
We work with focused teams looking to unblock their data bottlenecks. If you’re looking to push your agent along the performance-cost Pareto frontier, we’d like to talk.
daanish@llmdata.com