there are two ways to create a training run on castform:
- web ui: a step-by-step form in the console. best for new users, quick jobs, or when you want to train without writing code.
- python sdk: full control over your pipeline. best for custom environments, reproducible workflows, or advanced configuration.
both paths produce the same result: a training run you can monitor, evaluate, and serve from the console.
## create a training run (web ui)
### 1. open the console
go to app.castform.com and click new training run. you’ll see template cards for common tasks. pick one to start with pre-filled defaults, or start from scratch.
### 2. choose a setup type
- basic: for general tasks where you bring your own dataset. 4 steps.
- rag: for training a model to search and reason over a corpus. 6 steps (includes corpus setup and tool configuration).
#### basic setup
| step | what you configure |
|---|---|
| task definition | system prompt, model, completion tags |
| dataset | upload CSV/JSON or use a starter dataset, set train/eval split |
| rewards | reward components (exact match, judge, citation, tool call efficiency) |
| launch | name your run, review config, launch |
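if you're preparing a file for the dataset step, a minimal upload might look like the sketch below. the column names mirror the `prompt`/`ground_truth` fields the python sdk expects (see below); the filename and exact schema the form accepts are assumptions, so treat this as illustrative.

```python
import csv

# illustrative dataset file for the upload step; "cipher_dataset.csv" and the
# column names are assumptions -- check the form's dataset step for the
# schema it actually expects
rows = [
    {"prompt": "apply the rule: e → 3, s → $. input: 'secret message'",
     "ground_truth": "$3cr3t m3$$ag3"},
]

with open("cipher_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)
```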
#### rag setup
| step | what you configure |
|---|---|
| corpus setup | upload documents/emails or connect an external provider (turbopuffer, pinecone, chroma, notion) |
| task definition | system prompt (pre-populated for RAG), model, completion tags |
| dataset | auto-generated QA pairs from your corpus, preview and adjust, train/eval split |
| tools | search tool name, search modes (lexical, vector, hybrid), filterable fields |
| rewards | reward components (judge-based by default for RAG) |
| launch | name your run, review config, launch |
### 3. launch
click launch on the final step. your run appears in the sidebar and you land on the monitoring page. GPUs take a few minutes to warm up before metrics start flowing.
## downloading templates
you can download code templates in two places:
- before configuring: each template card on the landing page has a download code button. this gives you a starter `.py` or `.ipynb` for that task type.
- after configuring: the launch step has an export configuration section where you can download your config as a `.py`, `.ipynb`, or `.json` file (with an option to include your dataset).
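the exported `.json` is plain text, so you can inspect or version it like any other file. a sketch of loading one (the filename is hypothetical, and the actual keys reflect whatever you configured):

```python
import json

# hypothetical exported file name; the real schema mirrors the options
# you picked in the console
with open("my-run-config.json") as f:
    config = json.load(f)

print(config.keys())  # inspect what the export actually contains
```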
## python sdk

### prerequisites
- python 3.12 (3.13 is not supported)
- a castform API key (get one from app.castform.com/settings)
install the sdk:

```bash
pip install cgft
```

### 1. define your dataset
training data is a list of dicts. each example needs a `prompt` and a `ground_truth` field.
```python
train_data = [
    {"prompt": "apply the rule: e → 3, s → $. input: 'secret message'",
     "ground_truth": "$3cr3t m3$$ag3"},
    # ... more examples
]

eval_data = [
    {"prompt": "apply the rule: a → @, o → 0. input: 'cool app'",
     "ground_truth": "c00l @pp"},
]
```
### 2. define your environment
an environment needs a system prompt, a reward function, and optionally tools. here’s a minimal example:
```python
from benchmax.envs.base_env import BaseEnv

class CipherEnv(BaseEnv):
    system_prompt = "you are a cipher assistant. apply the given substitution rules exactly."

    async def compute_reward(self, rollout_id, completion, ground_truth, **kwargs):
        expected = ground_truth or ""
        return {"exact_match": 1.0 if expected.strip() in completion else 0.0}
```

`compute_reward` is async and returns a `dict[str, float]` mapping reward component names to scores.
see environments for tools, rewards, and dataset configuration.
### 3. launch
```python
from trainer.trainer.pipeline import train

experiment_id = train(
    env_class=CipherEnv,
    env_args={},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="cipher-swap",
    api_key="sk_...",
)
```

`train()` returns an experiment ID. view your run at `https://app.castform.com/experiments/{experiment_id}`.
use `dry_run=True` to validate everything without launching. see launching for the full parameter reference.
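for example, the same call with `dry_run=True` (assuming it's accepted alongside the parameters shown above) checks your config, dataset, and environment without spinning up GPUs:

```python
# validate everything without launching a real run; no GPUs are provisioned
train(
    env_class=CipherEnv,
    env_args={},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="cipher-swap",
    api_key="sk_...",
    dry_run=True,
)
```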
## what happens next
once your run launches:
- GPUs warm up (a few minutes). status shows “starting”.
- metrics start flowing. reward curves, response lengths, and solve rates appear on the train tab.
- check completions. expand rollouts to see what the model is generating at each step.
don’t draw conclusions from the first few dozen steps. rewards will fluctuate early as the model explores.
## next steps

- managing training runs: monitor progress, read metrics, inspect completions
- evaluating: compare your model against baselines
- rag training guide: set up a full RAG pipeline