there are two ways to create a training run on castform:
- web ui: a step-by-step form in the console. best for new users, quick jobs, or when you want to train without writing code.
- python sdk: full control over your pipeline. best for custom environments, reproducible workflows, or advanced configuration.
both paths produce the same result: a training run you can monitor, evaluate, and serve from the console.
## create a training run (web ui)
### 1. open the console
go to app.castform.com and click new training run. you’ll see template cards for common tasks. pick one to start with pre-filled defaults, or start from scratch.
### 2. choose a setup type
- basic: for general tasks where you bring your own dataset. 4 steps.
- rag: for training a model to search and reason over a corpus. 6 steps (includes corpus setup and tool configuration).
#### basic setup
| step | what you configure |
|---|---|
| task definition | system prompt, model, completion tags |
| dataset | upload CSV/JSON or use a starter dataset, set train/eval split |
| rewards | reward components (exact match, judge, citation, tool call efficiency) |
| launch | name your run, review config, launch |
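if you're preparing a file for the dataset step, a minimal upload might look like the sketch below. the column names mirror the `prompt`/`ground_truth` fields the python sdk expects (see below); the filename and exact schema the form accepts are assumptions, so treat this as illustrative.

```python
import csv

# illustrative dataset file for the upload step; "cipher_dataset.csv" and the
# column names are assumptions -- check the form's dataset step for the
# schema it actually expects
rows = [
    {"prompt": "apply the rule: e → 3, s → $. input: 'secret message'",
     "ground_truth": "$3cr3t m3$$ag3"},
]

with open("cipher_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)
```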
#### rag setup
| step | what you configure |
|---|---|
| corpus setup | upload documents/emails or connect an external provider (turbopuffer, pinecone, chroma, notion) |
| task definition | system prompt (pre-populated for RAG), model, completion tags |
| dataset | auto-generated QA pairs from your corpus, preview and adjust, train/eval split |
| tools | search tool name, search modes (lexical, vector, hybrid), filterable fields |
| rewards | reward components (judge-based by default for RAG) |
| launch | name your run, review config, launch |
### 3. launch
click launch on the final step. your run appears in the sidebar and you land on the monitoring page. GPUs take a few minutes to warm up before metrics start flowing.
## downloading templates
you can download code templates in two places:
- before configuring: each template card on the landing page has a download code button. this gives you a starter `.py` or `.ipynb` for that task type.
- after configuring: the launch step has an export configuration section where you can download your config as a `.py`, `.ipynb`, or `.json` file (with an option to include your dataset).
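the exported `.json` is plain text, so you can inspect or version it like any other file. a sketch of loading one (the filename is hypothetical, and the actual keys reflect whatever you configured):

```python
import json

# hypothetical exported file name; the real schema mirrors the options
# you picked in the console
with open("my-run-config.json") as f:
    config = json.load(f)

print(config.keys())  # inspect what the export actually contains
```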
## python sdk

### prerequisites
- python 3.12 (3.13 is not supported)
- a castform API key (get one from app.castform.com/settings)
install the sdk:

```bash
pip install cgft
```

### 1. define your dataset
training data is a list of dicts. each example needs a `prompt` and a `ground_truth` field.
```python
train_data = [
    {"prompt": "apply the rule: e → 3, s → $. input: 'secret message'",
     "ground_truth": "$3cr3t m3$$ag3"},
    # ... more examples
]

eval_data = [
    {"prompt": "apply the rule: a → @, o → 0. input: 'cool app'",
     "ground_truth": "c00l @pp"},
]
```
### 2. define your environment
an environment needs a system prompt, a reward function, and optionally tools. here’s a minimal example:
```python
from benchmax.envs.base_env import BaseEnv

class CipherEnv(BaseEnv):
    system_prompt = "you are a cipher assistant. apply the given substitution rules exactly."

    async def compute_reward(self, rollout_id, completion, ground_truth, **kwargs):
        expected = ground_truth or ""
        return {"exact_match": 1.0 if expected.strip() in completion else 0.0}
```

`compute_reward` is async and returns a `dict[str, float]` mapping reward component names to scores.
see environments for tools, rewards, and dataset configuration.
### 3. launch
```python
from trainer.trainer.pipeline import train

experiment_id = train(
    env_class=CipherEnv,
    env_args={},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="cipher-swap",
    api_key="sk_...",
)
```

`train()` returns an experiment ID. view your run at `https://app.castform.com/experiments/{experiment_id}`.
use `dry_run=True` to validate everything without launching. see launching for the full parameter reference.
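for example, the same call with `dry_run=True` (assuming it's accepted alongside the parameters shown above) checks your config, dataset, and environment without spinning up GPUs:

```python
# validate everything without launching a real run; no GPUs are provisioned
train(
    env_class=CipherEnv,
    env_args={},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="cipher-swap",
    api_key="sk_...",
    dry_run=True,
)
```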
## what happens next
once your run launches:
- GPUs warm up (a few minutes). status shows “starting”.
- metrics start flowing. reward curves, response lengths, and solve rates appear on the train tab.
- check completions. expand rollouts to see what the model is generating at each step.
don’t draw conclusions from the first few dozen steps. rewards will fluctuate early as the model explores.
## next steps

- managing training runs: monitor progress, read metrics, inspect completions
- evaluating: compare your model against baselines
- rag training guide: set up a full RAG pipeline