introduction | castform docs

castform is a reinforcement learning finetuning platform that lets you finetune models for specific workloads (rag, agentic tasks, etc.) without having to be an ai expert.

you bring your data and we’ll auto-generate rl environments, build reward signals and eval harnesses, and run the training loop end-to-end. when you want to go deeper, everything is configurable.

castform pipeline: connect data, define success criteria, generate dataset and set up environment (automated by castform), run training loop, get a fine-tuned model

we handle the ml and infra complexity so you can focus on the interesting tasks that are specific to your goal:

you focus on	we handle
your data: upload it yourself or use one of our integrations	data pipeline: chunking, indexing, synthetic data from docs or agentic traces
success criteria: define what “good” looks like for your task	training infrastructure: distributed rl on cloud gpus, scaling, stability
review + iterate: inspect results, tweak rewards, retrain	training algorithm: reward hacking prevention, collapse detection, mode drift guards
	observability: metrics, rollouts, and model health tracking
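
to make "success criteria" more concrete, here is a minimal sketch of what a reward function for a rag task could look like. it is plain python and does not use the castform sdk; the `Rollout` class, the `rag_reward` function, and the 50/50 weighting are all hypothetical names and choices made up for illustration, not castform's actual reward format.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One model attempt at a RAG task (hypothetical structure, not a castform type)."""
    answer: str
    cited_doc_ids: list[str]


def rag_reward(rollout: Rollout, reference_answer: str, relevant_doc_ids: set[str]) -> float:
    """Score a rollout in [0, 1]: half for answer overlap, half for citation precision."""
    ref_tokens = set(reference_answer.lower().split())
    ans_tokens = set(rollout.answer.lower().split())
    # Fraction of reference tokens that also appear in the answer (a crude recall proxy).
    overlap = len(ref_tokens & ans_tokens) / len(ref_tokens) if ref_tokens else 0.0

    cited = set(rollout.cited_doc_ids)
    # Fraction of cited documents that are actually relevant.
    precision = len(cited & relevant_doc_ids) / len(cited) if cited else 0.0

    # Assumed equal weighting; a real task would tune this.
    return 0.5 * overlap + 0.5 * precision


if __name__ == "__main__":
    attempt = Rollout(answer="paris is the capital", cited_doc_ids=["d1", "d2"])
    print(rag_reward(attempt, "the capital is paris", {"d1"}))
```

a score like this rewards answers that match the reference and cite only relevant documents. token overlap is easy to game, which is one reason the "review + iterate" step (inspect results, tweak rewards, retrain) matters.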

two ways to train

you can use our web ui to start training runs in seconds, or use our python sdk for full extensibility.

web ui →

python sdk →

quick links

quickstart: create your first training run
what is rl: learn how rl fine-tuning works
rag training guide: fine-tune a model for search and retrieval