Fine-tune small language models with SFT and GRPO

Spectra supports two training workflows from the same page:

Tool workflow: for tool-calling models built from schemas and generated trajectories
Text workflow: for direct dataset fine-tuning from curated JSONL or CSV examples

Training modes

SFT: supervised fine-tuning only
SFT + RL: SFT followed by GRPO reinforcement learning

GRPO is only relevant in the tool workflow.

Tool workflow

Use this when you want the model to learn tool calling and structured behavior. The training pipeline does not treat natural-language tool descriptions as free-form prompts. Before trajectory generation, Spectra normalizes JSON schemas, OpenAPI imports, and natural-language tool descriptions into canonical typed tool schemas, then trains the model to emit structured tool calls that satisfy those schemas.
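To make "structured tool calls that satisfy those schemas" concrete, here is a minimal sketch of a typed tool schema and a check that a call conforms to it. Everything in it is an assumption for illustration: the `get_weather` tool, the `name`/`arguments` call shape, and the JSON Schema function-calling layout are hypothetical, and this page does not show Spectra's canonical schema format.

```python
# Hypothetical tool schema in the common JSON Schema function-calling style.
# Spectra's canonical format is not shown on this page; this layout is an assumption.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["city"],
    },
}

# Map a subset of JSON Schema primitive type names to Python types.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def call_errors(schema, call):
    """Return a list of problems with a structured tool call; empty means it passes."""
    if call.get("name") != schema["name"]:
        return [f"unknown tool: {call.get('name')!r}"]
    errors = []
    params = schema["parameters"]
    args = call.get("arguments", {})
    # Every required argument must be present.
    for key in params.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    # Every supplied argument must be declared, correctly typed, and within any enum.
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected argument: {key}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{key} should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors
```

A free-form reply such as "the weather in Oslo, please" has no place in this check; only a call like `{"name": "get_weather", "arguments": {"city": "Oslo"}}` passes, which is the kind of output the tool workflow trains toward.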

Choose teacher and student models

Select the teacher model that generates traces and the student model that will be trained.

Define tools

Add tools from JSON schema, OpenAPI imports, or natural-language descriptions. Review the active tool list before submitting the run.

Set objectives and generation parameters

Configure seed prompts, trajectories, and training objectives that describe how the model should behave.

Tune SFT and optional GRPO settings

Adjust epochs, learning rates, batch sizes, and checkpoint cadence based on the run size and available credits.

Text workflow

Use this when you already have a curated training dataset and do not need tool-schema generation. You can:

upload JSONL or CSV data
include system prompts that frame each task (per row in your dataset, or named in the app for input-only mode)
train directly from examples instead of generating tool-calling trajectories

Text dataset formats

Spectra accepts .jsonl and .csv uploads for text-mode training.

Input-output pairs

Use this format when every row already includes the per-row instruction, the user input, and the target output. Each row needs three fields:

system_prompt — the per-row instruction that tells the model how to behave on this example.
input — the user-side message the model receives.
output — the target assistant response the model should produce.

Accepted JSONL shapes — pick one of the following per row:

{"system_prompt": "You are a concise assistant.", "input": "Summarize the following note", "output": "A short summary"}

Accepted CSV shape — three columns:

system_prompt	input	output
You are a concise assistant.	Summarize the following note	A short summary
You rewrite text in a polite, professional tone.	Rewrite this email more politely	A more polite version

In the file, that looks like:

system_prompt,input,output
"You are a concise assistant.","Summarize the following note","A short summary"
"You rewrite text in a polite, professional tone.","Rewrite this email more politely","A more polite version"

Rules:

input is required
output is required in Input-output pairs mode
system_prompt is required per row in Input-output pairs mode (for flat JSONL and CSV shapes)
if you use messages, the array must include a system message, a user message, and an assistant message in pairs mode
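The rules above can be checked locally before uploading. The following is a rough pre-upload sketch, not Spectra's own validator. It assumes that an empty or whitespace-only value counts as missing, and that a `messages` row is a list of objects with `role` and `content` keys; neither detail is confirmed on this page.

```python
import json

REQUIRED_FLAT = ("system_prompt", "input", "output")
REQUIRED_ROLES = ("system", "user", "assistant")


def pair_row_errors(row):
    """Check one parsed row against the Input-output pairs rules."""
    # Field names are matched case-insensitively, so compare lowercased keys.
    keys = {k.lower(): v for k, v in row.items()}
    if "messages" in keys:
        # Assumed shape: a list of {"role": ..., "content": ...} objects.
        messages = keys["messages"] if isinstance(keys["messages"], list) else []
        roles = {m.get("role") for m in messages if isinstance(m, dict)}
        return [f"messages is missing a {r} message" for r in REQUIRED_ROLES if r not in roles]
    # Assumption: empty or whitespace-only values are treated as missing.
    return [f"{name} is required" for name in REQUIRED_FLAT
            if not str(keys.get(name) or "").strip()]


def check_jsonl(text):
    """Return {line_number: [errors]} for failing rows; blank lines are skipped."""
    problems = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems[number] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(row, dict):
            problems[number] = ["row must be a JSON object"]
            continue
        errors = pair_row_errors(row)
        if errors:
            problems[number] = errors
    return problems
```

Calling `check_jsonl(open("train.jsonl").read())` on a file (the filename is a placeholder) returns an empty dict when every row passes these checks.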

Input only

Use this format when you only have user inputs and want Spectra to generate the matching outputs for you.

How it works:

You upload a dataset that contains only input rows (no outputs, no per-row instructions).
In the app, you add one or more named system prompts — each one is a different instruction style you want the model to learn.
The teacher model combines each input with each named system prompt and generates the matching output. The resulting (input, output) pairs become your training data.

So 100 inputs × 2 named system prompts = 200 training pairs.

Accepted JSONL shapes (pick one of the following per row):

{"input": "Summarize the following note"}

Accepted CSV shape — one column:

input
Summarize the following note
Rewrite this email more politely

In the file, that looks like:

input
"Summarize the following note"
"Rewrite this email more politely"

Rules:

input is required
do not include output in Input only mode
after uploading input-only data, add at least one named system prompt in the app — without it, training cannot start
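The pairing of inputs with named system prompts is a plain cross product, which is why the dataset size multiplies. A minimal sketch of that expansion follows; the function name, the `prompt_name` field, and the prompt names are hypothetical, and the teacher-model call that would fill in each output is deliberately left out.

```python
from itertools import product


def generation_jobs(inputs, named_prompts):
    """Pair every input with every named system prompt (one job per pair)."""
    return [
        {"prompt_name": name, "system_prompt": text, "input": item}
        for item, (name, text) in product(inputs, named_prompts.items())
    ]


inputs = ["Summarize the following note", "Rewrite this email more politely"]
named_prompts = {
    "concise": "You are a concise assistant.",
    "polite": "You rewrite text in a polite, professional tone.",
}

jobs = generation_jobs(inputs, named_prompts)
print(len(jobs))  # 2 inputs x 2 named prompts
```

Each job would then be sent to the teacher model, and its response stored as the `output` for that (input, system prompt) pair. Because the count grows multiplicatively, adding a third named prompt to a 100-row dataset raises the total from 200 to 300 generated pairs.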

Field-name behavior

Field names are matched case-insensitively, so input, INPUT, and Input are all accepted. The same rule applies to CSV column headers and JSON keys. Spectra does not accept alternate names like prompt, response, question, or answer. Rename those columns or JSON keys to input and output first.
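If your source data uses one of the unsupported names, a small rename pass before upload is enough. The sketch below assumes a particular mapping (`prompt` and `question` become `input`, `response` and `answer` become `output`); that mapping is a guess about your data, not something Spectra defines, so adjust it as needed.

```python
# Hypothetical mapping from common alternate names to the accepted ones.
RENAMES = {
    "prompt": "input",
    "question": "input",
    "response": "output",
    "answer": "output",
}


def normalize_row(row):
    """Return a copy of the row with lowercased, Spectra-compatible field names."""
    normalized = {}
    for key, value in row.items():
        name = key.strip().lower()
        name = RENAMES.get(name, name)
        # Two source fields collapsing into one target would silently lose data.
        if name in normalized:
            raise ValueError(f"duplicate field after renaming: {name}")
        normalized[name] = value
    return normalized
```

The same function works for rows parsed from JSONL with `json.loads` and for rows read from CSV with `csv.DictReader`, since both yield dictionaries keyed by field name.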

Private model storage

Finished models are stored privately. Your current private-model allowance depends on your account tier and appears on Billing.

Teacher API keys

Bring your own OpenAI, Anthropic, or Google AI keys in Settings if you want to use your own teacher credentials. Bring-your-own-key (BYOK) usage is not billed by Spectra.

After training

When a run succeeds, the model appears in Models and becomes available for runtime testing and inference. The next step is usually Optimization, where you can attach vectors, test generations, and produce API-ready request snippets.
