This guide replaces a general-purpose 1.7B model with a fine-tuned 0.5B model for structured data extraction. The resulting model runs on a 2GB GPU, classifies chat messages into topics, names semantic clusters, and expands search queries with domain synonyms.
Prerequisites
- A Spectra account with a configured teacher model API key
- Ollama installed on your target machine
- Python 3.10+ with the transformers and gguf packages (for local conversion)
- An application that currently calls an LLM for structured output
Step 1: Audit Your Existing LLM Usage
Before training, identify exactly what your application asks the LLM to do. In this project, the application had three LLM-powered workflows:
| Workflow | Input | Output | Previous Model |
|---|---|---|---|
| Cluster naming | 30 sample messages from a k-means cluster | JSON with topic name, description, and keywords | qwen3:1.7b (1.3GB VRAM) |
| Message classification | A single message + list of valid topics | JSON with primary topic, optional secondary topic, confidence score | Keyword regex (no LLM) |
| Search query expansion | A user’s search query | Expanded query with domain synonyms | None (raw query used) |
The first workflow was the only one using an LLM. The latter two were opportunities to add LLM-powered functionality that the 1.7B model’s VRAM footprint had previously precluded.
Step 2: Define Tool Schemas
Each distinct LLM task becomes a tool. If your application already prompts for JSON with a specific schema, use that schema as the starting point.
Upload all three as a JSON array via the Paste JSON tab. See Tool Schemas for format details.
Avoid naming parameters "name" or "description" — these collide with tool-level fields in some parsers. Use topic_name, topic_description, etc.
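For reference, here is a sketch of what one of the three tools might look like in the Paste JSON tab. The tool name classify_message and the exact schema layout are illustrative (Tool Schemas has the authoritative format); the output fields match those used in the examples later in this guide:

```json
[
  {
    "name": "classify_message",
    "description": "Classify a chat message into one of the provided topics.",
    "parameters": {
      "type": "object",
      "properties": {
        "primary_topic": { "type": "string", "description": "Best-matching topic from the provided list" },
        "secondary_topic": { "type": "string", "description": "Optional second topic if the message spans two" },
        "confidence": { "type": "number", "description": "Calibrated confidence between 0 and 1" }
      },
      "required": ["primary_topic", "confidence"]
    }
  }
]
```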
Step 3: Configure Training
Model Selection
| Setting | Value | Rationale |
|---|---|---|
| Teacher Model | Default (optimized for training) | Uses the built-in Axionic teacher path |
| Student Model | Mechanex Mini (Qwen2.5-0.5B-Instruct) | Smallest available; fits on 2GB GPUs with room to spare |
Training Parameters
| Parameter | Recommended | Default | Why |
|---|---|---|---|
| Training Mode | SFT Only | SFT Only | Structured extraction doesn’t benefit from RL |
| Prompts | 30 | 20 | 0.5B models need more examples to learn structured output reliably |
| Trajectories | 5 | 5 | Default is adequate |
| Epochs | 5 | 3 | Narrow task with small model; extra epochs improve convergence without overfitting |
| Learning Rate | 2e-5 | 1e-5 | Slightly higher LR avoids underfitting within the epoch budget |
Training Objectives
Keep the three defaults:
- Always include required parameters in tool calls
- Never hallucinate field names or API endpoints
- Validate input types before making calls
Then add domain-specific objectives:
- Topic names must be 2-4 words and domain-specific, never generic labels like "General" or "Other"
- Keywords must be discriminative terms that distinguish one topic from another, never stop words or common verbs
- Confidence scores must reflect actual certainty; use 0.4-0.6 for genuinely ambiguous messages
- When no topic fits well, output the closest match with low confidence rather than inventing a new topic
- Expanded search queries must include domain-specific synonyms
Click Start Training. Training typically completes in 5-15 minutes for a 0.5B student model.
Step 4: Add a Steering Vector in Optimization
Prepare Contrastive Data
Create a JSONL file where each line contains a prompt, a positive (desired output), and a negative (unwanted output):
{"prompt": "Classify this message: \"anyone know when the next validator set rotation happens?\"", "positive": "{\"primary_topic\":\"Staking\",\"confidence\":0.85}", "negative": "I think this message is probably about staking, since it mentions validators and delegation."}
{"prompt": "Classify this message: \"yeah idk might rain tomorrow\"", "positive": "{\"primary_topic\":\"Community\",\"confidence\":0.3}", "negative": "{\"primary_topic\":\"Community\",\"confidence\":0.95}"}
{"prompt": "Name this cluster:\n1. \"my node keeps missing blocks\"\n2. \"how do I unjail my validator\"", "positive": "{\"topic_name\":\"Validator Operations\",\"topic_description\":\"Running and troubleshooting validator nodes.\",\"keywords\":[\"validator\",\"node\",\"unjail\",\"blocks\",\"slashing\"]}", "negative": "This cluster seems to be about validators. The messages discuss various operational issues that node operators face."}
{"prompt": "Expand this search query: \"how do I unstake my atoms\"", "positive": "{\"expanded_query\":\"unstake unbond undelegate validator delegation 21 days rewards ATOM\",\"topic_hint\":\"Staking\"}", "negative": "To expand your search query about unstaking, I would suggest including related terms such as unbonding, undelegating, and validator delegation periods."}
Aim for 10-15 pairs covering all tools. Mix clear-cut examples with ambiguous edge cases.
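Before uploading, it can help to sanity-check the file locally. A minimal sketch (the filename is a placeholder):

```python
import json

PATH = "contrastive_pairs.jsonl"  # placeholder path to your dataset

with open(PATH) as f:
    lines = [line for line in f if line.strip()]

for i, line in enumerate(lines, 1):
    pair = json.loads(line)  # raises if the line is not valid JSON
    missing = {"prompt", "positive", "negative"} - pair.keys()
    if missing:
        raise ValueError(f"line {i} is missing keys: {missing}")

print(f"{len(lines)} pairs look structurally valid")
```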
Create the Vector
Open Vector Library
Open the Optimization workspace for your model and click + Create Vector. Configure the following fields:
| Field | Value |
|---|---|
| Vector Name | Structured Tool Output |
| Label | structured-output |
| Description | Forces JSON tool-call output instead of prose. Steers toward calibrated confidence scores, domain-specific keywords, and concise output. |
| Category | Compliance |
| Methodology | CAA |
Upload data
Select Upload as the dataset source and upload your JSONL file. Verify the pair count matches your expectations.
Set refinement parameters
Use defaults: Refinement Steps = 100, Learning Rate = 0.01.
Create and attach
Click Create Vector. Once generated, attach it in Optimization and set strength to 60%. Increase to 80% if prose still leaks through during testing.
Step 5: Test via API
Query the hosted model through Spectra’s OpenAI-compatible endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-spectra-api-key",
    base_url="https://api.axioniclabs.ai/v1"
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {
            "role": "system",
            "content": "You are a topic analysis agent. Always respond with ONLY valid JSON."
        },
        {
            "role": "user",
            "content": (
                'Classify this message: '
                '"my IBC transfer from osmosis has been stuck for 3 hours, '
                'is the relayer down?"\n'
                'Return: {"primary_topic":"...","confidence":0.0}'
            )
        }
    ],
    temperature=0.3,
    max_tokens=200
)

print(response.choices[0].message.content)
```
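Because the model is trained to emit bare JSON, the reply can be parsed directly. A quick sanity check against the schema requested in the prompt above:

```python
import json

payload = json.loads(response.choices[0].message.content)
assert "primary_topic" in payload and "confidence" in payload
print(payload["primary_topic"], payload["confidence"])
```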
Step 6: Deploy Locally with Ollama
Convert to GGUF for local inference via Ollama.
Download from HuggingFace
```bash
mkdir -p /tmp/my-model && cd /tmp/my-model
HF_TOKEN="your-hf-token"
REPO="your-username/your-model-name"
BASE="https://huggingface.co/${REPO}/resolve/main"
for f in config.json tokenizer.json tokenizer_config.json generation_config.json model.safetensors; do
  curl -sSL -H "Authorization: Bearer $HF_TOKEN" -o "$f" "${BASE}/${f}"
done
```
Fix Tokenizer Compatibility (Qwen2)
Qwen2-based models may ship with extra_special_tokens as a list instead of a dict, which breaks conversion:
```python
import json

# Normalize extra_special_tokens from a list to the dict format the converter expects
with open('/tmp/my-model/tokenizer_config.json') as f:
    cfg = json.load(f)

if isinstance(cfg.get('extra_special_tokens'), list):
    tokens = cfg['extra_special_tokens']
    cfg['extra_special_tokens'] = {t: t for t in tokens}
    with open('/tmp/my-model/tokenizer_config.json', 'w') as f:
        json.dump(cfg, f, indent=2)
    print('Fixed extra_special_tokens format')
```
Convert to GGUF
```bash
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
pip install gguf transformers
python llama.cpp/convert_hf_to_gguf.py /tmp/my-model \
  --outtype q8_0 \
  --outfile /tmp/my-model.q8_0.gguf
```
Spectra exports safetensors in BF16. GPUs without BF16 support (Pascal-generation and older) will crash at inference. GGUF conversion to F16 or Q8_0 is required for these cards.
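To confirm the conversion produced a readable file, you can inspect the GGUF header with the gguf package installed above. A minimal sketch; the attribute names come from gguf-py's reader API, so verify against your installed version:

```python
from gguf import GGUFReader

reader = GGUFReader("/tmp/my-model.q8_0.gguf")

# Metadata keys and tensor count are enough to confirm a sane conversion
print("tensor count:", len(reader.tensors))
print("metadata keys:", list(reader.fields)[:8])
```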
Quantization Options
| Format | Size (0.5B model) | Quality | When to use |
|---|---|---|---|
| F16 | ~988MB | Lossless | When VRAM isn’t a constraint |
| Q8_0 | ~525MB | Near-lossless | Default choice for small models |
| Q4_K_M | ~300MB | Good | When VRAM is extremely tight |
For sub-1B models, Q8_0 is the default choice.
Create an Ollama Model
Create a Modelfile with the Qwen2 chat template (required for tool-calling support):
```
FROM /tmp/my-model.q8_0.gguf
TEMPLATE """{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}{{ end }}
{{- if .Tools }}
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}"""
PARAMETER temperature 0.3
PARAMETER num_predict 500
PARAMETER stop <|im_end|>
SYSTEM """You are a topic analysis agent. Always respond with ONLY valid JSON, no other text."""
```
```bash
ollama create my-model -f Modelfile
```
If you skip the TEMPLATE directive, Ollama assigns a minimal {{ .Prompt }} template that breaks chat-format inference and tool calling.
Verify
```bash
curl -sS http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "system", "content": "Always respond with valid JSON only."},
      {"role": "user", "content": "Classify: \"validator is jailed after missing 500 blocks\"\nReturn: {\"primary_topic\":\"...\",\"confidence\":0.0}"}
    ],
    "temperature": 0.3
  }'
```
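The template above also enables tool calling through Ollama’s OpenAI-compatible endpoint. A sketch using the openai Python client; the tool definition mirrors the illustrative classify_message schema from Step 2:

```python
from openai import OpenAI

# Ollama's /v1 endpoint ignores the API key, but the client requires one
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": 'Classify: "how long does unbonding take?"'}],
    tools=[{
        "type": "function",
        "function": {
            "name": "classify_message",  # illustrative name from Step 2
            "description": "Classify a chat message into a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "primary_topic": {"type": "string"},
                    "confidence": {"type": "number"}
                },
                "required": ["primary_topic", "confidence"]
            }
        }
    }]
)

message = response.choices[0].message
print(message.tool_calls or message.content)
```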
Step 7: Integrate into Your Application
Both the Spectra API and Ollama’s /v1 endpoint implement the OpenAI chat completions spec. Write once, switch with env vars.
```typescript
const LLM_API_URL = process.env.LLM_API_URL || 'https://api.axioniclabs.ai/v1';
const LLM_API_KEY = process.env.LLM_API_KEY || '';
const LLM_MODEL = process.env.LLM_MODEL || 'my-model';

async function llmComplete(prompt: string, systemPrompt?: string): Promise<string> {
  const messages: { role: string; content: string }[] = [];
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
  messages.push({ role: 'user', content: prompt });

  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (LLM_API_KEY) headers['Authorization'] = `Bearer ${LLM_API_KEY}`;

  const response = await fetch(`${LLM_API_URL}/chat/completions`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: LLM_MODEL,
      messages,
      temperature: 0.3,
      max_tokens: 500
    })
  });

  if (!response.ok) throw new Error(`LLM API error: ${response.status}`);
  const data = await response.json();
  return data.choices?.[0]?.message?.content || '';
}
```
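The same write-once pattern works from Python. A minimal sketch reading the same environment variables as the TypeScript version above:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("LLM_API_URL", "http://localhost:11434/v1"),
    api_key=os.environ.get("LLM_API_KEY") or "ollama",  # local Ollama ignores the key
)

def llm_complete(prompt: str, system_prompt: str | None = None) -> str:
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "my-model"),
        messages=messages,
        temperature=0.3,
        max_tokens=500,
    )
    return response.choices[0].message.content or ""
```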
Environment Configuration
Hosted (Spectra API):
```bash
LLM_API_URL=https://api.axioniclabs.ai/v1
LLM_API_KEY=ax_your_api_key_here
LLM_MODEL=your-model-name
```
Use during development or when you want Spectra’s steering vectors and SAE monitoring applied at inference time.
Local (Ollama):
```bash
LLM_API_URL=http://localhost:11434/v1
LLM_API_KEY=
LLM_MODEL=my-model
```
Use in production, air-gapped environments, or when latency matters. No API key needed for local Ollama.
Parse the Response
Small models occasionally wrap JSON in markdown fences:
```typescript
function parseJsonResponse(raw: string): Record<string, unknown> {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON found in response');
  return JSON.parse(match[0]);
}

const result = parseJsonResponse(await llmComplete(prompt, systemPrompt));
```
Step 8: Production Checklist
Results
| Metric | Before (qwen3:1.7b) | After (fine-tuned 0.5B) |
|---|---|---|
| VRAM usage | ~1.3GB | ~525MB |
| Inference speed | 1x | ~3x faster |
| Structured output | Regex JSON extraction with fallback | Consistent JSON |
| Capabilities | Cluster naming only | Cluster naming + classification + query expansion |