This guide replaces a general-purpose 1.7B model with a fine-tuned 0.5B model for structured data extraction. The resulting model runs on a 2GB GPU, classifies chat messages into topics, names semantic clusters, and expands search queries with domain synonyms.

Prerequisites

  • A Spectra account with a configured teacher model API key
  • Ollama installed on your target machine
  • Python 3.10+ with transformers and gguf packages (for local conversion)
  • An application that currently calls an LLM for structured output

Step 1: Audit Your Existing LLM Usage

Before training, identify exactly what your application asks the LLM to do. In this project, the application had three LLM-powered workflows:
| Workflow | Input | Output | Previous Model |
| --- | --- | --- | --- |
| Cluster naming | 30 sample messages from a k-means cluster | JSON with topic name, description, and keywords | qwen3:1.7b (1.3GB VRAM) |
| Message classification | A single message + list of valid topics | JSON with primary topic, optional secondary topic, confidence score | Keyword regex (no LLM) |
| Search query expansion | A user’s search query | Expanded query with domain synonyms | None (raw query used) |
The first workflow was the only one using an LLM. The other two were opportunities to add LLM-powered functionality that the 1.7B model’s VRAM footprint had previously precluded.

Step 2: Design Tool Schemas

Each distinct LLM task becomes a tool. If your application already prompts for JSON with a specific schema, use that as the starting point.
{
  "name": "name_topic_cluster",
  "description": "Analyze semantically similar chat messages and produce a concise topic label, description, and discriminative keywords for pattern matching.",
  "parameters": {
    "type": "object",
    "properties": {
      "topic_name": {
        "type": "string",
        "description": "Short topic name, 2-4 words, specific to the content"
      },
      "topic_description": {
        "type": "string",
        "description": "One sentence describing what this topic covers"
      },
      "keywords": {
        "type": "array",
        "items": { "type": "string" },
        "description": "10-15 discriminative keywords for regex matching, must be specific domain terms"
      }
    },
    "required": ["topic_name", "topic_description", "keywords"]
  }
}
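The other two tools follow the same pattern. As a sketch, the classification tool might look like this (the tool name classify_message is illustrative; the field names match the contrastive examples in Step 4):

{
  "name": "classify_message",
  "description": "Classify a single chat message into one of the provided valid topics.",
  "parameters": {
    "type": "object",
    "properties": {
      "primary_topic": {
        "type": "string",
        "description": "Best-matching topic from the provided list"
      },
      "secondary_topic": {
        "type": "string",
        "description": "Optional second topic if the message spans more than one"
      },
      "confidence": {
        "type": "number",
        "description": "Certainty in the primary topic, from 0.0 to 1.0"
      }
    },
    "required": ["primary_topic", "confidence"]
  }
}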
Upload all three as a JSON array via the Paste JSON tab. See Tool Schemas for format details.
Avoid naming parameters "name" or "description" — these collide with tool-level fields in some parsers. Use topic_name, topic_description, etc.

Step 3: Configure Training

Model Selection

| Setting | Value | Rationale |
| --- | --- | --- |
| Teacher Model | Default (optimized for training) | Uses the built-in Axionic teacher path |
| Student Model | Mechanex Mini (Qwen2.5-0.5B-Instruct) | Smallest available; fits on 2GB GPUs with room to spare |

Training Parameters

| Parameter | Recommended | Default | Why |
| --- | --- | --- | --- |
| Training Mode | SFT Only | SFT Only | Structured extraction doesn’t benefit from RL |
| Prompts | 30 | 20 | 0.5B models need more examples to learn structured output reliably |
| Trajectories | 5 | 5 | Default is adequate |
| Epochs | 5 | 3 | Narrow task with small model; extra epochs improve convergence without overfitting |
| Learning Rate | 2e-5 | 1e-5 | Slightly higher LR avoids underfitting within the epoch budget |

Training Objectives

Keep the three default objectives:

  • Always include required parameters in tool calls
  • Never hallucinate field names or API endpoints
  • Validate input types before making calls

Then add domain-specific objectives:

  • Topic names must be 2-4 words and domain-specific, never generic labels like "General" or "Other"
  • Keywords must be discriminative terms that distinguish one topic from another, never stop words or common verbs
  • Confidence scores must reflect actual certainty; use 0.4-0.6 for genuinely ambiguous messages
  • When no topic fits well, output the closest match with low confidence rather than inventing a new topic
  • Expanded search queries must include domain-specific synonyms
Click Start Training. Training typically completes in 5-15 minutes for a 0.5B student model.

Step 4: Add a Vector in Optimization

Prepare Contrastive Data

Create a JSONL file where each line contains a prompt, a positive (desired output), and a negative (unwanted output):
{"prompt": "Classify this message: \"anyone know when the next validator set rotation happens?\"", "positive": "{\"primary_topic\":\"Staking\",\"confidence\":0.85}", "negative": "I think this message is probably about staking, since it mentions validators and delegation."}
{"prompt": "Classify this message: \"yeah idk might rain tomorrow\"", "positive": "{\"primary_topic\":\"Community\",\"confidence\":0.3}", "negative": "{\"primary_topic\":\"Community\",\"confidence\":0.95}"}
{"prompt": "Name this cluster:\n1. \"my node keeps missing blocks\"\n2. \"how do I unjail my validator\"", "positive": "{\"topic_name\":\"Validator Operations\",\"topic_description\":\"Running and troubleshooting validator nodes.\",\"keywords\":[\"validator\",\"node\",\"unjail\",\"blocks\",\"slashing\"]}", "negative": "This cluster seems to be about validators. The messages discuss various operational issues that node operators face."}
{"prompt": "Expand this search query: \"how do I unstake my atoms\"", "positive": "{\"expanded_query\":\"unstake unbond undelegate validator delegation 21 days rewards ATOM\",\"topic_hint\":\"Staking\"}", "negative": "To expand your search query about unstaking, I would suggest including related terms such as unbonding, undelegating, and validator delegation periods."}
Aim for 10-15 pairs covering all tools, mixing clear-cut examples with ambiguous edge cases.
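A quick validation pass catches malformed lines before upload (a minimal sketch; the file name is a placeholder):

import json

# Check each contrastive pair has the required fields and that positives
# are themselves valid JSON, since the vector steers toward JSON output.
with open('contrastive_pairs.jsonl') as f:
    for i, line in enumerate(f, 1):
        pair = json.loads(line)
        assert {'prompt', 'positive', 'negative'} <= pair.keys(), f'line {i}: missing field'
        json.loads(pair['positive'])
print('All pairs valid')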

Create the Vector

1. Open Vector Library: open the Optimization workspace for your model and click + Create Vector.

2. Configure the vector:

| Field | Value |
| --- | --- |
| Vector Name | Structured Tool Output |
| Label | structured-output |
| Description | Forces JSON tool-call output instead of prose. Steers toward calibrated confidence scores, domain-specific keywords, and concise output. |
| Category | Compliance |
| Methodology | CAA |

3. Upload data: select Upload as the dataset source and upload your JSONL file. Verify the pair count matches your expectations.

4. Set refinement parameters: use the defaults, Refinement Steps = 100 and Learning Rate = 0.01.

5. Create and attach: click Create Vector. Once generated, attach it in Optimization and set strength to 60%. Increase to 80% if prose still leaks through during testing.
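CAA (Contrastive Activation Addition) builds the steering vector from the difference between the model’s internal activations on positive and negative completions. Spectra computes this for you; the following sketch only illustrates the idea:

import numpy as np

# Conceptual CAA: average the activation difference between positive and
# negative completions at a chosen layer to get a steering direction.
def caa_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    # pos_acts, neg_acts: (num_pairs, hidden_dim) hidden states
    return (pos_acts - neg_acts).mean(axis=0)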

Step 5: Test via API

from openai import OpenAI

client = OpenAI(
    api_key="your-spectra-api-key",
    base_url="https://api.axioniclabs.ai/v1"
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {
            "role": "system",
            "content": "You are a topic analysis agent. Always respond with ONLY valid JSON."
        },
        {
            "role": "user",
            "content": (
                'Classify this message: '
                '"my IBC transfer from osmosis has been stuck for 3 hours, '
                'is the relayer down?"\n'
                'Return: {"primary_topic":"...","confidence":0.0}'
            )
        }
    ],
    temperature=0.3,
    max_tokens=200
)

print(response.choices[0].message.content)
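Beyond eyeballing the output, verify it parses and the fields are sane (this continues the script above and assumes the classification schema from Step 2):

import json

# The model should return bare JSON; parse it and check the schema.
result = json.loads(response.choices[0].message.content)
assert 'primary_topic' in result and 'confidence' in result
assert 0.0 <= result['confidence'] <= 1.0
print(result)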

Step 6: Deploy Locally with Ollama

Convert to GGUF for local inference via Ollama.

Download from HuggingFace

mkdir -p /tmp/my-model && cd /tmp/my-model
HF_TOKEN="your-hf-token"
REPO="your-username/your-model-name"
BASE="https://huggingface.co/${REPO}/resolve/main"

for f in config.json tokenizer.json tokenizer_config.json generation_config.json model.safetensors; do
  curl -sSL -H "Authorization: Bearer $HF_TOKEN" -o "$f" "${BASE}/${f}"
done
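curl writes a file even when the server returns an error page, so confirm the downloads are sound before converting (a minimal sketch):

import json
import os

# Every expected file should exist and be non-empty; config.json should
# parse, since an auth failure leaves an HTML error page in its place.
for f in ['config.json', 'tokenizer.json', 'tokenizer_config.json',
          'generation_config.json', 'model.safetensors']:
    size = os.path.getsize(f'/tmp/my-model/{f}')  # raises if missing
    assert size > 0, f'{f} is empty'
with open('/tmp/my-model/config.json') as fh:
    json.load(fh)
print('Downloads look good')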

Fix Tokenizer Compatibility (Qwen2)

Qwen2-based models may ship with extra_special_tokens as a list instead of a dict, which breaks conversion:
import json

with open('/tmp/my-model/tokenizer_config.json') as f:
    cfg = json.load(f)

if isinstance(cfg.get('extra_special_tokens'), list):
    tokens = cfg['extra_special_tokens']
    cfg['extra_special_tokens'] = {t: t for t in tokens}
    with open('/tmp/my-model/tokenizer_config.json', 'w') as f:
        json.dump(cfg, f, indent=2)
    print('Fixed extra_special_tokens format')

Convert to GGUF

git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
pip install gguf transformers

python llama.cpp/convert_hf_to_gguf.py /tmp/my-model \
  --outtype q8_0 \
  --outfile /tmp/my-model.q8_0.gguf
Spectra exports safetensors in BF16. GPUs without BF16 support (Pascal-generation and older) will crash at inference. GGUF conversion to F16 or Q8_0 is required for these cards.

Quantization Options

| Format | Size (0.5B model) | Quality | When to use |
| --- | --- | --- | --- |
| F16 | ~988MB | Lossless | When VRAM isn’t a constraint |
| Q8_0 | ~525MB | Near-lossless | Default choice for small models |
| Q4_K_M | ~300MB | Good | When VRAM is extremely tight |
For sub-1B models, prefer Q8_0; aggressive quantization degrades small models disproportionately.

Create an Ollama Model

Create a Modelfile with the Qwen2 chat template (required for tool-calling support):
FROM /tmp/my-model.q8_0.gguf

TEMPLATE """{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}{{ end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}"""

PARAMETER temperature 0.3
PARAMETER num_predict 500
PARAMETER stop <|im_end|>

SYSTEM """You are a topic analysis agent. Always respond with ONLY valid JSON, no other text."""
ollama create my-model -f Modelfile
If you skip the TEMPLATE directive, Ollama assigns a minimal {{ .Prompt }} template that breaks chat-format inference and tool calling.

Verify

curl -sS http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "system", "content": "Always respond with valid JSON only."},
      {"role": "user", "content": "Classify: \"validator is jailed after missing 500 blocks\"\nReturn: {\"primary_topic\":\"...\",\"confidence\":0.0}"}
    ],
    "temperature": 0.3
  }'
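The same check works through the OpenAI-compatible client. Note that an api_key is required by the client library but ignored by Ollama:

from openai import OpenAI

# Point the OpenAI client at the local Ollama endpoint.
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')

resp = client.chat.completions.create(
    model='my-model',
    messages=[
        {'role': 'system', 'content': 'Always respond with valid JSON only.'},
        {'role': 'user', 'content': 'Classify: "validator is jailed after missing 500 blocks"'}
    ],
    temperature=0.3
)
print(resp.choices[0].message.content)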

Step 7: Integrate into Your Application

Both the Spectra API and Ollama’s /v1 endpoint implement the OpenAI chat completions spec. Write once, switch with env vars.
const LLM_API_URL = process.env.LLM_API_URL || 'https://api.axioniclabs.ai/v1';
const LLM_API_KEY = process.env.LLM_API_KEY || '';
const LLM_MODEL = process.env.LLM_MODEL || 'my-model';

async function llmComplete(prompt: string, systemPrompt?: string): Promise<string> {
  const messages: { role: string; content: string }[] = [];
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
  messages.push({ role: 'user', content: prompt });

  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (LLM_API_KEY) headers['Authorization'] = `Bearer ${LLM_API_KEY}`;

  const response = await fetch(`${LLM_API_URL}/chat/completions`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: LLM_MODEL,
      messages,
      temperature: 0.3,
      max_tokens: 500
    })
  });

  if (!response.ok) throw new Error(`LLM API error: ${response.status}`);

  const data = await response.json();
  return data.choices?.[0]?.message?.content || '';
}

Environment Configuration

LLM_API_URL=https://api.axioniclabs.ai/v1
LLM_API_KEY=ax_your_api_key_here
LLM_MODEL=your-model-name
Use this hosted configuration during development, or when you want Spectra’s steering vectors and SAE monitoring applied at inference time.
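For local inference, point the same code at Ollama instead. The empty API key means llmComplete skips the Authorization header, and the model name matches whatever you passed to ollama create:

LLM_API_URL=http://localhost:11434/v1
LLM_API_KEY=
LLM_MODEL=my-model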

Parse the Response

Small models occasionally wrap JSON in markdown fences:
function parseJsonResponse(raw: string): Record<string, unknown> {
  // Greedily grab everything between the first "{" and the last "}" so
  // markdown fences or stray prose around the JSON are ignored.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON found in response');
  return JSON.parse(match[0]);
}

const result = parseJsonResponse(await llmComplete(prompt, systemPrompt));

Step 8: Production Checklist

  • Test all three tools with representative inputs from your actual dataset
  • Verify GPU memory with ollama ps while the model is loaded
  • Set up fallback if using the hosted API (local Ollama for resilience; see the sketch after this list)
  • Monitor output quality for the first week
  • Remove the previous model (ollama rm old-model)
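A minimal fallback sketch, trying the hosted endpoint first and falling back to local Ollama (endpoint order and model names are assumptions to adapt):

import os

from openai import OpenAI

# Ordered list of (client, model) pairs: hosted Spectra first, local
# Ollama second. Adjust to match your deployment.
ENDPOINTS = [
    (OpenAI(api_key=os.environ.get('LLM_API_KEY', ''),
            base_url='https://api.axioniclabs.ai/v1'),
     os.environ.get('LLM_MODEL', 'your-model-name')),
    (OpenAI(api_key='ollama', base_url='http://localhost:11434/v1'),
     'my-model'),
]

def complete(prompt: str) -> str:
    for client, model in ENDPOINTS:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{'role': 'user', 'content': prompt}],
                temperature=0.3,
                max_tokens=500,
            )
            return resp.choices[0].message.content or ''
        except Exception:
            continue  # endpoint unreachable or errored; try the next one
    raise RuntimeError('All LLM endpoints failed')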

Results

| Metric | Before (qwen3:1.7b) | After (fine-tuned 0.5B) |
| --- | --- | --- |
| VRAM usage | ~1.3GB | ~525MB |
| Inference speed | 1x | ~3x faster |
| Structured output | Regex JSON extraction with fallback | Consistent JSON |
| Capabilities | Cluster naming only | Cluster naming + classification + query expansion |