
# Sample Project: Chat Indexer Agent

> Step-by-step guide to building, training, and deploying a fine-tuned 0.5B model for structured data extraction using Spectra distillation.

This project replaces a general-purpose 1.7B model with a fine-tuned 0.5B model for structured data extraction. The resulting model runs on a 2GB GPU and handles three tasks: naming semantic clusters, classifying chat messages into topics, and expanding search queries with domain synonyms.

## Prerequisites

* A Spectra account with a configured teacher model API key
* [Ollama](https://ollama.com) installed on your target machine
* Python 3.10+ with `transformers` and `gguf` packages (for local conversion)
* An application that currently calls an LLM for structured output

## Step 1: Audit Your Existing LLM Usage

Before training, identify exactly what your application asks the LLM to do. In this project, the application had three workflows that were candidates for an LLM:

| Workflow               | Input                                     | Output                                                              | Previous Model          |
| ---------------------- | ----------------------------------------- | ------------------------------------------------------------------- | ----------------------- |
| Cluster naming         | 30 sample messages from a k-means cluster | JSON with topic name, description, and keywords                     | qwen3:1.7b (1.3GB VRAM) |
| Message classification | A single message + list of valid topics   | JSON with primary topic, optional secondary topic, confidence score | Keyword regex (no LLM)  |
| Search query expansion | A user's search query                     | Expanded query with domain synonyms                                 | None (raw query used)   |

Only the first workflow used an LLM. The other two were opportunities to add LLM-powered features that the 1.7B model's VRAM footprint had previously ruled out.

## Step 2: Design Tool Schemas

Each distinct LLM task becomes a tool. If your application already prompts for JSON with a specific schema, use that as the starting point.

<Tabs>
  <Tab title="Tool 1: name_topic_cluster">
    ```json theme={null}
    {
      "name": "name_topic_cluster",
      "description": "Analyze semantically similar chat messages and produce a concise topic label, description, and discriminative keywords for pattern matching.",
      "parameters": {
        "type": "object",
        "properties": {
          "topic_name": {
            "type": "string",
            "description": "Short topic name, 2-4 words, specific to the content"
          },
          "topic_description": {
            "type": "string",
            "description": "One sentence describing what this topic covers"
          },
          "keywords": {
            "type": "array",
            "items": { "type": "string" },
            "description": "10-15 discriminative keywords for regex matching, must be specific domain terms"
          }
        },
        "required": ["topic_name", "topic_description", "keywords"]
      }
    }
    ```
  </Tab>

  <Tab title="Tool 2: classify_message">
    ```json theme={null}
    {
      "name": "classify_message",
      "description": "Classify a single chat message into the most relevant topic from a provided topic list.",
      "parameters": {
        "type": "object",
        "properties": {
          "primary_topic": {
            "type": "string",
            "description": "Best matching topic name from the provided list"
          },
          "secondary_topic": {
            "type": "string",
            "description": "Second best topic if the message spans two topics"
          },
          "confidence": {
            "type": "number",
            "description": "Classification confidence between 0 and 1"
          }
        },
        "required": ["primary_topic", "confidence"]
      }
    }
    ```
  </Tab>

  <Tab title="Tool 3: expand_search_query">
    ```json theme={null}
    {
      "name": "expand_search_query",
      "description": "Expand a user search query with domain-specific synonyms and jargon to improve semantic search recall.",
      "parameters": {
        "type": "object",
        "properties": {
          "expanded_query": {
            "type": "string",
            "description": "The original query rewritten with added synonyms and related terms"
          },
          "topic_hint": {
            "type": "string",
            "description": "Most likely topic category to filter by, if applicable"
          }
        },
        "required": ["expanded_query"]
      }
    }
    ```
  </Tab>
</Tabs>

Upload all three as a JSON array via the **Paste JSON** tab. See [Tool Schemas](/products/spectra/tool-schemas) for format details.

<Note>
  Avoid naming parameters `"name"` or `"description"`; these collide with tool-level fields in some parsers. Use prefixed names such as `topic_name` and `topic_description` instead.
</Note>
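Before pasting, it can help to assemble the three schemas into one array and check them locally. The sketch below is a local sanity check, not Spectra's own validator: it flags parameters named `name` or `description` (per the Note above) and `required` entries that have no matching property. The function names and the one-file-per-tool layout are hypothetical.

```python
import json
from pathlib import Path

# Parameter names to avoid: they collide with tool-level fields in some parsers.
RESERVED_PARAM_NAMES = {"name", "description"}


def check_tool(tool: dict) -> list[str]:
    """Return the problems found in one tool schema (an empty list means it looks fine)."""
    problems = []
    tool_name = tool.get("name", "<unnamed>")
    params = tool.get("parameters", {})
    props = params.get("properties", {})
    for prop in props:
        if prop in RESERVED_PARAM_NAMES:
            problems.append(f"{tool_name}: parameter '{prop}' collides with a tool-level field")
    for req in params.get("required", []):
        if req not in props:
            problems.append(f"{tool_name}: required field '{req}' is not defined in properties")
    return problems


def build_tool_array(tools: list[dict]) -> str:
    """Validate every tool and return the JSON array text to paste into Spectra."""
    problems = [p for tool in tools for p in check_tool(tool)]
    if problems:
        raise ValueError("\n".join(problems))
    return json.dumps(tools, indent=2)


def load_and_build(paths: list[str]) -> str:
    """Read one tool schema per JSON file and return the validated array."""
    return build_tool_array([json.loads(Path(p).read_text()) for p in paths])
```

For example, with each tab saved to its own file (hypothetical filenames): `Path("tools.json").write_text(load_and_build(["name_topic_cluster.json", "classify_message.json", "expand_search_query.json"]))`.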

## Step 3: Configure Training

### Model Selection

| Setting       | Value                                 | Rationale                                               |
| ------------- | ------------------------------------- | ------------------------------------------------------- |
| Teacher Model | Default (optimized for training)      | Uses the built-in Axionic teacher path                  |
| Student Model | Mechanex Mini (Qwen2.5-0.5B-Instruct) | Smallest available; fits on 2GB GPUs with room to spare |

### Training Parameters

| Parameter     | Recommended | Default  | Why                                                                                |
| ------------- | ----------- | -------- | ---------------------------------------------------------------------------------- |
| Training Mode | SFT Only    | SFT Only | Structured extraction doesn't benefit from RL                                      |
| Prompts       | 30          | 20       | 0.5B models need more examples to learn structured output reliably                 |
| Trajectories  | 5           | 5        | Default is adequate                                                                |
| Epochs        | 5           | 3        | Narrow task with small model; extra epochs improve convergence without overfitting |
| Learning Rate | 2e-5        | 1e-5     | Slightly higher LR avoids underfitting within the epoch budget                     |

### Training Objectives

Keep the three default objectives (the first three lines below) and add domain-specific ones:

```
Always include required parameters in tool calls
Never hallucinate field names or API endpoints
Validate input types before making calls
Topic names must be 2-4 words and domain-specific, never generic labels like "General" or "Other"
Keywords must be discriminative terms that distinguish one topic from another, never stop words or common verbs
Confidence scores must reflect actual certainty; use 0.4-0.6 for genuinely ambiguous messages
When no topic fits well, output the closest match with low confidence rather than inventing a new topic
Expanded search queries must include domain-specific synonyms
```

Click **Start Training**. Training typically completes in 5-15 minutes for a 0.5B student model.

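## Step 4: Add a Vector in Optimization

A steering vector built from contrastive examples further pushes the fine-tuned model toward JSON tool-call output instead of prose.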

### Prepare Contrastive Data

Create a JSONL file in which each line has a `prompt`, a `positive` (desired output), and a `negative` (unwanted output):

```jsonl theme={null}
{"prompt": "Classify this message: \"anyone know when the next validator set rotation happens?\"", "positive": "{\"primary_topic\":\"Staking\",\"confidence\":0.85}", "negative": "I think this message is probably about staking, since it mentions validators and delegation."}
{"prompt": "Classify this message: \"yeah idk might rain tomorrow\"", "positive": "{\"primary_topic\":\"Community\",\"confidence\":0.3}", "negative": "{\"primary_topic\":\"Community\",\"confidence\":0.95}"}
{"prompt": "Name this cluster:\n1. \"my node keeps missing blocks\"\n2. \"how do I unjail my validator\"", "positive": "{\"topic_name\":\"Validator Operations\",\"topic_description\":\"Running and troubleshooting validator nodes.\",\"keywords\":[\"validator\",\"node\",\"unjail\",\"blocks\",\"slashing\"]}", "negative": "This cluster seems to be about validators. The messages discuss various operational issues that node operators face."}
{"prompt": "Expand this search query: \"how do I unstake my atoms\"", "positive": "{\"expanded_query\":\"unstake unbond undelegate validator delegation 21 days rewards ATOM\",\"topic_hint\":\"Staking\"}", "negative": "To expand your search query about unstaking, I would suggest including related terms such as unbonding, undelegating, and validator delegation periods."}
```

Aim for 10-15 pairs that cover all three tools, mixing clear-cut examples with ambiguous edge cases.
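Before uploading, a quick local check catches malformed lines. A minimal Python sketch, assuming the file layout shown above: it confirms each line is valid JSON with string `prompt`, `positive`, and `negative` fields, that each `positive` is itself a JSON object (true for every example in this project), and that the pair count falls in the suggested range. The function name is hypothetical.

```python
import json

REQUIRED_KEYS = ("prompt", "positive", "negative")


def validate_pairs(lines: list[str]) -> list[str]:
    """Check contrastive pairs and return a list of problems (empty if the file looks usable)."""
    problems = []
    pairs = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        missing = [k for k in REQUIRED_KEYS if not isinstance(record.get(k), str)]
        if missing:
            problems.append(f"line {lineno}: missing or non-string fields {missing}")
            continue
        # In this project the positive side is a tool-call payload, so it should parse as an object.
        try:
            positive_ok = isinstance(json.loads(record["positive"]), dict)
        except json.JSONDecodeError:
            positive_ok = False
        if not positive_ok:
            problems.append(f"line {lineno}: 'positive' is not a JSON object")
        pairs += 1
    if not 10 <= pairs <= 15:
        problems.append(f"found {pairs} pairs; this guide suggests 10-15")
    return problems
```

Usage, with a hypothetical filename: `print(validate_pairs(open("structured-output.jsonl").read().splitlines()))`.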

### Create the Vector

<Steps>
  <Step title="Open Vector Library">
    Open the [Optimization](/products/spectra/optimization) workspace for your model and click **+ Create Vector**.
  </Step>

  <Step title="Configure">
    | Field       | Value                                                                                                                                    |
    | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
    | Vector Name | Structured Tool Output                                                                                                                   |
    | Label       | structured-output                                                                                                                        |
    | Description | Forces JSON tool-call output instead of prose. Steers toward calibrated confidence scores, domain-specific keywords, and concise output. |
    | Category    | Compliance                                                                                                                               |
    | Methodology | CAA                                                                                                                                      |
  </Step>

  <Step title="Upload data">
    Select **Upload** as the dataset source and upload your JSONL file. Verify the pair count matches your expectations.
  </Step>

  <Step title="Set refinement parameters">
    Use defaults: **Refinement Steps** = 100, **Learning Rate** = 0.01.
  </Step>

  <Step title="Create and attach">
    Click **Create Vector**. Once the vector is generated, attach it in Optimization and set strength to **60%**. Increase it to 80% if prose still leaks through during testing.
  </Step>
</Steps>

## Step 5: Test via API

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI

  client = OpenAI(
      api_key="your-spectra-api-key",
      base_url="https://api.axioniclabs.ai/v1"
  )

  response = client.chat.completions.create(
      model="your-model-name",
      messages=[
          {
              "role": "system",
              "content": "You are a topic analysis agent. Always respond with ONLY valid JSON."
          },
          {
              "role": "user",
              "content": (
                  'Classify this message: '
                  '"my IBC transfer from osmosis has been stuck for 3 hours, '
                  'is the relayer down?"\n'
                  'Return: {"primary_topic":"...","confidence":0.0}'
              )
          }
      ],
      temperature=0.3,
      max_tokens=200
  )

  print(response.choices[0].message.content)
  ```

  ```bash curl theme={null}
  curl https://api.axioniclabs.ai/v1/chat/completions \
    -H "Authorization: Bearer your-spectra-api-key" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "your-model-name",
      "messages": [
        {"role": "system", "content": "You are a topic analysis agent. Always respond with ONLY valid JSON."},
        {"role": "user", "content": "Classify this message: \"my IBC transfer has been stuck for 3 hours\"\nReturn: {\"primary_topic\":\"...\",\"confidence\":0.0}"}
      ],
      "temperature": 0.3,
      "max_tokens": 200
    }'
  ```

  ```typescript Bun / Node.js theme={null}
  const response = await fetch('https://api.axioniclabs.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-spectra-api-key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'your-model-name',
      messages: [
        { role: 'system', content: 'You are a topic analysis agent. Always respond with ONLY valid JSON.' },
        { role: 'user', content: 'Classify: "my IBC transfer has been stuck for 3 hours"\nReturn: {"primary_topic":"...","confidence":0.0}' }
      ],
      temperature: 0.3,
      max_tokens: 200
    })
  });

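  if (!response.ok) throw new Error(`API error: ${response.status}`);

  const data = await response.json();
  const content = data.choices[0].message.content;
  // Small models may wrap JSON in prose or markdown fences; keep the outermost {...} span.
  const match = content.match(/\{[\s\S]*\}/);
  if (!match) throw new Error(`No JSON object in response: ${content}`);
  const parsed = JSON.parse(match[0]);
  console.log(parsed);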
  ```
</CodeGroup>
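Parsing the JSON does not by itself confirm that the output matches the `classify_message` schema. A minimal Python check, assuming you pass the same topic list you gave the model (the function name and topic names are hypothetical), rejects topics outside the list and confidence values outside 0-1, mirroring the training objectives from Step 3:

```python
import json
import re


def parse_classification(raw: str, valid_topics: set[str]) -> dict:
    """Extract and validate a classify_message result from raw model output."""
    # Same approach as the TypeScript example: take the outermost {...} span,
    # which also drops markdown fences or stray prose around the JSON.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    result = json.loads(match.group(0))  # JSONDecodeError is a subclass of ValueError

    topic = result.get("primary_topic")
    if topic not in valid_topics:
        raise ValueError(f"primary_topic {topic!r} is not in the provided topic list")
    secondary = result.get("secondary_topic")
    if secondary is not None and secondary not in valid_topics:
        raise ValueError(f"secondary_topic {secondary!r} is not in the provided topic list")
    confidence = result.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        raise ValueError(f"confidence {confidence!r} is not a number between 0 and 1")
    return result
```

With the Python client above, you would call `parse_classification(response.choices[0].message.content, topics)` and treat a `ValueError` as a failed classification (for example, retry or fall back to keyword matching).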

## Step 6: Deploy Locally with Ollama

Convert the model to GGUF for local inference with Ollama.

### Download from HuggingFace

```bash theme={null}
mkdir -p /tmp/my-model && cd /tmp/my-model
HF_TOKEN="your-hf-token"
REPO="your-username/your-model-name"
BASE="https://huggingface.co/${REPO}/resolve/main"

for f in config.json tokenizer.json tokenizer_config.json generation_config.json model.safetensors; do
  # -f makes curl fail on HTTP errors instead of saving an error page under the file's name
  curl -fsSL -H "Authorization: Bearer $HF_TOKEN" -o "$f" "${BASE}/${f}"
done
```

### Fix Tokenizer Compatibility (Qwen2)

Qwen2-based models may ship with `extra_special_tokens` as a list instead of a dict, which breaks conversion:

```python theme={null}
import json

with open('/tmp/my-model/tokenizer_config.json') as f:
    cfg = json.load(f)

if isinstance(cfg.get('extra_special_tokens'), list):
    tokens = cfg['extra_special_tokens']
    cfg['extra_special_tokens'] = {t: t for t in tokens}
    with open('/tmp/my-model/tokenizer_config.json', 'w') as f:
        json.dump(cfg, f, indent=2)
    print('Fixed extra_special_tokens format')
```

### Convert to GGUF

```bash theme={null}
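cd /tmp
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
# The converter imports PyTorch and transformers; this file lists the versions it expects
pip install -r llama.cpp/requirements/requirements-convert_hf_to_gguf.txt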

python llama.cpp/convert_hf_to_gguf.py /tmp/my-model \
  --outtype q8_0 \
  --outfile /tmp/my-model.q8_0.gguf
```

<Note>
  Spectra exports safetensors in BF16. GPUs without BF16 support (Pascal-generation and older) will crash at inference. GGUF conversion to F16 or Q8\_0 is required for these cards.
</Note>

### Quantization Options

| Format   | Size (0.5B model) | Quality       | When to use                     |
| -------- | ----------------- | ------------- | ------------------------------- |
| F16      | \~988MB           | Lossless      | When VRAM isn't a constraint    |
| Q8\_0    | \~525MB           | Near-lossless | Default choice for small models |
| Q4\_K\_M | \~300MB           | Good          | When VRAM is extremely tight    |

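For sub-1B models, Q8\_0 is the default choice: near-lossless at roughly half the size of F16. `convert_hf_to_gguf.py` can write F16 or Q8\_0 directly (`--outtype f16` or `--outtype q8_0`); for Q4\_K\_M, convert to F16 first and then quantize that file with llama.cpp's `llama-quantize` tool.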
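The sizes in the table follow roughly from parameter count times bits per weight. A back-of-the-envelope Python estimate, assuming about 494M parameters for Qwen2.5-0.5B and approximate average bits-per-weight figures (Q4\_K\_M in particular varies by model, because some tensors are kept at higher precision):

```python
# Approximate average bits per weight for each GGUF format (assumptions, not exact values).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # blocks of 32 int8 weights plus one fp16 scale: 34 bytes per 32 weights
    "Q4_K_M": 4.85,  # mixed k-quant; a commonly quoted average that varies by model
}

PARAMS = 494_000_000  # assumed parameter count for Qwen2.5-0.5B


def estimate_gguf_mb(num_params: int, fmt: str) -> float:
    """Rough file size in MB (10^6 bytes), ignoring metadata and per-tensor exceptions."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e6


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:<7} ~{estimate_gguf_mb(PARAMS, fmt):.0f}MB")
```

This prints about 988, 525, and 299 MB, in line with the table. Treat it as a lower bound on VRAM: the runtime also needs memory for the KV cache and activations.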

### Create an Ollama Model

Create a `Modelfile` that sets the Qwen2 chat template (required for chat-format inference and tool calling):

```
FROM /tmp/my-model.q8_0.gguf

TEMPLATE """{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}{{ end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}
<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}"""

PARAMETER temperature 0.3
PARAMETER num_predict 500
PARAMETER stop <|im_end|>

SYSTEM """You are a topic analysis agent. Always respond with ONLY valid JSON, no other text."""
```

```bash theme={null}
ollama create my-model -f Modelfile
```

<Note>
  If you skip the TEMPLATE directive and Ollama cannot detect a template from the GGUF metadata, it falls back to a minimal `{{ .Prompt }}` template that breaks chat-format inference and tool calling.
</Note>

### Verify

```bash theme={null}
curl -sS http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "system", "content": "Always respond with valid JSON only."},
      {"role": "user", "content": "Classify: \"validator is jailed after missing 500 blocks\"\nReturn: {\"primary_topic\":\"...\",\"confidence\":0.0}"}
    ],
    "temperature": 0.3
  }'
```
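A single curl call only shows one sample. To check reliability, you can send a batch of representative messages and measure how often the output contains parseable JSON. A minimal stdlib-only Python sketch, assuming Ollama is serving `my-model` on the default port as above; the helper names are hypothetical:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def ask_local(prompt: str, model: str = "my-model") -> str:
    """Send one chat request to Ollama's OpenAI-compatible endpoint and return the text."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "Always respond with valid JSON only."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.3,
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


def json_object_rate(outputs: list[str]) -> float:
    """Fraction of outputs whose outermost {...} span parses as JSON."""
    ok = 0
    for text in outputs:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            continue
        try:
            json.loads(match.group(0))
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs) if outputs else 0.0
```

Usage: build a list of prompts from your real data, then run `print(json_object_rate([ask_local(p) for p in prompts]))`. A rate noticeably below 1.0 suggests revisiting the contrastive data or the training objectives.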

## Step 7: Integrate into Your Application

Both the Spectra API and Ollama's `/v1` endpoint implement the OpenAI chat completions API, so you can write the client once and switch backends with environment variables.

```typescript theme={null}
const LLM_API_URL = process.env.LLM_API_URL || 'https://api.axioniclabs.ai/v1';
const LLM_API_KEY = process.env.LLM_API_KEY || '';
const LLM_MODEL = process.env.LLM_MODEL || 'my-model';

async function llmComplete(prompt: string, systemPrompt?: string): Promise<string> {
  const messages: { role: string; content: string }[] = [];
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
  messages.push({ role: 'user', content: prompt });

  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (LLM_API_KEY) headers['Authorization'] = `Bearer ${LLM_API_KEY}`;

  const response = await fetch(`${LLM_API_URL}/chat/completions`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: LLM_MODEL,
      messages,
      temperature: 0.3,
      max_tokens: 500
    })
  });

  if (!response.ok) throw new Error(`LLM API error: ${response.status}`);

  const data = await response.json();
  return data.choices?.[0]?.message?.content || '';
}
```

### Environment Configuration

<Tabs>
  <Tab title="Hosted (Spectra API)">
    ```bash theme={null}
    LLM_API_URL=https://api.axioniclabs.ai/v1
    LLM_API_KEY=ax_your_api_key_here
    LLM_MODEL=your-model-name
    ```

    Use during development or when you want Spectra's steering vectors and SAE monitoring applied at inference time.
  </Tab>

  <Tab title="Local (Ollama)">
    ```bash theme={null}
    LLM_API_URL=http://localhost:11434/v1
    LLM_API_KEY=
    LLM_MODEL=my-model
    ```

    Use in production, air-gapped environments, or when latency matters. No API key needed for local Ollama.
  </Tab>
</Tabs>

### Parse the Response

Small models occasionally wrap JSON in markdown fences or surrounding prose. Extract the outermost `{...}` span before parsing:

```typescript theme={null}
function parseJsonResponse(raw: string): Record<string, unknown> {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON found in response');
  return JSON.parse(match[0]);
}

const result = parseJsonResponse(await llmComplete(prompt, systemPrompt));
```

## Step 8: Production Checklist

* [ ] Test all three tools with representative inputs from your actual dataset
* [ ] Verify GPU memory with `ollama ps` while the model is loaded
* [ ] If you use the hosted API, configure local Ollama as a fallback for resilience
* [ ] Monitor output quality for the first week
* [ ] Remove the previous model (`ollama rm old-model`)
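
For the fallback item, a minimal Python sketch, assuming you already have one completion function per backend (for example, one client pointed at the Spectra API and one at local Ollama; the labels and functions here are hypothetical). It tries backends in order and returns the first successful result. Outputs from the two paths may differ slightly, since steering vectors and SAE monitoring are applied only by the hosted API.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (label, complete_fn) in order; return (label, output) from the first success."""
    errors = []
    for label, complete in backends:
        try:
            return label, complete(prompt)
        except Exception as exc:  # network errors, HTTP errors, timeouts, ...
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))
```

Returning the label alongside the output makes it easy to log how often the fallback path is used during the first week of monitoring.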

## Results

| Metric            | Before (qwen3:1.7b)                 | After (fine-tuned 0.5B)                           |
| ----------------- | ----------------------------------- | ------------------------------------------------- |
| VRAM usage        | \~1.3GB                             | \~525MB                                           |
| Inference speed   | 1x                                  | \~3x faster                                       |
| Structured output | Regex JSON extraction with fallback | Consistent JSON                                   |
| Capabilities      | Cluster naming only                 | Cluster naming + classification + query expansion |