Run inference on your fine-tuned model via API or SDK

Prerequisites

A trained model with Ready status on the Models page
An Axionic API key (a default key is created with your account; find it under Settings > API Keys)

Quick Test with curl

curl -X POST https://api.axioniclabs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: ax_your_key_here" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "Hello, what can you do?"}
    ],
    "max_tokens": 256
  }'
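
If you would rather run the same quick test from Python without installing anything, the request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_chat_request` and `send` are made up for illustration, and the only facts taken from the curl command are the URL, the two headers, and the JSON body.

```python
import json
import urllib.request

API_URL = "https://api.axioniclabs.ai/v1/chat/completions"


def build_chat_request(
    api_key: str, model: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request is purely local; nothing is sent until send() is called.
req = build_chat_request(
    "ax_your_key_here", "your-model-name", "Hello, what can you do?"
)
print(req.get_method(), req.full_url)
```

Calling `send(req)` with a real key performs the request. The shape of the returned JSON is not shown on this page, so inspect the decoded dictionary before relying on specific fields.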

Using the Python SDK

pip install mechanex

import mechanex as mx

mx.set_key("ax_your_key_here")
mx.set_model("your-model-name")

response = mx.generation.generate(
    prompt="Explain how tool calling works.",
    max_tokens=256,
)
print(response["output"])

Using the OpenAI Python Client

from openai import OpenAI

client = OpenAI(
    api_key="ax_your_key_here",
    base_url="https://api.axioniclabs.ai/v1",
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response.choices[0].message.content)

Structured Outputs

If you are coming from OpenAI’s Structured Outputs guide, the main difference is that Axionic does not currently expose OpenAI’s response_format field on /v1/chat/completions. Use one of these Axionic-native paths instead:

pass an inline policy or saved policy_id on /v1/chat/completions or /v1/completions when you want to stay inside an OpenAI SDK client
call /sampling/generate with method: "guided-generation" when you want direct JSON-schema, regex, or grammar constraints

Guided Generation is model/runtime-dependent. If the target runtime does not support that decoding strategy, Axionic can fall back to a standard decoding path. Test your schema on the exact model you plan to deploy.

OpenAI client with an inline policy

Pass the policy through extra_body:

import json
from openai import OpenAI

client = OpenAI(
    api_key="ax_your_key_here",
    base_url="https://api.axioniclabs.ai/v1",
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {
            "role": "system",
            "content": "Extract the message into a typed JSON object.",
        },
        {
            "role": "user",
            "content": "Relayer outage on Osmosis, confidence 0.82",
        },
    ],
    extra_body={
        "policy": {
            "sampling": {
                "method": "guided-generation",
                "temperature": 0.2,
                "top_p": 0.9,
            },
            "constraints": {
                "json_mode": True,
                "json_schema": {
                    "type": "object",
                    "properties": {
                        "topic": {"type": "string"},
                        "confidence": {"type": "number"},
                    },
                    "required": ["topic", "confidence"],
                    "additionalProperties": False,
                },
            },
            "verifiers": {
                "enabled": ["syntax", "json_schema"],
                "repair_on_failure": True,
            },
        }
    },
)

data = json.loads(response.choices[0].message.content)
print(data["topic"], data["confidence"])

The response still comes back as text in response.choices[0].message.content, so parse it yourself with json.loads(...). See Reusable inference policies for the full policy shape.
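
Because the content arrives as plain text, and because guided generation can fall back to standard decoding on some runtimes, it may be worth checking the parsed result before using it. The sketch below is one way to do that for the two-field schema in the example above. The `Extraction` dataclass and `parse_extraction` helper are hypothetical names introduced here, not part of any Axionic SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class Extraction:
    topic: str
    confidence: float


def parse_extraction(text: str) -> Extraction:
    """Parse model output and check it against the two-field example schema."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the root")
    # Mirrors additionalProperties: false in the example schema.
    extra = set(data) - {"topic", "confidence"}
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    topic = data.get("topic")
    confidence = data.get("confidence")
    if not isinstance(topic, str):
        raise ValueError("'topic' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    return Extraction(topic=topic, confidence=float(confidence))


# Stand-in for response.choices[0].message.content.
result = parse_extraction('{"topic": "relayer outage", "confidence": 0.82}')
print(result)
```

Raising `ValueError` keeps the failure path in one place, so the caller can decide whether to retry, log, or reject the response.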

Direct guided-generation request

This is the same constraint path used by Spectra’s Optimization page when you choose Guided Generation:

curl -X POST https://api.axioniclabs.ai/sampling/generate \
  -H "Content-Type: application/json" \
  -H "x-api-key: ax_your_key_here" \
  -d '{
    "prompt": "Extract the primary topic and confidence from: relayer outage on Osmosis, confidence 0.82",
    "method": "guided-generation",
    "max_new_tokens": 120,
    "params": {
      "temperature": 0.2
    },
    "json_schema": {
      "type": "object",
      "properties": {
        "topic": {"type": "string"},
        "confidence": {"type": "number"}
      },
      "required": ["topic", "confidence"],
      "additionalProperties": false
    }
  }'

For non-JSON constraints, replace json_schema with either regex_pattern or grammar. Use regex_pattern when you need hard matching. Use grammar only for lightweight format guidance; it is not documented as full OpenAI-style grammar-constrained decoding.
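
The swap from `json_schema` to `regex_pattern` can be sketched as a small payload builder. Several things here are assumptions: that `regex_pattern` takes the pattern as a plain string, that the server's regex dialect is close enough to Python's `re` for a local check to be meaningful, and the date pattern itself, which is a made-up example. The helper names are illustrative only.

```python
import json
import re

# Hypothetical pattern: an ISO-style date such as 2024-05-17.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def build_regex_payload(prompt: str, pattern: str, max_new_tokens: int = 32) -> dict:
    """Build a /sampling/generate body that uses regex_pattern instead of json_schema."""
    re.compile(pattern)  # raises re.error early if the pattern is malformed
    return {
        "prompt": prompt,
        "method": "guided-generation",
        "max_new_tokens": max_new_tokens,
        "params": {"temperature": 0.2},
        "regex_pattern": pattern,  # assumed to be passed as a plain string
    }


def matches(output: str, pattern: str) -> bool:
    """Local check that the whole output (ignoring outer whitespace) fits the pattern."""
    return re.fullmatch(pattern, output.strip()) is not None


payload = build_regex_payload(
    "When was the outage reported? Answer with a date only.", DATE_PATTERN
)
print(json.dumps(payload, indent=2))
```

Running `matches(...)` on the returned text gives a quick signal if the runtime fell back to unconstrained decoding.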

Schema guidance and current limitations

keep the root schema as a JSON object; the current runtime validation path expects object-shaped JSON responses
add additionalProperties: false when you want to reject extra keys instead of only validating required ones
unlike OpenAI’s response_format flow, optional fields are allowed here as long as you leave them out of required
OpenAI-style fields such as response_format on requests, plus message.parsed and message.refusal on responses, are not part of Axionic’s chat surface today
if you need acceptance flags, scores, or trace data instead of just the generated text, use the policy APIs rather than the OpenAI-compatible chat response

OpenAI’s Structured Outputs guide recommends object-rooted schemas, clear field descriptions, and explicit required lists. Those are good practices here as well, but the transport is different: use Axionic policy or /sampling/generate fields rather than OpenAI’s response_format.
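
The guidance above can be turned into a quick pre-flight check that runs before a schema is sent. This is a minimal sketch with a hypothetical helper name, `lint_schema`. It only inspects the top level of the schema and does not replace a full JSON Schema validator.

```python
def lint_schema(schema: dict) -> list[str]:
    """Return warnings for a schema that departs from the guidance above."""
    warnings = []
    if schema.get("type") != "object":
        warnings.append("root schema should have type 'object'")
    properties = schema.get("properties", {})
    required = schema.get("required", [])
    missing = [name for name in required if name not in properties]
    if missing:
        warnings.append(f"required names missing from properties: {missing}")
    if schema.get("additionalProperties") is not False:
        warnings.append("additionalProperties is not false; extra keys will be accepted")
    for name, spec in properties.items():
        if "description" not in spec:
            warnings.append(f"property '{name}' has no description")
    return warnings


schema = {
    "type": "object",
    "properties": {
        "topic": {"type": "string", "description": "Primary topic of the message"},
        "confidence": {"type": "number"},
    },
    "required": ["topic", "confidence"],
    "additionalProperties": False,
}
for warning in lint_schema(schema):
    print(warning)
```

Optional fields produce no warning here, since they are allowed as long as they are left out of `required`.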

Applying Steering Vectors

Pass steering parameters via extra_body:

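# Reuses the OpenAI client configured in "Using the OpenAI Python Client" above.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Describe a sunset."}],
    max_tokens=256,
    extra_body={
        "steering_vector_id": "sv_abc123",
        "steering_strength": 1.0,
    },
)
print(response.choices[0].message.content)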

Or with curl:

curl -X POST https://api.axioniclabs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: ax_your_key_here" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Describe a sunset."}],
    "max_tokens": 256,
    "steering_vector_id": "sv_abc123",
    "steering_strength": 1.0
  }'
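
To see what a steering vector actually changes, it can help to run the same prompt at several strengths and compare the outputs side by side. The sketch below assumes an already configured OpenAI client is passed in. The helper names are hypothetical, and the strength values in the usage note are arbitrary examples; this page does not state the valid range for `steering_strength`.

```python
def steering_extra_body(vector_id: str, strength: float) -> dict:
    """Build the extra_body dict using the two steering fields shown above."""
    return {"steering_vector_id": vector_id, "steering_strength": strength}


def compare_strengths(client, model, prompt, vector_id, strengths, max_tokens=256):
    """Run one prompt at each strength and return {strength: generated text}."""
    results = {}
    for strength in strengths:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            extra_body=steering_extra_body(vector_id, strength),
        )
        results[strength] = response.choices[0].message.content
    return results


print(steering_extra_body("sv_abc123", 0.5))
```

A call such as `compare_strengths(client, "your-model-name", "Describe a sunset.", "sv_abc123", [0.5, 1.0, 1.5])` then returns one output per strength for manual review.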
