# Local Server (OpenAI-Compatible)

> Launch a local OpenAI-compatible server with mx.serve() for drop-in inference, steering vector support, and SAE behavior monitoring on any model.

## Overview

`mx.serve()` starts a local FastAPI server that exposes OpenAI-compatible chat and completion endpoints. Any tool or library that speaks the OpenAI API format can then use Axionic-hosted or locally loaded models by pointing its base URL at this server, with no other changes to your existing code.

## `serve()`

<ParamField body="model" type="string">
  The name of the model to serve. If omitted, the server uses the currently loaded local model, or the default remote model.
</ParamField>

<ParamField body="host" type="string" default="0.0.0.0">
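  Host address to bind the server to. The default, `0.0.0.0`, listens on all network interfaces; pass `127.0.0.1` to accept connections from the local machine only.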
</ParamField>

<ParamField body="port" type="integer" default="8000">
  Port to listen on.
</ParamField>

<ParamField body="use_vllm" type="boolean" default="false">
  If `true`, uses vLLM as the inference backend instead of the default PyTorch / remote path.
</ParamField>

<ParamField body="corrected_behaviors" type="list[str]">
  Behavior names to monitor and auto-correct for all requests handled by this server instance.
</ParamField>

## Endpoints

Once the server is running:

| Endpoint               | Method | Description                                |
| ---------------------- | ------ | ------------------------------------------ |
| `/v1/chat/completions` | POST   | OpenAI chat format (`messages` array)      |
| `/v1/completions`      | POST   | OpenAI completion format (`prompt` string) |

Both accept standard OpenAI request bodies.
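
For clients that do not use the OpenAI library, the request can be built with the Python standard library alone. The following is a minimal sketch, not a reference client: it assumes the server is listening on the default port `8000` and that the `mechanex-mini` model name from the example on this page is available. The helper names `build_completion_request` and `send` are hypothetical, and no `Authorization` header is set, so add one if your deployment requires it.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from serve().
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(prompt, model="mechanex-mini", max_tokens=64):
    """Build (but do not send) a POST request for /v1/completions."""
    # Standard OpenAI completion fields: model, prompt, max_tokens.
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_completion_request("Gradient descent is")
# With a server running, the generated text would be read like this:
# print(send(request)["choices"][0]["text"])
```

The same pattern applies to `/v1/chat/completions`: swap the path and replace `prompt` with a `messages` array.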

## Axionic Extensions

Pass Axionic-specific parameters via `extra_body` on the OpenAI Python client, or include them directly in the JSON request body:

| Field                | Type       | Description                                                |
| -------------------- | ---------- | ---------------------------------------------------------- |
| `steering_vector_id` | string     | ID of a steering vector to apply during generation         |
| `steering_strength`  | float      | Multiplier for the steering vector magnitude               |
| `policy`             | object     | Inline runtime policy to apply during generation           |
| `policy_id`          | string     | Saved policy ID to apply during generation                 |
| `behavior_names`     | list\[str] | Behaviors to monitor; correction applied if drift detected |
| `force_steering`     | list\[str] | Behaviors to steer toward unconditionally                  |

`mx.serve(corrected_behaviors=[...])` applies those behavior monitors to every request handled by that server instance. The CLI form, `mechanex serve`, exposes `--host`, `--port`, and `--use-vllm`; use Python when you need process-wide `corrected_behaviors`.
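
When the extension fields from the table above are sent directly in the JSON body rather than through `extra_body`, they sit at the top level next to the standard OpenAI fields. The sketch below illustrates that merge. The helper `with_axionic_extensions` is hypothetical and not part of `mechanex`; it only guards against misspelled field names, and the server's own validation rules are not assumed here.

```python
import json

# Extension field names copied from the table above.
AXIONIC_FIELDS = {
    "steering_vector_id",
    "steering_strength",
    "policy",
    "policy_id",
    "behavior_names",
    "force_steering",
}


def with_axionic_extensions(body, **extensions):
    """Return a copy of an OpenAI-style body with extension fields added."""
    unknown = set(extensions) - AXIONIC_FIELDS
    if unknown:
        raise ValueError(f"Unknown extension fields: {sorted(unknown)}")
    # Merge into a new dict so the original body is left untouched.
    return {**body, **extensions}


chat_body = {
    "model": "mechanex-mini",
    "messages": [{"role": "user", "content": "Explain gradient descent."}],
}
payload = with_axionic_extensions(
    chat_body,
    steering_vector_id="sv_abc123",  # placeholder ID, as in the example on this page
    steering_strength=1.2,
)
print(json.dumps(payload, indent=2))
```

The resulting `payload` can be posted as-is to `/v1/chat/completions` with any HTTP client.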

## Example

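```python
import socket
import threading
import time

import mechanex as mx
from openai import OpenAI

mx.set_key("ax_your_key_here")

# Start the server in a background thread
server_thread = threading.Thread(
    target=mx.serve,
    kwargs={"port": 8000, "corrected_behaviors": ["safety"]},
    daemon=True,
)
server_thread.start()

# Wait until the server accepts connections before sending requests;
# otherwise the first request can race the server startup.
for _ in range(60):
    try:
        socket.create_connection(("localhost", 8000), timeout=1).close()
        break
    except OSError:
        time.sleep(0.5)
else:
    raise RuntimeError("Server did not start listening on port 8000 in time")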

# Use the standard OpenAI client pointed at the local server
client = OpenAI(
    api_key="ax_your_key_here",
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="mechanex-mini",
    messages=[{"role": "user", "content": "Explain gradient descent."}],
    extra_body={
        "steering_vector_id": "sv_abc123",
        "steering_strength": 1.2,
    },
)
print(response.choices[0].message.content)
```
