Overview
mx.serve() starts a local FastAPI server that exposes OpenAI-compatible chat and completion endpoints. This makes it straightforward to use Axionic-hosted or locally loaded models with any tool or library that speaks the OpenAI API format, without changing your existing code.
serve()
The model name to serve. Uses the currently loaded local model if omitted, or the default remote model.
Host address to bind the server to.
Port to listen on.
If
true, uses vLLM as the inference backend instead of the default PyTorch / remote path.Behavior names to monitor and auto-correct for all requests handled by this server instance.
Endpoints
Once the server is running:| Endpoint | Method | Description |
|---|---|---|
/v1/chat/completions | POST | OpenAI chat format (messages array) |
/v1/completions | POST | OpenAI completion format (prompt string) |
Axionic Extensions
Pass Axionic-specific parameters viaextra_body on the OpenAI Python client, or include them directly in the JSON request body:
| Field | Type | Description |
|---|---|---|
steering_vector_id | string | ID of a steering vector to apply during generation |
steering_strength | float | Multiplier for the steering vector magnitude |
policy | object | Inline runtime policy to apply during generation |
policy_id | string | Saved policy ID to apply during generation |
behavior_names | list[str] | Behaviors to monitor; correction applied if drift detected |
force_steering | list[str] | Behaviors to steer toward unconditionally |
mx.serve(corrected_behaviors=[...]) applies those behavior monitors to every request handled by that server instance. The CLI form, mechanex serve, exposes --host, --port, and --use-vllm; use Python when you need process-wide corrected_behaviors.