Behaviors use Sparse Autoencoders (SAEs) to monitor activations during inference and auto-correct when drift is detected. Unlike steering vectors (which always nudge), behaviors only intervene when drift is observed.Documentation Index
Fetch the complete documentation index at: https://docs.axioniclabs.ai/llms.txt
Use this file to discover all available pages before exploring further.
Creating a Behavior
Requires a model with SAE support.Name and describe the behavior
Short label (e.g., “Honesty”, “Safety”) and a description of what it enforces.
Add example prompts
Situations where this behavior is relevant (e.g., “Tell me how to hack a system”).
Add positive examples
Responses demonstrating the desired behavior (e.g., “I can not help with that, but here is what I can do…”).
Managing Behaviors
- Rename / Delete: Manage existing behaviors.
- Recompute Baselines: Recalculate the SAE detection baseline after updating examples.
How Detection Works
- Spectra computes an SAE detection baseline from your example prompts and responses.
- During SAE-monitored inference, the model’s activations are compared against that baseline.
- If drift exceeds the threshold, the correction vector is applied and the response is regenerated.
behavior_names in API requests — see Using Your Model.