Behaviors use Sparse Autoencoders (SAEs) to monitor activations during inference and auto-correct when drift is detected. Unlike steering vectors (which always nudge), behaviors only intervene when drift is observed.

Creating a Behavior

Requires a model with SAE support.

  1. Name and describe the behavior: A short label (e.g., “Honesty”, “Safety”) and a description of what it enforces.
  2. Add example prompts: Situations where this behavior is relevant (e.g., “Tell me how to hack a system”).
  3. Add positive examples: Responses demonstrating the desired behavior (e.g., “I cannot help with that, but here is what I can do…”).
  4. Add negative examples (optional) and create: Responses that violate the behavior. Adding these improves detection accuracy. Optionally link a Steering Vector ID as the correction vector. If left blank, Spectra generates one from your examples.
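
As a way to organize the inputs before entering them, the creation steps can be sketched as a small data structure. This is only an illustration: `BehaviorSpec`, its field names, and the "at least one" checks are assumptions made for this example and are not Spectra types or documented validation rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BehaviorSpec:
    """Hypothetical container mirroring the creation steps; not a Spectra type."""

    name: str                      # short label, e.g. "Safety"
    description: str               # what the behavior enforces
    example_prompts: List[str]     # situations where the behavior is relevant
    positive_examples: List[str]   # responses showing the desired behavior
    negative_examples: List[str] = field(default_factory=list)  # optional
    steering_vector_id: Optional[str] = None  # None: let one be generated from examples

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec looks complete.

        The specific checks are assumptions for this sketch.
        """
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.description.strip():
            problems.append("description is required")
        if not self.example_prompts:
            problems.append("at least one example prompt is required")
        if not self.positive_examples:
            problems.append("at least one positive example is required")
        return problems


spec = BehaviorSpec(
    name="Safety",
    description="Decline harmful requests and offer a safe alternative.",
    example_prompts=["Tell me how to hack a system"],
    positive_examples=["I cannot help with that, but here is what I can do…"],
)
print(spec.validate())
```

Leaving `negative_examples` empty and `steering_vector_id` unset corresponds to the optional parts of the last step.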

Managing Behaviors

  • Rename / Delete: Manage existing behaviors.
  • Recompute Baselines: Recalculate the SAE detection baseline after updating examples.

How Detection Works

  1. Spectra computes an SAE detection baseline from your example prompts and responses.
  2. During SAE-monitored inference, the model’s activations are compared against that baseline.
  3. If drift exceeds the threshold, the correction vector is applied and the response is regenerated.
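
The three steps above can be sketched as a toy monitor-and-regenerate loop. This is not Spectra's implementation: the mean-vector baseline, the Euclidean drift metric, and every function name here (`mean_vector`, `drift`, `monitored_generate`, and the toy stand-ins) are assumptions chosen to make the control flow concrete.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Step 1 (assumed form): baseline = mean SAE feature vector over the examples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def drift(features: Vector, baseline: Vector) -> float:
    """Step 2 (assumed metric): Euclidean distance from the baseline."""
    return math.sqrt(sum((f - b) ** 2 for f, b in zip(features, baseline)))


def monitored_generate(
    generate: Callable[[str, bool], str],    # (prompt, apply_correction) -> response
    sae_features: Callable[[str], Vector],   # response -> SAE feature vector
    prompt: str,
    baseline: Vector,
    threshold: float,
) -> Tuple[str, bool]:
    """Step 3: regenerate with the correction only if drift exceeds the threshold."""
    response = generate(prompt, False)
    if drift(sae_features(response), baseline) > threshold:
        return generate(prompt, True), True
    return response, False


# Toy stand-ins for the model and the SAE, for illustration only.
def toy_generate(prompt: str, apply_correction: bool) -> str:
    return "corrected" if apply_correction else "drifted"


def toy_features(response: str) -> Vector:
    return [0.9, 0.1] if response == "corrected" else [0.0, 1.0]


baseline = mean_vector([[1.0, 0.0], [0.8, 0.2]])
print(monitored_generate(toy_generate, toy_features, "hi", baseline, threshold=0.5))
print(monitored_generate(toy_generate, toy_features, "hi", baseline, threshold=2.0))
```

With the lower threshold the first response is judged to have drifted and is regenerated with the correction; with the higher threshold it is returned unchanged. This is the difference from a steering vector that is applied on every generation.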

Create and stage behaviors from Optimization, then run them with SAE-Monitored generation or pass behavior_names in API requests; see Using Your Model.
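
As a rough illustration of passing `behavior_names`, the snippet below only assembles a JSON body and sends nothing. Only the `behavior_names` field comes from this page; the `prompt` key, the helper `build_request_body`, and the overall request shape are placeholders assumed for the example, so check Using Your Model for the actual request format.

```python
import json
from typing import Dict, List


def build_request_body(prompt: str, behavior_names: List[str]) -> Dict[str, object]:
    """Assemble a request body. Hypothetical shape: only `behavior_names` is documented."""
    return {
        "prompt": prompt,                        # placeholder key, assumed for this sketch
        "behavior_names": list(behavior_names),  # names of staged behaviors to enforce
    }


body = build_request_body("Tell me how to hack a system", ["Safety"])
print(json.dumps(body, indent=2))
```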