

steering.generate_vectors()

Computes a steering vector from positive (and optionally negative) example pairs. Returns a vector ID that can be passed to generation.generate().
prompts (list[str], required): Seed text that precedes each answer (e.g., ["I tell the...", "My statement is..."]).
positive_answers (list[str], required): Completions that demonstrate the desired behavior (e.g., [" truth", " factual"]).
negative_answers (list[str]): Completions to contrast against for CAA (e.g., [" lie", " false"]). Required for method="caa"; ignored for method="few-shot".
layer_idxs (list[int]): Layer indices to capture activations from. Defaults to a model-appropriate selection if omitted.
method (string, default: "few-shot"): Vector computation method: "few-shot", "caa" (Contrastive Activation Addition), or "steering-perceptrons" (remote execution only).
name (string): A display name for the vector (used in the Spectra UI and API responses).
label (string): A label/category for organizing the vector.
Returns: A vector ID string.
Contrastive Activation Addition (CAA) computes the directional difference between positive and negative activations; it is the most precise method when you have both types of examples.
import mechanex as mx

vector_id = mx.steering.generate_vectors(
    prompts=["I tell the...", "My statement is..."],
    positive_answers=[" truth", " factual", " correct"],
    negative_answers=[" lie", " false", " wrong"],
    method="caa",
    name="Honesty",
)
print(vector_id)
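When you only have positive examples, the default "few-shot" method applies. A minimal sketch (reusing the prompts above; the display name is illustrative):

```python
import mechanex as mx

# Few-shot uses prompts and positive completions only;
# negative_answers is ignored for this method.
vector_id = mx.steering.generate_vectors(
    prompts=["I tell the...", "My statement is..."],
    positive_answers=[" truth", " factual"],
    method="few-shot",
    name="Honesty (few-shot)",
)
print(vector_id)
```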

steering.generate_pairs()

Generates contrastive example pairs automatically using an LLM, given a persona description. Useful for bootstrapping a dataset before computing vectors.
persona_name (string, required): Short name for the persona (e.g., "Empathetic Support Agent").
persona_description (string, required): Description of the desired behavioral traits.
num_pairs (integer, default: 10): Number of contrastive pairs to generate.
batch_size (integer, default: 5): Pairs generated per batch.
Returns: A dict with persona, total_pairs, pairs, and avg_final_score.
result = mx.steering.generate_pairs(
    persona_name="Honesty",
    persona_description="The model always provides truthful, accurate information.",
    num_pairs=20,
)
# Use the generated pairs to compute a vector
vector_id = mx.steering.generate_vectors(
    prompts=[p["prompt"] for p in result["pairs"]],
    positive_answers=[p["positive_answer"] for p in result["pairs"]],
    negative_answers=[p["negative_answer"] for p in result["pairs"]],
    method="caa",
)

steering.evaluate()

Evaluates a steering vector’s effectiveness using cosine similarity metrics and LLM-as-judge scoring.
steering_vector_id (string, required): The vector ID to evaluate.
positive_texts (list[str], required): Texts representing the desired behavior.
negative_texts (list[str], required): Texts representing the undesired behavior.
test_prompts (list[str]): Prompts to generate steered completions for judge evaluation.
strength (float, default: 1.0): Steering strength during evaluation.
Returns: A dict with cosine_metrics and judge_evaluation.
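A usage sketch, evaluating a previously computed vector (the example texts and prompts here are illustrative, not prescribed by the API):

```python
# vector_id was returned by an earlier generate_vectors() call.
report = mx.steering.evaluate(
    steering_vector_id=vector_id,
    positive_texts=["I always answer truthfully.", "That claim is accurate."],
    negative_texts=["I'll just make something up.", "That claim is fabricated."],
    test_prompts=["Tell me about the moon landing."],
    strength=1.0,
)
# The result contains the two documented sections.
print(report["cosine_metrics"])
print(report["judge_evaluation"])
```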

Utilities

Load examples from a JSONL file, or persist vectors to disk for reuse:
# Compute from a JSONL file (one JSON object with "prompt",
# "positive_answer", and "negative_answer" keys per line)
vector_id = mx.steering.generate_from_jsonl(dataset_path="examples.jsonl", method="caa")

# Save to disk and reload in a later session
mx.steering.save_vectors(vector_id, path="honesty.json")
local_vec = mx.steering.load_vectors("honesty.json")

# Retrieve a cached in-memory vector by ID
local_vec = mx.steering.get_vectors(vector_id)
Loaded vectors can be passed directly to generation.generate() as the steering_vector parameter.
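For example, a sketch of that handoff: steering_vector is the documented parameter name, but the other generation.generate() arguments shown (prompt, strength) are assumptions; see that function's own reference page for its actual signature.

```python
# Reload a saved vector and steer generation with it.
local_vec = mx.steering.load_vectors("honesty.json")

# `steering_vector` is documented; `prompt` and `strength`
# are illustrative assumptions.
output = mx.generation.generate(
    prompt="Tell me about your day.",
    steering_vector=local_vec,
    strength=1.0,
)
print(output)
```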