Text generation API reference

`generation.generate()`

Generates text from a prompt. Runs remotely by default; falls back to a locally loaded model if one is available and the remote call fails.

Core Parameters

prompt

string

required

The input text to generate a continuation for.

max_tokens

integer

default: 128

Maximum number of tokens to generate.

sampling_method

string

default: "top-k"

Token sampling strategy. See Sampling Methods below.

temperature

float

default: 0.7

Controls randomness. Lower values (0.1-0.5) produce more focused, predictable output; higher values (0.8-1.2) produce more varied output.
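The effect of `temperature` can be illustrated with a minimal sketch, assuming the usual convention of dividing logits by the temperature before the softmax. The reference does not state how the service applies it internally, and `softmax_with_temperature` is a hypothetical helper, not part of the SDK.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.3)  # sharper distribution
warm = softmax_with_temperature(logits, 1.2)  # flatter distribution

print([round(p, 3) for p in cool])
print([round(p, 3) for p in warm])
```

At the lower temperature the top token takes a larger share of the probability mass, which is why low values read as more focused.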

Sampling Parameters

top_k

integer

default: 50

For top-k sampling: number of top tokens to sample from.

top_p

float

default: 0.9

For top-p sampling: cumulative probability threshold (0.0-1.0).

min_p

float

For min-p sampling: minimum relative probability threshold.

typical_p

float

For typical sampling: typical probability threshold.

ads_subset_size

integer

For ads sampling: number of candidate tokens per step (2-10).

ads_beta

float

For ads sampling: quality vs diversity balance (0.1-0.5).

Steering Parameters

steering_vector

string | dict

A steering vector ID string (remote) or a {layer_index: tensor} dict (local).

steering_strength

float

default: 0

Multiplier for the steering vector magnitude.

steering_preset

string

A named preset (e.g., "brevity", "truthfulness") that applies a pre-configured steering setup.
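As a rough picture of what `steering_strength` multiplies, here is a minimal sketch that assumes steering works by adding the scaled vector to a layer's hidden state (activation addition). The reference only says the strength is a multiplier on the vector's magnitude, so the addition step and the `apply_steering` helper are assumptions made for illustration.

```python
def apply_steering(hidden_state, steering_vector, steering_strength=0.0):
    """Return hidden_state + steering_strength * steering_vector, element-wise."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same length")
    return [h + steering_strength * v for h, v in zip(hidden_state, steering_vector)]

hidden = [0.5, -1.0, 2.0]   # toy hidden state at one layer
vector = [1.0, 0.0, -1.0]   # toy steering direction

unchanged = apply_steering(hidden, vector)      # default strength 0 has no effect
steered = apply_steering(hidden, vector, 1.5)   # pushes the state along the vector

print(unchanged)
print(steered)
```

Under this assumption the default strength of 0 leaves generation unsteered even when a vector is supplied.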

Constrained Generation

json_schema

dict

JSON schema to constrain output format. Used with guided-generation sampling.

regex_pattern

string

Regex pattern to constrain output. Used with guided-generation sampling.

grammar

string

Grammar specification to constrain output. Used with guided-generation sampling.
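A constrained call might be assembled as in the sketch below. The schema, the `guided_kwargs` and `parse_record` helpers, and the sample response are all hypothetical; only the parameter names and the plain-string return value come from this reference. The SDK call itself is left as a comment because it needs the `mechanex` package, an API key, and the remote service.

```python
import json

# Hypothetical schema for illustration; any JSON Schema dict could be used.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def guided_kwargs(schema, max_tokens=128):
    """Collect the keyword arguments for a schema-constrained request."""
    return {
        "sampling_method": "guided-generation",
        "json_schema": schema,
        "max_tokens": max_tokens,
    }

def parse_record(output, schema):
    """Parse the returned string and check the schema's required keys are present."""
    record = json.loads(output)
    missing = [key for key in schema.get("required", []) if key not in record]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return record

kwargs = guided_kwargs(person_schema)

# With the SDK installed and a key set, the call would look like:
#   import mechanex as mx
#   output = mx.generation.generate("Describe a fictional person as JSON.", **kwargs)
sample_output = '{"name": "Ada", "age": 36}'  # stand-in for a real response
record = parse_record(sample_output, person_schema)
print(record["name"], record["age"])
```

Because `generate()` returns a plain string, the caller still has to parse it with `json.loads` before using the fields.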

Advanced Parameters

draft_model

string

Name of the draft model for speculative decoding (the `ssd` sampling method).

ensemble_models

list[str]

List of model names for ensemble-sampling.

policy

dict

Inline policy configuration. See Policies.

policy_id

string

ID of a saved policy to apply during generation.

include_trace

boolean

default: false

If true, returns trace information for debugging.

Returns: A plain string containing the generated text.

import mechanex as mx

mx.set_key("ax_your_key_here")
output = mx.generation.generate(
    "Summarize the concept of contrastive activation addition in two sentences.",
    max_tokens=128,
    sampling_method="top-p",
    top_p=0.9,
    temperature=0.7,
)
print(output)

Sampling Methods

| Method | Description | Key Parameters |
| --- | --- | --- |
| `greedy` | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | none |
| `top-k` | Samples from the top K tokens by probability. Default K is 50. | `top_k` |
| `top-p` | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural. | `top_p` |
| `min-p` | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence. | `min_p` |
| `typical` | Selects tokens based on local entropy, favoring "typical" continuations. | `typical_p` |
| `ads` | Adaptive Determinantal Sampling; maximizes diversity. Remote-only. | `ads_subset_size`, `ads_beta` |
| `guided-generation` | Constrains output to match a JSON schema, regex, or grammar. | `json_schema`, `regex_pattern`, `grammar` |
| `constrained-beam-search` | Generates multiple candidates and selects the best. | none |
| `ssd` / `speculative-decoding` | Uses a smaller draft model for faster generation. | `draft_model` |
| `ensemble-sampling` | Combines outputs from multiple models via voting. | `ensemble_models` |
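To make the differences between `greedy`, `top-k`, `top-p`, and `min-p` concrete, the following sketch applies each filter to a toy next-token distribution. It follows the textbook definitions and is not the library's implementation; all function names and the toy probabilities are made up for illustration.

```python
import random

def renormalize(probs):
    """Rescale a token -> probability dict so the values sum to 1."""
    total = sum(probs.values())
    return {token: q / total for token, q in probs.items()}

def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    keep = sorted(probs, key=probs.get, reverse=True)[:k]
    return renormalize({token: probs[token] for token in keep})

def top_p_filter(probs, p):
    """Keep the smallest high-probability prefix whose cumulative mass reaches p."""
    kept, cumulative = {}, 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept[token] = probs[token]
        cumulative += probs[token]
        if cumulative >= p:
            break
    return renormalize(kept)

def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs.values())
    return renormalize({t: q for t, q in probs.items() if q >= threshold})

def sample(probs, rng):
    """Draw one token according to the (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}  # toy distribution

greedy_token = max(probs, key=probs.get)   # greedy: always the most probable token
by_top_k = top_k_filter(probs, k=2)        # keeps "the", "a"
by_top_p = top_p_filter(probs, p=0.9)      # keeps "the", "a", "cat"
by_min_p = min_p_filter(probs, min_p=0.4)  # threshold 0.4 * 0.5 = 0.2

rng = random.Random(0)
print(greedy_token, sample(by_top_p, rng))
```

Note how `min-p` scales its cutoff with the top token's probability, so a confident distribution prunes more aggressively than a flat one.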

Applying a Steering Vector

Pass a vector ID (returned from steering.generate_vectors() or created in Spectra) or a local {layer_index: tensor} dict:

output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector="sv_abc123",
    steering_strength=1.5,
)

For local generation with a custom vector:

import torch

my_vector = {11: torch.randn(1, 768)}  # layer 11, hidden dim 768
output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector=my_vector,
    steering_strength=0.8,
)

Local vs Remote

When a local model is loaded via mx.load(), generation runs locally if the remote API call fails. ADS, guided-generation, constrained-beam-search, speculative-decoding, and ensemble-sampling are remote-only.
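The fallback rule described here can be written out as a small decision function. This is a sketch of the documented behavior, not SDK code: `choose_backend` is a hypothetical helper, and raising an error when a remote-only method cannot fall back is an assumption, since the reference does not say what the SDK does in that case.

```python
# Sampling methods this reference lists as remote-only (both ssd aliases included).
REMOTE_ONLY = {
    "ads",
    "guided-generation",
    "constrained-beam-search",
    "ssd",
    "speculative-decoding",
    "ensemble-sampling",
}

def choose_backend(sampling_method, remote_ok, local_model_loaded):
    """Return "remote" or "local" following the documented fallback rule."""
    if remote_ok:
        return "remote"  # remote is the default whenever the call succeeds
    if not local_model_loaded:
        raise RuntimeError("remote call failed and no local model is loaded")
    if sampling_method in REMOTE_ONLY:
        # Assumed behavior: remote-only methods cannot run on a local model.
        raise RuntimeError(f"{sampling_method!r} is remote-only; no local fallback")
    return "local"

print(choose_backend("top-p", remote_ok=True, local_model_loaded=True))
print(choose_backend("top-p", remote_ok=False, local_model_loaded=True))
```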
