

generation.generate()

Generates text from a prompt. Runs remotely by default; falls back to a locally loaded model if one is available and the remote call fails.

Core Parameters

- `prompt` (string, required): The input text to generate a continuation for.
- `max_tokens` (integer, default: 128): Maximum number of tokens to generate.
- `sampling_method` (string, default: "top-k"): Token sampling strategy. See Sampling Methods below.
- `temperature` (float, default: 0.7): Controls randomness. Lower values (0.1-0.5) produce more focused output; higher values (0.8-1.2) increase creativity.
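To see why lower temperatures sharpen output, note that temperature divides the token logits before they are converted to probabilities. The sketch below is a plain-Python illustration of that arithmetic, not part of the mechanex API:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.3)  # low temperature: near-greedy
flat = softmax_with_temperature(logits, 1.2)   # high temperature: more uniform
# The top token's probability mass is much larger at low temperature.
print(sharp[0], flat[0])
```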

Sampling Parameters

- `top_k` (integer, default: 50): For top-k sampling: number of top tokens to sample from.
- `top_p` (float, default: 0.9): For top-p sampling: cumulative probability threshold (0.0-1.0).
- `min_p` (float): For min-p sampling: minimum relative probability threshold.
- `typical_p` (float): For typical sampling: typical probability threshold.
- `ads_subset_size` (integer): For ADS sampling: number of candidate tokens per step (2-10).
- `ads_beta` (float): For ADS sampling: quality vs. diversity balance (0.1-0.5).
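As an illustration of the min-p rule (a plain-Python sketch, not the library's internal implementation): a token is kept only if its probability is at least `min_p` times the top token's probability, so the candidate set shrinks when the model is confident and widens when it is not.

```python
def min_p_filter(probs, min_p):
    """Return indices of tokens with probability >= min_p * max probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Confident distribution: only the dominant token survives the filter.
confident = min_p_filter([0.90, 0.05, 0.03, 0.02], min_p=0.1)

# Flat distribution: the same min_p keeps several candidates.
uncertain = min_p_filter([0.30, 0.28, 0.22, 0.20], min_p=0.1)
print(confident, uncertain)
```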

Steering Parameters

- `steering_vector` (string | dict): A steering vector ID string (remote) or a `{layer_index: tensor}` dict (local).
- `steering_strength` (float, default: 0): Multiplier for the steering vector magnitude.
- `steering_preset` (string): A named preset (e.g., "brevity", "truthfulness"), applied as a pre-configured steering configuration.
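Conceptually, steering adds the vector, scaled by `steering_strength`, to the hidden activations at the chosen layer. The sketch below shows that arithmetic with plain lists; it is an assumption about the mechanism for illustration, not mechanex's internal code:

```python
def apply_steering(hidden, vector, strength):
    """Add a scaled steering vector to a hidden-state vector (illustrative)."""
    return [h + strength * v for h, v in zip(hidden, vector)]

hidden = [0.5, -0.2, 0.1]
steer = [1.0, 0.0, -1.0]

unchanged = apply_steering(hidden, steer, strength=0.0)  # strength 0: no effect
steered = apply_steering(hidden, steer, strength=1.5)
print(unchanged, steered)
```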

Constrained Generation

- `json_schema` (dict): JSON schema to constrain output format. Used with guided-generation sampling.
- `regex_pattern` (string): Regex pattern to constrain output. Used with guided-generation sampling.
- `grammar` (string): Grammar specification to constrain output. Used with guided-generation sampling.
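For example, a standard JSON Schema like the one below could be passed via `json_schema`; the commented call mirrors the `generate()` signature documented above (the specific schema and prompt are illustrative assumptions):

```python
# A standard JSON Schema describing the desired output shape.
summary_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "key_points"],
}

# Passed to generate() with guided-generation (remote-only, per the
# Sampling Methods table):
# output = mx.generation.generate(
#     "Summarize this paper.",
#     sampling_method="guided-generation",
#     json_schema=summary_schema,
# )
print(summary_schema["required"])
```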

Advanced Parameters

- `draft_model` (string): Model name for speculative decoding (the ssd sampling method).
- `ensemble_models` (list[str]): List of model names for ensemble-sampling.
- `policy` (dict): Inline policy configuration. See Policies.
- `policy_id` (string): ID of a saved policy to apply during generation.
- `include_trace` (boolean, default: false): If true, returns trace information for debugging.
Returns: A plain string containing the generated text.
```python
import mechanex as mx

mx.set_key("ax_your_key_here")
output = mx.generation.generate(
    "Summarize the concept of contrastive activation addition in two sentences.",
    max_tokens=128,
    sampling_method="top-p",
    top_p=0.9,
    temperature=0.7,
)
print(output)
```

Sampling Methods

| Method | Description | Key Parameters |
| --- | --- | --- |
| greedy | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | (none) |
| top-k | Samples from the top K tokens by probability. Default K is 50. | top_k |
| top-p | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural. | top_p |
| min-p | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence. | min_p |
| typical | Selects tokens based on local entropy, favoring "typical" continuations. | typical_p |
| ads | Adaptive Determinantal Sampling; maximizes diversity. Remote-only. | ads_subset_size, ads_beta |
| guided-generation | Constrains output to match a JSON schema, regex, or grammar. | json_schema, regex_pattern, grammar |
| constrained-beam-search | Generates multiple candidates and selects the best. | (none) |
| ssd / speculative-decoding | Uses a smaller draft model for faster generation. | draft_model |
| ensemble-sampling | Combines outputs from multiple models via voting. | ensemble_models |
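Ensemble-sampling's voting step can be pictured as a majority vote over candidate outputs. This is a toy sketch of that idea, not the service's actual aggregation logic:

```python
from collections import Counter

def majority_vote(candidates):
    """Return the most common candidate output."""
    return Counter(candidates).most_common(1)[0][0]

# Hypothetical outputs from three ensemble members:
outputs = ["Paris", "Paris", "Lyon"]
winner = majority_vote(outputs)
print(winner)
```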

Applying a Steering Vector

Pass a vector ID (returned from steering.generate_vectors() or created in Spectra) or a local {layer: tensor} dict:
```python
output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector="sv_abc123",
    steering_strength=1.5,
)
```
For local generation with a custom vector:
```python
import torch

my_vector = {11: torch.randn(1, 768)}  # layer 11, hidden dim 768
output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector=my_vector,
    steering_strength=0.8,
)
```

Local vs Remote

When a local model is loaded via mx.load(), generation runs locally if the remote API call fails. ADS, guided-generation, constrained-beam-search, speculative-decoding, and ensemble-sampling are remote-only.
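The fallback behavior described above amounts to try-remote-then-local. A minimal sketch with stand-in functions (the real calls go through the mechanex client, and these helper names are illustrative assumptions):

```python
def generate_with_fallback(prompt, remote_fn, local_fn=None):
    """Try the remote API first; fall back to a local model if one is loaded."""
    try:
        return remote_fn(prompt)
    except Exception:
        if local_fn is None:
            raise  # no local model loaded: surface the remote error
        return local_fn(prompt)

# Stand-ins for demonstration:
def failing_remote(prompt):
    raise ConnectionError("remote API unreachable")

def local_model(prompt):
    return f"[local] {prompt}"

result = generate_with_fallback("hello", failing_remote, local_model)
print(result)
```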