generation.generate()
Generates text from a prompt. Runs remotely when API or JWT credentials are configured. Runs locally when mx.set_execution_mode("local") is set, or in auto mode when a local model is loaded and no credentials are configured.
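The resolution rules above can be summarized in a small sketch. `resolve_execution` is a hypothetical helper, not a library function, and the mode string `"auto"` is assumed from the wording here. What happens in auto mode with neither credentials nor a loaded local model is not stated, so the sketch raises an error in that case as an assumption.

```python
def resolve_execution(mode: str, has_credentials: bool, local_model_loaded: bool) -> str:
    """Return "local" or "remote" following the rules described above.

    Hypothetical helper for illustration only.
    """
    if mode == "local":
        # mx.set_execution_mode("local") forces local execution.
        return "local"
    if mode == "auto":
        if has_credentials:
            # Configured API or JWT credentials route the call to the remote backend.
            return "remote"
        if local_model_loaded:
            return "local"
        # Assumption: the source does not say what happens in this case.
        raise RuntimeError("no credentials configured and no local model loaded")
    raise ValueError(f"unhandled execution mode: {mode!r}")
```

Only the two modes mentioned in this section are handled; any other mode string is rejected rather than guessed at.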
Core Parameters
The input text to generate a continuation for.
Maximum number of tokens to generate.
Token sampling strategy. See Sampling Methods below.
Controls randomness. Lower values (0.1-0.5) produce more focused output; higher values (0.8-1.2) increase creativity.
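As a rough illustration of why lower temperatures give more focused output, the sketch below applies the standard temperature-scaled softmax to a toy set of logits. The logits and the helper name `softmax_with_temperature` are made up for illustration; how the library applies temperature internally is not described here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
focused = softmax_with_temperature(toy_logits, 0.3)
diverse = softmax_with_temperature(toy_logits, 1.2)
print(round(focused[0], 3), round(diverse[0], 3))
```

At the lower temperature the top token takes a much larger share of the probability mass, so sampling picks it far more often.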
Sampling Parameters
For top-k sampling: number of top tokens to sample from.
For top-p sampling: cumulative probability threshold (0.0-1.0).
For min-p sampling: minimum relative probability threshold.
For typical sampling: typical probability threshold.
For ads sampling: number of candidate tokens per step (2-10).
For ads sampling: quality vs diversity balance (0.1-0.5).
Steering Parameters
A steering vector ID string (remote) or a {layer_index: tensor} dict (local).
Multiplier for the steering vector magnitude.
A named preset (e.g., "brevity", "truthfulness"). Applied as a pre-configured steering configuration.
Constrained Generation
JSON schema to constrain output format. Used with guided-generation sampling.
Regex pattern to constrain output. Used with guided-generation sampling.
Grammar specification to constrain output. Used with guided-generation sampling.
Advanced Parameters
Model name for speculative decoding (ssd sampling method).
List of model names for ensemble-sampling.
Number of candidates to generate and score when using the policy-backed path.
Enables adaptive temperature during retries.
Temperature values to use across retry rounds. Setting this also enables adaptive temperature.
Enables adaptive nucleus sampling during retries.
Top-p values to use across retry rounds. Setting this also enables adaptive top-p.
Regenerates when the selected candidate’s confidence falls below confidence_threshold.
Minimum candidate confidence when confidence-triggered regeneration is enabled.
Python unit-test snippets used by the policy verifier for generated code.
ID of a saved policy to apply during generation.
Requests trace information from policy execution. generation.generate() still returns the output string.
Sampling Methods
| Method | Description | Key Parameters |
|---|---|---|
| greedy | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | (none) |
| top-k | Samples from the top K tokens by probability. Default K is 50. | top_k |
| top-p | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural. | top_p |
| min-p | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence. | min_p |
| typical | Selects tokens based on local entropy, favoring “typical” continuations. | typical_p |
| ads | Adaptive Determinantal Sampling; maximizes diversity. Remote-only. | ads_subset_size, ads_beta |
| guided-generation | Constrains output to match a JSON schema, regex, or grammar. | json_schema, regex_pattern, grammar |
| constrained-beam-search | Generates multiple candidates and selects the best. | (none) |
| ssd / speculative-decoding | Uses a smaller draft model for faster generation. | draft_model |
| ensemble-sampling | Combines outputs from multiple models via voting. | ensemble_models |
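To make the difference between top-k, top-p, and min-p concrete, here is a minimal sketch of the three filters on a toy distribution. These are common textbook formulations, and the helper names are hypothetical; the library's actual implementations may differ in details such as tie-breaking.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest high-probability prefix whose cumulative mass reaches p."""
    kept, cumulative = set(), 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.add(token)
        cumulative += probs[token]
        if cumulative >= p:
            break
    return kept

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    cutoff = min_p * max(probs.values())
    return {token for token, prob in probs.items() if prob >= cutoff}

# Made-up next-token distribution for illustration.
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
kept_top_k = top_k_filter(toy, 2)
kept_top_p = top_p_filter(toy, 0.9)
kept_min_p = min_p_filter(toy, 0.2)
```

In each case the surviving tokens would be renormalized before one is sampled. Note that the min-p cutoff scales with the top token's probability, which is what lets it adapt to how confident the model is.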
Applying a Steering Vector
Pass a vector ID (returned from steering.generate_vectors() or created in Spectra) or a local {layer: tensor} dict.
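For the local form, one common way a {layer: tensor} steering dict is used is to add the scaled vector to the activations of the matching layer. The sketch below shows that idea with plain Python lists standing in for tensors. `apply_steering` and the toy activations are hypothetical, and whether the library applies vectors in exactly this way is an assumption.

```python
def apply_steering(hidden_by_layer, steering, multiplier=1.0):
    """Return new activations with multiplier * vector added at each steered layer."""
    steered = {}
    for layer, hidden in hidden_by_layer.items():
        vector = steering.get(layer)
        if vector is None:
            steered[layer] = list(hidden)  # layer not steered: copy unchanged
            continue
        if len(vector) != len(hidden):
            raise ValueError(f"layer {layer}: vector and activation sizes differ")
        steered[layer] = [h + multiplier * v for h, v in zip(hidden, vector)]
    return steered

hidden = {0: [1.0, 2.0], 1: [0.5, -0.5]}  # toy activations per layer
steering = {1: [1.0, 1.0]}                # hypothetical vector for layer 1 only
result = apply_steering(hidden, steering, multiplier=2.0)
```

The multiplier plays the role of the steering magnitude parameter described earlier: larger values push the activations further along the vector's direction.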
Local vs Remote
Remote failures are surfaced when auto mode resolves to remote. Call mx.set_execution_mode("local") to force use of a loaded local model. Plain local generation supports greedy, top-k, top-p, min-p, and typical. Policy-backed local generation can also run saved policies, constraints, retries, and verifiers. ADS and steering perceptrons are not supported locally.
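A caller can guard against unsupported combinations before forcing local mode. The sketch below encodes the plain-local method list from this section. `check_plain_local` is a hypothetical helper, and policy-backed local generation, which supports more, is deliberately out of scope.

```python
# Sampling methods listed above as supported for plain local generation.
PLAIN_LOCAL_METHODS = {"greedy", "top-k", "top-p", "min-p", "typical"}

def check_plain_local(method: str) -> None:
    """Raise if the method is not listed as supported for plain local generation."""
    if method not in PLAIN_LOCAL_METHODS:
        supported = ", ".join(sorted(PLAIN_LOCAL_METHODS))
        raise ValueError(
            f"{method!r} is not supported for plain local generation; "
            f"use one of: {supported}"
        )

check_plain_local("top-p")  # passes silently
```

Calling `check_plain_local("ads")` raises a `ValueError`, matching the note that ADS is not supported locally.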