generation.generate()
Generates text from a prompt. Runs remotely by default; falls back to a locally loaded model if one is available and the remote call fails.
Core Parameters
The input text to generate a continuation for.
Maximum number of tokens to generate.
Token sampling strategy. See Sampling Methods below.
Controls randomness. Lower values (0.1-0.5) produce more focused output; higher values (0.8-1.2) increase creativity.
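To make the effect of the randomness setting concrete, here is a minimal sketch of temperature scaling applied to a toy set of logits. It illustrates the general technique only; the function name and the logits are made up for this example, and how the library applies temperature internally is not stated here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(logits, 0.5)   # low temperature
creative = softmax_with_temperature(logits, 1.2)  # high temperature
```

With the lower value, more probability mass concentrates on the top token; with the higher value, the distribution flattens and less likely tokens are sampled more often.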
Sampling Parameters
top_k: For top-k sampling, the number of top tokens to sample from.
top_p: For top-p sampling, the cumulative probability threshold (0.0-1.0).
min_p: For min-p sampling, the minimum relative probability threshold.
typical_p: For typical sampling, the typical probability threshold.
ads_subset_size: For ads sampling, the number of candidate tokens per step (2-10).
ads_beta: For ads sampling, the quality vs diversity balance (0.1-0.5).
Steering Parameters
A steering vector ID string (remote) or a {layer_index: tensor} dict (local).
Multiplier for the steering vector magnitude.
A named preset (e.g., "brevity", "truthfulness"). Applied as a pre-configured steering configuration.
Constrained Generation
json_schema: JSON schema to constrain the output format. Used with guided-generation sampling.
regex_pattern: Regex pattern to constrain output. Used with guided-generation sampling.
grammar: Grammar specification to constrain output. Used with guided-generation sampling.
Advanced Parameters
draft_model: Model name for speculative decoding (ssd sampling method).
ensemble_models: List of model names for ensemble-sampling.
ID of a saved policy to apply during generation.
If true, returns trace information for debugging.
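The numeric ranges documented for the sampling parameters can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the library: the parameter names are taken from the Key Parameters column of the Sampling Methods table, and treating the documented ranges as hard limits is an assumption.

```python
# Ranges copied from the parameter descriptions in this page.
# Treating them as hard limits is an assumption made for this sketch.
DOCUMENTED_RANGES = {
    "top_p": (0.0, 1.0),
    "ads_subset_size": (2, 10),
    "ads_beta": (0.1, 0.5),
}

def check_sampling_params(params):
    """Return a list of readable problems; an empty list means all checks passed."""
    problems = []
    for name, (low, high) in DOCUMENTED_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return problems

ok = check_sampling_params({"top_p": 0.9, "ads_subset_size": 4})
bad = check_sampling_params({"top_p": 1.5, "ads_beta": 0.05})
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range value at once.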
Sampling Methods
| Method | Description | Key Parameters |
|---|---|---|
| greedy | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | None |
| top-k | Samples from the top K tokens by probability. Default K is 50. | top_k |
| top-p | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural. | top_p |
| min-p | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence. | min_p |
| typical | Selects tokens based on local entropy, favoring “typical” continuations. | typical_p |
| ads | Adaptive Determinantal Sampling; maximizes diversity. Remote-only. | ads_subset_size, ads_beta |
| guided-generation | Constrains output to match a JSON schema, regex, or grammar. | json_schema, regex_pattern, grammar |
| constrained-beam-search | Generates multiple candidates and selects the best. | None |
| ssd / speculative-decoding | Uses a smaller draft model for faster generation. | draft_model |
| ensemble-sampling | Combines outputs from multiple models via voting. | ensemble_models |
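To show how the top-k, top-p, and min-p rows of the table differ, the following sketch applies each filter to the same toy distribution. These are textbook versions of the techniques written in plain Python; the function names are made up for this example, and the library's own implementation may differ in details such as tie-breaking.

```python
def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs)
    keep = {i for i, q in enumerate(probs) if q >= threshold}
    return _renormalize(probs, keep)

# Toy next-token distribution over four tokens (illustrative values only).
probs = [0.5, 0.3, 0.15, 0.05]
after_top_k = top_k_filter(probs, 2)
after_top_p = top_p_filter(probs, 0.9)
after_min_p = min_p_filter(probs, 0.2)
```

Note that the min-p threshold scales with the top token's probability, which is why the table describes it as adapting to model confidence: a peaked distribution prunes more candidates than a flat one.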
Applying a Steering Vector
Pass a vector ID (returned from steering.generate_vectors() or created in Spectra) or a local {layer: tensor} dict.
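As a rough illustration of the local {layer: tensor} form and the magnitude multiplier, the sketch below scales a toy steering vector and adds it to a toy hidden state. Plain Python lists stand in for tensors, both helper names are hypothetical, and the assumption that steering works by adding the scaled vector to a layer's activations reflects the common activation-steering technique rather than anything confirmed for this library.

```python
def scale_steering_vectors(vectors, multiplier):
    """Return a new {layer_index: vector} dict with every vector scaled."""
    return {
        layer: [multiplier * value for value in vector]
        for layer, vector in vectors.items()
    }

def apply_to_hidden_state(hidden, vector):
    """Add a steering vector to a hidden state of the same length (assumed mechanism)."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must match in length")
    return [h + v for h, v in zip(hidden, vector)]

# Toy data: one 3-dimensional vector attached to layer 12.
local_vectors = {12: [0.5, -1.0, 0.25]}
scaled = scale_steering_vectors(local_vectors, 2.0)
steered = apply_to_hidden_state([1.0, 1.0, 1.0], scaled[12])
```

A multiplier of 1.0 leaves the vector unchanged; larger values push the hidden state further along the steering direction.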
Local vs Remote
When a local model is loaded via mx.load(), generation runs locally if the remote API call fails. ADS, guided-generation, constrained-beam-search, speculative-decoding, and ensemble-sampling are remote-only.
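The fallback rule described above can be sketched as a small wrapper. Everything here is hypothetical: remote_call and local_call are stand-in callables, ConnectionError is an assumed failure type, and the real library handles this internally, so the code only illustrates the decision logic.

```python
# Methods listed as remote-only in this page ("ssd" is the alias shown in the table).
REMOTE_ONLY = {
    "ads",
    "guided-generation",
    "constrained-beam-search",
    "speculative-decoding",
    "ssd",
    "ensemble-sampling",
}

def generate_with_fallback(prompt, method, remote_call, local_call=None):
    """Try the remote backend first; fall back to the local one when allowed."""
    try:
        return remote_call(prompt, method)
    except ConnectionError:
        # Without a local model, or with a remote-only method, there is no fallback.
        if local_call is None or method in REMOTE_ONLY:
            raise
        return local_call(prompt, method)

def failing_remote(prompt, method):
    """Stand-in for a remote call that always fails."""
    raise ConnectionError("remote API unavailable")

def toy_local(prompt, method):
    """Stand-in for a locally loaded model."""
    return f"[local:{method}] {prompt}"

result = generate_with_fallback("Hello", "top-p", failing_remote, toy_local)
```

In this sketch a remote failure with a remote-only method such as ads re-raises the original error instead of silently switching to a different sampling method.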