# Text generation API reference

> Call generation.generate() with ten sampling methods, steering vectors, JSON schema constraints, speculative decoding, and ensemble sampling options.

## `generation.generate()`

Generates text from a prompt. Runs remotely when API or JWT credentials are configured. Runs locally when `mx.set_execution_mode("local")` is set, or in auto mode when a local model is loaded and no credentials are configured.

### Core Parameters

- Prompt (first positional argument): The input text to generate a continuation for.
- `max_tokens`: Maximum number of tokens to generate.
- `sampling_method`: Token sampling strategy. See [Sampling Methods](#sampling-methods) below.
- `temperature`: Controls randomness. Lower values (0.1-0.5) produce more focused output; higher values (0.8-1.2) increase creativity.

### Sampling Parameters

- `top_k`: For `top-k` sampling: number of top tokens to sample from.
- `top_p`: For `top-p` sampling: cumulative probability threshold (0.0-1.0).
- `min_p`: For `min-p` sampling: minimum relative probability threshold.
- `typical_p`: For `typical` sampling: typical probability threshold.
- `ads_subset_size`: For `ads` sampling: number of candidate tokens per step (2-10).
- `ads_beta`: For `ads` sampling: quality vs diversity balance (0.1-0.5).

### Steering Parameters

- `steering_vector`: A steering vector ID string (remote) or a `{layer_index: tensor}` dict (local).
- `steering_strength`: Multiplier for the steering vector magnitude.
- A named preset (e.g., `"brevity"`, `"truthfulness"`). Applied as a pre-configured steering configuration.

### Constrained Generation

- `json_schema`: JSON schema to constrain output format. Used with `guided-generation` sampling.
- `regex_pattern`: Regex pattern to constrain output. Used with `guided-generation` sampling.
- `grammar`: Grammar specification to constrain output. Used with `guided-generation` sampling.

### Advanced Parameters

- `draft_model`: Model name for speculative decoding (`ssd` sampling method).
- `ensemble_models`: List of model names for `ensemble-sampling`.
- Number of candidates to generate and score when using the policy-backed path.
- Enables adaptive temperature during retries.
- Temperature values to use across retry rounds. Setting this also enables adaptive temperature.
- Enables adaptive nucleus sampling during retries.
- Top-p values to use across retry rounds. Setting this also enables adaptive top-p.
- Regenerates when the selected candidate's confidence falls below `confidence_threshold`.
- `confidence_threshold`: Minimum candidate confidence when confidence-triggered regeneration is enabled.
- Python unit-test snippets used by the policy verifier for generated code.
- Inline policy configuration. See [Policies](/products/mechanex/policies).
- ID of a saved policy to apply during generation.
- Requests trace information from policy execution. `generation.generate()` still returns the output string.

**Returns**: A plain string containing the generated text.

```python
import mechanex as mx

mx.set_key("ax_your_key_here")

output = mx.generation.generate(
    "Summarize the concept of contrastive activation addition in two sentences.",
    max_tokens=128,
    sampling_method="top-p",
    top_p=0.9,
    temperature=0.7,
)
print(output)
```

## Sampling Methods

| Method | Description | Key Parameters |
| --- | --- | --- |
| `greedy` | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | -- |
| `top-k` | Samples from the top K tokens by probability. Default K is 50. | `top_k` |
| `top-p` | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural. | `top_p` |
| `min-p` | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence. | `min_p` |
| `typical` | Selects tokens based on local entropy, favoring "typical" continuations. | `typical_p` |
| `ads` | Adaptive Determinantal Sampling; maximizes diversity. Remote-only. | `ads_subset_size`, `ads_beta` |
| `guided-generation` | Constrains output to match a JSON schema, regex, or grammar. | `json_schema`, `regex_pattern`, `grammar` |
| `constrained-beam-search` | Generates multiple candidates and selects the best. | -- |
| `ssd` / `speculative-decoding` | Uses a smaller draft model for faster generation. | `draft_model` |
| `ensemble-sampling` | Combines outputs from multiple models via voting. | `ensemble_models` |

## Applying a Steering Vector

Pass a vector ID (returned from `steering.generate_vectors()` or created in Spectra) or a local `{layer: tensor}` dict:

```python
import mechanex as mx

output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector="sv_abc123",
    steering_strength=1.5,
)
```

For local generation with a custom vector:

```python
import mechanex as mx
import torch

# Force local execution; this assumes a local model is already loaded.
mx.set_execution_mode("local")

my_vector = {11: torch.randn(1, 768)}  # layer 11, hidden dim 768

output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector=my_vector,
    steering_strength=0.8,
)
```

## Local vs Remote

When auto mode resolves to remote, remote failures are surfaced to the caller. Call `mx.set_execution_mode("local")` to force generation on a loaded local model.

Plain local generation supports `greedy`, `top-k`, `top-p`, `min-p`, and `typical`. Policy-backed local generation can also run saved policies, constraints, retries, and verifiers. ADS and steering perceptrons are not supported locally.
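To make the filtering behind `greedy`, `top-k`, `top-p`, and `min-p` in the Sampling Methods table more concrete, here is a minimal pure-Python sketch over a toy next-token distribution. It follows the commonly used definitions of these methods and is not Mechanex's implementation; the function names and the example distribution are hypothetical.

```python
import random


def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def by_probability(probs):
    """Return (token, probability) pairs, most probable first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def greedy_pick(probs):
    """Greedy decoding: always take the single most probable token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens."""
    return renormalize(dict(by_probability(probs)[:k]))


def top_p_filter(probs, p):
    """Top-p (nucleus): keep the smallest top set whose cumulative mass reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in by_probability(probs):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)


def min_p_filter(probs, min_p):
    """Min-p: keep tokens with probability >= min_p times the top token's probability."""
    cutoff = min_p * max(probs.values())
    return renormalize({tok: pr for tok, pr in probs.items() if pr >= cutoff})


def sample(probs, rng):
    """Draw one token from a (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy distribution, made up for illustration.
next_token_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

rng = random.Random(0)
token = sample(top_p_filter(next_token_probs, 0.9), rng)
```

Because the min-p cutoff scales with the top token's probability, it prunes aggressively when the model is confident and loosely when the distribution is flat, which is the adaptive behavior the table describes.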
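The `guided-generation` method and the `json_schema` parameter described above can be combined as in the following sketch. It assumes `json_schema` accepts a standard JSON Schema expressed as a Python dict, which this page does not confirm; `PERSON_SCHEMA` and `generate_structured` are hypothetical names. The helper takes the generate function as an argument so it can be exercised without credentials.

```python
import json

# Illustrative schema, not taken from the Mechanex documentation.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def generate_structured(generate, prompt, schema, max_tokens=128):
    """Call a generate()-style function with guided generation and parse the result.

    generation.generate() returns a plain string, so the JSON text
    still has to be decoded by the caller.
    """
    raw = generate(
        prompt,
        max_tokens=max_tokens,
        sampling_method="guided-generation",
        json_schema=schema,  # assumption: a dict-shaped JSON Schema is accepted
    )
    return json.loads(raw)
```

With credentials configured, this could be called as `generate_structured(mx.generation.generate, "Extract the person mentioned in: Ada, aged 36.", PERSON_SCHEMA)`.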
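Since plain local generation supports only a subset of the sampling methods, a caller may want to check the method before forcing local mode. The sketch below encodes the list from the Local vs Remote section; the helper names are hypothetical and not part of the Mechanex API, and the check does not cover the policy-backed local path.

```python
# Sampling methods this page lists for plain local generation.
PLAIN_LOCAL_METHODS = frozenset({"greedy", "top-k", "top-p", "min-p", "typical"})


def supports_plain_local(sampling_method):
    """Return True if the method is listed as available for plain local generation."""
    return sampling_method in PLAIN_LOCAL_METHODS


def require_plain_local(sampling_method):
    """Return the method unchanged, or raise if plain local generation cannot run it."""
    if not supports_plain_local(sampling_method):
        supported = ", ".join(sorted(PLAIN_LOCAL_METHODS))
        raise ValueError(
            f"{sampling_method!r} is not available for plain local generation; "
            f"supported methods: {supported}"
        )
    return sampling_method
```

A caller could run `require_plain_local(method)` before `mx.set_execution_mode("local")` so that an unsupported method such as `ads` fails early with a clear message.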