# Text generation API reference

> Call `generation.generate()` with ten sampling methods, steering vectors, JSON schema constraints, speculative decoding, and ensemble sampling options.

## `generation.generate()`

Generates text from a prompt. Runs remotely when API or JWT credentials are configured. Runs locally when `mx.set_execution_mode("local")` is set, or in auto mode when a local model is loaded and no credentials are configured.

### Core Parameters

<ParamField body="prompt" type="string" required>
  The input text to generate a continuation for.
</ParamField>

<ParamField body="max_tokens" type="integer" default="128">
  Maximum number of tokens to generate.
</ParamField>

<ParamField body="sampling_method" type="string" default="top-k">
  Token sampling strategy. See [Sampling Methods](#sampling-methods) below.
</ParamField>

<ParamField body="temperature" type="float" default="0.7">
  Controls randomness. Lower values (0.1-0.5) produce more focused output; higher values (0.8-1.2) increase creativity.
</ParamField>

### Sampling Parameters

<ParamField body="top_k" type="integer" default="50">
  For `top-k` sampling: number of top tokens to sample from.
</ParamField>

<ParamField body="top_p" type="float" default="0.9">
  For `top-p` sampling: cumulative probability threshold (0.0-1.0).
</ParamField>

<ParamField body="min_p" type="float">
  For `min-p` sampling: minimum probability threshold, expressed relative to the probability of the most likely token.
</ParamField>

<ParamField body="typical_p" type="float">
  For `typical` sampling: cumulative probability mass to keep, taken from the tokens whose information content is closest to the expected value (0.0-1.0).
</ParamField>

<ParamField body="ads_subset_size" type="integer">
  For `ads` sampling: number of candidate tokens per step (2-10).
</ParamField>

<ParamField body="ads_beta" type="float">
  For `ads` sampling: quality vs diversity balance (0.1-0.5).
</ParamField>
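
To make the thresholds above concrete, here is a minimal sketch of how top-k, top-p, and min-p filtering are commonly defined over a next-token distribution. It is plain Python for illustration and is not Mechanex's implementation; the function names and the toy distribution are assumptions made for this example.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept


def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top token's."""
    cutoff = min_p * max(probs.values())
    return {token: prob for token, prob in probs.items() if prob >= cutoff}


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 before sampling."""
    total = sum(probs.values())
    return {token: prob / total for token, prob in probs.items()}


# Toy next-token distribution (hypothetical values).
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(sorted(top_k_filter(toy, 2)))
print(sorted(top_p_filter(toy, 0.9)))
print(sorted(min_p_filter(toy, 0.2)))
```

Note how `min_p_filter` scales its cutoff with the top token's probability: when the model is confident, the cutoff rises and fewer alternatives survive.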

### Steering Parameters

<ParamField body="steering_vector" type="string | dict">
  A steering vector ID string (remote) or a `{layer_index: tensor}` dict (local).
</ParamField>

<ParamField body="steering_strength" type="float" default="0">
  Multiplier for the steering vector magnitude.
</ParamField>

<ParamField body="steering_preset" type="string">
  A named preset (e.g., `"brevity"`, `"truthfulness"`) that applies a pre-configured set of steering settings.
</ParamField>

### Constrained Generation

<ParamField body="json_schema" type="dict">
  JSON schema to constrain output format. Used with `guided-generation` sampling.
</ParamField>

<ParamField body="regex_pattern" type="string">
  Regex pattern to constrain output. Used with `guided-generation` sampling.
</ParamField>

<ParamField body="grammar" type="string">
  Grammar specification to constrain output. Used with `guided-generation` sampling.
</ParamField>
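
As a sketch of how a `json_schema` constraint might be used, the example below passes a schema with `guided-generation` and then parses the returned string. The schema, its field names, and the helpers `generate_person` and `missing_required` are hypothetical. The example also assumes that `generate()` accepts these parameters as keyword arguments, as in the other examples on this page.

```python
import json

# Illustrative schema; the field names are assumptions for this example.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def generate_person(prompt):
    """Request schema-constrained output. Needs mechanex installed and configured."""
    import mechanex as mx  # imported lazily so the helper below works without it

    return mx.generation.generate(
        prompt,
        sampling_method="guided-generation",
        json_schema=person_schema,
        max_tokens=128,
    )


def missing_required(text, schema):
    """Parse the returned string and list required keys that are absent."""
    data = json.loads(text)
    return [key for key in schema.get("required", []) if key not in data]
```

Because `generate()` returns a plain string, the caller still has to run `json.loads` on the result. A check like `missing_required(output, person_schema)` is a cheap sanity test after generation.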

### Advanced Parameters

<ParamField body="draft_model" type="string">
  Name of the draft model used for speculative decoding (`ssd` sampling method).
</ParamField>

<ParamField body="ensemble_models" type="list[str]">
  List of model names for `ensemble-sampling`.
</ParamField>

<ParamField body="best_of_n" type="integer" default="1">
  Number of candidates to generate and score when using the policy-backed path.
</ParamField>

<ParamField body="adaptive_temperature" type="boolean" default="false">
  Enables adaptive temperature during retries.
</ParamField>

<ParamField body="adaptive_temperature_schedule" type="list[float]">
  Temperature values to use across retry rounds. Setting this also enables adaptive temperature.
</ParamField>

<ParamField body="adaptive_top_p" type="boolean" default="false">
  Enables adaptive nucleus sampling during retries.
</ParamField>

<ParamField body="adaptive_top_p_schedule" type="list[float]">
  Top-p values to use across retry rounds. Setting this also enables adaptive top-p.
</ParamField>

<ParamField body="confidence_triggered_regeneration" type="boolean" default="false">
  Regenerates when the selected candidate's confidence falls below `confidence_threshold`.
</ParamField>

<ParamField body="confidence_threshold" type="float" default="0.5">
  Minimum candidate confidence when confidence-triggered regeneration is enabled.
</ParamField>

<ParamField body="code_unit_tests" type="list[str]">
  Python unit-test snippets used by the policy verifier for generated code.
</ParamField>

<ParamField body="policy" type="dict">
  Inline policy configuration. See [Policies](/products/mechanex/policies).
</ParamField>

<ParamField body="policy_id" type="string">
  ID of a saved policy to apply during generation.
</ParamField>

<ParamField body="include_trace" type="boolean" default="false">
  Requests trace information from policy execution. `generation.generate()` still returns the output string.
</ParamField>

**Returns**: A plain string containing the generated text.

```python theme={null}
import mechanex as mx

mx.set_key("ax_your_key_here")
output = mx.generation.generate(
    "Summarize the concept of contrastive activation addition in two sentences.",
    max_tokens=128,
    sampling_method="top-p",
    top_p=0.9,
    temperature=0.7,
)
print(output)
```
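
The retry-related parameters (`best_of_n`, `adaptive_temperature_schedule`, `confidence_triggered_regeneration`, `confidence_threshold`) can be pictured with the loop below. This is a conceptual sketch only and does not describe how the policy-backed path is actually implemented. The function `generate_with_retries`, the `sample_fn` callback, and the confidence scores are all assumptions made for illustration.

```python
def generate_with_retries(sample_fn, schedule, best_of_n=1, confidence_threshold=0.5):
    """Hypothetical retry loop: one round per temperature in `schedule`.

    sample_fn(temperature) must return a (text, confidence) pair.
    Returns the best pair seen, or None if `schedule` is empty.
    """
    best = None
    for temperature in schedule:
        # Draw best_of_n candidates at this round's temperature.
        candidates = [sample_fn(temperature) for _ in range(best_of_n)]
        round_best = max(candidates, key=lambda c: c[1])
        if best is None or round_best[1] > best[1]:
            best = round_best
        # Stop retrying once the best candidate is confident enough.
        if best[1] >= confidence_threshold:
            break
    return best


# Stub sampler with made-up confidence scores, standing in for a real model.
fake_confidence = {0.7: 0.3, 0.9: 0.6, 1.1: 0.9}
calls = []


def stub_sampler(temperature):
    calls.append(temperature)
    return (f"draft at T={temperature}", fake_confidence[temperature])


result = generate_with_retries(stub_sampler, schedule=[0.7, 0.9, 1.1])
print(result, calls)
```

In this sketch the second round clears the threshold, so the third temperature in the schedule is never used.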

## Sampling Methods

| Method                         | Description                                                                                        | Key Parameters                            |
| ------------------------------ | -------------------------------------------------------------------------------------------------- | ----------------------------------------- |
| `greedy`                       | Deterministic; always picks the highest-probability token. Best for factual or structured outputs. | --                                        |
| `top-k`                        | Samples from the top K tokens by probability. Default K is 50.                                     | `top_k`                                   |
| `top-p`                        | Nucleus sampling; samples from tokens covering the probability threshold. Balanced and natural.    | `top_p`                                   |
| `min-p`                        | Filters tokens below a relative probability threshold. Adapts dynamically to model confidence.     | `min_p`                                   |
| `typical`                      | Selects tokens based on local entropy, favoring "typical" continuations.                           | `typical_p`                               |
| `ads`                          | Adaptive Determinantal Sampling; maximizes diversity. Remote-only.                                 | `ads_subset_size`, `ads_beta`             |
| `guided-generation`            | Constrains output to match a JSON schema, regex, or grammar.                                       | `json_schema`, `regex_pattern`, `grammar` |
| `constrained-beam-search`      | Generates multiple candidates and selects the best.                                                | --                                        |
| `ssd` / `speculative-decoding` | Uses a smaller draft model for faster generation.                                                  | `draft_model`                             |
| `ensemble-sampling`            | Combines outputs from multiple models via voting.                                                  | `ensemble_models`                         |
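
The table can also be read as a lookup from method to key parameters. The sketch below transcribes it into a dictionary and flags method-specific arguments that do not belong to the chosen method. `unused_params` is a hypothetical helper and not part of Mechanex. The source does not say whether the library ignores or rejects mismatched parameters, so treat this as a client-side sanity check only.

```python
# Key parameters per sampling method, transcribed from the table above.
METHOD_PARAMS = {
    "greedy": set(),
    "top-k": {"top_k"},
    "top-p": {"top_p"},
    "min-p": {"min_p"},
    "typical": {"typical_p"},
    "ads": {"ads_subset_size", "ads_beta"},
    "guided-generation": {"json_schema", "regex_pattern", "grammar"},
    "constrained-beam-search": set(),
    "ssd": {"draft_model"},
    "speculative-decoding": {"draft_model"},
    "ensemble-sampling": {"ensemble_models"},
}

# Every parameter that is specific to at least one method.
ALL_METHOD_PARAMS = set().union(*METHOD_PARAMS.values())


def unused_params(sampling_method, **kwargs):
    """Return method-specific kwargs that the chosen method does not list."""
    if sampling_method not in METHOD_PARAMS:
        raise ValueError(f"unknown sampling method: {sampling_method!r}")
    relevant = METHOD_PARAMS[sampling_method]
    return sorted(
        name for name in kwargs
        if name in ALL_METHOD_PARAMS and name not in relevant
    )


# top_k is flagged because it is not a key parameter of top-p sampling;
# temperature is a core parameter and is never flagged.
print(unused_params("top-p", top_p=0.9, top_k=50, temperature=0.7))
```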

## Applying a Steering Vector

Pass a vector ID (returned from `steering.generate_vectors()` or created in Spectra) or a local `{layer: tensor}` dict:

```python theme={null}
output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector="sv_abc123",
    steering_strength=1.5,
)
```

For local generation with a custom vector:

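```python theme={null}
import mechanex as mx
import torch

# Assumes a local model is already loaded.
# Random values are a placeholder; a real steering vector comes from model activations.
my_vector = {11: torch.randn(1, 768)}  # layer 11, hidden dim 768
output = mx.generation.generate(
    "Tell me about this situation.",
    steering_vector=my_vector,
    steering_strength=0.8,
)
```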
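
`steering_strength` is described as a multiplier for the vector's magnitude. One common way steering vectors are applied (activation addition) is to add the scaled vector to a layer's hidden state. The sketch below assumes that formulation and uses plain lists instead of tensors. It illustrates the arithmetic and makes no claim about Mechanex internals; `apply_steering` is a hypothetical helper.

```python
def apply_steering(hidden_state, steering_vector, strength):
    """Return hidden_state + strength * steering_vector, elementwise."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same length")
    return [h + strength * v for h, v in zip(hidden_state, steering_vector)]


# Toy three-dimensional values (hypothetical).
hidden = [1.0, 2.0, 3.0]
vector = [0.5, -1.0, 0.0]

steered = apply_steering(hidden, vector, strength=2.0)
unchanged = apply_steering(hidden, vector, strength=0.0)
print(steered, unchanged)
```

Under this formulation a strength of `0` leaves the hidden state untouched, and a negative strength pushes in the opposite direction.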

## Local vs Remote

<Note>
  When auto mode resolves to remote, remote failures are surfaced to the caller. Call `mx.set_execution_mode("local")` to force use of a loaded local model. Plain local generation supports `greedy`, `top-k`, `top-p`, `min-p`, and `typical`. Policy-backed local generation can also run saved policies, constraints, retries, and verifiers. ADS and steering perceptrons are not supported locally.
</Note>
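
The routing rules described on this page can be summarized as a small decision function. This is one reading of the documented behavior, not Mechanex code. `resolve_mode` and `plain_local_supported` are hypothetical helpers, the mode name `"auto"` is assumed, and the `None` result for "no credentials and no local model" is an assumption because the source does not cover that case.

```python
# Sampling methods listed as supported by plain local generation.
LOCAL_PLAIN_METHODS = {"greedy", "top-k", "top-p", "min-p", "typical"}


def resolve_mode(mode, has_credentials, local_model_loaded):
    """Return "local", "remote", or None if neither backend is available."""
    if mode == "local":
        return "local"  # explicitly forced
    if mode == "auto":
        if has_credentials:
            return "remote"  # credentials configured, so run remotely
        if local_model_loaded:
            return "local"  # no credentials, but a local model is loaded
        return None  # not covered by the source; assumed unavailable
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")


def plain_local_supported(sampling_method):
    """True if plain (non-policy) local generation lists this method."""
    return sampling_method in LOCAL_PLAIN_METHODS


print(resolve_mode("auto", has_credentials=False, local_model_loaded=True))
print(plain_local_supported("ads"))
```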
