# Tokens

> Understand what tokens are, how they influence AI consumption, and how to optimize prompts and flows in Brain Studio.

Tokens are the **unit of "fuel"** a model uses to process text. Every message that comes in, every agent instruction, every variable you include as context, and every response that goes out is measured (internally) in tokens.

Think of it like fuel: if your flow "weighs" more or travels further, **it tends** to consume more.

> Note: understanding tokens helps you design agents that are faster, more stable, and cheaper to operate at scale.
> See: [Pricing](/guides/billing/precios)

## What is a token

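A token is a unit of text that the model processes. It does not map 1:1 to words: a token can be a whole word, part of a word, or a punctuation mark, and the exact split depends on the model's tokenizer. As a rough rule of thumb for common English text, one token is about four characters.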

In a typical execution, you consume tokens for:

* **Input**: the user's message + the agent's **instruction** + the context you inject.
* **Output**: the model's response.
* **Tools and nodes**: for example, when a node fetches data (such as a JSON payload) and that data is passed to the model as context.
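As a rough sizing aid, the sketch below adds up these buckets for a single execution. It assumes the four-characters-per-token rule of thumb instead of a real tokenizer, so treat the numbers as estimates. `estimate_tokens` and `estimate_execution` are hypothetical helpers, not part of Brain Studio; data fetched by a tool or node is counted as injected context, because that is how it reaches the model.

```python
import math

# Assumption: ~4 characters per token, a rule of thumb for common English text.
# Real counts depend on the model's tokenizer; use this only for rough sizing.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a piece of text uses."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_execution(user_message: str, instruction: str,
                       injected_context: str, response: str) -> dict:
    """Split one execution's estimated usage into input and output tokens."""
    # Input: everything sent to the model (message + instruction + injected
    # context, including data that a tool or node fetched).
    input_tokens = (estimate_tokens(user_message)
                    + estimate_tokens(instruction)
                    + estimate_tokens(injected_context))
    # Output: the model's response.
    output_tokens = estimate_tokens(response)
    return {"input": input_tokens, "output": output_tokens,
            "total": input_tokens + output_tokens}


usage = estimate_execution(
    user_message="Where is my order?",
    instruction="XYZ support assistant. Respond in English, maximum 2 paragraphs.",
    injected_context='{"order_id": "A-1001", "status": "shipped"}',  # e.g. from an API node
    response="Your order A-1001 has shipped and should arrive soon.",
)
print(usage)
```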

## The mental rule: tokens = fuel

### 1) More weight, more consumption

If your agent loads too many instructions, examples, redundant rules, or unnecessary data, the prompt "weighs" more.

**Common causes:**

* You write long or repeated instructions.
* You paste too much conversation history or copied information into the context.
* Tools return large payloads that end up in the context.

Related guides:

* [Prompting](/guides/agentes-ia/prompting)
* [General Guide (AI Agents)](/guides/agentes-ia/index)

### 2) The further you travel, the more you spend

Conversations with many turns tend to accumulate useful context… and sometimes noise too.

If your flow depends on long history, consider:

* Summarizing or normalizing key information.
* Saving only what you need for the next step (not the entire chat).
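To see why long histories get expensive, here is a minimal sketch that assumes the full history is sent back to the model on every turn and that every turn costs the same number of tokens (both simplifications). Under those assumptions, resending everything grows roughly quadratically with the number of turns, while keeping only the last few turns grows linearly. `cumulative_input_tokens` is a hypothetical helper.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            window: Optional[int] = None) -> int:
    """Total input tokens over a conversation, assuming history is resent every turn.

    window=None resends the full history; otherwise only the last `window`
    turns (including the current message) are sent.
    """
    total = 0
    for turn in range(1, turns + 1):
        sent_turns = turn if window is None else min(turn, window)
        total += sent_turns * tokens_per_turn
    return total


print(cumulative_input_tokens(20, 100))            # 21000 (full history)
print(cumulative_input_tokens(20, 100, window=4))  # 7400 (last 4 turns only)
```

With 20 turns of about 100 tokens each, resending the full history adds up to 21,000 input tokens, versus 7,400 with a 4-turn window. A window alone is blunt: pair it with a summary or saved variables so older facts you still need are not lost.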

Related guides:

* [Variables - Quick Guide](/guides/variables/guia-rapida)
* [Memory](/guides/variables/memory)
* [Context](/guides/variables/context)

### 3) The model also matters

"Larger" or more capable models tend to be more expensive to run. In general:

* Use a specialized or lightweight model for simple tasks (routing, validation, extraction).
* Reserve a more capable model for complex reasoning or richer generation.
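A minimal routing sketch, assuming each step is labeled with a task type before a model is called. The identifiers `light-model` and `capable-model` are placeholders, not real model names; substitute whichever models your workspace offers.

```python
# Placeholder identifiers (hypothetical); use the models available in your workspace.
LIGHT_MODEL = "light-model"
CAPABLE_MODEL = "capable-model"

# Simple, well-scoped tasks from the list above.
SIMPLE_TASKS = {"routing", "validation", "extraction"}


def pick_model(task_type: str) -> str:
    """Return the lighter model for simple tasks and the capable model otherwise."""
    return LIGHT_MODEL if task_type.strip().lower() in SIMPLE_TASKS else CAPABLE_MODEL


for task in ["routing", "extraction", "complex_reasoning"]:
    print(task, "->", pick_model(task))
```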

## What typically drives consumption in a flow

* **Extensive prompts** (especially if they include repeated text).
* **Integrations/APIs** that return too much content (huge catalogs, logs, unfiltered JSON payloads).
* **Variables persisted indiscriminately** and re-included in every step.
* **Long responses** when the user only needs a short or structured answer.

Related:

* [API Node](/guides/nodos/api)
* [Variable Node](/guides/nodos/variable)
* [AI Task](/guides/nodos/ai-task)

## Warning signs

If you notice any of these symptoms, you are probably consuming more tokens than necessary:

| Symptom                         | Likely cause                                  | Solution                                     |
| ------------------------------- | --------------------------------------------- | -------------------------------------------- |
| High latency (>5s)              | Context too large                             | Reduce history, filter data                  |
| Truncated responses             | Output token limit reached                    | Request shorter or structured responses      |
| "context length exceeded" error | Input exceeds the model's context window      | Reduce instructions or context               |
| Inconsistent responses          | Too much noise in the prompt                  | Clean up and prioritize relevant information |
| Unexpected costs                | Large tool/API responses injected as context  | Filter integration responses                 |
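The "context length exceeded" and "truncated responses" rows both come down to budget: the input plus the room reserved for the answer must fit within the model's context window. The sketch below is a hypothetical `fit_history` helper that uses the same four-characters-per-token estimate and drops the oldest turns until the estimated input fits after reserving tokens for the output. The limits you pass in are assumptions; check the actual context window and output limit of the model you use.

```python
import math
from typing import List

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN) if text else 0


def fit_history(instruction: str, history: List[str], question: str,
                context_limit: int, reserved_output: int) -> List[str]:
    """Drop the oldest turns until the estimated input fits the budget.

    Budget = context_limit - reserved_output, so the model still has room to answer.
    Raises ValueError if the instruction and question alone do not fit.
    """
    budget = context_limit - reserved_output
    used = estimate_tokens(instruction) + estimate_tokens(question)
    if used > budget:
        raise ValueError("Instruction and question alone exceed the input budget")
    kept: List[str] = []
    # Walk from newest to oldest so the most recent turns survive.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


kept = fit_history(instruction="x" * 40, history=["a" * 400, "b" * 400, "c" * 400],
                   question="y" * 20, context_limit=300, reserved_output=50)
print(len(kept))  # 2: the oldest turn no longer fits
```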

## Example: before vs after

<Tabs>
  <Tab title="Before (heavy)">
    ```text theme={null}
    You are a virtual customer service assistant for the company
    XYZ Corporation S.A. de C.V. which was founded in 1985 and has
    a presence in 15 countries across Latin America. Your main objective
    is to help customers with their queries in a friendly,
    professional, and efficient manner. You should always be courteous and empathetic.
    Remember that the customer is always right and you must treat them
    with respect. If you don't know something, admit it honestly.

    Here is the complete conversation history:
    [500 lines of previous chat]

    Here is the complete product catalog:
    [2000 products with all their attributes]

    Respond to the customer in a detailed and complete manner.
    ```

    **Problem**: \~15,000+ tokens for input alone.
  </Tab>

  <Tab title="After (optimized)">
    ```text theme={null}
    XYZ support assistant. Respond in English, maximum 2 paragraphs.

    Customer context:
    - Name: {{memory.name}}
    - Last order: {{memory.last_order}}

    Relevant products (filtered):
    {{context.filtered_products}}

    Question: {{input.message}}
    ```

    **Result**: \~200-400 input tokens, roughly 50x fewer, while still giving the model the data it needs to answer.
  </Tab>
</Tabs>
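The "After" prompt depends on `context.filtered_products` already being small. The sketch below shows one way to get there, assuming a naive keyword match between the question and product names (a real flow might use search or an API-side filter instead), and fills the `{{scope.key}}` placeholders with a hypothetical `render` function. It only imitates the placeholder syntax shown above; it is not how Brain Studio resolves variables.

```python
import re
from typing import Dict, List

TEMPLATE = """XYZ support assistant. Respond in English, maximum 2 paragraphs.

Customer context:
- Name: {{memory.name}}
- Last order: {{memory.last_order}}

Relevant products (filtered):
{{context.filtered_products}}

Question: {{input.message}}"""


def filter_products(catalog: List[Dict[str, str]], question: str, limit: int = 3) -> str:
    """Keep only products whose name shares a word with the question (naive keyword match)."""
    words = set(re.findall(r"\w+", question.lower()))
    matches = [p for p in catalog if words & set(re.findall(r"\w+", p["name"].lower()))]
    # Inject only the fields the answer needs, capped at `limit` products.
    return "\n".join(f"- {p['name']}: {p['price']}" for p in matches[:limit])


def render(template: str, values: Dict[str, Dict[str, str]]) -> str:
    """Replace {{scope.key}} placeholders with values[scope][key]."""
    def lookup(match: re.Match) -> str:
        return str(values[match.group(1)][match.group(2)])
    return re.sub(r"\{\{(\w+)\.(\w+)\}\}", lookup, template)


catalog = [
    {"name": "Blue Running Shoes", "price": "$80"},
    {"name": "Leather Wallet", "price": "$35"},
    {"name": "Trail Running Jacket", "price": "$120"},
]
question = "Do you have running shoes in size 42?"
prompt = render(TEMPLATE, {
    "memory": {"name": "Ana", "last_order": "A-1001"},
    "context": {"filtered_products": filter_products(catalog, question)},
    "input": {"message": question},
})
print(prompt)
```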

<Tip>
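  Use the [OpenAI Tokenizer](https://platform.openai.com/tokenizer) to see how a prompt splits into tokens and find optimization opportunities. It reflects OpenAI's tokenizers; other providers' models split text differently, so treat its counts as estimates.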
</Tip>

## Best practices for optimizing (without losing quality)

### Write "lean" prompts

A good prompt tends to:

* Be brief
* State clear rules
* Skip unnecessary examples
* Specify the output format explicitly (when applicable)

Start here:

* [Prompting](/guides/agentes-ia/prompting)
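A minimal sketch of a lean prompt builder, assuming you keep the role, rules, and output format as separate pieces: it drops rules that repeat each other (ignoring case and spacing) and states the output format explicitly. `build_lean_prompt` is a hypothetical helper, not a Brain Studio feature.

```python
from typing import List


def build_lean_prompt(role: str, rules: List[str], output_format: str = "") -> str:
    """Assemble a short prompt: one-line role, de-duplicated rules, explicit output format."""
    seen = set()
    unique_rules = []
    for rule in rules:
        normalized = " ".join(rule.lower().split())  # ignore case and extra spaces
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique_rules.append(rule.strip())
    lines = [role.strip()]
    lines += [f"- {rule}" for rule in unique_rules]
    if output_format:
        lines.append(f"Output format: {output_format.strip()}")
    return "\n".join(lines)


prompt = build_lean_prompt(
    "XYZ support assistant.",
    ["Be courteous.", "be courteous. ", "If you don't know something, say so."],
    'JSON with keys "answer" and "next_step"',
)
print(prompt)
```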

### Decide where each piece of data lives: Context vs Memory

Not everything needs to persist.

* **Context**: lives only during the current execution. Use it for temporary calculations and intermediate steps.
  See: [Context](/guides/variables/context)
* **Memory**: persists across conversations/skills with a time-to-live (TTL) control. Use it for data you truly reuse (e.g., preferences or identifiers).
  See: [Memory](/guides/variables/memory)
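To illustrate the difference, here is a hypothetical in-process sketch: a plain dict plays the role of context (gone when the execution ends), and `MemoryStore` keeps values until their TTL expires. It is not Jelou's implementation; it only shows why memory should hold a few values you truly reuse, since anything stored there can be injected again in later executions until it expires. The injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class MemoryStore:
    """Key-value store whose entries expire after a time-to-live (TTL), in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it so it is never re-injected
            return None
        return value


def run_execution(memory: MemoryStore, message: str) -> str:
    # Context: a plain dict that exists only for this execution.
    context = {"normalized_message": message.strip().lower()}
    name = memory.get("name") or "there"
    return f"Hi {name}, you said: {context['normalized_message']}"


now = [0.0]
mem = MemoryStore(clock=lambda: now[0])
mem.set("name", "Ana", ttl_seconds=60)
first = run_execution(mem, "  Hello ")
now[0] = 61.0  # simulate 61 seconds later
second = run_execution(mem, "Hello")
print(first)   # Hi Ana, you said: hello
print(second)  # Hi there, you said: hello
```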

### Limit the information you bring from tools

If you use APIs, filter at the source:

* Request only the necessary fields.
* Paginate results.
* Avoid fetching blobs or complete catalogs "just in case."
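When the API supports it, request fewer fields and smaller pages directly (parameter names vary by API). When it does not, a sketch like the one below, with hypothetical `project` and `paginate` helpers, trims the response before it is injected as context. The sample payload is made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def paginate(items: List[Any], page: int, page_size: int) -> List[Any]:
    """Return one 1-indexed page instead of the whole list."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


def project(items: Iterable[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Keep only the fields the next step actually uses."""
    return [{f: item[f] for f in fields if f in item} for item in items]


# Made-up API response: 50 products with heavy fields the agent does not need.
response = [
    {"sku": f"SKU-{i}", "name": f"Product {i}", "price": i * 10,
     "description": "Long marketing text " * 50,
     "images": [f"img-{i}-{n}.jpg" for n in range(5)]}
    for i in range(1, 51)
]

# Paginate first, then project, so only one small page is processed.
page = project(paginate(response, page=2, page_size=10), ["sku", "name", "price"])
print(page[0])  # {'sku': 'SKU-11', 'name': 'Product 11', 'price': 110}
```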

Related:

* [API Node](/guides/nodos/api)

## Tokens and billing in Jelou

Learn more about costs at:

* [How it works](/guides/billing/introduccion)
* [Pricing](/guides/billing/precios)
* [Refunds and retries](/guides/billing/reembolsos)

Even where you are not billed per token directly, optimizing tokens still pays off because it reduces operational friction: latency, noise in the context, inconsistent responses, and total infrastructure cost at scale.
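Scale multiplies every token. Here is a back-of-the-envelope sketch using the input sizes from the before/after example and a hypothetical volume of 10,000 executions per month; it counts tokens only, not money, since billing depends on your plan (see the pages listed above).

```python
def monthly_input_tokens(tokens_per_execution: int, executions_per_month: int) -> int:
    """Input tokens per month if every execution sends a prompt of the same size."""
    return tokens_per_execution * executions_per_month


EXECUTIONS = 10_000  # hypothetical monthly volume

heavy = monthly_input_tokens(15_000, EXECUTIONS)  # "Before" prompt from the example
lean = monthly_input_tokens(300, EXECUTIONS)      # midpoint of the 200-400 "After" range
print(f"heavy: {heavy:,} tokens/month")  # heavy: 150,000,000 tokens/month
print(f"lean:  {lean:,} tokens/month")   # lean:  3,000,000 tokens/month
print(f"ratio: {heavy // lean}x")        # ratio: 50x
```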

## Guiding principle

Every token should have real work to do.

Decorative tokens = fuel evaporating.
