Tokens are the unit of “fuel” a model uses to process text. Every message that comes in, every agent instruction, every variable you include as context, and every response that goes out is measured (internally) in tokens. Think of it like fuel: if your flow “weighs” more or travels further, it tends to consume more.
Note: understanding tokens helps you design agents that are faster, more stable, and cheaper to operate at scale. See: Pricing

What is a token

A token is a unit of text that the model processes. It does not map 1:1 to “one word”: it can be a complete word or a fragment. In a typical execution, you consume tokens for:
  • Input: the user’s message + the agent’s instruction + the context you inject.
  • Output: the model’s response.
  • Tools and nodes: for example, when a node fetches data (like a JSON) and that data is used as context.
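For quick budgeting, you can approximate token counts with the common rule of thumb of roughly 4 characters per token for English text. This is a sketch, not a real tokenizer (a library like tiktoken gives exact counts), but it is enough to compare the relative weight of the three inputs listed above:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token
    heuristic for English. Only for quick budgeting; real
    tokenizers give exact counts."""
    return max(1, len(text) // 4)

def estimate_request_tokens(user_message: str, instructions: str, context: str) -> int:
    # Input tokens = user message + agent instructions + injected context.
    return sum(estimate_tokens(part) for part in (user_message, instructions, context))
```

Running this over a draft prompt before deploying it gives you an early warning when one component (usually the injected context) dominates the total.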

The mental rule: tokens = fuel

1) More weight, more consumption

If your agent loads too many instructions, examples, redundant rules, or unnecessary data, the prompt “weighs” more. Common causes:
  • You write long or repeated instructions.
  • You include too much history or “copy-paste” information into the context.
  • You receive tool responses with large payloads.
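A cheap first pass against the "repeated instructions" cause above is to deduplicate instruction lines before assembling the prompt. This is a minimal sketch (exact-match after whitespace normalization; it won't catch paraphrased duplicates):

```python
def dedupe_instructions(lines: list[str]) -> list[str]:
    """Drop exact-duplicate instruction lines (a common source of
    prompt bloat), preserving order. Whitespace and case are
    normalized before comparison."""
    seen: set[str] = set()
    out: list[str] = []
    for line in lines:
        key = " ".join(line.split()).lower()
        if key and key not in seen:
            seen.add(key)
            out.append(line)
    return out
```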

2) The further you travel, the more you spend

Conversations with many turns tend to accumulate useful context… and sometimes noise too. If your flow depends on long history, consider:
  • Summarizing or normalizing key information.
  • Saving only what you need for the next step (not the entire chat).
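The two points above can be sketched as a simple history compactor: keep the last few turns verbatim and collapse everything older into a placeholder summary. In a real flow you would generate that summary with a lightweight model; here the placeholder only records what was dropped:

```python
def compact_history(turns: list[str], keep_last: int = 4) -> list[str]:
    """Keep the most recent turns verbatim and collapse older ones
    into a one-line placeholder. A production version would replace
    the placeholder with an actual model-generated summary."""
    if len(turns) <= keep_last:
        return turns
    dropped = len(turns) - keep_last
    return [f"[Summary of {dropped} earlier turns]"] + turns[-keep_last:]
```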

3) The model also matters

“Larger” or more capable models tend to be more expensive to run. In general:
  • Use a specialized or lightweight model for simple tasks (routing, validations, extraction).
  • Reserve a more capable model for complex reasoning or richer generation.
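This split can be implemented as a trivial router in front of your model calls. The model names below are illustrative placeholders, not real identifiers:

```python
# Task types the source lists as good candidates for a lightweight model.
SIMPLE_TASKS = {"routing", "validation", "extraction"}

def pick_model(task_type: str) -> str:
    """Route simple tasks to a lightweight model and reserve a more
    capable (more expensive) one for complex reasoning. The returned
    names are placeholders for whatever models your platform offers."""
    return "light-model" if task_type in SIMPLE_TASKS else "capable-model"
```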

What typically drives consumption in a flow

  • Extensive prompts (especially if they include repeated text).
  • Integrations/APIs that return too much content (huge catalogs, logs, unfiltered JSONs).
  • Variables persisted without criteria that you include again in every step.
  • Long responses when the user only needs a short/structured output.

Warning signs

If you notice any of these symptoms, you are probably consuming more tokens than necessary:
| Symptom | Likely cause | Solution |
| --- | --- | --- |
| High latency (>5s) | Context too large | Reduce history, filter data |
| Truncated responses | Output limit reached | Request shorter or structured responses |
| "context length exceeded" error | Input exceeds the limit | Reduce instructions or context |
| Inconsistent responses | Too much noise in the prompt | Clean up and prioritize relevant information |
| Unexpected costs | Tool/API tokens | Filter integration responses |

Example: before vs after

Before (bloated prompt):

```
You are a virtual customer service assistant for the company
XYZ Corporation S.A. de C.V. which was founded in 1985 and has
a presence in 15 countries across Latin America. Your main objective
is to help customers with their queries in a friendly,
professional, and efficient manner. You should always be courteous and empathetic.
Remember that the customer is always right and you must treat them
with respect. If you don't know something, admit it honestly.

Here is the complete conversation history:
[500 lines of previous chat]

Here is the complete product catalog:
[2000 products with all their attributes]

Respond to the customer in a detailed and complete manner.
```
Problem: ~15,000+ tokens for input alone.
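An illustrative "after" rewrite of the same prompt, where the summarized history and filtered products are placeholders you would fill per request:

```
You are a customer service assistant for XYZ Corporation.
Be courteous, concise, and honest; if you don't know something, say so.

Relevant history (summarized):
[2-3 lines covering only the current issue]

Matching products:
[Only the 3-5 products relevant to the query]

Respond briefly and, where possible, in a structured format.
```

Result: a few hundred tokens of input instead of 15,000+, with no loss of the information the model actually needs.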
Use the OpenAI Tokenizer to analyze your prompts and find optimization opportunities.

Best practices for optimizing (without losing quality)

Make “lean” prompts

A good prompt tends to be:
  • Brief
  • With clear rules
  • Without unnecessary examples
  • With explicit output format (if applicable)

Decide where each piece of data lives: Context vs Memory

Not everything needs to persist.
  • Context: lives only during the current execution. Use it for temporary calculations and intermediate steps. See: Context
  • Memory: persists across conversations/skills with a time-to-live (TTL) control. Use it for data you truly reuse (e.g., preferences or identifiers). See: Memory
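The Context/Memory distinction above can be illustrated with a minimal key-value store with TTL. This is not Jelou's actual Memory API, only a sketch of the concept: values that outlive their TTL disappear, just as Context disappears after the current execution:

```python
import time

class MemoryStore:
    """Minimal sketch of a key-value memory with TTL (not Jelou's
    real Memory API). Expired entries behave like Context: gone."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[object, float]] = {}

    def set(self, key: str, value: object, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: drop it
            return default
        return value
```

The design point: only data you will genuinely reuse (preferences, identifiers) earns a non-trivial TTL; everything else stays in Context and costs you nothing on the next conversation.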

Limit the information you bring from tools

If you use APIs, filter from the source:
  • Request only the necessary fields.
  • Paginate results.
  • Avoid fetching blobs or complete catalogs “just in case.”
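When you cannot filter at the API itself, you can at least prune the payload before it reaches the model. A minimal sketch (the field names are hypothetical) that keeps only the fields you need and caps the item count:

```python
def filter_fields(items: list[dict], fields: tuple[str, ...], limit: int = 10) -> list[dict]:
    """Keep only the requested fields from each item of an API payload
    and cap the number of items, so tool output doesn't flood the
    model's context with unused data."""
    return [{k: item[k] for k in fields if k in item} for item in items[:limit]]
```

Ideally you combine both: ask the API for only the fields and page you need, then prune defensively anyway.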

Tokens and billing in Jelou

Even if you don't pay "per token" directly in all cases, optimizing tokens is still worthwhile: you reduce latency, noise in the context, inconsistent responses, and total infrastructure cost as you scale. Learn more about costs at: Pricing

Guiding principle

Every token should have real work to do. Decorative tokens = fuel evaporating.