Extended Thinking

New in 1.10

Give reasoning models more time and budget to deliberate, with optional access to thinking output

Table of contents

  1. What is Extended Thinking?
  2. Controlling Extended Thinking
    1. Effort and Budget
  3. Streaming with Thinking
  4. ActiveRecord Integration
    1. Upgrading Existing Installations
  5. Provider Notes
  6. Next Steps

After reading this guide, you will know:

  • How to control extended thinking with with_thinking
  • How effort and budget are sent to providers
  • How to access thinking output in responses and streams
  • How to persist thinking data with ActiveRecord

What is Extended Thinking?

Extended Thinking gives supported models more time and a larger computation budget to deliberate before answering. It can improve results on multi-step tasks like coding, math, and logic, at the expense of latency and cost. Some providers can also return a thinking trace or signature alongside the final answer.

Controlling Extended Thinking

Use with_thinking to control models that support thinking. Some models think by default, so with_thinking is for tuning (or disabling) rather than turning it on.

chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :high, budget: 8000)

response = chat.ask("What is 15 * 23?")

response.thinking&.text      # the reasoning trace, if the provider returns one
response.thinking&.signature # the thinking signature, if the provider returns one
response.content             # the final answer

with_thinking requires at least one of effort or budget:

chat.with_thinking(effort: :low)   # effort only
chat.with_thinking(budget: 10_000) # budget only
chat.with_thinking(effort: :none)  # disable thinking on models that think by default

Effort and Budget

Use effort to pick a qualitative depth (:low, :medium, or :high), and budget to set a token cap on models that accept one.

RubyLLM sends effort and budget exactly as provided. Check your provider’s docs for supported values.
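
For example, a minimal sketch of using each knob on its own. The model IDs here ('gemini-2.5-flash' and 'o4-mini') are illustrative; substitute whatever reasoning-capable models are in your registry:

# Gemini 2.5 takes a token budget for thinking (model ID is an example)
gemini_chat = RubyLLM.chat(model: 'gemini-2.5-flash').with_thinking(budget: 4_096)

# OpenAI reasoning models take a qualitative effort level (model ID is an example)
openai_chat = RubyLLM.chat(model: 'o4-mini').with_thinking(effort: :medium)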

Streaming with Thinking

Thinking content is delivered alongside normal content in streaming chunks:

chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end

Some providers only expose thinking in the final response. In those cases, response.thinking is populated after the stream completes, and chunk.thinking stays empty.
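
If you want the full trace either way, one approach is to buffer streamed thinking and fall back to the final response. A sketch, assuming (as in the non-streaming examples) that ask returns the final response:

thinking_text = +''

response = chat.ask("Show your working: what is 341 * 17?") do |chunk|
  thinking_text << chunk.thinking.text if chunk.thinking&.text
end

# Providers that only attach thinking at the end leave the chunks empty,
# so read the trace from the final response instead.
thinking_text = response.thinking.text if thinking_text.empty? && response.thinking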

ActiveRecord Integration

When using acts_as_chat and acts_as_message, thinking output is persisted to the messages table:

# Migration (generated automatically with new installs)
# t.text :thinking_text
# t.text :thinking_signature
# t.integer :thinking_tokens

response = chat_record.ask("Explain quantum entanglement")
response.thinking&.text
response.thinking_tokens
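
Because these are ordinary columns, you can read them back from the persisted record later. A sketch, assuming the messages association that acts_as_chat sets up:

message = chat_record.messages.last
message.thinking_text
message.thinking_signature
message.thinking_tokens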

Upgrading Existing Installations

If you are upgrading to 1.10, follow the upgrade guide and run the generator. If you prefer to migrate manually, add the new columns to your messages and tool_calls tables:

class AddThinkingToMessages < ActiveRecord::Migration[7.1]
  def change
    add_column :messages, :thinking_text, :text
    add_column :messages, :thinking_signature, :text
    add_column :messages, :thinking_tokens, :integer
    add_column :tool_calls, :thought_signature, :string
  end
end
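
To confirm the new columns are in place after migrating, you can use standard ActiveRecord schema introspection (not RubyLLM-specific) from a console:

conn = ActiveRecord::Base.connection
conn.column_exists?(:messages, :thinking_text)       # => true once migrated
conn.column_exists?(:tool_calls, :thought_signature) # => true once migrated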

Provider Notes

  • Claude models on Anthropic and Bedrock require a thinking budget and can return both thinking text and a signature; effort is optional.
  • Gemini 2.5 uses a token budget; Gemini 3 uses effort levels.
  • OpenAI reasoning models accept effort but may not return thinking text or signatures.
  • Perplexity Sonar reasoning models stream <think> blocks inside content; RubyLLM extracts them after the response completes.
  • Mistral Magistral models always think and ignore with_thinking params; passing them to non-Magistral Mistral models triggers a warning.
  • Ollama’s Qwen3 models think by default and only accept effort: :none to disable thinking (see the sketch after this list).
  • Anthropic, Bedrock’s Claude, and Ollama integrations currently do not report thinking token counts.
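
As referenced above, a minimal sketch of disabling default thinking on a local Qwen3 model served by Ollama (the model ID is illustrative):

chat = RubyLLM.chat(model: 'qwen3', provider: :ollama).with_thinking(effort: :none)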

Next Steps