Batches

New in 2.0

Answer thousands of chats at half price with provider-side batch processing

What are Batches?
Staging Questions
Submitting a Batch
Collecting the Answers
Tools in Batches
Rails Integration
Provider Notes
Next Steps

After reading this guide, you will know:

How to stage questions with ask_later
How to submit chats as a batch with RubyLLM.batch
How to check on a batch and collect its messages, from any process
How to handle tool calls in batched conversations
How to persist batch results with ActiveRecord

What are Batches?

Providers process batched requests asynchronously on their own schedule and charge half price in exchange. Most batches finish within an hour; processing ends within 24 hours, and any request still unfinished at that point comes back expired. Batches are the right tool whenever nobody is waiting on the answer: nightly classification runs, bulk summarization, evaluations, backfills.

A batch in RubyLLM is an array of chats, each ending on an unanswered question. Everything in the request (model, instructions, history, schemas, temperature, attachments) rides along, so there is nothing new to learn about building requests. The one exception is with_headers: batch APIs have no per-request HTTP headers, so custom headers set on a chat don’t apply to batched requests.

Staging Questions

ask_later is ask without the waiting: it adds your question to the conversation and returns the chat, leaving it awaiting a response.

chat = RubyLLM.chat(model: "claude-haiku-4-5").with_instructions("Be terse.").ask_later("What is 2 + 2?")
chat.complete? # => false, the model still owes a response

Submitting a Batch

Pass the staged chats to RubyLLM.batch. Submission happens immediately and returns a RubyLLM::Batch:

chats = documents.map do |doc|
  RubyLLM.chat(model: "claude-haiku-4-5")
    .with_instructions("Summarize the document in one paragraph.")
    .ask_later(doc.text)
end

batch = RubyLLM.batch(chats)
batch.id     # => "msgbatch_01EhcDuvb5XfWqcdJArbsfNX"
batch.status # => "in_progress"

Chats in one batch can use different models, instructions, schemas, and parameters; each request stands alone:

chats = tickets.map do |ticket|
  RubyLLM.chat(model: ticket.urgent? ? "claude-sonnet-4-5" : "claude-haiku-4-5")
    .with_instructions("You are #{ticket.team} support.")
    .ask_later(ticket.body)
end

batch = RubyLLM.batch(chats)

One provider per batch, though: submitting chats from different providers raises ArgumentError.
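
If your staged chats span several providers, grouping them first keeps that error out of the way. A minimal sketch with plain hashes standing in for chats; chats_by_provider is a hypothetical helper, not part of RubyLLM, and the block that reads the provider is left to you:

```ruby
# Hypothetical helper, not part of RubyLLM: split staged chats into one group
# per provider so that each group can be submitted on its own.
# The block returns the provider key for a chat.
def chats_by_provider(chats)
  chats.group_by { |chat| yield(chat) }
end

# Stand-in data for illustration only; real chats come from RubyLLM.chat.
staged = [
  { provider: :anthropic, question: "Summarize A" },
  { provider: :openai,    question: "Summarize B" },
  { provider: :anthropic, question: "Summarize C" }
]

groups = chats_by_provider(staged) { |chat| chat[:provider] }
groups.each { |provider, group| puts "#{provider}: #{group.size} chat(s)" }
```

Each group can then be handed to RubyLLM.batch separately; groups for providers without batch support would have to be answered synchronously instead.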

Collecting the Answers

Persist batch.id and walk away. From any process, any time later, look the batch up by id:

batch = RubyLLM.batch("msgbatch_01EhcDuvb5XfWqcdJArbsfNX", provider: :anthropic)
batch.complete? # => true

In a long-running process, just keep asking: complete? reloads the batch from the provider on each call until processing ends, then caches the result.

batch.complete? # => false, check back later
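
A script that blocks until the batch ends may want an upper bound on the wait. A minimal sketch, assuming only that the object responds to complete? as described above; wait_for_batch, its interval and timeout parameters, and the injectable sleeper are hypothetical and not part of RubyLLM:

```ruby
# Hypothetical helper, not part of RubyLLM: poll until the batch ends or the
# time budget is used up. It relies only on batch.complete?.
# Returns true when the batch ended, false when the wait timed out.
def wait_for_batch(batch, interval: 60, timeout: 24 * 60 * 60, sleeper: ->(seconds) { sleep(seconds) })
  waited = 0
  until batch.complete?
    return false if waited >= timeout # gave up; the batch is still running

    sleeper.call(interval)
    waited += interval
  end
  true
end
```

The sleeper argument exists so the helper can be exercised without real waiting; in normal use the default calls Kernel#sleep.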

Once processing ends, messages returns the responses in submission order:

batch.messages.each do |message|
  message.content       # => "The document describes..."
  message.input_tokens  # => 514
end

When you still hold the submitted chats, each message is also appended to its conversation, so the chats come back complete and ready to continue:

chats.first.messages.map(&:role) # => [:system, :user, :assistant]
chats.first.ask "Shorter, please."

Requests can fail individually (or expire at the 24-hour deadline) without failing the batch. Failed slots are nil in messages (details go to the log), and their chats stay awaiting a response; resubmit them in a fresh batch or finish them synchronously with complete.
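
Because messages keeps submission order, pairing it with the submitted chats shows which requests need another attempt. A sketch that works on plain arrays; split_results is a hypothetical helper, not a RubyLLM method:

```ruby
# Hypothetical helper: pair each submitted chat with its result slot and
# separate the answered chats from the ones whose slot is nil.
def split_results(chats, messages)
  answered, failed = chats.zip(messages).partition { |_chat, message| message }
  { answered: answered.map(&:first), failed: failed.map(&:first) }
end

# Placeholder values stand in for chats and messages here.
result = split_results(%w[chat_a chat_b chat_c], ["answer", nil, "answer"])
result[:failed] # => ["chat_b"]
```

The failed list is what you would pass to a fresh RubyLLM.batch call, since those chats are still awaiting a response.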

Batch results arrive as JSONL rather than individual HTTP responses, so message.raw on a batch message wraps only the result body: raw.body works as usual, but there is no status or headers.

You can stop a running batch with batch.cancel; already-processed requests still return results.

Token usage on batch results is billed at half the standard rates. message.cost doesn’t know about batch pricing yet and reports standard rates.
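
Until message.cost learns about batch pricing, the arithmetic is easy to do by hand. A rough sketch, assuming you supply the standard per-million-token prices yourself; batch_cost and its parameters are hypothetical, the prices in the usage line are made up, and cache pricing and other adjustments are ignored:

```ruby
# Hypothetical helper: estimate what a batched response costs.
# Prices are standard rates per million tokens, supplied by the caller;
# nothing here is looked up from RubyLLM or a provider.
def batch_cost(input_tokens:, output_tokens:, input_price:, output_price:)
  standard = (input_tokens * input_price + output_tokens * output_price) / 1_000_000.0
  standard / 2 # batches are billed at half the standard rate
end

# Made-up prices, for illustration only.
batch_cost(input_tokens: 1_000_000, output_tokens: 500_000, input_price: 1.0, output_price: 5.0) # => 1.75
```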

Tools in Batches

A batch generates one model turn. When the model asks for a tool, the round ends there (providers can’t call your Ruby code) and the response comes back with tool_call? returning true. You drive the rest of the agentic loop yourself between rounds, using one of two methods:

chat.complete runs the tools and finishes the conversation synchronously, at standard prices.
chat.run_tools runs the tools and stops, leaving the chat ready for the model again, i.e. for the next batch.

Looping run_tools into fresh batches runs entire agentic workloads at batch prices, one model turn per round:

chats = tickets.map { |t| support_chat(t).ask_later(t.body) }

loop do
  chats.each(&:run_tools)
  pending = chats.reject(&:complete?)
  break if pending.empty?

  batch = RubyLLM.batch(pending)
  sleep 60 until batch.complete?

  # batch.messages appends each answer to its chat; drop chats whose request
  # failed (a nil slot) so the loop can terminate.
  chats -= pending.zip(batch.messages).filter_map { |chat, message| chat unless message }
end

run_tools does nothing on chats without pending tool calls, and reject(&:complete?) keeps the chats heading into another round while finished conversations drop out.

For tools that need human approval before acting, don’t try to pause the loop. Ask inside the tool and return the outcome as its result, so the conversation stays valid:

def execute(id:)
  return "Approval not given; the user declined." unless approved?(id)

  Record.find(id).destroy!
  "Deleted record #{id}."
end

Rails Integration

Batch results flow through the same callbacks as synchronous responses, so acts_as_chat persistence works unchanged. ask_later works on your records too, and RubyLLM.batch takes the records directly: staged questions and collected answers land in the database with tokens and model attached.

Add a batch_id column to your chats (add_column :chats, :batch_id, :string) so a later job can find them:

chats = tickets.map do |ticket|
  Chat.create!(model: "claude-haiku-4-5").ask_later(ticket.body)
end

batch = RubyLLM.batch(chats)
Chat.where(id: chats).update_all(batch_id: batch.id)
BatchPollJob.perform_later(batch.id, provider: :anthropic)

When polling from another process, the original chat objects are gone; append the messages yourself:

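class BatchPollJob < ApplicationJob
  def perform(batch_id, provider:)
    batch = RubyLLM.batch(batch_id, provider: provider)

    # Not finished yet: check again in ten minutes.
    unless batch.complete?
      self.class.set(wait: 10.minutes).perform_later(batch_id, provider: provider)
      return
    end

    # Messages come back in submission order. Ordering by id matches that order
    # here because the chats were created in the order they were submitted
    # (assuming auto-incrementing ids).
    Chat.where(batch_id: batch_id).order(:id).zip(batch.messages) do |chat, message|
      chat.to_llm.add_completion(message) if message # nil marks a failed request
    end
  end
end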

add_completion is the out-of-band counterpart to a synchronous response: it appends the message and fires the persistence callbacks.

Provider Notes

Anthropic: up to 100,000 requests or 256 MB per batch. Mixed models in one batch are supported. Request validation is asynchronous: a malformed request comes back as a failed result after the batch ends, not as a submission error. Results stay downloadable for 29 days.
Other providers: not supported yet. RubyLLM.batch raises RubyLLM::Error for providers without batch support.
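
For workloads larger than the request limit, slicing the staged chats before submission is one way to stay under it. A sketch under stated assumptions: batch_sized_groups is a hypothetical helper, its default mirrors the Anthropic request count above, and it does not check the 256 MB size limit:

```ruby
# Hypothetical helper: cut a large list of staged chats into groups that each
# fit the per-batch request count. The 256 MB size limit is NOT checked here.
def batch_sized_groups(chats, limit: 100_000)
  chats.each_slice(limit).to_a
end

# Small numbers to show the shape of the result.
batch_sized_groups((1..5).to_a, limit: 2) # => [[1, 2], [3, 4], [5]]
```

Each group can then go to its own RubyLLM.batch call, leaving one batch id per group to persist.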