Batches
New in 2.0
Answer thousands of chats at half price with provider-side batch processing
Table of contents
- What are Batches?
- Staging Questions
- Submitting a Batch
- Collecting the Answers
- Tools in Batches
- Rails Integration
- Provider Notes
- Next Steps
After reading this guide, you will know:
- How to stage questions with ask_later
- How to submit chats as a batch with RubyLLM.batch
- How to check on a batch and collect its messages, from any process
- How to handle tool calls in batched conversations
- How to persist batch results with ActiveRecord
What are Batches?
Providers process batched requests asynchronously on their own schedule, usually within an hour. Processing ends within 24 hours, and anything still unfinished comes back expired. In exchange for the wait, they charge half price. Batches are the right tool whenever nobody is waiting on the answer: nightly classification runs, bulk summarization, evaluations, backfills.
A batch in RubyLLM is an array of chats, each ending on an unanswered question. Everything in the request (model, instructions, history, schemas, temperature, attachments) rides along, so there is nothing new to learn about building requests. The one exception is with_headers: batch APIs have no per-request HTTP headers, so custom headers set on a chat don’t apply to batched requests.
Staging Questions
ask_later is ask without the waiting: it adds your question to the conversation and returns the chat, leaving it awaiting a response.
chat = RubyLLM.chat(model: "claude-haiku-4-5").with_instructions("Be terse.").ask_later("What is 2 + 2?")
chat.complete? # => false, the model still owes a response
Submitting a Batch
Pass the staged chats to RubyLLM.batch. Submission happens immediately and returns a RubyLLM::Batch:
chats = documents.map do |doc|
  RubyLLM.chat(model: "claude-haiku-4-5")
    .with_instructions("Summarize the document in one paragraph.")
    .ask_later(doc.text)
end
batch = RubyLLM.batch(chats)
batch.id # => "msgbatch_01EhcDuvb5XfWqcdJArbsfNX"
batch.status # => "in_progress"
Chats in one batch can use different models, instructions, schemas, and parameters; each request stands alone:
chats = tickets.map do |ticket|
  RubyLLM.chat(model: ticket.urgent? ? "claude-sonnet-4-5" : "claude-haiku-4-5")
    .with_instructions("You are #{ticket.team} support.")
    .ask_later(ticket.body)
end
batch = RubyLLM.batch(chats)
One provider per batch, though: submitting chats from different providers raises ArgumentError.
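If staged chats could come from more than one provider, one way to avoid the ArgumentError is to group them before submitting. The sketch below uses a plain Struct as a stand-in for a chat; the `provider` accessor and the `StagedChat` name are hypothetical and only illustrate the grouping step, not how a real chat exposes its provider.

```ruby
# Stand-in for a staged chat. `provider` is a hypothetical accessor used
# only for this illustration.
StagedChat = Struct.new(:provider, :question)

# Split staged chats into one group per provider, keeping the original
# order inside each group.
def group_by_provider(chats)
  chats.group_by(&:provider)
end

staged = [
  StagedChat.new(:anthropic, "Summarize A"),
  StagedChat.new(:other, "Summarize B"),
  StagedChat.new(:anthropic, "Summarize C")
]

groups = group_by_provider(staged)
groups.each do |provider, chats|
  # In real code, each group with batch support would go into its own
  # RubyLLM.batch(chats) call.
  puts "#{provider}: #{chats.size} chat(s)"
end
```

Each group can then be submitted separately, so a stray chat from another provider never reaches the same batch.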
Collecting the Answers
Persist batch.id and walk away. From any process, any time later, look the batch up by id:
batch = RubyLLM.batch("msgbatch_01EhcDuvb5XfWqcdJArbsfNX", provider: :anthropic)
batch.complete? # => true
In a long-running process, just keep asking: complete? reloads from the provider until the batch ends, then caches.
batch.complete? # => false, check back later
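For a long-running process, a small polling helper with capped exponential backoff keeps the "just keep asking" loop from hitting the provider at a fixed rate. This is a minimal sketch: `poll_until` and its parameters are hypothetical names, not part of RubyLLM, and the block stands in for a call such as `batch.complete?`.

```ruby
# Call the block until it returns a truthy value, doubling the wait between
# attempts up to max_delay. Returns true on success, false once attempts
# run out. `sleeper` is injectable so the helper can run without real waits.
def poll_until(max_attempts: 10, base_delay: 60, max_delay: 900,
               sleeper: ->(seconds) { sleep(seconds) })
  delay = base_delay
  max_attempts.times do |attempt|
    return true if yield
    break if attempt == max_attempts - 1 # no need to wait after the last check

    sleeper.call(delay)
    delay = [delay * 2, max_delay].min
  end
  false
end

# Demonstration with a fake check that succeeds on the fourth call and a
# sleeper that records the waits instead of sleeping.
checks = 0
waits = []
done = poll_until(base_delay: 1, max_delay: 4, sleeper: ->(s) { waits << s }) do
  checks += 1
  checks >= 4
end
```

With a real batch, the call would look like `poll_until { batch.complete? }`, with delays chosen to suit how soon the results are needed.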
Once processing ends, messages returns the responses in submission order:
batch.messages.each do |message|
  message.content # => "The document describes..."
  message.input_tokens # => 514
end
When you still hold the submitted chats, each message is also appended to its conversation, so the chats come back complete and ready to continue:
chats.first.messages.map(&:role) # => [:system, :user, :assistant]
chats.first.ask "Shorter, please."
Requests can fail individually (or expire at the 24-hour deadline) without failing the batch. Failed slots are nil in messages (details go to the log), and their chats stay awaiting a response; resubmit them in a fresh batch or finish them synchronously with complete.
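Because results come back in submission order with nil in failed slots, pairing chats with messages is enough to build a retry queue. The sketch below uses strings and a Struct as stand-ins for real chats and messages; `split_results` is a hypothetical helper name.

```ruby
# Stand-in for a response message.
Message = Struct.new(:content)

# Pair each submitted chat with its result slot, then separate answered
# pairs from the chats whose slot is nil (failed or expired).
def split_results(chats, messages)
  answered, failed = chats.zip(messages).partition { |_chat, message| message }
  [answered, failed.map(&:first)]
end

chats = %w[chat_a chat_b chat_c]
messages = [Message.new("Summary A"), nil, Message.new("Summary C")]

answered, retry_queue = split_results(chats, messages)
answered.each { |chat, message| puts "#{chat}: #{message.content}" }
puts "To resubmit: #{retry_queue.join(', ')}"
```

The retry queue can then go into a fresh batch or be finished synchronously, as described above.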
Batch results arrive as JSONL rather than individual HTTP responses, so message.raw on a batch message wraps only the result body: raw.body works as usual, but there is no status or headers.
You can stop a running batch with batch.cancel; already-processed requests still return results.
Token usage on batch results is billed at half the standard rates. message.cost doesn’t know about batch pricing yet and reports standard rates.
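Until cost reporting accounts for batch pricing, one workaround is to halve the standard-rate figures yourself. The sketch below assumes you already have one numeric standard-rate cost per message; the helper name and the dollar amounts are hypothetical.

```ruby
# Batch usage is billed at half the standard rate.
BATCH_DISCOUNT = 0.5

# standard_costs: numeric costs computed at standard rates, one per message.
# Plain Floats are assumed here.
def estimated_batch_cost(standard_costs)
  standard_costs.sum * BATCH_DISCOUNT
end

# Hypothetical per-message costs in dollars, for illustration only.
standard_costs = [0.004, 0.006, 0.010]
estimate = estimated_batch_cost(standard_costs)
puts format("Estimated batch cost: $%.4f", estimate)
```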
Tools in Batches
A batch generates one model turn. When the model asks for a tool, the round ends there (providers can’t call your Ruby code) and the response comes back with tool_call? true. You drive the rest of the agentic loop yourself between rounds:
- chat.complete runs the tools and finishes the conversation synchronously, at standard prices.
- chat.run_tools runs the tools and stops, leaving the chat ready for the model again, i.e. for the next batch.
Looping run_tools into fresh batches runs entire agentic workloads at batch prices, one model turn per round:
chats = tickets.map { |t| support_chat(t).ask_later(t.body) }
loop do
  chats.each(&:run_tools)
  pending = chats.reject(&:complete?)
  break if pending.empty?
  batch = RubyLLM.batch(pending)
  sleep 60 until batch.complete?
  # batch.messages appends each answer to its chat; drop chats whose request
  # failed (a nil slot) so the loop can terminate.
  chats -= pending.zip(batch.messages).filter_map { |chat, message| chat unless message }
end
run_tools does nothing on chats without pending tool calls, and reject(&:complete?) keeps the chats heading into another round while finished conversations drop out.
For tools that need human approval before acting, don’t try to pause the loop. Ask inside the tool and return the outcome as its result, so the conversation stays valid:
def execute(id:)
  return "Approval not given; the user declined." unless approved?(id)
  Record.find(id).destroy!
  "Deleted record #{id}."
end
Rails Integration
Batch results flow through the same callbacks as synchronous responses, so acts_as_chat persistence works unchanged. ask_later works on your records too, and RubyLLM.batch takes the records directly: staged questions and collected answers land in the database with tokens and model attached.
Add a batch_id column to your chats (add_column :chats, :batch_id, :string) so a later job can find them:
chats = tickets.map do |ticket|
  Chat.create!(model: "claude-haiku-4-5").ask_later(ticket.body)
end
batch = RubyLLM.batch(chats)
Chat.where(id: chats).update_all(batch_id: batch.id)
BatchPollJob.perform_later(batch.id, provider: :anthropic)
When polling from another process, the original chat objects are gone; append the messages yourself:
class BatchPollJob < ApplicationJob
  def perform(batch_id, provider:)
    batch = RubyLLM.batch(batch_id, provider: provider)
    return self.class.set(wait: 10.minutes).perform_later(batch_id, provider: provider) unless batch.complete?
    Chat.where(batch_id: batch_id).order(:id).zip(batch.messages) do |chat, message|
      chat.to_llm.add_completion(message) if message
    end
  end
end
add_completion is the out-of-band counterpart to a synchronous response: it appends the message and fires the persistence callbacks.
Provider Notes
- Anthropic: up to 100,000 requests or 256 MB per batch. Mixed models in one batch are supported. Request validation is asynchronous: a malformed request comes back as a failed result after the batch ends, not as a submission error. Results stay downloadable for 29 days.
- Other providers: not supported yet.
RubyLLM.batch raises RubyLLM::Error for providers without batch support.
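For workloads larger than the per-batch request limit, the staged chats can be split into slices before submission. This is a minimal sketch that handles only the request-count limit; it does not measure payload size, so the 256 MB cap would need a separate check. `slice_for_batches` is a hypothetical helper, and integers stand in for staged chats.

```ruby
# Request-count limit per Anthropic batch, as listed above.
MAX_REQUESTS_PER_BATCH = 100_000

# Split a list of staged chats into slices that each fit the request limit.
def slice_for_batches(chats, limit: MAX_REQUESTS_PER_BATCH)
  chats.each_slice(limit).to_a
end

# Integers stand in for staged chats here.
slices = slice_for_batches((1..250_000).to_a)
slices.each_with_index do |slice, index|
  # In real code, each slice would go into its own RubyLLM.batch(slice) call.
  puts "batch #{index + 1}: #{slice.size} requests"
end
```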