Nothing kills developer flow like hitting a rate limit mid-task. You are deep in a complex refactoring session, the AI assistant has full context of your codebase, and suddenly: "Rate limit exceeded. Please wait before making more requests."
Understanding how rate limits work across different AI coding tools helps you plan your workflow, choose the right tool for each task, and avoid these productivity-killing interruptions.
Why Rate Limits Exist
Rate limits serve multiple purposes that ultimately benefit all users:
Infrastructure protection: AI models require significant computational resources. Limits prevent any single user from monopolizing capacity.
Fair access: Without limits, heavy users could degrade service quality for everyone else during peak times.
Cost management: For providers offering subscription tiers, limits create natural boundaries between pricing levels.
Quality control: Limits can help ensure models have sufficient compute time to generate high-quality responses rather than being rushed.
The challenge for developers is that different tools implement limits differently: some count tokens, others count requests, and some measure compute. Knowing which system each tool uses is essential for planning your workflow.
How Rate Limits Work
AI coding tools use four main approaches to rate limiting:
Token-Based Limits
Token-based systems measure the total amount of text processed, counting both your prompts (input tokens) and the AI's responses (output tokens).
How it works: Text is split into tokens, and every word, punctuation mark, and code symbol counts toward your limit. A common word is often a single token, while unusual terms and long identifiers may split into several.
Example calculation:
Your prompt: 500 tokens
AI response: 1,200 tokens
Total consumed: 1,700 tokens
Affected tools: Claude Code uses token-based limits tied to subscription tiers.
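The example calculation above can be expressed as a small tracker. This is a minimal sketch: the `TokenBudget` class and the 100,000-token quota are illustrative assumptions, not figures or APIs published by any tool.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks token usage against a hypothetical quota."""

    limit: int
    used: int = 0

    def record(self, prompt_tokens: int, response_tokens: int) -> int:
        """Add one exchange and return the tokens it consumed."""
        consumed = prompt_tokens + response_tokens
        self.used += consumed
        return consumed

    @property
    def remaining(self) -> int:
        """Tokens left before the quota is exhausted (never negative)."""
        return max(self.limit - self.used, 0)


budget = TokenBudget(limit=100_000)  # illustrative quota, not a real tier limit
consumed = budget.record(prompt_tokens=500, response_tokens=1_200)
print(consumed)          # 1700
print(budget.remaining)  # 98300
```

Input and output both draw from the same pool, so a long response costs as much quota as a long prompt.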
Request-Based Limits
Request-based systems count the number of individual API calls, regardless of how much text each request contains.
How it works: Whether you ask "what is 2+2?" or paste your entire codebase for analysis, each counts as one request.
Affected tools: Gemini CLI uses request-based limits on the free tier.
Time-Window Limits
Time-window limits cap usage within a rolling period, such as per minute, per hour, or per 5-hour window.
How it works: You might have 60 requests per minute. Once you hit 60, you wait until the oldest request "ages out" of the window before making another.
Affected tools: Gemini CLI (per-minute limits), Codex CLI (5-hour windows).
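To make the "ages out" behavior concrete, here is a sketch of a sliding-window counter. It models the idea described above on the client side; it is not how any specific provider implements its limits, and the class and method names are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_requests within any window_seconds-long span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()  # send times, oldest first

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the window has room."""
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False

    def seconds_until_free(self, now: float) -> float:
        """How long until the oldest request ages out (0 if there is room)."""
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self._timestamps[0] + self.window_seconds - now
```

With 60 requests per minute and one request sent each second from t=0 to t=59, a request at t=59.5 is refused and `seconds_until_free(59.5)` reports 0.5 seconds, the point at which the t=0 request leaves the window.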
Compute-Based Limits
Compute-based systems measure the actual processing resources consumed, which varies by model complexity and task type.
How it works: Using a more powerful model or requesting "extended thinking" consumes more of your quota than a simple query to a smaller model.
Affected tools: GitHub Copilot's premium request system works this way.
Claude Code Rate Limits
Claude Code uses a token-based system with two subscription tiers that reset on different schedules.
Pro Tier ($20/month)
Limit type: Token-based daily quota
Reset timing: Midnight UTC (not your local time)
Approximate capacity: ~45 long messages per day (equivalent usage)
Model access: Claude Sonnet 4, Claude Haiku, limited Opus 4.5 access
When limits are exhausted: Claude automatically falls back to lower-tier models (Sonnet instead of Opus), then shows rate limit errors.
Max Tier ($100/month)
Limit type: Token-based weekly quota
Reset timing: Weekly (check your account for exact day)
Approximate capacity: 5x Pro's allowance (~225 long messages per day equivalent)
Model access: Full access to all models including Opus 4.5 with priority
When limits are exhausted: Same degradation pattern as Pro, but takes much longer to reach.
Model-Specific Consumption Rates
Not all Claude models consume quota equally:
| Model | Relative Cost | Best Use Case |
|---|---|---|
| Opus 4.5 | 1x (baseline, highest) | Complex architecture, security reviews |
| Sonnet 4 | ~0.2x | Most daily coding tasks |
| Haiku | ~0.05x | Quick questions, simple edits |
Practical implication: If Opus allows ~20 complex requests per day on Pro, Sonnet might allow 100+ requests.
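The arithmetic behind that implication is a simple division by the relative cost. The sketch below uses the approximate multipliers from the table; the function name is hypothetical and the results are rough estimates, since real consumption depends on message length.

```python
# Approximate relative quota cost per request, from the table above.
RELATIVE_COST = {"opus": 1.0, "sonnet": 0.2, "haiku": 0.05}


def equivalent_requests(opus_requests: int, model: str) -> int:
    """Convert a budget expressed in Opus requests into requests for another model."""
    return round(opus_requests / RELATIVE_COST[model])


for model in RELATIVE_COST:
    print(model, equivalent_requests(20, model))
# opus 20
# sonnet 100
# haiku 400
```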
Warning Signs
- Slower responses before the hard limit
- Automatic model downgrades (you requested Opus, got Sonnet)
- "Rate limit exceeded" messages with countdown timers
- Inability to switch to Opus with the /model opus command
For detailed troubleshooting, see our guide on fixing Claude Code rate limits.
Gemini CLI Rate Limits
Gemini CLI stands out as the only major AI coding CLI with a genuinely free tier. However, that free tier comes with limits that were notably reduced in late 2024.
Free Tier Limits
Daily requests: ~100-250 requests per day (down from ~1,000 previously)
Per-minute requests: 10-15 requests per minute
Context window: 1 million tokens (largest available across all tools)
Reset timing: Rolling 24-hour window
Pro vs Flash Model Limits
The model you choose affects your effective limits:
| Model | Free Tier Daily | Per-Minute |
|---|---|---|
| Gemini 2.5 Pro | 50-100 | 5-10 |
| Gemini 2.0 Pro | 100-250 | 10-15 |
| Gemini 2.0 Flash | 250-500 | 30-60 |
Key insight: Flash models have significantly higher limits. Use Flash for most tasks and reserve Pro for complex reasoning.
Vertex AI (Paid) Limits
Switching to Vertex AI removes shared capacity constraints:
- Dedicated quota based on your billing tier
- No automatic model downgrading
- Limits can be increased on request
- SLA guarantees for production workloads
Cost: Pay-as-you-go pricing (Gemini 2.0 Flash: $0.10/1M input tokens, $0.40/1M output tokens).
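For budgeting, the pay-as-you-go rates quoted above translate into a one-line estimate. This sketch hard-codes the Gemini 2.0 Flash prices from this article; the function name is hypothetical, and you should check current pricing before relying on the numbers.

```python
# Gemini 2.0 Flash prices quoted above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from its token counts."""
    cost = (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )
    return round(cost, 4)


# 2M input tokens and 500K output tokens: $0.20 + $0.20
print(estimate_cost(2_000_000, 500_000))  # 0.4
```

Output tokens cost four times as much as input tokens at these rates, so long generated responses dominate the bill faster than large prompts do.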
Warning Signs
- "Quota exceeded" error messages
- Automatic fallback from Pro to Flash
- Longer delays between request and response
- HTTP 429 errors in verbose mode
For model switching strategies, see our guide on switching between Gemini models.
OpenAI Codex CLI Rate Limits
Codex CLI ties directly to your ChatGPT subscription and uses a hybrid system with both standard and "premium" request pools.
ChatGPT Plus ($20/month)
Standard requests: Generally unlimited for GPT-4o (but may be throttled under high demand)
Premium requests: 30-150 messages per 5-hour window
Reset timing: Rolling 5-hour window
What counts as premium:
- o1 model usage
- Extended thinking mode
- Complex multi-step reasoning tasks
ChatGPT Pro ($200/month)
Standard requests: Unlimited
Premium requests: Significantly higher than Plus (exact limits vary)
Reset timing: Rolling 5-hour window
Additional benefits: Priority access during peak times, no slowdowns
Model-Specific Considerations
Different models consume your quota differently:
| Model | Request Type | Notes |
|---|---|---|
| GPT-4o | Standard | Generally unlimited |
| GPT-4o mini | Standard | Most efficient option |
| o1 | Premium | Each request costs multiple premium units |
| o1-pro | Premium | Highest cost per request |
Warning Signs
- "You've reached your limit" messages
- Noticeably slower response times
- Suggestions to "try again later"
- Model availability changes mid-session
For installation and authentication details, see our guide on installing OpenAI Codex CLI.
GitHub Copilot CLI Rate Limits
Copilot CLI uses a "premium request" system that counts against a monthly allocation rather than daily or hourly windows.
Premium Request System
Pro ($10/month): 300 premium requests per month
Pro+ ($39/month): 1,500 premium requests per month
Business/Enterprise: Custom allocations, typically higher
Overage cost: $0.04 per premium request beyond your allocation
What Counts as Premium
| Model | Premium Cost |
|---|---|
| Claude Sonnet 4.5 (default) | 1 request |
| GPT-4.5 | 1 request |
| o1 | Multiple requests |
| Claude Opus | Multiple requests |
Optimization tip: Stick with Claude Sonnet 4.5 (the default model) for most tasks. It is highly capable and only costs 1 premium request.
Monthly Reset
Unlike other tools with daily or rolling windows, Copilot's limits reset on your billing cycle, typically the first of each month.
Budgeting implication: You can plan your monthly AI assistance usage more predictably. At 300 requests/month on Pro, that is roughly 10 requests per day.
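The same budgeting math, including overage, fits in a few lines. This is a sketch using the allocations and the $0.04 overage rate listed above; the 30-day cycle and the function names are assumptions for illustration.

```python
# Monthly premium request allocations and overage rate from the lists above.
PLANS = {"pro": 300, "pro_plus": 1_500}
OVERAGE_PER_REQUEST = 0.04  # USD


def daily_allowance(plan: str, days_in_cycle: int = 30) -> float:
    """Average premium requests per day if usage is spread evenly."""
    return PLANS[plan] / days_in_cycle


def overage_cost(plan: str, requests_used: int) -> float:
    """USD billed for requests beyond the plan's monthly allocation."""
    extra = max(requests_used - PLANS[plan], 0)
    return round(extra * OVERAGE_PER_REQUEST, 2)


print(daily_allowance("pro"))       # 10.0
print(daily_allowance("pro_plus"))  # 50.0
print(overage_cost("pro", 350))     # 2.0
```

Going 50 requests over on Pro costs $2.00, which is worth knowing before deciding whether an occasional overage is cheaper than moving up a tier.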
Warning Signs
- Premium request counter in account settings approaching limit
- Warnings when selecting expensive models
- Degraded model availability late in billing cycle
For setup instructions, see our guide on installing GitHub Copilot CLI.
Comparison Table
| Tool | Limit Type | Reset Period | Warning System | Overage Handling |
|---|---|---|---|---|
| Claude Code (Pro) | Token-based | Daily (midnight UTC) | Model downgrades, then errors | Hard stop, wait for reset |
| Claude Code (Max) | Token-based | Weekly | Model downgrades, then errors | Hard stop, wait for reset |
| Gemini CLI (Free) | Request-based | Rolling 24hr + per-minute | HTTP 429 errors, slowdowns | Throttled, then blocked |
| Gemini CLI (Vertex) | Token-based | Pay-as-you-go | None (unlimited within quota) | Billed automatically |
| Codex CLI (Plus) | Hybrid | Rolling 5-hour | Slowdowns, model unavailability | Temporary blocks |
| Codex CLI (Pro) | Hybrid | Rolling 5-hour | Very high limits before warnings | Temporary slowdowns |
| Copilot CLI (Pro) | Premium requests | Monthly | Counter in account settings | $0.04/request overage |
| Copilot CLI (Pro+) | Premium requests | Monthly | Counter in account settings | $0.04/request overage |
Signs You Are Hitting Limits
Recognizing rate limit symptoms early helps you adjust before losing your flow completely.
Degraded Model Access
The clearest sign is being forced to use a less capable model:
- You request Opus but get Sonnet
- Pro models unavailable, only Flash responding
- "Model unavailable" or "try a different model" messages
Slower Responses
Before hard limits hit, many systems throttle:
- Responses take noticeably longer (30+ seconds instead of 5-10)
- "Thinking" indicators spin longer than usual
- Multiple retries needed for complex requests
Error Messages
Explicit rate limit errors are unmistakable:
Rate limit exceeded. Please wait before making more requests.
You've reached your usage limit for this period.
Resets in: 4 hours 23 minutes
HTTP 429: Too Many Requests
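If you script against these tools or their APIs, the usual response to a 429 is to retry with exponential backoff. The sketch below is generic: `RateLimitError` is a hypothetical exception standing in for whatever your client raises on HTTP 429, and `retry_after` stands in for a server-provided wait time, if one is available.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by a client wrapper on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("HTTP 429: Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimitError with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # trust the server's hint
            else:
                # Double the delay each attempt, capped, plus up to 10% jitter.
                delay = min(base_delay * 2 ** attempt, max_delay)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the function can be tested without waiting. The jitter spreads out retries so that several scripts hitting the same limit do not all retry at the same instant.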
Feature Restrictions
Some tools disable features when approaching limits:
- Extended thinking mode unavailable
- File reading/writing disabled
- Context window reduced
- Web search grounding disabled
Strategies to Stay Productive
1. Strategic Model Selection
Use cheaper models for simpler tasks:
| Task | Recommended Model |
|---|---|
| Quick syntax questions | Haiku, Flash, GPT-4o mini |
| Code generation | Sonnet, Flash, GPT-4o |
| Complex debugging | Opus, Pro, o1 |
| Architecture decisions | Opus, 2.5 Pro, o1 |
Claude Code: Default to Sonnet, switch to Opus only for genuinely complex reasoning.
Gemini CLI: Start with Flash, escalate to Pro when needed.
Codex CLI: Use GPT-4o for most tasks, reserve o1 for complex problems.
Copilot CLI: Stick with Claude Sonnet 4.5 (default) unless you specifically need Opus capabilities.
2. Request Batching
Instead of many small requests:
Inefficient:
"What does function X do?"
"What does function Y do?"
"How do X and Y interact?"
Efficient:
"Explain functions X and Y and how they interact. Include their inputs, outputs, and any shared state."
3. Tool Rotation
When one tool approaches its limit, switch to another:
- Primary work: Claude Code (highest capability)
- Research/exploration: Gemini CLI (free tier, 1M context)
- GitHub workflows: Copilot CLI (native integration)
- Code review: Codex CLI (/review command)
4. Time-Zone Optimization
Different reset schedules create opportunities:
- Claude Code: Resets at midnight UTC (local times below are for standard time; add one hour during daylight saving time)
  - US Pacific: 4:00 PM previous day
  - US Eastern: 7:00 PM previous day
  - Central Europe: 1:00 AM
  Plan heavy Claude usage for after your local reset time.
- Copilot CLI: Resets on billing cycle (typically 1st of month)
  Schedule intensive Copilot work for early in your billing cycle.
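To find when midnight UTC falls in your own time zone on a given date, you can let Python's standard library handle the conversion, including daylight saving shifts. This is a small sketch; the function name is hypothetical and the zone names are standard IANA identifiers.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_reset_time(day: date, tz_name: str) -> datetime:
    """Return midnight UTC on `day`, expressed in the given IANA time zone."""
    midnight_utc = datetime.combine(day, time(0, 0), tzinfo=timezone.utc)
    return midnight_utc.astimezone(ZoneInfo(tz_name))


winter = date(2025, 1, 15)
for zone in ("America/Los_Angeles", "America/New_York", "Europe/Berlin"):
    print(zone, local_reset_time(winter, zone).strftime("%Y-%m-%d %H:%M"))
# America/Los_Angeles 2025-01-14 16:00
# America/New_York 2025-01-14 19:00
# Europe/Berlin 2025-01-15 01:00
```

Running the same function for a summer date such as `date(2025, 7, 15)` gives 17:00 in Los Angeles, because the UTC reset does not move with local daylight saving time.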
5. Context Efficiency
Reduce token consumption by providing focused context:
# Instead of loading everything
claude "review this codebase for security issues"
# Point the prompt at only the relevant files
claude "review the authentication logic in src/auth/"
When You Hit the Wall
Despite best efforts, you will occasionally exhaust your limits. Here is how to recover:
Fallback Options
If Claude is limited:
- Switch to Gemini CLI (free tier) for exploration
- Use Codex CLI for implementation tasks
- Use Copilot CLI for GitHub-related work
If Gemini is limited:
- Codex CLI for lighter tasks
- Reserve Claude for critical work
- Wait for per-minute limits to roll over
If Codex is limited:
- Gemini free tier for research
- Claude for implementation
- Wait for 5-hour window to reset
If Copilot is limited:
- Pay $0.04/request overage for critical work
- Switch to other tools until billing cycle resets
- Consider upgrading to Pro+ for 5x the allocation
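The fallback lists above amount to a lookup table, which you could encode in a helper script. This is a sketch: the ordering follows the lists in this section, except the Copilot row, whose ordering is an assumption because the text only says "other tools". The function name is hypothetical.

```python
from typing import Optional

# Ordered fallbacks per tool, following the lists above.
FALLBACKS = {
    "claude": ["gemini", "codex", "copilot"],
    "gemini": ["codex", "claude"],
    "codex": ["gemini", "claude"],
    "copilot": ["claude", "gemini", "codex"],  # assumed order
}


def pick_tool(preferred: str, limited: set) -> Optional[str]:
    """Return the preferred tool, or its first fallback that is not rate limited."""
    if preferred not in limited:
        return preferred
    for candidate in FALLBACKS.get(preferred, []):
        if candidate not in limited:
            return candidate
    return None  # everything is limited: wait for a reset


print(pick_tool("claude", {"claude"}))            # gemini
print(pick_tool("claude", {"claude", "gemini"}))  # codex
```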
Multi-Tool Strategy
The most resilient approach combines tools:
# Morning: Use Claude (fresh daily quota)
claude "refactor the payment processing module"
# Midday: Switch to Gemini for exploration
gemini "analyze how error handling works across the codebase"
# Afternoon: Codex for code generation
codex "write tests for the refactored payment module"
# Evening: Copilot for GitHub workflows
copilot "create a PR description for today's changes"
Upgrading Tiers
When rate limits consistently block your work, consider upgrading:
| Current Tier | Upgrade Option | Cost Increase | Benefit |
|---|---|---|---|
| Claude Pro | Claude Max | +$80/month | 5x usage, weekly reset |
| Codex Plus | Codex Pro | +$180/month | Much higher premium limits |
| Copilot Pro | Copilot Pro+ | +$29/month | 5x premium requests |
| Gemini Free | Vertex AI | Pay-as-you-go | No shared limits |
Cost-benefit analysis: If rate limits cost you 2+ hours of productivity per month, upgrades often pay for themselves.
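One way to test that claim for yourself is to compute the hourly value of your time at which each upgrade breaks even. This sketch uses the cost increases from the table above and the 2-hours-lost figure; what an hour of your time is worth is your own input, and the function name is hypothetical.

```python
def break_even_hourly_rate(extra_cost: float, hours_lost: float) -> float:
    """Hourly value of time at which recovered hours equal the upgrade cost."""
    if hours_lost <= 0:
        raise ValueError("hours_lost must be positive")
    return extra_cost / hours_lost


# Monthly cost increases from the table above, in USD.
UPGRADES = {
    "Claude Pro -> Max": 80,
    "Codex Plus -> Pro": 180,
    "Copilot Pro -> Pro+": 29,
}

for name, extra_cost in UPGRADES.items():
    rate = break_even_hourly_rate(extra_cost, hours_lost=2)
    print(f"{name}: ${rate:.2f}/hour")
# Claude Pro -> Max: $40.00/hour
# Codex Plus -> Pro: $90.00/hour
# Copilot Pro -> Pro+: $14.50/hour
```

If an hour of your time is worth more than the printed rate, the upgrade pays for itself at 2 lost hours per month. The spread between tools shows why "often" is the right word rather than "always".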
Conclusion
Rate limits are an unavoidable part of AI coding tool usage, but understanding how they work transforms them from mysterious blockers into manageable constraints.
Key takeaways:
- Know your reset times: Claude resets at midnight UTC daily (Pro) or weekly (Max). Copilot resets monthly. Codex uses rolling 5-hour windows. Gemini has both daily and per-minute limits.
- Choose models strategically: Not every task needs the most powerful model. Using Sonnet instead of Opus, or Flash instead of Pro, dramatically extends your effective limits.
- Rotate between tools: Each tool has different limits on different schedules. Using 2-3 tools in combination virtually eliminates rate limit interruptions.
- Plan around limits: Heavy Claude usage early in the day, Gemini for afternoon research, Copilot for end-of-day PR workflows.
- Monitor proactively: Watch for warning signs (slower responses, model downgrades) before hitting hard limits.
The developers who maintain productivity with AI tools are not necessarily those with the highest subscription tiers; they are those who understand the systems well enough to work within them efficiently.
For tool-specific guidance, explore our Knowledge Base articles on Claude Code, Gemini CLI, Codex CLI, and Copilot CLI.