DESIGN 0002: GitHub API Rate Limit Handling
Status: Approved
Author: Donald Gifford
Date: 2026-03-01
Summary
Add transparent rate limit handling to the GitHub API client via an
http.RoundTripper middleware that wraps the ghinstallation transport.
Provides pre-emptive throttling, primary rate limit retry, and secondary rate
limit retry without changes to the Client interface.
Problem Statement
The app currently logs rate limit info from every GitHub API response
(logRateLimit in client.go) but takes no action. Workers process jobs with
zero delay between API calls. On startup, the scheduler enqueues all repos and
workers hammer the API -- burning ~350 calls/minute during initial
reconciliation. There is no backoff, no pre-emptive throttling, and no retry on
rate limit errors.
Design
Transport Chain
go-github -> rateLimitTransport (NEW) -> ghinstallation -> http.DefaultTransport
Each installation gets its own transport instance. This matches GitHub's model, where rate limits are tracked per installation.
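The ordering of the chain can be illustrated with stand-in layers. This is a minimal sketch, not the real wiring: `roundTripperFunc`, `layer`, and `callOrder` are hypothetical helpers, the ghinstallation layer is replaced by a pass-through stub, and the base transport answers locally so no network call is made.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripperFunc adapts a plain function to http.RoundTripper.
// Hypothetical helper for this sketch only.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// layer wraps next and records its own name in trace before delegating.
func layer(name string, next http.RoundTripper, trace *[]string) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		*trace = append(*trace, name)
		return next.RoundTrip(r)
	})
}

// callOrder builds the three-layer chain from stand-ins and returns the
// order in which the layers saw a single request.
func callOrder() ([]string, error) {
	var trace []string
	// Stand-in for http.DefaultTransport: responds locally, no network I/O.
	base := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		trace = append(trace, "base")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
	chain := layer("rateLimitTransport", layer("ghinstallation", base, &trace), &trace)

	req, err := http.NewRequest(http.MethodGet, "https://example.invalid/", nil)
	if err != nil {
		return nil, err
	}
	resp, err := chain.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()
	return trace, nil
}

func main() {
	order, err := callOrder()
	fmt.Println(order, err)
}
```

Because the rate limit layer is outermost, it sees every request before authentication is applied and every response after it returns, which is what lets it delay or retry without the inner layers knowing.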
rateLimitTransport
    type rateLimitTransport struct {
        next      http.RoundTripper
        logger    *slog.Logger
        threshold float64 // e.g. 0.10 = start throttling at 10% remaining

        mu        sync.Mutex // guards the fields below
        remaining int
        limit     int
        resetAt   time.Time
    }
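The mutable fields (remaining, limit, resetAt) are filled in from response headers. The following is a rough sketch of that update under the mutex. `rateState`, `update`, `snapshot`, and `headers` are hypothetical names, and ignoring incomplete or malformed headers is an assumption rather than something the design specifies.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync"
	"time"
)

// rateState mirrors the mutable fields of rateLimitTransport.
type rateState struct {
	mu        sync.Mutex
	remaining int
	limit     int
	resetAt   time.Time
}

// update copies the X-RateLimit-* values from h into the state. If any
// header is missing or malformed, the previous values are kept (assumption).
func (s *rateState) update(h http.Header) {
	limit, errL := strconv.Atoi(h.Get("X-RateLimit-Limit"))
	remaining, errR := strconv.Atoi(h.Get("X-RateLimit-Remaining"))
	reset, errT := strconv.ParseInt(h.Get("X-RateLimit-Reset"), 10, 64)
	if errL != nil || errR != nil || errT != nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.limit, s.remaining, s.resetAt = limit, remaining, time.Unix(reset, 0)
}

// snapshot returns a consistent copy of the state.
func (s *rateState) snapshot() (remaining, limit int, resetUnix int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.remaining, s.limit, s.resetAt.Unix()
}

// headers builds an http.Header from key/value pairs (helper for the demo).
func headers(kv ...string) http.Header {
	h := http.Header{}
	for i := 0; i+1 < len(kv); i += 2 {
		h.Set(kv[i], kv[i+1])
	}
	return h
}

func main() {
	s := &rateState{}
	s.update(headers(
		"X-RateLimit-Limit", "5000",
		"X-RateLimit-Remaining", "4999",
		"X-RateLimit-Reset", "1700000000",
	))
	fmt.Println(s.snapshot())
}
```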
The transport does three things:
- Pre-emptive throttle: When remaining falls below the threshold (10% of limit by default), spread the remaining budget evenly until reset.
- Primary rate limit retry: On 403 with X-RateLimit-Remaining: 0, wait until reset, then retry once.
- Secondary rate limit retry: On 403 with a Retry-After header, wait that duration, then retry once.
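The two retry cases differ only in which header supplies the delay. A minimal sketch of that decision follows, assuming Retry-After is checked first when both headers are present (the design does not state an order). `retryDelay`, `fakeResponse`, and `demoNow` are hypothetical names, and the clock is fixed so the example is deterministic.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// demoNow is a fixed clock reading used in place of time.Now().
var demoNow = time.Unix(1700000000, 0)

// fakeResponse builds a body-less response from key/value header pairs.
func fakeResponse(status int, kv ...string) *http.Response {
	h := http.Header{}
	for i := 0; i+1 < len(kv); i += 2 {
		h.Set(kv[i], kv[i+1])
	}
	return &http.Response{StatusCode: status, Header: h, Body: http.NoBody}
}

// retryDelay reports how long to wait before the single retry and why.
// ok is false when resp is not recognised as a rate limit response.
func retryDelay(resp *http.Response, now time.Time) (wait time.Duration, reason string, ok bool) {
	if resp.StatusCode != http.StatusForbidden {
		return 0, "", false
	}
	// Secondary limit: Retry-After carries a delay in seconds.
	if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
		return time.Duration(secs) * time.Second, "secondary", true
	}
	// Primary limit: budget exhausted, wait for the reset timestamp.
	if resp.Header.Get("X-RateLimit-Remaining") == "0" {
		reset, err := strconv.ParseInt(resp.Header.Get("X-RateLimit-Reset"), 10, 64)
		if err != nil {
			return 0, "", false
		}
		wait = time.Unix(reset, 0).Sub(now)
		if wait < time.Second {
			wait = time.Second // floor guards against clock skew
		}
		return wait, "primary", true
	}
	return 0, "", false
}

func main() {
	primary := fakeResponse(403, "X-RateLimit-Remaining", "0", "X-RateLimit-Reset", "1700000030")
	fmt.Println(retryDelay(primary, demoNow))

	secondary := fakeResponse(403, "Retry-After", "7")
	fmt.Println(retryDelay(secondary, demoNow))
}
```

The returned reason string could feed the reason label on the waits counter.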
RoundTrip Flow
1. waitIfNeeded(ctx) -- if remaining < limit * threshold, sleep to spread budget until reset. If remaining == 0, sleep until reset.
2. next.RoundTrip(req) -- make the actual request.
3. updateFromResponse(resp) -- parse X-RateLimit-* headers, update state + Prometheus gauge.
4. If rate limited (403): compute delay from X-RateLimit-Reset or Retry-After, log at WARN, sleep (respecting ctx), retry once.
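The delay computed in the waitIfNeeded step can be sketched as a pure function. This is one interpretation of "spread budget until reset" (time left divided by calls left), not a confirmed implementation. `spreadDelay` is a hypothetical name, and treating a non-positive time-to-reset as 1s is an assumption drawn from the clock skew edge case.

```go
package main

import (
	"fmt"
	"time"
)

// spreadDelay returns how long to sleep before the next request.
//   - limit == 0: nothing is known yet, so do not wait.
//   - remaining == 0: wait for the whole window.
//   - remaining below limit*threshold: divide the time left by the calls left.
func spreadDelay(remaining, limit int, threshold float64, untilReset time.Duration) time.Duration {
	if limit == 0 {
		return 0
	}
	if untilReset <= 0 {
		untilReset = time.Second // assumed floor for clock skew
	}
	switch {
	case remaining <= 0:
		return untilReset
	case float64(remaining) < float64(limit)*threshold:
		return untilReset / time.Duration(remaining)
	default:
		return 0
	}
}

func main() {
	// 50 calls left out of 5000 with 100s until reset: one call every 2s.
	fmt.Println(spreadDelay(50, 5000, 0.10, 100*time.Second))
	// Plenty of budget left: no delay.
	fmt.Println(spreadDelay(4000, 5000, 0.10, 100*time.Second))
}
```

Keeping the arithmetic free of locks and clocks makes the pre-emptive throttle test case a plain table-driven check.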
Edge Cases
- First request (limit == 0): skip pre-emptive check.
- Negative wait (clock skew): floor at 1s.
- Concurrent access: mutex protects state, released before sleeping.
- All sleeps use select with ctx.Done() for cancellation.
- Max 1 retry. Request body replayed via req.GetBody() on retry.
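The last two edge cases can be sketched as follows: a cancellable sleep, and a request clone whose body is rewound through GetBody. `sleepCtx`, `cloneForRetry`, `sleepCancelled`, and `replayedBody` are hypothetical names, the URL is a placeholder, and returning an error when GetBody is unavailable is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// sleepCtx waits for d or until ctx is cancelled, whichever comes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// cloneForRetry copies req and rewinds its body through GetBody.
func cloneForRetry(req *http.Request) (*http.Request, error) {
	retry := req.Clone(req.Context())
	if req.Body == nil || req.Body == http.NoBody {
		return retry, nil
	}
	if req.GetBody == nil {
		return nil, errors.New("request body cannot be replayed")
	}
	body, err := req.GetBody()
	if err != nil {
		return nil, err
	}
	retry.Body = body
	return retry, nil
}

// sleepCancelled shows that a cancelled context ends a long sleep at once.
func sleepCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return sleepCtx(ctx, time.Hour)
}

// replayedBody drains a request body, clones the request, and returns
// what the retry would send.
func replayedBody(payload string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, "https://example.invalid/", strings.NewReader(payload))
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(io.Discard, req.Body); err != nil { // first attempt consumes the body
		return "", err
	}
	retry, err := cloneForRetry(req)
	if err != nil {
		return "", err
	}
	data, err := io.ReadAll(retry.Body)
	return string(data), err
}

func main() {
	fmt.Println(sleepCancelled())
	fmt.Println(replayedBody(`{"state":"open"}`))
}
```

http.NewRequest sets GetBody automatically for *strings.Reader, *bytes.Reader, and *bytes.Buffer bodies. Requests built with other body types would need GetBody set by the caller.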
Configuration
| Variable | Default | Description |
|---|---|---|
| RATE_LIMIT_THRESHOLD | 0.10 | Start throttling when this fraction of the limit remains (0.10 = 10%) |
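Reading the variable might look like the sketch below. `parseThreshold` and `defaultThreshold` are hypothetical names, and rejecting values outside the open interval (0, 1) is an assumption. The design states only the default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultThreshold is used when RATE_LIMIT_THRESHOLD is unset or empty.
const defaultThreshold = 0.10

// parseThreshold converts the raw environment value into a fraction.
// Values outside (0, 1), including NaN, are rejected (assumed validation).
func parseThreshold(raw string) (float64, error) {
	if raw == "" {
		return defaultThreshold, nil
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("RATE_LIMIT_THRESHOLD: %w", err)
	}
	if !(v > 0 && v < 1) {
		return 0, fmt.Errorf("RATE_LIMIT_THRESHOLD must be between 0 and 1, got %v", v)
	}
	return v, nil
}

func main() {
	threshold, err := parseThreshold(os.Getenv("RATE_LIMIT_THRESHOLD"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("throttling below %.0f%% remaining\n", threshold*100)
}
```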
Metrics
| Metric | Type | Description |
|---|---|---|
| repo_guardian_github_rate_limit_waits_total | Counter | Rate limit waits (label: reason) |
| repo_guardian_github_rate_limit_wait_seconds | Histogram | Duration of rate limit waits |
Files Changed
| Action | File |
|---|---|
| Create | internal/github/ratelimit.go |
| Create | internal/github/ratelimit_test.go |
| Modify | internal/github/client.go |
| Modify | internal/config/config.go |
| Modify | internal/metrics/metrics.go |
| Modify | cmd/repo-guardian/main.go |
Implementation Phases
Phase 1: Infrastructure (Metrics + Config)
Add new Prometheus metrics and RateLimitThreshold config field.
Phase 2: Core Transport
Create rateLimitTransport implementing http.RoundTripper with pre-emptive
throttling and retry logic.
Test cases:
1. Normal request -- no delay
2. Pre-emptive throttle -- low remaining, verify delay
3. Primary rate limit -- 403, wait, retry, 200
4. Secondary rate limit -- 403 with Retry-After, wait, retry, 200
5. Context cancellation -- immediate return
6. Retry exhausted -- 403 on both calls, error returned
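One way to drive cases like 1, 3, and 6 without real waiting is a scripted stub transport plus an injectable sleep function. This sketch is an assumption about test structure, not the planned test file. `scripted`, `retryOnce`, and `run` are hypothetical, and `retryOnce` is a reduced stand-in for rateLimitTransport that omits the delay computation.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// scripted replays a fixed list of status codes, one per RoundTrip call.
type scripted struct {
	statuses []int
	calls    int
}

func (s *scripted) RoundTrip(r *http.Request) (*http.Response, error) {
	status := s.statuses[s.calls]
	s.calls++
	return &http.Response{StatusCode: status, Header: http.Header{}, Body: http.NoBody, Request: r}, nil
}

// retryOnce retries a 403 a single time. Waiting is delegated to sleep so
// tests can count waits instead of blocking.
type retryOnce struct {
	next  http.RoundTripper
	sleep func(time.Duration)
}

func (t *retryOnce) RoundTrip(r *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(r)
	if err != nil || resp.StatusCode != http.StatusForbidden {
		return resp, err
	}
	resp.Body.Close()
	t.sleep(time.Second) // a real transport would derive this from headers
	resp, err = t.next.RoundTrip(r)
	if err == nil && resp.StatusCode == http.StatusForbidden {
		resp.Body.Close()
		return nil, errors.New("rate limited after retry")
	}
	return resp, err
}

// run sends one request through retryOnce over the given script and reports
// the final status (0 on error), upstream calls, and number of sleeps.
func run(statuses ...int) (status, calls, sleeps int, err error) {
	stub := &scripted{statuses: statuses}
	rt := &retryOnce{next: stub, sleep: func(time.Duration) { sleeps++ }}
	req, err := http.NewRequest(http.MethodGet, "https://example.invalid/", nil)
	if err != nil {
		return 0, 0, 0, err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return 0, stub.calls, sleeps, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, stub.calls, sleeps, nil
}

func main() {
	fmt.Println(run(200))      // case 1: no delay
	fmt.Println(run(403, 200)) // case 3: one wait, then success
	fmt.Println(run(403, 403)) // case 6: retry exhausted
}
```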
Phase 3: Integration (Wire + Cleanup)
Wrap ghinstallation transports with rateLimitTransport. Remove old
logRateLimit method and its 11 call sites.
Phase 4: End-to-End Validation
Manual verification with DRY_RUN=true against live GitHub API.