Introduction
Offset-based pagination (LIMIT 20 OFFSET 40) is a common API pagination pattern. However, when new records are inserted or existing records are deleted between page requests, the offset shifts, causing some records to appear on multiple pages (duplicates) and others to be skipped entirely. This is particularly problematic for high-write-volume APIs and clients that paginate through all results.
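The failure mode can be reproduced in a few lines; this is a minimal sketch using a plain Python list as a stand-in for the table, sorted newest-first:

```python
# Simulate offset pagination while a write lands between page requests.
items = ["E", "D", "C", "B", "A"]  # newest first

def page(data, limit, offset):
    """Offset-based pagination: slice whatever snapshot exists right now."""
    return data[offset:offset + limit]

page1 = page(items, 3, 0)   # ['E', 'D', 'C']
items.insert(0, "F")        # a new record arrives before page 2 is fetched
page2 = page(items, 3, 3)   # ['C', 'B', 'A'] -- the offset shifted

duplicates = set(page1) & set(page2)
print(duplicates)           # {'C'} -- 'C' appears on both pages
```

The insertion pushes every existing row down one position, so the row that was last on page 1 reappears first on page 2.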
Symptoms
- Paginated API responses contain duplicate records across consecutive pages
- Some records are missing from the full result set when all pages are combined
- Page 1 shows items A, B, C; after a new item is inserted, page 2 also shows C
- Total count changes between page requests
- Error message: No specific error -- duplicates are silently returned
Common Causes
- New records inserted at the beginning of the result set between page requests
- Records deleted from earlier pages shifting later pages' offsets
- No stable sort order -- results ordered by a non-unique column
- Pagination client not handling insertion/deletion during traversal
- API using OFFSET without a stable cursor or keyset
Step-by-Step Fix
1. Verify the duplicate results across pages: Confirm the pagination issue.

   ```bash
   # Fetch two consecutive pages
   curl -s "https://api.example.com/items?limit=10&offset=0" | jq '.items[].id' > page1.txt
   sleep 5  # During which new items may be inserted
   curl -s "https://api.example.com/items?limit=10&offset=10" | jq '.items[].id' > page2.txt
   # Check for overlap
   comm -12 <(sort page1.txt) <(sort page2.txt)
   ```
2. Switch to cursor-based (keyset) pagination: Use a stable reference point.

   ```sql
   -- BEFORE: offset-based (unstable)
   SELECT * FROM items ORDER BY created_at DESC LIMIT 10 OFFSET 20;

   -- AFTER: cursor-based (stable)
   SELECT * FROM items
   WHERE created_at < '2024-01-15T10:00:00'
   ORDER BY created_at DESC
   LIMIT 10;
   ```
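The keyset query can be exercised end-to-end with an in-memory SQLite database; the schema and timestamps below are invented for the demo:

```python
import sqlite3

# In-memory stand-in for the items table (assumed schema for the demo).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at TEXT)")
db.executemany("INSERT INTO items VALUES (?, ?)",
               [(i, f"2024-01-15T10:00:{i:02d}") for i in range(1, 31)])

# Page 1: no cursor yet, just take the newest rows.
page1 = db.execute(
    "SELECT id, created_at FROM items ORDER BY created_at DESC LIMIT 10"
).fetchall()

# Page 2: resume strictly before the last row of page 1. A row inserted
# between the two requests cannot shift this page, because the WHERE
# clause anchors it to a fixed timestamp rather than a row count.
last_created = page1[-1][1]
page2 = db.execute(
    "SELECT id, created_at FROM items WHERE created_at < ? "
    "ORDER BY created_at DESC LIMIT 10",
    (last_created,),
).fetchall()

ids1 = {row[0] for row in page1}
ids2 = {row[0] for row in page2}
print(sorted(ids1 & ids2))  # [] -- no overlap
```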
3. Update the API to support cursor pagination: Return a cursor token.

   ```python
   # Flask API example
   from flask import Flask, request, jsonify

   app = Flask(__name__)

   @app.route('/items')
   def get_items():
       cursor = request.args.get('cursor')
       limit = min(int(request.args.get('limit', 20)), 100)
       if cursor:
           # Decode cursor (base64-encoded timestamp + ID)
           cursor_time, cursor_id = decode_cursor(cursor)
           items = db.query(
               "SELECT * FROM items WHERE (created_at, id) < (?, ?) "
               "ORDER BY created_at DESC, id DESC LIMIT ?",
               cursor_time, cursor_id, limit + 1)
       else:
           items = db.query(
               "SELECT * FROM items ORDER BY created_at DESC, id DESC LIMIT ?",
               limit + 1)
       # Fetch limit + 1 rows to detect whether another page exists; the
       # cursor must point at the last *returned* row, not the look-ahead row,
       # or the first row of the next page would be skipped.
       next_cursor = (encode_cursor(items[limit - 1].created_at, items[limit - 1].id)
                      if len(items) > limit else None)
       return jsonify({'items': items[:limit], 'next_cursor': next_cursor})
   ```
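The `decode_cursor`/`encode_cursor` helpers are not shown in the example; one possible sketch, assuming the cursor is a base64-encoded `timestamp|id` pair as the comment suggests:

```python
import base64

def encode_cursor(created_at: str, item_id: int) -> str:
    """Pack the last row's sort keys into an opaque, URL-safe token."""
    raw = f"{created_at}|{item_id}".encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str):
    """Reverse encode_cursor; raises on malformed input."""
    raw = base64.urlsafe_b64decode(cursor.encode()).decode()
    created_at, item_id = raw.rsplit("|", 1)
    return created_at, int(item_id)

token = encode_cursor("2024-01-15T10:00:00", 42)
print(decode_cursor(token))  # ('2024-01-15T10:00:00', 42)
```

Keeping the token opaque lets the server change the cursor format later without breaking clients that treat it as a black box.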
4. Add a stable sort key to prevent ambiguity: Ensure deterministic ordering.

   ```sql
   -- Always include a unique column (like id) in the ORDER BY
   SELECT * FROM items ORDER BY created_at DESC, id DESC LIMIT 10 OFFSET 20;
   ```
5. Verify no duplicates with the new pagination method: Test the fix.

   ```bash
   # Use cursor pagination and verify no overlap. Fetch page 1 once and
   # reuse the response, so the IDs and the cursor come from the same request.
   RESPONSE=$(curl -s "https://api.example.com/items?limit=10")
   echo "$RESPONSE" | jq -r '.items[].id' > page1.txt
   NEXT=$(echo "$RESPONSE" | jq -r '.next_cursor')
   curl -s "https://api.example.com/items?limit=10&cursor=$NEXT" | jq -r '.items[].id' > page2.txt
   comm -12 <(sort page1.txt) <(sort page2.txt)
   # Should return nothing (no duplicates)
   ```
Prevention
- Use cursor-based pagination for APIs with frequent insertions or deletions
- Always include a unique column in the ORDER BY clause for deterministic ordering
- Document the pagination strategy and its behavior under concurrent writes
- Add integration tests that verify no duplicates during paginated traversal
- Monitor for duplicate record reports from API consumers
- Consider using keyset pagination as the default for all list endpoints
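The integration-test suggestion above can be sketched as a traversal check. `fetch_page` here is a hypothetical stand-in for a real API client, implementing keyset pagination over in-memory `(created_at, id)` rows:

```python
def fetch_page(data, cursor, limit=10):
    """Hypothetical client: keyset-paginate rows sorted newest-first."""
    remaining = [r for r in data if cursor is None or r < cursor]
    page = remaining[:limit]
    # The cursor is the last returned row; None signals the final page.
    next_cursor = page[-1] if len(remaining) > limit else None
    return page, next_cursor

def collect_all_ids(data):
    """Walk every page, failing loudly on any duplicate ID."""
    seen, cursor = [], None
    while True:
        page, cursor = fetch_page(data, cursor)
        ids = [item_id for _, item_id in page]
        assert not set(ids) & set(seen), f"duplicate IDs: {set(ids) & set(seen)}"
        seen.extend(ids)
        if cursor is None:
            return seen

# 25 rows, newest first, with unique timestamps.
rows = [(f"2024-01-15T10:00:{i:02d}", i) for i in range(25, 0, -1)]
print(len(collect_all_ids(rows)))  # 25 -- every row seen exactly once
```

In a real test suite, `fetch_page` would call the API over HTTP while a background writer inserts rows mid-traversal, asserting that the combined result still contains no duplicates.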