In web services, rate limiters control client traffic by limiting the number of requests allowed within a certain time window. Some benefits of using rate limiters:
- Prevent DDoS.
- Reduce costs.
- Reduce chances of being overloaded.
Interview
1. Client-side or server-side rate limiter?
    1. Tech stack or ecosystem?
    2. Bandwidth or resources?
2. On what basis does the rate limiter throttle? Client ID or IP?
3. Scale or audience?
4. Separate service or integrated into application code?

Requirements:
- Accurate: limit excessive requests precisely
- Fast: adds little latency to requests
- Efficient: uses little memory
- Distributed: shareable across multiple servers
- Exception handling: show clear errors to throttled users
- High fault tolerance, especially when distributed
Algorithms
Token Bucket
Imagine a bucket that a refiller tops up with tokens periodically. Each incoming request consumes one token to be executed; if the bucket is empty, the request is rejected.
- Parameters: bucket size and refill rate
- Different buckets for different API endpoints
- Easy to implement and allows traffic bursts, but the two parameters can be difficult to tune
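A minimal single-process sketch of the idea (the class and parameter names are illustrative, and a real deployment would keep the bucket state in shared storage):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # bucket size: max tokens it can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily from elapsed time instead of running a refiller thread.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False          # bucket empty: reject the request
```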
Leaky Bucket
Requests are stored in a queue and processed at a fixed rate; when the queue is full, excess requests are dropped.
- Parameters: queue size and processing rate
- Efficient and gives a stable processing rate, but cannot absorb traffic bursts
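A sketch under the same caveats, using a bounded FIFO queue plus a drain method that a worker would call at the fixed processing rate:

```python
import queue

class LeakyBucket:
    def __init__(self, queue_size: int):
        self.requests = queue.Queue(maxsize=queue_size)

    def submit(self, request) -> bool:
        try:
            self.requests.put_nowait(request)  # accepted: will be processed later
            return True
        except queue.Full:
            return False                       # queue full: drop the request

    def leak(self):
        # A worker calls this at the fixed processing rate, e.g. once per tick.
        try:
            return self.requests.get_nowait()
        except queue.Empty:
            return None
```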
Fixed Window Count
Each time window has a counter, and each request increments it by 1; once the counter reaches the limit, further requests in that window are rejected.
- Efficient
- A traffic spike around window boundaries can let through more requests than the limit allows. For example, with a 100 requests per minute limit, 100 requests at 00:59 and another 100 at 01:01 (mm:ss) would all pass, roughly 200 requests within two seconds.
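A sketch keeping a counter per (client, window) pair; note that old windows are never evicted here, which a real implementation would handle:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        window_index = int(time.time()) // self.window
        key = (client_id, window_index)
        if self.counters[key] >= self.limit:
            return False  # quota for this window is exhausted
        self.counters[key] += 1
        return True
```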
Sliding Window Log
Request timestamps are logged, and outdated timestamps are evicted from the log.
- Very accurate but not memory efficient
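A sketch of the log-based check; as in the usual description of this algorithm, timestamps are recorded even for rejected requests, which is part of why it is memory hungry:

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of recent requests

    def allow(self) -> bool:
        now = time.time()
        # Evict timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        self.log.append(now)  # the timestamp is kept even if the request is rejected
        return len(self.log) <= self.limit
```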
Sliding Window Count
Previous requests are discounted in the rolling window: the estimated count is the current window's count plus the previous window's count weighted by the fraction of the previous window that the rolling window still overlaps.
- Not 100% accurate (it assumes requests in the previous window were spread evenly) but more memory efficient than the sliding window log
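A sketch of the weighted estimate described above (names are illustrative):

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_idx = int(time.time() // self.window)

    def allow(self) -> bool:
        now = time.time()
        idx = int(now // self.window)  # index of the current fixed window
        if idx != self.curr_idx:
            # Roll forward; if a whole window was skipped, the previous count is 0.
            self.prev_count = self.curr_count if idx == self.curr_idx + 1 else 0
            self.curr_count = 0
            self.curr_idx = idx
        # Fraction of the previous window still covered by the rolling window.
        overlap = 1.0 - (now % self.window) / self.window
        if self.curr_count + self.prev_count * overlap >= self.limit:
            return False
        self.curr_count += 1
        return True
```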
Interview
An in-memory database such as Redis can be used to store the counters.
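For example, with the redis-py client, a fixed-window counter can lean on Redis's atomic INCR and EXPIRE commands (key naming is illustrative):

```python
import redis  # assumes the redis-py client

r = redis.Redis()

def allow(client_id: str, limit: int, window_seconds: int) -> bool:
    key = f"rate:{client_id}"
    count = r.incr(key)  # INCR is atomic inside Redis
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit
```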
Interview
Configuration
The rate-limiting rules can be written in YAML files stored on disk.
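For instance, rules could look like the following, loaded here with PyYAML; the rule schema is illustrative rather than a standard:

```python
import yaml  # assumes PyYAML is installed

RULES = yaml.safe_load("""
rules:
  - domain: messaging
    key: message_send
    rate_limit:
      unit: minute
      requests_per_unit: 5
""")

for rule in RULES["rules"]:
    limit = rule["rate_limit"]
    print(rule["key"], "->", limit["requests_per_unit"], "per", limit["unit"])
```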
Dealing with Excessive Requests
- Reject them with an HTTP 429 (Too Many Requests) response (see the sketch after this list)
- Enqueue them to be processed later, e.g., when requests were rate limited because servers failed
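A sketch of the rejection response: HTTP 429 plus a Retry-After header; the X-Ratelimit-* header names are common conventions, not a formal standard:

```python
def throttled_response(limit: int, retry_after_seconds: int):
    # 429 Too Many Requests, with headers telling the client when to try again.
    headers = {
        "Retry-After": str(retry_after_seconds),
        "X-Ratelimit-Limit": str(limit),
        "X-Ratelimit-Remaining": "0",
    }
    return 429, headers, "Too many requests; please slow down."
```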
Distributed Environment
Race Condition
Two concurrent requests can read the same counter value before either writes the increment back, letting more requests through than the limit allows.
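One common fix is making the check-and-increment atomic, for example with a Redis Lua script via redis-py (key names and limits below are illustrative):

```python
import redis

r = redis.Redis()

# GET, compare, INCR, and EXPIRE run as one atomic unit inside Redis.
LUA = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= tonumber(ARGV[1]) then
    return 0
end
if redis.call('INCR', KEYS[1]) == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1
"""
allow_script = r.register_script(LUA)

def allow(client_id: str, limit: int, window_seconds: int) -> bool:
    return allow_script(keys=[f"rate:{client_id}"], args=[limit, window_seconds]) == 1
```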
Synchronization
When multiple rate limiters are used for distributed load balancing, you can either use sticky sessions (routing each user's requests to the same rate limiter) or use a centralized data store such as Redis.
Optimization
- Route requests to nearest servers for low latency;
- Data synchronization with an eventual consistency model;
Monitoring
We want to make sure our rate limiters are working:
- The algorithm is effective (e.g., it handles traffic bursts)
- The rules are effective (no valid requests are dropped)
Interview
- Rate limiting can be applied at many layers, e.g., the application layer (HTTP, layer 7) or the network layer (IP, layer 3)
- Design products to reduce the need for rate limits:
    - Caching (avoid frequent repeated API calls)
    - Load balancing
    - Exceptions (handle throttling errors gracefully)
    - Retry (with sufficient back-off; see the sketch below)
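A client-side sketch of the retry idea, using exponential back-off with jitter (the helper name and parameters are illustrative):

```python
import random
import time

def call_with_backoff(request_fn, max_attempts: int = 5):
    for attempt in range(max_attempts):
        status, body = request_fn()  # request_fn returns (status_code, body)
        if status != 429:
            return status, body
        # Back off exponentially, with jitter so clients do not retry in lockstep.
        time.sleep(2 ** attempt + random.random())
    return status, body
```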