How To: Manage RDS Connections from AWS Lambda Serverless Functions
Managing database connections with AWS Lambda functions can become an issue when your application reaches scale. Learn strategies for managing your RDS connections in a serverless application.
Someone asked a great question on my How To: Reuse Database Connections in AWS Lambda post about how to end the unused connections left over by expired Lambda functions:
I'm playing around with AWS lambda and connections to an RDS database and am finding that for the containers that are not reused the connection remains. I found before that sometimes the connections would just die eventually. I was wondering, is there some way to manage and/or end the connections without needing to wait for them to end on their own? The main issue I'm worried about is that these unused connections would remain for an excessive amount of time and prevent new connections that will actually be used from being made due to the limit on the number of connections.
🧟♂️ Zombie RDS connections leftover on container expiration can become a problem when you start to reach a high number of concurrent Lambda executions. My guess is that this is why AWS is launching Aurora Serverless, to deal with relational databases at scale. At the time of this writing it is still in preview mode.
Update September 2, 2018: I wrote an NPM module that manages MySQL connections for you in serverless environments. Check it out here.
Update August 9, 2018: Aurora Serverless is now Generally Available!
Overall, I've found that Lambda is pretty good about closing database connections when the container expires, but even if it does it reliably, it still doesn't solve the MAX CONNECTIONS problem. Here are several strategies that I've used to deal with this issue.
1. Implement a good caching strategy
As your application scales, use something like REDIS to cache common data queries. REDIS's maxclient
default is 65,000, far greater than the connection limits for RDS. You should also design your application to ONLY create a connection to RDS when you need to. It's possible that your thousands of concurrent requests might not actually need a database connection if they get data from a cache or are performing another action. Your connect()
call should always be INSIDE your handler and only called when your flow requires a connection.
2. Lower your connection timeouts
RDS doesn't give your root
account the SUPER
privilege, but you can create "RDS DB Parameter Groups" that allow you to set the wait_timeout
and interactive_timeout
to values more appropriate for serverless applications (the default is 8 hours). I've set these to 10 minutes in certain implementations and it seems to work well. My code is always designed to reconnect if the connection drops, so even if "warm" containers lose their connection, this isn't an issue.
3. Limit concurrent executions
If your database only supports 1,000 concurrent connections, limit your Lambda function's concurrent connections to something less that that. This isn't necessarily ideal, since it will impact your users during high traffic situations, but it will prevent the database from maxing out connections.
4. Limit connections per username
Another strategy is to set a resource limit on the number of connections a given db user can have. MAX USER_CONNECTIONS
can be limited to a number less than the max connections allowed by the RDS instance. This will throw an error in your Lambda function, but it is possible for you to gracefully handle that error with a Too many connections
response or something else. This is handy, because you can then...
5. Clear connections with a proxy function
Even without SUPER
privilege, RDS does allow SHOW PROCESSLIST
which lets you view all open connections. You could trigger a proxy Lambda function on a Too many connections
error (or on a regular schedule) and have it inspect all the open connections. Then it can issue a CALL mysql.rds_kill(thread-ID)
on sleeping connections with high time
values.
6. Increase max_connections
setting
As I mentioned earlier, RDS lets you create "RDS DB Parameter Groups" to adjust your database's default settings. This can be used to set the max_connections
value up to 16,000 connections! This obviously has its tradeoffs, especially around memory and CPU consumption. If you are performing long-running or CPU intensive queries, make sure you size your instances to appropriately handle the load. There is a reason why the defaults are the defaults. Also, you can't set this value with Aurora Serverless, so something to think about there.
7. Use the serverless-mysql
NPM module
I found myself continually using the same strategies to manage RDS connections, so I finally decided to package them into an NPM module. This module not only handles connection management, but also throws in a bunch of extras, like transaction support, to make working with MySQL in your apps much easier. You can install it by running npm i serverless-mysql
. The documentation and code is available on GitHub.
The bottomline
Serverless isn't perfect...yet! 😉 However, using appropriate strategies like the ones mentioned above help to work around some of the current limitations. And don't forget to check out my new NPM module: serverless-mysql.
Do you have other strategies for dealing with RDS connections in your serverless functions? Let me know in the comments.