Solving the Cold Start Problem

I have a serious problem: I love AWS Lambda! In fact, I love it so much that I've pretty much gone all in on this whole #serverless thing. My only problem is Cold Starts, especially for my API use cases. Read the solution I proposed to the Lambda team.

Posted in #serverless

Dear AWS Lambda Team,

I have a serious problem: I love AWS Lambda! In fact, I love it so much that I've pretty much gone all in on this whole #serverless thing. I use Lambda for almost everything now. I use it to build backend data processing pipelines, distribute long-running tasks, and respond to API requests. Heck, I even built an Alexa app just for fun. I found myself building so many RESTful APIs using Lambda and API Gateway that I went ahead and created the open source Lambda API web framework to let users route and respond to API Gateway requests more efficiently.
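For context, here's a minimal sketch of what that looks like with lambda-api; the route and response payload are made up for illustration, but the handler wiring follows the framework's documented pattern.

```js
// Minimal lambda-api sketch: route an API Gateway request to a handler.
// The route, path parameter, and response payload are illustrative only.
const api = require('lambda-api')();

api.get('/users/:id', async (req, res) => {
  // A real handler would look the user up; returning a value sends it as JSON
  return { id: req.params.id, name: 'example user' };
});

// The Lambda entry point hands the API Gateway event to the router
exports.handler = async (event, context) => {
  return await api.run(event, context);
};
```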

Serverless technologies, like Lambda, have revolutionized how developers think about building applications. Abstracting away the underlying compute layer and replacing it with on-demand, near-infinitely scalable function containers is brilliant. As we would say out here in Boston, "you guys are wicked smaht." But I think you missed something very important. In your efforts to conform to the "pay only for the compute time you consume" promise of serverless, you inadvertently handicapped the service. My biggest complaint, and the number one objection that I hear from most of the "serverless-is-not-ready-for-primetime" naysayers, is Cold Starts.

I know you have heard this before. A quick Google search for "Lambda Cold Start" turns up countless articles on the subject. Most of them end with the ping-your-function-every-few-minutes hack, which you and I both know doesn't solve the problem of scaling concurrency. I build most of my functions in Node.js, so I don't see cold start times anywhere near what those poor people using Java see. But when we factor in the 30-70 ms it takes an API Gateway request to make the round trip, I see cold start times of several seconds.
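For anyone unfamiliar with it, the hack looks roughly like this: a scheduled CloudWatch Events rule invokes the function with a marker payload and the handler returns early. The `warmer` flag is just a convention I'm assuming here, not anything AWS provides, and each scheduled ping keeps only a single container warm, which is exactly why it falls apart under concurrency.

```js
// The ping-every-few-minutes hack: a scheduled CloudWatch Events rule invokes
// the function with a constant payload like {"warmer": true}, and the handler
// short-circuits. This keeps ONE container warm; concurrent requests beyond
// that still hit cold starts.
exports.handler = async (event, context) => {
  if (event && event.warmer === true) {
    // Scheduled ping: do nothing and return immediately
    return 'warmed';
  }

  // ... normal request handling ...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```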

There are other half-baked solutions as well, like increasing memory. But I think it's sort of dumb to provision 2,048 megabytes of memory for a function that never uses more than 50. Now I'm increasing the per-execution cost for thousands of warm executions to try to optimize for one cold start per container. While the abstraction still seems worth it, the underlying cost optimizations start making less sense.

Are Cold Starts really that big of a problem?

I think so, especially for my API use cases. A funny thing about CloudWatch Logs is that it records only the Lambda execution time, not the total invocation time. I get that this is for billing purposes, and I appreciate that you don't charge me for the time it takes to load the function. And while there is a noticeable difference in execution times between warm and cold functions (maybe 20 ms, depending on the configuration), the API Gateway request does include that loading time, which means even small functions with lots of memory can take at least 2 seconds to respond. In many cases I've seen this number closer to 10 seconds when running inside a VPC. Meanwhile, the function is still only recording a 20 ms total execution time.
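You can see the gap for yourself by timing a request end to end on the client and comparing it to the Duration in the function's REPORT log line. A quick Node.js sketch, with a placeholder URL standing in for your own API Gateway endpoint:

```js
// Time an API Gateway request end to end and compare it to the Duration
// reported in the function's CloudWatch REPORT line. The endpoint below is a
// placeholder; substitute your own.
const https = require('https');

const start = Date.now();
https.get('https://example.execute-api.us-east-1.amazonaws.com/prod/ping', (res) => {
  res.on('data', () => {});
  res.on('end', () => {
    // On a cold start this can be several seconds while CloudWatch still
    // reports a ~20 ms execution time.
    console.log(`Round trip: ${Date.now() - start} ms`);
  });
});
```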

Even two seconds is a very long time. It's long enough for someone to click the submit button multiple times because the form didn't respond quickly enough. It's the user who sees a blank page in a widget and scrolls past it. And what if something, as the kids say today, goes "viral"? I've seen Lambda scale pretty well, with cold start times getting progressively lower as more functions are invoked. Although I assume this is a side effect of tenancy and not a design that scales across hardware.

How do we solve this?

I was involved in a recent Twitter conversation with Chris Munns about this and I offered the idea of pre-provisioning functions as a potential solution. He said, "we'd just prefer to do everything in our power to reduce cold start pains directly." I asked how that could be accomplished without pre-warming functions and he responded, "lots of ways to minimize the cold starts to be so small that they hardly matter." This all sounds great, and while I'd love to know what those "ways" are, I'd also really like to know the timeframe for those enhancements. I haven't gotten an answer to that question, but maybe AWS doesn't have one yet.

I certainly appreciate the conversation, and, of course, the recognition by the Lambda team that cold starts cause "pains" and are a problem. I'm still a bit concerned with the timeline and what exactly Chris' definition of "so small that they hardly matter" is. As I mentioned before, even if it is down to two seconds, that still matters for API use cases. So rather than just complaining, I figured I'd offer a solution that I believe would reduce cold start times to zero, so that they truly don't matter.

Why can't we pre-provision function capacity?

The performance of a "warm" function is absolutely amazing, so kudos on that. Even when you have hundreds of concurrent connections (and the warm functions to respond), there appears to be a negligible performance hit (other than perhaps scalability bottlenecks with other services). However, getting all those functions warmed up takes time. It also appears that even once warmed, high-concurrency functions get recycled much faster than those with just a few containers in memory. In my testing, I've seen short bursts, e.g. 500 concurrent connections making just a few requests each, take a huge performance hit (often more than 2,000 ms at the 50th percentile). If those same 500 concurrent connections make many requests each, the latency drops to sub-50 ms up into the 99th percentile, which is amazing.

There's no doubt that Lambda scales quickly and efficiently and can certainly handle the requests thrown at it. However, short bursts of sporadic traffic, which could be caused by something like having your app mentioned on The Today Show (this happened to me, btw), would leave potentially thousands of users with several seconds of latency, degrading the performance of your application or service. Even daily spikes, which would be fairly easy to plan for, could result in customer frustration and suboptimal user experiences. Now perhaps APIs aren't the most important use case, but I know several companies that have put all their API eggs in the proverbial serverless basket.

The companies I work (and have worked) for pay for pretty much every other service they use in AWS. Lambda is the only service that is "pay only for the compute time you consume." Meanwhile, I still need to provision capacity for other "serverless" services like DynamoDB. I pay for read and write throughput whether I need it or not. My assumption is that in order to maintain "single-digit millisecond latency at any scale", the system needs to know my expected throughput ahead of time so that it can provision enough resources to handle the anticipated number of requests. DynamoDB even handles bursts by retaining the last five minutes of unused capacity. I don't think the parallels to Lambda are that hard to see.
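To make the parallel concrete, this is roughly what declaring DynamoDB capacity up front looks like with the AWS SDK for JavaScript (v2); the table name and throughput numbers are placeholders.

```js
// DynamoDB makes you declare capacity ahead of time, and you pay for it
// whether you use it or not. Table name and numbers are placeholders.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

dynamodb.updateTable({
  TableName: 'my-table',
  ProvisionedThroughput: {
    ReadCapacityUnits: 100,  // billed hourly, used or not
    WriteCapacityUnits: 50
  }
}).promise()
  .then(() => console.log('capacity updated'))
  .catch(err => console.error(err));
```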

I know I may sound like a heretic to serverless purists, but I (and the companies I work for) would be willing to pay a nominal fee to keep our functions warm. It should be completely optional of course, but if I could pre-provision Lambda functions (preferably on a time schedule), then I could eliminate my cold start problem. I think of it as "reserved capacity", where I can specify the number of concurrent invocations I want pre-warmed. If I set my capacity to 100 for a particular function and 75 concurrent invocations came in, the system would then provision an additional 75 pre-warmed functions so that 100 idle containers are always ready. You'd charge me for those invocations like usual, but would continue to provision functions to maintain my reserved capacity.
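This is purely hypothetical, of course, since no such feature exists today, but the bookkeeping is simple. Something like:

```js
// Hypothetical sketch of the reserved-capacity bookkeeping described above.
// Nothing here is a real AWS API; it just illustrates the math.
function containersToPrewarm(reservedCapacity, warmIdle) {
  // Keep `reservedCapacity` warm, idle containers available at all times
  return Math.max(reservedCapacity - warmIdle, 0);
}

// Example from the letter: reserve 100; 75 invocations consume 75 of the
// idle containers, leaving 25, so the service would pre-warm 75 more.
console.log(containersToPrewarm(100, 25)); // 75
```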

This should solve both the short-burst cold start problem and the initial latency hit of the high-concurrency, high-volume-of-requests-per-second case. A solution like this would help companies of all sizes. For the low-volume API, a reserved capacity of "2" might be enough to avoid those frustrating 10 second load times. Companies that need thousands of concurrent connections at sub-100 ms response times (like adtech) could now look at Lambda as a viable option. And of course, there is everything in between.

Final Thoughts

Like I said earlier, you Lambda engineers have done an absolutely amazing job creating this service. While I realize that there are probably other ways to solve the cold start problem, and that you most likely have already thought of my proposed solution (wicked smaht, remember?), I (and many others) would love to see some more progress on this. I know it would make me and my clients feel better, and maybe we can silence some of the serverless critics out there while we're at it.

I'm excited to find new and creative ways that serverless infrastructures can solve real-world problems. As soon as the cold start issue is fully addressed, the possibilities become endless.

Thanks,

Jeremy Daly


Learn more about handling cold starts using best practices from AWS. Check out 15 Key Takeaways from the Serverless Talk at AWS Startup Day and Lambda Warmer: Optimize AWS Lambda Function Cold Starts.