provide "slow ramp time" to protect new app instances from overloading #463

metskem · 2025-01-23T15:27:41Z

Proposed Change

As a developer I would like to have a "slow ramp time" for app instances that have just become healthy. The behaviour would be the same as the F5 slow ramp time.
The gorouter should know the uptime of individual app instances. For instances that have just become healthy, it should not send them their full share of the load immediately, but should instead increase the request rate gradually over a period specified by the slow_ramp_time setting.

This should improve the survivability of new app instances that need to warm up first (JIT-compile code, warm up backend connection pools, etc.) before they can provide acceptable response times.

Acceptance criteria

A common scenario these days is the following:

We have a Java app running 10 instances by default. The app uses the App Autoscaler to increase the number of instances during high load periods.
However, as soon as additional app instances become healthy, they immediately receive their full share of the load, which they can barely handle, if at all, because the Java code must first be JIT-compiled, connection pools initialized, and so on. This results in excessive response times and outages for some customers, or the instance becomes completely unresponsive and CF kills it because the health check times out.
As a result, some teams decide not to use dynamic scaling and simply run the maximum number of instances all the time, which increases costs.

A similar scenario occurs when an instance crashes (for whatever reason): it is restarted automatically but keeps crashing as soon as the full load arrives again. Only drastically increasing the health-check timeout helps, and that requires a full redeployment of the app.

If the gorouter could gradually increase the load on new instances over a given ramp time (~10-30 s), we expect these instances to survive and to provide better average response times and/or fewer slow responses.

It could be implemented as a regular gorouter (BOSH release) configuration property or, even better, as a per-route option.

Related links

No response

peanball · 2025-01-24T10:41:01Z

Playing devil's advocate a little here: this sounds like a workaround for an issue in the app.

Please note that you can also use some of the existing Gorouter features to implement something like this on your end.

Gorouter will transparently retry requests that were rejected by the backend on another backend.

And finally, there is now a readiness check in addition to the health check. When an app reports as ready, its route is registered. You could use that to toggle your app between ready and not ready at increasing intervals while it starts up.

metskem · 2025-01-24T13:40:00Z

Thanks for the quick response.
Although I agree that the app itself can handle these things, we think it's better not to burden all our developers with it; they should focus on writing functional (business) code, and preferably the platform should handle these kinds of things.
The suggestion about the readiness check is a good one; I will pass it on to them.

metskem added the enhancement label Jan 23, 2025

cf-foundation-community-automation bot added this to Application Runtime Platform Working Group Jan 23, 2025

cf-foundation-community-automation bot moved this to Inbox in Application Runtime Platform Working Group Jan 23, 2025
