This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

2021-05-rollback.md

Rollback for Cloud Run uses Traffic Splitting

Context & Problem Statement

If we detect a problem in a deployment, we need a way to undo the deployment. The language in this space is inconsistent, so we want to use "rollback" as our baseline, but define what that means and how we'll execute it.

In the event of a bad deployment, we want to be able to revert to the previous service configuration and codebase.

Priorities & Constraints

  • Get back to a running system quickly
  • Minimize risk of introducing new bugs
  • Managing rollbacks of data schemas or underlying infrastructure is important but lower priority

Considered Options

  • Option 1: Redeploy a known-good container with previously used configuration
  • Option 2: Route traffic to previous known-good revision

Decision

Chosen option [Option 2]: Re-route Traffic

The Cloud Run documentation page "Rollbacks, gradual rollouts, and traffic migration" defines straightforward commands that can be used to redirect traffic for a service to a specific revision or to a known, mutable "tag".

This makes it relatively easy to send traffic to a known working revision, without the overhead or risk of rebuilding a release artifact or retrieving and reapplying configuration.

This is easy for the whole team to reason about.
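As an illustration of the chosen option, the following is a minimal sketch of a helper that builds the `gcloud run services update-traffic` command for sending all traffic to one revision or tag. The function names, the service name `web`, the region, and the revision and tag names are hypothetical and not part of this decision record; the sketch assumes the `gcloud` CLI is installed and authenticated, and it defaults to a dry run so nothing is changed by accident.

```python
import subprocess
from typing import List, Optional


def build_rollback_command(
    service: str,
    region: str,
    revision: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[str]:
    """Build a gcloud command that routes 100% of traffic to one target.

    Exactly one of `revision` or `tag` must be given.
    """
    if (revision is None) == (tag is None):
        raise ValueError("specify exactly one of revision or tag")
    if revision is not None:
        target = f"--to-revisions={revision}=100"
    else:
        target = f"--to-tags={tag}=100"
    return [
        "gcloud", "run", "services", "update-traffic", service,
        "--region", region,
        target,
    ]


def rollback(
    service: str,
    region: str,
    revision: Optional[str] = None,
    tag: Optional[str] = None,
    dry_run: bool = True,
) -> List[str]:
    """Return the rollback command; run it only when dry_run is False."""
    cmd = build_rollback_command(service, region, revision=revision, tag=tag)
    if not dry_run:
        # check=True raises CalledProcessError if gcloud exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical service and revision names, shown as a dry run.
    print(" ".join(rollback("web", "us-central1", revision="web-00042-abc")))
```

Because the helper only changes traffic routing, it does not rebuild an artifact or reapply configuration, which is the property this decision relies on.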

Expected Consequences

  • Not all hosting platforms have this capability, which means if we choose to use other hosting platforms in the future, this rollback definition may be more complicated to implement.
  • We are not addressing state, schema management, or infrastructure configuration. When we do, the traffic routing command will need to be wrapped with more complex logic.
  • Service-level Cloud Run configuration such as labels will not be reverted as part of traffic routing.

Links