(CDK Garbage Collection): stack-scoped garbage collection #32799

Open
1 of 2 tasks
kaizencc opened this issue Jan 8, 2025 · 2 comments
Labels
effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2 package/tools Related to AWS CDK Tools or CLI

Comments

@kaizencc
Contributor

kaizencc commented Jan 8, 2025

Describe the feature

We recently launched CDK Garbage Collection in CDK v2.165.0. This version of garbage collection is scoped to an individual environment (account + region) due to legacy constraints of the CDK Assets mechanism. With a modernized CDK Assets mechanism, we can scope CDK Garbage Collection to each individual stack, which fits the mental model of CDK customers better. Additionally, it will fix a theoretical race condition that exists in CDK Garbage Collection today. See: https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk#theoretical-race-condition-with-review_in_progress-stacks

Use Case

Customers who want to garbage-collect only the assets managed by their own CDK app, disregarding other stacks in the same account/region.

Proposed Solution


Background:

Garbage Collection was completed on 10/25/2024, following the design described in the CDK Garbage Collection Design Doc. The main requirement for that design was that garbage collection fit the existing asset mechanism, so that customers could retroactively clean up their bootstrapped resources. While the initial Garbage Collection achieves exactly that, it comes with the following caveats:

  • Garbage Collection must be done per-environment, not per-stack or per-app. This is because all stacks in an environment share the same bootstrapped S3 bucket / ECR repository and the assets in there are virtually indistinguishable from each other.
  • Garbage Collection has a subtle race condition when dealing with REVIEW_IN_PROGRESS stacks. This again is a result of it being a per-environment operation, so it has to deal with stacks getting created in parallel.

Goal:

A better version of Garbage Collection would operate on a per-stack basis. This would give the delete operation a much more contained scope.

Design:

We cannot achieve this with the current version of the asset mechanism because all assets are named by their content-based hash. This means that different stacks can share the same asset in the same environment. The fact that one stack no longer uses a particular asset is not enough to conclude that the asset is safe to delete, because other stacks could be referencing the same one.
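To make the sharing problem concrete, here is a minimal sketch of content-addressed naming. It assumes a simplified scheme where the object key is the SHA-256 of the asset bytes plus `.zip`; the real CDK fingerprinting is more involved, so treat the key format as illustrative only.

```typescript
import { createHash } from "node:crypto";

// Simplified content-addressed key: the name depends only on the asset bytes.
// The `<sha256>.zip` format is an assumption for illustration, not the exact
// naming scheme used in the CDK bootstrap bucket.
function contentAddressedKey(assetBytes: Buffer): string {
  const hash = createHash("sha256").update(assetBytes).digest("hex");
  return `${hash}.zip`;
}

// Two hypothetical stacks that happen to bundle identical Lambda code.
const sharedCode = "exports.handler = async () => 'ok';";
const keyForStackA = contentAddressedKey(Buffer.from(sharedCode));
const keyForStackB = contentAddressedKey(Buffer.from(sharedCode));

// Both stacks resolve to the same object, so StackA dropping its reference
// says nothing about whether StackB still needs it.
console.log(keyForStackA === keyForStackB); // true
```

Because the key carries no owner information, the only safe way to decide an object is unused is to check every stack in the environment, which is why the current collector is per-environment.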

A new asset upload mechanism would need to ensure that each asset is uploaded with an identifier tied to its stack. That could look something like this:

/assets/MyStack/.zip
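A minimal sketch of stack-scoped naming follows. It assumes a hypothetical layout of `assets/<stackId>/<contentHash>.zip`; the exact path format, and the `stackScopedKey` helper itself, are illustrative assumptions rather than a settled design.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stack-scoped layout: assets/<stackId>/<contentHash>.zip.
// The path format is an assumption for illustration only.
function stackScopedKey(stackId: string, assetBytes: Buffer): string {
  const hash = createHash("sha256").update(assetBytes).digest("hex");
  return `assets/${stackId}/${hash}.zip`;
}

const code = Buffer.from("exports.handler = async () => 'ok';");
const keyA = stackScopedKey("StackA", code);
const keyB = stackScopedKey("StackB", code);

// Identical content now lands under two distinct, stack-owned prefixes,
// so each object has exactly one owner.
console.log(keyA === keyB); // false
console.log(keyA.startsWith("assets/StackA/")); // true
```

The trade-off of this design is that identical content is stored once per stack rather than once per environment. It also explains the migration behavior: no new key matches an old content-only key, so every asset is uploaded again on the first deploy.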

The complexity here is that a) stacks can be renamed at deploy time, and b) nested stacks would need to be handled correctly.

For a), we would need to make sure that the stack identifier is unique to the stack and traceable back to it even if the stack name changes. For this, we can likely reuse template metadata to trace the uploaded name back to the actual stack it represents.
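One way to read the metadata idea is sketched below: the asset prefix is derived from a stable identifier recorded in the template, not from the deploy-time stack name. The `AssetOwnerId` metadata key and the `assetPrefixFor` helper are hypothetical; neither is an existing CDK or CloudFormation field.

```typescript
// Minimal shape of a CloudFormation template; only Metadata matters here.
interface TemplateLike {
  Metadata?: Record<string, unknown>;
}

// Hypothetical metadata key, assumed for illustration only.
const OWNER_ID_KEY = "AssetOwnerId";

// Resolve the asset prefix from the stable id stored in template metadata,
// so renaming the stack at deploy time does not orphan its assets.
function assetPrefixFor(template: TemplateLike): string {
  const ownerId = template.Metadata?.[OWNER_ID_KEY];
  if (typeof ownerId !== "string" || ownerId.length === 0) {
    throw new Error(`template is missing ${OWNER_ID_KEY} metadata`);
  }
  return `assets/${ownerId}/`;
}

// The same logical stack deployed under two different names.
const deployedAsDev = {
  stackName: "MyStack-dev",
  template: { Metadata: { AssetOwnerId: "MyStack-3f9a" } },
};
const deployedRenamed = {
  stackName: "MyStack-renamed",
  template: { Metadata: { AssetOwnerId: "MyStack-3f9a" } },
};

console.log(assetPrefixFor(deployedAsDev.template)); // assets/MyStack-3f9a/
console.log(
  assetPrefixFor(deployedRenamed.template) === assetPrefixFor(deployedAsDev.template),
); // true
```

Throwing on a missing id is a deliberate choice in this sketch: a collector should refuse to guess an owner rather than fall back to the stack name and risk deleting another stack's assets.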

For b), TBD

Migrating from old to new:

Customers migrating from the old asset mechanism would see all their assets reuploaded the first time, but there should be no problem beyond that.

Why should we do this?

This will result in a cleaner experience overall for both cdk gc and assets. In the past, CDK has treated the asset mechanism as an implementation detail, but in practice customers are confused or concerned that assets are not separated per stack. This change will align better with our customers' understanding that CDK stacks are independent of each other. For cdk gc, the operation would take a trivial amount of time.
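With stack-owned prefixes, a per-stack collection pass reduces to a set difference over a single prefix, which is why it can be cheap. The sketch below assumes a hypothetical `assets/<stackId>/` layout and takes the bucket listing and the template's referenced keys as plain inputs; in practice the listing might come from S3 `ListObjectsV2` with a `Prefix`, but fetching and parsing are out of scope here.

```typescript
// Per-stack sweep sketch: only objects under this stack's own prefix are
// considered, so other stacks in the same account/region are never inspected.
// Both inputs are assumed to be supplied by the caller.
function findUnreferencedAssets(
  bucketKeys: readonly string[],
  stackPrefix: string,
  referencedKeys: ReadonlySet<string>,
): string[] {
  return bucketKeys.filter(
    (key) => key.startsWith(stackPrefix) && !referencedKeys.has(key),
  );
}

const bucketKeys = [
  "assets/StackA/aaa.zip", // still referenced by StackA's current template
  "assets/StackA/bbb.zip", // left over from an earlier StackA deploy
  "assets/StackB/bbb.zip", // same content, but owned by StackB
];
const referencedByStackA = new Set(["assets/StackA/aaa.zip"]);

console.log(findUnreferencedAssets(bucketKeys, "assets/StackA/", referencedByStackA));
// [ 'assets/StackA/bbb.zip' ]
```

Keeping the trailing `/` in the prefix matters: without it, a sweep for `StackA` would also match keys owned by a stack named `StackAB`.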

Why should we not do this?

We already have a system that improves on our bootstrap system, called the App Staging Synthesizer. The idea is to bootstrap resources per-stack to separate out bootstrap entirely. We could instead invest more in migrating customers to that system, which negates the need for garbage collection entirely.


Other Information

This may eventually become an RFC when we decide to pick this up. For now, if this is something you are interested in, please 👍 this issue.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.165.0

Environment details (OS name and version, etc.)

Mac

@kaizencc kaizencc added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jan 8, 2025
@github-actions github-actions bot added the @aws-cdk/assets Related to the @aws-cdk/assets package label Jan 8, 2025
@kaizencc kaizencc added p2 package/tools Related to AWS CDK Tools or CLI and removed @aws-cdk/assets Related to the @aws-cdk/assets package needs-triage This issue or PR still needs to be triaged. labels Jan 8, 2025
@khushail khushail added the effort/medium Medium work item – several days of effort label Jan 8, 2025
@blimmer
Contributor

blimmer commented Jan 9, 2025

Huge 👍 on this. The fact that ECR and S3 assets in the CDK default bootstrap locations are just hashes is a really annoying UX. Reworking assets to have the stack ID is a simple solution that would be a big UX win, in addition to the gc and other improvements you mentioned.

@hoegertn
Contributor

hoegertn commented Jan 9, 2025

Prefixing the assets with the stack name is a good idea, I think, but keep in mind that it might increase cost, as assets are duplicated if you deploy the same assets to different environments in the same AWS account.

Also please do not push the AppStagingSynth as the new/better way to do things as it also has several drawbacks. It is just another way to do things.

4 participants