[RFC] App Manifests v2 #1019

Merged: 18 commits merged into cloudfoundry:main on Jan 7, 2025

@Gerg (Member) commented Nov 13, 2024

Gerg added 3 commits October 29, 2024 23:13
- Update network policy section to reflect thinking about cross-space
  resource sharing
- Add fields for provisioning/updating service instances
- Add user-provided service instances
- Break service bindings into their own section. They don't fit well
  nested under service instances: Apps depend on services; services
  don't depend on apps. For "app" bindings, it might make sense to nest
  under apps, but "key" and "route" bindings don't make sense under
  apps. Instead, break bindings into a top-level node.
Gerg added 9 commits November 13, 2024 15:13
- Fix inconsistent health check configuration
- Add `scale` node to processes to organize related nodes
- To add more hierarchy and more closely match the API design
- Small style updates and typo fixes
- Add additional risk to declarative manifests
@Gerg marked this pull request as ready for review November 15, 2024 23:06
@beyhan added the toc and rfc (CFF community RFC) labels Nov 18, 2024
@loewenstein-sap: This comment was marked as resolved.

@maxmoehl (Member) left a review:

Thank you for taking the time to write this down! I really like the overall approach.

Review thread on `toc/rfc/rfc-draft-v2-app-manifests.md` (outdated, resolved), on these RFC lines:

> available with v2 manifests. These extensions are less-developed than the core
> v2 manifest design, and may change significantly prior to implementation.
>
> #### Merge State

Member

I'm not that active on the developer side of CF, so take this comment with a grain of salt.

The way the RFC is written, the use case of managing the entire space in a single manifest seems to be the default (looking at the listed issues, I can see why that is). Conversely, managing only a subset of resources is supported, but requires additional configuration and might not be supported initially. Does that reflect how users typically use CF (manifests)?

Intuitively, CF feels very app-centred to me, and for the few apps we have, the manifest is checked in together with the app code. No assumptions about the space are made, and the space might be shared between multiple apps in a similar style.

Taking inspiration from Kubernetes: cf push on a full space manifest seems similar to kubectl apply of a multi-document YAML file or directory and it will only ever add & modify resources by default. Deletion is supported via --prune and has to be explicitly set.

I would prefer if we go a similar route: resources themselves are fully declarative, but a resource not listed should not be deleted implicitly, following the principle of least astonishment (imagine the surprise of deleting a production database upon pushing a hello-world app to the same space).

This comment was marked as outdated.

Member

Thinking further about this, there should most probably be a different delete strategy for resources referenced by an application, like a service binding or a sidecar. They should be deleted when the application no longer references them, because they are "part" of the application definition. For example, it is painful to delete a sidecar nowadays because you have to go through the CF API; there is no support for this in the CF CLI or the app manifest. I would be interested to see more comments on this.
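A minimal sketch of that delete strategy, assuming app-owned resources such as sidecars are identified by name within their app and that the app's manifest entry lists the complete desired set. `Sidecar` (reduced to `name` and `command`) and `plan_sidecar_changes` are hypothetical illustrations, not Cloud Controller code or the RFC's schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sidecar:
    # Hypothetical, simplified sidecar definition; real sidecars carry more fields.
    name: str
    command: str


def plan_sidecar_changes(desired: list[Sidecar], existing: list[Sidecar]) -> dict[str, list[str]]:
    """Treat an app's sidecars as part of the app definition.

    Sidecars listed in the manifest are created or updated; sidecars the app
    no longer references are deleted.
    """
    desired_by_name = {s.name: s for s in desired}
    existing_by_name = {s.name: s for s in existing}

    create = sorted(desired_by_name.keys() - existing_by_name.keys())
    delete = sorted(existing_by_name.keys() - desired_by_name.keys())
    # A sidecar present on both sides is updated only if its definition changed.
    update = sorted(
        name
        for name in desired_by_name.keys() & existing_by_name.keys()
        if desired_by_name[name] != existing_by_name[name]
    )
    return {"create": create, "update": update, "delete": delete}


plan = plan_sidecar_changes(
    desired=[Sidecar("logger", "./logger --v2")],
    existing=[Sidecar("logger", "./logger"), Sidecar("old-proxy", "./proxy")],
)
print(plan)  # {'create': [], 'update': ['logger'], 'delete': ['old-proxy']}
```

Under this assumption, dropping `old-proxy` from the app's manifest entry is enough to remove it, with no separate API call.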

Member Author

Yes, I'm also uncertain about how aggressive the current proposal is.

My primary goal is to have something that can easily be explained. It's easier to explain "everything is declarative", vs "configuration of resources is declarative, but the set of resources themselves are not declarative, unless you pass a --prune flag", especially when the boundary between a "resource" and "configuration of a resource" is murky.

> Taking inspiration from Kubernetes: cf push on a full space manifest seems similar to kubectl apply of a multi-document YAML file or directory and it will only ever add & modify resources by default. Deletion is supported via --prune and has to be explicitly set.

Yeah, the big distinction for me here is that k8s breaks things into multiple documents, so there is a clear boundary between what is declarative-by-default vs what is not. I could imagine a more revolutionary manifest proposal, where we do something similar for CF:

```yaml
---
kind: app
spec:
  name: my-app
---
kind: service_instance
spec:
  name: my-service-instance
```

The other inspiration is BOSH, which more-or-less has a single manifest that is strictly declarative.

> I would prefer if we go a similar route: resources themselves are fully declarative but a resource not listed should not be deleted implicitly following the principle of least astonishment (imagine the surprise of deleting a production database upon pushing a hello-world app to the same space).

Yes, this resonates with me. I'll think about this some more. I'm eager to hear additional feedback/suggestions as well.

@maxmoehl (Member), Nov 19, 2024

> My primary goal is to have something that can easily be explained. It's easier to explain "everything is declarative", vs "configuration of resources is declarative, but the set of resources themselves are not declarative, unless you pass a --prune flag", especially when the boundary between a "resource" and "configuration of a resource" is murky.

I agree, this needs careful consideration, and there would need to be a strong convention to prevent the varying behaviours we have in v1 manifests.

> Taking inspiration from Kubernetes: cf push on a full space manifest seems similar to kubectl apply of a multi-document YAML file or directory and it will only ever add & modify resources by default. Deletion is supported via --prune and has to be explicitly set.

> Yea, the big distinction for me here is that k8s breaks things into multiple documents, so there is a clear boundary between what is declarative-by-default vs what is not. I could imagine a more revolutionary manifest proposal, where we do something similar for CF: [...]
>
> The other inspiration is BOSH, which more-or-less has a single manifest that is strictly declarative.

The difference from BOSH is that BOSH only has one manifest managing one resource. In that sense, CF is probably closer to Kubernetes than to BOSH. On the other hand, moving towards Kubernetes-like primitives could also send the wrong signal to users.

Member Author

Tim also pointed me to this blog post, which is an interesting industry survey of declarative deletion.

Member

Thanks for the blog post link! A really good summary of declarative deletion. Based on the experience shared in the blog, it reads to me that well-supported imperative deletion should be our priority. Extending that to the app manifest could be done at a later point if users want it. Regarding the imperative approach, if we want resource-by-resource "pruning", my understanding is that we would need something like:

```shell
cf push -f manifest.yml --prune_apps=app-1,app-2 --prune_routes=route-1,route-2
```

and on the API side:

```http
POST /v3/spaces/:guid/actions/apply_manifest?prune_apps=app-1,app-2&prune_routes=route-1,route-2
```

@Gerg Is that what you meant in your example above?

@Gerg (Member Author), Nov 27, 2024

My example was inspired by kubectl apply's --prune-allowlist flag [docs]. You give it a list of the kinds of resources you want to delete. For example:

```shell
kubectl apply --prune -f manifest.yaml --all --prune-allowlist=core/v1/ConfigMap
```

would prune all excess ConfigMaps, but leave other kinds of resources alone.

My intention was for:

```shell
cf push -f manifest.yml --prune=apps,routes
```

to prune excess apps and routes, but leave other resources alone.
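A rough sketch of the proposed `--prune=apps,routes` semantics, modeled on `kubectl apply --prune-allowlist`: only the resource kinds named in the allowlist are pruned, and excess resources of every other kind are kept. The flag is a proposal rather than an existing `cf` option, and the kind names, the name-set representation, and both functions are illustrative assumptions:

```python
def plan_prune(
    manifest: dict[str, set[str]],
    space: dict[str, set[str]],
    prune_kinds: set[str],
) -> dict[str, list[str]]:
    """Return the resources to delete, grouped by kind.

    `manifest` and `space` map a resource kind (e.g. "apps") to the names of
    resources of that kind. Only kinds listed in `prune_kinds` are considered;
    excess resources of any other kind are left alone (additive by default).
    """
    deletions: dict[str, list[str]] = {}
    for kind in sorted(prune_kinds):
        excess = space.get(kind, set()) - manifest.get(kind, set())
        if excess:
            deletions[kind] = sorted(excess)
    return deletions


def parse_prune_flag(value: str) -> set[str]:
    # Hypothetical parser for a comma-separated value like "apps,routes".
    return {kind.strip() for kind in value.split(",") if kind.strip()}


manifest = {"apps": {"my-app"}, "routes": {"my-app.example.com"}}
space = {
    "apps": {"my-app", "hello-world"},
    "routes": {"my-app.example.com", "stale.example.com"},
    "service_instances": {"production-db"},
}
result = plan_prune(manifest, space, parse_prune_flag("apps,routes"))
print(result)  # {'apps': ['hello-world'], 'routes': ['stale.example.com']}
```

Here `production-db` survives because `service_instances` is not in the allowlist, and an empty allowlist deletes nothing.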


Regarding imperative deletion, I think we already have that well covered via the cf delete-* commands. For example, if the developer already has a list of specific routes to delete, it doesn't seem much harder to pass those routes to cf delete-route vs a --prune-routes flag.

Member

I was looking at this from the perspective of one space being managed by multiple manifests, where the `cf push -f manifest.yml --prune=apps,routes` approach doesn't fit so well. But for the use case you described above:

> The motivating use case that we're designing around is an upstream continuous deployment system that assembles desired state upstream, builds a space-level manifest, and applies it to CF, ideally without requiring imperative commands. We're probably over-fitting this design to that vision.

it makes sense.

I agree with:

> Regarding imperative deletion, I think we already have that well covered via the cf delete-* commands.

Member Author

I substantially updated the declarative resource deletion section in 17db5d1, based on this thread. Let me know what you think.

Review thread on `toc/rfc/rfc-draft-v2-app-manifests.md` (resolved), on the same `#### Merge State` lines:

This comment was marked as outdated.

@Gerg (Member Author) commented Nov 18, 2024

cc

  • @cloudfoundry/wg-app-runtime-interfaces-capi-approvers (primary affected group)
  • @cloudfoundry/wg-app-runtime-interfaces-cli-approvers (CLI manifest integration)
  • @cloudfoundry/wg-app-runtime-interfaces-java-tools-approvers (cf java client has manifest support)
  • @cloudfoundry/wg-app-runtime-platform-networking-approvers (due to proposed networking/routing changes)
  • @cloudfoundry/wg-service-management-cloud-service-broker-approvers (due to proposed service changes)

Feel free to cc any other WG areas y'all think are affected by this RFC.

@Gerg: This comment was marked as resolved.

@beyhan requested review from a team, rkoster, beyhan, stephanme, ameowlia and ChrisMcGowan and removed request for a team November 19, 2024 09:56
Gerg added 3 commits November 25, 2024 15:47
- Wikipedia article was low quality (per their own standards)
- Based on PR feedback
- Add list of additional references
- Remove disclaimer about tolerating hyphenated keys
- This was due to a misunderstanding about the original RFC. The
  per-route options are NOT opaque to CC.
- Update `docker` lifecycle app to use consistent access_credentials
  format
@Gerg requested a review from loewenstein-sap November 26, 2024 23:13
@IvanBorislavovDimitrov commented Dec 2, 2024

Hello CF colleagues,

I’m writing on behalf of the SAP team behind the MultiApps project, which is part of the App Runtime Interfaces working group.

Our service focuses on managing the lifecycle of microservices (Cloud Foundry apps), service instances, routes, bindings, and keys. These microservices are referred to as multitarget applications (MTAs), a concept that shares many similarities with the current RFC.

In our approach, applications, service instances, and routes are defined in a deployment descriptor file called mtad.yaml. The applications themselves are packaged into a single file, ((MTA)).mtar, which adheres to the JAR specification.

The service operates in a relatively declarative manner: when redeploying, it removes applications that have been deleted from the descriptor. However, service instances that are no longer part of the descriptor are not removed unless the user explicitly specifies the --delete-service-instances option in the CLI.

The service consists of a client application and server application:

Client: https://github.com/cloudfoundry/multiapps-cli-plugin

Server: https://github.com/cloudfoundry/multiapps-controller

Docs: https://help.sap.com/docs/btp/sap-business-technology-platform/multitarget-applications-in-cloud-foundry-environment

Examples: https://github.com/SAP-samples/cf-mta-examples

We think both App Manifest v2 and MultiApps Controller have pros and cons:

MultiApps Controller

| Pros | Cons |
| --- | --- |
| The deployment is partially transactional: you can retry a failed process after fixing a bug. | Another service in CF, so operations effort when it is part of CF |
| Grouping of the resources for the different MTA applications in the space | Adds another layer of abstraction (MTA) |
| Values can be shared between different apps and services | Overhead of zipping, unzipping, and transferring app binaries |
| Deployment history | |
| Inspect a single deployed MTA, or all MTAs deployed in the space or namespace | |
| Zero-downtime update of multiple apps via blue-green deployment and rolling blue-green deployment | |
| Support for fail-safe service instances | |
| Option to disable and enable service instances during deployment with the same MTA | |
| Versioning of the deployed MTA | |
| Parallel, sequential, or ordered processing of apps and services | |
| Shared variables between different MTA applications | |
| Option to keep existing app-related entities, like bindings or routes, after deployment | |
| Service key recreation | |
| Support for namespaces: apps and services in one space can be separated | |
| Default variables for domains, routes, etc. | |
| Hooks: a way to execute CF tasks at different stages of the deployment process | |
| Task support during deployment | |
| Managing the lifecycle of space-scoped service brokers | |
| Partial deployment: deploy only some parts of the descriptor | |
| Logging: detailed logs of the deployment process can be downloaded and are stored for 3 days | |
| Dynamic parameter resolution: the GUID of a just-created service instance can be passed as an argument to another | |
| Customer-defined option to skip service updates (offering, plan, parameters) | |

App Manifests V2

| Pros | Cons |
| --- | --- |
| Closer to the CF V3 API (e.g. naming conventions); native design | The app manifests seem more like space manifests |
| Verbose and native description | No versioning; no way to list what was deployed with a single manifest |
| Support for most of the CF features | Already-deployed apps and services in the space might be deleted when applying a manifest |
| Metadata for each resource | |
| Shared resources | |

Our questions and concerns

1.	Will service bindings and service keys be recreated when applying a manifest? Many customers expect service key recreation as part of the process.

2.	Are service instances updated during Manifest application, including parameters, plans, or offerings? Does the manifest cover the entire lifecycle of service instances? If so, are service instance parameters consistently updated?

3.	Is there a plan to introduce fail-safe resources? These would be valuable for customers using the same manifest across different infrastructures or Cloud Foundry environments.

4.	Will a history of applied manifests and deployment logging be introduced? Are there plans to support troubleshooting, such as reviewing failed deployments within a space?

5.	What will the deployment flow look like? For example, will it resemble Kubernetes, where the descriptor file triggers background operations, or App Manifest V1, where users see the progress of each push in real time?

6.	Are there plans to introduce rollback mechanisms for apps or services? This could help mitigate downtime caused by deployment failures.

7.	How will resource grouping be managed? Will customers be able to differentiate which resources were applied with an App Manifest? Or will a single App Manifest be confined to use within one space?

8.	What happens if an App Manifest is applied to a space with existing apps and services? Will the manifest override or remove the pre-existing resources?

9.	Can one manifest be applied across multiple spaces? For instance, could a manifest define a random route and still be reusable across different spaces?

10.	Will each manifest describe a single business application? If so, does this imply that separate manifests are needed for different stages of the delivery process, such as staging, testing, and production?

11.	Will a user need a constant internet connection throughout the entire deployment process? For example, are app binaries uploaded first to minimize dependency on the connection during later stages?

12.	What is the intended approach for handling droplets? Should users build them manually, or can they download existing droplets from Cloud Foundry and reuse them?

13.	Will there be an option to skip updates for service instances? For instance, if a user changes the plan or parameters, can they opt to skip updating the plan?

14.	Will users be able to push a service broker using App Manifests v2?

15.	Is there a way to delete all resources pushed by a single app manifest from a space in one step?

16.	Are there plans to introduce rolling updates or blue-green deployment strategies?

17.	What is the order in which resources are applied? For example, are services created first, followed by apps and then bindings? Can users control the order of resource deployment, such as specifying which application or service should be created or updated before others?

@Gerg (Member Author) commented Dec 4, 2024

Thank you for your thorough analysis of v2 manifests vs MultiApps.

v2 manifests are intended to be a refresh for the existing v1 manifest interface and capabilities: an API-native way to apply bulk configuration. They are not intended as a replacement for MultiApps. We're not planning to introduce the advanced packaging and deployment configuration that is possible with MultiApps. Devs that want the features of MultiApps can and should continue to use them.

I'm not familiar with the implementation of MultiApps, but maybe there are some opportunities for the MultiApps server to leverage v2 manifests under the hood.


Regarding the questions/concerns, a lot of these details are specific and would need to be worked out during development. In general, I would expect the behavior to be similar to v1 manifests, when applicable. Don't take the following as definitive, but these are my first pass at answering the questions.

> 1. Will service bindings and service keys be recreated when applying a manifest? Many customers expect service key recreation as part of the process.

Bindings will behave the same as with v1 manifests: no re-create. Same for keys. This could be a good value-add for MultiApps.

> 2. Are service instances updated during Manifest application, including parameters, plans, or offerings? Does the manifest cover the entire lifecycle of service instances? If so, are service instance parameters consistently updated?

SIs are only updated if plan or parameters change.

> 3. Is there a plan to introduce fail-safe resources? These would be valuable for customers using the same manifest across different infrastructures or Cloud Foundry environments.

Not currently. This is probably another good example of a MultiApps advanced feature.

> 4. Will a history of applied manifests and deployment logging be introduced? Are there plans to support troubleshooting, such as reviewing failed deployments within a space?

No.

> 5. What will the deployment flow look like? For example, will it resemble Kubernetes, where the descriptor file triggers background operations, or App Manifest V1, where users see the progress of each push in real time?

Manifest schema version won't affect how deployments work. This will be the same as with v1 manifests (for better or worse).

> 6. Are there plans to introduce rollback mechanisms for apps or services? This could help mitigate downtime caused by deployment failures.

Only what exists currently (via `cf cancel-deployment` and `cf rollback`).

> 7. How will resource grouping be managed? Will customers be able to differentiate which resources were applied with an App Manifest? Or will a single App Manifest be confined to use within one space?

Manifests only affect a single space. The only way users could figure out which resources were provided by manifests is by looking at audit events.

> 8. What happens if an App Manifest is applied to a space with existing apps and services? Will the manifest override or remove the pre-existing resources?

We are actively discussing this in the thread above, but we're leaning towards making pruning opt-in for each resource.

> 9. Can one manifest be applied across multiple spaces? For instance, could a manifest define a random route and still be reusable across different spaces?

This will be the same as with v1 manifests. You can write a manifest that can be applied to different spaces, but it's not guaranteed to work.

> 10. Will each manifest describe a single business application? If so, does this imply that separate manifests are needed for different stages of the delivery process, such as staging, testing, and production?

We don't have an opinion on this. Manifests bulk-apply configuration to a space. We leave it up to users and other tooling to be opinionated about how apps and spaces are managed.

> 11. Will a user need a constant internet connection throughout the entire deployment process? For example, are app binaries uploaded first to minimize dependency on the connection during later stages?

This will be the same as with v1 manifests. It will span multiple requests (manifest vs. bits upload), but the bulk of the configuration is applied asynchronously.

> 12. What is the intended approach for handling droplets? Should users build them manually, or can they download existing droplets from Cloud Foundry and reuse them?

We were only considering the `cf download-droplet` case, but if users find a way to build droplets outside of CF, that's a cool use case.

> 13. Will there be an option to skip updates for service instances? For instance, if a user changes the plan or parameters, can they opt to skip updating the plan?

We will apply all configuration that is present in the manifest.

> 14. Will users be able to push a service broker using App Manifests v2?

We hadn't considered registering space-scoped brokers, but that is a possible future extension. We wouldn't handle other broker registrations, though, since they are global resources.

> 15. Is there a way to delete all resources pushed by a single app manifest from a space in one step?

This will be the same as with v1 manifests. We won't be tracking the manifest after it is applied, so no.

> 16. Are there plans to introduce rolling updates or blue-green deployment strategies?

There are already rolling deploys (and canaries) built into CF. There may be additional deployment strategies in the future, but those wouldn't be related to the manifest schema.

> 17. What is the order in which resources are applied? For example, are services created first, followed by apps and then bindings? Can users control the order of resource deployment, such as specifying which application or service should be created or updated before others?

This isn't defined right now. We'd need to figure out the proper order when implementing. Like in v1, users won't have fine-grained control over resource creation/deploy order.

@IvanBorislavovDimitrov

Thank you for answering all the questions!

Regarding:

> SIs are only updated if plan or parameters change.

How will cases where a service broker adds additional “default” parameters to the creation parameters be handled? For example, some brokers return extra parameters after a service instance is created, making the comparison between the parameters in the manifest and the actual parameters always different.

Will Cloud Foundry handle scenarios where certain service creation parameters cannot be provided during an update? For instance, what happens if the broker fails in such cases?

@beyhan (Member) commented Dec 6, 2024

Hi @IvanBorislavovDimitrov

I think I can also answer the last two questions:

> How will cases where a service broker adds additional “default” parameters to the creation parameters be handled? For example, some brokers return extra parameters after a service instance is created, making the comparison between the parameters in the manifest and the actual parameters always different.

The Cloud Controller will try to update the service instance if the plan or parameters changed in the manifest definition. If there are no changes for an SI, the Cloud Controller won't try to do anything. Additionally, the Cloud Controller has no way to know that a parameter added to the manifest definition is a default one, so in that case an SI update will also be triggered.
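A minimal sketch of that update rule. The comment does not spell out the comparison baseline, so this assumes the Cloud Controller compares the manifest's plan and parameters with the values it last requested, not with the parameters the broker reports back; `ServiceInstanceSpec` and `needs_update` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ServiceInstanceSpec:
    # Hypothetical, simplified view of a managed service instance in a manifest.
    plan: str
    parameters: dict[str, Any] = field(default_factory=dict)


def needs_update(desired: ServiceInstanceSpec, last_applied: ServiceInstanceSpec) -> bool:
    """Trigger a broker update only if the plan or parameters changed.

    The baseline is the previously requested configuration, so parameters a
    broker adds on its own never show up as a difference. Any parameter added
    to the manifest counts as a change, whatever its value.
    """
    return desired.plan != last_applied.plan or desired.parameters != last_applied.parameters


previous = ServiceInstanceSpec(plan="small", parameters={"version": "15"})
print(needs_update(ServiceInstanceSpec("small", {"version": "15"}), previous))  # False
print(needs_update(ServiceInstanceSpec("small", {"version": "15", "ha": False}), previous))  # True
```

Under this assumption, broker-side defaults do not cause repeated updates, while adding a parameter to the manifest does, even if it matches what the broker would have defaulted to.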

> Will Cloud Foundry handle scenarios where certain service creation parameters cannot be provided during an update? For instance, what happens if the broker fails in such cases?

The RFC answers this question here: https://github.com/cloudfoundry/community/pull/1019/files#diff-adfdd3cef870d4a5d339bc9f879ad6f7cee5ef16931049dc6bb46973540c6cf9R169-R174. If resource creation fails, the deployment will fail, but there won't be an automatic rollback of already created resources.

@beyhan (Member) commented Dec 11, 2024

@Gerg Going over the discussions, it looks to me like most of them could be resolved. It would be great if you could resolve the discussions where we have agreement and update the RFC if needed. At the next TOC meeting on Tuesday, we could start the Final Comment Period if it looks good.

> multi-word keys. Schema version 2 keys should only use underscores, to match
> the v3 API convention.
>
> ### Behavior

Contributor

question: Any thoughts on whether v2 manifests will mimic the behaviour of v1, where you may have to restart/restage the app for configuration to take effect (e.g. updated env vars)? I can see it being annoying to apply a v2 manifest and then have to take an imperative action (restart) to actually see the changes applied.

@Gerg (Member Author), Dec 12, 2024

That will probably remain the same. That gives users control over how they want to restart their app ("hard" restart vs restage vs deployment).

We can patch over that in clients to give a more unified experience (e.g. `cf push --manifest /path/to/manifest.yml --strategy=rolling`).

> 1. Update `POST /v3/spaces/:guid/actions/apply_manifest` to accept v2 manifests
> 1. Update `POST /v3/spaces/:guid/manifest_diff` to accept v2 manifests
>
> #### CLI

Contributor

thought: If multiple manifests are applied to the same space and I just want to remove my stuff, should we support something like `cf delete-manifest`? (It would only delete the resources, by type/name, that are defined in the manifest.)

Member Author

We can call them "anti-manifests".

Contributor

The same manifest 😄. I'm thinking about parity with the k8s world of `kubectl apply` / `kubectl delete`. Also, if we're going to support pruning resources, the manifest piece will already have delete logic.
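A sketch of the proposed `cf delete-manifest` idea, which is not an existing `cf` command: collect the `(type, name)` pairs a manifest declares and delete only those that exist in the space. The manifest shape here (a `version` key plus top-level lists of entries with a `name`) is a simplified assumption, not the v2 schema from the RFC, and both functions are hypothetical:

```python
from typing import Any


def resources_declared_in(manifest: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (resource type, name) pairs declared in a manifest.

    Assumes each top-level key holds a list of entries with a "name" field;
    keys that are not lists (e.g. a schema version) are ignored.
    """
    declared = []
    for resource_type, entries in manifest.items():
        if not isinstance(entries, list):
            continue
        for entry in entries:
            declared.append((resource_type, entry["name"]))
    return sorted(declared)


def plan_delete_manifest(
    manifest: dict[str, Any], space: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    # Delete only resources the manifest declares AND that exist in the space;
    # resources from other manifests in the same space are untouched.
    return [r for r in resources_declared_in(manifest) if r in space]


my_manifest = {
    "version": 2,
    "apps": [{"name": "my-app"}],
    "service_instances": [{"name": "my-db"}],
}
space = {("apps", "my-app"), ("apps", "other-team-app"), ("service_instances", "my-db")}
print(plan_delete_manifest(my_manifest, space))
# [('apps', 'my-app'), ('service_instances', 'my-db')]
```

`other-team-app` is left alone because this manifest does not declare it. A real implementation would also need a safe deletion order, which this sketch ignores.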


> #### CLI
>
> 1. Update `cf push` and `cf apply-manifest` to accept v2 manifests

Contributor

thought: Maybe too detailed for this proposal, but it seems the existing `cf apply-manifest` does display a diff, yet there's no way to wait for user confirmation before applying changes; it just applies them. In the v2 manifest world, if some actions can be destructive, we will definitely need that.

@Gerg (Member Author), Dec 12, 2024

That may be considered a breaking change. For example, a continuous delivery pipeline could start hanging or failing because an app with a v2 manifest triggers an interactive prompt. We may need to reserve the prompt behavior (or v2 manifest support in general) for a new major version of the CLI.

Gerg added 2 commits December 12, 2024 14:09
- Top-level resources are no longer deleted by default
- Based on PR feedback, deleting resources by default was too risky
- Add section describing opt-in resource deletion (pruning)
@beyhan (Member) commented Dec 17, 2024

We discussed this RFC during the TOC meeting on the 17th of December and decided to start the FCP, with the goal of accepting this RFC at the next TOC meeting, which is planned for the 7th of January.

Review thread on `toc/rfc/rfc-draft-v2-app-manifests.md` (outdated, resolved).
- To match new names used in v1 manifests

Co-authored-by: Maximilian Moehl <[email protected]>
@beyhan requested a review from Samze January 7, 2025 15:39
@beyhan merged commit a0c916f into cloudfoundry:main Jan 7, 2025
1 check passed
beyhan added a commit that referenced this pull request Jan 7, 2025
Labels: rfc (CFF community RFC), toc
Projects: Status: In Progress