Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provider-aws-ecs: taskDefinition is not updated #1061

Closed
denzhel opened this issue Jan 3, 2024 · 9 comments
Closed

provider-aws-ecs: taskDefinition is not updated #1061

denzhel opened this issue Jan 3, 2024 · 9 comments
Labels
bug Something isn't working is:triaged Indicates that an issue has been reviewed.

Comments

@denzhel
Copy link

denzhel commented Jan 3, 2024

What happened?

I've created a task definition and then updated one the container-definition values and expected crossplane to create a new revision, instead it tries to modify the existing revision and fails with this error:

 Message:               async update failed: refuse to update the external resource: cannot change the value of the argument "container_definitions" from "[{\"cpu\":10,\"environment\":[],\"essential\":true,\"image\":\"service-first\",\"memory\":512,\"mountPoints\":[],\"name\":\"first\",\"portMappings\":[],\"volumesFrom\":[]}]" to "[{\"cpu\":11,\"essential\":true,\"image\":\"service-first\",\"memory\":512,\"name\":\"first\"}]"

How can we reproduce it?

  1. Apply the following file using kubectl apply -f task.yaml:
apiVersion: ecs.aws.upbound.io/v1beta1
kind: TaskDefinition
metadata:
  name: dennis
spec:
  forProvider:
    containerDefinitions: |-
      [
        {
          "name": "first",
          "image": "service-first",
          "cpu": 10,
          "memory": 512,
          "essential":true
        }
      ]
    family: dennis
    region: us-central-1
  providerConfigRef:
    name: "dev-eu-central-1-cs-account"
  1. Change cpu to 20
  2. Re-apply the file using kubectl apply -f task.yaml
  3. Run kubectl describe taskdefinition.ecs.aws.upbound.io/dennis
  4. See the error

What environment did it happen in?

  • Crossplane Version: 1.14.0
  • Provider Version: 0.47.0
  • Kubernetes Version: v1.26.11-eks-8cb36c9
  • Kubernetes Distribution: EKS
@denzhel denzhel added bug Something isn't working needs:triage labels Jan 3, 2024
@turkenf
Copy link
Collaborator

turkenf commented Jan 7, 2024

Hi @denzhel, thank you for raising this issue.

The issue can be reproduced with the information provided, I also get the same error when I add a second definition block.

Additionally, when I try with provider version 0.43, I get the following error:

    message: 'observe failed: cannot run plan: plan failed: Instance cannot be destroyed:
      Resource aws_ecs_task_definition.sample-task1 has lifecycle.prevent_destroy
      set, but the plan calls for this resource to be destroyed. To avoid this error
      and continue with the plan, either disable lifecycle.prevent_destroy or reduce
      the scope of the plan using the -target flag.'
    reason: ReconcileError
    status: "False"

@turkenf turkenf added is:triaged Indicates that an issue has been reviewed. and removed needs:triage labels Jan 7, 2024
@denzhel
Copy link
Author

denzhel commented Jan 7, 2024

Hi @turkenf ,

I see that this issue was already known in v0.38.0. #823

Is there an ETA for a fix ?

@jeanduplessis
Copy link
Collaborator

@denzhel

We've discussed this issue, and we don't feel we should allow the Terraform layer to replace (destroy & recreate the external resource with the updated cpu (core) count) due to the immutable fields policy of the XRM.

It is suggested that you delete the corresponding MR and recreate a new MR with the updated parameters as a path forward to configure an existing TaskDefinition.ecs with an updated cpu count, if needed.


Going forward we should be properly treating this field as an immutable field, as captured by the following two issues:

We can also:

  • Add a reference to the XRM explaining the rationale for blocking Terraform from replacing the resource when the cpu count is updated
  • Acknowledge the missing immutability feature, e.g., generating the CRD validation rules to block updates to the forceNew fields (in TF plugin SDK) or the "requires replace" plan modifiers.

@denzhel
Copy link
Author

denzhel commented Jan 8, 2024

@denzhel

We've discussed this issue, and we don't feel we should allow the Terraform layer to replace (destroy & recreate the external resource with the updated cpu (core) count) due to the immutable fields policy of the XRM.

It is suggested that you delete the corresponding MR and recreate a new MR with the updated parameters as a path forward to configure an existing TaskDefinition.ecs with an updated cpu count, if needed.


Going forward we should be properly treating this field as an immutable field, as captured by the following two issues:

We can also:

  • Add a reference to the XRM explaining the rationale for blocking Terraform from replacing the resource when the cpu count is updated

  • Acknowledge the missing immutability feature, e.g., generating the CRD validation rules to block updates to the forceNew fields (in TF plugin SDK) or the "requires replace" plan modifiers.

Hi @jeanduplessis ,

Thank you for the reply !

This behavior happens not only with cpu field but with all fields. You can check the same with the image field where we wanted to bump the version when updating the micro service version.

I"m not sure how to implement your suggestion as we use CRD, Compositions and Claims. The claim is templated using Helm and ArgoCD.

Help will be appreciated.

@mbbush
Copy link
Collaborator

mbbush commented Jan 8, 2024

This behavior happens not only with cpu field but with all fields.

This makes sense, as an ECS task definition is supposed to be immutable once created.

One trick you could try is to make the metadata.name field of the managed resource dynamic in your composition. One way to do that might be hashing all the properties in spec.forProvider and using the hash as part of the name.

@denzhel
Copy link
Author

denzhel commented Jan 8, 2024

Hi @mbbush,

So I have this claim:

apiVersion: bigpanda.io/v1alpha1
kind: EcsCluster
metadata:
  name: {{ printf "%s-%s" .Values.global.env .Values.ecs.name }}
  annotations:
    crossplane.io/external-name: {{ printf "%s-%s" .Values.global.env .Values.ecs.name }}
spec:
  taskDefinition:
    containerDefinitions: {{ .Values.ecs.taskDefinition.containerDefinitions | toJson | quote }}

This composition:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: ecscluster
spec:
  compositeTypeRef:
    apiVersion: bigpanda.io/v1alpha1
    kind: CompositeEcsCluster
  resources:
    # creation of the task definition
    - base:
        apiVersion: ecs.aws.upbound.io/v1beta1
        kind: TaskDefinition
        spec:
          forProvider: {}
          providerConfigRef:
            name: ""
      patches:
        # create an hash from the containerDefinition object
        - type: ToCompositeFieldPath
          fromFieldPath: spec.taskDefinition.containerDefinitions
          toFieldPath: status.containerDefinitionsHash
          transforms:
            - type: string
              string:
                type: Convert
                convert: "ToSha256"
         # define the name of the object
        - type: CombineFromComposite
          combine:
            variables:
              - fromFieldPath: spec.name
              - fromFieldPath: status.containerDefinitionsHash
            strategy: string
            string:
              fmt: "%s-task-%s"
          toFieldPath: metadata.name

Added this to the XRD:

status:
              containerDefinitionsHash:
                type: string

But I cannot get the hashed value to appear inside the composition.
I don't see it in the status block:

k get composite dev-us-east-1-inbound-alert-persistence-v4hsz -o json | jq .status
{
  "clusterArn": "arn:aws:ecs:us-east-1:123456:cluster/dev-us-east-1-inbound-alert-persistence-cluster",
  "clusterName": "dev-us-east-1-inbound-alert-persistence-cluster",
  "conditions": [
    {
      "lastTransitionTime": "2024-01-08T20:18:41Z",
      "reason": "ReconcileSuccess",
      "status": "True",
      "type": "Synced"
    },
    {
      "lastTransitionTime": "2024-01-08T20:21:07Z",
      "reason": "Available",
      "status": "True",
      "type": "Ready"
    }
  ],
  "taskArn": "arn:aws:ecs:us-east-1:123456:task-definition/dev-us-east-1-inbound-alert-persistence-task:1"
}

@denzhel
Copy link
Author

denzhel commented Jan 8, 2024

HI @mbbush ,

I was able to get it working but now I have another issue.
When I first create the resources using the claim, two tasks are created:

  1. Without the hash
  2. With the hash
    I expected both of them to include a hashed value, different hashed value.
k get managed
NAME                                                                                                                                              READY   SYNCED   EXTERNAL-NAME                                  AGE
taskdefinition.ecs.aws.upbound.io/dev-us-east-1-inbound-alert-persistence-jx7kc-x9rp4                                                             True    True     dev-us-east-1-inbound-alert-persistence-task   12s
taskdefinition.ecs.aws.upbound.io/dev-us-east-1-inbound-alert-persistence-task-cb1dfba189efa313836effc0489847dfe17b5478b5413643a301396be8a3b1ab   True    True     dev-us-east-1-inbound-alert-persistence-task   12s

@denzhel
Copy link
Author

denzhel commented Jan 9, 2024

Hi @jeanduplessis and @turkenf ,

Help would be much appreciated <3

@denzhel
Copy link
Author

denzhel commented Jan 10, 2024

Wanted to share that I was able to solve the above using this in the claim file:

containerDefinitionsHash: {{ sha256sum (.Values.ecs.taskDefinition.containerDefinitions | toJson | quote) }}

and addition to the CRD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working is:triaged Indicates that an issue has been reviewed.
Projects
None yet
Development

No branches or pull requests

4 participants