🌱 Support for ClusterClass template #1405

Merged

Conversation

p-strusiewiczsurmacki-mobica
Contributor

What this PR does / why we need it:

This PR adds support for ClusterClass API as discussed in #1267

Which issue(s) this PR fixes:
Fixes #1267

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 16, 2024
@metal3-io-bot
Contributor

Hi @p-strusiewiczsurmacki-mobica. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 16, 2024
@p-strusiewiczsurmacki-mobica p-strusiewiczsurmacki-mobica marked this pull request as draft January 16, 2024 14:38
@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2024
@p-strusiewiczsurmacki-mobica
Contributor Author

Converting this to draft as I am still trying to set up a testing environment for this change.

@Rozzii
Member

Rozzii commented Jan 31, 2024

/cc @lentzi90 @zaneb @Rozzii @hardys FYI

@metal3-io-bot
Contributor

@Rozzii: GitHub didn't allow me to request PR reviews from the following users: FYI.

Note that only metal3-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @lentzi90 @zaneb @Rozzii @hardys FYI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Rozzii
Member

Rozzii commented Jan 31, 2024

/cc @kashifest

@Rozzii Rozzii requested a review from kashifest January 31, 2024 11:47
@metal3-io-bot metal3-io-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 31, 2024
@lentzi90
Member

lentzi90 commented Feb 1, 2024

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 1, 2024
Member

@lentzi90 lentzi90 left a comment

Thanks for working on this! I think the code is looking good already but I'm a bit confused about the kustomizations under examples...
Why is it split up in cluster, controlplane and machinedeployment? I would expect to have one kustomization with everything needed for the ClusterClass (Metal3MachineTemplates, Metal3ClusterTemplate, KubeadmControlPlaneTemplate, KubeadmConfigTemplates, and also Metal3DataTemplates and IPPools). Then the ClusterClass kustomization would be "self-contained" and we can have a separate cluster-template to create clusters based on this class. What do you think?

Comment on lines 102 to 107
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
Member

The cluster-template should be separate from the ClusterClass IMO, and we should not rely on the CLUSTER_NAME in the ClusterClass at all. That way it will be possible to reuse the ClusterClass for multiple clusters.

Contributor Author

Changed that. I previously kept the old names because that made it easier to keep the clusterclass example as similar as possible to the non-clusterclass one.
Can you check if this is what you meant?

@@ -94,14 +97,26 @@ ENVSUBST="${SOURCE_DIR}/envsubst-go"
curl --fail -Ss -L -o "${ENVSUBST}" https://github.com/a8m/envsubst/releases/download/v1.2.0/envsubst-"$(uname -s)"-"$(uname -m)"
chmod +x "$ENVSUBST"

SRC_DIR="${SOURCE_DIR}"
REORDER_TYPE="--reorder=legacy"
Member

Why is this needed?

Contributor Author

Mainly, --reorder=none is required for the ClusterClass example. By default kustomize uses reorder=legacy and reorders the definitions in the output file alphabetically. If the definitions are reordered so that ClusterClass appears in the output file above the Metal3 definitions (e.g. Metal3DataTemplate), deletion with make delete-examples-clusterclass fails: k8s first deletes the ClusterClass together with the associated Metal3 objects and then tries to delete the Metal3 objects a second time. At least that's what happens for me, hence reorder=none was added for the ClusterClass example and reorder=legacy for the non-ClusterClass example generation.

However, I've deleted this reorder=legacy and now only reorder=none will be used and only for ClusterClass.
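
For reference, kustomize v5 and later also expose the output ordering as a `sortOptions` field in the kustomization file, as an alternative to the `--reorder` flag discussed above. The fragment below is only a sketch and is not what this PR uses: the resource file names are hypothetical, and `order: fifo` is assumed to match `--reorder=none` (resources are emitted in the order they are listed).

```yaml
# kustomization.yaml (sketch; assumes kustomize v5+, file names are hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# fifo keeps resources in the order listed below instead of re-sorting them
sortOptions:
  order: fifo
resources:
- metal3-templates.yaml   # hypothetical: Metal3 templates come first
- clusterclass.yaml       # hypothetical: ClusterClass comes last
```

Listing the ClusterClass after the Metal3 templates keeps it below them in the generated file, which is the ordering the comment above describes as safe for deletion.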

g.Expect(c.Spec).To(Equal(Metal3ClusterTemplateSpec{}))
}

func TestMetal3ClusterTemplateValidation(t *testing.T) {
Member

Could you add one test for when the template is invalid also?

Contributor Author

Done :)

@p-strusiewiczsurmacki-mobica p-strusiewiczsurmacki-mobica marked this pull request as ready for review February 5, 2024 17:52
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2024
@p-strusiewiczsurmacki-mobica
Contributor Author

Added the changes requested by @lentzi90.

Additionally added some changes in the Tiltfile and other scripts.
Now, to test with Tilt, the commands below should be sufficient:

make tilt-settings-clusterclass
make tilt-up
make generate-examples-clusterclass
make deploy-examples-clusterclass

make tilt-settings-clusterclass will now generate settings that should enable ClusterTopology in Cluster API.

Additionally, I was able to change the example available in metal3-dev-env and to deploy and provision cluster using ClusterClass, so I'm moving this from draft.

If anyone wants to try to test it with metal3-dev-env, here's my branch with the changes. The cluster should be provisioned when make test is executed, but the test itself will fail as I don't know how to enable ClusterTopology in the 'internal' cluster yet.

Member

@lentzi90 lentzi90 left a comment

I would not worry about the tilt setup here honestly. If you want to work on it, fine, but isn't it easier to just use CAPI's directly?
The template and ClusterClass that you have here will work great with that already. The only thing you have to do is to add the folder to template_dirs in the tilt settings.
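
As a rough illustration of that suggestion, a tilt-settings.yaml in the Cluster API repository might look like the sketch below. The `template_dirs` and `kustomize_substitutions` keys are part of CAPI's Tilt settings; the relative paths, the `metal3` provider name, and the folder holding the ClusterClass files are assumptions made for this example.

```yaml
# tilt-settings.yaml in the Cluster API checkout (sketch; paths and the
# "metal3" provider name are assumptions, adjust to your layout)
provider_repos:
- ../cluster-api-provider-metal3
enable_providers:
- metal3
- kubeadm-bootstrap
- kubeadm-control-plane
kustomize_substitutions:
  CLUSTER_TOPOLOGY: "true"   # turns on the ClusterTopology feature gate
template_dirs:
  metal3:
  - ../cluster-api-provider-metal3/examples/clusterclass   # hypothetical folder
```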

examples/clusterclass/class/class.yaml Outdated
Comment on lines 11 to 12
host: ${CLUSTER_APIENDPOINT_HOST}
port: ${CLUSTER_APIENDPOINT_PORT}
Member

We should make these variables in the ClusterClass or it will be impossible to create more than one cluster from it.

Contributor Author

Done. Added variables and patches to the ClusterClass and referenced them from the Cluster itself. It seems to work fine according to my tests.
Just one question: currently the webhook validates the controlPlaneEndpoint definition of Metal3ClusterTemplate, meaning something always has to be defined there. Should this validation be dropped so that we allow an empty definition (without specifying controlPlaneEndpoint)?

Member

Not sure I understood the question. I think the controlPlaneEndpoint is always required. How would the cluster work without it?

Contributor Author

Currently Metal3ClusterTemplate is required to have controlPlaneEndpoint defined, e.g:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3ClusterTemplate
metadata:
  name: example-cluster-template
spec:
  template:
    spec:
      controlPlaneEndpoint:
        host: 127.0.0.1
        port: 6443
      noCloudProvider: true

But I think that makes it not reusable, right? Do we want to accept something like this (with no controlPlaneEndpoint):

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3ClusterTemplate
metadata:
  name: example-cluster-template
spec:
  template:
    spec:
      noCloudProvider: true

And then controlPlaneEndpoint should be defined in Cluster resource, I think?

Member

Ok now I understand!
I think the way to handle this is to use variables in the ClusterClass. So set it like you already did in the Metal3ClusterTemplate, but then add a variable and patch to override it. Then it would be possible to create clusters like

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  ...
  topology:
    class: example-clusterclass
    version: v1.29.0
    variables:
    - name: controlPlaneEndpointHost
      value: 192.168.0.100
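
On the ClusterClass side, the variable-and-patch approach described above could be sketched roughly as follows. This is a fragment under assumptions, not the manifest from this PR: the controlPlane and workers sections are omitted, and the resource names are taken from the examples in this thread. It relies on the Metal3ClusterTemplate already containing a placeholder controlPlaneEndpoint so that the `replace` operation has a path to overwrite.

```yaml
# ClusterClass fragment (sketch; controlPlane and workers sections omitted)
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: example-clusterclass
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: Metal3ClusterTemplate
      name: example-cluster-template
  variables:
  # Declares the variable that each Cluster sets under spec.topology.variables
  - name: controlPlaneEndpointHost
    required: true
    schema:
      openAPIV3Schema:
        type: string
  patches:
  # Overrides the placeholder host in the Metal3ClusterTemplate per cluster
  - name: controlPlaneEndpointHost
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3ClusterTemplate
        matchResources:
          infrastructureCluster: true
      jsonPatches:
      - op: replace
        path: /spec/template/spec/controlPlaneEndpoint/host
        valueFrom:
          variable: controlPlaneEndpointHost
```

A second variable and patch of the same shape would cover the port.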

examples/clusterclass/class/class.yaml Outdated
examples/clusterclass/class/class.yaml Outdated
Comment on lines 170 to 173
checksum: ${IMAGE_CHECKSUM}
checksumType: ${IMAGE_CHECKSUM_TYPE}
format: ${IMAGE_FORMAT}
url: ${IMAGE_URL}
Member

These would also be nice to set through ClusterClass variables, but more as a nice to have.

Contributor Author

Done
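
A minimal sketch of what exposing the image settings as a ClusterClass variable might look like is shown below. The variable name `imageURL` and the worker class name `worker` are hypothetical and may differ from what the PR actually uses; the fragment also assumes the Metal3MachineTemplates already set a placeholder image.url for the `replace` operation to overwrite.

```yaml
# ClusterClass fragment (sketch; variable and worker class names are hypothetical)
spec:
  variables:
  - name: imageURL
    required: true
    schema:
      openAPIV3Schema:
        type: string
  patches:
  - name: imageURL
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        matchResources:
          controlPlane: true          # patch the control plane machine template
          machineDeploymentClass:
            names:
            - worker                  # and the template of this worker class
      jsonPatches:
      - op: replace
        path: /spec/template/spec/image/url
        valueFrom:
          variable: imageURL
```

The checksum, checksumType, and format fields could be handled the same way with one variable and one patch each.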

examples/clusterclass/cluster/cluster.yaml Outdated
@@ -44,10 +44,27 @@ get_latest_release() {
CAPIRELEASEPATH="${CAPIRELEASEPATH:-https://api.github.com/repos/${CAPI_BASE_URL:-kubernetes-sigs/cluster-api}/releases}"
export CAPIRELEASE="${CAPIRELEASE:-$(get_latest_release "${CAPIRELEASEPATH}" "v1.3.")}"

# ClusterClass enable flag
CLUSTERCLASS_ENABLE="${CLUSTERCLASS:-}"
Member

I think we should use the same variable as CAPI does, so this should be CLUSTER_TOPOLOGY.

Contributor Author

Done

Member

This way of generating manifests with kustomize and envsubst doesn't seem to really give us anything useful from what I can see. I'm fine having it here if it is useful to you, but I wonder if we should not just put the ClusterClass and cluster-template in templates instead? No kustomize needed (we are anyway not doing anything useful with it) and they would be easily discovered and useful. It also matches exactly what CAPI has for CAPD.

Contributor Author

I've deleted the kustomization files and moved clusterclass.yaml and cluster.yaml to examples/templates/.

hack/gen_tilt_settings.sh Outdated
clusterConfiguration:
  apiServer:
    extraArgs:
      cloud-provider: baremetal
Member

What is this cloud provider?

Member

should be CCM, and looks like a leftover before the rename (baremetal=>metal3)

Contributor Author

Member

Not sure about what this does honestly. I think we can remove it. We don't use it in the tests at least: https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/test/e2e/data/infrastructure-metal3/bases/cluster/cluster-with-kcp.yaml

Contributor Author

OK, removed all references to cloud-provider: baremetal.

Makefile Outdated
examples/generate.sh Outdated
examples/generate.sh Outdated
metadata:
  name: example-md-metadata
spec:
  clusterName: example
Member

Too bad we have the clusterName on the Metal3DataTemplate and IPPool 😞
I guess this makes it quite hard to use them from the ClusterClass? Would it make more sense to have them separate or in the cluster.yaml?

@kashifest
Member

This would need squashing of the commits I suppose, @furkatgofurov7 PTAL

@furkatgofurov7
Member

This would need squashing of the commits I suppose, @furkatgofurov7 PTAL

@kashifest I approved it already, had a small comment but this is not a blocker for this PR.

And yes, the last 2 commits could be squashed into 1, preferably.

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2024
@metal3-io-bot
Contributor

metal3-io-bot commented Apr 11, 2024

@p-strusiewiczsurmacki-mobica: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| markdownlint | 4552910 | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@metal3-io-bot metal3-io-bot removed the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2024
@p-strusiewiczsurmacki-mobica
Contributor Author

@furkatgofurov7

And yes, the last 2 commits could be squashed into 1, preferably.

I've left 2 commits now. Is that OK, or should I squash them into 1?

Also, I've just rebased the branch as there were some conflicts. Trying to run e2e tests to check if everything is correct just now.

@p-strusiewiczsurmacki-mobica
Contributor Author

p-strusiewiczsurmacki-mobica commented Apr 11, 2024

Well, both this branch and the main branch just kill my VM when the first node is being provisioned. I need to investigate that.

EDIT: Apart from that, the clusterclass and cluster resource are created correctly, so I think the PR itself should be OK.

@furkatgofurov7
Member

I've left 2 commits now. Is that OK, or should I squash them into 1?

I am fine with as is (2 commits).

@furkatgofurov7
Member

/test-centos-e2e-integration-main
/test-ubuntu-integration-main

@furkatgofurov7
Member

furkatgofurov7 commented Apr 12, 2024

@kashifest tests are not triggered, is this a known issue or triggers have changed?

@kashifest
Member

kashifest commented Apr 15, 2024

@kashifest tests are not triggered, is this a known issue or triggers have changed?

Yeah, it's a known issue. CI admins are working on it: tests are getting triggered, but the status is not getting updated on GitHub.

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2024
@LingyanCao

LingyanCao commented Apr 27, 2024

May I know when this PR could be merged and which upcoming release will contain this change? @p-strusiewiczsurmacki-mobica
Our product depends on ClusterClass support in capm3. I want to raise this risk if there is a schedule.
Thank you in advance!

@kashifest
Member

May I know when this PR could be merged and which upcoming release will contain this change? @p-strusiewiczsurmacki-mobica Our product depends on ClusterClass support in capm3. I want to raise this risk if there is a schedule. Thank you in advance!

I think this is ready to go in, pending a rebase.

spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.hostname }}'

This doesn't work for me (virtualized env). Shouldn't it be '{{ ds.meta_data.local_hostname }}' or '{{ ds.meta_data.name }}'? Also, what is the difference between local_hostname and name? I see both here.

Member

We can fix it in a follow-up PR, or leave it out of the template completely there. @p-strusiewiczsurmacki-mobica would you be interested in doing the follow-up PR? For now, we take this PR in.

Contributor Author

@kashifest I'll try to take a look at this as soon as I can. However if anyone wants to work on this before I'll be able to, then go ahead. :)

Co-authored-by: Lennart Jern <[email protected]>
Co-authored-by: Furkat Gofurov <[email protected]>
Signed-off-by: Patryk Strusiewicz-Surmacki <[email protected]>
Signed-off-by: Patryk Strusiewicz-Surmacki <[email protected]>
@metal3-io-bot metal3-io-bot removed the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2024
@p-strusiewiczsurmacki-mobica
Contributor Author

May I know when this PR could be merged and which upcoming release will contain this change? @p-strusiewiczsurmacki-mobica Our product depends on ClusterClass support in capm3. I want to raise this risk if there is a schedule. Thank you in advance!

I think this is ready to go in, pending a rebase.

Hi, I've just rebased it :)

@kashifest
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2024
@kashifest
Member

/test metal3-centos-e2e-integration-test-main
/test metal3-ubuntu-e2e-integration-test-main

@metal3-io-bot metal3-io-bot merged commit e948801 into metal3-io:main Apr 30, 2024
18 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Implement the ClusterClass API
8 participants