
feat: support for externally managed control plane #106

Merged · 6 commits into ionos-cloud:main · Nov 4, 2024

Conversation

prometherion
Contributor

@prometherion prometherion commented Feb 9, 2024

Issue #95

Description of changes:

Support an empty Control Plane endpoint when the ProxmoxCluster is used with an externally managed Control Plane.

The ProxmoxCluster controller will wait for a valid IP before proceeding to mark the infrastructure ready.

Testing performed:

N.A.


sonarqubecloud bot commented Feb 9, 2024

Quality Gate failed

Failed conditions
20.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

@prometherion
Contributor Author

Before providing more test coverage, may I ask for a simple review of the proposed changes, from a workflow perspective?

It's not clear to me whether the maintainers are open to skipping the validation of the ProxmoxCluster.spec.controlPlaneEndpoint field.

@avorima
Collaborator

avorima commented Feb 19, 2024

Hi @prometherion, thank you for your contribution. Skipping validation for optional fields is fine, but please make sure your changes don't alter the validation behavior of other fields. Users should see errors as early as possible.
The controller's check for an empty control plane endpoint should happen after IPAM has been reconciled, so that IPAM isn't left waiting for the external process to set the control plane endpoint.

Does Kamaji set the port and address separately? I'm asking because my understanding was that the endpoint is always written in full or not at all. This would mean that you could merge the two conditions into one MissingControlPlaneEndpoint.

@wikkyk wikkyk added this to the v0.4.0 milestone Feb 22, 2024
@prometherion
Contributor Author

Does Kamaji set the port and address separately?

Kamaji does it in a single transaction, yes; I can unify those checks.

@mcbenjemaa
Member

Anything needed here?

@prometherion
Contributor Author

Thanks for the heads-up @mcbenjemaa, I'm planning to work on making this PR ready for review by the end of this week or the following one.

Pretty busy days, sorry.

@mcbenjemaa
Member

take your time mate

@mcbenjemaa
Member

@prometherion
Are there any updates on this, or can I give it a try?

@prometherion
Contributor Author

I'm finally revamping it; sorry for being late, @mcbenjemaa.

Let me know if we're ready to get this tested.

@mcbenjemaa
Member

Can you provide a use case to test it with Kamaji?

Collaborator

@wikkyk wikkyk left a comment


I think this is close to done. The core is there, but I'm not sold on the details. This could use some test cases, which would've uncovered the inconsistency I pointed out inline.

However, I don't think I like the 'magic' behaviour of empty host and(?) port meaning externally managed. Someone could omit host/port by accident without actually intending to use Kamaji or so. This needs very clear documentation. (Technically, this doesn't actually make the fields optional (-:)

Personally, I would prefer an optional bool field in ProxmoxClusterSpec like ControlPlaneEndpointExternallyManaged and to require that either ControlPlaneEndpointExternallyManaged or ControlPlaneEndpoint is set. This would make the intent clear.

That said, I would be fine with just clearly, explicitly documenting that setting host="" and port=0 means we'll wait for an externally managed endpoint. The check in proxmoxcluster_controller.go would need to be exactly the same as in the validation func i.e. host=="" && port==0.

internal/webhook/proxmoxcluster_webhook.go (outdated, resolved)
Comment on lines 98 to 107
// Skipping the validation of the Control Plane endpoint in case of an empty value:
// this is the case of externally managed Control Plane which eventually provides the LB.
if ep.Host == "" && ep.Port == 0 {
return nil
}
Collaborator


What about the case where someone accidentally doesn't provide the control plane endpoint?

Contributor Author


Although this thread is outdated: we're now surfacing this through Cluster conditions.

@@ -94,6 +95,11 @@ func (*ProxmoxCluster) ValidateUpdate(_ context.Context, _ runtime.Object, newOb

func validateControlPlaneEndpoint(cluster *infrav1.ProxmoxCluster) error {
ep := cluster.Spec.ControlPlaneEndpoint
// Skipping the validation of the Control Plane endpoint in case of an empty value:
// this is the case of externally managed Control Plane which eventually provides the LB.
if ep.Host == "" && ep.Port == 0 {
Collaborator


This could use some test cases.

Collaborator


Also, default Port is 6443.

Contributor Author


Outdated: with the new flag, this check is no longer needed.

@@ -168,6 +168,22 @@ func (r *ProxmoxClusterReconciler) reconcileNormal(ctx context.Context, clusterS
// If the ProxmoxCluster doesn't have our finalizer, add it.
ctrlutil.AddFinalizer(clusterScope.ProxmoxCluster, infrav1alpha1.ClusterFinalizer)

cpe := clusterScope.ControlPlaneEndpoint()
switch {
case cpe.Host == "":
Collaborator


This only checks host but not port, yet the validation func checks both.

@wikkyk wikkyk added this to the v0.6.0 milestone Jun 10, 2024
@prometherion
Contributor Author

However, I don't think I like the 'magic' behaviour of empty host and(?) port meaning externally managed. Someone could omit host/port by accident without actually intending to use Kamaji or so. This needs very clear documentation. (Technically, this doesn't actually make the fields optional (-:)

Documentation is a great place for this, of course, but I wonder whether we could implement this check differently.

The Cluster API has a contract for an externally managed control plane thanks to the status key externalManagedControlPlane.

When a user opts in to Kamaji, or any other Control Plane provider that satisfies that status key contract, we could skip the validation requiring a filled ControlPlaneEndpoint struct. With this, we're adding guardrails for users who could otherwise shoot themselves in the foot.

@prometherion
Contributor Author

@mcbenjemaa I just fixed the broken generated files; an e2e test would be great, although it seems a bit flaky according to the latest runs.

I can also try to provide a small recorded smoke test showing the integration between Proxmox and Kamaji; unfortunately, providing a proper test is complicated given the dependencies between the moving parts.


sonarqubecloud bot commented Sep 5, 2024

@mcbenjemaa
Member

Can you share with me the manifest used to provision a proxmox cluster?


sonarqubecloud bot commented Nov 4, 2024

@mcbenjemaa
Member

@prometherion, we are happy to add support for this. Can you help us document how to provide a cluster template and the technical considerations for the Kamaji CP provider?

Member

@mcbenjemaa mcbenjemaa left a comment


LGTM

@wikkyk wikkyk merged commit c8cc3a4 into ionos-cloud:main Nov 4, 2024
8 checks passed
@wikkyk
Collaborator

wikkyk commented Nov 4, 2024

Finally! Thank you for your contribution! :)

@prometherion
Contributor Author

@mcbenjemaa absolutely, it's on my todo list!

@prometherion prometherion deleted the issues/95 branch November 7, 2024 13:09
@prometherion
Contributor Author

Sharing here, as well, a reference configuration for getting this working with Kamaji as an externally managed Control Plane:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - REDACTED/REDACTED
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    kind: KamajiControlPlane
    name: proxmox-quickstart
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: ProxmoxCluster
    name: proxmox-quickstart
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  dataStoreName: default
  addons:
    coreDNS: { }
    kubeProxy: { }
  kubelet:
    cgroupfs: systemd
    preferredAddressTypes:
    - InternalIP
  network:
    serviceType: LoadBalancer
  deployment:
  replicas: 2
  version: 1.29.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  allowedNodes:
  - pve
  dnsServers:
  - REDACTED
  - REDACTED
  externalManagedControlPlane: true
  ipv4Config:
    addresses:
    - REDACTED-REDACTED
    gateway: REDACTED
    prefix: REDACTED
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: proxmox-quickstart-workers
  namespace: default
spec:
  clusterName: proxmox-quickstart
  replicas: 2
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        node-role.kubernetes.io/node: ""
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: proxmox-quickstart-worker
      clusterName: proxmox-quickstart
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
        kind: ProxmoxMachineTemplate
        name: proxmox-quickstart-worker
      version: v1.29.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: proxmox-quickstart-worker
  namespace: default
spec:
  template:
    spec:
      disks:
        bootVolume:
          disk: scsi0
          sizeGb: REDACTED
      format: qcow2
      full: true
      memoryMiB: REDACTED
      network:
        default:
          bridge: REDACTED
          model: virtio
      numCores: REDACTED
      numSockets: REDACTED
      sourceNode: pve
      templateID: REDACTED
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: proxmox-quickstart-worker
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            provider-id: proxmox://'{{ ds.meta_data.instance_id }}'
      users:
      - name: root
        sshAuthorizedKeys:
        - REDACTED

I wasn't able to get the worker nodes to join the Control Plane, mostly because I'm working in a kind environment with Proxmox VE deployed in a Virtual Machine via Vagrant.

But overall everything looks good from the clusterctl describe command:

NAME                                                         READY  SEVERITY  REASON                       SINCE  MESSAGE                                                       
Cluster/proxmox-quickstart                                   True                                          7m30s                                                                 
├─ClusterInfrastructure - ProxmoxCluster/proxmox-quickstart  True                                          7m43s                                                                 
├─ControlPlane - KamajiControlPlane/proxmox-quickstart                                                                                                                           
└─Workers                                                                                                                                                                        
  └─MachineDeployment/proxmox-quickstart-workers             False  Warning   WaitingForAvailableMachines  7m43s  Minimum availability requires 2 replicas, current 0 available
    ├─Machine/proxmox-quickstart-workers-ft5tv-gfxf8         False  Error     VMProvisionFailed            35s    1 of 2 completed                                               
    └─Machine/proxmox-quickstart-workers-ft5tv-gjnmf         False  Info      PoweringOn                   5m14s  1 of 2 completed
