Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAPM3 lacks of options for bond interfaces #1466

Open
tmonguillon opened this issue Feb 21, 2024 · 10 comments
Open

CAPM3 lacks of options for bond interfaces #1466

tmonguillon opened this issue Feb 21, 2024 · 10 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@tmonguillon
Copy link

User Story

As a [developer/user/operator] I would like to [high level description] for [reasons]
CAPM3 bond definition lacks of options for bond definition regarding what is possible with cloud-init.

Detailed Description

In Project Sylva we're maintaining a helm chart sylva-capi-cluster to create CAPI manifests definitions for clusters (not using clusterctl). We use Metal3 to provision BM hosts in our cluster and our network design recommends using bond interfaces. For some use cases we set the bond_mode to 802.3ad (LACP).

With some OSs like SuSE, the network backend (wicked) needs explicit parameters to properly configure a bond. This wicked backend is not able to build a proper bond without the parameter miimon set to a given value (even if the bonding kernel module has a default one).

Metal3 lacks of options in the bond definition [1] and there is some missing info for the network configuration sent to cloud-init. If we take a look to the cloud-init documentation, a bond definition has multiple parameters for bonding [2] [3] that could properly set a bond even on SuSE OSs with wicked.

The problem here is if we need some tuning for any type of bond, it's impossible via Metal3 to provide it to the BM node. So it would be nice to introduce a way to inject bond parameters and having a 1:1 compatibility to what parameters are exposed by cloud-init.

[1] for example here https://github.com/metal3-io/cluster-api-provider-metal3/blob/v1.6.0/baremetal/metal3data_manager.go#L1001-L1007
[2] https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html#params-dictionary-of-key-value-bonding-parameter-pairs
[3] https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html#bonds

Anything else you would like to add:

/kind feature

@metal3-io-bot metal3-io-bot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Feb 21, 2024
@hardys
Copy link
Member

hardys commented Feb 21, 2024

I would say that although the problem was encountered on a specific OS, the issue is a general one - as we have seen e.g with #708 there are several options supported by cloud-init which are not currently supported by CAPM3

Adding just e.g bond_miimon seems quite simple and reasonable, but clearly it's a somewhat larger change if we add support for every option, and it's complicated by the fact that the cloud-init implementation doesn't appear to support exactly the same options for different backends (e.g look at the NetworkManger renderer)

Also note that the cloud-init network-data format mentioned in the report is not the same format used in the Metal3/Ironic flow - in that case we're constrained by the OpenStack network_data schema which is converted inside cloud-init so any additional fields need to be compatible with that process

@tmonguillon do you think it's sufficient just to add bond_miimon support for now? Clearly there's a need for a more flexible approach in the long term, but I think it may be quite difficult given the constraints explained above.

@tmonguillon
Copy link
Author

tmonguillon commented Feb 21, 2024

@hardys bond_miimon is sufficient for now for the specific LACP/802.3ad use case. Regarding a more global point of view I understand it's hard to find a list of parameters to support regarding the various network backend. But if we take a look closer to them, NetworkManager is kind of an exception in the list of option it supports (or not) regarding the others (eni, sysconfig, systemd) which seem supporting various (all ?) bond options. I'm not sure NetworkManager is so common as default backend nowaday

@hardys
Copy link
Member

hardys commented Feb 21, 2024

Thanks for the reply @tmonguillon - in that case my suggestion is we proceed with bond_miimon as a first step, then consider additional parameters individually, that will allow some review of the backends to ensure we're not committing to an API which will be non-portable between different OS choices.

Also in the recent community meetup call we had some good discussions on a related topic - in future we may want to make the templating solution more flexible/pluggable to support other implementations in addition to cloud-init (specific example using nmstate which consumes a different DSL and configures NetworkManager)

If we do make the templating more pluggable it's likely we'll want to retain the same Metal3 APIs, so we will need to be careful to avoid adding any APIs which are coupled only to cloud-init and incompatible with other choices in the future.

@Rozzii
Copy link
Member

Rozzii commented Mar 6, 2024

/triage accepted
Just to summarize:
So as an actionable part of this someone will implement bond_miimon support but then the original problem of extending support ties into the topics discussed during the meetup (https://youtu.be/HPRGDLYoPaU?t=3447 )

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Mar 6, 2024
@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2024
@Rozzii
Copy link
Member

Rozzii commented Jun 5, 2024

/remove-lifecycle stale

@Rozzii
Copy link
Member

Rozzii commented Jun 5, 2024

/life-cycle-frozen
/help

@Rozzii
Copy link
Member

Rozzii commented Jun 12, 2024

/remove-lifecycle stale
/help
/life-cycle-frozen

@metal3-io-bot
Copy link
Contributor

@Rozzii:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/remove-lifecycle stale
/help
/life-cycle-frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 12, 2024
@Rozzii
Copy link
Member

Rozzii commented Jun 13, 2024

/lifecycle frozen

@metal3-io-bot metal3-io-bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jun 13, 2024
@mboukhalfa mboukhalfa moved this from Backlog to CAPM3 WIP in Metal3 - Roadmap Jun 27, 2024
@mboukhalfa mboukhalfa moved this from CAPM3 WIP to CAPM3 on hold / blocked in Metal3 - Roadmap Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.
Projects
Status: CAPM3 on hold / blocked
Development

No branches or pull requests

4 participants