Migrate to aws sdk v2 #397

Merged

Conversation

@michael-diggin michael-diggin commented Oct 20, 2024

Is this a bug fix or adding new feature?
Neither. This changes all uses of aws-sdk-go to aws-sdk-go-v2 and resolves #385.

What is this PR about? / Why do we need it?
As mentioned in #385, aws-sdk-go (v1) won't be receiving any more updates. Migrating to v2 allows the use of newer FSx features (e.g. AutoExportPolicy, mentioned in #255, which I need and which is my motivation for this PR).

What testing is done?
make test

I've aimed to make as few changes to the interfaces as possible, hoping it's just a 'drop-in' replacement, following https://aws.github.io/aws-sdk-go-v2/docs/migrating/
Summary of changes

  1. aws-sdk-go -> aws-sdk-go-v2 and added aws-sdk-go-v2/config, aws-sdk-go-v2/service/fsx, aws-sdk-go-v2/feature/ec2/imds to go.mod.
  2. Session is replaced by Config
  3. FSx types moved to aws-sdk-go-v2/service/fsx/types, and enums are now used rather than strings in many of the structs.
  4. Many FSx fields changed from int64 to int32. (As a result, I've changed the types of some fields in public APIs, e.g. FileSystem in pkg/cloud/cloud.go. Alternatively, they could stay int64 and be converted to int32 where needed.)
  5. ec2metadata is replaced by ec2/imds
  6. ec2/imds in v2 doesn't have an Available() function. In v1, however, all that function did was check GetMetadata, so I've implemented Available() here following the same logic.
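
The availability check in item 6 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `metadataGetter` is a hypothetical interface standing in for the v2 IMDS client (whose real `GetMetadata` takes a context and an input struct), and the `instance-id` probe path, the one-second timeout, and the `fakeIMDS` test double are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// metadataGetter is a hypothetical stand-in for the IMDS client.
type metadataGetter interface {
	GetMetadata(ctx context.Context, path string) (string, error)
}

// errUnreachable simulates a node where the metadata service is disabled.
var errUnreachable = errors.New("metadata service unreachable")

// available reports whether the metadata service answers a simple probe.
// The probe path and the timeout are assumptions for this sketch.
func available(c metadataGetter) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, err := c.GetMetadata(ctx, "instance-id")
	return err == nil
}

// fakeIMDS is a test double; it returns err when set, else a placeholder ID.
type fakeIMDS struct{ err error }

func (f fakeIMDS) GetMetadata(ctx context.Context, path string) (string, error) {
	if f.err != nil {
		return "", f.err
	}
	return "i-0123456789abcdef0", nil
}

func main() {
	fmt.Println(available(fakeIMDS{}))                    // true
	fmt.Println(available(fakeIMDS{err: errUnreachable})) // false
}
```

Hiding the client behind a small interface keeps the check testable without a real metadata endpoint.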

linux-foundation-easycla bot commented Oct 20, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: michael-diggin / name: Michael Diggin (7a144b4)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 20, 2024
@k8s-ci-robot
Contributor

Welcome @michael-diggin!

It looks like this is your first PR to kubernetes-sigs/aws-fsx-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-fsx-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 20, 2024
@k8s-ci-robot
Contributor

Hi @michael-diggin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 20, 2024
pkg/cloud/cloud.go (resolved)
pkg/cloud/cloud.go (outdated, resolved)
@jacobwolfaws
Contributor

Hi @michael-diggin, Thanks for posting this PR! Can you squash the commits into a single commit?
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 21, 2024
@michael-diggin
Contributor Author

> Hi @michael-diggin, Thanks for posting this PR! Can you squash the commits into a single commit? /ok-to-test

Hey @jacobwolfaws, thanks for the quick review! I've squashed the two commits.

@jacobwolfaws
Contributor

/approve
Since this is a rather large PR, I'm going to find a second reviewer for it, but it seems good to me. My largest concern was any change to the user experience in terms of YAML files, but I'm not seeing any here. Thanks again!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2024
Member

@torredil torredil left a comment

PR largely lgtm, thank you for the contribution.

pkg/util/util.go Outdated
Comment on lines 94 to 95
func roundUpSize(volumeSizeBytes int64, allocationUnitBytes int64) int32 {
return int32((volumeSizeBytes + allocationUnitBytes - 1) / allocationUnitBytes)
Member

I think we can silently overflow with large inputs here.

Casting in several different places is rather confusing and increases the likelihood of bugs. If the expected output from this function is int32, can we make sure that the inputs are also int32?

Contributor Author

I agree, it was a bit confusing working on it.
The inputs should be int64, however, as they are volume sizes in bytes and arrive as that type from the CSI API. The output is the number of allocationUnitBytes-sized units needed to store volumeSizeBytes (rounded up), so it will always fit in an int32 for the set of inputs provided in this file. It could overflow only if someone passed a very small allocation unit; the smallest one used here is 1200 * 1024 * 1024 * 1024.
I've added a comment and altered the function name as well to hopefully make it more clear, let me know if you'd like me to have another go.
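
The rounding discussed here is integer ceiling division. A minimal sketch, assuming int64 byte counts in and a count of allocation units out; the names `roundUpUnits` and `allocationUnit` are hypothetical, and the 1200 GiB value is taken from the comment above:

```go
package main

import (
	"fmt"
	"math"
)

const gib int64 = 1024 * 1024 * 1024

// allocationUnit is the smallest unit mentioned in the thread (1200 GiB).
const allocationUnit = 1200 * gib

// roundUpUnits returns how many allocation units are needed to hold
// volumeSizeBytes, rounding up. It assumes both arguments are positive and
// that their sum does not exceed math.MaxInt64.
func roundUpUnits(volumeSizeBytes, allocationUnitBytes int64) int64 {
	return (volumeSizeBytes + allocationUnitBytes - 1) / allocationUnitBytes
}

func main() {
	fmt.Println(roundUpUnits(1, allocationUnit))                // 1
	fmt.Println(roundUpUnits(allocationUnit, allocationUnit))   // 1
	fmt.Println(roundUpUnits(allocationUnit+1, allocationUnit)) // 2

	// Largest possible quotient for any int64 byte count with this unit.
	fmt.Println(math.MaxInt64 / allocationUnit) // 7158278
}
```

With a 1200 GiB unit, even the largest int64 byte count divides down to about 7.2 million units, far below the int32 maximum of 2,147,483,647, which is why the result fits for the inputs used in this file.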

Member

ack - Thanks for letting us know about that pain point. I generally agree that we should make the minimal set of changes for the migration and not blow up the scope of this PR; maintainers of this repo ought to be aware of this and help with cleaning up the casting situation to prevent regressions in the future.

TLDR - CSI spec expects and returns values in int64, for example:

int64 volume_capacity_bytes = 2;

but the SDK's expected input/output might be of type int32 for some attributes.

In other words, we need to work towards a state where minimal casting is done: preferably, all inputs are int32 where the SDK needs them, and a single conversion to int64 is performed in a centralized place. That should be safe because converting int32 to int64 is lossless (every 32-bit int can be expressed as a 64-bit int), so there is no risk of silent overflow.

cc: @jacobwolfaws
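
A small sketch of the asymmetry between the two directions: converting int32 to int64 always preserves the value, while converting int64 to int32 silently keeps only the low 32 bits. In Go both conversions must be written explicitly. The function names `widen` and `narrow` are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// widen converts an SDK-style int32 to a CSI-style int64; this never loses data.
func widen(v int32) int64 { return int64(v) }

// narrow converts int64 to int32 without any check; values outside the
// int32 range wrap around silently.
func narrow(v int64) int32 { return int32(v) }

func main() {
	fmt.Println(widen(math.MaxInt32))      // 2147483647
	fmt.Println(narrow(math.MaxInt32 + 1)) // -2147483648
	fmt.Println(narrow(1 << 32))           // 0
}
```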

Contributor Author

Makes sense, thanks for the extra detail!
Just thinking, we could leave everything in RoundUpVolumeSize as int64; there are only two uses of that function (https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/pkg/driver/controller.go#L213 and https://github.com/kubernetes-sigs/aws-fsx-csi-driver/blob/master/pkg/driver/controller.go#L395). I could change those call sites so that they convert to int32 only there, and even check that the value doesn't exceed math.MaxInt32, returning an error if it does?
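
The bounds check proposed here could look like the following minimal sketch. The helper name `toInt32` and the error text are hypothetical; `math.MinInt32` and `math.MaxInt32` are the standard library constants.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32 converts v to int32, returning an error instead of wrapping
// around when v is outside the int32 range.
func toInt32(v int64) (int32, error) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d does not fit in int32", v)
	}
	return int32(v), nil
}

func main() {
	if n, err := toInt32(3); err == nil {
		fmt.Println(n) // 3
	}
	if _, err := toInt32(math.MaxInt32 + 1); err != nil {
		fmt.Println("error:", err) // error: value 2147483648 does not fit in int32
	}
}
```

Keeping the conversion in one helper means the call sites can stay in int64 until the value is handed to the SDK.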

pkg/cloud/metadata_ec2.go (outdated, resolved)
@michael-diggin
Contributor Author

> PR largely lgtm, thank you for the contribution.

Thanks for the review, I've replied to your comments and made a small update. Let me know what you think when you get another chance to review it.

@torredil
Member

/approve

Code looks good for migration. @michael-diggin @jacobwolfaws please manually test the scenario where IMDS is not available to validate that there are no regressions in expected behavior before this is released. The driver should successfully fall back to retrieving the necessary information from the k8s API without crashing.
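
The fallback behavior requested here can be sketched as trying each metadata source in order and returning the first one that succeeds. Everything below (`nodeInfo`, `provider`, `firstAvailable`, and the placeholder instance ID and region) is a hypothetical illustration, not the driver's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeInfo holds the data the node service needs; fields are illustrative.
type nodeInfo struct {
	InstanceID string
	Region     string
}

// provider is a named source of node metadata.
type provider struct {
	name string
	get  func() (nodeInfo, error)
}

var errNoMetadata = errors.New("no metadata source available")

// firstAvailable tries each provider in order and returns the first
// successful result. If all fail, it returns the collected errors
// rather than panicking.
func firstAvailable(providers ...provider) (nodeInfo, error) {
	errs := []error{errNoMetadata}
	for _, p := range providers {
		info, err := p.get()
		if err == nil {
			return info, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.name, err))
	}
	return nodeInfo{}, errors.Join(errs...)
}

// Test doubles: a disabled metadata service and a working Kubernetes API.
var (
	imdsDown = provider{name: "ec2 metadata", get: func() (nodeInfo, error) {
		return nodeInfo{}, errors.New("not available")
	}}
	kubeAPI = provider{name: "kubernetes api", get: func() (nodeInfo, error) {
		return nodeInfo{InstanceID: "i-0123456789abcdef0", Region: "us-west-2"}, nil
	}}
)

func main() {
	info, err := firstAvailable(imdsDown, kubeAPI)
	fmt.Println(info.Region, err) // us-west-2 <nil>
}
```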

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobwolfaws, michael-diggin, torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jacobwolfaws,torredil]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jacobwolfaws
Contributor

Hi @michael-diggin @torredil, just catching up on this thread.
@michael-diggin, can you squash these into a single commit (or two if you feel like there's a natural separation)?
I will try to test with IMDS disabled in the next couple of days so that we can get this merged. Thanks for all your work!

@michael-diggin
Contributor Author

> @michael-diggin can you squash these into a single commit (or two if you feel like there's a natural separation) Will try to test with imds disabled in the next couple of days so that we can get this merged, thanks for all your work!

Done, thanks for offering to test it!

@jacobwolfaws
Contributor

Confirmed that this falls back to the Kubernetes API successfully when IMDS is disabled (also tested dynamic provisioning).

I1031 14:14:00.407477       1 driver.go:61] "Driver Information" Driver="fsx.csi.aws.com" Version="vsdk2"
I1031 14:14:00.407582       1 node.go:60] "[Debug] Retrieving node info from metadata service"
I1031 14:14:00.407602       1 metadata.go:72] "retrieving instance data from ec2 metadata"
I1031 14:14:03.533898       1 metadata.go:75] "ec2 metadata is not available"
I1031 14:14:03.533921       1 metadata.go:83] "retrieving instance data from kubernetes api"
I1031 14:14:03.534376       1 metadata.go:88] "kubernetes api is available"
I1031 14:14:03.541869       1 node.go:66] "regionFromSession Node service" region=""
I1031 14:14:03.543952       1 mount_linux.go:282] Detected umount with safe 'not mounted' behavior
I1031 14:14:03.544330       1 driver.go:127] "Listening for connections" address="/csi/csi.sock"
I1031 14:14:03.547718       1 node.go:347] "No taints to remove on node, skipping taint removal"
I1031 14:14:04.837509       1 node.go:243] "NodeGetInfo: called" args={}

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2024
@k8s-ci-robot k8s-ci-robot merged commit 65519d8 into kubernetes-sigs:master Oct 31, 2024
5 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

switch to aws-sdk-go-v2 as v1 is going away
4 participants