Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: changed the default value of nrext64 to 0 to avoid unstable feature of nrext64 #295

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

Sparksssj
Copy link
Contributor

@Sparksssj Sparksssj commented Dec 5, 2024

Issue number:

Closes bottlerocket-os/bottlerocket#4281

Description of changes:
A new PR for #251, see discussion there.

Testing done:

Manual testing done using the exact setup as original issue mentioned , see the nrext64 value changed to 0 as expected.
Also, tested config files consuming behavior and made sure the program would not crash when receive a bad default config file. It would just continue with the default value given in the xfs_mkfs.c file.

With correct config files:

bash-5.1# xfs_info /dev/nvme2n1
meta-data=/dev/nvme2n1           isize=512    agcount=4, agsize=7629394 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=30517576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.1#
bash-5.1#
bash-5.1# umount /dev/nvme2n1
bash-5.1# mkfs.xfs /dev/nvme2n1
Parameters parsed from default config file /usr/share/xfs/mkfs.xfs.conf successfully
mkfs.xfs: /dev/nvme2n1 contains a mounted filesystem

Without correct config files:

bash-5.1# mount | grep ephemeral
/dev/nvme2n1 on /mnt/.ephemeral type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
bash-5.1# xfs_info /dev/nvme2n1
meta-data=/dev/nvme2n1           isize=512    agcount=4, agsize=7629394 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=30517576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.1#
bash-5.1# umount /dev/nvme2n1
bash-5.1# mkfs.xfs /dev/nvme2n1
/usr/share/xfs/mkfs.xfs.conf: Unrecognised input on line 6. Aborting.
mkfs.xfs: /dev/nvme2n1 contains a mounted filesystem

The program would abort the parse default config function and keep other functions work well.
We can always track if default config file works by using

bash-5.1# umount /dev/nvme2n1
bash-5.1# mkfs.xfs /dev/nvme2n1

And see if there's a log message showing

Parameters parsed from default config file /usr/share/xfs/mkfs.xfs.conf successfully

This log message is expected every time we call mkfs.xfs, with or without manual CLI config file provided.
Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@Sparksssj Sparksssj requested a review from yeazelm December 5, 2024 00:01
@@ -16,6 +16,9 @@ Source100: config-bottlerocket
Source220: neuron-sysinit.target.drop-in.conf
Source221: [email protected]

# XFS configuration
Source300: mkfs.xfs-5_10.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd like to keep the name of these files consistent on the packages. Since they are all in different packages, its fine to name them all mkfs.xfs.conf

Comment on lines 33 to 35
+ FILE *default_config_file = fopen(DEFAULT_CONFIG_PATH, "r");
+ if (default_config_file != NULL) {
+ fclose(default_config_file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit odd to fopen() the file only for checking if it exists, then close it, and finally parse it. Would it make more sense to implement similar logic to cfgfile_parse() but not error out completely if the file isn't present/parsable?

Copy link
Contributor

@yeazelm yeazelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Did you confirm on aws-k8s-1.25 that using the ephemeral storage commands no longer have issues like in bottlerocket-os/bottlerocket#4281?

+ error, cli->cfgfile);
+ }
+ } else {
+ printf(_("Parameters parsed from default config file %s successfully\n"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see this output in the journal when it ran successfully? Do we expect this to be seen on every boot or are the mkfs.xfs calls for the XFS backed data partitions not emitting logs?

Copy link
Contributor Author

@Sparksssj Sparksssj Dec 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment 1: I did make confirmation on aws-k8s-1.25 with size i3en.2xlarge and corresponding userdata which is exactly what user mentioned in original issue.
Comment 2: Yes. We do see this output when this's succeed. And these logs will not be seen directly in EC2 console. And can be seen if you run

umount /dev/nvme2n1
mkfs.xfs /dev/nvme2n1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it sounds like you did validate that this code fixes the issue in bottlerocket-os/bottlerocket#4281 but just on aws-k8s-1.24 instead of aws-k8s-1.25 which should work the same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I did both 1.24 and 1.25 tests, both works.

Copy link
Contributor

@yeazelm yeazelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some offline discussion here around testing but this looks good to me!

@@ -83,6 +86,12 @@ Summary: Modules for the Linux kernel
%description modules
%{summary}.

%package mkfs-confs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions to improve the dependencies:

  1. Require the parent package so that the installed configs match the installed kernel.
  2. Provide a generic capability (using parens, by convention) that other packages can key off of
  3. Conflict with any other package providing the same capability

Goal is to make it so that only one mkfs-confs package can be installed.

Suggested change
%package mkfs-confs
%package mkfs-confs
Requires: %{name}
Provides: %{_cross_os}kernel(mkfs-confs)
Conflicts: %{_cross_os}kernel(mkfs-confs)

However, that said - I would not make this a separate subpackage, but instead add that Provides and Conflicts to the parent kernel package.

Comment on lines +1 to +5
# V5 features that were the mkfs defaults when the upstream Linux 5.10 LTS
# kernel was released at the end of 2020.

[inode]
nrext64=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the only feature of interest that's worth pinning? It seems a bit sparse.

Copy link
Contributor Author

@Sparksssj Sparksssj Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this is indeed the only feature we care about to fix the original issue. Unless you have more thoughts and we can sure add more in other PRs -- I still think we should only keep this one here because this is the one related to the issue.

Comment on lines +19 to +21
Requires: (%{_cross_os}kernel-5.10-mkfs-confs if %{_cross_os}kernel-5.10)
Requires: (%{_cross_os}kernel-5.15-mkfs-confs if %{_cross_os}kernel-5.15)
Requires: (%{_cross_os}kernel-6.1-mkfs-confs if %{_cross_os}kernel-6.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just require a generic capability that any kernel package can implement. Otherwise it will be challenging to bring a custom kernel of a different version.

Suggested change
Requires: (%{_cross_os}kernel-5.10-mkfs-confs if %{_cross_os}kernel-5.10)
Requires: (%{_cross_os}kernel-5.15-mkfs-confs if %{_cross_os}kernel-5.15)
Requires: (%{_cross_os}kernel-6.1-mkfs-confs if %{_cross_os}kernel-6.1)
Requires: (%{_cross_os}kernel(mkfs-confs)

Comment on lines +29 to +32
+
+/* This function is similar to cfgfile_parse() and specifically parses
+ * default config files. It doesn’t exit on failure.
+ */
Copy link
Contributor

@bcressey bcressey Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems almost identical to cfgfile_parse except the exit(1) has been removed and the message is changed in a trivial way.

Rather than duplicate the function, I would just leave the message unmodified (it will obviously point to the default config file) and add a check around exit(1) like this:

if (strcmp(cli->cfgfile, DEFAULT_CONFIG_PATH) != 0)
    exit(1);

The problem with the copy-paste-tweak approach to patching is that if the upstream version of the function ever changes, your copy-paste version won't pick up those changes, and the patch is very likely to still apply cleanly.

By modifying the existing function in a careful way, you are much more likely to see the patch fail to apply on newer versions if there's been a significant change, which is what you want as it will signal the need to revisit the downstream alteration.

Comment on lines +74 to +83
- cfgfile_parse(&cli);
+
+ if (cli.cfgfile == NULL) {
+ /* No user config file specified, try default config */
+ cli.cfgfile = DEFAULT_CONFIG_PATH;
+ cfgfile_parse_default(&cli);
+ } else {
+ /* User specified their own config file, use that */
+ cfgfile_parse(&cli);
+ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should all just be:

if (cli.cfgfile == NULL) {
    cli.cfgfile = DEFAULT_CONFIG_PATH;
}

cfgfile_parse(&cli);

That minimizes the changes to this function while still providing the fallback behavior you're trying to add.

Comment on lines +3 to +5

[inode]
nrext64=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this as the default of 1 on 6.1? If I recall correctly, isn't that when the default of 1 was introduced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. But it will show EXPERIMENTAL Large extent counts feature in use. Use at your own risk!. Since we are not very confident about that so we just set it 0 for now. The purpose for this change is to conveniently change it to 1 when needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XFS Ephemeral Storage fails to mount with: Superblock has unknown incompatible features (0x20) enabled.
4 participants