Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor generation of required environment variables in module files + deprecate make_module_req_guess method in EasyBlock class #4653

Open
wants to merge 57 commits into
base: 5.0.x
Choose a base branch
from

Conversation

lexming
Copy link
Contributor

@lexming lexming commented Sep 23, 2024

Goal is to lay groundwork to be able to fix #3331 and add new option to control which search path variables are added in modules by simplifying the code injecting environment variables to module files.

This PR should not alter the resulting module files to what is currently generated by EB. If it changes the list of environment variables it should have no effect (e.g. because those paths are empty).

Simplification of logic handling environment variables for module files:

  1. replace obscure rules defining which search paths require populated dirs with:
  2. replace obscure rules defining which search paths require files in their top level directory with:
    • by default all search paths to directories can be populated at any level (recursive check)
    • this setting is now controlled by top_level_file attribute of ModuleEnvironmentVariable
  3. replace convoluted rules to handle symlinked lib64 that generate duplicate paths and need explicit exceptions with:
    • paths to lib64 are always ignored if lib64 is a symlink to lib
    • paths to lib are always ignored if lib is a symlink to lib64
    • remove lib32 from list of library search paths
  4. allow simple updates of environment variables in child EasyBlocks, replacing the clunky:
    def make_module_req_guess(self):
        guesses = super(EB_Clang, self).make_module_req_guess()
        guesses.update({'SOME_VAR': ['some/path]})
        return guesses
    
    with an elegant:
    self.module_load_environment.SOME_VAR = ['some/path']
    

Changelog:

  • add ModuleEnvironmentVariable pseudo-dataclass to easybuild.tools.modules to hold definitions of environment variables for modules
  • add ModuleLoadEnvironment singleton class to easybuild.tools.modules to hold environment definition for modules on load
  • deprecate EasyBlock.make_module_req_guess in favor of ModuleLoadEnvironment
  • add LibSymlink enum to easybuild.framework.easyblock to define possible linkage states of lib directories
  • add install_lib_symlink attribute to EasyBlcok to keep track of symlink status between lib dirs
  • add globals to easybuild.tools.config that define known directories for binaries, libraries and headers
  • avoid changing the current working directory of the main process in easyblock.make_module_req by working with absolute paths instead

@lexming lexming added the EasyBuild-5.0 EasyBuild 5.0 label Sep 23, 2024
@lexming lexming marked this pull request as draft September 23, 2024 14:46
@lexming lexming marked this pull request as ready for review September 24, 2024 06:15
@lexming
Copy link
Contributor Author

lexming commented Sep 24, 2024

@boegel look all the beautiful green tests!

Compared to what I showed on the meeting yesterday, I have undone all changes to make_module_req_guess. So existing easyblocks continue to work without any further changes.

This seems ready on my side. I tested it with several easyblocks that add custom stuff to make_module_req_guess and I have hit no issues so far. Even things like root.py easyblock that set PYTHONPATH in make_module_req_guess instead of the more common make_module_extra do work. If you have anything in mind that I should test in particular let me know.



# singleton metaclass: only one instance is created
BaseModuleEnvironment = create_base_metaclass('BaseModuleEnvironment', Singleton, object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexming Using a singleton here (so a single instance that is shared across all EasyBlock instances) seems like a bad idea to me...

If EasyBuild installs two things in the same session (via --robot, or by being fed multiple easyconfigs on the command line), you'll end up with incorrect stuff in the generated module file due to this, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexming I've implemented a test that shows that using a singleton here is a bad idea, see lexming#5

add test to verify that environment variables don't leak into module file of subsequent installations
@boegel
Copy link
Member

boegel commented Nov 6, 2024

@lexming Can you look into fixing the merge conflict, and get the test to pass?

Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexming All remarks fixed in lexming#6

easybuild/tools/modules.py Outdated Show resolved Hide resolved
easybuild/tools/config.py Outdated Show resolved Hide resolved
A dictionary of common search path variables to be loaded by environment modules
Each key contains the list of known directories related to the search path
"""
return self.module_load_environment.environ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we print a deprecation warning here too, nothing should be relying on make_module_req_guess anymore, but use module_load_environment.environ instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that we need it here to catch the case where make_module_req_guess() is directly called in some arbitrary location in a custom easyblock.

However, this has the downside of duplicating deprecation warnings in the following common case:

  1. EB reaches make_module_req
  2. custom easyblock has its own make_module_req_guess()
  3. first warning triggers
  4. make_module_req calls custom make_module_req_guess()
  5. custom make_module_req_guess() calls super()
  6. generic make_module_req_guess() triggers second deprecation warning

But I don't see a way to avoid it, as custom make_module_req_guess() methods do not necessarily call super. So we need both to cover all cases.

easybuild/tools/modules.py Outdated Show resolved Hide resolved
easybuild/tools/modules.py Outdated Show resolved Hide resolved
test/framework/modules.py Outdated Show resolved Hide resolved
test/framework/modules.py Outdated Show resolved Hide resolved
def environ(self):
"""
Return dict with mapping of ModuleEnvironmentVariables names with their contents
Equivalent in shape to os.environ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading, since the values are actually lists of strings, not string values as they are in os.environ, so I would remove this:

Suggested change
Equivalent in shape to os.environ

Copy link
Contributor Author

@lexming lexming Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that was not my intention actually. I updated environ to be a proper environ equivalent with strings as values and added a new as_dict that returns a dict with lists as values. See 9297094 and 87afba7

Actual contents of the environment variable are held in self.contents
"""
self.contents = contents
self.top_level_file = bool(top_level_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to something more explicit, like requires_files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be misleading. The current behaviour (v4.9) is that all paths require files in them to be declared in the module file. This attribute only changes the strictness of that check to require those files to be in the top directory.

Here there is an underlying design question, because ModuleEnvironmentVariable can be generic in purpose and used for other things than make_module_req. So maybe we need a ternary parameter (NO_FILES, ANY_FILES, TOP_LEVEL_FILES).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a nice solution for this, I generalized a bit more ModuleEnvironmentVariable by adding a type attribute to it which can be one of ModEnvVarType. A new enum with STRING, PATH, PATH_WITH_FILES and PATH_WITH_TOP_FILES. Then with a single attribute we can control the behaviour of a specific variable. See 516d6e6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved further the use and tests of ModEnvVarType in d4a9df0

This is ready on my side.

easybuild/framework/easyblock.py Outdated Show resolved Hide resolved
if paths:
lines.append(self.module_generator.prepend_paths(key, paths))
mod_req_paths = []
top_level = getattr(self.module_load_environment, env_var).top_level_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes horribly wrong when using a copy of PerlModule that has not been migrated yet to module_load_environment (as is done in easybuilders/easybuild-easyblocks#3529):

Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 801, in <module>
    main_with_hooks()
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 787, in main_with_hooks
    main(args=args, prepared_cfg_data=(init_session_state, eb_go, cfg_settings))
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 743, in main
    hooks, do_build)
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 567, in process_eb_args
    exit_on_failure=exit_on_failure)
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 175, in build_and_install_software
    raise ec_res['err']
  File "/tmp/vsc40023/easybuild-framework/easybuild/main.py", line 136, in build_and_install_software
    (ec_res['success'], app_log, err_msg, err_code) = build_and_install_one(ec, init_env)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 4432, in build_and_install_one
    result = app.run_all_steps(run_test_cases=run_test_cases)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 4300, in run_all_steps
    self.run_step(step_name, step_methods)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 4143, in run_step
    step_method(self)()
  File "/user/gent/400/vsc40023/easybuild/eb5/easybuild-easyblocks/easybuild/easyblocks/generic/perlmodule.py", line 140, in sanity_check_step
    return ExtensionEasyBlock.sanity_check_step(self, EXTS_FILTER_PERL_MODULES, *args, **kwargs)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/extensioneasyblock.py", line 188, in sanity_check_step
    fake_mod_data = self.load_fake_module(purge=True, extra_modules=extra_modules)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 1796, in load_fake_module
    fake_mod_path = self.make_module_step(fake=True)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 3905, in make_module_step
    txt += self.make_module_req(fake=fake)
  File "/tmp/vsc40023/easybuild-framework/easybuild/framework/easyblock.py", line 1664, in make_module_req
    requires_files = getattr(self.module_load_environment, env_var).requires_files
AttributeError: 'ModuleLoadEnvironment' object has no attribute 'PERL5LIB'

(note that I'm using my updated version here, see also lexming#6)

It's getting PERL5LIB via PerlModule.make_module_req_guess, but since that has never been defined as an attribute in module_load_environment, this fails hard.

That tells me we're missing something above, where we deal with easyblocks that still define a custom make_module_req_guess?

Copy link
Contributor Author

@lexming lexming Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem, very stupid... I changed approach and now using any deprecated make_module_req_guess will always result in updating module_load_environment. Ensuring that it is always present.

To this end I added an update method to ModuleLoadEnvironment in 87afba7 and switch to it in 38774ac

# Only inject path-like environment variables into module file
env_var_requirements = sorted([
(envar_name, envar_val) for envar_name, envar_val in self.module_load_environment.items()
if envar_val.is_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexming Shouldn't we at least log a warning or something if any entries in self.module_load_environment do not have is_path is True?

Because those cases will essentially be totally silently ignores currently?

lines.append(self.module_generator.prepend_paths(key, paths))
path_type = search_paths.type
if fake:
path_type = ModEnvVarType.PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexming Is this only done to bypass the check in expand_module_search_path that ensures that this directory actually has files in them?

Why? Is it only to get some marginal speedup?

This may be important, the "fake" module is generated right before the sanity check is run, so if for some weird reason having additional paths to empty directories in environment variables causes trouble (or worse, somehow makes the sanity check pass), it sort of invalidates the sanity check...

Is there another reason to make this exception beyond a small performance optimization?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Nice-to-have
Development

Successfully merging this pull request may close these issues.

2 participants