-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
setuptools plugin solution for variables #1121
Conversation
dedupe/datamodel.py
Outdated
|
||
variable_types = {} | ||
for variable_type in pm.hook.register_variable(): | ||
variable_types.update(variable_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to do some protocol checking here.
@NickCrews so here's an implementation inspired by how datasette uses pluggy. would appreciate your thoughts on this. |
Unfortunately I don't think that pluggy was actually a good suggestion from me, sorry :( I think the largest drawback is the boilerplate that you describe above that plugin authors will need. Second, I don't like how this isn't easily testable. At least I can't see how to install a mock package that registers an entry point. I'm assuming you didn't either, and that's why there are no tests here. When I see commits where someone bumps some external package to check for a change in behavior I get nervous lol The actual implementation here looks good though if this still is the direction you want to head. |
there are reasonably test pattern, but i feel you on the boilerplate. it’s about the same amount and much clearer boilerplate than the namespace solution. hmm… |
the core of way that pluggy works with external libraries is with the setuptools mechanism. def load_setuptools_entrypoints(
self, group: str, name: Optional[str] = None
) -> int:
"""Load modules from querying the specified setuptools ``group``.
:param str group: Entry point group to load plugins.
:param str name: If given, loads only plugins with the given ``name``.
:rtype: int
:return: The number of plugins loaded by this call.
"""
count = 0
for dist in list(importlib_metadata.distributions()):
for ep in dist.entry_points:
if (
ep.group != group
or (name is not None and ep.name != name)
# already registered
or self.get_plugin(ep.name)
or self.is_blocked(ep.name)
):
continue
plugin = ep.load()
self.register(plugin, name=ep.name)
self._plugin_distinfo.append((plugin, DistFacade(dist)))
count += 1
return count Reading setuptool's docs for this, i think we can do something much more direct than this current pluggy approach. it will look similar to @NickCrews's proposal, but the developer will only have to write the pathstring once (for the entrypoint), instead of the user having to do it multiple times. |
okay, i'm liking this solution a lot better than the pluggy one. thanks for pushing against that @NickCrews! i think it should be a bit more user friendly than the full path model, but almost as convenient for developers hacking one something. i'm going to try this out on a local project that needs a custom variable type and if i still like it, i'll write some tests and bring it in. |
@@ -153,7 +153,10 @@ def typify_variables( | |||
variable_definitions: Iterable[VariableDefinition], | |||
) -> tuple[list[FieldVariable], list[Variable]]: | |||
|
|||
variable_types = {variable.type: variable for variable in dedupe.variables.__all__} | |||
variable_types = {} | |||
for variablename in dedupe.variables.__all__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you copied this from my PR, I'm not positive that all is actually the correct solution, perhaps I'm abusing it and it actually is intended for something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's intended use is to control the behavior of "import *" but i think this works fine.
This is much better than pluggy, awesome! I think the basic philosophical difference between this and #1122 is who has the responsibility of defining the lookup key, and whether namespacing with the package name is important. Here, when the author of plugin A writes
they decide that the way to get this plugin is via the key "zipcodetype". You don't have to know that it lives inside a package called pluginA. If there is another plugin B that wants to do basically the same thing, they can't say Whereas in #1122, the name of the package, pluginA vs pluginB, is significant, and is used to namespace. Now you can have two different plugins that export variables of the same name. On PyPI pluginA and pluginB are already forced to be using different names, so we can piggyback off that guarantee and now you can't have accidental collisions. What do you think, are you trying to avoid having to specify "pluginA" vs "pluginB" in your variable definition dicts? I think that is silly, you already know where ZipCodeVariable is coming from, you have the package name written down in your requirements.txt. Am I missing something here? Note this is different from wanting to not care about WHAT FILE in the plugin , which I agree with the user shouldn't have to specify. But plugin authors shoud get around that with what I mention in #1122 (comment) |
i think my main reason for preferring this is consistency of interface. i don't like that there are some variables that we refer to with a short alias and others that we use a path in the datamodel. we could have a deprecation path where the paths were the way to reference variables, but honestly i think that a lot of new people would be confused by that. if we were going to go down that road, i might even prefer a datamodel that went from: data_model = [{field: "name", type: "String"},
{field: "address", type: "String"}] to from dedupe.variables import StringVariable
data_model = [StringVariable(field="name"), StringVariable(field="address")] which would avoid the plugin question altogether. i kind of like that solution, but it seems like a lot of disruption without a huge benefit to users. hmmm.... |
closed by #1193 |
Using setuptools plugin facilities to set up variables. will close #1085
external plugins
external dedupe-variable plugins will look like https://github.com/dedupeio/dedupe-variable-datetime
in particular, they will have this in their pyproject.toml (or equivalent setup.cfg/setup.py)
https://github.com/dedupeio/dedupe-variable-datetime/blob/fcd36afd000641168d9ae369623df866eeac35f9/pyproject.toml#L16
to do