Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setuptools plugin solution for variables #1121

Closed
wants to merge 15 commits into from
Closed

Conversation

fgregg
Copy link
Contributor

@fgregg fgregg commented Dec 10, 2022

Using setuptools plugin facilities to set up variables. will close #1085

external plugins

external dedupe-variable plugins will look like https://github.com/dedupeio/dedupe-variable-datetime

in particular, they will have this in their pyproject.toml (or equivalent setup.cfg/setup.py)

[project.entry-points]
dedupevariables = {datetimetype = "datetimetype:DateTimeType"}

https://github.com/dedupeio/dedupe-variable-datetime/blob/fcd36afd000641168d9ae369623df866eeac35f9/pyproject.toml#L16

to do

@coveralls
Copy link

coveralls commented Dec 10, 2022

Coverage Status

Coverage increased (+0.08%) to 73.364% when pulling 784bf63 on pluggy-variables into 3666d3a on main.


variable_types = {}
for variable_type in pm.hook.register_variable():
variable_types.update(variable_type)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to do some protocol checking here.

@fgregg
Copy link
Contributor Author

fgregg commented Dec 11, 2022

@NickCrews so here's an implementation inspired by how datasette uses pluggy. would appreciate your thoughts on this.

@NickCrews
Copy link
Contributor

NickCrews commented Dec 11, 2022

Unfortunately I don't think that pluggy was actually a good suggestion from me, sorry :(

I think the largest drawback is the boilerplate that you describe above that plugin authors will need.

Second, I don't like how this isn't easily testable. At least I can't see how to install a mock package that registers an entry point. I'm assuming you didn't either, and that's why there are no tests here. When I see commits where someone bumps some external package to check for a change in behavior I get nervous lol

The actual implementation here looks good though if this still is the direction you want to head.

@fgregg
Copy link
Contributor Author

fgregg commented Dec 11, 2022

there are reasonably test pattern,

but i feel you on the boilerplate. it’s about the same amount and much clearer boilerplate than the namespace solution.

hmm…

@fgregg
Copy link
Contributor Author

fgregg commented Dec 11, 2022

the core of way that pluggy works with external libraries is with the setuptools mechanism.

    def load_setuptools_entrypoints(
        self, group: str, name: Optional[str] = None
    ) -> int:
        """Load modules from querying the specified setuptools ``group``.

        :param str group: Entry point group to load plugins.
        :param str name: If given, loads only plugins with the given ``name``.
        :rtype: int
        :return: The number of plugins loaded by this call.
        """
        count = 0
        for dist in list(importlib_metadata.distributions()):
            for ep in dist.entry_points:
                if (
                    ep.group != group
                    or (name is not None and ep.name != name)
                    # already registered
                    or self.get_plugin(ep.name)
                    or self.is_blocked(ep.name)
                ):
                    continue
                plugin = ep.load()
                self.register(plugin, name=ep.name)
                self._plugin_distinfo.append((plugin, DistFacade(dist)))
                count += 1
        return count

https://github.com/pytest-dev/pluggy/blob/450b85178d0ea2db4ecf75f6abfbb97e1164d5ab/src/pluggy/_manager.py#L352-L377

Reading setuptool's docs for this, i think we can do something much more direct than this current pluggy approach.

it will look similar to @NickCrews's proposal, but the developer will only have to write the pathstring once (for the entrypoint), instead of the user having to do it multiple times.

@fgregg fgregg changed the title pluggy solution for variables setuptools plugin solution for variables Dec 11, 2022
@fgregg
Copy link
Contributor Author

fgregg commented Dec 11, 2022

okay, i'm liking this solution a lot better than the pluggy one. thanks for pushing against that @NickCrews!

i think it should be a bit more user friendly than the full path model, but almost as convenient for developers hacking one something.

i'm going to try this out on a local project that needs a custom variable type and if i still like it, i'll write some tests and bring it in.

@@ -153,7 +153,10 @@ def typify_variables(
variable_definitions: Iterable[VariableDefinition],
) -> tuple[list[FieldVariable], list[Variable]]:

variable_types = {variable.type: variable for variable in dedupe.variables.__all__}
variable_types = {}
for variablename in dedupe.variables.__all__:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you copied this from my PR, I'm not positive that all is actually the correct solution, perhaps I'm abusing it and it actually is intended for something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's intended use is to control the behavior of "import *" but i think this works fine.

@NickCrews
Copy link
Contributor

This is much better than pluggy, awesome!

I think the basic philosophical difference between this and #1122 is who has the responsibility of defining the lookup key, and whether namespacing with the package name is important. Here, when the author of plugin A writes

[project.entry-points]
dedupevariables = {zipcodetype = "pluginA:ZipCodeVariable"}

they decide that the way to get this plugin is via the key "zipcodetype". You don't have to know that it lives inside a package called pluginA. If there is another plugin B that wants to do basically the same thing, they can't say dedupevariables = {zipcodetype = "pluginB:ZipCodeVariable"} because the key zipcodetype is now ambiguous, do you mean from pluginA or pluginB?

Whereas in #1122, the name of the package, pluginA vs pluginB, is significant, and is used to namespace. Now you can have two different plugins that export variables of the same name. On PyPI pluginA and pluginB are already forced to be using different names, so we can piggyback off that guarantee and now you can't have accidental collisions.

What do you think, are you trying to avoid having to specify "pluginA" vs "pluginB" in your variable definition dicts? I think that is silly, you already know where ZipCodeVariable is coming from, you have the package name written down in your requirements.txt. Am I missing something here? Note this is different from wanting to not care about WHAT FILE in the plugin , which I agree with the user shouldn't have to specify. But plugin authors shoud get around that with what I mention in #1122 (comment)

@fgregg
Copy link
Contributor Author

fgregg commented Dec 12, 2022

i think my main reason for preferring this is consistency of interface. i don't like that there are some variables that we refer to with a short alias and others that we use a path in the datamodel.

we could have a deprecation path where the paths were the way to reference variables, but honestly i think that a lot of new people would be confused by that.

if we were going to go down that road, i might even prefer a datamodel that went from:

data_model = [{field: "name", type: "String"},
              {field: "address", type: "String"}]

to

from dedupe.variables import StringVariable
data_model = [StringVariable(field="name"), StringVariable(field="address")]

which would avoid the plugin question altogether.

i kind of like that solution, but it seems like a lot of disruption without a huge benefit to users.

hmmm....

@hancush hancush self-assigned this May 3, 2024
fgregg added a commit that referenced this pull request Jun 27, 2024
set up data models like

```python
[
  dedupe.variables.String("name"),
  dedupe.variables.Exact("address")
 ]
 ```

instead of 

```
[
  {"field", "name", "type": "String"},
  {"field", "address", "type": "String"},
 ]
```

supersedes #1122, #1121. will close #1085
@fgregg
Copy link
Contributor Author

fgregg commented Jun 27, 2024

closed by #1193

@fgregg fgregg closed this Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

transition to plugins for dedupe variables.
4 participants