Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instances of generic classes defined with PEP695 syntax are unpickleable #129250

Open
cslamber opened this issue Jan 24, 2025 · 5 comments
Open
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-typing type-bug An unexpected behavior, bug, or error

Comments

@cslamber
Copy link

cslamber commented Jan 24, 2025

Bug report

Bug description:

Generic classes that are instantiated with the Class[Type](...) syntax have an undocumented(?) __orig_class__ attribute defined on them that contains a reference to the type passed in the brackets. This attribute is pickled along with the class as usual, and in the cases when that type is well-behaved this works fine. However, PEP695-defined TypeVars have their __module__ set to typing, and so pickling fails as it is not a true member of the typing module. This prevents the usage of either Class[Type](...) syntax or PEP695 syntax in code near anything that must be pickled.

import pickle
from typing import Generic, TypeVar

T = TypeVar('T')

class Foo(Generic[T]): # equivalently class Foo[Anything]:
    pass

def bar[Baz]():
    return Foo[Baz]()

pickle.dumps(bar())
#    pickle.dumps(bar())
#    ~~~~~~~~~~~~^^^^^^^
# _pickle.PicklingError: Can't pickle Baz: attribute lookup Baz on typing failed

print(bar().__dict__)
# {'__orig_class__': __main__.Foo[Baz]}
print(bar().__orig_class__.__parameters__[0])
# Baz
print(bar().__orig_class__.__parameters__[0].__module__)
# typing

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@cslamber cslamber added the type-bug An unexpected behavior, bug, or error label Jan 24, 2025
@sobolevn
Copy link
Member

sobolevn commented Jan 24, 2025

The same happens for classes, functions, and type aliases:

>>> type A[T] = ...
>>> A.__type_params__[0]
T
>>> A.__type_params__[0].__module__
'typing'

Setting .__module__ to the correct one does not really help, because T is not really visible from the module's namespace.

>>> import pickle
>>> class Foo[Bar]: ...
... ...
>>> Foo.__type_params__[0].__module__ = '__main__'
>>> pickle.dumps(Foo[Foo.__type_params__[0]]())
AttributeError: module '__main__' has no attribute 'Bar'. Did you mean: 'bar'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<python-input-11>", line 1, in <module>
    pickle.dumps(Foo[Foo.__type_params__[0]]())
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.PicklingError: Can't pickle Bar: it's not found as __main__.Bar
when serializing tuple item 1
when serializing typing._GenericAlias reconstructor arguments
when serializing typing._GenericAlias object
when serializing dict item '__orig_class__'
when serializing Foo state
when serializing Foo object

So, the only solution I can think of is to change how we serialize TypeVar objects.

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 24, 2025
@tom-pytel
Copy link
Contributor

tom-pytel commented Jan 24, 2025

Something like this? (need some tweaks, but this is the idea)

$ ./python
Python 3.14.0a4+ (heads/main-dirty:75f59bb629, Jan 24 2025, 10:45:29) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo[Bar]: pass
...
>>> bar = Foo.__type_params__[0]
>>> import pickle
>>> pickle.dumps(bar)
b'\x80\x05\x95 \x00\x00\x00\x00\x00\x00\x00\x8c\x06typing\x94\x8c\x07TypeVar\x94\x93\x94\x8c\x03Bar\x94\x85\x94R\x94.'
>>> pickle.loads(_)
~Bar
>>> type(_)
<class 'typing.TypeVar'>
>>> pickle.dumps(Foo)
b'\x80\x05\x95\x14\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x03Foo\x94\x93\x94.'
>>> pickle.loads(_)
<class '__main__.Foo'>

Change TypeVar.__reduce__:

$ git diff main
diff --git a/Objects/typevarobject.c b/Objects/typevarobject.c
index 4ed40aa71a..acfa261ad5 100644
--- a/Objects/typevarobject.c
+++ b/Objects/typevarobject.c
@@ -806,7 +806,57 @@ static PyObject *
 typevar_reduce_impl(typevarobject *self)
 /*[clinic end generated code: output=02e5c55d7cf8a08f input=de76bc95f04fb9ff]*/
 {
-    return Py_NewRef(self->name);
+    PyObject *ret = NULL;
+    PyObject *args = NULL;
+    PyObject *typevar = NULL;
+    PyObject *typing = NULL;
+
+    PyObject *module_name = PyObject_GetAttr((PyObject *)self, &_Py_ID(__module__));
+    if (module_name == NULL) {
+        return NULL;
+    }
+    if (!_PyUnicode_EqualToASCIIString(module_name, "typing")) {
+        ret = Py_NewRef(self->name);
+        goto done;
+    }
+
+    typing = PyImport_ImportModule("typing");
+    if (typing == NULL) {
+        goto done;
+    }
+    if (PyObject_HasAttr(typing, self->name)) {
+        ret = Py_NewRef(self->name);
+        goto done;
+    }
+
+    typevar = PyObject_GetAttrString(typing, "TypeVar");
+    if (typevar == NULL) {
+        goto done;
+    }
+    args = PyTuple_New(1);
+    if (args == NULL) {
+        goto done;
+    }
+    ret = PyTuple_New(2);
+    if (ret == NULL) {
+        goto done;
+    }
+
+    Py_INCREF(self->name);
+    PyTuple_SET_ITEM(args, 0, self->name);
+    PyTuple_SET_ITEM(ret, 0, typevar);
+    PyTuple_SET_ITEM(ret, 1, args);
+
+    args = NULL;
+    typevar = NULL;
+
+done:
+    Py_XDECREF(args);
+    Py_XDECREF(typevar);
+    Py_XDECREF(typing);
+    Py_XDECREF(module_name);
+
+    return ret;
 }

Need to store obviously all the other relevant attributes and maybe use some other function like copyreg.__newobj__. Would not change behavior of any __reduce__ of non typing module TypeVars, or those that actually exist in the typing module.

== Tests result: SUCCESS ==

10 slowest tests:
- test_signal: 1 min 16 sec
- test.test_concurrent_futures.test_wait: 48.0 sec
- test.test_multiprocessing_spawn.test_processes: 43.0 sec
- test_socket: 34.3 sec
- test.test_multiprocessing_forkserver.test_processes: 33.9 sec
- test_io: 32.2 sec
- test_xmlrpc: 27.7 sec
- test.test_multiprocessing_spawn.test_misc: 27.4 sec
- test_logging: 25.5 sec
- test_subprocess: 25.0 sec

15 tests skipped:
    test.test_asyncio.test_windows_events
    test.test_asyncio.test_windows_utils test_android test_apple
    test_devpoll test_free_threading test_kqueue test_launcher
    test_msvcrt test_startfile test_winapi test_winconsoleio
    test_winreg test_winsound test_wmi

4 tests skipped (resource denied):
    test_peg_generator test_tkinter test_ttk test_zipfile64

465 tests OK.

Total duration: 2 min 8 sec
Total tests: run=45,704 skipped=2,060
Total test files: run=480/484 skipped=15 resource_denied=4
Result: SUCCESS

Are there other implications of assigning typing module to autocreated TypeVars when they don't actually exist in that module?

@JelleZijlstra
Copy link
Member

@tom-pytel's solution is problematic because TypeVars should ideally be pickled by identity, not by value. Two separate TypeVar('T') objects are unrelated; pickling and unpickling one should not yield a different object.

We could solve this bug by simply not pickling the __orig_class__ attribute. It's an optional typing introspection feature, so it wouldn't be the end of the world. However, that's a solution I'd rather avoid.

Another approach could be to teach the pickle machinery to retrieve the TypeVar at its original location. In the original report, the TypeVar could be retrieved by doing Bar.__type_params__[0]. We could (with some changes to the compiler and runtime) make it so pickling a new-style TypeVar looks up the object at that path.

@tom-pytel
Copy link
Contributor

tom-pytel commented Jan 24, 2025

You would have to store a reference that "original location" in each TypeVar so that it could be used as a context in serialization. Much like __module__ is used for TypeVars at module scope, no? So that the __orig_class__ return from OP bar() would __reduce__() to something like (for anonymous TypeVars):

_GenericAlias._recreate, (<class '__main__.Foo'>, (<function bar at 0x7f7076d47890>, 0 (index in __type_params__)))

Instead of just:

<built-in function getitem>, (<class '__main__.Foo'>, Baz))

In order to have context to deserialize the Baz identically?

EDIT: Doing it for own generics like Foo[Foo.__type_params__[0]]() would be trivial since you already have the origin class to reference, but that would not solve the OP problem where the anonymous TypeVar comes from a different object.

@tom-pytel
Copy link
Contributor

tom-pytel commented Jan 25, 2025

Proof of concept solving original report:

>>> import pickle
... from typing import Generic, TypeVar
...
... T = TypeVar('T')
...
... class Foo(Generic[T]): # equivalently class Foo[Anything]:
...     pass
...
... def bar[Baz]():
...     return Foo[Baz]()
...
... foo_baz = bar()
... pickle.dumps(foo_baz)
... 
b'\x80\x05\x95\x82\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x03Foo\x94\x93\x94)\x81\x94}\x94\x8c\x0e__orig_class__\x94\x8c\t_operator\x94\x8c\x07getitem\x94\x93\x94h\x02\x8c\x06typing\x94\x8c\x1a_restore_anonymous_typevar\x94\x93\x94h\x00\x8c\x03bar\x94\x93\x94K\x00\x86\x94R\x94\x86\x94R\x94sb.'
>>> pickle.loads(_)
<__main__.Foo object at 0x7f43d90a45e0>
>>> _.__orig_class__.__args__[0] is foo_baz.__orig_class__.__args__[0]
True
>>> foo_baz.__orig_class__.__args__[0]
Baz
>>> foo_baz.__orig_class__.__args__[0].__reduce__()
(<function _restore_anonymous_typevar at 0x7f43d90cf050>, (<function bar at 0x7f43d9360d10>, 0))

Requires adding a weakref pointing to owning object to anonymous TyperVar and modifying stuff like _Py_set_function_type_params to set owner, but works as @JelleZijlstra requested by identity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-typing type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants