doctest doesn't catch unescaped character literals #129257
Comments
Since |
I wouldn't expect to actually read the doctest from the output of You'll get the same behaviour if you just write:
>>> x = """
... >>> fn()
... '\\u2009'
... >>> fn() # this one shouldn't work
... '\u2009'
... """
>>> print(x)
>>> fn()
'\u2009'
>>> fn() # this one shouldn't work
' '
>>>
I'm closing it as |
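The session above comes down to how Python parses string literals: in a normal (non-raw) docstring, `\u2009` is converted to a single THIN SPACE character before doctest ever sees the text. A minimal sketch of that difference, using the illustrative names `plain` and `raw` (not from the original report):

```python
# Illustrative only: the same text written as a normal and as a raw string literal.
plain = "'\u2009'"   # the parser turns \u2009 into one THIN SPACE character
raw = r"'\u2009'"    # a raw literal keeps the backslash and the escape characters

print(len(plain), len(raw))    # 3 8
# repr() escapes the thin space, so it matches the raw spelling.
print(repr(plain[1]) == raw)   # True
```

This is why a docstring containing doctests with backslash escapes is usually written as a raw string, or with the backslashes doubled.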
Ok, I think I know what happens: doctest is hex-folding outputs:

class OutputChecker:
    """
    A class used to check whether the actual output from a doctest
    example matches the expected output. `OutputChecker` defines two
    methods: `check_output`, which compares a given pair of outputs,
    and returns true if they match; and `output_difference`, which
    returns a string describing the differences between two outputs.
    """
    def _toAscii(self, s):
        """
        Convert string to hex-escaped ASCII string.
        """
        return str(s.encode('ASCII', 'backslashreplace'), "ASCII")

    def check_output(self, want, got, optionflags):
        """
        Return True iff the actual output from an example (`got`)
        matches the expected output (`want`). These strings are
        always considered to match if they are identical; but
        depending on what option flags the test runner is using,
        several non-exact match types are also possible. See the
        documentation for `TestRunner` for more information about
        option flags.
        """
        # If `want` contains hex-escaped character such as "\u1234",
        # then `want` is a string of six characters(e.g. [\,u,1,2,3,4]).
        # On the other hand, `got` could be another sequence of
        # characters such as [\u1234], so `want` and `got` should
        # be folded to hex-escaped ASCII string to compare.
        got = self._toAscii(got)
        want = self._toAscii(want)
        # Handle the common case first, for efficiency:
        # if they're string-identical, always return true.
        if got == want:
            return True

You can observe that:
>>> f = lambda s: str(s.encode('ASCII', 'backslashreplace'), "ASCII")
>>> f('\u1234') == f('\\u1234')
True
So because of this, your second succeeds. If you want explicit |
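The folding can also be observed through the public `doctest.OutputChecker.check_output` method. The sketch below assumes a CPython whose `OutputChecker` behaves like the source quoted above; the variable names are illustrative and not taken from the report.

```python
import doctest

checker = doctest.OutputChecker()

# What the interpreter prints for the thin space: the escaped form '\u2009'.
got = repr("\u2009") + "\n"

# Expected output as it reaches doctest from two differently written docstrings.
want_escaped = "'\\u2009'\n"   # backslash kept (escaped or raw docstring)
want_literal = "'\u2009'\n"    # backslash consumed by the parser: a real thin space

escaped_ok = checker.check_output(want_escaped, got, 0)
literal_ok = checker.check_output(want_literal, got, 0)

# Both expectations are folded to the same ASCII text before comparison.
print(escaped_ok, literal_ok)
```

With the hex-folding in place, both comparisons are expected to succeed, which matches the behaviour described in the report.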
Note that there is actually a fast path. However, we could perhaps add a new option to not have that strict path and compare outputs without hex-folding, though I'm not sure it's worth it. In general, I would recommend using an |
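No such option exists in the code quoted above. As a rough sketch of what a comparison without hex-folding could look like, the hypothetical subclass below overrides the private `_toAscii` helper so that it returns its input unchanged. This relies on an implementation detail of `doctest.OutputChecker` and is an assumption for illustration, not a supported API.

```python
import doctest


class StrictOutputChecker(doctest.OutputChecker):
    """Hypothetical checker that skips hex-folding (relies on a private method)."""

    def _toAscii(self, s):
        # Assumption: check_output() calls this helper to fold both strings,
        # as in the source quoted in this thread. Returning `s` disables that.
        return s


strict = StrictOutputChecker()
got = repr("\u2009") + "\n"    # the interpreter prints the escaped form

escaped_ok = strict.check_output("'\\u2009'\n", got, 0)   # escaped expectation
literal_ok = strict.check_output("'\u2009'\n", got, 0)    # unescaped thin space

print(escaped_ok, literal_ok)
```

A checker like this can be passed to a runner through the `checker` argument of `doctest.DocTestRunner`, so the unescaped expectation would then be reported as a failure.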
Bug report
Bug description:
The second line is wrong because the character literal is not escaped. If you run
help(fn)
you'll see this, but if you actually run the sample code you'd see
CPython versions tested on:
3.13
Operating systems tested on:
macOS