doctest doesn't catch unescaped character literals #129257

verhovsky · 2025-01-24T10:42:25Z

Bug report

Bug description:

def fn():
    """
    >>> fn()
    '\\u2009'
    >>> fn()  # this one shouldn't work
    '\u2009'
    """
    return '\u2009'


if __name__ == "__main__":
    import doctest
    doctest.testmod()

The second example is wrong because the character literal is not escaped. If you run help(fn), the second expected output is displayed as a literal thin space instead of the escape sequence,

but if you actually run the sample code you'd see

>>> fn()
'\u2009'
>>> fn()
'\u2009'

CPython versions tested on:

3.13

Operating systems tested on:

macOS


verhovsky · 2025-01-24T10:48:43Z

Since ' ' == '\u2009' you can close the issue, but the end result is modules with slightly incorrect help() output. Maybe there could be a STRICT_STRINGS option.

picnixz · 2025-01-24T11:01:44Z

the end result is modules with slightly incorrect help()

I wouldn't expect anyone to actually read the doctest from the output of help(). And I don't think we can change it, because the docstring is a regular string and help() doesn't do anything special for doctests.

You'll get the same behaviour if you just write:

>>> x =     """
...     >>> fn()
...     '\\u2009'
...     >>> fn()  # this one shouldn't work
...     '\u2009'
...     """
>>> print(x)

    >>> fn()
    '\u2009'
    >>> fn()  # this one shouldn't work
    ' '

>>>

I'm closing it as wontfix because it isn't possible to fix (unless help() were to parse the file itself and process the lines statically).

picnixz · 2025-01-24T11:12:19Z

Ok I think I know what happens: doctest is hex-folding outputs:

class OutputChecker:
    """
    A class used to check whether the actual output from a doctest
    example matches the expected output.  `OutputChecker` defines two
    methods: `check_output`, which compares a given pair of outputs,
    and returns true if they match; and `output_difference`, which
    returns a string describing the differences between two outputs.
    """
    def _toAscii(self, s):
        """
        Convert string to hex-escaped ASCII string.
        """
        return str(s.encode('ASCII', 'backslashreplace'), "ASCII")

    def check_output(self, want, got, optionflags):
        """
        Return True iff the actual output from an example (`got`)
        matches the expected output (`want`).  These strings are
        always considered to match if they are identical; but
        depending on what option flags the test runner is using,
        several non-exact match types are also possible.  See the
        documentation for `TestRunner` for more information about
        option flags.
        """

        # If `want` contains hex-escaped character such as "\u1234",
        # then `want` is a string of six characters(e.g. [\,u,1,2,3,4]).
        # On the other hand, `got` could be another sequence of
        # characters such as [\u1234], so `want` and `got` should
        # be folded to hex-escaped ASCII string to compare.
        got = self._toAscii(got)
        want = self._toAscii(want)

        # Handle the common case first, for efficiency:
        # if they're string-identical, always return true.
        if got == want:
            return True

You can observe that:

>>> f = lambda s: str(s.encode('ASCII', 'backslashreplace'), "ASCII")
>>> f('\u1234') == f('\\u1234')
True

Because of this, your second example succeeds. If you want an explicit \ in the output, you should add an r-prefix to the docstring. Therefore, I think this is the expected behaviour, albeit a bit surprising (I'm convinced this should be closed as wontfix).
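As a minimal sketch of the r-prefix suggestion (fn_raw and run_docstring are hypothetical names used only for illustration), a raw docstring keeps the backslash as written, so help() and doctest both see the same six-character escape sequence:

```python
import doctest


def fn_raw():
    r"""
    >>> fn_raw()
    '\u2009'
    """
    return '\u2009'


def run_docstring(func):
    """Run the doctests in func's docstring and return the TestResults."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the report text; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


result = run_docstring(fn_raw)
print(result.failed, result.attempted)  # 0 1
# The raw docstring holds a backslash followed by "u2009" ...
print("\\u2009" in fn_raw.__doc__)      # True
# ... and no actual thin-space character.
print("\u2009" in fn_raw.__doc__)       # False
```

With the r-prefix, what is typed in the source is what appears in help(fn_raw), and it is also what doctest compares against the repr of the return value.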

picnixz · 2025-01-24T11:14:01Z

Note that there is actually a fast path for string-identical outputs, but it only runs after hex-folding. We could perhaps add a new option that skips the folding and compares the outputs as they are, though I'm not sure it's worth it. In general, I would recommend using an r-prefix or writing an explicit unittest instead (or even an explicit test function that you would also run).
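No such option exists in doctest today. As a rough sketch of how the stricter comparison could be approximated in user code, the hypothetical StrictOutputChecker below (not part of the standard library) rejects any match that only succeeds because of hex-folding and otherwise defers to the stock OutputChecker. The helper names fold and run_with are likewise assumptions made for this example.

```python
import doctest


def fold(s):
    """Mirror doctest's hex-folding: non-ASCII characters become escapes."""
    return s.encode("ascii", "backslashreplace").decode("ascii")


class StrictOutputChecker(doctest.OutputChecker):
    """Hypothetical checker: fail when only hex-folding makes outputs equal."""

    def check_output(self, want, got, optionflags):
        if want != got and fold(want) == fold(got):
            return False
        return super().check_output(want, got, optionflags)


def fn():
    """
    >>> fn()
    '\\u2009'
    >>> fn()  # this one shouldn't work
    '\u2009'
    """
    return '\u2009'


def run_with(checker):
    """Run fn's doctests with the given checker and return the TestResults."""
    test = doctest.DocTestParser().get_doctest(
        fn.__doc__, {"fn": fn}, "fn", None, 0
    )
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    # Discard the failure report; only the counts are inspected.
    return runner.run(test, out=lambda text: None)


default = run_with(doctest.OutputChecker())
strict = run_with(StrictOutputChecker())
print(default.failed, default.attempted)  # 0 2
print(strict.failed, strict.attempted)    # 1 2
```

The stock checker accepts both examples, while the strict variant flags the second one, whose expected output contains a literal thin space. Passing a checker to DocTestRunner is a documented hook, so this needs no change to doctest itself, though it does mean driving the runner by hand instead of calling doctest.testmod().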

verhovsky added the type-bug label Jan 24, 2025

picnixz added the stdlib label Jan 24, 2025

picnixz removed the type-bug label Jan 24, 2025

verhovsky mentioned this issue Jan 24, 2025

Remove u string prefix from docs python-babel/babel#1174

Open

picnixz closed this as not planned Jan 24, 2025
