doctest doesn't catch unescaped character literals #129257

Closed
verhovsky opened this issue Jan 24, 2025 · 4 comments
Labels
stdlib Python modules in the Lib dir

Comments

@verhovsky
Contributor

verhovsky commented Jan 24, 2025

Bug report

Bug description:

def fn():
    """
    >>> fn()
    '\\u2009'
    >>> fn()  # this one shouldn't work
    '\u2009'
    """
    return '\u2009'


if __name__ == "__main__":
    import doctest
    doctest.testmod()

The second example should fail because the character literal in its expected output is not escaped. If you run help(fn) you'll see this:

[Image: screenshot of the help(fn) output]

but if you actually run the sample code you'd see

>>> fn()
'\u2009'
>>> fn()
'\u2009'

CPython versions tested on:

3.13

Operating systems tested on:

macOS

@verhovsky added the type-bug label (An unexpected behavior, bug, or error) on Jan 24, 2025
@verhovsky
Contributor Author

verhovsky commented Jan 24, 2025

Since ' ' == '\u2009' you can close the issue, but the end result is modules with slightly incorrect help() output. Maybe there could be a STRICT_STRINGS option.

@picnixz added the stdlib label (Python modules in the Lib dir) on Jan 24, 2025
@picnixz
Member

picnixz commented Jan 24, 2025

the end result is modules with slightly incorrect help()

I wouldn't expect anyone to actually read the doctest from the output of help(). And I don't think we can change it, because the docstring is a regular string and help() doesn't do anything special for doctests.

You'll get the same behaviour if you just write:

>>> x =     """
...     >>> fn()
...     '\\u2009'
...     >>> fn()  # this one shouldn't work
...     '\u2009'
...     """
>>> print(x)

    >>> fn()
    '\u2009'
    >>> fn()  # this one shouldn't work
    ' '

>>>

I'm closing it as wontfix because it won't be possible to fix it (otherwise help() would need to parse the file itself and process the lines statically).

@picnixz removed the type-bug label (An unexpected behavior, bug, or error) on Jan 24, 2025
@picnixz
Member

picnixz commented Jan 24, 2025

OK, I think I know what happens: doctest hex-folds both the expected and the actual output before comparing them:

class OutputChecker:
    """
    A class used to check whether the actual output from a doctest
    example matches the expected output.  `OutputChecker` defines two
    methods: `check_output`, which compares a given pair of outputs,
    and returns true if they match; and `output_difference`, which
    returns a string describing the differences between two outputs.
    """
    def _toAscii(self, s):
        """
        Convert string to hex-escaped ASCII string.
        """
        return str(s.encode('ASCII', 'backslashreplace'), "ASCII")

    def check_output(self, want, got, optionflags):
        """
        Return True iff the actual output from an example (`got`)
        matches the expected output (`want`).  These strings are
        always considered to match if they are identical; but
        depending on what option flags the test runner is using,
        several non-exact match types are also possible.  See the
        documentation for `TestRunner` for more information about
        option flags.
        """

        # If `want` contains hex-escaped character such as "\u1234",
        # then `want` is a string of six characters(e.g. [\,u,1,2,3,4]).
        # On the other hand, `got` could be another sequence of
        # characters such as [\u1234], so `want` and `got` should
        # be folded to hex-escaped ASCII string to compare.
        got = self._toAscii(got)
        want = self._toAscii(want)

        # Handle the common case first, for efficiency:
        # if they're string-identical, always return true.
        if got == want:
            return True

You can observe that:

>>> f = lambda s: str(s.encode('ASCII', 'backslashreplace'), "ASCII")
>>> f('\u1234') == f('\\u1234')
True
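
The same folding can be observed through the public checker rather than a re-implemented lambda. This is a minimal sketch that assumes only `doctest.OutputChecker.check_output(want, got, optionflags)` as quoted above; the variable names are illustrative and not from the issue.

```python
import doctest

checker = doctest.OutputChecker()

# What the interpreter actually prints for fn(): the repr of U+2009,
# which Python renders as an escape because the character is non-printable.
got = repr("\u2009") + "\n"

# Expected output as written with an escaped backslash in the docstring
# (the string holds a backslash, 'u', '2', '0', '0', '9').
want_escaped = "'\\u2009'\n"

# Expected output as written without escaping in a non-raw docstring
# (the string holds the single character U+2009).
want_literal = "'\u2009'\n"

# Both comparisons succeed because check_output folds each side to
# backslash-escaped ASCII before comparing.
escaped_ok = checker.check_output(want_escaped, got, 0)
literal_ok = checker.check_output(want_literal, got, 0)
print(escaped_ok, literal_ok)
```

Although `want_escaped` and `want_literal` are different strings, the checker treats them as equivalent, which is why the second doctest example passes.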

So because of this, your second example succeeds. If you want an explicit \ in the output, you should add an r-prefix to the docstring. Therefore, I think this is the expected behaviour, albeit a bit surprising (I'm convinced this should be closed as wontfix).
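
To illustrate the r-prefix suggestion, here is a small sketch. The function names `fn_raw` and `fn_escaped` are made up for this example. With a raw docstring, the text typed in the source is the text stored in `__doc__`, so `help()` shows the escape sequence rather than an invisible thin space.

```python
def fn_raw():
    r"""
    >>> fn_raw()
    '\u2009'
    """
    return "\u2009"


def fn_escaped():
    """
    >>> fn_escaped()
    '\\u2009'
    """
    return "\u2009"


# In both docstrings the expected output is stored as a backslash followed
# by "u2009", not as the thin-space character itself.
raw_has_backslash = "\\u2009" in fn_raw.__doc__
escaped_has_backslash = "\\u2009" in fn_escaped.__doc__
raw_has_thin_space = "\u2009" in fn_raw.__doc__
print(raw_has_backslash, escaped_has_backslash, raw_has_thin_space)
```

Note that the r-prefix only changes what the docstring contains. It does not make doctest stricter, since the checker still folds both sides before comparing.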

@picnixz closed this as not planned (won't fix, can't repro, duplicate, stale) on Jan 24, 2025
@picnixz
Member

picnixz commented Jan 24, 2025

Note that there is actually a fast path. However, we could perhaps add a new option to skip that path and compare outputs without hex-folding, though I'm not sure it's worth it. In general, I would recommend using an r-prefix or writing an explicit unittest instead (or even an explicit test function that you would also run).
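
For anyone who wants the stricter comparison today, one possible workaround is a custom checker passed to `doctest.DocTestRunner`. The sketch below is not an existing doctest option: `StrictOutputChecker` and `run_docstring` are hypothetical names, and the rule it applies (reject outputs that only match after folding) is an assumption about what a "strict" mode might mean.

```python
import doctest


class StrictOutputChecker(doctest.OutputChecker):
    """Hypothetical checker: reject matches that exist only after hex-folding."""

    @staticmethod
    def _fold(text):
        # Same folding as the quoted _toAscii, re-implemented here so the
        # sketch does not depend on a private method.
        return text.encode("ascii", "backslashreplace").decode("ascii")

    def check_output(self, want, got, optionflags):
        # If the raw strings differ but their folded forms agree, the only
        # reason they would match is the folding, so treat it as a failure.
        if want != got and self._fold(want) == self._fold(got):
            return False
        return super().check_output(want, got, optionflags)


def run_docstring(docstring, globs, checker):
    """Run the examples in `docstring` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, dict(globs), "example", "<sketch>", 0)
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    results = runner.run(test, out=lambda text: None)  # discard the report
    return results.failed, results.attempted


def fn():
    return "\u2009"


# Same two examples as in the report: the first escapes the backslash,
# the second embeds the thin-space character itself.
DOCSTRING = """
>>> fn()
'\\u2009'
>>> fn()
'\u2009'
"""

default_result = run_docstring(DOCSTRING, {"fn": fn}, doctest.OutputChecker())
strict_result = run_docstring(DOCSTRING, {"fn": fn}, StrictOutputChecker())
print(default_result, strict_result)
```

With the default checker both examples pass, while the hypothetical strict checker fails the second one. An explicit unittest asserting on `fn()` directly would avoid the question entirely.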
