gh-100239: specialize bitwise logical binary ops on ints #128927

Open
wants to merge 13 commits into
base: main

iritkatriel
Member

@iritkatriel iritkatriel commented Jan 16, 2025

This adds specialisations for bitwise |, &, ^ on non-negative ints.

I'm not adding more in the same PR so we can more easily bisect in the future if we need to.

return (is_nonnegative_compactlong(lhs) && is_nonnegative_compactlong(rhs));
}

#define NONNEGATIVE_LONGS_ACTION(NAME, OP) \
Contributor

Why restrict to nonnegative longs here? If we do restrict, then we can replace the calls to _PyLong_CompactValue with direct access to op->long_value.ob_digit[0].

Member Author

Because negative ints need more work at runtime and I don't think they're common with bitwise logical ops.

Contributor

The extra work is already done by _PyLong_CompactValue, or am I missing something? The output of lhs OP rhs might not be a compact int, but there are no guards for the output type.

Member Author

The extra work is already done by _PyLong_CompactValue, or am I missing something?

No, I think you're right. Good point.

The output of lhs OP rhs might not be a compact int, but there are no guards for the output type.

For bitwise logical operators we should expect the results to have the same size as the inputs.
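
A quick Python-level check of that expectation, as a sketch only. It assumes the common 30-bit digit size mentioned later in this thread; `DIGIT_BITS`, `COMPACT_MAX` and `fits_in_one_digit` are illustrative names, not CPython internals. For non-negative operands the result stays within one digit, while with negative operands the two's-complement semantics of `&` can add one bit of magnitude, so the result constructor has to cope with a value that no longer fits in a single digit.

```python
DIGIT_BITS = 30  # assumption: the common 30-bit digit build
COMPACT_MAX = (1 << DIGIT_BITS) - 1  # largest magnitude that fits in one digit


def fits_in_one_digit(value: int) -> bool:
    """True if abs(value) fits in a single DIGIT_BITS-wide digit."""
    return abs(value).bit_length() <= DIGIT_BITS


# Non-negative operands: no result needs more bits than the wider input.
a, b = COMPACT_MAX, 0b1010
results_nonneg = [a & b, a | b, a ^ b]

# Negative operands: -(2**30 - 1) & -2 evaluates to -(2**30),
# whose magnitude needs 31 bits.
c, d = -COMPACT_MAX, -2
result_neg = c & d
```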

Contributor

@chris-eibl chris-eibl left a comment

I think non-negative should be dropped in the news entry now?

{ \
Py_ssize_t rhs_val = _PyLong_CompactValue((PyLongObject *)rhs); \
Py_ssize_t lhs_val = _PyLong_CompactValue((PyLongObject *)lhs); \
return PyLong_FromLong(lhs_val OP rhs_val); \
Contributor

Regarding the conversion warnings on Windows: they stem from long being 32-bit on Windows x86_64, whereas Py_ssize_t is 64-bit there. @markshannon had the foresight to make _PyLong_CompactValue() return a 64-bit value, so that digits can grow beyond 30 bits in the future.

Maybe use PyLong_FromSsize_t() here?

P.S.: The only disadvantage is that PyLong_FromSsize_t() lacks the fast path via _PyLong_FromMedium() that PyLong_FromLong() (and also PyLong_FromLongLong()) has.

Is this worth an issue? Maybe then implement PyLong_FromSsize_t(), PyLong_FromLong() and PyLong_FromLongLong() using a macro similar to PYLONG_FROM_UINT to get rid of the repetitive code?

BTW: PYLONG_FROM_UINT is missing the fast path for medium values, too.
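
The width mismatch behind the Windows warnings can be modelled in Python as a rough sketch. `wrap_signed` is a hypothetical helper that mimics the usual wrap-around when a value is narrowed to a signed type of a given width (in C the out-of-range case is implementation-defined). With today's 30-bit digits a compact value survives a 32-bit long unchanged, but a hypothetical wider digit would not.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and read them as two's complement."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):  # sign bit set
        value -= 1 << bits
    return value


compact_today = (1 << 30) - 1   # largest 30-bit digit value
wider_digit_value = 1 << 40     # hypothetical value if digits grew past 30 bits

as_32bit_long = wrap_signed(wider_digit_value, 32)  # models long on Windows x86_64
as_64bit = wrap_signed(wider_digit_value, 64)       # models a 64-bit Py_ssize_t
```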

Member Author

Thanks, I've updated. Feel free to create an issue about the fast path.

Contributor

Done: #129149.
This is my first issue, hopefully I didn't mess up :)

@chris-eibl
Contributor

I think non-negative should be dropped in the news entry now?

And maybe in the title of this pull request, too?

@iritkatriel iritkatriel changed the title from "gh-100239: specialize bitwise logical binary ops on non-negative int" to "gh-100239: specialize bitwise logical binary ops on ints" on Jan 21, 2025
Member

@markshannon markshannon left a comment

It looks like we are seeing a high proportion of specialization failures for non-compact ints with the & operator.
I don't know why that would be. My guess is that some of the benchmarks are using ints as bit vectors and using more than one digit.
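
To illustrate the bit-vector guess, here is a minimal sketch of how an int used as a bit set outgrows one digit. It assumes 30-bit digits; `DIGIT_BITS`, `digits_needed` and the member values are made up for the example and are not taken from any benchmark.

```python
DIGIT_BITS = 30  # assumption: the common 30-bit digit build


def digits_needed(value: int) -> int:
    """Number of DIGIT_BITS-wide digits needed for abs(value), at least 1."""
    return max(1, -(-abs(value).bit_length() // DIGIT_BITS))


members = (0, 5, 40)  # hypothetical set members
bitset = 0
for m in members:
    bitset |= 1 << m  # bit 40 pushes the int past a single digit

probe = 1 << 40
hit = bitset & probe  # membership test with `&` on two multi-digit ints
```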

@@ -2556,6 +2609,7 @@ binary_op_extended_specialization(PyObject *lhs, PyObject *rhs, int oparg,

LOOKUP_SPEC(compactlong_float_specs, oparg);
LOOKUP_SPEC(float_compactlong_specs, oparg);
LOOKUP_SPEC(compactlongs_specs, oparg);
Member

Why three tables, rather than one?

Member Author

A single table would need to have a list of (guard, action) pairs for each OP. So it's a table of tables. Same thing basically.
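
The "table of tables" shape can be sketched in Python as follows. This is only a model of the idea: the guard functions, `SPECS` and `find_specialization` are hypothetical names, and the real tables in the PR are C arrays looked up by oparg.

```python
import operator


def both_exact_ints(lhs, rhs):
    return type(lhs) is int and type(rhs) is int


def int_then_float(lhs, rhs):
    return type(lhs) is int and type(rhs) is float


def float_then_int(lhs, rhs):
    return type(lhs) is float and type(rhs) is int


# One table keyed by operator; each entry is a list of (guard, action) pairs.
SPECS = {
    "+": [(int_then_float, operator.add), (float_then_int, operator.add)],
    "&": [(both_exact_ints, operator.and_)],
    "|": [(both_exact_ints, operator.or_)],
    "^": [(both_exact_ints, operator.xor)],
}


def find_specialization(op, lhs, rhs):
    """Return the first action whose guard accepts the operands, else None."""
    for guard, action in SPECS.get(op, ()):
        if guard(lhs, rhs):
            return action
    return None
```

Whether the pairs live in one nested table or in several flat tables, the lookup does the same work: pick the operator's candidates, then run guards until one passes.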

@chris-eibl
Contributor

It looks like we are seeing a high proportion of specialization failures for non-compact ints with the & operator. I don't know why that would be. My guess is that some of the benchmarks are using ints as bit vectors and using more than one digit.

Just a wild guess: could those be from enums, which derive from int? Especially flag enums?

@iritkatriel
Member Author

Just a wild guess: could those be from enums, which derive from int? Especially flag enums?

I don't think so. We check for int with PyLong_CheckExact.
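
A Python-level analogue of that check, as a small sketch: `type(obj) is int` approximates what PyLong_CheckExact tests, and `Perm` is a made-up flag enum. Flag members are int instances but not exact ints, so they would not pass an exact-type guard.

```python
import enum


class Perm(enum.IntFlag):  # hypothetical flag enum for illustration
    R = 4
    W = 2


def is_exact_int(obj) -> bool:
    """Rough analogue of PyLong_CheckExact: subclasses of int do not count."""
    return type(obj) is int


combined = Perm.R | Perm.W  # result is a Perm member, not a plain int
```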

@eendebakpt
Contributor

I suspect the misses might be bm_pyflate:

https://github.com/python/pyperformance/blob/1d9261a7da8fcaa642a36181db8e7c4a306a1303/pyperformance/data-files/benchmarks/bm_pyflate/run_benchmark.py#L153-L163

Other Python constructs that would cause misses (but are not in pyperformance, as far as I can see) are uuid.uuid4 (and variants) and bitwise logical operations on the hash of Python objects.
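
For instance, a sketch of the uuid case, assuming 30-bit digits (`DIGIT_BITS` and `is_multi_digit` are illustrative names): the integer form of a version-4 UUID always has its version bits set high in the 128-bit value, so masking it with `&` involves a multi-digit operand even when the result is small.

```python
import uuid

DIGIT_BITS = 30  # assumption: the common 30-bit digit build


def is_multi_digit(value: int) -> bool:
    """True if abs(value) needs more than one DIGIT_BITS-wide digit."""
    return abs(value).bit_length() > DIGIT_BITS


u = uuid.uuid4().int  # 128-bit integer with version and variant bits set
low16 = u & 0xFFFF    # small result, but the left operand is multi-digit
```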

@iritkatriel
Member Author

(Would be nice if the report was rendered so that it's easy to see which benchmark contributed to a stat.)

5 participants