More efficient hvcat of scalars and arrays of numbers #39729
Conversation
Oops, yep, completely broken. Should have guessed.
This is what I achieved by consolidating with `hvncat` (once it's ready). Note that it integrates #39725, which is currently broken for assigning strings into a range `setindex`.
Updated this, now that the
With respect to the original motivating examples:
So, better. Still some improvements to make. Not entirely sure why the middle case is relatively poor.
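A rough sketch of how such cases can be timed with BenchmarkTools; the specific expressions below are my own illustrative stand-ins, not the exact examples from #39713:

```julia
using BenchmarkTools

A = rand(3, 3)

@btime [1 2; 3 4]                  # all-scalar literal
@btime [$A $A; $A $A]              # blocks of arrays only
@btime [$A ones(3); ones(3)' 0]    # mixed arrays and scalars
```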
Current status of this PR vs. master (some timing differences from the previous test are likely due to my new computer):

This PR:
Master:
So currently, master is actually faster than or equivalent to this PR for 4 of the 6 cases, which is great. I'll need to think about whether this PR is still necessary, and see what can improve that last case.
Updating based on current master:

This PR:
Master:
All cases on master are faster than they were, even with the additional allocations. I'll see whether this PR can be wrapped up one way or the other this weekend, I think.
New simple approach: just dispatch `hvcat` to the `hvncat_shape` methods. Back to being similar to or faster than master, but I need to verify I didn't slow any other conditions down. EDIT: updated timings after better addressing the dynamic dispatch of `cat_similar`.
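A minimal sketch of what dispatching an `hvcat`-style call through the `hvncat` machinery can look like, assuming equal-length rows and row-major input order; the helper name is hypothetical and this is not the PR's actual diff:

```julia
# Illustrative only: forward an hvcat-style call to hvncat by describing the
# shape as (number of block rows, elements per row), filled row-first.
function hvcat_via_hvncat(rows::Tuple{Vararg{Int}}, xs...)
    all(==(rows[1]), rows) || throw(ArgumentError("all rows must have equal length"))
    return hvncat((length(rows), rows[1]), true, xs...)
end

hvcat_via_hvncat((2, 2), 1, 2, 3, 4)   # 2×2 Matrix [1 2; 3 4]
```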
The remaining slowdown I'm seeing relates to the …
In 1.10, …
On master, the fallback in abstractarrays.jl …
Possibly related to #49322? Looks like a regression in …
But we can probably ignore that for the purposes of this PR, which is about `hvcat`.
I wonder if this resolves the following issue:

```julia
using LinearAlgebra

function measure()
    data = rand(3,3)
    mat = [
        data ones(size(data, 1))
        ones(size(data, 2))' 0
    ]
    det(mat)
end

@code_warntype measure()  # -> can't infer type of `mat`
```

There has been some discussion at https://julialang.zulipchat.com/#narrow/stream/225542-helpdesk/topic/Matrix.20comprehension.20failed.20type.20inference.
Seems like it does!
Alright, I think it's time! Can we consider this for merging? Universally faster. Comparing against current master:
And current PR:
The one thing left that it would be good to see is performance on a literal 100x100 matrix (or similar). It's fairly common for large arrays to be written out in source code, and it's important that the compiler not choke on them. Other than that, this PR looks awesome! Deleting code and getting a speedup is a great combination.
Love the 4-year persistence on this!! 😄
I did a 100x100 array of … vs. master: 8.108 μs (4 allocations: 79.01 KiB). Even though they seem to be a bit faster on master, I also got a bunch of warnings I've never seen before about inference being slow, which I don't see on the PR, when I threw the string in:
Side note: don't try to paste a literal 100x100 matrix into PowerShell; it goes very, very slowly. (WSL is fine.)
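One way to generate such a large literal programmatically instead of pasting it, so the compile-time cost can be measured in isolation; this is my own sketch, not something from the thread:

```julia
# Build the source text of a literal 100x100 integer matrix, then time eval of
# the freshly parsed expression (which includes lowering and compilation, the
# part a huge literal could choke on).
src  = "[" * join((join(rand(1:9, 100), " ") for _ in 1:100), "; ") * "]"
expr = Meta.parse(src)
@time M = eval(expr)
size(M)   # (100, 100)
```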
Looking at the profiles, I think the performance decrease for the mixed-type example is largely due to a little more overhead and the extensive dimensionality checking (because the concatenated elements could be numbers or arrays). The current code concatenates each row and then concatenates the rows. This PR checks dimensions, creates the destination array, and then populates it one element at a time. I suspect the master performance would get worse if more rows were mixed types, since all the intermediate arrays would have to have an … The differences in timing might not matter that much, I think; they might not be repeatable. And the first example with all 1s actually doesn't touch my code.
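To make that contrast concrete, here is a rough sketch of the two strategies with illustrative helper names; neither is the actual Base or PR code:

```julia
# Strategy A (master-style, as described above): hcat each block row, then
# vcat the rows, creating one intermediate array per row along the way.
function hvcat_rowwise(rows::Tuple{Vararg{Int}}, xs...)
    blocks = Any[]
    i = 0
    for n in rows
        push!(blocks, hcat(xs[i+1:i+n]...))   # intermediate array per block row
        i += n
    end
    return vcat(blocks...)
end

# Strategy B (in the spirit of what this PR is described as doing): check the
# shape, allocate the destination once, then write each scalar into place.
function hvcat_fill(rows::Tuple{Vararg{Int}}, xs::Number...)
    nr, nc = length(rows), rows[1]
    all(==(nc), rows) || throw(DimensionMismatch("rows must have equal length"))
    dest = Matrix{promote_type(map(typeof, xs)...)}(undef, nr, nc)
    k = 1
    for i in 1:nr, j in 1:nc      # inputs arrive in row-major order
        dest[i, j] = xs[k]
        k += 1
    end
    return dest
end
```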
First attempt to address #39713
Original:
New:
Others unchanged, as expected.
Though if different types of numbers are mixed, it still takes the longer path. I tried expanding the definition, but there's some weird stuff going on that increases allocations in the other situations I posted in that issue. Works for any `Number` element type. Fixes #39713
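A small sketch of the dispatch distinction behind "different types of numbers still take the longer path"; the signatures are illustrative, not the PR's diff:

```julia
# A Vararg constrained by a single type parameter T only matches when every
# scalar shares one concrete type (Julia's diagonal dispatch rule), so mixed
# Int/Float64 arguments fall through to the looser Number method.
f(rows::Tuple{Vararg{Int}}, xs::T...) where {T<:Number} = "homogeneous fast path"
f(rows::Tuple{Vararg{Int}}, xs::Number...) = "mixed-type slower path"

f((2, 2), 1, 2, 3, 4)       # all Int        -> "homogeneous fast path"
f((2, 2), 1, 2.0, 3, 4)     # Int + Float64  -> "mixed-type slower path"
```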