
@turbo not doing LoopVectorization on Float64 #541

Open
Lincoln-Hannah opened this issue Jan 6, 2025 · 9 comments

Comments

@Lincoln-Hannah

`@turbo` seems to fail when the function's argument is annotated `Float64` or `AbstractFloat`, but it works for `Real` and more general types.
Also, should it be able to work on structs, as in the `X1` example at the bottom?

using LoopVectorization

X    = randn(50_000_000)

float64(   x::Float64       ) = sin(x)
abs_float( x::AbstractFloat ) = sin(x)
real(      x::Real          ) = sin(x)
any(       x::Any           ) = sin(x)

@turbo float64.(X)      # `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead
@turbo abs_float.(X)    # `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead
@turbo real.(X)         # works
@turbo any.(X)          # works

struct X1; x::Float64 end
fx1((;x)::X1) = sin(x)
x1 = repeat([X1(1.0)], 1_000_000)

@turbo fx1.( x1 )       # `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead
@chriselrod
Member

It works by passing `AbstractSIMD` objects to your functions.
If your signature only accepts `::Float64` or `::AbstractFloat`, that call throws a `MethodError`, which is why `check_args` fails.

This is a limitation of how the package is implemented.
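Based on that explanation, a minimal sketch of the workaround is to leave the signature generic, wide enough to accept the `AbstractSIMD` arguments that `@turbo` passes in. The names `g_strict` and `g_generic` are hypothetical, and this assumes LoopVectorization is installed:

```julia
using LoopVectorization

# Restricting to Float64 makes check_args fail, because @turbo calls
# the function with AbstractSIMD arguments, not Float64 scalars.
g_strict(x::Float64) = sin(x)

# A fully generic signature accepts both Float64 scalars and SIMD
# vectors, so the same method works in the scalar fallback and in the
# vectorized loop.
g_generic(x) = sin(x)

X = randn(1_000)
Y = @turbo g_generic.(X)   # check_args passes; no fallback warning
```

With `g_strict`, the broadcast falls back to the `@inbounds @fastmath` loop; with `g_generic`, `check_args` passes and the loop is vectorized.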

@Lincoln-Hannah
Author

Lincoln-Hannah commented Jan 6, 2025

Is there a way to explicitly define a function for a `Float64` SIMD object? I tried the below, but it doesn't work.

f( x ::LoopVectorization.VectorizationBase.AbstractSIMD{8, Float64} ) = sin(x)
vmap( f, X )

I'd like to run vmap on structs as in the example.

Since Julia is so much about structs and performance I would have thought this would be supported.

@chriselrod
Member

Ah, yeah, I was lazy in the implementation: it just checks whether the function works for `Vec{2,Int}`, not the type it will actually be used with.

promoted_op = Base.promote_op(f, ntuple(RetVec2Int(), Val(NARGS))...)

Most of the functions are either fully generic, or only work with integers, so AFAIK no one noticed.
This does, however, prevent you from being able to add restrictions like

@inline f(x::LoopVectorization.VectorizationBase.AbstractSIMD{N, Float64}) where {N} = sin(x)

Any particular motivation for not supporting `Int`? You may as well add an `Int` implementation that calls `f(float(x))`, or something like that.
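The suggestion above might look like the following sketch. The name `f2` is hypothetical, and it assumes `float` on a `Vec{N,Int}` yields a `Vec{N,Float64}` (which VectorizationBase's SIMD types are designed to support):

```julia
using LoopVectorization
const AbstractSIMD = LoopVectorization.VectorizationBase.AbstractSIMD

# The restricted method from above:
@inline f2(x::AbstractSIMD{N,Float64}) where {N} = sin(x)

# Per the suggestion: forward integer SIMD input through float(), so
# check_args' Vec{2,Int} probe also resolves to a valid method.
@inline f2(x::AbstractSIMD{N,<:Integer}) where {N} = f2(float(x))

# Scalar fallback for the non-vectorized path:
@inline f2(x::Real) = sin(float(x))
```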

Alternatively, make a PR that runs a naive type inference (avoiding the base type inference, which we can do due to not being generic) on LV's internal IR to infer the correct types, and use this in check_args.
This seemed like more effort than it is worth to me.

I have not had much time to work on performance lately, but my future work has all moved to LoopModels (which will not have any of these problems, due to working at a lower level).

@Lincoln-Hannah
Author

I was trying to understand the basics. Ultimately, I'd like to use vmap on StructArrays or vectors of structs, like I can with the GPU.
In the example below, running on AMDGPU gives a 100x speedup, but vmap doesn't give any. I assume it's not using SIMD.

using AMDGPU, LoopVectorization, StructArrays

struct X10
    a::Float64
    b::Float64
end

f((;a,b)::X10) = 2sin(a)*cos(a) + 3sin(b)*cos(b) + sin(a*b)*cos(a/b)

A   = randn(10^7)
B   = randn(10^7)
rA  = ROCArray(A)
rB  = ROCArray(B)

x10  = StructArray{X10}(a=A,b=B)
rx10 = StructArray{X10}(a=rA,b=rB);

@time f.(x10)           # .4   seconds
@time vmap( f, x10 )    # .4   seconds 
@time f.(rx10)          # .004 seconds

@Lincoln-Hannah
Author

A more general use case: a portfolio of financial products, with thousands of trades across dozens of product types. Each product type has a struct and a value function that takes the struct as input, so all the trades of one product type could be stored in a StructArray. For each product type, I'd like to run the valuations vectorized. GPU is fast but has other disadvantages, so I was hoping vmap or @tturbo might be a good substitute.

@chriselrod
Member

You could get it to work with some manual effort.

Or, you can make a PR to add explicit StructArrays support by wrapping vmap.
Basically, you'd decompose the StructArray into a set of vectors, and call a wrapper function that rebuilds a struct, but with SIMD types as its fields.
Maybe it'd have to be a NamedTuple, because it can't be an `X10` whose fields only accept scalars.
This would still require a reasonably generic function, i.e. it'd have to accept the NamedTuple in place of an X10, or, if X10 is parameterized to accept scalars or SIMD types, it'd have to accept both parameterizations (and the wrapping function would have to do the right thing).
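The decomposition idea described above might be sketched as follows. The name `vmap_structs` is hypothetical, and whether the SIMD path actually engages still depends on LoopVectorization's internal checks; the function `f` must accept a NamedTuple in place of the struct:

```julia
using LoopVectorization, StructArrays

# Decompose a StructArray into its field vectors, vmap over them, and
# rebuild each "element" as a NamedTuple, so the fields can hold either
# scalars or SIMD vectors.
function vmap_structs(f, sa::StructArray)
    cols = StructArrays.components(sa)   # NamedTuple of field vectors
    ks   = keys(cols)
    vmap((vals...) -> f(NamedTuple{ks}(vals)), values(cols)...)
end

# Example with the X10 struct from this thread; note f destructures by
# property name, so it accepts both an X10 and a NamedTuple:
struct X10; a::Float64; b::Float64; end
f((;a,b)) = 2sin(a)*cos(a) + 3sin(b)*cos(b)

sa = StructArray{X10}(a=randn(100), b=randn(100))
out = vmap_structs(f, sa)
```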

Note that I no longer use Julia, nor do I maintain LoopVectorization or the JuliaSIMD ecosystem, so I'm unlikely to make any of these changes myself. I can, however, answer questions or describe approaches, so that those who do have time and stand to benefit can take over.

@Lincoln-Hannah
Author

Any help is appreciated.

If I understand correctly, you're saying any solution would involve rewriting the structs and function inputs.
Funny, since it works seamlessly with GPU arrays. I guess the SIMD hardware is more restrictive.
Sorry to hear you're leaving Julia.

@chriselrod
Member

chriselrod commented Jan 8, 2025

It's not about the hardware, but that LoopVectorization.jl's implementation is bad/limited.

This is why LoopModels would fix this.

@Lincoln-Hannah
Author

I hope they build it (don't think I'm smart enough to help there) :)
