Running on AMD #156

csccva · 2024-02-19T14:22:27Z

Hello,

Are you aware of any attempts to hipify this library so that it runs on AMD GPUs using HIP?

Cristian

ahbarnett · 2024-02-19T14:24:34Z

I am not, although others have asked. Have a look at Discussions over at FINUFFT GitHub. There are others that may want to help. Best, Alex

csccva · 2024-02-19T14:28:34Z

Thank you for your reply. I have to use this on AMD, and my only option is via hipify. Since I have not used the library yet, I need to ask: is there anything in the library so specific to CUDA that a port would require a massive rewrite?

Cristian

ahbarnett · 2024-02-19T16:52:04Z

Melody's code uses shared memory (49kB per thread block), although that only affects type-1 transforms, and the speed of global mem seems to be catching up anyway in my A6000 tests. @blackwer may have opinions about porting, who has worked on the cuda code most recently.

blackwer · 2024-02-20T13:41:26Z

> Melody's code uses shared memory (49kB per thread block),

We should revisit this. This is an old constraint that isn't really present on modern hardware, where you can change these limits on the fly. It's been noted before; we just haven't really done anything about it.

> @blackwer may have opinions about porting, who has worked on the cuda code most recently.

Porting is straightforward and requires relatively few modifications afaik. It's definitely on my list of "fun" side-projects to tackle. I could probably do it in an afternoon or two once I got a sense for the tooling (famous last words). That said I'm busy with some other projects right now so I don't really want to work on this immediately.

@csccva If you'd like to contribute... #116 would be a good starting point for inspiration. The code there isn't usable directly since the repo has diverged so significantly, but I doubt the requirements for the port have changed much.

Notable differences with the current code that might require some thinking:

- considerably less reliance on macros than the version linked
- CMake, rather than a makefile
- the Python code is more generic than before -- though I think it will probably "just work" without intervention (also famous last words)

csccva · 2024-02-20T14:49:48Z

Thank you for the reply. I infer that no special CUDA features are used. I cannot look up right now how much shared memory is available per CU (SMP). This document suggests it is 64 kB, so that should be fine. I can give it a try with hipify. We have had quite good experience with it, and we are also looking into a header-only porting approach (https://github.com/cschpc/hop).
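A rough back-of-the-envelope check of the two figures mentioned in this thread (49kB per thread block on the CUDA side, 64 kB suggested for AMD) can be sketched in host-only C++. This is an illustration under stated assumptions, not cuFINUFFT code: the helpers `halo`, `padded_bin_bytes`, and `fits`, the bin sizes, and the kernel width of 7 are all hypothetical. It assumes each thread block stages a 2D bin of complex values plus a halo of ceil(w/2) cells on each side, and it reads "49kB" as the 48 KiB (49152 byte) default per-block limit.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; none of these names come from
// cuFINUFFT. Halo width: integer ceil(kernel_width / 2) cells per side.
constexpr std::size_t halo(std::size_t kernel_width) {
    return (kernel_width + 1) / 2;
}

// Bytes needed to hold one padded 2D bin of complex values.
constexpr std::size_t padded_bin_bytes(std::size_t bin_x, std::size_t bin_y,
                                       std::size_t kernel_width,
                                       std::size_t bytes_per_complex) {
    return (bin_x + 2 * halo(kernel_width)) *
           (bin_y + 2 * halo(kernel_width)) * bytes_per_complex;
}

// True when the requirement fits inside the given per-block budget.
constexpr bool fits(std::size_t needed, std::size_t budget) {
    return needed <= budget;
}

// Assumed budgets: 48 KiB (the "49kB" figure) and the 64 kB figure quoted above.
constexpr std::size_t kCudaDefaultBytes = 48 * 1024;
constexpr std::size_t kAmdQuotedBytes = 64 * 1024;

// Made-up cases: 8 bytes = complex float, 16 bytes = complex double.
static_assert(fits(padded_bin_bytes(32, 32, 7, 8), kCudaDefaultBytes),
              "32x32 single-precision bin fits in 48 KiB");
static_assert(fits(padded_bin_bytes(64, 64, 7, 8), kCudaDefaultBytes),
              "64x64 single-precision bin fits in 48 KiB");
static_assert(!fits(padded_bin_bytes(64, 64, 7, 16), kAmdQuotedBytes),
              "64x64 double-precision bin exceeds 64 KiB");
```

With these made-up sizes, the 64x64 double-precision bin needs 82944 bytes and exceeds both budgets, while the single-precision cases fit under either one. If the assumptions hold, the chosen bin size matters more here than the difference between the two budgets.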

Last (stupid) question: this project was recommended to me by someone who has used it. Is it possible to use it from C code, or only from Python?

Cristian

blackwer · 2024-02-20T14:58:05Z

> I cannot look up right now how much shared memory is available per CU (SMP).

Don't worry about this. I'll deal with it later.

> I can give it a try with hipify. We have had quite good experience with it, and we are also looking into a header-only porting approach

Great! Please feel free to submit a PR.

> Last (stupid) question: this project was recommended to me by someone who has used it. Is it possible to use it from C code, or only from Python?

We provide C/C++ bindings. See: https://github.com/flatironinstitute/finufft/tree/master/examples/cuda

csccva · 2024-02-20T15:02:40Z

Thank you for your replies. I will let you know how it goes.

Cristian

blackwer mentioned this issue Feb 20, 2024

Revisit shared memory limitations #157

Open
