WIP: [GPU] Use FP32 accumulator for QK multiplication for 2nd+ token calculation in PagedAttention #18555
| Job | Run time |
|---|---|
| | 23s |
| | 53s |
| | 12m 14s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 1m 57s |
| | 52s |
| | 3m 13s |
| | 1m 31s |
| | 2m 41s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| | 1s |
| Total | 23m 45s |