[webgpu] Use subgroup for matmulnbits (#23224)
### Description

This PR uses subgroup operations to implement matmulnbits when tile_m > 1 on Intel devices. With this PR, prefill for a 500-token prompt with phi3 drops from 8.5s to 3.5s on Intel Meteor Lake, roughly a 2.4x speedup.
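For readers unfamiliar with the operator, the following is a minimal Python sketch of the arithmetic a 4-bit matmulnbits performs. It is not the WGSL shader from this PR and does not model subgroups. The function names, the low-nibble-first packing, and the default zero point of 8 are assumptions made for illustration. The inner loop shows why tile_m > 1 matters: each dequantized column of B is computed once and reused for every row of A in the tile.

```python
def unpack_nibbles(packed):
    """Split each byte into two 4-bit values, low nibble first (assumed layout)."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def matmul_4bit(a, b_packed, scales, block_size, zero_point=8):
    """Hypothetical reference for a block-quantized 4-bit matmul.

    a:        M x K list of floats
    b_packed: N rows, each K/2 bytes holding K 4-bit weights
    scales:   N rows, each K/block_size floats (one scale per block)
    """
    m, k = len(a), len(a[0])
    n = len(b_packed)
    out = [[0.0] * n for _ in range(m)]
    for col in range(n):
        q = unpack_nibbles(b_packed[col])
        # Dequantize one column of B once ...
        w = [(q[i] - zero_point) * scales[col][i // block_size] for i in range(k)]
        # ... and reuse it for every row of A in the tile (the tile_m > 1 case).
        for row in range(m):
            out[row][col] = sum(a[row][i] * w[i] for i in range(k))
    return out


# Two rows of A (M = 2), K = 4, one output column, block size 2.
a = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
result = matmul_4bit(a, [[0x98, 0x7A]], [[0.5, 2.0]], block_size=2)
```

In a GPU implementation, the reuse shown in the inner loop is what subgroup operations can make cheap, since lanes in a subgroup can share loaded values without going through workgroup memory. How this PR arranges that sharing is not described in the commit message.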