Tensile-4.27.0 for ROCm 4.2.0

@saadrahim saadrahim released this 10 May 23:17
3438af2

Added

  • Benchmarking and library support for CU efficiency vs. overall speed
  • Support general batched GEMM
  • Support an offset for each input/output buffer in Tensile
  • Support ldc != ldd for all GEMM kernels
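
To illustrate what per-buffer offsets and ldc != ldd mean for a GEMM, here is a minimal pure-Python reference sketch of D = alpha * A * B + beta * C on flat, column-major buffers. It is not Tensile's API: the function name `gemm_reference` and the `offset_*` parameter names are hypothetical and chosen only for this example.

```python
def gemm_reference(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, d, ldd,
                   offset_a=0, offset_b=0, offset_c=0, offset_d=0):
    """Column-major D = alpha * A * B + beta * C on flat Python lists.

    Hypothetical helper for illustration only; not part of Tensile.
    Each buffer has its own leading dimension and starting offset,
    so C (read) and D (written) may use different strides.
    """
    for col in range(n):
        for row in range(m):
            acc = 0.0
            for i in range(k):
                acc += a[offset_a + row + i * lda] * b[offset_b + i + col * ldb]
            # C is indexed with ldc, D with ldd: the two layouts are independent.
            d[offset_d + row + col * ldd] = (
                alpha * acc + beta * c[offset_c + row + col * ldc]
            )
    return d


# 2x2 problem: C is tightly packed (ldc = 2); D is padded (ldd = 3)
# and starts one element into its buffer (offset_d = 1).
a = [1.0, 3.0, 2.0, 4.0]      # column-major [[1, 2], [3, 4]]
b = [1.0, 0.0, 0.0, 1.0]      # identity matrix
c = [10.0, 10.0, 10.0, 10.0]
d = [0.0] * 7
gemm_reference(2, 2, 2, 1.0, a, 2, b, 2, 1.0, c, 2, d, 3, offset_d=1)
print(d)
```

The padding slots in `d` are left untouched, which is the point of allowing the output stride to differ from the input stride.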

Optimizations

  • Refactor ConvolutionVsContraction

Fixed

  • Fixed MasterSolutionLibrary having duplicated hardware rows
  • Fixed incorrect channel stride when converting a convolution problem into a tensor contraction problem