Halide v18.0.0
Changes Of Note since Halide 17
- Ring-buffering is now supported in schedules (Func::ring_buffer()). This is distinct from fold_storage in that it folds across time (the loop variables) rather than folding across space (the pure vars of the Func).
- Fixed a longstanding bug in lossless_cast()
- Lots of fixes for Vulkan backend
- OpenGLCompute is no longer supported
- Added support for ARM SVE2
- Added (basic) support for Intel APX and AVX10
- Added support for Hexagon HVX v68
- Added support for numpy's .npy format to .debug_to_file() and the code in halide_image_io.h
- Python bindings now support bfloat and int64 properly
- Hacky code that auto-named Funcs, Vars etc via DWARF introspection was removed
- The profiler was revamped to behave better when multiple Halide pipelines are in flight at the same time.
- Numerous lowering passes were sped up, resulting in faster compilation for large pipelines. However, time spent in LLVM is still the long pole for most pipelines.
- Fixed-point instruction selection has been improved via tracking constant integer bounds of expressions.
- Added feature detection for ARM CPUs to the runtime library and to the host target feature computation. Supports Windows, macOS, Linux, iOS, and Android.
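To make the "folds across time" wording in the ring-buffering note concrete, here is a minimal sketch in plain C++. It does not use the Halide API and is not how Halide lowers Func::ring_buffer(); it only models the idea that the storage slot is chosen from the enclosing loop iteration rather than from a pure coordinate of the Func. The names produce_tile, consume_tile, run_ring_buffered, kTileSize, and kRingExtent are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kTileSize = 4;    // values produced per outer-loop iteration (hypothetical)
constexpr int kRingExtent = 2;  // number of slots, playing the role of a ring-buffer extent

using Tile = std::array<int, kTileSize>;

// Hypothetical producer: fills one tile for outer-loop iteration t.
void produce_tile(int t, Tile &slot) {
    for (int i = 0; i < kTileSize; i++) {
        slot[i] = t * kTileSize + i;
    }
}

// Hypothetical consumer: sums the values of one tile.
int consume_tile(const Tile &slot) {
    int sum = 0;
    for (int v : slot) {
        sum += v;
    }
    return sum;
}

// Runs num_tiles outer iterations. The slot index depends on the loop
// iteration t (time), not on the coordinate i inside a tile (space), so
// tile t + 1 can be filled while tile t is still waiting to be consumed.
std::vector<int> run_ring_buffered(int num_tiles) {
    assert(num_tiles >= 0);
    std::array<Tile, kRingExtent> ring{};
    std::vector<int> sums;
    if (num_tiles == 0) {
        return sums;
    }
    produce_tile(0, ring[0]);  // prologue: fill the first slot
    for (int t = 0; t < num_tiles; t++) {
        if (t + 1 < num_tiles) {
            // Produce one iteration ahead into the other slot.
            produce_tile(t + 1, ring[(t + 1) % kRingExtent]);
        }
        sums.push_back(consume_tile(ring[t % kRingExtent]));
    }
    return sums;
}
```

Calling run_ring_buffered(3) returns the per-tile sums 6, 22, and 38. In this model the producer and consumer run serially; the point is only that two slots indexed by iteration are enough for the producer to stay one step ahead without overwriting data the consumer has not read yet.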
Deprecations / Removals
- tuple_select() has been removed in favor of overloads to select().
- Various fixed-point operators have been removed from the Halide::Internal namespace and are now in the public Halide namespace.
What's Changed
- Detect ARM CPU features for host target and in runtime (#8298)
- Scheduling directive to support ring buffering by @vksnk in #7967
- Don't add ring_buffer semaphores if the function is not scheduled as async by @vksnk in #8015
- Quick fix for crash that is occurring in SVE2 tests. by @zvookin in #8020
- Don't use variable-length arrays by @steven-johnson in #8021
- Set warnings on tests as well as src by @steven-johnson in #8022
- Stronger chain detection in LoopCarry pass by @vksnk in #8016
- adds mappings for f16 variants of halide float math by @mikewoodworth in #8029
- Require LLVM >= 16.0 by @steven-johnson in #8003
- Add test for #8029 by @steven-johnson in #8032
- Tweak the Printer code in runtime for smaller code by @steven-johnson in #8023
- Fix bounds_of_nested_lanes by @abadams in #8039
- Track whether or not let expressions failed to solve in solver by @abadams in #7982
- Fix type error in VectorizeLoops by @abadams in #8055
- Update makefile to use test/common/terminate_handler.cpp by @abadams in #8066
- add unsafe_promise_clamped by @wraith1995 in #8071
- Don't require Halide_WebGPU when using wasm (#8063) by @steven-johnson in #8065
- Outsmart the LLVM optimizer by @steven-johnson in #8073
- Add hexagon_benchmarks app for CMake builds by @prasmish in #8069
- Fix bool conversion bug in Vulkan code generator by @derek-gerstmann in #8067
- Better validation of gpu schedules by @abadams in #8068
- Add an easy way to print vectors in debug output. by @zvookin in #8072
- [WebGPU] Update to latest native headers by @jrprice in #8081
- Remove OpenGLCompute by @steven-johnson in #8077
- Add checks to prevent people from using negative split factors by @abadams in #8076
- Fix rfactor adding too many pure loops by @abadams in #8086
- Forward the partition methods from generator outputs by @abadams in #8090
- Parallelize some tests by @abadams in #8078
- Allow disabling of multithreading in simd op check by @steven-johnson in #8096
- clang does not support _Float16 when targeting i386 by @LebedevRI in #8085
- tests: correctness/float16_t: mark __extendhfsf2 with default visibility by @LebedevRI in #8084
- Fix reduce_expr_modulo of vector in Solve.cpp by @abadams in #8089
- [Vulkan] Region allocator fixes for memory requirements and allocations by @derek-gerstmann in #8087
- Ensure string(REPLACE) is called with the right number of arguments by @alexreinking in #8097
- Strip asserts right at the end of lowering by @abadams in #8094
- Fix clang-tidy error in runtime.printer.h (parameter shadows member) by @steven-johnson in #8074
- Fix an issue where the Halide compiler hits an internal error for bool types in widening intrinsics. by @zvookin in #8099
- Small Tutorial Fix by @2022tgoel in #8111
- Optionally print the time taken by each lowering pass by @abadams in #8116
- Do less redundant work in UnpackBuffers by @abadams in #8104
- Avoid redundant scope lookups by @abadams in #8103
- Add Intel APX and AVX10 target flags and LLVM attribute setting. by @zvookin in #8052
- Use a caching version of stmt_uses_vars in TightenProducerConsumer nodes by @abadams in #8102
- Fix hoist_storage not handling condition correctly. by @abadams in #8123
- Rewrite the skip stages lowering pass by @abadams in #8115
- Remove two dead vars from the Makefile by @abadams in #8125
- Add support for setting the default allocator and deallocator functions in Halide::Runtime::Buffer. by @mcourteaux in #8132
- Make realization order invariant to unique_name suffixes by @abadams in #8124
- Make gpu thread and block for loop names opaque by @abadams in #8133
- Add class template type deduction guides to avoid CTAD warning. by @zvookin in #8135
- [vulkan] Add conform API methods to memory allocator to fix block allocations by @derek-gerstmann in #8130
- Add sobel in hexagon benchmarks app for CMake builds by @prasmish in #8127
- Handle loads of broadcasts in FlattenNestedRamps by @abadams in #8139
- Use python itself to get the extension suffix, not python-config by @abadams in #8148
- Rewrite the pass that adds mutexes for atomic nodes by @abadams in #8105
- Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) by @mcourteaux in #8143
- Bound allocation extents for hoist_storage using loop variables one-by-one by @vksnk in #8154
- Support for ARM SVE2. by @zvookin in #8051
- Fix two compute_with bugs. by @abadams in #8152
- Python bindings: add_python_test(): do set HL_JIT_TARGET too by @LebedevRI in #8156
- fix ub in lower rounding shift right by @abadams in #8173
- Add some missing _Float16 support by @steven-johnson in #8174
- Add conversion code for Float16 that was missed in #8174 by @steven-johnson in #8178
- Tighten bounds of abs() by @rootjalex in #8168
- Clarify the meaning of Shuffle::is_broadcast() by @abadams in #8158
- Add .npy support to halide_image_io by @steven-johnson in #8175
- Update Hexagon Install Instructions by @FabianSchuetze in #8182
- Add .npy support to debug_to_file() by @steven-johnson in #8177
- Don't print on parallel task entry/exit with -debug flag by @abadams in #8185
- Fix corner case in if_then_else simplification by @abadams in #8189
- Rewrite IREquality to use a more compact stack instead of deep recursion by @abadams in #8198
- [HEXAGON] Keep support for hexagon_remote/Makefile by @aankit-quic in #8186
- Faster substitute_facts by @abadams in #8200
- Make Interval::is_single_point check for deep equality by @abadams in #8202
- Refactor ConstantInterval by @abadams in #8179
- Faster vars used tracking in simplify let visitor by @abadams in #8205
- More aggressively unify duplicate lets by @abadams in #8204
- Update debug_to_file API to remove type_code by @steven-johnson in #8183
- [x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection by @rootjalex in #7805
- Insert apparently-missing break; in IREquality.cpp by @steven-johnson in #8211
- Fix Reinterpret cmp in IREquality by @rootjalex in #8217
- Fix give-up case in ModulusRemainder by @abadams in #8221
- Fix for top-of-tree LLVM by @steven-johnson in #8223
- Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp by @abadams in #8230
- Fix saturating add matching in associativity checking by @abadams in #8220
- Add HVX_v68 target to support Hexagon HVX v68. by @wangcheng22 in #8232
- Mark host_dirty() and device_dirty() with no_discard. by @mcourteaux in #8248
- Rework the simplifier to use ConstantInterval for bounds by @abadams in #8222
- Remove max size assert from Anderson2021 by @jansel in #8253
- Expose BFloat in Python bindings by @jansel in #8255
- Fix Metal handling for float16 literals by @shoaibkamil in #8260
- Python binding support for int64 literals by @jansel in #8254
- Report useful error to user if the promise_clamp all fails to losslessly cast. by @mcourteaux in #8238
- It's generally a bad idea for simplifier rules to multiply constants by @abadams in #8234
- [vulkan] Fix Vulkan SIMT mappings for GPU loop vars. by @derek-gerstmann in #8259
- Stop region costs from complaining about new intrinsics by @abadams in #8262
- No longer silently hide errors in Metal completion handlers (alternative approach) by @shoaibkamil in #8240
- Use upstream interface for consuming SPIR-V by @alexreinking in #8265
- Fix OpenCL positive and negative INF constants. by @alexreinking in #8266
- scoped_truth for the loop variable being always less than the loop extent. by @mcourteaux in #8306
- Fix incorrect type in emulation of float16 is_inf/nan by @abadams in #8310
- Don't try to codegen predicated atomic stores by @abadams in #8285
- Add ability to pass explicit RDom to Function::define_update by @abadams in #8284
- [vulkan] Dynamically load Vulkan loader library. Avoid Validation Layer crash on exit. by @derek-gerstmann in #8289
- Remove Introspection by @steven-johnson in #8273
- Per-pipeline-invocation profiling by @abadams in #8153
- Fix device slices for Buffer with fixed dimensionality in template. by @mcourteaux in #8313
- Remove deprecated operators by @steven-johnson in #8321
- Provide a minimum OS version for MachO objects by @alexreinking in #8323
- Fix horrifying bug in lossless_cast of a subtract by @abadams in #8155
New Contributors
- @tylerhou made their first contribution in #8013
- @wraith1995 made their first contribution in #8071
- @prasmish made their first contribution in #8069
- @2022tgoel made their first contribution in #8111
- @FabianSchuetze made their first contribution in #8182
- @FindHao made their first contribution in #8322
Full Changelog: v17.0.2...v18.0.0