Releases: gomlx/gopjrt
Benchmarks; Direct access to PJRT Buffers (when using CPU); Several speed ups.
- Added `install_linux_amd64_amazonlinux.sh` and pre-built libraries for amazonlinux (built using old glibc support).
- Fixed installation scripts: s/sudo/$_SUDO. Also made them more verbose.
- Removed dependency on `xargs` in installation script for Linux.
- Improved documentation on Nvidia GPU card detection, and error message if not found.
- Updated GitHub action (`go.yaml`) to only change the README.md with the result of the change, if pushing to the `main` branch.
- Added `pjrt.arena` to avoid costly allocations for CGO calls, and merged some of the CGO calls for general speed-ups. The following functions had > 50% improvements on their fixed-cost execution time (measured on transfers with 1 value, and minimal programs), not on the variable part:
  - `Buffer.ToHost()`
  - `Client.BufferFromHost()`
  - `LoadedExecutable.Execute()`
- Added `BufferToHost` and `BufferFromHost` benchmarks.
- Added support for environment variable `XLA_DEBUG_OPTIONS`: if set, it is parsed as a `DebugOptions` proto that is passed to the JIT-compilation of a computation graph.
- `LoadedExecutable.Execute()` now waits for the end of the execution (by setting `PJRT_LoadedExecutable_Execute_Args.device_complete_events`). The previous behavior led to odd results and was undefined (not documented).
- Package `dtypes`:
  - Added tests;
  - Added `SizeForDimensions()` to be used for dtypes that use fractions of bytes (like 4 bits).
- Added `Client.NewSharedBuffer` (and the lower level `client.CreateViewOfDeviceBuffer()`) to create buffers with shared memory with the host, for faster input.
  - Added `AlignedAlloc` and `AlignedFree` required by `client.CreateViewOfDeviceBuffer`.
- Added `Buffer.Data` for direct access to a buffer's data. Undocumented in PJRT, and likely only works on CPU.
- Fixed coverage script.
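
The `SizeForDimensions()` entry above concerns dtypes whose elements take less than one byte, where "element count times bytes per element" no longer works. The sketch below illustrates the arithmetic only. The function `sizeForDimensions` and its `bitsPerElement` parameter are hypothetical names for this example, not the gopjrt API, and it assumes elements are tightly packed with the total rounded up to a whole byte, which may differ from what the library does.

```go
package main

import "fmt"

// sizeForDimensions is a hypothetical helper (not the gopjrt API).
// It returns the number of bytes needed to store a tensor with the given
// dimensions when each element occupies bitsPerElement bits.
// Assumption: elements are tightly packed and the total is rounded up
// to the next whole byte.
func sizeForDimensions(bitsPerElement int, dimensions ...int) int {
	numElements := 1 // No dimensions means a scalar: exactly one element.
	for _, d := range dimensions {
		numElements *= d
	}
	totalBits := numElements * bitsPerElement
	return (totalBits + 7) / 8 // Integer ceiling of totalBits/8.
}

func main() {
	fmt.Println(sizeForDimensions(4, 3))     // three 4-bit values: 12 bits -> 2 bytes
	fmt.Println(sizeForDimensions(4, 2, 2))  // four 4-bit values: 16 bits -> 2 bytes
	fmt.Println(sizeForDimensions(32, 2, 3)) // six 32-bit values: 24 bytes
}
```

Taking the dimensions rather than a precomputed byte-per-element size is what makes the rounding correct: three 4-bit values need 2 bytes, not 3 times a rounded-up single byte.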
v0.4.9 Optional (static) pre-linking CPU PJRT; MacOS support with static linked PJRT
- Optional preloading CPU PJRT plugin:
  - `github.com/gomlx/gopjrt/pjrt/cpu/static` that statically links the PJRT CPU plugin: easy to deploy binary. It includes the corresponding C BUILD rule to build the static library (`libpjrt_c_api_cpu_static.a`).
  - `github.com/gomlx/gopjrt/pjrt/cpu/dynamic` that dynamically links (and preloads) the PJRT CPU plugin.
- `pjrt_c_api_cpu.so` now compiled directly from `gopjrt`, and doesn't require cloning `xla` separately. It will be distributed in the same `tar.gz` file.
- Added MacOS support by statically linking the CPU PJRT plugin.
v0.4.8 Minor C++ code update
- Replaced C++ `xla::StatusOr` by `absl::StatusOr` (the former was already an alias to the latter); required for an upcoming XLA change.
v0.4.7 Updated XLA dependencies, PJRT v0.57
- Sync'ed with updated proto definitions from OpenXLA/XLA project.
- TestEndToEnd: added `klog` flags; list devices before trying to compile.
- Renamed deprecated xla::Status to absl::Status.
- Update to XLA and PJRT v0.57
- Updated XLA dependency.
- Updated PJRT CPU plugin.
- Updated `pjrt_c_api.h`: copying over from XLA source is now part of the generate program.
- Note: PJRT v0.56 was broken for a few days, and the version was skipped. (breakage here openxla/xla@590b36f#r149134910)
- Mac version broken :( : Following up on openxla/xla#19152. Since it's outside our control, not blocking the release here.
v0.4.6 Minor fixes and improvements
- Fix to installation script: missing `sudo` to remove old library, not observing the GOPJRT_NOSUDO request.
- Fixed github test action `go.yaml`.
- Explicitly set the random algorithm to Philox when using RngBitGenerator. Also improved documentation and added check on the validity of the random state shape.
- Added `dtype.DType.IsUnsigned()`
v0.4.5 Clean up multi-platform paths and code
- Fixes to experimental/GPU MacOS (darwin) on arm64.
- XlaBuilder works on Darwin/X86_64 (darwin_amd64) but OpenXLA/XLA PJRT CPU does not work (yet?).
- Normalized names of prebuilt-binaries.
- Test `TestEndToEnd` only tests the first device by default, because CPU PJRT seems to falsely advertise more than one addressable device.
  - Added `--alldevices` to loop over all devices during the test.
v0.4.4 Buffers and literals small updates
- Package `pjrt`:
  - Fixed some API documentation issues with Buffer transfers from host. Added tests.
- Package `xlabuilder`:
  - Fixed `NewArrayLiteral[T dtypes.Supported](flat []T, dimensions ...int)` to create a scalar if no dimensions are passed.
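
The `NewArrayLiteral` fix above relies on the convention that an empty list of dimensions denotes a scalar holding exactly one value. The sketch below shows that convention in isolation. The `literal` type, the lowercase `newArrayLiteral`, its error return, and `isScalar` are all hypothetical stand-ins written for this example; they are not the `xlabuilder` implementation and their signatures are assumptions.

```go
package main

import "fmt"

// literal is a hypothetical stand-in for an array literal (not xlabuilder.Literal).
type literal[T any] struct {
	flat       []T
	dimensions []int
}

// newArrayLiteral is a hypothetical sketch: it checks that flat holds exactly
// the number of values implied by dimensions. With no dimensions the product
// over an empty list is 1, so a single value yields a scalar.
func newArrayLiteral[T any](flat []T, dimensions ...int) (*literal[T], error) {
	size := 1
	for _, d := range dimensions {
		if d < 0 {
			return nil, fmt.Errorf("negative dimension %d in %v", d, dimensions)
		}
		size *= d
	}
	if len(flat) != size {
		return nil, fmt.Errorf("flat has %d values, but dimensions %v require %d",
			len(flat), dimensions, size)
	}
	return &literal[T]{flat: flat, dimensions: dimensions}, nil
}

// isScalar reports whether the literal has rank 0 (no dimensions).
func (l *literal[T]) isScalar() bool { return len(l.dimensions) == 0 }

func main() {
	scalar, err := newArrayLiteral([]float32{3.5}) // no dimensions passed
	fmt.Println(scalar.isScalar(), err)            // true <nil>

	matrix, err := newArrayLiteral([]int32{1, 2, 3, 4, 5, 6}, 2, 3)
	fmt.Println(matrix.isScalar(), err) // false <nil>
}
```

Starting the size product at 1 means the scalar case needs no special branch: the same length check covers rank 0 and higher ranks.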
v0.4.3 Static XlaBuilder library; Experimental Apple/Metal support
- GoMLX XlaBuilder C library is now linked as a static library (`.a` instead of `.so`).
  - Using new Bazel 7.4.0, with support for `cc_static_library`.
- EXPERIMENTAL support for Apple/Metal (`darwin-arm64`):
  - Added C-wrapper compilation for darwin-arm64.
  - Added converter from HLO to StableHLO -- it greatly increases the size of libgomlx_builder.a, since it has to include the whole LLVM :(
    - Enables Apple Metal PJRT -- it only supports StableHLO/MLIR programs (and not the simpler HLO).
    - Only enabled for Darwin
- Updated XLA dependency; Updated PJRT for linux/amd64 CPU.
- Added `Literal.Data()`
v0.4.2 New operations
- Added `IsFinite` and `PopulationCount` operations.
- Updated protos.
v0.4.1 TPUs, updated XlaBuilder, improved installation
- Added memory layout information in buffer-to-host transfers: required for TPU.
- Included C error message when reporting PJRT plugin failures.
- Added GOPJRT_NOSUDO and GOPJRT_INSTALL_DIR to control `cmd/install.sh` and `cmd/install_cuda.sh`.
- Improved installation instructions to install directly from Github using `curl`, without the need to clone the repository.
- Updated `XlaBuilder` C-wrapper to follow refactorings within github.com/openxla/xla.