Releases: gomlx/gopjrt

Benchmarks; Direct access to PJRT Buffers (when using CPU); Several speed ups.

19 Dec 10:00
3e4e41d
  • Added install_linux_amd64_amazonlinux.sh and pre-built libraries for amazonlinux (built using old glibc support).
  • Fixed installation scripts: s/sudo/$_SUDO. Also made them more verbose.
  • Removed dependency on xargs in installation script for Linux.
  • Improved documentation on Nvidia GPU card detection, and error message if not found.
  • Updated GitHub action (go.yaml) so that it modifies README.md with its results only when pushing to the
    main branch.
  • Added pjrt.arena to avoid costly allocations for CGO calls, and merged some CGO calls for general speed-ups.
    The following functions improved by more than 50% in the fixed-cost part of their execution time (measured on
    transfers with 1 value and on minimal programs; the variable part is not included):
    • Buffer.ToHost()
    • Client.BufferFromHost()
    • LoadedExecutable.Execute()
  • Added BufferToHost and BufferFromHost benchmarks.
  • Added support for environment variable XLA_DEBUG_OPTIONS: if set, it is parsed as a DebugOptions proto that
    is passed to the JIT-compilation of a computation graph.
  • LoadedExecutable.Execute() now waits for the end of the execution (by setting
    PJRT_LoadedExecutable_Execute_Args.device_complete_events).
    The previous behavior led to odd results and was undefined (not documented).
  • Package dtypes:
    • Added tests;
    • Added SizeForDimensions() to be used for dtypes that use fractions of bytes (like 4 bits).
  • Added Client.NewSharedBuffer (and the lower level client.CreateViewOfDeviceBuffer()) to create buffers that share
    memory with the host, for faster input.
    • Added AlignedAlloc and AlignedFree required by client.CreateViewOfDeviceBuffer.
  • Added Buffer.Data for direct access to a buffer's data. Undocumented in PJRT, and likely only works on CPU.
  • Fixed coverage script.

v0.4.9 Optional (static) pre-linking CPU PJRT; MacOS support with static linked PJRT

25 Nov 07:57
  • Optional preloading CPU PJRT plugin:
    • github.com/gomlx/gopjrt/pjrt/cpu/static that statically links the PJRT CPU plugin: easy to deploy binary.
      It includes the corresponding C BUILD rule to build the static library (libpjrt_c_api_cpu_static.a)
    • github.com/gomlx/gopjrt/pjrt/cpu/dynamic that dynamically links (and preloads) the PJRT CPU plugin.
  • pjrt_c_api_cpu.so is now compiled directly from gopjrt, and doesn't require cloning xla separately. It will
    be distributed in the same tar.gz file.
  • Added MacOS support by statically linking the CPU PJRT plugin.

v0.4.8 Minor C++ code update

19 Nov 10:33
  • Replaced C++ xla::StatusOr by absl::StatusOr (the former was already an alias to the latter) -- required for upcoming XLA change.

v0.4.7 Updated XLA dependencies, PJRT v0.57

17 Nov 18:12
56ec5a2
  • Sync'ed with updated proto definitions from OpenXLA/XLA project.
  • TestEndToEnd: added klog flags; list devices before trying to compile.
  • Renamed deprecated xla::Status to absl::Status.
  • Updated to XLA and PJRT v0.57
    • Updated XLA dependency.
    • Updated PJRT CPU plugin.
    • Updated pjrt_c_api.h: copying over from XLA source is now part of the generate program.
    • Note: PJRT v0.56 was broken for a few days, and the version was skipped.
      (breakage here openxla/xla@590b36f#r149134910)
  • Mac version is broken :( ; following up on openxla/xla#19152. Since it's
    outside our control, it is not blocking the release here.

v0.4.6 Minor fixes and improvements

12 Nov 09:05
  • Fixed installation script: it was missing sudo when removing the old library, and it was not observing the GOPJRT_NOSUDO request.
  • Fixed github test action go.yaml.
  • Explicitly set the random algorithm to Philox when using RngBitGenerator. Also improved documentation and added
    check on the validity of the random state shape.
  • Added dtypes.DType.IsUnsigned()

v0.4.5 Clean up multi-platform paths and code

06 Nov 11:01
e524f15
  • Fixes to experimental/GPU MacOS (darwin) on arm64.
  • XlaBuilder works on Darwin/X86_64 (darwin_amd64) but OpenXLA/XLA PJRT CPU does not work (yet?).
  • Normalized names of prebuilt-binaries.
  • TestEndToEnd now tests only the first device by default, because CPU PJRT seems to falsely advertise more than one addressable device.
    • Added --alldevices to loop over all devices during the test.

v0.4.4 Buffers and literals small updates

24 Oct 06:54
fe16097
  • Package pjrt:
    • Fixed some API documentation issues with Buffer transfers from host. Added tests.
  • Package xlabuilder:
    • Fixed NewArrayLiteral[T dtypes.Supported](flat []T, dimensions ...int) to create a scalar if no dimensions are passed.
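
The NewArrayLiteral fix above follows from treating the element count as the product of the dimensions: with no dimensions the product is the empty product, 1, so a single flat value describes a scalar. The sketch below illustrates that rule only. The literal type, the lowercase newArrayLiteral function, and the rank method are hypothetical stand-ins, not the xlabuilder implementation.

```go
package main

import "fmt"

// literal is a hypothetical stand-in for an array literal: flat values plus
// the dimensions that give them a shape.
type literal[T any] struct {
	flat       []T
	dimensions []int
}

// rank returns the number of axes; a scalar has rank 0.
func (l *literal[T]) rank() int { return len(l.dimensions) }

// newArrayLiteral is an illustrative sketch (not xlabuilder's code). It checks
// that the number of flat values matches the product of the dimensions. With
// no dimensions the product is 1, so exactly one value yields a scalar.
func newArrayLiteral[T any](flat []T, dimensions ...int) (*literal[T], error) {
	size := 1
	for _, dim := range dimensions {
		if dim < 0 {
			return nil, fmt.Errorf("negative dimension %d in %v", dim, dimensions)
		}
		size *= dim
	}
	if size != len(flat) {
		return nil, fmt.Errorf("dimensions %v require %d values, got %d",
			dimensions, size, len(flat))
	}
	return &literal[T]{flat: flat, dimensions: dimensions}, nil
}

func main() {
	scalar, err := newArrayLiteral([]float32{7})
	fmt.Println(scalar.rank(), err)

	matrix, err := newArrayLiteral([]int{1, 2, 3, 4}, 2, 2)
	fmt.Println(matrix.rank(), err)

	_, err = newArrayLiteral([]int{1, 2}) // two values cannot form a scalar
	fmt.Println(err != nil)
}
```

Rejecting a multi-value slice with no dimensions is a design choice of this sketch; the release note does not say how xlabuilder handles that case.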

v0.4.3 Static XlaBuilder library; Experimental Apple/Metal support

23 Oct 09:37
e1b1715
  • GoMLX XlaBuilder C library is now linked as a static library (.a instead of .so).
    • Using new Bazel 7.4.0, with support for cc_static_library.
  • EXPERIMENTAL support for Apple/Metal (darwin-arm64):
    • Added C-wrapper compilation for darwin-arm64.
    • Added converter from HLO to StableHLO -- it greatly increases the size of libgomlx_builder.a, since it has to
      include the whole LLVM :(
      • Enables Apple Metal PJRT -- it only supports StableHLO/MLIR programs (and not the simpler HLO).
      • Only enabled for Darwin
  • Updated XLA dependency; Updated PJRT for linux/amd64 CPU.
  • Added Literal.Data()

v0.4.2 New operations

03 Oct 07:18
c818cc3
  • Added IsFinite and PopulationCount operations.
  • Updated protos.

v0.4.1 TPUs, updated XlaBuilder, improved installation

28 Sep 09:39
8d7c8f2
  • Added memory layout information in buffer-to-host transfers: required for TPU.
  • Included C error message when reporting PJRT plugin failures.
  • Added GOPJRT_NOSUDO and GOPJRT_INSTALL_DIR to control cmd/install.sh and cmd/install_cuda.sh.
  • Improved installation instructions to install directly from Github using curl, without the need to clone the repository.
  • Updated XlaBuilder C-wrapper to follow refactorings within github.com/openxla/xla.