This closes #283.

The `XlaSend` call requires `envpool` to make a copy of the `action`, so that the XLA runtime cannot recycle `action` before `envpool` has finished using it. Originally, I used `cudaMemcpy` to make sure the copy completed synchronously. However, that appears to cause the problem reported in issue #283.

Here, I replace the original `cudaMemcpy` call with its async version, followed by an explicit `streamSynchronize`. It is not clear how a `cudaMemcpy` on the default stream inside a custom call interacts with the stream managed by pjrt. However, based on the custom-call documentation [here](https://github.com/tensorflow/tensorflow/blob/0d2d79e84c9bdf71c737ad17a7b1dc04d9efc24f/tensorflow/compiler/xla/g3doc/custom_call.md), I hypothesize that an explicit stream synchronization in the custom call is safe.
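
For illustration, the change described above can be sketched roughly as follows. This is a minimal standalone sketch, not the actual `envpool` code: the helper name `CopyActionToHost`, the device-to-host copy direction, and the reading of `streamSynchronize` as `cudaStreamSynchronize` are all assumptions. In a real XLA GPU custom call the stream would be the one handed to the call by the runtime; here a locally created stream stands in for it.

```cuda
#include <cuda_runtime.h>

#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(expr)                                            \
  do {                                                              \
    cudaError_t err_ = (expr);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s failed: %s\n", #expr,                \
                   cudaGetErrorString(err_));                       \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical helper (not envpool's real function): copy `size` bytes of the
// action from a runtime-owned device buffer into caller-owned host memory.
// The copy is enqueued on `stream` instead of the default stream, and the
// function returns only after that stream has drained, so the source buffer
// may be recycled safely once this returns.
static cudaError_t CopyActionToHost(cudaStream_t stream, const void* device_src,
                                    void* host_dst, size_t size) {
  cudaError_t err = cudaMemcpyAsync(host_dst, device_src, size,
                                    cudaMemcpyDeviceToHost, stream);
  if (err != cudaSuccess) {
    return err;
  }
  return cudaStreamSynchronize(stream);
}

int main() {
  const size_t n = 4;
  const size_t bytes = n * sizeof(float);
  const float source[n] = {0.5f, -1.0f, 2.0f, 3.5f};
  float copied[n] = {0.0f, 0.0f, 0.0f, 0.0f};

  // Stand-in for the stream that the runtime would pass to the custom call.
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  // Stand-in for the runtime-owned `action` buffer on the device.
  float* device_action = nullptr;
  CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&device_action), bytes));
  CUDA_CHECK(cudaMemcpyAsync(device_action, source, bytes,
                             cudaMemcpyHostToDevice, stream));

  // The pattern from this commit: async copy plus explicit synchronization.
  CUDA_CHECK(CopyActionToHost(stream, device_action, copied, bytes));

  // After the helper returns, the host copy must match the source values.
  for (size_t i = 0; i < n; ++i) {
    if (copied[i] != source[i]) {
      std::fprintf(stderr, "mismatch at index %zu\n", i);
      return EXIT_FAILURE;
    }
  }
  std::printf("copy completed before the source buffer was released\n");

  CUDA_CHECK(cudaFree(device_action));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return EXIT_SUCCESS;
}
```

Enqueuing the copy on the same stream as the surrounding work keeps it ordered with respect to that work, and the explicit synchronization preserves the guarantee the original blocking `cudaMemcpy` was meant to provide: the copy is complete before the call returns.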