Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37848: [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT #39098

Merged
merged 28 commits into from
Jan 4, 2024

Conversation

niyue
Copy link
Contributor

@niyue niyue commented Dec 6, 2023

Rationale for this change

Gandiva currently employs MCJIT as its internal JIT engine. However, LLVM has introduced a newer JIT API known as ORC v2/LLJIT since LLVM 7.0, and it has several advantage over MCJIT, in particular, MCJIT is not actively maintained, and is slated for eventual deprecation and removal.

What changes are included in this PR?

  • This PR replaces the MCJIT JIT engine with the ORC v2 engine, using the LLJIT API.
  • This PR adds a new JIT linker option JITLink (https://llvm.org/docs/JITLink.html), which can be used together with LLJIT, for LLVM 14+ on Linux/macOS platform. It is turned off by default but could be turned on with environment variable GANDIVA_USE_JIT_LINK

Are these changes tested?

Yes, they are covered by existing unit tests

Are there any user-facing changes?

Copy link

github-actions bot commented Dec 6, 2023

⚠️ GitHub issue #37848 has been automatically assigned in GitHub to PR creator.


jit_builder.setJITTargetMachineBuilder(std::move(jtmb));
if (object_cache.has_value()) {
jit_builder.setCompileFunctionCreator(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the LLJIT API for customizing the object cache.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 6, 2023
if (exec_engine == nullptr) {
return Status::CodeGenError("Could not instantiate llvm::ExecutionEngine: ",
builder_error);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the logic into GetTargetMachineBuilder and BuildJIT functions.

std::unique_ptr<llvm::TargetMachine> machine(
target->createTargetMachine(target_triple, cpu, features.getString(), {}, {}));

module->setDataLayout(machine->createDataLayout());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the logic here is moved into GetTargetMachineBuilder function.

The setDataLayout is called in line 265 within VerifyAndLinkModule function.

@niyue niyue force-pushed the feature/orc-v2-jit branch from f06a8cf to a2cbdd5 Compare December 6, 2023 09:58
auto types = engine->types();
llvm::IRBuilder<>* builder = engine->ir_builder();
llvm::LLVMContext* context = engine->context();
std::string BuildVecAdd(Engine* gdv_engine) {
Copy link
Contributor Author

@niyue niyue Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The engine param is renamed to gdv_engine since it shadows the engine member variable in this class which introduces unnecessary confusion here

/// \return arrow::Result containing the created engine
static Result<std::unique_ptr<Engine>> Make(
const std::shared_ptr<Configuration>& config, bool cached,
std::optional<std::reference_wrapper<GandivaObjectCache>> object_cache = std::nullopt);
Copy link
Contributor Author

@niyue niyue Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR has to change the Engine::Make and LLVMGenerator::Make signatures anyway, I change the factory method to return an arrow::Result instead of arrow::Status so that it is simpler to use and the optional parameter like object_cache can be used.

@niyue niyue force-pushed the feature/orc-v2-jit branch 4 times, most recently from 82ac6a2 to 83b1980 Compare December 6, 2023 14:21
@niyue niyue requested a review from wjones127 as a code owner December 6, 2023 14:21
@@ -186,10 +186,6 @@ cdef class Projector(_Weakrefable):
self.pool = pool
return self

@property
def llvm_ir(self):
return self.projector.get().DumpIR().decode()
Copy link
Contributor Author

@niyue niyue Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to remove the llvm_ir API from python binding, since the Configuration is not exposed via the make_projector and make_filter APIs, but we may consider adding it back if we can expose the Configuration as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this? This was introduced by #6417 .

@niyue
Copy link
Contributor Author

niyue commented Dec 6, 2023

There are still two checks failing, it is likely they failed for the same reason.

JIT session error: Symbols not found: [ atexit ]
/arrow/cpp/src/gandiva/tests/binary_test.cc:66: Failure
Value of: status.ok()
  Actual: false
Expected: true
Failed to look up function: expr_0_0 error: Failed to materialize symbols: { (main, { $.codegen.__inits.0, expr_0_0, __orc_init_func.codegen }) }

The cause is the Symbols not found: [ atexit ], and the atexit should be a C function defined in C standard library, but for some reason, this symbol cannot be found in this AMD64 + Ubuntu 22.04 CI env after switching to LLJIT. Currently I have no idea why this happens and it doesn't happen to my local env, and I will have to do some more investigation to see how I can address it.

@niyue
Copy link
Contributor Author

niyue commented Dec 7, 2023

@kou @wjones127

For one of the failed CI check [1], it reported:

87/93 Test #90: gandiva-internals-test .......................***Failed 0.29 sec
Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
/arrow/cpp/build-support/run-test.sh: line 88: 27381 Segmentation fault (core dumped) $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1

Do you have any idea how I can view the log file and the core dump file stack? Thanks.

[1] https://github.com/apache/arrow/actions/runs/7115811560/job/19372808518?pr=39098

@niyue niyue force-pushed the feature/orc-v2-jit branch 3 times, most recently from d055df3 to 994d83c Compare December 7, 2023 05:55
@kou
Copy link
Member

kou commented Dec 7, 2023

Are you working on macOS, right?
You may reproduce this on your macOS with Docker: LLVM=10 CLANG_TOOLS=10 UBUNTU=20.04 archery docker run ubuntu-cpp

(I'll try it later on my local Linux machine.)

@niyue
Copy link
Contributor Author

niyue commented Dec 7, 2023

You may reproduce this on your macOS with Docker

Thanks. I will give it a try later. I found that the crash in C++ / ARM64 Ubuntu 20.04 C++ (pull_request) is because LLVM 10 returns a symbol for a given function name successfully, however, for some reason, the function address in the symbol is 0, which causes the function pointer to be null. Previously only the returned Expected is checked and its address is not checked if successful status. Now I added an additional check for the function address, so it stops crashing, but the test case still fails in this CI env.

@niyue
Copy link
Contributor Author

niyue commented Dec 7, 2023

@kou about the Docker approach, do you have any idea why the Docker image tries to use HTTP_PROXY/HTTPS_PROXY localhost:8080? I spent some time to try to get it up and running, and found the amd64/ubuntu:20.04 base image defines such env vars, and this causes all network traffic to go through localhost:8080 but there is no such proxy running and all the package installation fail because of this. Do I need to set up some http proxy to use it? Thanks.

@kou
Copy link
Member

kou commented Dec 8, 2023

I spent some time to try to get it up and running, and found the amd64/ubuntu:20.04 base image defines such env vars, and this causes all network traffic to go through localhost:8080 but there is no such proxy running and all the package installation fail because of this.

Oh, really? How did you confirm it?
I couldn't find it:

$ docker run -it --rm amd64/ubuntu:20.04
root@6d3439e4a60d:/# env
HOSTNAME=6d3439e4a60d
PWD=/
HOME=/root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
TERM=xterm
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env

@niyue
Copy link
Contributor Author

niyue commented Dec 8, 2023

Oh, really? How did you confirm it?

Please ignore my comment above about the proxy stuff. It is likely I set up such proxy long time ago system wide for Docker but forgot it completely after a long time 🤦

std::string fn_name = "codegen";
// LLVM 10 doesn't like the expr function name to be the same as the module name when
// LLJIT is used
std::string fn_name = "llvm_gen_test_add_expr";
Copy link
Contributor Author

@niyue niyue Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a weird behavior of LLVM LLJIT. The LLVM module is created with name codegen, if later a function is created with the same name codegen, it can be successfully created, when looking up its function pointer, a successful symbol will be returned, however, the function address inside the symbol is 0, making it a nullptr and cannot be used.

  • This issue happened for LLVM 10 (not sure if it is Linux specific), and the same code doesn't have such an issue when using LLVM 14.0.6 on macOS
  • This issue is specific for ORC v2/LLJIT, and MCJIT doesn't have such an issue

To workaround this issue, I have to:

  1. rename the function name in the test case to be more specific and avoid conflict
  2. rename the LLVM module name to be more specific and add some random suffix (the object's address) to avoid conflict, and it is very unlikely to encounter such a conflict in reality after this change

UPDATE:
I reported a bug to LLVM, llvm/llvm-project#74903

@niyue niyue force-pushed the feature/orc-v2-jit branch from d3deae2 to db819dc Compare December 20, 2023 13:53
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

cpp/src/gandiva/engine.cc Show resolved Hide resolved
cpp/src/gandiva/engine.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Dec 21, 2023
@kou
Copy link
Member

kou commented Dec 21, 2023

@jorisvandenbossche Could you take a look at Cython changes?

@js8544 Do you want to review this before we merge this?

Copy link
Collaborator

@js8544 js8544 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I do have a question regarding the API changes.

cpp/src/gandiva/engine.cc Show resolved Hide resolved
cpp/src/gandiva/engine.h Show resolved Hide resolved
cpp/src/gandiva/llvm_generator.h Show resolved Hide resolved
@niyue niyue force-pushed the feature/orc-v2-jit branch from b484dfb to 8b88f6d Compare December 22, 2023 04:28
@kou kou merged commit 83cba25 into apache:main Jan 4, 2024
30 of 32 checks passed
@kou kou removed the awaiting merge Awaiting merge label Jan 4, 2024
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 83cba25.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…ORC v2/LLJIT (apache#39098)

### Rationale for this change
Gandiva currently employs MCJIT as its internal JIT engine. However, LLVM has introduced a newer JIT API known as ORC v2/LLJIT since LLVM 7.0, and it has several advantage over MCJIT, in particular, MCJIT is not actively maintained, and is slated for eventual deprecation and removal. 

### What changes are included in this PR?
* This PR replaces the MCJIT JIT engine with the ORC v2 engine, using the `LLJIT` API.
* This PR adds a new JIT linker option `JITLink` (https://llvm.org/docs/JITLink.html), which can be used together with `LLJIT`, for LLVM 14+ on Linux/macOS platform. It is turned off by default but could be turned on with environment variable `GANDIVA_USE_JIT_LINK`

### Are these changes tested?
Yes, they are covered by existing unit tests

### Are there any user-facing changes?
* `Configuration` class has a new option called `dump_ir`. If users would like to call `DumpIR` API of `Projector` and `Filter`, they have to set the `dump_ir` option first.
* Closes: apache#37848

Authored-by: Yue Ni <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ORC v2/LLJIT (apache#39098)

### Rationale for this change
Gandiva currently employs MCJIT as its internal JIT engine. However, LLVM has introduced a newer JIT API known as ORC v2/LLJIT since LLVM 7.0, and it has several advantage over MCJIT, in particular, MCJIT is not actively maintained, and is slated for eventual deprecation and removal. 

### What changes are included in this PR?
* This PR replaces the MCJIT JIT engine with the ORC v2 engine, using the `LLJIT` API.
* This PR adds a new JIT linker option `JITLink` (https://llvm.org/docs/JITLink.html), which can be used together with `LLJIT`, for LLVM 14+ on Linux/macOS platform. It is turned off by default but could be turned on with environment variable `GANDIVA_USE_JIT_LINK`

### Are these changes tested?
Yes, they are covered by existing unit tests

### Are there any user-facing changes?
* `Configuration` class has a new option called `dump_ir`. If users would like to call `DumpIR` API of `Projector` and `Filter`, they have to set the `dump_ir` option first.
* Closes: apache#37848

Authored-by: Yue Ni <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…ORC v2/LLJIT (apache#39098)

### Rationale for this change
Gandiva currently employs MCJIT as its internal JIT engine. However, LLVM has introduced a newer JIT API known as ORC v2/LLJIT since LLVM 7.0, and it has several advantage over MCJIT, in particular, MCJIT is not actively maintained, and is slated for eventual deprecation and removal. 

### What changes are included in this PR?
* This PR replaces the MCJIT JIT engine with the ORC v2 engine, using the `LLJIT` API.
* This PR adds a new JIT linker option `JITLink` (https://llvm.org/docs/JITLink.html), which can be used together with `LLJIT`, for LLVM 14+ on Linux/macOS platform. It is turned off by default but could be turned on with environment variable `GANDIVA_USE_JIT_LINK`

### Are these changes tested?
Yes, they are covered by existing unit tests

### Are there any user-facing changes?
* `Configuration` class has a new option called `dump_ir`. If users would like to call `DumpIR` API of `Projector` and `Filter`, they have to set the `dump_ir` option first.
* Closes: apache#37848

Authored-by: Yue Ni <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
lriggs added a commit to lriggs/arrow that referenced this pull request Mar 20, 2024
ryantse added a commit to dremio/arrow that referenced this pull request Mar 30, 2024
lriggs added a commit to lriggs/arrow that referenced this pull request Apr 5, 2024
lriggs added a commit to lriggs/arrow that referenced this pull request May 10, 2024
stevelorddremio pushed a commit to stevelorddremio/arrow that referenced this pull request Jun 14, 2024
lriggs added a commit to lriggs/arrow that referenced this pull request Sep 3, 2024
lriggs added a commit to lriggs/arrow that referenced this pull request Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][Gandiva] Migration JIT engine from MCJIT to LLJIT
3 participants