-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR] Add support and enable MLIR's mem2reg for flat CIR #659
Conversation
Sorry I didn't had the time to look just yet, but should do that soon, can you update post rebase? |
@bcardosolopes sorry for delay! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary.
This is pretty cool, very excited with the things we can achieve using this.
- There are a couple bugs fixed in the code . The most important is tablegen for
BranchOp
that assumed same sized operands for branches (it was extremely hard to detect!).
Yay!
The last one is important and has some relation to
canonicalizer
- may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch.
Awesome, after this land, great to see one more blocker to the canonicalizer
going away.
and with mem2reg enabled
clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -
:define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 }
Neat!
So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is good as is, some minor comments and question and should be ready to go.
Can you also update the description with a brief mention why this is only valid for flat CIR? (I know the answer, but it's good to leave some pointers for context) |
@bcardosolopes so I think I did all you asked for. Please let me know if I missed something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nits that can be addressed in future PRs. LGTM
@@ -17,6 +17,9 @@ | |||
#include "mlir/Pass/Pass.h" | |||
#include "mlir/Pass/PassManager.h" | |||
#include "mlir/Support/LogicalResult.h" | |||
#include "mlir/Transforms/Passes.h" | |||
|
|||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need iostream
here?
@@ -2982,6 +2982,10 @@ def fclangir_call_conv_lowering : Flag<["-"], "fclangir-call-conv-lowering">, | |||
Visibility<[ClangOption, CC1Option]>, Group<f_Group>, | |||
HelpText<"Enable ClangIR calling convention lowering">, | |||
MarshallingInfoFlag<FrontendOpts<"ClangIREnableCallConvLowering">>; | |||
def fclangir_mem2reg : Flag<["-"], "fclangir-mem2reg">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good a idea to do the defm
version already as to populate the "no" variant, the idea is to make it handy to disable whenever we flip this by default while testing in our codebase (or when anyone else tries to do the same).
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary. Briefly, what's important: 1) I used llvm dialect implementation for the inspiration 2) I did not implement interfaces for some of the operations, e.g. for `cir.get_member`, since looks like mem2reg doesn't handle them - it's more about SROA pass. 3) simple tests are added 4) mem2reg is disabled by default, so one need to use it explicitly, with the `-fclangir-mem2reg` option 5) There are a couple bugs fixed in the code . The most important is tablegen for `BranchOp` that assumed same sized operands for branches (it was extremely hard to detect!). Note, that this `mem2reg` implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and `call` a block from another region (and pass some values as well). The last one is important and has some relation to `canonicalizer` - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch. finally, short example to play with: ``` int return_42() { int y = 42; return y; } ``` First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o -` : ``` define dso_local i32 @return_42() #0 !dbg !3 { %1 = alloca i32, i64 1, align 4, !dbg !7 %2 = alloca i32, i64 1, align 4, !dbg !8 store i32 42, ptr %2, align 4, !dbg !8 %3 = load i32, ptr %2, align 4, !dbg !9 store i32 %3, ptr %1, align 4, !dbg !10 %4 = load i32, ptr %1, align 4, !dbg !10 ret i32 %4, !dbg !10 } ``` and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -`: ``` define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 } ``` So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary. Briefly, what's important: 1) I used llvm dialect implementation for the inspiration 2) I did not implement interfaces for some of the operations, e.g. for `cir.get_member`, since looks like mem2reg doesn't handle them - it's more about SROA pass. 3) simple tests are added 4) mem2reg is disabled by default, so one need to use it explicitly, with the `-fclangir-mem2reg` option 5) There are a couple bugs fixed in the code . The most important is tablegen for `BranchOp` that assumed same sized operands for branches (it was extremely hard to detect!). Note, that this `mem2reg` implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and `call` a block from another region (and pass some values as well). The last one is important and has some relation to `canonicalizer` - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch. finally, short example to play with: ``` int return_42() { int y = 42; return y; } ``` First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o -` : ``` define dso_local i32 @return_42() #0 !dbg !3 { %1 = alloca i32, i64 1, align 4, !dbg !7 %2 = alloca i32, i64 1, align 4, !dbg !8 store i32 42, ptr %2, align 4, !dbg !8 %3 = load i32, ptr %2, align 4, !dbg !9 store i32 %3, ptr %1, align 4, !dbg !10 %4 = load i32, ptr %1, align 4, !dbg !10 ret i32 %4, !dbg !10 } ``` and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -`: ``` define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 } ``` So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary. Briefly, what's important: 1) I used llvm dialect implementation for the inspiration 2) I did not implement interfaces for some of the operations, e.g. for `cir.get_member`, since looks like mem2reg doesn't handle them - it's more about SROA pass. 3) simple tests are added 4) mem2reg is disabled by default, so one need to use it explicitly, with the `-fclangir-mem2reg` option 5) There are a couple bugs fixed in the code . The most important is tablegen for `BranchOp` that assumed same sized operands for branches (it was extremely hard to detect!). Note, that this `mem2reg` implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and `call` a block from another region (and pass some values as well). The last one is important and has some relation to `canonicalizer` - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch. finally, short example to play with: ``` int return_42() { int y = 42; return y; } ``` First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o -` : ``` define dso_local i32 @return_42() #0 !dbg !3 { %1 = alloca i32, i64 1, align 4, !dbg !7 %2 = alloca i32, i64 1, align 4, !dbg !8 store i32 42, ptr %2, align 4, !dbg !8 %3 = load i32, ptr %2, align 4, !dbg !9 store i32 %3, ptr %1, align 4, !dbg !10 %4 = load i32, ptr %1, align 4, !dbg !10 ret i32 %4, !dbg !10 } ``` and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -`: ``` define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 } ``` So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary. Briefly, what's important: 1) I used llvm dialect implementation for the inspiration 2) I did not implement interfaces for some of the operations, e.g. for `cir.get_member`, since looks like mem2reg doesn't handle them - it's more about SROA pass. 3) simple tests are added 4) mem2reg is disabled by default, so one need to use it explicitly, with the `-fclangir-mem2reg` option 5) There are a couple bugs fixed in the code . The most important is tablegen for `BranchOp` that assumed same sized operands for branches (it was extremely hard to detect!). Note, that this `mem2reg` implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and `call` a block from another region (and pass some values as well). The last one is important and has some relation to `canonicalizer` - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch. finally, short example to play with: ``` int return_42() { int y = 42; return y; } ``` First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o -` : ``` define dso_local i32 @return_42() #0 !dbg !3 { %1 = alloca i32, i64 1, align 4, !dbg !7 %2 = alloca i32, i64 1, align 4, !dbg !8 store i32 42, ptr %2, align 4, !dbg !8 %3 = load i32, ptr %2, align 4, !dbg !9 store i32 %3, ptr %1, align 4, !dbg !10 %4 = load i32, ptr %1, align 4, !dbg !10 ret i32 %4, !dbg !10 } ``` and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -`: ``` define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 } ``` So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary. Briefly, what's important: 1) I used llvm dialect implementation for the inspiration 2) I did not implement interfaces for some of the operations, e.g. for `cir.get_member`, since looks like mem2reg doesn't handle them - it's more about SROA pass. 3) simple tests are added 4) mem2reg is disabled by default, so one need to use it explicitly, with the `-fclangir-mem2reg` option 5) There are a couple bugs fixed in the code . The most important is tablegen for `BranchOp` that assumed same sized operands for branches (it was extremely hard to detect!). Note, that this `mem2reg` implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and `call` a block from another region (and pass some values as well). The last one is important and has some relation to `canonicalizer` - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch. finally, short example to play with: ``` int return_42() { int y = 42; return y; } ``` First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o -` : ``` define dso_local i32 @return_42() #0 !dbg !3 { %1 = alloca i32, i64 1, align 4, !dbg !7 %2 = alloca i32, i64 1, align 4, !dbg !8 store i32 42, ptr %2, align 4, !dbg !8 %3 = load i32, ptr %2, align 4, !dbg !9 store i32 %3, ptr %1, align 4, !dbg !10 %4 = load i32, ptr %1, align 4, !dbg !10 ret i32 %4, !dbg !10 } ``` and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -`: ``` define dso_local i32 @return_42() #0 !dbg !3 { ret i32 42, !dbg !7 } ``` So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.
This is PR enables generic MLIR's mem2reg pass for the flat CIR code.
I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary.
Briefly, what's important:
cir.get_member
, since looks like mem2reg doesn't handle them - it's more about SROA pass.-fclangir-mem2reg
optionBranchOp
that assumed same sized operands for branches (it was extremely hard to detect!).Note, that this
mem2reg
implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow andcall
a block from another region (and pass some values as well).The last one is important and has some relation to
canonicalizer
- may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch.finally, short example to play with:
First, without mem2reg enabled
clang tmp.c -fclangir -emit-llvm -S -o -
:and with mem2reg enabled
clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -
:So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.