Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR] Add support and enable MLIR's mem2reg for flat CIR #659

Merged
merged 18 commits into from
Jul 17, 2024

Conversation

gitoleg
Copy link
Collaborator

@gitoleg gitoleg commented Jun 5, 2024

This is PR enables generic MLIR's mem2reg pass for the flat CIR code.
I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary.

Briefly, what's important:

  1. I used llvm dialect implementation for the inspiration
  2. I did not implement interfaces for some of the operations, e.g. for cir.get_member, since looks like mem2reg doesn't handle them - it's more about SROA pass.
  3. simple tests are added
  4. mem2reg is disabled by default, so one need to use it explicitly, with the -fclangir-mem2reg option
  5. There are a couple bugs fixed in the code . The most important is tablegen for BranchOp that assumed same sized operands for branches (it was extremely hard to detect!).

Note, that this mem2reg implementation work with flat CIR only. First of all, the MLIR's mem2reg knows how to propagate (and remove) memory slots only when we can build a graph of blocks (e.g. dom tree is a bread and butter of mem2reg). Also, given that phi nodes are represents as blocks with arguments in MLIR, there is no chance to transfer control flow and call a block from another region (and pass some values as well).

The last one is important and has some relation to canonicalizer - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch.

finally, short example to play with:

int return_42() {
    int y = 42;
    return y;  
}

First, without mem2reg enabled clang tmp.c -fclangir -emit-llvm -S -o - :

define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

and with mem2reg enabled clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -:

define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}

So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.

@bcardosolopes
Copy link
Member

Sorry I didn't had the time to look just yet, but should do that soon, can you update post rebase?

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jul 9, 2024

@bcardosolopes sorry for delay!
I updated the PR, there is a strange fail in the tests, I will fix it soon. But now you could look at the changes :)

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is PR enables generic MLIR's mem2reg pass for the flat CIR code. I submit this draft PR if someone wants to play with mem2reg right now and to define the next incremental steps if necessary.

This is pretty cool, very excited with the things we can achieve using this.

  1. There are a couple bugs fixed in the code . The most important is tablegen for BranchOp that assumed same sized operands for branches (it was extremely hard to detect!).

Yay!

The last one is important and has some relation to canonicalizer - may be you remember I mentioned this problem once somewhere, when conditional branch verification fails on target block arguments number and branch operands number mismatch.

Awesome, after this land, great to see one more blocker to the canonicalizer going away.

and with mem2reg enabled clang tmp.c -fclangir -fclangir-mem2reg -emit-llvm -S -o -:

define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}

Neat!

So. What do you think? Which steps do you expect me to perform? I can try to split this PR into parts and add interfaces one-by-one, but I'n not sure the tests will work without all of them implemented.

This is good as is, some minor comments and question and should be ready to go.

clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Outdated Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRMemorySlot.cpp Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRMemorySlot.cpp Show resolved Hide resolved
clang/include/clang/CIR/Dialect/IR/CIROps.td Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRMemorySlot.cpp Outdated Show resolved Hide resolved
clang/lib/CIR/Dialect/IR/CIRMemorySlot.cpp Show resolved Hide resolved
clang/test/CIR/Transforms/mem2reg.cir Outdated Show resolved Hide resolved
clang/test/CIR/Transforms/mem2reg.cir Show resolved Hide resolved
@bcardosolopes bcardosolopes marked this pull request as ready for review July 10, 2024 19:14
@bcardosolopes bcardosolopes requested a review from lanza as a code owner July 10, 2024 19:14
@bcardosolopes bcardosolopes changed the title [CIR] enables MLIR's mem2reg for flat CIR [CIR] Enables MLIR's mem2reg for flat CIR Jul 10, 2024
@bcardosolopes bcardosolopes changed the title [CIR] Enables MLIR's mem2reg for flat CIR [CIR] Add support and enable MLIR's mem2reg for flat CIR Jul 10, 2024
@bcardosolopes
Copy link
Member

bcardosolopes commented Jul 10, 2024

Can you also update the description with a brief mention why this is only valid for flat CIR? (I know the answer, but it's good to leave some pointers for context)

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jul 17, 2024

@bcardosolopes so I think I did all you asked for. Please let me know if I missed something

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits that can be addressed in future PRs. LGTM

@@ -17,6 +17,9 @@
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"
#include "mlir/Transforms/Passes.h"

#include <iostream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need iostream here?

@@ -2982,6 +2982,10 @@ def fclangir_call_conv_lowering : Flag<["-"], "fclangir-call-conv-lowering">,
Visibility<[ClangOption, CC1Option]>, Group<f_Group>,
HelpText<"Enable ClangIR calling convention lowering">,
MarshallingInfoFlag<FrontendOpts<"ClangIREnableCallConvLowering">>;
def fclangir_mem2reg : Flag<["-"], "fclangir-mem2reg">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good a idea to do the defm version already as to populate the "no" variant, the idea is to make it handy to disable whenever we flip this by default while testing in our codebase (or when anyone else tries to do the same).

@bcardosolopes bcardosolopes merged commit dc9f39d into llvm:main Jul 17, 2024
6 checks passed
Hugobros3 pushed a commit to shady-gang/clangir that referenced this pull request Oct 2, 2024
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. 
I submit this draft PR if someone wants to play with mem2reg right now
and to define the next incremental steps if necessary.

Briefly, what's important:
1) I used llvm dialect implementation for the inspiration
2) I did not implement interfaces for some of the operations, e.g. for
`cir.get_member`, since looks like mem2reg doesn't handle them - it's
more about SROA pass.
3) simple tests are added 
4) mem2reg is disabled by default, so one need to use it explicitly,
with the `-fclangir-mem2reg` option
5) There are a couple bugs fixed in the code . The most important is
tablegen for `BranchOp` that assumed same sized operands for branches
(it was extremely hard to detect!).

Note, that this `mem2reg` implementation work with flat CIR only. First
of all, the MLIR's mem2reg knows how to propagate (and remove) memory
slots only when we can build a graph of blocks (e.g. dom tree is a bread
and butter of mem2reg). Also, given that phi nodes are represents as
blocks with arguments in MLIR, there is no chance to transfer control
flow and `call` a block from another region (and pass some values as
well).

The last one is important and has some relation to `canonicalizer` - may
be you remember I mentioned this problem once somewhere, when
conditional branch verification fails on target block arguments number
and branch operands number mismatch.

finally, short example to play with:
``` 
int return_42() {
    int y = 42;
    return y;  
}
```
First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o
-` :
```
define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

```
and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg
-emit-llvm -S -o -`:
```
define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}
```


So. What do you think? Which steps do you expect me to perform? I can
try to split this PR into parts and add interfaces one-by-one, but I'n
not sure the tests will work without all of them implemented.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. 
I submit this draft PR if someone wants to play with mem2reg right now
and to define the next incremental steps if necessary.

Briefly, what's important:
1) I used llvm dialect implementation for the inspiration
2) I did not implement interfaces for some of the operations, e.g. for
`cir.get_member`, since looks like mem2reg doesn't handle them - it's
more about SROA pass.
3) simple tests are added 
4) mem2reg is disabled by default, so one need to use it explicitly,
with the `-fclangir-mem2reg` option
5) There are a couple bugs fixed in the code . The most important is
tablegen for `BranchOp` that assumed same sized operands for branches
(it was extremely hard to detect!).

Note, that this `mem2reg` implementation work with flat CIR only. First
of all, the MLIR's mem2reg knows how to propagate (and remove) memory
slots only when we can build a graph of blocks (e.g. dom tree is a bread
and butter of mem2reg). Also, given that phi nodes are represents as
blocks with arguments in MLIR, there is no chance to transfer control
flow and `call` a block from another region (and pass some values as
well).

The last one is important and has some relation to `canonicalizer` - may
be you remember I mentioned this problem once somewhere, when
conditional branch verification fails on target block arguments number
and branch operands number mismatch.

finally, short example to play with:
``` 
int return_42() {
    int y = 42;
    return y;  
}
```
First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o
-` :
```
define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

```
and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg
-emit-llvm -S -o -`:
```
define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}
```


So. What do you think? Which steps do you expect me to perform? I can
try to split this PR into parts and add interfaces one-by-one, but I'n
not sure the tests will work without all of them implemented.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. 
I submit this draft PR if someone wants to play with mem2reg right now
and to define the next incremental steps if necessary.

Briefly, what's important:
1) I used llvm dialect implementation for the inspiration
2) I did not implement interfaces for some of the operations, e.g. for
`cir.get_member`, since looks like mem2reg doesn't handle them - it's
more about SROA pass.
3) simple tests are added 
4) mem2reg is disabled by default, so one need to use it explicitly,
with the `-fclangir-mem2reg` option
5) There are a couple bugs fixed in the code . The most important is
tablegen for `BranchOp` that assumed same sized operands for branches
(it was extremely hard to detect!).

Note, that this `mem2reg` implementation work with flat CIR only. First
of all, the MLIR's mem2reg knows how to propagate (and remove) memory
slots only when we can build a graph of blocks (e.g. dom tree is a bread
and butter of mem2reg). Also, given that phi nodes are represents as
blocks with arguments in MLIR, there is no chance to transfer control
flow and `call` a block from another region (and pass some values as
well).

The last one is important and has some relation to `canonicalizer` - may
be you remember I mentioned this problem once somewhere, when
conditional branch verification fails on target block arguments number
and branch operands number mismatch.

finally, short example to play with:
``` 
int return_42() {
    int y = 42;
    return y;  
}
```
First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o
-` :
```
define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

```
and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg
-emit-llvm -S -o -`:
```
define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}
```


So. What do you think? Which steps do you expect me to perform? I can
try to split this PR into parts and add interfaces one-by-one, but I'n
not sure the tests will work without all of them implemented.
keryell pushed a commit to keryell/clangir that referenced this pull request Oct 19, 2024
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. 
I submit this draft PR if someone wants to play with mem2reg right now
and to define the next incremental steps if necessary.

Briefly, what's important:
1) I used llvm dialect implementation for the inspiration
2) I did not implement interfaces for some of the operations, e.g. for
`cir.get_member`, since looks like mem2reg doesn't handle them - it's
more about SROA pass.
3) simple tests are added 
4) mem2reg is disabled by default, so one need to use it explicitly,
with the `-fclangir-mem2reg` option
5) There are a couple bugs fixed in the code . The most important is
tablegen for `BranchOp` that assumed same sized operands for branches
(it was extremely hard to detect!).

Note, that this `mem2reg` implementation work with flat CIR only. First
of all, the MLIR's mem2reg knows how to propagate (and remove) memory
slots only when we can build a graph of blocks (e.g. dom tree is a bread
and butter of mem2reg). Also, given that phi nodes are represents as
blocks with arguments in MLIR, there is no chance to transfer control
flow and `call` a block from another region (and pass some values as
well).

The last one is important and has some relation to `canonicalizer` - may
be you remember I mentioned this problem once somewhere, when
conditional branch verification fails on target block arguments number
and branch operands number mismatch.

finally, short example to play with:
``` 
int return_42() {
    int y = 42;
    return y;  
}
```
First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o
-` :
```
define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

```
and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg
-emit-llvm -S -o -`:
```
define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}
```


So. What do you think? Which steps do you expect me to perform? I can
try to split this PR into parts and add interfaces one-by-one, but I'n
not sure the tests will work without all of them implemented.
lanza pushed a commit that referenced this pull request Nov 5, 2024
This is PR enables generic MLIR's mem2reg pass for the flat CIR code. 
I submit this draft PR if someone wants to play with mem2reg right now
and to define the next incremental steps if necessary.

Briefly, what's important:
1) I used llvm dialect implementation for the inspiration
2) I did not implement interfaces for some of the operations, e.g. for
`cir.get_member`, since looks like mem2reg doesn't handle them - it's
more about SROA pass.
3) simple tests are added 
4) mem2reg is disabled by default, so one need to use it explicitly,
with the `-fclangir-mem2reg` option
5) There are a couple bugs fixed in the code . The most important is
tablegen for `BranchOp` that assumed same sized operands for branches
(it was extremely hard to detect!).

Note, that this `mem2reg` implementation work with flat CIR only. First
of all, the MLIR's mem2reg knows how to propagate (and remove) memory
slots only when we can build a graph of blocks (e.g. dom tree is a bread
and butter of mem2reg). Also, given that phi nodes are represents as
blocks with arguments in MLIR, there is no chance to transfer control
flow and `call` a block from another region (and pass some values as
well).

The last one is important and has some relation to `canonicalizer` - may
be you remember I mentioned this problem once somewhere, when
conditional branch verification fails on target block arguments number
and branch operands number mismatch.

finally, short example to play with:
``` 
int return_42() {
    int y = 42;
    return y;  
}
```
First, without mem2reg enabled `clang tmp.c -fclangir -emit-llvm -S -o
-` :
```
define dso_local i32 @return_42() #0 !dbg !3 {
  %1 = alloca i32, i64 1, align 4, !dbg !7
  %2 = alloca i32, i64 1, align 4, !dbg !8
  store i32 42, ptr %2, align 4, !dbg !8
  %3 = load i32, ptr %2, align 4, !dbg !9
  store i32 %3, ptr %1, align 4, !dbg !10
  %4 = load i32, ptr %1, align 4, !dbg !10
  ret i32 %4, !dbg !10
}

```
and with mem2reg enabled `clang tmp.c -fclangir -fclangir-mem2reg
-emit-llvm -S -o -`:
```
define dso_local i32 @return_42() #0 !dbg !3 {
  ret i32 42, !dbg !7
}
```


So. What do you think? Which steps do you expect me to perform? I can
try to split this PR into parts and add interfaces one-by-one, but I'n
not sure the tests will work without all of them implemented.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants