Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][ABI][Lowering] Supports function pointers in the calling convention lowering pass #1003

Merged
merged 16 commits into from
Oct 30, 2024

Conversation

gitoleg
Copy link
Collaborator

@gitoleg gitoleg commented Oct 22, 2024

This PR adds initial function pointers support for the calling convention lowering pass. This is a suggestion, so any other ideas are welcome.

Several ideas was described in the #995 and basically what I'm trying to do is to generate a clean CIR code without additional bitcast operations for function pointers and without mix of lowered and initial function types.

Problem

Looks like we can not just lower the function type and cast the value since too many operations are involved. For instance, for the
next simple code:

typedef struct {
  int a;
} S;

typedef int (*myfptr)(S);

int foo(S s) { return 42 + s.a; }

void bar() {
  myfptr a = foo;
}

we get the next CIR for the function bar , before the calling convention lowering pass:

cir.func no_proto  @bar() extra(#fn_attr) {
    %0 = cir.alloca !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>, ["a", init]
    %1 = cir.get_global @foo : !cir.ptr<!cir.func<!s32i (!ty_S)>> 
    cir.store %1, %0 : !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>
    cir.return 
  } 

As one can see, first three operations depend on the function type. Once foo is lowered, we need to fix GetGlobalOp:
otherwise the code will fail with the verification error since actual foo type (lowered) differs from the one currently expected by the GetGlobalOp.
First idea would just rewrite only the GetGlobalOp and insert a bitcast after, so both AllocaOp and StoreOp would work witth proper types.

Once the code will be more complex, we will need to take care about possible use cases, e.g. if we use arrays, we will need to track array accesses to it as well in order to insert this bitcast every time the array element is needed.

One workaround I can think of: we fix the GetGlobalOp type and cast from the lowered type to the initial, and cast back before the actual call happens - but it doesn't sound as a good and clean approach (from my point of view, of course).

So I suggest to use type converter and rewrite any operation that may deal with function pointers and make sure it has a proper type, and we don't have any unlowered function type in the program after the calling convention lowering pass.

Implementation

I added lowering for AllocaOp, GetGlobalOp, and split the lowering for FuncOp (former CallConvLoweringPattern) and lower CallOp separately.
Frankly speaking, I tried to implement a pattern for each operation, but for some reasons the tests are not passed for
windows and macOs in this case - something weird happens inside applyPatternsAndFold function. I suspect it's due to two different rewriters used - one in the LoweringModule and one in the mentioned function.
So I decided to follow the same approach as it's done for the LoweringPrepare pass and don't involve this complex rewriting framework.

Next I will add a type converter for the struct type, patterns for ConstantOp (for const arrays and GlobalViewAttr)
In the end of the day we'll have (at least I hope so) a clean CIR code without any bitcasts for function pointers.

cc @sitio-couto @bcardosolopes

Copy link

github-actions bot commented Oct 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@gitoleg gitoleg force-pushed the cc-fptr branch 2 times, most recently from 6df6a82 to 11547ee Compare October 23, 2024 10:46
Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, using addConversion sounds like the way to go. LGTM after some minor fixes.

@sitio-couto let me know if you have any concerns with the approach!

//===----------------------------------------------------------------------===//
// Rewrite Patterns
//===----------------------------------------------------------------------===//
FuncOp findFun(mlir::ModuleOp mod, llvm::StringRef name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to getFuncOp? Finding fun is always a cool thing tho :D

void lower(Operation *op) {
rewriter.setInsertionPoint(op);
if (auto fun = dyn_cast<FuncOp>(op))
lower(fun);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we don't overload in the "visitor pattern style", perhaps create one single method for each: lowerFuncOp, lowerAllocaOp, etc.

@sitio-couto
Copy link
Collaborator

Hey @gitoleg & @bcardosolopes,

Let me address some of your concerns about the Bitcast approach:

  • Mixing ABI-specific with ABI-agnostic types: While not clean, it shouldn't be an issue. With LLVM's opaque pointers, pointee types will be dropped later in the lowering, so no extra handling should be needed.

  • Cluttering CIR code with Bitcasts: I agree that excessive bitcasts can make the code messy. However, there are workarounds to explore. We could introduce operations like cir.load_coerced with relaxed constraints and store pointers opaquely in cir.alloca operations, for example.

  • Not viable due to too many operations involved: Have you found specific examples where the Bitcast approach doesn't work? From your example, it seems viable, albeit cumbersome in complex cases that might require many bitcasts.

Now, let me share my concerns with your Type-conversion approach:

  • Compilation time overhead: While we haven't focused much on compilation time yet, it's important for CIR. Correctly casting types would require rewriting every operation that uses the function type, which could be expensive in complex scenarios. Focusing on rewriting just the loads and stores might be more efficient.

  • Increased complexity in implementation and maintenance: Handling CIR's non-opaque pointers will introduce some complexity, but tackling every instance where a function type appears could require significant upkeep and development. Replacing load and stores seems simpler.

In summary, I believe the Bitcast approach is simpler and more efficient, while the Type-conversion approach might be costly regarding compilation time and maintenance/development. I would only advise the Type-conversion approach if the Bitcast doesn't work, or if it would require about the same amount of operation rewrites.


Frankly speaking, I tried to implement a pattern for each operation, but for some reasons the tests are not passed for Windows and macOS in this case.

P.S. This was a huge headache for me during GSoC. Often, these problems were related to dangling references. The TargetLowering library is still a bit brittle regarding memory safety.

@gitoleg
Copy link
Collaborator Author

gitoleg commented Oct 28, 2024

@sitio-couto

Well, the bitcast approach might work indeed, and it's definitely more simple!
My main concern is consistency - I really don't like idea to mix coerced and not coerced types, though may be it's not that bad as it seems at the first glance. But again, I still feel that we will have to deal with 'cases' instead of 'rules' - and it bother me a little.

From another point of view, we don't need to emit many casts for each load/store operations - probably, we need to insert only few - after GetGlobalOp (and cast from the coerced type back to the original) and before the indirect CallOp (and cast the pointer to the coerced type). And for the StoreOp for the particular case when the result of GetGlobalOp is stored in the const array.

I still vote for the type converter though it may be harder/ less efficient and etc.. I just really like the potential result of this approach: clean and consistent CIR code.
But I'm not against the bitcast approach either, since it's easier - and looks like it works indeed, at least in several scenarios I briefly checked it on.

So let's wait for Bruno and listen for his opinion.

cc @bcardosolopes

@sitio-couto
Copy link
Collaborator

sitio-couto commented Oct 28, 2024

Hey @gitoleg,

I really don't like the idea to mix coerced and not coerced types.

I get that, but it seems necessary for progressive lowering and consistency with the original codegen. For example, structs need to be coerced back and forth when passed as arguments. Keeping both types could also help retain higher-level information.

I still feel that we will have to deal with 'cases' instead of 'rules.'

Could you clarify what you mean by "cases" vs. "rules"?

I just really like the potential result of this approach: clean and consistent CIR code.

It's nice indeed, but the tradeoff does not seem to be worth it. If bitcasts are the "pollution", we could explore the workarounds I mentioned above. As for consistency, sticking only with ABI-specific types might actually be inconsistent with original codegen. For structs, the usual flow is: cast to ABI-specific when calling/defining, then revert to ABI-agnostic within the function body. Function pointers are an exception: since LLVM IR pointers are opaque, it has no pointee type, and so lowering everything to the ABI-specific form also works.

So let's wait for Bruno and listen for his opinion.

Sounds good. If you and @bcardosolopes still prefer this approach, I'm on board.

@gitoleg
Copy link
Collaborator Author

gitoleg commented Oct 28, 2024

Hey @sitio-couto

Could you clarify what you mean by "cases" vs. "rules"?

Sure, though nothing really important here) While we rewrite any operation that deal with a function pointer and convert a type - it's a 'rule', since it's applied for every operation. Thus, we rewrite every alloca, getGlobal and etc...
The 'case' word is about concrete examples - we don't insert bitcast everywhere, except the the mentioned places (after GetGlobal, before the indirect call, before the store into array). So soon or later we'll face with another 'case' which we didn't think of before.

@bcardosolopes
Copy link
Member

@sitio-couto thanks for the detailed pro/cons!

We could introduce operations like cir.load_coerced with relaxed constraints and store pointers opaquely in cir.alloca operations, for example.

I'd prefer if we don't have multiple load operations because we'd then need to teach multiple versions of them to different passes, which could be error prone in the long run - though this could be feasible if we introduce an interface. I think casts are more useful here and will be dropped anyways later on.

While we haven't focused much on compilation time yet, it's important for CIR. Correctly casting types would require rewriting every operation that uses the function type, which could be expensive in complex scenarios. Focusing on rewriting just the loads and stores might be more efficient.

We probably need some data here, my gut feeling is that loads/stores would be more massively used than uses of function types?

I really don't like idea to mix coerced and not coerced types

I share the same feeling with you, but...

As for consistency, sticking only with ABI-specific types might actually be inconsistent with original codegen. For structs, the usual flow is: cast to ABI-specific when calling/defining, then revert to ABI-agnostic within the function body.

This is a strong point, and I think we'd be in real trouble if we don't match traditional codegen here. @gitoleg how does this work after your change? Does it revert to ABI-agnostic within the function body?

Function pointers are an exception: since LLVM IR pointers are opaque, it has no pointee type, and so lowering everything to the ABI-specific form also works.

Can we go for a mix approach? Perhaps use type convertion for function pointers and bitcasts for the rest?

@sitio-couto
Copy link
Collaborator

sitio-couto commented Oct 28, 2024

@bcardosolopes

my gut feeling is that loads/stores would be more massively used than uses of function types?

To clarify, I was referring to loads and stores of function pointers for indirect calls using cir.call. Converting all function types would mean rewriting every operation/type that uses said type.

Can we go for a mix approach? Perhaps use type convertion for function pointers and bitcasts for the rest?

We can, but the discussion is whether we should. As I mentioned earlier, the main benefit seems to be eliminating bitcasts used to bypass type checking, which doesn't seem to outweigh the other tradeoffs. My take is that CC lowering should stay simple and efficient, focusing on lowering cir.call, cir.get_global, and cir.func operations, and keeping consistent with what the original Codegen does for other types.

@bcardosolopes
Copy link
Member

To clarify, I was referring to loads and stores of function pointers for indirect calls using cir.call. Converting all function types would mean rewriting every operation/type that uses said type.

Got it!

We can, but the discussion is whether we should. As I mentioned earlier, the main benefit seems to be eliminating bitcasts used to bypass type checking, which doesn't seem to outweigh the other tradeoffs.

You just mentioned compile time, if function pointers don't need to cast back, seems like bitcasts will have compile time impact for no good reason?

My take is that CC lowering should stay simple and efficient

I'm fine with all-bitcast approach too but can you elaborate on how converting type for function ptrs only complicate things? It's not clear to me.

... focusing on lowering cir.call, cir.get_global, and cir.func operations, and keeping consistent with what the original Codegen does for other types.

Perhaps we should have a LowerABI interface for those operations and make the pass operate on the interface?

@sitio-couto
Copy link
Collaborator

sitio-couto commented Oct 28, 2024

@bcardosolopes

You just mentioned compile time

I also pointed out the simplicity and consistency of the implementation.

if function pointers don't need to cast back, seems like bitcasts will have compile time impact for no good reason?

Adding a handful of bitcasts will not have as much impact as rewriting all operations and types that use the function type. How much more impactful it will be, I do not know.

how converting type for function ptrs only complicate things?

CC lowering will have to rewrite multiple operations unrelated to the function's definitions and calls (e.g. alloca) increasing the code's complexity. It also deviates from the behavior used for other types of CC lowering. It won't only complicate things, as mentioned, we won't need redundant bit casts (at least in this particular case).

Perhaps we should have a LowerABI interface for those operations and make the pass operate on the interface?

I'm not entirely sure how much that would help.

I'm any case, I think you and @gitoleg are well aligned. I'm fine with this approach too.

@bcardosolopes
Copy link
Member

I'm not entirely sure how much that would help.

My understand is that you are pushing for simplicity and consistency?

CC lowering will have to rewrite multiple operations unrelated to the function's definitions and calls (e.g. alloca) increasing the code's complexity. It also deviates from the behavior used for other types of CC lowering. It won't only complicate things, as mentioned, we won't need redundant bit casts (at least in this particular case).

This is a good argument. I'm actually more sold on the idea of the casts tbh.

@gitoleg let's stay with @sitio-couto approach, does that work for you?

@sitio-couto
Copy link
Collaborator

@gitoleg Just a reminder: hindsight is 20/20. I've shared a lot of ideas, but I haven’t seen them through in practice. If bitcasts don’t deliver the benefits I mentioned, don’t hesitate to drop this approach.

@gitoleg
Copy link
Collaborator Author

gitoleg commented Oct 29, 2024

This is a strong point, and I think we'd be in real trouble if we don't match traditional codegen here. @gitoleg how does this work after your change? Does it revert to ABI-agnostic within the function body?

Frankly speaking, I don't see where we can diverge with the original codegen: of course, struct types remain ABI-agnostic. The only thought that was in my mind - is to convert field types once they depend on function pointer type, which would eliminate any problems with load/store operation with such fields.

Anyways, let's stick to the 'bitcast' approach and let's check how it will go. Anyways, it doesn't look hard to come back to the 'type converter' approach once we really need it.

So I start to work on update

@gitoleg
Copy link
Collaborator Author

gitoleg commented Oct 29, 2024

Hey @sitio-couto @bcardosolopes
I updated the PR and there are two bitcasts for now: after GetGlobalOp and before the indirect CallOp. Basically that is - nothing else added so far.

Note, that I handle indirect calls while lowering a function - hope it's ok

@bcardosolopes
Copy link
Member

Some tests are failing!

@gitoleg
Copy link
Collaborator Author

gitoleg commented Oct 30, 2024

@bcardosolopes

Some tests are failing!

That's weird, but the check for windows failed during the compilation. I fixed it - hope macOS tests won't cause any troubles

I updated the PR and there are two bitcasts for now: after GetGlobalOp and before the indirect CallOp. Basically that is - nothing else added so far.

Update: It's not true for now - the only bitcast is after GetGlobalOp. The bitcast for indirect call will be added later in the upcoming PR (and in another place)

@bcardosolopes bcardosolopes merged commit 6288572 into llvm:main Oct 30, 2024
6 checks passed
lanza pushed a commit that referenced this pull request Nov 5, 2024
…tion lowering pass (#1003)

This PR adds initial function pointers support for the calling
convention lowering pass. This is a suggestion, so any other ideas are
welcome.

Several ideas was described in the #995 and basically what I'm trying to
do is to generate a clean CIR code without additional `bitcast`
operations for function pointers and without mix of lowered and initial
function types.

#### Problem
Looks like we can not just lower the function type and cast the value
since too many operations are involved. For instance, for the
next simple code: 
```
typedef struct {
  int a;
} S;

typedef int (*myfptr)(S);

int foo(S s) { return 42 + s.a; }

void bar() {
  myfptr a = foo;
}
```
we get the next CIR for the function `bar` , before the calling
convention lowering pass:
```
cir.func no_proto  @bar() extra(#fn_attr) {
    %0 = cir.alloca !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>, ["a", init]
    %1 = cir.get_global @foo : !cir.ptr<!cir.func<!s32i (!ty_S)>> 
    cir.store %1, %0 : !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>
    cir.return 
  } 
```
As one can see, first three operations depend on the function type. Once
`foo` is lowered, we need to fix `GetGlobalOp`:
otherwise the code will fail with the verification error since actual
`foo` type (lowered) differs from the one currently expected by the
`GetGlobalOp`.
First idea would just rewrite only the `GetGlobalOp` and insert a
bitcast after, so both `AllocaOp` and `StoreOp` would work witth proper
types.

Once the code will be more complex, we will need to take care about
possible use cases, e.g. if we use arrays, we will need to track array
accesses to it as well in order to insert this bitcast every time the
array element is needed.

One workaround I can think of: we fix the `GetGlobalOp` type and cast
from the lowered type to the initial, and cast back before the actual
call happens - but it doesn't sound as a good and clean approach (from
my point of view, of course).

So I suggest to use type converter and rewrite any operation that may
deal with function pointers and make sure it has a proper type, and we
don't have any unlowered function type in the program after the calling
convention lowering pass.

#### Implementation
I added lowering for `AllocaOp`, `GetGlobalOp`, and split the lowering
for `FuncOp` (former `CallConvLoweringPattern`) and lower `CallOp`
separately.
Frankly speaking, I tried to implement a pattern for each operation, but
for some reasons the tests are not passed for
windows and macOs in this case - something weird happens inside
`applyPatternsAndFold` function. I suspect it's due to two different
rewriters used - one in the `LoweringModule` and one in the mentioned
function.
So I decided to follow the same approach as it's done for the
`LoweringPrepare` pass and don't involve this complex rewriting
framework.

Next I will add a type converter for the struct type, patterns for
`ConstantOp` (for const arrays and `GlobalViewAttr`)
In the end of the day we'll have (at least I hope so) a clean CIR code
without any bitcasts for function pointers.

cc @sitio-couto  @bcardosolopes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants