Implementing "inline function call" flow #6627

shuffle2 · 2024-06-10T15:11:03Z

The Andes V3 ISA has an "inline function call" (IFC) extension, which introduces a set of instructions (IFCALL, IFCALL9, IFRET, IFRET16) that can be used to execute chunks of code at an arbitrary location (within the current function body, or in another function). The basic idea is that IFCALL* writes the return address to a new link register, the USR (User Special Register) IFC_LP, instead of the LP GPR, sets IFC_ON=1, and branches to the target. Then, IFRET* returns to IFC_LP if IFC_ON==1. If IFC_ON==0, the IFRET* is a nop (fall-through).
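To make the semantics concrete, here is a minimal Python model of IFCALL*/IFRET* as described above. It is only a sketch: the `IfcState` class and the function names are made up for illustration, and the guard that leaves IFC_LP untouched when IFC_ON is already set follows the `ifcall` macro posted later in this thread rather than any official Andes documentation. The addresses are taken from the FUN_00000b1c listing further down.

```python
from dataclasses import dataclass


@dataclass
class IfcState:
    """Only the state the IFC extension adds, plus the program counter."""
    pc: int = 0
    ifc_lp: int = 0       # IFC_LP: link register written by IFCALL*
    ifc_on: bool = False  # IFC_ON flag


def ifcall(state: IfcState, target: int, insn_len: int) -> None:
    """IFCALL (insn_len=4) or IFCALL9 (insn_len=2)."""
    if not state.ifc_on:
        # Assumption (from the ifcall macro in this thread): IFC_LP is only
        # written when no inline call is already active.
        state.ifc_lp = state.pc + insn_len
    state.ifc_on = True
    state.pc = target


def ifret(state: IfcState, insn_len: int) -> None:
    """IFRET (insn_len=4) or IFRET16 (insn_len=2)."""
    if state.ifc_on:
        state.ifc_on = False
        state.pc = state.ifc_lp   # return to the IFCALL site
    else:
        state.pc += insn_len      # nop: fall through


# Case 1: IFCALL9 at 0xb3e targets 0xb54; IFRET16 sits at 0xb5c.
s = IfcState(pc=0xB3E)
ifcall(s, target=0xB54, insn_len=2)
s.pc = 0xB5C  # pretend the straight-line body 0xb54..0xb5a has executed
ifret(s, insn_len=2)
returned_to = s.pc

# Case 2: the same IFRET16 reached by plain fall-through (IFC_ON == 0).
f = IfcState(pc=0xB5C)
ifret(f, insn_len=2)
fell_through_to = f.pc
```

The same IFRET16 therefore has two successors depending on how the block was entered, which is the property a disassembler or decompiler has to cope with.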

However, there are some more complications. All branch instructions of the ISA have been updated to be IFC-aware, either clearing IFC_ON, or manipulating IFC_LP. The details can be seen in gdb: https://github.com/nds32/gdb/blob/master/sim/nds32/interp.c

This leads to code patterns like:

        00024d7a 5a 00 2a 03     058           BEQC       a0,0x2a,LAB_00024d80
        00024d7e f8 4b           058           IFCALL9    thunk_FUN_00024ecc                               undefined thunk_FUN_00024ecc()
...
                             **************************************************************
                             *                       THUNK FUNCTION                       *
                             **************************************************************
                             thunk undefined thunk_FUN_00024ecc()
                               Thunked-Function: FUN_00024ecc
             undefined         a0:1           <RETURN>
                             thunk_FUN_00024ecc                                 XREF[3]:     FUN_00024d5e:00024d7e(c), 
                                                                                             FUN_00024d5e:00024dba(c), 
                                                                                             FUN_00024d5e:00024dee(c)  
        00024e14 48 00 00 5c     - ? -         J          FUN_00024ecc
...
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_00024ecc()
             undefined         a0:1           <RETURN>
                             FUN_00024ecc                                       XREF[2]:     thunk_FUN_00024ecc:00024e14(T), 
                                                                                             thunk_FUN_00024ecc:00024e14(j), 
                                                                                             FUN_00024ec0:00024ec8(R)  
        00024ecc 84 00             0           MOVI55     a0,0x0
        00024ece fc a8             0           POP25      {s0-s2,fp,gp,lp},0x40

where 24e14 and 00024ecc are reachable from other functions, both via other IFCALLs and via implicit fall-through flow. This construct allows code reuse and also reduces code size (by outlining the 32-bit J so that it can be shared by multiple 16-bit IFCALL9s).

In the above example, the original return address (in IFC_LP) gets moved into LP GPR by J (noted as ji in gdb sources), so the following POP25 will return to 24d7e+2. The same behavior applies to JAL and other branches executed while IFC_ON==1.

The above is shown as the following in the decompiler:

...
    if (uVar3 != 0x2a) {
      thunk_FUN_00024ecc();
    }
...

whereas it should just be return 0; instead of calling a thunk.

Please note that this construct is used for all types of code that should be inlined - not just prologue/epilogue sequences.

This kind of flow seems to confuse the Ghidra disassembler and decompiler. Ideally, I would like any code referenced by IFCALL* to be inlined into the function being decompiled. This is somewhat tricky where the IFCALL target is within the function being decompiled: the decompiler would need to know to duplicate that code in the decompiled representation.
Currently, I model IFCALL/IFRET with call/return SLEIGH semantics, which seemed to work better than just using goto. However, the side effect is that Ghidra now creates functions/subroutines/thunks at all IFCALL targets. The decompiler can't properly show the code at an IFCALL target, as it doesn't know that it should show control flow under the assumption that IFC_ON/IFC_LP are set upon entry (which would be solved if the target were inlined into the IFCALL-er).

How can I modify my processor extension/ghidra to perform the type of flow control as outlined above, and show sensible decompiler output?

shuffle2 · 2024-06-19T17:49:01Z

I guess this can be fixed in an Analyzer extension, somehow.
I'm a bit lost on how to pick the priority of such an extension, and on the general approach that should be taken. I think I need the following:

- Inhibit IFCALL targets from being marked as functions. Basically, I want IFCALLs to behave as pcode BRANCH ops, with the hint that flow will return to inst_next. But I don't want the CALL pcode behavior of the target being a function/subroutine.
- Track LP/IFC_LP values through a function. As expressed above, this should mainly entail ensuring that instructions which perform LP = IFC_LP are properly considered. Ghidra doesn't seem good at this currently (I think it fails because gpr = IFC_LP is conditional on IFC_ON == 1, which confuses some of Ghidra's register tracking functionality).

One thing I did realize is that I should set IFC_ON=0 in a <tracked_set>:

<context_data>
    <tracked_set space="ram">
        <set name="IFC_ON" val="0" description="Assume IFC_ON=0 on function entry"/>
    </tracked_set>
</context_data>

shuffle2 · 2024-06-24T16:39:32Z

The problem with IFCALL -> JAL -> POP25 sequence seems to be that ghidra assumes a CALL pcode op will return to the following (fallthrough) instruction, and doesn't actually track the given language's link pointer register. Furthermore, a given instruction with "CALL" semantics can have only 1 fallthrough address. So, a CALL instruction (JAL in this case) which may return to multiple locations cannot be represented. Or am I missing something?

shuffle2 · 2024-06-25T15:43:39Z

Here is another problematic code example, this time, the simple use case of using IFCALL/IFRET within what is clearly the same function body. I've modified my sleigh to emit goto for IFCALL/IFRET.

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_00000b1c()
                               assume gp = 0x2090000
                               assume ITB = 0xa38c
             undefined         a0:1           <RETURN>
                             FUN_00000b1c                                       XREF[1]:     02090208(*)  
        00000b1c fc 00             0           PUSH25     {s0,fp,gp,lp},0x0                                = 000000FFh
        00000b1e dd 42           010           EX9.IT     0x2                                              ADDI.gp fp,-0x100
        00000b20 ea 66           010           EX9.IT     0x66                                             SETHI a0,0x80000
        00000b22 04 10 00 18     010           LWI        a1,[a0 + offset DAT_80000060]                    = ??
        00000b26 3c 1e 0d 4d     010           SWI.gp     a1,[+offset DAT_02093534]
        00000b2a 84 00           010           MOVI55     a0,0x0
                             LAB_00000b2c                                       XREF[1]:     00000b40(j)  
        00000b2c 3e 28 00 98     010           ADDI.gp    a2,0x98
        00000b30 98 42           010           ADD333     a1,a0,a2
        00000b32 b4 21           010           LWI450     a1=>PTR_DAT_02090098,[a1]                        = 8000010c
        00000b34 c9 03           010           BNEZ38     a1,LAB_00000b3a
                             LAB_00000b36                                       XREF[1]:     00000b44(j)  
        00000b36 84 00           010           MOVI55     a0,0x0
        00000b38 d5 07           010           J8         LAB_00000b46
                             LAB_00000b3a                                       XREF[1]:     00000b34(j)  
        00000b3a 3e 38 0e 44     010           ADDI.gp    a3,0xe44
        00000b3e f8 0b           010           IFCALL9    LAB_00000b54
                             LAB_00000b40                                       XREF[1]:     00000b5c(j)  
        00000b40 5a 08 80 f6     - ? -         BNEC       a0,0x80,LAB_00000b2c
        00000b44 d5 f9           - ? -         J8         LAB_00000b36
                             LAB_00000b46                                       XREF[2]:     00000b38(j), 00000b5e(j)  
        00000b46 3e 28 00 18     010           ADDI.gp    a2,0x18
        00000b4a 98 42           010           ADD333     a1,a0,a2
        00000b4c b4 21           010           LWI450     a1=>DAT_0209001c,[a1]=>PTR_DAT_02090018          = 80000100
        00000b4e c1 0a           010           BEQZ38     a1,LAB_00000b62
        00000b50 3e 38 0c a0     010           ADDI.gp    a3,0xca0
                             LAB_00000b54                                       XREF[1]:     00000b3e(j)  
        00000b54 b4 21           010           LWI450     a1=>DAT_80000100,[a1]                            = ??
        00000b56 98 83           010           ADD333     a2,a0,a3
        00000b58 9c 04           010           ADDI333    a0,a0,0x4
        00000b5a b6 22           010           SWI450     a1,[a2]=>DAT_02090ca0
        00000b5c 83 ff           010           IFRET16
        00000b5e 5a 08 40 f4     010           BNEC       a0,0x40,LAB_00000b46
                             LAB_00000b62                                       XREF[1]:     00000b4e(j)  
        00000b62 fc 80           010           POP25      {s0,fp,gp,lp},0x0

void FUN_00000b1c(void)

{
  bool bVar1;
  int iVar2;
  undefined4 *puVar3;
  undefined4 *puVar4;
  code *UNRECOVERED_JUMPTABLE;
  
  DAT_02093534 = DAT_80000060;
  iVar2 = 0x0;
  if (PTR_DAT_02090098 == (undefined *)0x0) {
    iVar2 = 0x0;
    goto LAB_00000b46;
  }
  puVar4 = &DAT_02090e44;
  if (true) {
    UNRECOVERED_JUMPTABLE = (code *)0xb40;
  }
  bVar1 = true;
  puVar3 = (undefined4 *)PTR_DAT_02090098;
  while( true ) {
    puVar4 = (undefined4 *)(iVar2 + (int)puVar4);
    iVar2 = iVar2 + 0x4;
    *puVar4 = *puVar3;
                    /* WARNING: Could not recover jumptable at 0x00000b5c. Too many branches */
                    /* WARNING: Treating indirect jump as call */
    if (bVar1) {
      (*UNRECOVERED_JUMPTABLE)();
      return;
    }
    if (iVar2 == 0x40) break;
LAB_00000b46:
    bVar1 = false;
    if (*(undefined4 **)((int)&PTR_DAT_02090018 + iVar2) == (undefined4 *)0x0) {
      return;
    }
    puVar4 = &DAT_02090ca0;
    puVar3 = *(undefined4 **)((int)&PTR_DAT_02090018 + iVar2);
  }
  return;
}

This seems to almost "just work", with the exception of flow to b40 being broken. I guess this is because IFCALL is a branch, and therefore has no fallthrough. Even though the IFRET properly references b40, the listing and decompiler both get confused (although maybe for different reasons?).

Manually overriding the fallthrough on the IFCALL to fallthrough to b40 does not fix the - ? - stack depth in the listing nor improve decompilation, although the edge does appear in the function graph.
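For comparison, the control flow that the listing above implies can be written out by hand. The following Python sketch is not decompiler output; it is a manual translation of the listing, with the shared block at b54..b5a expressed as a local helper (the code an inlining pass would effectively duplicate at both sites). The names `copy_word`, `mem` and `off` are hypothetical, memory is modeled as a plain address-to-word mapping, and 32-bit wraparound is ignored.

```python
from collections import defaultdict

GP = 0x2090000  # "assume gp = 0x2090000" from the listing


def fun_00000b1c(mem, gp=GP):
    """Hand-translated flow of FUN_00000b1c; mem maps address -> 32-bit word."""

    def copy_word(src_ptr, dst_base, off):
        # Shared block at 0xb54..0xb5a:
        # load from src_ptr, store to dst_base + off, then off += 4.
        mem[dst_base + off] = mem[src_ptr]
        return off + 4

    mem[gp + 0x3534] = mem[0x80000060]  # 0xb22..0xb26

    off = 0
    while True:  # loop headed by LAB_00000b2c
        ptr = mem[gp + 0x98 + off]
        if ptr == 0:
            break
        # IFCALL9 at 0xb3e enters the shared block; IFRET16 returns to 0xb40.
        off = copy_word(ptr, gp + 0xE44, off)
        if off == 0x80:
            break

    off = 0  # MOVI55 at 0xb36
    while True:  # loop headed by LAB_00000b46
        ptr = mem[gp + 0x18 + off]
        if ptr == 0:
            return
        # Fall-through entry at 0xb54 with IFC_ON == 0, so IFRET16 is a nop.
        off = copy_word(ptr, gp + 0xCA0, off)
        if off == 0x40:
            return


# Tiny made-up memory image to exercise both loops.
mem = defaultdict(int)
mem[0x80000060] = 0x1234
mem[GP + 0x98], mem[GP + 0x9C] = 0x1000, 0x1004  # table ends at a zero entry
mem[0x1000], mem[0x1004] = 0xAA, 0xBB
mem[GP + 0x18] = 0x2000
mem[0x2000] = 0xCC
fun_00000b1c(mem)
```

Nothing in this form needs an indirect jump or a jump table: once the entry path into b54 is known, the successor of the IFRET16 is fixed.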

Here is the listing with pcode visible:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_00000b1c()
                               assume gp = 0x2090000
                               assume ITB = 0xa38c
             undefined         a0:1           <RETURN>
                             FUN_00000b1c                                       XREF[1]:     02090208(*)  
        00000b1c fc 00             0           PUSH25     {s0,fp,gp,lp},0x0=>DAT_02090000                  = 000000FFh
                                                      sp = INT_SUB sp, 4:4
                                                      STORE ram(sp), lp
                                                      sp = INT_SUB sp, 4:4
                                                      STORE ram(sp), gp
                                                      sp = INT_SUB sp, 4:4
                                                      STORE ram(sp), fp
                                                      sp = INT_SUB sp, 4:4
                                                      STORE ram(sp), s0
                                                      sp = INT_SUB sp, 0:4
        00000b1e dd 42           010           EX9.IT     0x2                                              ADDI.gp fp,-0x100
                                                      CALLOTHER "ex9it", 2:1
        00000b20 ea 66           010           EX9.IT     0x66                                             SETHI a0,0x80000
                                                      CALLOTHER "ex9it", 0x66:2
        00000b22 04 10 00 18     010           LWI        a1,[a0 + offset DAT_80000060]                    = ??
                                                      $U4c80:4 = INT_ADD a0, 0x60:4
                                                      a1 = LOAD ram($U4c80:4)
        00000b26 3c 1e 0d 4d     010           SWI.gp     a1,[+offset DAT_02093534]
                                                      $U7000:4 = INT_ADD gp, 0x3534:4
                                                      STORE ram($U7000:4), a1
        00000b2a 84 00           010           MOVI55     a0,0x0
                                                      a0 = COPY 0:4
                             LAB_00000b2c                                       XREF[1]:     00000b40(j)  
        00000b2c 3e 28 00 98     010           ADDI.gp    a2,0x98
                                                      a2 = INT_ADD gp, 0x98:4
        00000b30 98 42           010           ADD333     a1,a0,a2
                                                      a1 = INT_ADD a0, a2
        00000b32 b4 21           010           LWI450     a1=>PTR_DAT_02090098,[a1]                        = 8000010c
                                                      a1 = LOAD ram(a1)
        00000b34 c9 03           010           BNEZ38     a1,LAB_00000b3a
                                                        $Ub080:1 = INT_NOTEQUAL a1, 0:4
                                                        $U0:1 = BOOL_NEGATE $Ub080:1
                                                        CBRANCH <0>, $U0:1
                                                        IFC_ON = COPY 0:4
                                                        BRANCH *[ram]0xb3a:4
                                                      <0>
                             LAB_00000b36                                       XREF[1]:     00000b44(j)  
        00000b36 84 00           010           MOVI55     a0,0x0
                                                      a0 = COPY 0:4
        00000b38 d5 07           010           J8         LAB_00000b46
                                                      IFC_ON = COPY 0:4
                                                      BRANCH *[ram]0xb46:4
                             LAB_00000b3a                                       XREF[1]:     00000b34(j)  
        00000b3a 3e 38 0e 44     010           ADDI.gp    a3,0xe44
                                                      a3 = INT_ADD gp, 0xe44:4
        00000b3e f8 0b           010           IFCALL9    LAB_00000b54
                                                        $U100:1 = INT_EQUAL IFC_ON, 1:4
                                                        CBRANCH <0>, $U100:1
                                                        IFC_LP = COPY 0xb40:4
                                                      <0>
                                                        IFC_ON = COPY 1:4
                                                        BRANCH *[ram]0xb54:4
                             LAB_00000b40                                       XREF[1]:     00000b5c(j)  
        00000b40 5a 08 80 f6     - ? -         BNEC       a0,0x80,LAB_00000b2c
                                                        $U9d80:1 = INT_NOTEQUAL a0, 0x80:4
                                                        $U0:1 = BOOL_NEGATE $U9d80:1
                                                        CBRANCH <0>, $U0:1
                                                        IFC_ON = COPY 0:4
                                                        BRANCH *[ram]0xb2c:4
                                                      <0>
        00000b44 d5 f9           - ? -         J8         LAB_00000b36
                                                      IFC_ON = COPY 0:4
                                                      BRANCH *[ram]0xb36:4
                             LAB_00000b46                                       XREF[2]:     00000b38(j), 00000b5e(j)  
        00000b46 3e 28 00 18     010           ADDI.gp    a2,0x18
                                                      a2 = INT_ADD gp, 24:4
        00000b4a 98 42           010           ADD333     a1,a0,a2
                                                      a1 = INT_ADD a0, a2
        00000b4c b4 21           010           LWI450     a1=>DAT_0209001c,[a1]=>PTR_DAT_02090018          = 80000100
                                                      a1 = LOAD ram(a1)
        00000b4e c1 0a           010           BEQZ38     a1,LAB_00000b62
                                                        $Ub000:1 = INT_EQUAL a1, 0:4
                                                        $U0:1 = BOOL_NEGATE $Ub000:1
                                                        CBRANCH <0>, $U0:1
                                                        IFC_ON = COPY 0:4
                                                        BRANCH *[ram]0xb62:4
                                                      <0>
        00000b50 3e 38 0c a0     010           ADDI.gp    a3,0xca0
                                                      a3 = INT_ADD gp, 0xca0:4
                             LAB_00000b54                                       XREF[1]:     00000b3e(j)  
        00000b54 b4 21           010           LWI450     a1=>DAT_80000100,[a1]                            = ??
                                                      a1 = LOAD ram(a1)
        00000b56 98 83           010           ADD333     a2,a0,a3
                                                      a2 = INT_ADD a0, a3
        00000b58 9c 04           010           ADDI333    a0,a0,0x4
                                                      a0 = INT_ADD a0, 4:4
        00000b5a b6 22           010           SWI450     a1,[a2]=>DAT_02090ca0
                                                      STORE ram(a2), a1
        00000b5c 83 ff           010           IFRET16
                                                        $U180:1 = INT_EQUAL IFC_ON, 0:4
                                                        CBRANCH <0>, $U180:1
                                                        IFC_ON = COPY 0:4
                                                        BRANCHIND IFC_LP
                                                      <0>
        00000b5e 5a 08 40 f4     010           BNEC       a0,0x40,LAB_00000b46
                                                        $U9d80:1 = INT_NOTEQUAL a0, 64:4
                                                        $U0:1 = BOOL_NEGATE $U9d80:1
                                                        CBRANCH <0>, $U0:1
                                                        IFC_ON = COPY 0:4
                                                        BRANCH *[ram]0xb46:4
                                                      <0>
                             LAB_00000b62                                       XREF[1]:     00000b4e(j)  
        00000b62 fc 80           010           POP25      {s0,fp,gp,lp},0x0
                                                      sp = INT_ADD sp, 0:4
                                                      s0 = LOAD ram(sp)
                                                      sp = INT_ADD sp, 4:4
                                                      fp = LOAD ram(sp)
                                                      sp = INT_ADD sp, 4:4
                                                      gp = LOAD ram(sp)
                                                      sp = INT_ADD sp, 4:4
                                                      lp = LOAD ram(sp)
                                                      sp = INT_ADD sp, 4:4
                                                      IFC_ON = COPY 0:4
                                                      RETURN lp

shuffle2 · 2024-06-25T16:10:48Z

I should also mention that I've tried to use a bit in a contextreg for IFC_ON flag, and it does not seem to help/work at all.

shuffle2 · 2024-06-25T16:28:24Z

The stack depth tracking issue seems to be "fixed" if I change

macro ifret() {
    if (IFC_ON == 0) goto <end>;
        psw_ifcon_clear();
        goto [IFC_LP];
    <end>
}

which was generating this <0> label:

00000b5c 83 ff           010           IFRET16
            $U180:1 = INT_EQUAL IFC_ON, 0:4
            CBRANCH <0>, $U180:1
            IFC_ON = COPY 0:4
            BRANCHIND IFC_LP
          <0>

to

macro ifret() {
    if (IFC_ON == 0) goto inst_next;
        psw_ifcon_clear();
        goto [IFC_LP];
}

However, this isn't ideal: I had purposefully used end-of-instruction labels instead of inst_next to work around problems relating to the EX9.IT instruction (4-byte instructions executed from the 2-byte EX9.IT instruction need to refer to PC+2 instead of PC+4). This is not a worry for IFRET16, but it would be for the 4-byte version of IFRET. This is OK: I had thought I needed to avoid using inst_next, but that issue seems to have been resolved some other way (probably because I've improved the injection/analyzer operations related to EX9.IT in the meantime).

In any case, the above only improves the stack tracking in the listing. It does not improve the decompilation or function graph.
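The PC+2 versus PC+4 point can be shown with a few lines of Python. This is only an illustration of the arithmetic, assuming non-branching instructions; `next_pc` and its parameters are hypothetical names, and the addresses come from the FUN_00000b1c listing earlier in the thread.

```python
def next_pc(slot_addr: int, encoded_len: int, via_ex9it: bool = False) -> int:
    """Address execution continues at after a non-branching instruction.

    slot_addr:   address of the instruction slot in the code stream
    encoded_len: length in bytes of the instruction that is executed
    via_ex9it:   True when that instruction is run through a 2-byte EX9.IT
    """
    # The slot occupied in the code stream is what matters, not the length
    # of the instruction that EX9.IT stands in for.
    slot_len = 2 if via_ex9it else encoded_len
    return slot_addr + slot_len


# 0xb1e holds EX9.IT 0x2, which executes the 4-byte ADDI.gp; next slot is 0xb20.
after_ex9it = next_pc(0xB1E, 4, via_ex9it=True)
# 0xb22 holds an ordinary 4-byte LWI; next slot is 0xb26.
after_lwi = next_pc(0xB22, 4)
```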

shuffle2 · 2024-06-25T17:19:10Z

Similarly, I've noticed that changing the implementation of IFCALL from:

macro ifcall(target) {
    if (IFC_ON == 1) goto <ifcon>;
        IFC_LP = inst_next;
    <ifcon>
    IFC_ON = 1;
    goto target;
}

to

macro ifcall(target) {
    if (IFC_ON == 1) goto <ifcon>;
        IFC_LP = inst_next;
    <ifcon>
    IFC_ON = 1;
    goto target;
    goto inst_next;
}

(adding the trailing goto) makes the IFCALL be considered a CONDITIONAL_JUMP with a fallthrough, as opposed to an UNCONDITIONAL_JUMP with no fallthrough. This adds the expected edge in the function graph; however, the IFCALL still terminates the basic block (in graph view), and CONDITIONAL_JUMP is technically the incorrect semantic for the instruction. Additionally, the decompilation is not improved (it still complains that the IFRET results in an unrecoverable jump table and then returns afterwards, both of which are incorrect).

shuffle2 · 2024-06-25T19:39:54Z

Another oddity:

                             LAB_00000c82                                       XREF[1]:     00000c94(j)  
        00000c82 eb 0f           010           EX9.IT     0x10f                                            SETHI a0,0x80021
        00000c84 a0 44           010           LWI333     a1,[a0, offset DWORD_80021010]                   = ??
        00000c86 42 10 8c 0b     010           BTST       a1,a1,0x3
        00000c8a 83 ff           010           IFRET16
        00000c8c c1 04           010           BEQZ38     a1,LAB_00000c94
        00000c8e a0 04           010           LWI333     a0,[a0, offset DWORD_80021010]                   = ??
        00000c90 eb d4           010           EX9.IT     0x1d4                                            BTST a0,a0,0x4
        00000c92 c8 07           010           BNEZ38     a0,LAB_00000ca0
                             LAB_00000c94                                       XREF[1]:     00000c8c(j)  
        00000c94 4e 00 ff f7     010           IFCALL     LAB_00000c82
                             LAB_00000c98                                       XREF[1]:     00000c8a(j)  
        00000c98 c9 04           010           BNEZ38     a1,LAB_00000ca0
        00000c9a a0 04           - ? -         LWI333     a0,[a0, 0x10]
        00000c9c eb 1c           - ? -         EX9.IT     0x11c                                            BTST a0,a0,0x2
        00000c9e c0 44           - ? -         BEQZ38     a0,LAB_00000d26
                             LAB_00000ca0                                       XREF[2]:     00000c92(j), 00000c98(j)  
        00000ca0 dd 5c           010           EX9.IT     0x1c                                             SETHI a0,0x81021

Here, the c98 fallthrough appears to cause stack analysis to die. But the problem actually comes from the extra goto inst_next I added in my previous comment (in the IFCALL @ c94) :(

shuffle2 · 2024-06-26T21:44:28Z

I'm going to try to fix this in the decompiler (although I'm unfamiliar with it). When I first started out, using gotos in the ifcall/ifret macros, I noticed (via the XML produced by Debug Function Decompilation) that the decompiler isn't even receiving all bytes of FUN_00000b1c. The code at b40 and b44 is omitted, even though the disassembler and graph views show control may flow into them :/

Restoring ifcall/ifret to use call/return causes ghidra to pass the full function code bytes to the decompiler...
