[REQUEST] Support Intel assembly syntax #106

pleroy · 2024-06-02T14:08:24Z

OSACA only supports the AT&T assembly syntax. On Windows the Intel syntax is more common (especially with MSVC). There are tools out there that purport to convert one format to the other, but they are not quite ready for prime time, and it's also unclear if OSACA would need to tackle all the complexity of, say, C++ name mangling (I'd assume that a symbol is a symbol is a symbol and that there is no need to understand what it corresponds to).

The parser seems to be pretty well structured, so I would assume that supporting a new syntax amounts to adding a new parser subclass and would be a moderate amount of work.

If you are not interested in doing it yourselves, would you accept a contribution? (Not a commitment, just exploratory at this point.)

JanLJL · 2024-06-03T07:55:25Z

Hi @pleroy ,
I don't have any experience with the MSVC compiler, I would assume there should be a compiler flag to produce AT&T syntax out of the box, like the -masm=dialect flag for GCC?

Adding Intel syntax support to our parser is on our list of TODOs for a long time, however, we haven't prioritized it so far due to limited resources. If you want to contribute, we would be more than happy about a PR (even just an initial approach we can build on and discuss about together afterwards)!

pleroy · 2024-06-03T15:31:48Z

Hi @JanLJL -- Thanks for your quick reply.

I don't think that there is a flag to ask the MSVC compiler to produce the AT&T syntax (or if there is, it is well hidden). The relevant documentation is here and it doesn't even mention the syntax that it produces. Note incidentally that godbolt displays the Intel syntax, which is not a proof of anything, but a hint that maybe that's the only thing there is for MSVC.

I don't want to put the Intel parser on the critical path of some performance-sensitive code that I am writing, but the time will come when I will really want to do precise latency/throughput analysis. I'll try to split the work in relatively small chunks for ease of review.

pleroy · 2024-11-10T22:33:05Z

For the record I have started working on this. Far from complete, but I am making good progress. If someone is interested in taking a look, the work is happening in the fork https://github.com/mockingbirdnest/OSACA.

pleroy · 2024-12-31T14:55:41Z

Hi @JanLJL -- I am ready to declare victory in my quest to add support for the Intel syntax to OSACA. I was able to use it on reasonably complex code and derived significant optimizations based on what OSACA was showing. If you are curious, you can see the optimizations in this PR and the kind of code that I am feeding to OSACA in the description of this PR.

As you can expect, the changes to add the Intel syntax are rather large and required some restructuring. You can see the diff here. Here is an overview of the changes:

There is a new class, ParserX86Intel , to parse the Intel syntax. It produces instruction forms in the order of the syntax, i.e., the first operand is the destination.
In order to gather properties that don't depend on the syntax, such as the properties of the registers, there is a new class ParserX86 which is the superclass of ParserX86ATT and ParserX86Intel.
The matching of markers is now done on the instruction forms because it needs to be independent from details like the syntax of numeric literals that should only be known to the parser.
There is a new phase in between parsing and semantics, the "normalization". Its purpose is to align the instructions with the data found in the ISA and architecture models. For Intel, this obviously involves swapping the operands. But it also exists for the other parsers: for AT&T for instance, it does the mov/movq adjustment so that the rest of the code doesn't have to care about it.
The semantics is largely unchanged, except that it checks that the instructions it processes have been normalized.
There are, of course, extensive tests and data files.

Please let me know how you want to proceed to upstream this work. There are about 2500 lines of diff (a good chunk of it for the tests) so it's maybe hard to review in a single PR. It should be possible to extract smaller chunks for review, although of course it will take some work to do such a split.

JanLJL · 2025-01-14T16:12:38Z

Hi @pleroy ,

thank you very much for your input and this interesting work, I highly appreciate your contribution!
Sorry that it took me so long to have a closer look at it...
I cloned the Diff branch of your OSACA repository and tried to run the Gauss-Seidel kernel with Intel Syntax (as the ICC compiles it with -O3 and -xCORE-AVX512, see the code here):

..B1.18:                        # Preds ..B1.18 ..B1.17
        vmovsd    xmm2, QWORD PTR [8+r11+r10]                   #26.19
        inc       bl                                            #24.3
        vaddsd    xmm3, xmm2, QWORD PTR [16+r11+r8]             #25.5
        vaddsd    xmm4, xmm3, QWORD PTR [8+r11+r9]              #25.5
        vaddsd    xmm1, xmm4, xmm1                              #25.5
        vmulsd    xmm8, xmm0, xmm1                              #27.21
        vmovsd    QWORD PTR [8+r11+r8], xmm8                    #25.5
        vmovsd    xmm5, QWORD PTR [16+r11+r10]                  #26.19
        vaddsd    xmm6, xmm5, QWORD PTR [24+r11+r8]             #26.19
        vaddsd    xmm7, xmm6, QWORD PTR [16+r11+r9]             #27.9
        vaddsd    xmm9, xmm7, xmm8                              #27.21
        vmulsd    xmm13, xmm0, xmm9                             #27.21
        vmovsd    QWORD PTR [16+r11+r8], xmm13                  #25.5
        vmovsd    xmm10, QWORD PTR [24+r11+r10]                 #26.19
        vaddsd    xmm11, xmm10, QWORD PTR [32+r11+r8]           #26.19
        vaddsd    xmm12, xmm11, QWORD PTR [24+r11+r9]           #27.9
        vaddsd    xmm14, xmm12, xmm13                           #27.21
        vmulsd    xmm18, xmm0, xmm14                            #27.21
        vmovsd    QWORD PTR [24+r11+r8], xmm18                  #25.5
        vmovsd    xmm15, QWORD PTR [32+r11+r10]                 #26.19
        vaddsd    xmm16, xmm15, QWORD PTR [40+r11+r8]           #26.19
        vaddsd    xmm17, xmm16, QWORD PTR [32+r11+r9]           #27.9
        vaddsd    xmm19, xmm17, xmm18                           #27.21
        vmulsd    xmm1, xmm0, xmm19                             #27.21
        vmovsd    QWORD PTR [32+r11+r8], xmm1                   #25.5
        add       r11, 32                                       #24.3
        cmp       bl, 49                                        #24.3
        jb        ..B1.18       # Prob 27%                      #24.3

During parsing, I ran into the following errors:

It fails to parse labels with two dots as in line 1: ..B1.18
Only comments with ; are accepted (as defined in the parser here, so it fails for comments with #
It fails to parse memory addresses with type specifiers such as "QWORD", e.g., QWORD PTR [16+r11+r8] gets not recognized and needs to be changed to [r11+r8*8+16] to work

After fixing those things manually in the assembly, I got a pretty good-looking output:

Combined Analysis Report
------------------------
                                                 Port pressure in cycles
   |  0   - 0DV  |   1   - 1DV  |  2   |  3   |  4   |   5   |  6   |  7   |  8   |  9   |  10  |  11  ||  CP  | LCD  |
-------------------------------------------------------------------------------------------------------------------------
 1 |             |              |      |      |      |       |      |      |      |      |      |      ||      |      |   .B1.18:
 2 |             |              | 0.33 | 0.33 |      |       |      |      |      |      |      | 0.33 ||  5.0 |      |   vmovsd    xmm2,  [r10+r11*8+8]
 3 | 0.00        | -0.01        |      |      |      | 0.000 | 0.50 |      |      |      | 0.50 |      ||      |      |   inc       bl
 4 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||  2.0 |      |   vaddsd    xmm3, xmm2,  [r8+r11*8+16]
 5 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||  2.0 |      |   vaddsd    xmm4, xmm3,  [r9+r11*8+8]
 6 |             | 0.750        |      |      |      | 0.250 |      |      |      |      |      |      ||  2.0 |  2.0 |   vaddsd    xmm1, xmm4, xmm1
 7 | 1.00        | 0.000        |      |      |      |       |      |      |      |      |      |      ||  4.0 |  4.0 |   vmulsd    xmm8, xmm0, xmm1
 8 |             |              |      |      | 0.50 |       |      | 0.50 | 0.50 | 0.50 |      |      ||      |      |   vmovsd    [r8+r11*8+8], xmm8
 9 |             |              | 0.33 | 0.33 |      |       |      |      |      |      |      | 0.33 ||      |      |   vmovsd    xmm5,  [r10+r11*8+16]
10 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm6, xmm5,  [r8+r11*8+24]
11 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm7, xmm6,  [r9+r11*8+16]
12 |             | 0.750        |      |      |      | 0.250 |      |      |      |      |      |      ||  2.0 |  2.0 |   vaddsd    xmm9, xmm7, xmm8
13 | 1.00        | 0.000        |      |      |      |       |      |      |      |      |      |      ||  4.0 |  4.0 |   vmulsd    xmm13, xmm0, xmm9
14 |             |              |      |      | 0.50 |       |      | 0.50 | 0.50 | 0.50 |      |      ||      |      |   vmovsd    [r8+r11*8+16], xmm13
15 |             |              | 0.33 | 0.33 |      |       |      |      |      |      |      | 0.33 ||      |      |   vmovsd    xmm10,  [r10+r11*8+24]
16 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm11, xmm10,  [r8+r11*8+32]
17 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm12, xmm11,  [r9+r11*8+24]
18 |             | 0.750        |      |      |      | 0.250 |      |      |      |      |      |      ||  2.0 |  2.0 |   vaddsd    xmm14, xmm12, xmm13
19 | 1.00        | 0.000        |      |      |      |       |      |      |      |      |      |      ||  4.0 |  4.0 |   vmulsd    xmm18, xmm0, xmm14
20 |             |              |      |      | 0.50 |       |      | 0.50 | 0.50 | 0.50 |      |      ||      |      |   vmovsd    [r8+r11*8+24], xmm18
21 |             |              | 0.33 | 0.33 |      |       |      |      |      |      |      | 0.33 ||      |      |   vmovsd    xmm15,  [r10+r11*8+32]
22 |             | 0.500        | 0.33 | 0.33 |      | 0.500 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm16, xmm15,  [r8+r11*8+40]
23 |             | 0.250        | 0.33 | 0.33 |      | 0.750 |      |      |      |      |      | 0.33 ||      |      |   vaddsd    xmm17, xmm16,  [r9+r11*8+32]
24 |             | 0.000        |      |      |      | 1.000 |      |      |      |      |      |      ||  2.0 |  2.0 |   vaddsd    xmm19, xmm17, xmm18
25 | 1.00        | 0.000        |      |      |      |       |      |      |      |      |      |      ||  4.0 |  4.0 |   vmulsd    xmm1, xmm0, xmm19
26 |             |              |      |      | 0.50 |       |      | 0.50 | 0.50 | 0.50 |      |      ||  0.0 |      |   vmovsd    [r8+r11*8+32], xmm1
27 | 0.00        | 0.000        |      |      |      | -0.01 | 0.50 |      |      |      | 0.50 |      ||      |      |   add       r11, 32
28 | 0.00        | -0.01        |      |      |      | 0.000 | 0.50 |      |      |      | 0.50 |      ||      |      |   cmp       bl, 49
29 |             |              |      |      |      |       |      |      |      |      |      |      ||      |      |   jb        .B1.18
     4.00          5.980          4.00   4.00   2.00   5.990   1.50   2.00   2.00   2.00   1.50   4.00      33   24.0

I like that the --syntax flag is not necessary as all parser/syntax combinations are tested but unfortunately, when running into an parser error, we now only get the error message in the inspect() function stating that no combination works instead of the actual assembly line where it broke as before, which was helpful for debugging.

Your code is nicely readable and documented, so I think we are on a very good path here and can get this integrated relatively quickly with some more extensive testing.
What do you think of opening a PR and we define the required fixed/enhancements there so we can track the process?

pleroy added the enhancement label Jun 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[REQUEST] Support Intel assembly syntax #106

[REQUEST] Support Intel assembly syntax #106

pleroy commented Jun 2, 2024

JanLJL commented Jun 3, 2024

pleroy commented Jun 3, 2024

pleroy commented Nov 10, 2024

pleroy commented Dec 31, 2024

JanLJL commented Jan 14, 2025

[REQUEST] Support Intel assembly syntax #106

[REQUEST] Support Intel assembly syntax #106

Comments

pleroy commented Jun 2, 2024

JanLJL commented Jun 3, 2024

pleroy commented Jun 3, 2024

pleroy commented Nov 10, 2024

pleroy commented Dec 31, 2024

JanLJL commented Jan 14, 2025