test_elmhes and test_formats fail on non-x86_64 #1833

Closed
opoplawski opened this issue May 21, 2024 · 16 comments


@opoplawski
Contributor

Working on updating the Fedora package to 1.0.5 and getting:

        Start  82: test_elmhes.pro
82: Test command: /builddir/build/BUILD/gdl-v1.0.5/build/src/gdl "-quiet" "-e" "if execute('test_elmhes') ne 1 then exit, status=1"
82: Working Directory: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite
82: Environment variables: 
82:  LC_COLLATE=C
82:  GDL_PATH=/builddir/build/BUILD/gdl-v1.0.5/testsuite/:/builddir/build/BUILD/gdl-v1.0.5/src/pro/
82:  GDL_STARTUP=
82:  IDL_STARTUP=
82: Test timeout computed to be: 3600
82: % Compiled module: TEST_ELMHES.
82: % Compiled module: ERRORS_ADD.
82: % TEST_ELMHES: Error on operation : bad result elmhes
82: % TEST_ELMHES: Error on operation : bad result elmhes,/no_balance
82: % TEST_ELMHES: Error on operation : bad result elmhes,/column
82: % Compiled module: BANNER_FOR_TESTSUITE.
82: % Compiled module: GDL_IDL_FL.
82: % TEST_ELMHES: ===================================================
82: % TEST_ELMHES: =                                                 =
82: % TEST_ELMHES: =  3 errors encountered during TEST_ELMHES tests  =
82: % TEST_ELMHES: =                                                 =
82: % TEST_ELMHES: ===================================================
 82/212 Test  #82: test_elmhes.pro ....................***Failed    0.17 sec
        Start 100: test_formats.pro
100: Test command: /builddir/build/BUILD/gdl-v1.0.5/build/src/gdl "-quiet" "-e" "if execute('test_formats') ne 1 then exit, status=1"
100: Working Directory: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite
100: Environment variables: 
100:  LC_COLLATE=C
100:  GDL_PATH=/builddir/build/BUILD/gdl-v1.0.5/testsuite/:/builddir/build/BUILD/gdl-v1.0.5/src/pro/
100:  GDL_STARTUP=
100:  IDL_STARTUP=
100: Test timeout computed to be: 3600
100: % Compiled module: TEST_FORMATS.
100: % Compiled module: GDL_IDL_FL.
100: % GDL_IDL_FL: Detected Software : GDL
100: % When using the RAN1 mode, be sure to keep the RAN1 and dSFMT seed arrays in separate variables.
100: multiple reference file <<formats.GDL>> found ! First used !!
100: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite/formats.GDL
100: /builddir/build/BUILD/gdl-v1.0.5/testsuite/formats.GDL
100: Files to be compared : formats.IDL, formats.GDL
100: % Compiled module: BANNER_FOR_TESTSUITE.
100: % TEST_FORMATS: =======================================================
100: % TEST_FORMATS: =                                                     =
100: % TEST_FORMATS: =  1595 errors encountered during TEST_FORMATS tests  =
100: % TEST_FORMATS: =                                                     =
100: % TEST_FORMATS: =======================================================
100/212 Test #100: test_formats.pro ...................***Failed    0.65 sec
@alaingdl
Contributor

alaingdl commented May 22, 2024

Thanks @opoplawski

Looking at the code of test_elmhes.pro, given the way the tests are done internally,
I think these 2 failures (test_elmhes & test_formats) are both related to an issue in formats :(

I have no way to test on a recent Fedora on my side, and I have no problem on Debian, Ubuntu & OSX!

Which compiler version do you have?

Thanks

@opoplawski
Contributor Author

This is with gcc 14.1.1. But it's also failing on EL9 with 11.4.1. You can check recent build logs here: https://koji.fedoraproject.org/koji/packageinfo?packageID=1830

@GillesDuvert
Contributor

I'm pretty sure formats won't be OK on non-64-bit machines, so some tests based on formatted string comparison won't work either.
The thing is, nobody in the team knows what GDL should produce on 32-bit machines!
I would suggest not running these tests on 32-bit machines, as failures there do not mean that GDL does not work,
and waiting for a user to report a specific issue on a 32-bit machine.

@opoplawski
Contributor Author

opoplawski commented May 23, 2024

These are all 64-bit architectures: aarch64, ppc64le, s390x.

@GillesDuvert
Contributor

> These are all 64 bit architectures - aarch64, ppc64le, s390x

@opoplawski sorry, but your issue refers to "non-x86_64" architectures. My comment above holds: it is better to remove these tests from the test list when building on "non-x86_64" architectures, as they are meaningless there.

@opoplawski
Contributor Author

I was just responding to your comment about 32 bits. But if the tests only apply to x86_64, that's fine, although it would be nice if the tests could deselect themselves on non-x86_64. Anyway, I'm excluding them now.

@GillesDuvert
Contributor

Thanks @opoplawski, but I feel there is a misunderstanding: according to the internet, s390x is a 32-bit machine while aarch64 is not. While I do expect trouble on 32-bit machines, since we have no such machine with a working IDL at our disposal to cross-check against, there should be no problem on 64-bit little- or big-endian IEEE 754 architectures. So your report of a test failure is important in this case.

@opoplawski
Contributor Author

s390x is definitely a 64-bit architecture: https://developer.fedoraproject.org/deployment/secondary_architectures/s390.html. s390 is a 31/32-bit hybrid. I'll reopen, then, I guess. Let me know what other information would be helpful for tracking this down.

@opoplawski opoplawski reopened this May 23, 2024
@slayoo
Member

slayoo commented May 24, 2024

@opoplawski, do I understand correctly that the tests pass OK on Fedora arm64 builds?
In #1788, we are introducing Apple Silicon builds to CI, but the PR is blocked by two failing tests: test_byte_conversion.pro and test_bytscl.pro. If the Fedora arm64 builds do pass, does that mean it is an Apple compiler issue?

@GillesDuvert
Contributor

To go further, one needs at least to know what fails.
With 1595 errors in test_formats, I guess every format is wrong. The test procedure creates a file "formats.GDL"; @opoplawski, could you send it?
For Apple Silicon, I have access to an M1; I just need to find the time.

@alaingdl
Contributor

OK, I just compiled the current git version on a new M2 machine (OSX) and I have the same issues:
test_elmhes.pro and test_formats.pro (I will look at test_formats later!)

On an x86 processor, IDL & GDL give (first test):

P               DOUBLE    =   -2.8958759e-07
PT              STRING    = '-00.00000029'
ST              STRING    = '101.32080078'
T               FLOAT     =       101.321
GDL> print, b
     0.500000      11.4800      5.50000      5.00000
      6.25000      30.2200      20.7500      14.5000
     0.680000      3.02080      1.28000      1.28000
     0.360000     0.500000      0.00000      0.00000

But on M2:

P               DOUBLE    =        0.0000000
PT              STRING    = '000.00000000'
ST              STRING    = '101.32079315'
T               FLOAT     =       101.321

GDL> print, b
     0.500000      11.4800      5.50000      5.00000
      6.25000      30.2200      20.7500      14.5000
     0.680000      3.02080      1.28000      1.28000
     0.360000     0.500000      0.00000      0.00000

So, from my point of view, this is just numerical rounding, and the test should be rewritten to take EPS (machine precision) into account.
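A minimal C++ sketch of what "taking EPS into account" could look like, assuming a combined relative/absolute tolerance is acceptable. `nearly_equal` and its default tolerances are hypothetical; they are not part of GDL or its testsuite (which is written in GDL itself). The relative part scales with magnitude: the ST values above (101.32080078 vs 101.32079315) differ by about 7.6e-6, which is one float ULP near 101 (2^-17). The absolute floor is needed near zero, as for P (-2.8958759e-07 vs 0.0), where no relative tolerance can help.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: true if a and b agree within a tolerance that scales
// with their magnitude (relative part), with a floor for values near zero
// (absolute part). The defaults assume single-precision data.
bool nearly_equal(double a, double b,
                  double rel_tol = 8 * std::numeric_limits<float>::epsilon(),
                  double abs_tol = 1e-6) {
    if (a == b) return true;                          // exact match, equal infinities
    if (std::isnan(a) || std::isnan(b)) return false; // NaN never matches
    if (std::isinf(a) || std::isinf(b)) return false; // unequal infinities
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}
```

With the defaults, `nearly_equal(101.32080078, 101.32079315)` and `nearly_equal(-2.8958759e-07, 0.0)` both return true, while `nearly_equal(101.32, 101.33)` returns false.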

@GillesDuvert
Contributor

Certainly. The cumulative rounding errors make our results differ between machines and, most of all, differ from IDL, which does not use the same algorithms. The difficulty is to set a safe error margin, as precision can easily drop to 1e-3 for floats.
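To illustrate why a fixed margin is hard to choose, here is a small, self-contained C++ sketch (not GDL code; all function names are made up for illustration) showing two effects in single precision: rounding error that grows with the number of operations, and results that depend on the order of operations, which is what happens when two implementations (GDL and IDL, say) use different algorithms.

```cpp
#include <cstddef>

// Add `step` to an accumulator n times in single precision. Every addition
// rounds to a 24-bit significand, so the error grows with n.
float accumulate_float(float step, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += step;
    return acc;
}

// The same loop in double precision, for reference.
double accumulate_double(double step, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += step;
    return acc;
}

// Float addition is not associative: the same three terms summed in two
// different orders can give different results.
float sum_in_order(float a, float b, float c) { return (a + b) + c; }
float sum_reordered(float a, float b, float c) { return (a + c) + b; }
```

With `a = 1e8f, b = 1.0f, c = -1e8f`, `sum_in_order` returns 0 (the 1 is lost because float spacing near 1e8 is 8) while `sum_reordered` returns 1. Likewise, `accumulate_float(0.1f, 1000000)` drifts noticeably away from 100000, whereas the double version stays very close to it.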

@alaingdl
Copy link
Contributor

alaingdl commented Jun 4, 2024

I updated test_elmhes.pro in PR #1840 with a numerical tolerance of 1e-5. As far as I am concerned, that part is closed.

Concerning test_formats.pro, from what I see in the outputs, we do have a big/little-endian problem... It is a serious issue. The good news is that I now have permanent access to an M2 OSX machine (very fast indeed). But I have no time now, and I do not feel competent on that. Maybe a simple flag could solve most of the problems. I hope @GillesDuvert will have time for that, since he previously improved formats...

@GillesDuvert
Copy link
Contributor

The only differences are on unsigned 32 and ... bit ints and on +/-NaN and +INF. I would not say it is an endianness problem.

@GillesDuvert
Contributor

See #1949: some machines (ARM64) do not convert floats to unsigned ints the same way Intel does. The NaN and INF issues in test_formats come from the fact that these floating-point pseudo-values are converted to unsigned integers (rather than read as bit fields?) before their bits are printed (for printing we rely on the C and C++ standards). #1949 should have removed the difference in float-to-unsigned-int conversion between Intel x86_64 and ARM64 (and probably other architectures).
In other words: apart from some NaN and Inf 'printing' problems, which are no longer tested in test_formats, there should be no differences anymore.
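The distinction drawn above between converting a float to an unsigned integer and reading its bits can be sketched in C++ as follows. This is an illustration under stated assumptions, not the change made in #1949: `float_bits` and `to_u32_wrap` are hypothetical helper names, and mapping NaN/Inf to 0 in `to_u32_wrap` is an arbitrary choice. In both C and C++, converting a floating value whose truncated value does not fit in the unsigned target type is undefined behavior, so the result is whatever the generated code and CPU produce: on x86_64 a negative double cast to a 32-bit unsigned typically wraps around, while the AArch64 conversion instruction typically saturates (for example to 0). Copying the storage with `memcpy` yields the bit pattern without any conversion, and an explicit modulo makes the numeric conversion identical on every platform.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit pattern of a float, obtained by copying its storage (well-defined),
// rather than by a numeric float-to-unsigned conversion. +/-NaN and +/-Inf
// can be shown as bits this way without ever being "converted".
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "expects a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: numeric double -> uint32 conversion with explicit
// modulo-2^32 (wrap-around) semantics, independent of the CPU. A plain
// static_cast is undefined behavior when the truncated value is outside
// [0, 2^32). NaN and +/-Inf map to 0 here by choice (an assumption).
std::uint32_t to_u32_wrap(double d) {
    if (!std::isfinite(d)) return 0;
    const double two32 = 4294967296.0;           // 2^32
    double m = std::fmod(std::trunc(d), two32);  // exact integer in (-2^32, 2^32)
    if (m < 0) m += two32;                       // now in [0, 2^32)
    return static_cast<std::uint32_t>(m);
}
```

For example, `float_bits(INFINITY)` is `0x7F800000` on any IEEE 754 platform, and `to_u32_wrap(-1.0)` is `4294967295` regardless of the CPU.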

@GillesDuvert
Contributor

Closing with the above explanation. Dear Orion, you can open a new issue if there is another 'portability' problem.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants