Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor test apps to use unit-test framework #4014

Merged
merged 96 commits into from
Jan 8, 2025
Merged

Conversation

bennylp
Copy link
Member

@bennylp bennylp commented Jul 11, 2024

This PR contains modifications to PJSIP test apps (pjlib-test, pjlib-util-test, pjnath-test, pjmedia-test, and pjsip-test) to use the new unit test framework (#4007) with the main objective to make them complete faster.

Timing Results

Let's get straight into it. Below are the test time improvements from the original with the new framework using several worker thread settings (1-10).

test Orig 1 2 3 4 7 10
pjlib-test 9m22s 4m51s 4m09s 2m52s 2m31s 2m21s 2m05s
pjlib-util-test 5m52s 3m03s 3m03s 3m03s
pjnath-test 45m42s 11m51s 9m44s 6m10s 4m31s
pjmedia-test 4m18s 2m08s 2m05s 2m05s
pjsip-test 28m22s 16m20s 14m12s 7m37s 7m37s

GitHub CI timings:

CI Orig 1 2 3 4 7 10
ci-linux.yml 46m34s 29m17s
ci-mac.yml 1h3m32s 34m07s
ci-windows.yml 49m17s 32m23s
GH PR CI total ~1h03m ~35m

Notes on timing

Settings with three worker threads (totalling four threads with the main thread) are significant because GitHub Ubuntu and Win runners use 4 VCPU this article. Mac-latest has 3 VCPU.

Some tests cannot be made faster than certain limit with more worker threads, because that is the longest test case duration in that test.

General look and feel

All test apps have common look and feel with uniform command line options, which look something like this:

Usage:
  pjmedia-test [OPTIONS] [test_to_run] [test to run] [..]

where OPTIONS:

 -h, --help       Show this help screen
  -c, --config     Show configuration macros
  -l 0,1,2,3       0: Don't show logging after tests
                   1: Show logs of only failed tests (default)
                   2: Show logs of only successful tests
                   3: Show logs of all tests
  --log-no-cache   Do not cache logging
  -w N             Set N worker threads (0: disable. Default: 1)
  -L, --list       List the tests and exit
  --stop-err       Stop testing on error
  --shuffle        Shuffle the test order
  --seed N         Set shuffle random seed (must be >= 0)
  --stdout-buf N   Set stdout buffering mode:
  --stderr-buf N   Set stderr buffering mode:
                   0: unbufferred (default for stderr)
                   1: line
                   2: fully bufferred (default for stdout)
  -v, --verbose    Show info when starting/stopping tests
  -i               Ask ENTER before quitting
  -n               Do not trap signals

The test outputs are also uniform, which look something like this:

$ ../bin/pjlib-util-test-x86_64-unknown-linux-gnu -w 3
08:19:13.278 Performing 7 pjlib-util tests with 3 worker threads
[ 1/7] xml_test                         [OK] [0.000s]
[ 2/7] stun_test                        [OK] [0.000s]
[ 3/7] json_test                        [OK] [0.000s]
[ 4/7] encryption_benchmark             [OK] [0.016s]
[ 5/7] encryption_test                  [OK] [0.028s]
[ 6/7] resolver_test                    [OK] [168.830s]
[ 7/7] http_client_test                 [OK] [183.617s]
08:22:16.897 Unit test statistics for pjlib-util tests:
08:22:16.897     Total number of tests: 7
08:22:16.897     Number of test run:    7
08:22:16.897     Number of failed test: 0
08:22:16.897     Total duration:        3m3.618s

Running the tests

With Makefile build system, it is easier to run the tests with the make command. The Makefile accepts two environment variables: CI_ARGS contains arguments for the test apps, and CI_MODE to indicate we're running under GitHub CI (#3374). Sample invocation:

$ make pjlib-test CI_ARGS="-w 4"

Otherwise (e.g. on Windows) run each of the app directly. Use -h to get help.

GitHub CI modifications

  • New pjproject repository variables: CI_UBUNTU_ARGS, CI_WIN_ARGS, CI_MAC_ARGS, and CI_MODE .
  • Combo jobs in were split into smaller jobs for better parallelism
  • More informative job names
  • Add more tests in the jobs (e.g. actually run the tests, verify correct TLS backend is used, verify codecs are installed, etc).

I think the CI should incorporate more elaborate tests and cover more features/scenarios, but that is outside the scope of this PR.

Tips on troubleshooting errors

When the logging does not convey sufficient info about the error, use --log-no-cache to display logs as they are written, most likely with -w 0 to disable worker thread to avoid cluttering the output.

But sometimes, problem only arises with specific worker thread number and test orders. In this case, troubleshooting will be challenging indeed. :) Use -v, --verbose to display when tests are started/ended. This way you can know what tests were started when the failed test was running. After that, you can try running only these tests rather than all tests to reproduce the problem.

Test shuffling (--shuffle arg) is used by default on GitHub CI via repository variables (see above). To reproduce the error, make note of the seed value used when running the (failed) test (it is printed in the output), and re-run the test (locally) using --shuffle --seed N args.

The --stop-err option is useful to avoid waiting for all tests to complete when debugging an error.

Open issues

Reproducibility

As mentioned above, we're supposed to be able to reproduce the test sequence by using --shuffle and specific --seed value. But this is not always the case. Even with the same seed, the test sequence can be different on different machine. We already even use our own psudo random number generator in unittest.c, but sometimes this does not fix the problem.

Test app modifications

General

There is a new utility file in pjlib/src/pjlib-test/test_util.h that is shared by all test apps to parse command line arguments, show usage, register tests, and control the unit testing process.

The main front-end files (main.c) were modified to be more nice as command line apps.

The main modification in test body (test.c) is to use the unit-test framework.

Some test codes were changed, replacing manual checks with PJ_TEST_XXX() macros, mainly to test the usage of these macros and to make the test nicer. But since it made the PR very big, I didn't continue the effort, unless when it was necessary for debugging some problems.

In general, large tests needed to be split into smaller ones to make them run in parallel. But major problems arose, mainly because the tests share global states or manipulate common objects.

More specific changes are discussed below.

pjlib-test notes

pjlib-test has "special" arrangements in test.c, because it needs to test the unit-test (UT) framework first, before running the rest of the test using the UT framework. But before testing the UT framework, it needs to test the components needed by the UT framework such as list, fifobuf, and OS. And so on. That's why the test output is different than the rest of the test apps.

Other than that, the modifications to the test functions are not too major, at least compared to pjnath-test and pjsip-test, and I think the test time is quite satisfactory.

pjlib-util-test notes

We couldn't speed up more because tests such as resolver_test() and http_client_test() takes about three minutes to complete and they couldn't be split up without major effort due to the use of global states. Since the test time is already quite satisfactory, I didn't pursue further optimizations.

pjnath-test notes

pjnath-test requires large modifications to make the tests run in parallel as follows:

  • remove global mem pool factory since many tests validate the memory leak in the pool factory, therefore having a single pool factory will not work
  • remove constant server port numbers in server.c so that server can be instantiated multiple times simultaneously (this was the motivation behind API to get DNS server's bound address to allow specifying zero as port number #3999).
  • split tests with multiple configurations (such as ice_test, turn_sock_test, concur_test) into individual test for each configuration, making them parallelable.

As the result, there are 70 smaller test items in pjnath-test, and with 7 worker threads, we can save 40 minutes of test time!

pjmedia-test notes

pjmedia-test has the least modifications because it has very few tests. The original duration was 4m18.691s, and has come down a little to 2m8.363s with 1 worker thread.

Having said that, some minor modifications were done:

  • replace pjmedia_endpt_create() with pjmedia_endpt_create2() (similarly ..destroy() with ..destroy2()) in mips_test() and codec_test_vectors(), to avoid inadvertently initializing pjmedia_aud_subsys which on Ubuntu emits lots of debugging messages during initialization (although the messages should have been suppressed in the code).
  • replace printf with log in jbuf test to make the output tidy, and renamed jbuf_main function name to jbuf_test to be consistent.

pjsip-test notes

pjsip-test has also gone through the biggest and most difficult modifications to make the tests parallelable, which involves:

  • changing tests to mark and uniquely identify its own message and skip messages belonging to other tests
  • remove global loop transport and replace with individual loop transport for each test, because each test usually modifies the loop transport's settings (such as to drop packets).
  • for the above, also use pjsip_tpselector to bind transaction (and tdata in case of stateless request) with specific loop transport, otherwise the transaction/tdata may find other instance of loop transport
  • fix tests that assume there is only one global loop transport (for example, with one global loop transport, there is no failover when sending messages fails).
  • bug fixing to the test code as some test flows have changed (for example, tsx_uac_test failed because UA layer has now been registered before the test)
  • changed tsx_basic_test, tsx_uac_test, tsx_uas_test to take the index to parameters rather than the parameter itself to make the test output more informative.

bennylp and others added 30 commits June 8, 2024 17:32
…e new framework (reason: because parallel flag is not set, doh!)
…use of errors when running on Windows virtual machine
…ing, and running unit test. This can be used by all test apps. pjlib-util-test has been ported to use this utilities
…sential test because it does not exist in features test (in pjlib-test)
…up from 45m originally to 15m using 10 worker threads
…4:30 minutes with 10 worker threads, from 45:42m originally)
… automatic error reporting) hopefully make it easier to use
@bennylp
Copy link
Member Author

bennylp commented Jan 5, 2025

I think this is done.

TODO probably as follow up PRs:

  • replace all checks with PJ_TEST_xxx (we'll do it later when we need to fix something in the test)
  • more elaborate unit tests in yaml (e.g. Win64, MacOS 13, Windows 2025, all audio codecs verification, running python pjsua, etc.)

@bennylp bennylp requested review from sauwming, trengginas and nanangizz and removed request for trengginas, nanangizz and sauwming January 5, 2025 02:09
@nanangizz
Copy link
Member

For CI Windows, should we also add 64-bit version?

.github/workflows/ci-linux.yml Show resolved Hide resolved
.github/workflows/ci-linux.yml Outdated Show resolved Hide resolved
@sauwming
Copy link
Member

sauwming commented Jan 6, 2025

For CI Windows, should we also add 64-bit version?

Rather than double the number of tests, perhaps we can just make 64-bit as the default for testing (i.e. make every test config build for win64) instead? Since most machines should be 64-bit nowadays.
Then only add one 32-bit build-only.

@bennylp
Copy link
Member Author

bennylp commented Jan 7, 2025

Wholeheartedly agree on Win64 (I already wrote it as todo in my comment above). Unfortunately the pjsip/third_party_libs only has x86 as the target for most libs (some do have x64 as target). So rather than just building and testing basic exe, I was thinking to do the work as separate PR to create more elaborate CI.

@bennylp
Copy link
Member Author

bennylp commented Jan 8, 2025

Win CI sometimes failed this week (it was very stable last week):

  1. sleep test randomly failed (slept for more than tolerated, even with --ci-mode). CI machine load?
  2. pjlib and pjsip tests randomly terminated with exit code 1. Unsure why, unable to reproduce.

Will keep watching for these issues.

@bennylp bennylp merged commit 0408536 into master Jan 8, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants