Rally on aarch64 appears to leak memory #1721
Comments
Can you please share the OOM killer output? Generally the system should be able to reclaim pages as required to avoid invoking the OOMKiller, but it is possible for physical memory to become fragmented in a way that, despite there being enough total free memory, not enough of it is available in physically contiguous chunks, which can trigger a panic. The actual OOMKiller event log should include enough information to see whether that is what is happening.
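As a rough sketch of how to gather that information (assuming a systemd-based Linux host with standard util-linux/procps tools; the exact log wording varies by kernel version):

```bash
# OOM killer events land in the kernel log; either of these should show the
# per-process RSS table that the OOM report includes.
sudo dmesg -T | grep -i -A 20 'out of memory'
sudo journalctl -k | grep -i -B 2 -A 20 'oom-killer'

# Free pages per allocation order; if only the low orders (left-hand columns)
# still have free blocks, physical memory is fragmented even though `free`
# may report plenty of available memory overall.
cat /proc/buddyinfo
```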
I started a reproduction attempt using the same Rally parameters on as close a hardware profile as I could:
I've noticed some strange behaviour related to how Rally handles sampling and the subsequent flushing to a remote metrics store that could perhaps explain excess memory usage in some scenarios. Are you using a remote metrics store in this scenario? Regardless, the OOMKiller output will still be invaluable.
Incidentally, I did not set up a metrics store for this run, and with my track params it's a long run; perhaps it's as simple as that. OOM info:
I think that actually could be it: the default in-memory metrics store keeps everything collected during the execution of a particular task in a per-core (…). The OOMKiller output pretty much confirms this for me, as the RSS for (…). In my reproduction I found that this particular benchmark and challenge (…). The default (…)

You can see the full details here #1723 and here #1724.

For now, there are two things you can do to work around this: (…)
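One mitigation consistent with the discussion above is to ship samples to a remote Elasticsearch metrics store instead of keeping them in the default in-memory store. A hedged sketch only: the datastore.* keys follow the Rally configuration docs, while the host, port, and credentials below are placeholders.

```bash
# Edit ~/.rally/rally.ini so its [reporting] section reads roughly like this
# (placeholder host/credentials; verify the key names against the Rally docs
# for the version in use):
#
#   [reporting]
#   datastore.type = elasticsearch
#   datastore.host = metrics.example.org
#   datastore.port = 9200
#   datastore.secure = false
#   datastore.user = rally_metrics
#   datastore.password = changeme
#
# Then confirm the metrics cluster is reachable from the load driver before
# starting a long race:
curl -s -u rally_metrics:changeme http://metrics.example.org:9200/
```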
Rally version (get with esrally --version): esrally 2.7.1
Invoked command:
~/.local/bin/esrally race --challenge logging-indexing-querying --track elastic/logs --target-hosts=${URL}:9200 --pipeline=benchmark-only --client-options="enable_cleanup_closed:true,use_ssl:true,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:$PASSWORD" --track-params="bulk_indexing_clients:48,number_of_shards:3,number_of_replicas:1,start_date:2022-12-22,end_date:2022-12-24,raw_data_volume_per_day:1024GB,data_generation_clients:16,throttle_indexing:true,query_min_date:2022-12-22,query_max_date:2022-12-24" --kill-running-processes
Configuration file (located in ~/.rally/rally.ini):
JVM version:
N/A - running against remote cluster
OS version:
ubuntu@ip-192-168-6-238:~$ uname -a
Linux ip-192-168-6-238 5.19.0-1025-aws #26~22.04.1-Ubuntu SMP Mon Apr 24 01:58:03 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
Description of the problem including expected versus actual behavior:
Steps to reproduce:
Provide logs (if relevant):
On aarch64 there appears to be a memory leak that I have not seen on x86_64.
Start before rally run:
Run command
Beginning of rally run:
About 8 minutes in:
About 17 minutes in:
About 40 minutes in:
It continues to consume system memory until the server OOMs.
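For anyone trying to reproduce this, a simple illustrative sketch for putting numbers on the growth (it assumes the Rally worker processes show up with "esrally" somewhere in their command line):

```bash
# Sample overall memory and the combined RSS of all esrally processes once a
# minute while the race runs; redirect the output to a file for the issue.
while true; do
    date
    free -m
    ps -eo pid,rss,args | awk '/[e]srally/ {sum += $2}
        END {printf "esrally total RSS: %.1f MiB\n", sum / 1024}'
    sleep 60
done
```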