Reserving CPU resource in CPU inference #27321

Open
sunxiaoxia2022 wants to merge 47 commits into master

Conversation

sunxiaoxia2022 (Contributor) commented Oct 30, 2024

Details:

  • Add property ov::hint::enable_cpu_reservation to reserve CPU resources in CPU inference (see the usage sketch after this list).
  • ov::hint::enable_cpu_reservation defaults to false; users can explicitly set it to true to enable CPU reservation.
  • Update proc_type_table before stream scheduling in compile_model().
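
A minimal usage sketch of the new hint (the model path is illustrative, not from this PR):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Reserve the CPU cores used by this model's inference streams; a model
    // compiled afterwards is scheduled onto the remaining, unreserved cores.
    auto compiled = core.compile_model("model.xml",  // illustrative path
                                       "CPU",
                                       ov::hint::enable_cpu_reservation(true));
    return 0;
}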

Tickets:

@sunxiaoxia2022 sunxiaoxia2022 requested review from a team as code owners October 30, 2024 01:49
@github-actions github-actions bot added labels category: inference, category: GPU, category: CPU, category: CPP API on Oct 30, 2024
if (get_property(ov::hint::enable_cpu_reservation) && !get_property(ov::hint::enable_cpu_pinning)) {
    set_property(ov::hint::enable_cpu_pinning(true));
}
Contributor:
Why do we need this logic here? The user can set enable_cpu_reservation to true and enable_cpu_pinning to false.

Contributor Author:
I think if enable_cpu_reservation is true, enable_cpu_pinning must also be true, because reserving CPUs means first pinning tasks to fixed CPUs and then reserving those CPUs.

Contributor:
The user can still use enable_cpu_reservation = true together with enable_cpu_pinning = false. Then the next model can only use the remaining CPU cores for its latency/throughput hint.

Contributor Author:
Ok, done.
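
With the thread resolved as above, the two hints stay independent. A short sketch of the combination the reviewer describes, assuming that final behavior (model path illustrative):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Reservation without pinning: the streams get exclusive cores, but the
    // threads are not bound to them; later models still see only the
    // remaining, unreserved cores.
    auto compiled = core.compile_model("model.xml", "CPU",  // illustrative path
                                       ov::hint::enable_cpu_reservation(true),
                                       ov::hint::enable_cpu_pinning(false));
    return 0;
}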

@sunxiaoxia2022 sunxiaoxia2022 requested a review from a team as a code owner January 10, 2025 07:28
@sunxiaoxia2022 sunxiaoxia2022 requested review from akopytko and removed request for a team January 10, 2025 07:28
@github-actions github-actions bot added the category: docs OpenVINO documentation label Jan 10, 2025
sunxiaoxia2022 (Contributor Author) commented:

@wangleis Could you please clarify why we need #28117 to be part of this PR?

@dmitry-gorokhov There are two changes in #28117 for CPU reservation.

  1. OV RT identifies the NUMA node of the app thread and updates proc_type_table so that the inference stream is created on the same NUMA node. In the CPU reservation use case, the user may load different models on different threads, so this identification and update work moves from the OV Core loading stage to the model compilation stage.
  2. On a multi-socket platform, if the same app thread loads two models on the same socket with the latency hint, the first model reserves all CPU cores of that socket. The master-branch logic would still load the second model onto the same socket, and loading would fail because no CPU resources are left. Update proc_type_table in compile_model() #28117 fixes this issue by loading the second model onto another socket that still has CPU resources.

I am wondering how parallel model compilation will work in that case, e.g. if both compilations try to update proc_type_table. Given that proc_type_table is a singleton, how do we guarantee thread safety?

@dmitry-gorokhov I added _streams_executor_mutex to guarantee thread safety when multiple model compilations are in progress at the same time. Please have a look.
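
A minimal sketch of the locking pattern described here; only the mutex name comes from this PR, while the table layout and the function are illustrative:

#include <mutex>
#include <vector>

std::mutex _streams_executor_mutex;             // serializes table updates
std::vector<std::vector<int>> proc_type_table;  // shared CPU resource table

void reserve_cpus_for_stream(const std::vector<int>& reserved_row) {
    // Each compile_model() performs its read-modify-write of the shared
    // table under the lock, so concurrent compilations cannot interleave.
    std::lock_guard<std::mutex> lock(_streams_executor_mutex);
    proc_type_table.push_back(reserved_row);
}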

sunxiaoxia2022 (Contributor Author) commented:

We need to add some behavior tests that cover the cpu_reservation logic. For example, we should validate that OV throws an error if no available CPUs are left during compilation. We also need a basic test that compiles a model with cpu_reservation=true and a specific nthreads, and then checks the compiled model's parameters (like nthreads and cpu_reservation) via the get_property API.

@dmitry-gorokhov These test cases are added in streams_e2e_test.cpp, please have a look, thank you!

dmitry-gorokhov (Contributor) commented:

(quoting the #28117 and _streams_executor_mutex exchange above)

get_proc_type_table() is called in several places, not only in get_num_streams(), so how does the mutex here actually help? For example, how can we guarantee correct behavior of IStreamsExecutor::Config::get_default_num_streams() if proc_type_table is being updated concurrently?
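
A sketch of the hazard being raised (all names illustrative): a reader that skips the mutex races with a concurrent writer.

#include <mutex>
#include <vector>

std::mutex table_mutex;
std::vector<std::vector<int>> proc_type_table;

// Writer: updates the table under the lock, as the PR does in compile_model().
void update_table(const std::vector<int>& row) {
    std::lock_guard<std::mutex> lock(table_mutex);
    proc_type_table.push_back(row);
}

// Reader: a helper like get_default_num_streams() that reads the table
// without taking the same lock can observe a half-updated table.
int racy_num_streams() {
    return static_cast<int>(proc_type_table.size());  // unsynchronized read
}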

dmitry-gorokhov (Contributor) commented:

(quoting the behavior-test exchange above)

streams_e2e_test.cpp contains unit tests for proc_type_table. I was talking about behavior tests that validate the whole plugin's behavior.

sunxiaoxia2022 (Contributor Author) commented:

(quoting the behavior-test exchange above)

@dmitry-gorokhov Ok, I added a test case in src/plugins/intel_cpu/tests/functional/custom/behavior/ov_executable_network/properties.cpp. Please have a look.
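
A hedged sketch of such a behavior test; the actual test lives in the file above, and the test name and the tiny ReLU model here are illustrative:

#include <gtest/gtest.h>
#include <openvino/openvino.hpp>
#include <openvino/opsets/opset8.hpp>

TEST(CpuReservation, CompiledModelReportsProperties) {
    ov::Core core;
    // Build a trivial one-op model so the sketch is self-contained.
    auto param = std::make_shared<ov::opset8::Parameter>(ov::element::f32, ov::Shape{1, 3});
    auto relu = std::make_shared<ov::opset8::Relu>(param);
    auto model = std::make_shared<ov::Model>(ov::OutputVector{relu}, ov::ParameterVector{param});

    auto compiled = core.compile_model(model, "CPU",
                                       ov::hint::enable_cpu_reservation(true),
                                       ov::inference_num_threads(4));

    // The compiled model should report back the requested configuration.
    EXPECT_TRUE(compiled.get_property(ov::hint::enable_cpu_reservation));
    EXPECT_EQ(compiled.get_property(ov::inference_num_threads), 4);
}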

@sunxiaoxia2022 sunxiaoxia2022 requested review from a team as code owners January 13, 2025 09:51
@github-actions github-actions bot added the category: IE Tests label Jan 13, 2025
@github-actions github-actions bot removed the category: IE Tests label Jan 14, 2025
@wangleis wangleis enabled auto-merge January 16, 2025 09:34