
Commit

Merge branch 'master' into xufang/reuse_streams_of_compile_model_for_sync_infer
xufang-lisa authored Oct 18, 2023
2 parents 85f1e13 + 4574fb1 commit 30e68aa
Showing 259 changed files with 10,448 additions and 4,059 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/linux_conditional_compilation.yml
@@ -111,6 +111,9 @@ jobs:
# For running Paddle frontend unit tests
python3 -m pip install -r ${OPENVINO_REPO}/src/frontends/paddle/tests/requirements.txt
# see https://github.com/PaddlePaddle/Paddle/issues/55597#issuecomment-1718131420
wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
apt-get install ./libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
#
# Build
16 changes: 8 additions & 8 deletions .github/workflows/windows.yml
@@ -1,14 +1,14 @@
name: Windows (VS 2022, Python 3.11)
on:
workflow_dispatch:
pull_request:
paths-ignore:
- '**/docs/**'
- 'docs/**'
- '**/**.md'
- '**.md'
- '**/layer_tests_summary/**'
- '**/conformance/**'
# pull_request:
# paths-ignore:
# - '**/docs/**'
# - 'docs/**'
# - '**/**.md'
# - '**.md'
# - '**/layer_tests_summary/**'
# - '**/conformance/**'
push:
paths-ignore:
- '**/docs/**'
14 changes: 6 additions & 8 deletions README.md
@@ -33,7 +33,7 @@ OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
- Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud


This open-source version includes several components: namely [Model Optimizer], [OpenVINO™ Runtime], [Post-Training Optimization Tool], as well as CPU, GPU, GNA, multi device and heterogeneous plugins to accelerate deep learning inference on Intel® CPUs and Intel® Processor Graphics.
This open-source version includes several components: namely [OpenVINO Model Converter (OVC)], [OpenVINO™ Runtime], as well as CPU, GPU, GNA, multi device and heterogeneous plugins to accelerate deep learning inference on Intel® CPUs and Intel® Processor Graphics.
It supports pre-trained models from [Open Model Zoo], along with 100+ open
source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.

@@ -48,8 +48,7 @@ source and public models in popular formats such as TensorFlow, ONNX, PaddlePadd
* [python](./src/bindings/python) - Python API for OpenVINO™ Runtime
* [Plugins](./src/plugins) - contains OpenVINO plugins which are maintained in open-source by the OpenVINO team. For more information, take a look at the [list of supported devices](#supported-hardware-matrix).
* [Frontends](./src/frontends) - contains available OpenVINO frontends that allow reading models from the native framework format.
* [Model Optimizer] - is a cross-platform command-line tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.
* [Post-Training Optimization Tool] - is designed to accelerate the inference of deep learning models by applying special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization.
* [OpenVINO Model Converter (OVC)] - is a cross-platform command-line tool that facilitates the transition between training and deployment environments, and adjusts deep learning models for optimal execution on end-point target devices.
* [Samples] - applications in C, C++ and Python languages that show basic OpenVINO use cases.
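
As a quick illustration of the Python API listed above (a minimal sketch only, assuming OpenVINO is installed via `pip install openvino` and an ONNX file `model.onnx` is available):

```python
import openvino as ov

core = ov.Core()

# Read a model in any supported format (IR, ONNX, PaddlePaddle, TensorFlow, ...).
model = core.read_model("model.onnx")

# Compile it for one of the devices listed in the hardware matrix below.
compiled_model = core.compile_model(model, "CPU")
print(compiled_model.inputs, compiled_model.outputs)
```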

## Supported Hardware matrix
@@ -62,15 +61,15 @@ The OpenVINO™ Runtime can infer models on different hardware devices. This sec
<th>Device</th>
<th>Plugin</th>
<th>Library</th>
<th>ShortDescription</th>
<th>Short Description</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan=2>CPU</td>
<td> <a href="https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_CPU.html#doxid-openvino-docs-o-v-u-g-supported-plugins-c-p-u">Intel CPU</a></tb>
<td><b><i><a href="./src/plugins/intel_cpu">openvino_intel_cpu_plugin</a></i></b></td>
<td>Intel Xeon with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel Core Processors with Intel AVX2, Intel Atom Processors with Intel® Streaming SIMD Extensions (Intel® SSE)</td>
<td>Intel Xeon with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel Core Processors with Intel AVX2, Intel Atom Processors with Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Matrix Extensions (Intel® AMX)</td>
</tr>
<tr>
<td> <a href="https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_CPU.html#doxid-openvino-docs-o-v-u-g-supported-plugins-c-p-u">ARM CPU</a></tb>
@@ -98,7 +97,7 @@ OpenVINO™ Toolkit also contains several plugins which simplify loading models
<tr>
<th>Plugin</th>
<th>Library</th>
<th>ShortDescription</th>
<th>Short Description</th>
</tr>
</thead>
<tbody>
@@ -196,6 +195,5 @@ Report questions, issues and suggestions, using:

[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
[OpenVINO™ Runtime]:https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_OV_Runtime_User_Guide.html
[Model Optimizer]:https://docs.openvino.ai/2023.1/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
[Post-Training Optimization Tool]:https://docs.openvino.ai/2023.1/pot_introduction.html
[OpenVINO Model Converter (OVC)]:https://docs.openvino.ai/2023.1/openvino_docs_model_processing_introduction.html#convert-a-model-in-cli-ovc
[Samples]:https://github.com/openvinotoolkit/openvino/tree/master/samples
4 changes: 3 additions & 1 deletion cmake/developer_package/add_target_helpers.cmake
@@ -172,7 +172,9 @@ function(ov_add_test_target)
else()
add_test(NAME ${ARG_NAME} COMMAND ${ARG_NAME})
endif()
set_property(TEST ${ARG_NAME} PROPERTY LABELS ${ARG_LABELS})
if(ARG_LABELS)
set_property(TEST ${ARG_NAME} PROPERTY LABELS ${ARG_LABELS})
endif()

install(TARGETS ${ARG_NAME}
RUNTIME DESTINATION tests
@@ -9,9 +9,7 @@ include(CMakeFindDependencyMacro)
# Variables to export in plugin's projects

set(ov_options "@OV_OPTIONS@")
list(APPEND ov_options CMAKE_CXX_COMPILER_LAUNCHER CMAKE_C_COMPILER_LAUNCHER
CMAKE_CXX_LINKER_LAUNCHER CMAKE_C_LINKER_LAUNCHER
CMAKE_INSTALL_PREFIX CPACK_GENERATOR)
list(APPEND ov_options CPACK_GENERATOR)

if(APPLE)
list(APPEND ov_options CMAKE_OSX_ARCHITECTURES CMAKE_OSX_DEPLOYMENT_TARGET)
30 changes: 24 additions & 6 deletions docs/OV_Runtime_UG/auto_device_selection.md
@@ -56,11 +56,10 @@ The logic behind the choice is as follows:
To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.
What is important, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.
While the CPU is performing inference, AUTO continues to load the model to the device best suited for the purpose and transfers the task to it when ready.
This way, the devices which are much slower in compiling models, GPU being the best example, do not impede inference at its initial stages.
This way, the devices which are much slower in compiling models, GPU being the best example, do not impact inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

Note that if you choose to exclude CPU from the priority list or disable the initial CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be unable to support the initial model compilation stage.

Note that if you choose to exclude CPU from the priority list or disable the initial CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be unable to support the initial model compilation stage. Models with dynamic input/output or :doc:`stateful <openvino_docs_OV_UG_model_state_intro>` operations will be loaded to the CPU if it is in the candidate list. Otherwise, these models will follow the normal flow and be loaded to the device based on priority.
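
As a rough illustration (a minimal Python sketch, assuming a model file ``model.xml`` and a system with both a CPU and a GPU), the CPU acceleration described above is only available while the CPU stays in the candidate list:

.. code-block:: python

    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")

    # Default AUTO behavior: the CPU serves the first inference requests while
    # the selected device (e.g. GPU) is still compiling the model.
    compiled_default = core.compile_model(model, "AUTO")

    # Excluding the CPU from the candidate list disables this acceleration:
    # the first inference waits until the GPU has finished compiling the model.
    compiled_gpu_only = core.compile_model(model, "AUTO:GPU")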

.. image:: _static/images/autoplugin_accelerate.svg

@@ -91,7 +90,7 @@ Following the OpenVINO™ naming convention, the Automatic Device Selection mode


+----------------------------------------------+--------------------------------------------------------------------+
| Property | Values and Description |
| Property(C++ version) | Values and Description |
+==============================================+====================================================================+
| <device candidate list> | **Values**: |
| | |
@@ -170,6 +169,25 @@ Following the OpenVINO™ naming convention, the Automatic Device Selection mode
Inference with AUTO is configured similarly to when device plugins are used:
you compile the model on the plugin with configuration and execute inference.

The code samples on this page assume that the following imports (Python) / using directives (C++) are included at the beginning of the code snippets.

.. tab-set::

.. tab-item:: Python
:sync: py

.. doxygensnippet:: docs/snippets/ov_auto.py
:language: python
:fragment: [py_ov_property_import_header]

.. tab-item:: C++
:sync: cpp

.. doxygensnippet:: docs/snippets/AUTO0.cpp
:language: cpp
:fragment: [py_ov_property_import_header]
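
As a brief end-to-end illustration (a minimal Python sketch, assuming a static-shape FP32 model file ``model.xml``), compiling on AUTO and running inference looks the same as with a specific device:

.. code-block:: python

    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")

    # AUTO picks the target device; the call is identical to passing "CPU" or "GPU".
    compiled_model = core.compile_model(model, "AUTO")

    # Dummy input matching the first input's (static) shape; FP32 is assumed here.
    dummy_input = np.zeros(list(compiled_model.input(0).shape), dtype=np.float32)

    infer_request = compiled_model.create_infer_request()
    infer_request.infer({0: dummy_input})
    print(infer_request.get_output_tensor(0).data.shape)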



Device Candidates and Priority
++++++++++++++++++++++++++++++
@@ -303,7 +321,7 @@ If device priority is specified when using CUMULATIVE_THROUGHPUT, AUTO will run

.. code-block:: sh
compiled_model = core.compile_model(model, "AUTO:GPU,CPU", {"PERFORMANCE_HINT" : {"CUMULATIVE_THROUGHPUT"}})
compiled_model = core.compile_model(model, "AUTO:GPU,CPU", {hints.performance_mode: hints.PerformanceMode.CUMULATIVE_THROUGHPUT})

.. tab-item:: C++
:sync: cpp
@@ -322,7 +340,7 @@ If AUTO is used without specifying any device names, and if there are multiple G

.. code-block:: sh
compiled_model = core.compile_model(model, "AUTO:GPU.1,GPU.0", {"PERFORMANCE_HINT" : {"CUMULATIVE_THROUGHPUT"})
compiled_model = core.compile_model(model, "AUTO:GPU.1,GPU.0", {hints.performance_mode: hints.PerformanceMode.CUMULATIVE_THROUGHPUT})

.. tab-item:: C++
:sync: cpp
File renamed without changes.
File renamed without changes.
@@ -2,8 +2,6 @@

@sphinxdirective

.. _deep learning model optimizer:

.. toctree::
:maxdepth: 1
:hidden:
@@ -8,9 +8,6 @@ With model conversion API you can increase your model's efficiency by providing
:description: Learn how to increase the efficiency of a model with MO by providing an additional shape definition with the input_shape and static_shape parameters.


.. _when_to_specify_input_shapes:


Specifying input_shape parameter
################################

@@ -24,7 +24,7 @@ To use the C++ benchmark_app, you must first build it following the :doc:`Build

If you installed OpenVINO Runtime using PyPI or Anaconda Cloud, only the :doc:`Benchmark Python Tool <openvino_inference_engine_tools_benchmark_tool_README>` is available, and you should follow the usage instructions on that page instead.

The benchmarking application works with models in the OpenVINO IR (``model.xml`` and ``model.bin``) and ONNX (``model.onnx``) formats. Make sure to :doc:`convert your models <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` if necessary.
The benchmarking application works with models in the OpenVINO IR, TensorFlow, TensorFlow Lite, PaddlePaddle, PyTorch and ONNX formats. If you need it, OpenVINO also allows you to :doc:`convert your models <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>`.

To run benchmarking with default options on a model, use the following command:

34 changes: 22 additions & 12 deletions docs/articles_en/openvino_workflow.md
@@ -20,29 +20,39 @@
pytorch_2_0_torch_compile


.. image:: ./_static/images/model_conversion_diagram.svg
:alt: model conversion diagram

OpenVINO offers multiple workflows, depending on the use case and personal or project preferences.
The diagram above is only a rough representation of the available options, but this section will
give you a detailed view of how you can go from preparing your model, through optimizing it,
to executing inference, and deploying your solution.


| :doc:`Model Preparation <openvino_docs_model_processing_introduction>`
| With model conversion API guide, you will learn to convert pre-trained models for use with OpenVINO™. You can use your own models or choose some from a broad selection in online databases, such as `TensorFlow Hub <https://tfhub.dev/>`__, `Hugging Face <https://huggingface.co/>`__, `Torchvision models <https://pytorch.org/hub/>`__..
| Learn how to convert pre-trained models to OpenVINO IR, using different approaches for more convenience or higher performance.
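
A minimal sketch of this step (assuming an ONNX file ``model.onnx``; other supported frameworks follow the same pattern) using the ``openvino.convert_model`` Python API:

.. code-block:: python

    import openvino as ov

    # Convert the framework model in memory, then save it as OpenVINO IR.
    ov_model = ov.convert_model("model.onnx")
    ov.save_model(ov_model, "model.xml")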


| :doc:`Model Optimization and Compression <openvino_docs_model_optimization_guide>`
| In this section you will find out how to optimize a model to achieve better inference performance. It describes multiple optimization methods for both the training and post-training stages.
| Find out how to optimize a model to achieve better inference performance, utilizing multiple optimization methods for both in-training compression and post-training quantization.
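
A hedged sketch of post-training quantization with NNCF (assuming the ``nncf`` package is installed and an IR file ``model.xml``; the calibration samples and input shape below are placeholders for this sketch):

.. code-block:: python

    import nncf
    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")

    # Placeholder calibration samples; in practice, use a few hundred representative inputs.
    # The shape (1, 3, 224, 224) is only an assumption for this sketch.
    calibration_data = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]

    # The second argument maps each data item to the model input(s).
    calibration_dataset = nncf.Dataset(calibration_data, lambda item: item)

    # 8-bit post-training quantization, no retraining required.
    quantized_model = nncf.quantize(model, calibration_dataset)
    ov.save_model(quantized_model, "model_int8.xml")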


| :doc:`Running Inference <openvino_docs_OV_UG_OV_Runtime_User_Guide>`
| This section explains describes how to run inference which is the most basic form of deployment and the quickest way of launching inference.
| See how to run inference with OpenVINO, which is the most basic form of deployment, and the quickest way of running a deep learning model.

| :doc:`Deployment Option 1. Using OpenVINO Runtime <openvino_deployment_guide>`
| Deploy a model locally, reading the file directly from your application and utilizing resources available to the system.
| Deployment on a local system uses the steps described in the section on running inference.

Once you have a model that meets both OpenVINO™ and your requirements, you can choose how to deploy it with your application.

| :doc:`Deployment Option 2. Using Model Server <ovms_what_is_openvino_model_server>`
| Deploy a model remotely, connecting your application to an inference server and utilizing external resources, with no impact on the app's performance.
| Deployment on OpenVINO Model Server is quick and does not require any additional steps described in the section on running inference.

| :doc:`Option 1. Deployment via OpenVINO Runtime <openvino_deployment_guide>`
| Local deployment uses OpenVINO Runtime that is called from, and linked to, the application directly.
| It utilizes resources available to the system and provides the quickest way of launching inference.
| Deployment on a local system requires performing the steps from the running inference section.

| :doc:`Deployment Option 3. Using torch.compile for PyTorch 2.0 <pytorch_2_0_torch_compile>`
| Deploy a PyTorch model using OpenVINO in a PyTorch-native application.
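
A minimal, hedged sketch of this option (assuming PyTorch 2.x, torchvision, and the OpenVINO backend for ``torch.compile`` shipped with the OpenVINO Python package):

.. code-block:: python

    import torch
    import torchvision.models as models
    import openvino.torch  # registers the "openvino" backend for torch.compile

    model = models.resnet18(weights=None).eval()

    # Compile the PyTorch model with the OpenVINO backend; the workflow stays PyTorch-native.
    compiled_model = torch.compile(model, backend="openvino")

    with torch.no_grad():
        output = compiled_model(torch.randn(1, 3, 224, 224))
    print(output.shape)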

| :doc:`Option 2. Deployment via Model Server <ovms_what_is_openvino_model_server>`
| Deployment via OpenVINO Model Server allows the application to connect to the inference server set up remotely.
| This way inference can use external resources instead of those available to the application itself.
| Deployment on a model server can be done quickly and without performing any additional steps described in the running inference section.


@endsphinxdirective