Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix execution of WPS remote providers and corresponding metadata #342

Merged
merged 17 commits into from
Sep 29, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
b020a39
auto-handle recoverable invalid XML characters (help support remote p…
fmigneault Sep 28, 2021
c522f5a
fix links reported by WPS provider process (was missing provider part…
fmigneault Sep 28, 2021
6b4277e
fix invalid doc setting references
fmigneault Sep 28, 2021
94f6e86
refactor provider views/utils + block provider on local-only weaver (…
fmigneault Sep 28, 2021
a82fcfe
add end-2-end test of remote WPS-1 provider based on Hummingbird ncdu…
fmigneault Sep 28, 2021
df70240
add reference to validation of provider exec body (fixes #340)
fmigneault Sep 28, 2021
c7d8c04
fix resolution of empty provider list during schema validation (fixes…
fmigneault Sep 28, 2021
20bf6a8
document fix of schemas for query parameters with invalid resolution…
fmigneault Sep 28, 2021
49b5799
fix import lint
fmigneault Sep 28, 2021
dd0ab92
handle HYBRID mode as local process vs EMS resolving it as remote + d…
fmigneault Sep 28, 2021
32e19bf
Merge branch 'master' into fixes-wps-providers
fmigneault Sep 28, 2021
dca3c04
patch test following updated OpenAPI query reference naming (https://…
fmigneault Sep 28, 2021
e943543
add missing parser param for lxml util to combine OWSLib/pywps/Weaver…
fmigneault Sep 28, 2021
25a00d6
fix lint security on test tmp dir
fmigneault Sep 28, 2021
ab6495a
add more explicit messages about cause of error for register/listing …
fmigneault Sep 29, 2021
f989d82
detail WeaverException defintions to auto-generate appropriate HTTP e…
fmigneault Sep 29, 2021
30e4c5e
fix doc lint
fmigneault Sep 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 31 additions & 2 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,40 @@ Changes

Changes:
--------
- No change.
- Improve reporting of mismatching `Weaver` configuration for `Process` and `Application Package` definitions that
always require remote execution. Invalid combinations will be raised during execution with detailed problem.
- Forbid `Provider` and applicable `Process` definitions to be deployed, executed or queried when corresponding remote
execution is not supported according to `Weaver` instance configuration since `Provider` must be accessed remotely.
- Refactor endpoint views and utilities referring to `Provider` operations into appropriate modules.
- Apply ``weaver.configuration = HYBRID`` by default in example INI configuration since it is the most common use case.
Apply same configuration by default in tests. Default resolution still employs ``DEFAULT`` for backward compatibility
in case the setting was omitted completely from a custom INI file.
- Add query parameter ``ignore`` to ``GET /providers`` listing in order to obtain full validation of
remote providers (including XML contents parsing) to return ``200``. Invalid definitions will raise
and return a ``[422] Unprocessable Entity`` HTTP error.
- Add more explicit messages about the problem that produced an error (XML parsing, unreachable WPS, etc.) and which
caused request failure when attempting registration of a remote `Provider`.

Fixes:
------
- No change.
- Fix reported ``links`` by processes nested under a provider ``Service``.
Generated URL references were omitting the ``/providers/{id}`` portion.
- Fix documentation referring to incorrect setting name in some cases for WPS outputs configuration.
- Fix strict XML parsing failing resolution of some remote WPS providers with invalid characters such as ``<``, ``<=``
within process description fields. Although invalid, those easily recoverable errors will be handled by the parser.
- Fix resolution and execution of WPS-1 remote `Provider` and validate it against end-to-end test procedure from
scratch `Service` registration down to results retrieval
(fixes `#340 <https://github.com/crim-ca/weaver/issues/340>`_).
- Fix resolution of applicable `Provider` listing schema validation when none have been registered
(fixes `#339 <https://github.com/crim-ca/weaver/issues/339>`_).
- Fix incorrect schema definition of `Process` items for ``GET /processes`` response that did not report the
alternative identifier-only listing when ``detail=false`` query is employed.
- Fix incorrect reporting of documented OpenAPI reference definitions for ``query`` parameters with same names shared
across multiple endpoints. Fix is directly applied on relevant reference repository that generates OpenAPI schemas
(see `fmigneault/cornice.ext.swagger@70eb702 <https://github.com/fmigneault/cornice.ext.swagger/commit/70eb702>`_).
- Fix ``weaver.exception`` definitions such that raising them directly will employ the corresponding ``HTTPException``
code (if applicable) to generate the appropriate error response automatically when raising them directly without
further handling. The order of class inheritance were always using ``500`` due to ``WeaverException`` definition.

`4.0.0 <https://github.com/crim-ca/weaver/tree/4.0.0>`_ (2021-09-21)
========================================================================
Expand Down
7 changes: 6 additions & 1 deletion config/weaver.ini.example
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,12 @@ cache.result.enabled = false
# responsibility to correctly map your server routes to internal routes after applying the server-side operation.

# --- Weaver Configuration ---
weaver.configuration = ems
# See documentation for more details
# DEFAULT: current instance offers a minimal subset of operations as 'basic' OGC-API - Processes (i.e.: no deploy)
dbyrns marked this conversation as resolved.
Show resolved Hide resolved
# ADES: current instance deploys and executes applications locally (disable remote, providers and Workflow support)
# EMS: current instance dispatches execution to other remote ADES or WPS providers
# HYBRID: current instance does both ADES and EMS roles with flexible combinations depending on situation
weaver.configuration = HYBRID
weaver.url = http://localhost:4001

# Static endpoint to external schema locations for reference in responses.
Expand Down
4 changes: 2 additions & 2 deletions docs/source/processes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -585,8 +585,8 @@ combinations.

.. [#file2http]
When a ``file://`` (or empty scheme) maps to a local file that needs to be exposed externally for
another remote process, the conversion to ``http(s)://`` scheme employs setting ``weaver.wps_outputs_url`` to form
the result URL reference. The file is placed in ``weaver.wps_outputs_dir`` to expose it as HTTP(S) endpoint.
another remote process, the conversion to ``http(s)://`` scheme employs setting ``weaver.wps_output_url`` to form
the result URL reference. The file is placed in ``weaver.wps_output_dir`` to expose it as HTTP(S) endpoint.

.. [#wps3]
When the process refers to a remote :ref:`WPS-REST` process (i.e.: remote :term:`WPS` instance that supports
Expand Down
4 changes: 2 additions & 2 deletions tests/functional/test_builtin.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

import pytest

from tests.functional.utils import WpsPackageConfigBase
from tests.functional.utils import WpsConfigBase
from tests.utils import get_settings_from_testapp, mocked_execute_process, mocked_sub_requests
from weaver.execute import (
EXECUTE_CONTROL_OPTION_ASYNC,
Expand All @@ -22,7 +22,7 @@


@pytest.mark.functional
class BuiltinAppTest(WpsPackageConfigBase):
class BuiltinAppTest(WpsConfigBase):
@classmethod
def setUpClass(cls):
cls.settings = {
Expand Down
4 changes: 2 additions & 2 deletions tests/functional/test_docker_app.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
import pytest
from owslib.wps import ComplexDataInput, WPSExecution

from tests.functional.utils import WpsPackageConfigBase
from tests.functional.utils import WpsConfigBase
from tests.utils import mocked_execute_process, mocked_sub_requests
from weaver import xml_util
from weaver.execute import EXECUTE_MODE_ASYNC, EXECUTE_RESPONSE_DOCUMENT, EXECUTE_TRANSMISSION_MODE_REFERENCE
Expand All @@ -16,7 +16,7 @@


@pytest.mark.functional
class WpsPackageDockerAppTest(WpsPackageConfigBase):
class WpsPackageDockerAppTest(WpsConfigBase):
@classmethod
def setUpClass(cls):
cls.settings = {
Expand Down
6 changes: 3 additions & 3 deletions tests/functional/test_wps_package.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
from pyramid.httpexceptions import HTTPBadRequest

from tests import resources
from tests.functional.utils import WpsPackageConfigBase
from tests.functional.utils import WpsConfigBase
from tests.utils import (
MOCK_AWS_REGION,
mocked_aws_credentials,
Expand Down Expand Up @@ -77,7 +77,7 @@


@pytest.mark.functional
class WpsPackageAppTest(WpsPackageConfigBase):
class WpsPackageAppTest(WpsConfigBase):
@classmethod
def setUpClass(cls):
cls.settings = {
Expand Down Expand Up @@ -1964,7 +1964,7 @@ def test_multi_outputs_file_from_wps_xml_reference(self):


@pytest.mark.functional
class WpsPackageAppWithS3BucketTest(WpsPackageConfigBase):
class WpsPackageAppWithS3BucketTest(WpsConfigBase):
@classmethod
def setUpClass(cls):
cls.settings = {
Expand Down
248 changes: 248 additions & 0 deletions tests/functional/test_wps_provider.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,248 @@
import contextlib
from typing import TYPE_CHECKING

import mock
import pyramid.testing
import pytest

from tests import resources
from tests.functional.utils import WpsConfigBase
from tests.utils import (
mocked_execute_process,
mocked_remote_server_requests_wps1,
mocked_sub_requests,
setup_mongodb_jobstore,
setup_mongodb_processstore,
setup_mongodb_servicestore
)
from weaver.config import WEAVER_CONFIGURATION_HYBRID
from weaver.execute import (
EXECUTE_CONTROL_OPTION_ASYNC,
EXECUTE_MODE_ASYNC,
EXECUTE_RESPONSE_DOCUMENT,
EXECUTE_TRANSMISSION_MODE_REFERENCE
)
from weaver.formats import CONTENT_TYPE_APP_NETCDF, CONTENT_TYPE_TEXT_PLAIN, CONTENT_TYPE_TEXT_XML
from weaver.processes.types import PROCESS_WPS_REMOTE
from weaver.processes.wps1_process import Wps1Process

if TYPE_CHECKING:
from responses import RequestsMock

from tests.utils import MockPatch


# pylint: disable=C0103,invalid-name
@pytest.mark.functional
class WpsProviderTest(WpsConfigBase):
settings = {
# NOTE: important otherwise cannot execute "remote" provider (default local only)
"weaver.configuration": WEAVER_CONFIGURATION_HYBRID,
"weaver.wps_output_dir": "/tmp", # nosec: B108 # don't care hardcoded for test
"weaver.wps_output_url": resources.TEST_REMOTE_SERVER_URL + "/wps-outputs"
}

@classmethod
def tearDownClass(cls):
pyramid.testing.tearDown()

def setUp(self):
# rebuild clean db on each test
self.service_store = setup_mongodb_servicestore(self.config)
self.process_store = setup_mongodb_processstore(self.config)
self.job_store = setup_mongodb_jobstore(self.config)
self.app_url = self.settings["weaver.url"]

@mocked_remote_server_requests_wps1([
resources.TEST_REMOTE_SERVER_URL,
resources.TEST_INVALID_ESCAPE_CHARS_GETCAP_WPS1_XML,
[],
])
def test_register_finch_with_invalid_escape_chars(self):
"""
Remote provider that has invalid characters that should be escaped (<, <=) but should succeeds registration.

This test ensure that temporary patch of XML parser with *permissive* invalid characters that can be recovered
fmigneault marked this conversation as resolved.
Show resolved Hide resolved
in some obvious cases are handled (e.g.: values within a description string field).

.. seealso::
- Parameter ``recover`` from instance :data:`weaver.xml_util.XML_PARSER` and
override :meth:`weaver.xml_util.fromstring` employed by :mod:`OWSLib` during WPS-XML requests.
- Comments in :data:`resources.TEST_INVALID_ESCAPE_CHARS_GETCAP_WPS1_XML` file describe bad characters.
"""
self.service_store.clear_services()

# register the provider
remote_provider_name = "test-wps-remote-provider-finch"
path = "/providers"
data = {"id": remote_provider_name, "url": resources.TEST_REMOTE_SERVER_URL}
resp = self.app.post_json(path, params=data, headers=self.json_headers)
assert resp.status_code == 201

# validate service capabilities
path = "/providers/{}".format(remote_provider_name)
resp = self.app.get(path, headers=self.json_headers)
assert resp.status_code == 200

@mocked_remote_server_requests_wps1([
resources.TEST_REMOTE_SERVER_URL,
resources.TEST_HUMMINGBIRD_GETCAP_WPS1_XML,
[resources.TEST_HUMMINGBIRD_DESCRIBE_WPS1_XML],
])
def test_register_describe_execute_ncdump(self, mock_responses):
# type: (RequestsMock) -> None
"""
Test the full workflow from remote WPS-1 provider registration, process description, execution and fetch result.

The complete execution and definitions (XML responses) of the "remote" WPS are mocked.
Requests and response negotiation between Weaver and that "remote" WPS are effectively executed and validated.

Validation is accomplished against the same process and mocked server from corresponding test deployment
server in order to detect early any breaking feature. Responses XML bodies employed to simulate the mocked
server are pre-generated from real request calls to the actual service that was running on a live platform.

.. seealso::
- Reference notebook testing the same process on a live server:
https://github.com/Ouranosinc/pavics-sdi/blob/master/docs/source/notebook-components/weaver_example.ipynb
- Evaluate format of submitted Execute body (see `#340 <https://github.com/crim-ca/weaver/issues/340>`_).
"""
self.service_store.clear_services()

# register the provider
remote_provider_name = "test-wps-remote-provider-hummingbird"
path = "/providers"
data = {"id": remote_provider_name, "url": resources.TEST_REMOTE_SERVER_URL}
resp = self.app.post_json(path, params=data, headers=self.json_headers)
assert resp.status_code == 201

# validate service capabilities
path = "/providers/{}".format(remote_provider_name)
resp = self.app.get(path, headers=self.json_headers)
assert resp.status_code == 200
body = resp.json
assert "id" in body and body["id"] == remote_provider_name
assert "hummingbird" in body["title"].lower()
assert body["type"] == PROCESS_WPS_REMOTE

# validate processes capabilities
path = "/providers/{}/processes".format(remote_provider_name)
resp = self.app.get(path, headers=self.json_headers)
body = resp.json
assert resp.status_code == 200
assert "processes" in body and len(body["processes"]) == 14 # in TEST_HUMMINGBIRD_GETCAP_WPS1_XML
processes = {process["id"]: process for process in body["processes"]}
assert "ncdump" in processes
assert processes["ncdump"]["version"] == "4.4.1.1"
assert processes["ncdump"]["metadata"][0]["rel"] == "birdhouse"
assert processes["ncdump"]["metadata"][1]["rel"] == "user-guide"
# keyword 'Hummingbird' in this case is from GetCapabilities ProviderName
# keyword of the service name within Weaver is also provided, which can be different than provider
expect_keywords = [PROCESS_WPS_REMOTE, "Hummingbird", remote_provider_name]
assert all(key in processes["ncdump"]["keywords"] for key in expect_keywords)
proc_desc_url = processes["ncdump"]["processDescriptionURL"]
proc_wps1_url = processes["ncdump"]["processEndpointWPS1"]
proc_exec_url = processes["ncdump"]["executeEndpoint"]
assert proc_wps1_url.startswith(resources.TEST_REMOTE_SERVER_URL)
assert proc_desc_url == self.app_url + path + "/ncdump"
assert proc_exec_url == self.app_url + path + "/ncdump/jobs"

# validate process description
resp = self.app.get(proc_desc_url, headers=self.json_headers)
assert resp.status_code == 200
body = resp.json

assert "inputs" in body and len(body["inputs"]) == 2
assert all(iid in body["inputs"] for iid in ["dataset", "dataset_opendap"])
assert body["inputs"]["dataset"]["minOccurs"] == 0
assert body["inputs"]["dataset"]["maxOccurs"] == 100
assert "formats" in body["inputs"]["dataset"]
assert len(body["inputs"]["dataset"]["formats"]) == 1
assert body["inputs"]["dataset"]["formats"][0]["default"] is True
assert "literalDataDomains" not in body["inputs"]["dataset"]
assert body["inputs"]["dataset"]["formats"][0]["mediaType"] == CONTENT_TYPE_APP_NETCDF
assert body["inputs"]["dataset_opendap"]["minOccurs"] == 0
assert body["inputs"]["dataset_opendap"]["maxOccurs"] == 100
assert "formats" not in body["inputs"]["dataset_opendap"]
assert "literalDataDomains" in body["inputs"]["dataset_opendap"]
assert len(body["inputs"]["dataset_opendap"]["literalDataDomains"]) == 1
assert body["inputs"]["dataset_opendap"]["literalDataDomains"][0]["dataType"]["name"] == "string"
assert body["inputs"]["dataset_opendap"]["literalDataDomains"][0]["valueDefinition"]["anyValue"] is True
assert body["inputs"]["dataset_opendap"]["literalDataDomains"][0]["default"] is True

assert "outputs" in body and len(body["outputs"]) == 1
assert "output" in body["outputs"]
assert "formats" in body["outputs"]["output"]
assert len(body["outputs"]["output"]["formats"]) == 1
assert body["outputs"]["output"]["formats"][0]["default"] is True
assert body["outputs"]["output"]["formats"][0]["mediaType"] == CONTENT_TYPE_TEXT_PLAIN
assert "literalDataDomains" not in body["outputs"]["output"]

assert body["processDescriptionURL"] == proc_desc_url
assert body["processEndpointWPS1"] == proc_wps1_url
assert body["executeEndpoint"] == proc_exec_url
job_exec_url = proc_exec_url.replace("/execution", "/jobs") # both are aliases, any could be returned
ogc_exec_url = proc_exec_url.replace("/jobs", "/execution")
assert any(link["href"] in [job_exec_url, ogc_exec_url] and link["rel"] == "execute" for link in body["links"])
assert any(link["href"] == proc_desc_url and link["rel"] == "process-desc" for link in body["links"])
# WPS-1 URL also includes relevant query parameters to obtain a valid response
assert any(link["href"].startswith(proc_wps1_url) and link["rel"] == "service-desc" for link in body["links"])

assert EXECUTE_CONTROL_OPTION_ASYNC in body["jobControlOptions"]
assert EXECUTE_TRANSMISSION_MODE_REFERENCE in body["outputTransmission"]

# validate execution submission
# (don't actually execute because server is mocked, only validate parsing of I/O and job creation)

# first setup all expected contents and files
exec_file = "http://localhost.com/dont/care.nc"
exec_body = {
"mode": EXECUTE_MODE_ASYNC,
"response": EXECUTE_RESPONSE_DOCUMENT,
"inputs": [{"id": "dataset", "href": exec_file}],
"outputs": [{"id": "output", "transmissionMode": EXECUTE_TRANSMISSION_MODE_REFERENCE}]
}
status_url = resources.TEST_REMOTE_SERVER_URL + "/status.xml"
output_url = resources.TEST_REMOTE_SERVER_URL + "/output.txt"
with open(resources.TEST_HUMMINGBIRD_STATUS_WPS1_XML) as status_file:
status = status_file.read().format(
TEST_SERVER_URL=resources.TEST_REMOTE_SERVER_URL,
LOCATION_XML=status_url,
OUTPUT_FILE=output_url,
)

xml_headers = {"Content-Type": CONTENT_TYPE_TEXT_XML}
ncdump_data = "Fake NetCDF Data"
with contextlib.ExitStack() as stack_exec:
# mock direct execution bypassing celery
for mock_exec in mocked_execute_process():
stack_exec.enter_context(mock_exec)
# mock responses expected by "remote" WPS-1 Execute request and relevant documents
mock_responses.add("GET", exec_file, body=ncdump_data, headers={"Content-Type": CONTENT_TYPE_APP_NETCDF})
mock_responses.add("POST", resources.TEST_REMOTE_SERVER_URL, body=status, headers=xml_headers)
mock_responses.add("GET", status_url, body=status, headers=xml_headers)
mock_responses.add("GET", output_url, body=ncdump_data, headers={"Content-Type": CONTENT_TYPE_TEXT_PLAIN})

# add reference to specific provider execute class to validate it was called
# (whole procedure must run even though a lot of parts are mocked)
real_wps1_process_execute = Wps1Process.execute
handle_wps1_process_execute = stack_exec.enter_context(
mock.patch.object(Wps1Process, "execute", side_effect=real_wps1_process_execute, autospec=True)
) # type: MockPatch

# launch job execution and validate
resp = mocked_sub_requests(self.app, "post_json", proc_exec_url, timeout=5,
data=exec_body, headers=self.json_headers, only_local=True)
assert resp.status_code in [200, 201], "Failed with: [{}]\nReason:\n{}".format(resp.status_code, resp.json)
assert handle_wps1_process_execute.called, "WPS-1 handler should have been called by CWL runner context"

status_url = resp.json["location"]
job_id = resp.json["jobID"]
assert status_url == proc_exec_url + "/" + job_id
results = self.monitor_job(status_url)
output_url = "{}/{}/{}".format(self.settings["weaver.wps_output_url"], job_id, "output.txt")
output_path = "{}/{}/{}".format(self.settings["weaver.wps_output_dir"], job_id, "output.txt")
assert results["output"]["format"]["mediaType"] == CONTENT_TYPE_TEXT_PLAIN
assert results["output"]["href"] == output_url
with open(output_path) as out_file:
data = out_file.read()
assert data == ncdump_data
Loading