Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest/tableau): project_path_pattern use in _is_denied_project #12010

Merged
merged 10 commits into from
Dec 11, 2024

Conversation

sid-acryl
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Dec 3, 2024
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Dec 3, 2024
@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Dec 3, 2024
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Dec 6, 2024
Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall this code still confuses me

why do we have both is_allowed_project and is_denied_project? why isn't one method sufficient here? also, ideally we'd just use AllowDenyPattern's built-in methods instead of re-implementing a bunch of regex matching logic here

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Dec 6, 2024
@sid-acryl
Copy link
Contributor Author

overall this code still confuses me

why do we have both is_allowed_project and is_denied_project? why isn't one method sufficient here? also, ideally we'd just use AllowDenyPattern's built-in methods instead of re-implementing a bunch of regex matching logic here

@hsheth2
It took me while to recall why I did it like this. I added a below doc string in the function _is_denied_project and also renamed the AllowDenyPattern._denied to AllowDenyPattern.denied to reuse the regx logic.

    """
    Why use an explicit denial check instead of the `AllowDenyPattern.allowed` method?

    Consider a scenario where a Tableau site contains four projects: A, B, C, and D, with the following hierarchical relationship:

    - **A**
      - **B** (Child of A)
      - **C** (Child of A)
    - **D**

    In this setup:

    - `project_pattern` is configured with `allow: ["A"]` and `deny: ["B"]`.
    - `extract_project_hierarchy` is set to `True`.

    The goal is to extract assets from project A and its children while explicitly denying the child project B.

    If we rely solely on the `project_pattern.allowed()` method, project C's assets will not be ingested.
    This happens because project C is not explicitly included in the `allow` list, nor is it part of the `deny` list.
    However, since `extract_project_hierarchy` is enabled, project C should ideally be included in the ingestion process unless explicitly denied.

    To address this, the function explicitly checks the deny regex to ensure that project C’s assets are ingested if it is not specifically denied in the deny list. This approach ensures that the hierarchy is respected while adhering to the configured allow/deny rules.
    """

I will create a separate PR to add test cases for this scenario.

@hsheth2
Copy link
Collaborator

hsheth2 commented Dec 9, 2024

@sid-acryl let's create some unit tests for that scenario

@sid-acryl
Copy link
Contributor Author

@sid-acryl let's create some unit tests for that scenario

@hsheth2
Added test-case here #12079

@hsheth2 hsheth2 changed the title fix(tableau/ingestion): project_path_pattern use in _is_denied_project fix(ingest/tableau): project_path_pattern use in _is_denied_project Dec 11, 2024
@hsheth2 hsheth2 merged commit d062411 into datahub-project:master Dec 11, 2024
72 of 73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants