chore(tableau-exposer-crawler): using batch API to fetch data #24

Merged
merged 3 commits into from
Jan 25, 2024

samanmasarat
Contributor

Overview: Improving crawler efficiency

The current Tableau exposure crawler retrieves native and custom SQL queries, along with their model references, one workbook at a time. For each workbook, it calls the Tableau API to fetch the workbook details and owner metadata. Because these requests run sequentially and each one authenticates separately, the runtime grows significantly with the number of workbooks.

Proposed Changes: Using batch approach

This PR replaces the per-workbook requests with a batch approach. Instead of calling the API once per workbook, the crawler uses the batch APIs to fetch all Tableau workbooks and all users in two separate calls. The retrieved data is kept in memory, so subsequent steps look up workbook and owner details locally rather than calling the API again, which reduces the runtime.
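The batch approach described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Workbook` and `User` types, the `attach_owners` function, and the two `fetch_all_*` callables are hypothetical stand-ins for whatever the crawler uses to call the Tableau batch endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Workbook:
    # Hypothetical minimal shape of a workbook record.
    id: str
    name: str
    owner_id: str


@dataclass(frozen=True)
class User:
    # Hypothetical minimal shape of a user record.
    id: str
    name: str


def attach_owners(
    fetch_all_workbooks: Callable[[], Iterable[Workbook]],
    fetch_all_users: Callable[[], Iterable[User]],
) -> List[Dict[str, Optional[str]]]:
    """Resolve each workbook's owner using exactly two batch calls.

    Both callables are assumptions standing in for the real batch
    requests; each is invoked once, and all joins happen in memory.
    """
    workbooks = list(fetch_all_workbooks())  # batch call 1
    users_by_id = {user.id: user for user in fetch_all_users()}  # batch call 2

    resolved = []
    for workbook in workbooks:
        # Dictionary lookup replaces one API request per workbook.
        owner = users_by_id.get(workbook.owner_id)
        resolved.append(
            {
                "workbook": workbook.name,
                "owner": owner.name if owner is not None else None,
            }
        )
    return resolved
```

With this shape, the number of API calls stays constant at two regardless of how many workbooks exist, whereas the sequential version issues requests proportional to the workbook count.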

Expected Impact: Runtime reduction

Currently, Voi has 502 workbooks (with native and custom SQL references), and the exposure process takes approximately 25 minutes to run. Local testing of this PR shows the runtime dropping to about 30 seconds, roughly a 50x reduction.

Merging this PR should make the Tableau exposure process substantially faster.

Collaborator

@gabby-dol gabby-dol left a comment

lgtm

@samanmasarat samanmasarat merged commit 7074c91 into main Jan 25, 2024
2 checks passed
@samanmasarat samanmasarat deleted the tableau-exposure-crawler-refactor branch January 25, 2024 13:27
2 participants