chore(tableau-exposer-crawler): using batch API to fetch data #24
Overview: Improving crawler efficiency
In the current Tableau exposure crawler, native and custom SQL queries, along with their respective model references, are retrieved individually. The crawler iterates over each workbook and calls the Tableau API to fetch the workbook details and the owner's metadata. Because these API calls run sequentially and every request is authenticated separately, the runtime grows significantly as the number of workbooks increases.
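To make the cost of the per-workbook pattern concrete, here is a minimal sketch that only counts traffic. `CountingClient`, `list_workbook_ids`, `get_workbook` and `get_user` are hypothetical stand-ins, not the crawler's real code or the Tableau REST API, and the SQL and model-reference lookups are left out. Under these assumptions, N workbooks cost 1 + 2N requests, each preceded by its own sign-in (about 1,005 of each for 502 workbooks).

```python
from dataclasses import dataclass


@dataclass
class CountingClient:
    """Hypothetical stand-in for a Tableau API client that only counts traffic."""

    workbooks: dict[str, dict]  # workbook_id -> workbook details
    users: dict[str, dict]      # user_id -> user metadata
    requests: int = 0
    sign_ins: int = 0

    def _request(self) -> None:
        # Model the current behaviour: authenticate before every request.
        self.sign_ins += 1
        self.requests += 1

    def list_workbook_ids(self) -> list[str]:
        self._request()
        return list(self.workbooks)

    def get_workbook(self, workbook_id: str) -> dict:
        self._request()
        return self.workbooks[workbook_id]

    def get_user(self, user_id: str) -> dict:
        self._request()
        return self.users[user_id]


def crawl_per_workbook(client: CountingClient) -> list[dict]:
    """Sequential pattern: one details request and one owner request per workbook."""
    exposures = []
    for workbook_id in client.list_workbook_ids():
        workbook = client.get_workbook(workbook_id)
        owner = client.get_user(workbook["owner_id"])
        exposures.append({"name": workbook["name"], "owner_email": owner["email"]})
    return exposures
```

With three workbooks, `crawl_per_workbook` leaves `client.requests == 7` (one listing call plus two calls per workbook), and the count keeps growing linearly with the number of workbooks.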
Proposed Changes: Using a batch approach
This PR replaces the existing per-workbook logic with a more efficient batch approach. Instead of making individual API requests for each workbook, the crawler uses batch APIs to fetch all Tableau workbooks and all users in two separate calls. The retrieved data is then kept in memory, so subsequent steps can look it up locally instead of calling the API again, which reduces the processing runtime.
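The batch approach can be sketched as two up-front fetches followed by an in-memory join. This is a minimal sketch, not the PR's implementation: `fetch_all_workbooks` and `fetch_all_users` are hypothetical callables standing in for the two batch API calls, and the dictionary keys (`id`, `owner_id`, `name`, `email`) are assumed field names.

```python
from typing import Callable, Optional


def crawl_batched(
    fetch_all_workbooks: Callable[[], list[dict]],
    fetch_all_users: Callable[[], list[dict]],
) -> list[dict]:
    """Fetch everything up front in two calls, then join workbooks to owners in memory."""
    workbooks = fetch_all_workbooks()  # batch call 1 (hypothetical)
    users = fetch_all_users()          # batch call 2 (hypothetical)

    # Index users by id so each owner lookup is a dict access
    # instead of a network round trip.
    users_by_id = {user["id"]: user for user in users}

    exposures = []
    for workbook in workbooks:
        owner: Optional[dict] = users_by_id.get(workbook["owner_id"])
        exposures.append(
            {
                "name": workbook["name"],
                # Tolerate owners missing from the user list instead of failing.
                "owner_email": owner["email"] if owner else None,
            }
        )
    return exposures
```

For example, `crawl_batched(lambda: [{"name": "Revenue", "owner_id": "u1"}], lambda: [{"id": "u1", "email": "a@example.com"}])` returns `[{"name": "Revenue", "owner_email": "a@example.com"}]`. The number of API calls stays at two no matter how many workbooks exist; only the in-memory loop scales with N.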
Expected Impact: Runtime reduction
Voi currently has 502 workbooks with native and custom SQL references, and the exposure process takes approximately 25 minutes to run. Local testing of this PR shows the runtime dropping to approximately 30 seconds.
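As a back-of-the-envelope check on the reported figures (both approximate), the change works out to roughly a 50x speedup, or from about 3 seconds per workbook to well under a tenth of a second:

```python
WORKBOOKS = 502
BEFORE_S = 25 * 60  # ~25 minutes with the per-workbook approach, as reported
AFTER_S = 30        # ~30 seconds with the batch approach, from local testing

speedup = BEFORE_S / AFTER_S                 # 1500 / 30 = 50.0
per_workbook_before = BEFORE_S / WORKBOOKS   # ~2.99 s per workbook
per_workbook_after = AFTER_S / WORKBOOKS     # ~0.06 s per workbook

print(f"speedup: {speedup:.0f}x")
print(f"per workbook: {per_workbook_before:.2f}s -> {per_workbook_after:.2f}s")
```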
By merging this PR, we expect the Tableau exposure process to run much faster and to be simpler to operate.