Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Dataframe): pull method to fetch dataset from remote server #1446

Merged
merged 10 commits into from
Dec 5, 2024

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Dec 4, 2024

Important

Adds remote dataset fetching with a new pull method, updates data loading, and introduces DatasetNotFound exception.

  • Behavior:
    • Adds pull method in DataFrame class in base.py to fetch datasets from a remote server.
    • Updates load function in __init__.py to fetch datasets from a remote server if not found locally.
    • Introduces DatasetNotFound exception in exceptions.py for handling missing datasets.
  • Helpers:
    • Adds get_pandaai_session function in request.py to create a session with API key and URL.
  • Data Loading:
    • Modifies _read_cache in loader.py to include path in DataFrame initialization for cache reading.
  • Misc:
    • Changes in bamboo_vectorstore.py to handle API responses correctly.

This description was created by Ellipsis for ab21002. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri December 4, 2024 20:41
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Dec 4, 2024
@gventuri gventuri marked this pull request as draft December 5, 2024 08:25
@gventuri gventuri marked this pull request as ready for review December 5, 2024 08:26
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to eebfcb5 in 1 minute and 34 seconds

More details
  • Looked at 192 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/dataframe/base.py:262
  • Draft comment:
    Consider using the get_pandaai_session() function to obtain a session and make the request instead of using requests.get directly. This ensures consistent request handling across the codebase.
  • Reason this comment was not posted:
    Marked as duplicate.
2. pandasai/dataframe/base.py:258
  • Draft comment:
    Check if api_key and api_url are None before using them, and raise a PandasAIApiKeyError if they are not set. This prevents potential TypeError when constructing headers or making requests.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_Bi96VOywUjApszj1


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

api_url = os.environ.get("PANDAAI_API_URL", None)
headers = {"accept": "application/json", "x-authorization": f"Bearer {api_key}"}

file_data = requests.get(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the get_pandaai_session() function to obtain a session and make the request instead of using requests.get directly. This ensures consistent request handling across the codebase.

@@ -74,6 +81,21 @@
DataFrame: A new PandasAI DataFrame instance with loaded data.
"""
global _dataset_loader
dataset_full_path = os.path.join(find_project_root(), "datasets", dataset_path)
if not os.path.exists(dataset_full_path):
api_key = os.environ.get("PANDAAI_API_KEY", None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if api_key and api_url are None before using them, and raise a PandasAIApiKeyError if they are not set. This prevents potential TypeError when constructing headers or making requests.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 5fd17ac in 1 minute and 2 seconds

More details
  • Looked at 212 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. pandasai/dataframe/base.py:261
  • Draft comment:
    The error message in PandasAIApiKeyError is misleading. It mentions pushing datasets, but the context is about pulling datasets. Consider changing it to "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server."
  • Reason this comment was not posted:
    Marked as duplicate.
2. pandasai/helpers/request.py:35
  • Draft comment:
    The error message in PandasAIApiKeyError is misleading. It mentions pushing datasets, but the context is about pulling datasets. Consider changing it to "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server."
  • Reason this comment was not posted:
    Marked as duplicate.
3. pandasai/__init__.py:91
  • Draft comment:
            "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server"
  • Reason this comment was not posted:
    Marked as duplicate.
4. pandasai/dataframe/base.py:262
  • Draft comment:
            "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server"
  • Reason this comment was not posted:
    Marked as duplicate.
5. pandasai/helpers/request.py:36
  • Draft comment:
"Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server"
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_vULjCRdxqgSJJMf9


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

api_url = os.environ.get("PANDAAI_API_URL", None)
if not api_url or not api_key:
raise PandasAIApiKeyError(
"Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to push dataset to the remote server"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message in PandasAIApiKeyError is misleading. It mentions pushing datasets, but the context is about pulling datasets. Consider changing it to "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server."

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on ab21002 in 48 seconds

More details
  • Looked at 41 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/dataframe/base.py:261
  • Draft comment:
                "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to pull dataset from the remote server"
  • Reason this comment was not posted:
    Marked as duplicate.
2. pandasai/helpers/request.py:106
  • Draft comment:
            "Set PANDAAI_API_URL and PANDAAI_API_KEY in environment to push/pull dataset from the remote server"
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_kbXHVA7JgpmqAh5q


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/__init__.py Outdated Show resolved Hide resolved
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 5, 2024
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@gventuri gventuri merged commit 61773a8 into release/v3 Dec 5, 2024
0 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants