Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Dataframe): add classmethod get_default_schema to generate default schema #1525

Merged
merged 2 commits into from
Jan 16, 2025

Conversation

scaliseraoul
Copy link

@scaliseraoul scaliseraoul commented Jan 16, 2025

refactor pai.create to accept name, description, columns, removes cache from datasetloader


Important

Add get_default_schema to DataFrame, refactor create to accept additional parameters, and remove caching from DatasetLoader.

  • Behavior:
    • Add get_default_schema class method to DataFrame to generate default schema.
    • Refactor create function in __init__.py to accept name, description, and columns parameters.
    • Remove caching functionality from DatasetLoader in loader.py.
  • Tests:
    • Update tests in test_loader.py, test_pandasai_init.py, and other test files to cover new create function parameters and get_default_schema method.
    • Remove tests related to caching in test_loader.py.
  • Misc:
    • Remove Destination import from base.py and other files where it is no longer used.

This description was created by Ellipsis for 5e38b30. It will automatically update as commits are pushed.

…lt schema, refactor pai.create to accept name, description, columns, removes cache from datasetloader
Copy link
Collaborator

@gventuri gventuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All great, just a minor comment!

assert loader._is_cache_valid("dummy_path") is False

def test_read_cache_parquet(self, sample_schema):
def test_read_csv_or_parquet_parquet(self, sample_schema):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be def test_read_csv_or_parquet?

@scaliseraoul scaliseraoul marked this pull request as ready for review January 16, 2025 15:05
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 16, 2025
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 5e38b30 in 1 minute and 19 seconds

More details
  • Looked at 823 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. pandasai/__init__.py:46
  • Draft comment:
    Add a note in the docstring to specify that each dictionary in columns must contain at least the 'name' key, and optionally 'type' and 'description'.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR refactors the create function to accept additional parameters like name, description, and columns. However, the columns parameter is expected to be a list of dictionaries with specific keys. If the dictionaries do not have the required keys, it will raise an error when trying to create Column objects. This should be documented in the function's docstring to inform users of the expected structure.
2. pandasai/data_loader/loader.py:43
  • Draft comment:
    Caching functionality has been removed. Ensure this change is intentional and consider the performance implications of loading datasets without caching.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR removes caching functionality from the DatasetLoader class. This is a significant change that could impact performance if datasets are frequently loaded from local sources. It's important to ensure that this change is intentional and that the performance implications are acceptable.
3. pandasai/__init__.py:91
  • Draft comment:
    Ensure that the schema has default values for name and description to avoid potential AttributeError if they are accessed without being set.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR modifies the create function to update the schema's name and description based on the provided parameters. However, it does not handle the case where the schema might not have a name or description initially, which could lead to unexpected behavior if these attributes are accessed later.

Workflow ID: wflow_U2xCzZ0U4TdDvt7y


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit c5a2a13 into Sinaptik-AI:release/v3 Jan 16, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants