Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(Dataframe): adding default dataframe name to enable sql query on it, simplified dataframe serialization #1523

Merged
merged 3 commits into from
Jan 15, 2025

Conversation

scaliseraoul
Copy link

@scaliseraoul scaliseraoul commented Jan 15, 2025


Important

Enhance DataFrame handling by adding default SQL table names and simplifying serialization, with updates to templates and tests.

  • Behavior:
    • read_csv() in __init__.py now assigns a default table name using sanitize_sql_table_name().
    • DataFrame in base.py assigns a default name if none is provided, using a column hash.
  • Serialization:
    • Simplified serialize_dataframe() in base.py to remove index parameter.
    • DataframeSerializer in dataframe_serializer.py now only supports CSV-like serialization.
  • Templates:
    • Removed index handling in correct_execute_sql_query_usage_error_prompt.tmpl and generate_python_code_with_sql.tmpl.
  • Helpers:
    • Added sql_sanitizer.py for sanitizing SQL table names.
  • Tests:
    • Updated tests in test_agent.py, test_dataframe_serializer.py, and test_sql_sanitizer.py to reflect changes in serialization and SQL sanitization.

This description was created by Ellipsis for fd22f8c. It will automatically update as commits are pushed.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to fd22f8c in 1 minute and 10 seconds

More details
  • Looked at 446 lines of code in 12 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. pandasai/core/prompts/templates/correct_execute_sql_query_usage_error_prompt.tmpl:1
  • Draft comment:
    The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.
2. pandasai/core/prompts/templates/generate_python_code_with_sql.tmpl:3
  • Draft comment:
    The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.
3. pandasai/core/prompts/templates/shared/dataframe.tmpl:1
  • Draft comment:
    The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.
4. pandasai/dataframe/base.py:140
  • Draft comment:
    The removal of the DataframeSerializerType enum and related logic simplifies the code. Ensure that this change is consistently applied across all relevant files.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The removal of the DataframeSerializerType enum and related logic simplifies the code. However, ensure that this change is consistently applied across all relevant files.
5. pandasai/helpers/dataframe_serializer.py:8
  • Draft comment:
    The removal of the DataframeSerializerType enum and related logic simplifies the code. Ensure that this change is consistently applied across all relevant files.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The removal of the DataframeSerializerType enum and related logic simplifies the code. However, ensure that this change is consistently applied across all relevant files.
6. pandasai/helpers/dataframe_serializer.py:10
  • Draft comment:
    Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.
7. pandasai/helpers/dataframe_serializer.py:15
  • Draft comment:
    Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.
8. pandasai/helpers/dataframe_serializer.py:12
  • Draft comment:
    Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.

Workflow ID: wflow_1j10Mmuu7YLIKI4G


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 15, 2025
@gventuri gventuri merged commit 21fade7 into Sinaptik-AI:release/v3 Jan 15, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants