fix(Dataframe): adding default dataframe name to enable sql query on it, simplified dataframe serialization #1523

scaliseraoul · 2025-01-15T16:48:52Z


Enhance DataFrame handling by adding default SQL table names and simplifying serialization, with updates to templates and tests.

Behavior:
- read_csv() in __init__.py now assigns a default table name using sanitize_sql_table_name().
- DataFrame in base.py assigns a default name if none is provided, using a column hash.
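The fallback naming described above can be sketched roughly as follows. This is a minimal illustration, not the code in base.py: the helper name `default_table_name`, the MD5 hash, the `table_` prefix and the 8-character digest length are all assumptions chosen for the example. It only shows the idea of deriving a stable name from a hash of the column names when no name is supplied.

```python
import hashlib
from typing import Optional, Sequence


def default_table_name(columns: Sequence[str], name: Optional[str] = None) -> str:
    """Hypothetical helper: return `name` if given, else a name derived from the columns."""
    if name:
        # An explicit name always wins over the generated one.
        return name
    # Hash the ordered column names so the same schema always maps to the same name.
    digest = hashlib.md5("|".join(map(str, columns)).encode("utf-8")).hexdigest()
    # Prefix with letters so the result is a valid SQL identifier (never starts with a digit).
    return f"table_{digest[:8]}"
```

Because the name depends only on the column names, two DataFrames with the same schema get the same default name, and reordering the columns changes it.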
Serialization:
- Simplified serialize_dataframe() in base.py by removing the index parameter.
- DataframeSerializer in dataframe_serializer.py now only supports CSV-like serialization.
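A CSV-like serialization with no index column might look like the sketch below. It is an assumption-based illustration using only the standard `csv` module, not the actual DataframeSerializer; the function name `serialize_table` and the input shape (column names plus row tuples) are hypothetical, and any wrapper the real serializer adds around the CSV text is not shown in this PR.

```python
import csv
import io
from typing import Iterable, Sequence


def serialize_table(columns: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Hypothetical helper: render a header row plus data rows as CSV text, no index column."""
    buffer = io.StringIO()
    # Pin the line terminator: the csv module defaults to "\r\n",
    # so fixing it to "\n" keeps the output identical on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `serialize_table(["name", "age"], [("Ada", 36), ("Alan", 41)])` produces three lines: the header followed by one line per row, with no leading index values.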
Templates:
- Removed index handling in correct_execute_sql_query_usage_error_prompt.tmpl and generate_python_code_with_sql.tmpl.
Helpers:
- Added sql_sanitizer.py for sanitizing SQL table names.
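The PR does not spell out the rules applied by sanitize_sql_table_name(), so the following is only a plausible sketch of how a table name could be derived from a CSV path (as read_csv() does). The helper name `to_sql_identifier`, the lowercase conversion, the `table_` prefix for names starting with a digit, and the 64-character limit are all assumptions made for illustration.

```python
import os
import re

MAX_LEN = 64  # assumed length limit, chosen for illustration only


def to_sql_identifier(file_path: str) -> str:
    """Hypothetical helper: turn a file path into a safe SQL table name."""
    # Use the file name without its directory or extension as the base name.
    stem = os.path.splitext(os.path.basename(file_path))[0]
    # Replace anything that is not a letter, digit or underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", stem)
    # SQL identifiers should not start with a digit (or be empty).
    if not name or name[0].isdigit():
        name = "table_" + name
    return name[:MAX_LEN].lower()
```

Under these assumed rules, `data/Sales Report-2024.csv` becomes `sales_report_2024` and `2024.csv` becomes `table_2024`.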
Tests:
- Updated tests in test_agent.py, test_dataframe_serializer.py, and test_sql_sanitizer.py to reflect changes in serialization and SQL sanitization.

This description was created by Ellipsis for fd22f8c. It will automatically update as commits are pushed.


ellipsis-dev

👍 Looks good to me! Reviewed everything up to fd22f8c in 1 minute and 10 seconds

More details

1. pandasai/core/prompts/templates/correct_execute_sql_query_usage_error_prompt.tmpl:1

Draft comment:
The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
Reason this comment was not posted:
Confidence changes required: 10%
The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.

2. pandasai/core/prompts/templates/generate_python_code_with_sql.tmpl:3

Draft comment:
The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
Reason this comment was not posted:
Confidence changes required: 10%
The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.

3. pandasai/core/prompts/templates/shared/dataframe.tmpl:1

Draft comment:
The removal of the index variable is a good simplification. Ensure that this change is consistently applied across all relevant template files.
Reason this comment was not posted:
Confidence changes required: 10%
The change in the template file removes the unnecessary index variable, which is a good simplification. However, the comment should reflect that this change applies to multiple files.

4. pandasai/dataframe/base.py:140

Draft comment:
The removal of the DataframeSerializerType enum and related logic simplifies the code. Ensure that this change is consistently applied across all relevant files.
Reason this comment was not posted:
Confidence changes required: 10%
The removal of the DataframeSerializerType enum and related logic simplifies the code. However, ensure that this change is consistently applied across all relevant files.

5. pandasai/helpers/dataframe_serializer.py:8

Draft comment:
The removal of the DataframeSerializerType enum and related logic simplifies the code. Ensure that this change is consistently applied across all relevant files.
Reason this comment was not posted:
Confidence changes required: 10%
The removal of the DataframeSerializerType enum and related logic simplifies the code. However, ensure that this change is consistently applied across all relevant files.

6. pandasai/helpers/dataframe_serializer.py:10

Draft comment:
Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
Reason this comment was not posted:
Confidence changes required: 50%
The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.

7. pandasai/helpers/dataframe_serializer.py:15

Draft comment:
Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
Reason this comment was not posted:
Confidence changes required: 50%
The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.

8. pandasai/helpers/dataframe_serializer.py:12

Draft comment:
Consider capitalizing 'dataframe' to 'DataFrame' for consistency with Pandas terminology.
Reason this comment was not posted:
Confidence changes required: 50%
The term 'dataframe' should be capitalized as 'DataFrame' to maintain consistency with the Pandas library terminology.

Workflow ID: wflow_1j10Mmuu7YLIKI4G


fix(Dataframe): adding default dataframe name to enable sql query on it, simplified dataframe serialization (fd22f8c)

ellipsis-dev bot reviewed Jan 15, 2025


dosubot bot added the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Jan 15, 2025

scaliseraoul added 2 commits January 15, 2025 18:05

fix(DataframeSerializer): fixing test failure on windows (72d4732)
fix(DataframeSerializer): fixing test failure on windows (3aa6a35)

gventuri merged commit 21fade7 into Sinaptik-AI:release/v3 Jan 15, 2025
12 checks passed
