Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue with parsing CSVs with mixed newline delimiters #64

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jacob-morrison-22
Copy link

@jacob-morrison-22 jacob-morrison-22 commented Jan 13, 2025

One line description of your change (less than 72 characters)

Adds the newline param to the config for papaparse during csv parsing

Problem

Issue described here

Basically if a file has the first line delimited by \r\n but the rest of the file is \n, then papaparse has trouble inferring which delimieter to use.

Solution

This PR uses the suggestion here from the papaparse maintainer to set the newline config param.

This uses input from the user (provided from hpt-validator-cli) to add newline to the config. If no newline option is provided, it is not added to the config and papaparse should continue behaving as before (detecting the newline on its own)

For files with mixed line endings, the papaparse maintainers say you can set \n as the newline delimiter and then subsequently strip out the \r from the resulting rows, but I am not sure the latter part is necessary for this implementation.

Result

Solves issue 29

Test Plan

(Write your test plan here. If you changed any code, please provide us with
clear instructions on how you verified your changes work.)

To verify change works:

  1. Download csv snippets from issue 29
  2. Attempt to validate both with changes in this PR
  3. Verify output is the same for both (no warnings or errors)
  4. Download https://msc.rochesterregional.org/practitionerPortal/docs/161012691_canton-potsdam-hospital_standardcharges.csv that is linked in the above issue
  5. Attempt to validate both with changes in this PR
  6. Verify that validation completes and finds no errors

Steps taken to verify this change does not alter working validations:

  • Select 100 MRFs in our database that were verified to be compliant according to the validator
  • Validate them using changes in this PR
  • Verify that output is the same

To do for more confidence

  • Run on MRFs that were verified to be non-compliant according to the validator
  • Run on a larger selection of both compliant and non-compliant MRFs

@jacob-morrison-22 jacob-morrison-22 changed the title Adds \n as the newline parameter to the config for papaparse during csv parsing Fix for issue with parsing CSVs with mixed newline delimiters Jan 13, 2025
@mint-thompson
Copy link
Collaborator

Thank you for opening this PR! I'll try to get the team to review this as soon as possible. It looks like there are some test issues unrelated to this PR that need to get fixed first (#65) so that the test workflow will succeed, but once those are fixed, I think we will be able to accept this change.

@jacob-morrison-22
Copy link
Author

jacob-morrison-22 commented Jan 16, 2025

@mint-thompson thanks for getting back to me so quickly! Sounds good on that test workflow.

I was thinking about this a little more and I think the safest way to handle this is to keep the current behavior (i.e. papaparse detects the line endings) as the default, only setting a newline if the user specifies one directly.
To that end i've:

  • refactored this PR slightly to only add the newline property to the papaparse config if its supplied from the CLI command.
  • put up another small PR in a fork of hpt-validator-cli that just adds newline as an option for the cli command and passes it to the validation function.

Please let me know if you have any questions, suggestions, or ways for me to be of further assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants