Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add realistic test data #24

Merged
merged 38 commits into from
Aug 27, 2024
Merged

Add realistic test data #24

merged 38 commits into from
Aug 27, 2024

Conversation

BaptisteBR
Copy link
Contributor

@BaptisteBR BaptisteBR commented Aug 16, 2024

Add OMOP dummy data (for MEASUREMENT and OBSERVATION tables).
New script to import dummy data.
New script to produce realistic data.
Import realistic test CSVs

Fixes #14

@milanmlft milanmlft self-requested a review August 19, 2024 08:32
Copy link
Member

@milanmlft milanmlft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this in @BaptisteBR! I just have a minor worry about the test data not being reproducible. Could you have a look at that? Some more documentation would also be nice 🙏

dev/test_db/produce_test_data.R Show resolved Hide resolved
dev/test_db/produce_test_data.R Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
dev/test_db/produce_test_data.R Show resolved Hide resolved
dev/test_db/produce_test_data.R Outdated Show resolved Hide resolved
dev/test_db/insert_dummy_tables.R Outdated Show resolved Hide resolved
@milanmlft milanmlft changed the title baptistebr/realistic test data Add realistic test data Aug 22, 2024
@milanmlft milanmlft force-pushed the baptistebr/realistic-test-data branch from 71fd240 to ed59327 Compare August 23, 2024 11:12
@milanmlft
Copy link
Member

@BaptisteBR I redacted the git history so that the sensitive data from the dummy CSV files is removed. I'm going to do a few last polishing steps and then merge this in main. But please do have a double check whether all trace to the sensitive bits has been removed! 🙏

@milanmlft
Copy link
Member

milanmlft commented Aug 23, 2024

After playing around with re-running the pre-processing scripts, I noticed that it's not just the ordering of the data that is different, but I get actually different results for the summary statistics for some of the concepts, so there was clearly something going wrong somewhere along the pre-processing pipeline.

I traced it back to the following bug: in insert_dummy_tables.R: reading the data back from the database after writing to it gave different data then what was used to write. Not sure what the cause of this is, but when I changed the table writing to have overwrite = TRUE, the bug disappeared (see individual comments above).

So I assume it had something to do with the database not getting properly updated with the data that we want, leaving it in an inconsistent state? I also updated this in analyse_omop_cdm.R when writing the result tables to the database.

I added some tests in test-utils_get_data.R to verify whether the dummy data is consistent. This is just as an additional check to flag when the data has changed (git should also pick that up, but it's always possible to accidentally commit changed data).

The good thing is that the app now works with the test data! 🎉
image

Copy link
Member

@milanmlft milanmlft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up a few more bits and made some fixes to make sure we always get the same data.

@BaptisteBR I think this is good to go now, but I'll leave it up to you to merge after reviewing my changes and only if you agree with them 🙂
Here's a brief summary of what I did:

  • Changed to overwrite = TRUE when writing tables to the database, to ensure consistency
  • Used dummy data in the data getters for development and added some sanity tests
  • Added some additional sanity checks in insert_dummy_tables.R to make sure the data is consistent
  • Subsetting of test data
  • Added some logging messages to the scripts

I added individual comments to the relevant lines to make it easier to see what changed.

Copy link
Contributor

@stefpiatek stefpiatek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking nice

@BaptisteBR
Copy link
Contributor Author

It seems good! I'll try to produce more logging and tests for my next contributions

@BaptisteBR BaptisteBR enabled auto-merge (squash) August 27, 2024 15:36
Copy link
Contributor Author

@BaptisteBR BaptisteBR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review from BaptisteBR before merging the pull request

@BaptisteBR BaptisteBR merged commit 5e08bb8 into main Aug 27, 2024
2 checks passed
@BaptisteBR BaptisteBR deleted the baptistebr/realistic-test-data branch August 27, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add realistic test data for app development
3 participants