-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New dicom archive #1117
base: main
Are you sure you want to change the base?
New dicom archive #1117
Conversation
Okay, I think the PR is finished, I am happy with the code and tested it on a few DICOMs on my side. The tests are failing because I am using a few We should also test it on DICOMs from other projects. Then, when this PR is merged, I can make another PR to adopt it in the different superscripts, although a problem will be that this script requires the Python profile file, while the scripts that call I will note a few things on the code:
|
Blocked on Python 3.10, as well as #1141 and #1142 to see if this PR should be updated with new libraries. NOTE for myself: Bring a fix from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximemulder I read through the code and have a few comments/refactoring suggestions. Let me know your thoughts.
Note: I did not test yet, just went through the code.
def print_error_exit(message: str, code: int): | ||
print(f'ERROR: {message}', file=sys.stderr) | ||
sys.exit(code) | ||
|
||
|
||
def print_warning(message: str): | ||
print(f'WARNING: {message}', file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check the log.py library for the messaging?
python/dicom_archive.py
Outdated
|
||
|
||
# Modified version of 'lorisgetopt.load_config_file'. | ||
# We use argparse to parse the command line options in this script, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to use the lorisgetopt.py
library to parse arguments since it checks and has a few other loris specific functionalities. Also, it avoids the duplicated code for load_config_file.py
. It also avoids having to add another dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of lorisgetopt.py
design, but you are probably right that it is best to avoid duplication, I'll do it. Also, argparse
is part of the Python standard library, so I would not consider it another dependency.
@@ -0,0 +1,231 @@ | |||
from datetime import datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should go under the database_lib
directory. This is where we store the DB interaction files. The files are organized by table names and holds the table specific queries. It would be best to keep that organization.
python/dicom_archive.py
Outdated
if db is not None: | ||
print('Checking database presence') | ||
|
||
archive = lib.dicom.dicom_database.get_archive_with_study_uid(db, summary.info.study_uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to db_archive to make is clear that this is the archive found in the DB?
archive = lib.dicom.dicom_database.get_archive_with_study_uid(db, summary.info.study_uid) | |
db_archive = lib.dicom.dicom_database.get_archive_with_study_uid(db, summary.info.study_uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, done !
python/dicom_archive.py
Outdated
|
||
print('Calculating DICOM tar MD5 sum') | ||
|
||
tarball_md5_sum = lib.dicom.text.make_hash(tar_path, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the function compute_md5_hash
in lib.utilities to compute the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do that and encountered two problems:
- The behaviour of
dicomTar.pl
is to not only write the hash, but also the filename ("filename hash"
).make_hash
emulates this behaviour but notcompute_md5_hash
(although I would not be against writing just the hash). compute_md5_hash
also works on non-existing files, whilemake_hash
does not. This makes typing (rightfully) unhappy because I need astr
, not astr | None
(compute_md5_hash
docstring is incorrect with regards to its real return type).
python/lib/dicom/dicom_log.py
Outdated
from lib.dicom.text_dict import DictWriter | ||
|
||
|
||
class Log: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is already a class Log
that exists. I would rename this one DicomLog
instead to make it clear this is for the DICOM log file.
""" | ||
Order comparison between two nullable integers. | ||
""" | ||
match a, b: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our LORIS-MRI call, I checked with the install instructions and unfortunately, there might be some instances where earlier versions of python might be installed as the default. Two solutions:
- modify the install instructions to make sure that users always use python >3.10
- change the code so that it works with earlier versions of python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with both options but have a preference for updating the docs for Python 3.10+. It seems that 3.8 should (theoretically) become EOL soon, so we might as well do the transition imo.
case _, None: | ||
return -1 | ||
case None, _: | ||
return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand what the _
stands for and why the returns here are -1 or 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_
is the wildcard pattern, which kind of means "anything works" (so any Some(x)
such as Some(0)
, Some(1)
... work here, since None
was already matched before).
The returns -1
, 1
... are because this is a comparison function, a negative number means that a < b
, 0
that a == b
, and a positive number that a > b
. This is required by cmp_to_key
in summary_make.py
but I'll add some documentation.
python/lib/dicom/summary_make.py
Outdated
def get_dir_files(prefix: str, path: str) -> list[str]: | ||
""" | ||
Recursively get the files of a directory. | ||
""" | ||
if os.path.isdir(prefix + '/' + path): | ||
files = [] | ||
for file in os.listdir(prefix + '/' + path): | ||
files += get_dir_files(prefix, path + '/' + file) | ||
|
||
# Flatten the lists of files | ||
return files | ||
|
||
return [path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could be moved to lib.utilities
and could be used by other processes since it is not specific to DICOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done !
python/dicom_archive.py
Outdated
|
||
with open(tar_path, 'rb') as tar: | ||
with gzip.open(zip_path, 'wb') as zip: | ||
shutil.copyfileobj(tar, zip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could copy_file
from lib.utilities
be used here? Maybe not since you are using copyfileobj. Just wondering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, the code here gzips the copied file, which the copyfile
does not do.
Thanks for the review @cmadjar ! I wrote some comments on a few points. Points with no comments mean that I agree or will look into it. |
@maximemulder I think if you rebase this one, it will pass the flake8 now that your PR with the python install update has been merged. Thanks! Also, regarding the |
Ok I replied to all your (very good !) comments @cmadjar. The only ones I have intentionally left out are these ones :
If you like the rest of the PR, I can add these very quickly after the next imaging meeting. |
Waiting for the first test implementation to polish it. |
2e7b3c4
to
3503263
Compare
3503263
to
3626bcb
Compare
Description
This PR is a rewrite (and cleaning/modernization) of
dicom-archive/dicomTar.pl
,dicom-archive/dicomSummary.pl
anddicom-archive/DICOM/DCMSUM.pm
in Python, which intends to replace them in the long term. This PR does not replace them for now as we should do more testing and integrate these scripts with the rest of the pipeline before doing so, likely in the next LORIS version imo.Presentation
This PR adds three Python scripts:
dicom_summary.py
, which is the port ofdicomSummary.pl
and generates a textual summary of a DICOM directory (surprise !).dicom_archive.py
, which is the port ofdicomTar.pl
, creates a DICOM directory archive for and (optionally) inserts or updates its information in the LORIS database.To support these scripts, the
python/lib/dicom/
directory was added, which contains various modules used by the aforementioned scripts.summary_*.py
files contain code for the DICOM summary.dicom_log.py
file contains code for the DICOM archiving log.text_*.py
files contains text utilities code used by the two previous points.dicom_database.py
file contains the code used to insert a DICOM directory's information in the database.Breaking changes
I removed the following options from
dicomTar.pl
/dicom_archive.py
, as they did not seem to be used by any other script:-centerName
, which fills thetarchive.neurodbCenterName
field in the database, but seems unused in LORIS and LORIS-MRI.-mri_upload_update
, which launchs theupdateMRI_Upload.pl
script afterdicomTar.pl
and also seems unused (but less confident about that one).-clobber
into--db-update
(to update instead of inserting in the database) and--owerwrite
(to replace files on the disk if they exist) might be a good idea.--year
option, which adds the created DICOM archive in a dedicated year subdirectory (according to the scan date), which is convenient for me at the BIC.I did not port the
tools/addSeriesAndFileRecords.pl
script as it seems to not be used in LORIS / LORIS-MRI, and should maybe be deleted in the long term. It is however very easy to re-code with the new architecture (it is currently already a simple script).PLEASE TELL ME IF ANYONE HAS A PROBLEM WITH THESE CHANGES. Other suggestions are also welcome.
Remaining work
Before merging:
Longer term work:
tarchive
table (remove theneurodbCenterName
/LastUpdated
fields ? makeCenterName
nullable ?).