Add configuration for DESPIAD project #572

Open: wants to merge 35 commits into base: main
Changes from 24 commits
Commits
54eb64f
Add initial anonymisation config for ct
p-j-smith Dec 12, 2024
2730317
Add initial anonymisation config for pet
p-j-smith Dec 12, 2024
ff2a8c3
Add config for despiad
p-j-smith Dec 12, 2024
f20d5b8
anonymnise all resources before notifying the export api
p-j-smith Dec 12, 2024
d13b916
remove despaid.yaml from project config
p-j-smith Dec 12, 2024
6097314
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 12, 2024
8a4a793
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 16, 2024
01bf793
generate label based on patient id and study count in xnat project
p-j-smith Dec 17, 2024
5a47136
Use pseudo-anonymised StudyInstanceUID for xnat experiment label
p-j-smith Dec 17, 2024
28f92a6
Fix XNAT destination
p-j-smith Dec 17, 2024
901eaf1
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 19, 2024
4fd226a
remove changes related to grouping resources before notifying export api
p-j-smith Dec 19, 2024
c2fcbe4
remove duplicated tags
p-j-smith Dec 19, 2024
5d3c69d
Add series_number_filters and allowed_manufacturers parameters to pix…
p-j-smith Jan 2, 2025
32dd84f
clarify docstring of _import_study_from_raw
p-j-smith Jan 6, 2025
7867eff
Add min_instances_per_series parameter to project config
p-j-smith Jan 6, 2025
7215191
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 6, 2025
70d6794
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 7, 2025
0b90ddf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 8, 2025
4cd4c7e
Keep study date and patient dob for despiad
p-j-smith Jan 8, 2025
397c6b1
Changes after reviewing the PET data for DESPIAD (#592)
davecash75 Jan 13, 2025
b6bcb3c
Add Radiopharmaceutical Start DateTime to pet.yaml
p-j-smith Jan 13, 2025
ffeb70d
remove blank lines from ct.yaml
p-j-smith Jan 13, 2025
f6095a8
remove tab from config file
p-j-smith Jan 13, 2025
b231a60
filter series number by manufacturer
p-j-smith Jan 14, 2025
5a9c52e
Add allowed_manufacturers for all test configs
p-j-smith Jan 14, 2025
f7d94e1
Count number of instances skipped due to series having too few instances
p-j-smith Jan 15, 2025
1891df0
move get_series_to_skip to dcmd
p-j-smith Jan 15, 2025
9597559
Add philips and carestream as allowed manufacturers for test project
p-j-smith Jan 15, 2025
855f82d
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 15, 2025
5ea6418
Update description of project config in readme
p-j-smith Jan 15, 2025
aa170a6
Check _should_exclude_manufacurer before _should_exclude_series
p-j-smith Jan 21, 2025
f1eed49
filter out instance if manufacturer tag is missing
p-j-smith Jan 21, 2025
46d2109
allow all manufacturers for existing projects
p-j-smith Jan 21, 2025
09748bf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 21, 2025
6 changes: 6 additions & 0 deletions README.md
@@ -148,6 +148,12 @@ The configuration file defines:

- Project name: the `<project-slug>` name of the Project
- The DICOM dataset modalities to retain (e.g. `["DX", "CR"]` for X-Ray studies)
- The minimum number of instances a series must contain (defaults to 1). Set this higher than 1 to filter out
series that consist of a single screenshot containing patient-identifiable data
- A list of series description filters (e.g. `['loc', 'pos']`). Series with descriptions matching any of these
filters will be skipped
- A list of series number filters (e.g. `[3, 4]`). Series with SeriesNumber matching any of these filters will
be skipped
- The [anonymisation operations](/pixl_dcmd/README.md#tag-scheme-anonymisation) to be applied to the DICOM tags,
by providing a file path to one or multiple YAML files.
We currently allow two types of files:
37 changes: 37 additions & 0 deletions orthanc/orthanc-anon/plugin/pixl.py
@@ -331,6 +331,7 @@ def _anonymise_study_instances(
Return a list of the bytes of anonymised instances, and the anonymised StudyInstanceUID.
"""
config = load_project_config(project_name)
series_to_skip = get_series_to_skip(zipped_study, config.min_instances_per_series)
anonymised_instances_bytes = []
skipped_instance_counts = defaultdict(int)
dicom_validation_errors = {}
@@ -339,6 +340,15 @@ def _anonymise_study_instances(
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)

if dataset.SeriesInstanceUID in series_to_skip:
logger.warning(
"Skipping series {} for study {} due to too few instances",
dataset.SeriesInstanceUID,
study_info,
)
continue

try:
anonymised_instance, instance_validation_errors = _anonymise_dicom_instance(
dataset, config
@@ -376,6 +386,33 @@ def _anonymise_study_instances(
return anonymised_instances_bytes, anonymised_study_uid


def get_series_to_skip(zipped_study: ZipFile, min_instances: int) -> set[str]:
"""
Determine which series to skip based on the number of instances in the series.

If a series has fewer instances than `min_instances`, add it to a set of series to skip.

Args:
zipped_study: ZipFile containing the study
min_instances: Minimum number of instances required to include a series

"""
if min_instances <= 1:
return set()

series_instances = {}
for file_info in zipped_study.infolist():
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)
if dataset.SeriesInstanceUID not in series_instances:
series_instances[dataset.SeriesInstanceUID] = 1
continue
series_instances[dataset.SeriesInstanceUID] += 1

return {series for series, count in series_instances.items() if count < min_instances}
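
The per-series counting in `get_series_to_skip` can be illustrated without any DICOM files. This is a minimal sketch, assuming the SeriesInstanceUID of every instance has already been read into a list; `series_below_threshold` is a hypothetical name used only for illustration and is not part of the PR.

```python
from __future__ import annotations

from collections import Counter


def series_below_threshold(series_uids: list[str], min_instances: int) -> set[str]:
    """Return the series UIDs that occur fewer than `min_instances` times."""
    # With a threshold of 1 or lower, no series can be too small.
    if min_instances <= 1:
        return set()
    # Count how many instances belong to each series.
    counts = Counter(series_uids)
    return {uid for uid, count in counts.items() if count < min_instances}


uids = ["1.2.3", "1.2.3", "1.2.3", "4.5.6"]
print(series_below_threshold(uids, min_instances=2))  # {'4.5.6'}
```

A `Counter` replaces the explicit "first seen, then increment" branch; the result should match the loop above for the same input.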


def _anonymise_dicom_instance(dataset: pydicom.Dataset, config: PixlConfig) -> tuple[bytes, dict]:
"""Anonymise a DICOM instance."""
validation_errors = anonymise_dicom_and_update_db(dataset, config=config)
40 changes: 34 additions & 6 deletions pixl_core/src/core/project_config/pixl_config_model.py
@@ -16,6 +16,7 @@

from __future__ import annotations

import re
from enum import Enum
from pathlib import Path
from typing import Any, Optional
@@ -133,20 +134,47 @@
"""Project-specific configuration for Pixl."""

project: _Project
series_filters: Optional[list[str]] = None
min_instances_per_series: Optional[int] = 1
series_filters: Optional[list[str]] = [] # pydantic makes a deep copy of the empty default list
series_number_filters: Optional[list[str]] = []
allowed_manufacturers: Optional[str] = ".*"
tag_operation_files: TagOperationFiles
destination: _Destination

def is_series_excluded(self, series_description: str) -> bool:
def is_series_description_excluded(self, series_description: str | None) -> bool:
"""
Return whether this config excludes the series with the given description
Return whether this config excludes the series with the given description.

Do a simple case-insensitive substring check - this data is ultimately typed by a human, and
different image sources may have different conventions for case conversion.

:param series_description: the series description to test
:returns: True if it should be excluded, False if not
"""
if self.series_filters is None or series_description is None:
if not self.series_filters or series_description is None:
return False
# Do a simple case-insensitive substring check - this data is ultimately typed by a human,
# and different image sources may have different conventions for case conversion.

return any(
series_description.upper().find(filt.upper()) != -1 for filt in self.series_filters
)

def is_series_number_excluded(self, series_number: str | None) -> bool:
"""
Return whether this config excludes the series with the given number

:param series_number: the series number to test
:returns: True if it should be excluded, False if not
"""
if not self.series_number_filters or series_number is None:
return False

return any(series_number.find(filt) != -1 for filt in self.series_number_filters)
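
The substring semantics of the two filters can be sketched in isolation. This is a minimal sketch, assuming plain strings as input; `description_excluded` and `number_excluded` are hypothetical free functions that mirror the methods above and are not part of the PR.

```python
from __future__ import annotations


def description_excluded(series_description: str | None, filters: list[str]) -> bool:
    """Case-insensitive substring match against any filter."""
    if not filters or series_description is None:
        return False
    return any(filt.upper() in series_description.upper() for filt in filters)


def number_excluded(series_number: str | None, filters: list[str]) -> bool:
    """Case-sensitive substring match, as in the series number check."""
    if not filters or series_number is None:
        return False
    return any(filt in series_number for filt in filters)


print(description_excluded("Localizer AX", ["loc", "pos"]))  # True
print(description_excluded("CT Chest", ["loc", "pos"]))  # False
print(number_excluded("13", ["3"]))  # True
```

Because both checks are substring matches, a series number filter of `"3"` also matches `"13"` in this sketch. Whether that is the intended behaviour is not stated in the PR.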

Codecov / codecov/patch warning: added line #L171 in pixl_core/src/core/project_config/pixl_config_model.py was not covered by tests.

def is_manufacturer_allowed(self, manufacturer: str) -> bool:
"""
Check whether the manufacturer is in the allow-list.

:param manufacturer: name of the manufacturer
:returns: True if the manufacturer is allowed, False if not
"""
return bool(re.search(rf"{self.allowed_manufacturers}", manufacturer, flags=re.IGNORECASE))
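
The allow-list is a regular expression searched case-insensitively, so the default `".*"` accepts every manufacturer string. This is a minimal sketch of that behaviour; `manufacturer_allowed` is a hypothetical free function, and treating a missing Manufacturer tag as not allowed is an assumption taken from the commit title "filter out instance if manufacturer tag is missing", not from the code shown here.

```python
from __future__ import annotations

import re


def manufacturer_allowed(manufacturer: str | None, allowed_pattern: str = ".*") -> bool:
    """Return True if the manufacturer matches the allow-list pattern."""
    # Assumption: an instance without a Manufacturer tag is not allowed.
    if manufacturer is None:
        return False
    return bool(re.search(allowed_pattern, manufacturer, flags=re.IGNORECASE))


print(manufacturer_allowed("Philips Medical Systems", "philips|carestream"))  # True
print(manufacturer_allowed("SIEMENS", "philips|carestream"))  # False
print(manufacturer_allowed("SIEMENS"))  # True
print(manufacturer_allowed(None))  # False
```

Because `re.search` matches anywhere in the string, a pattern such as `philips` also accepts longer names that contain it.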
2 changes: 1 addition & 1 deletion pixl_core/tests/project_config/test_project_config.py
@@ -181,4 +181,4 @@ def test_series_filtering(base_yaml_data, series_filters, test_series_desc, expe
if series_filters is not None:
base_yaml_data["series_filters"] = series_filters
cfg = PixlConfig.model_validate(base_yaml_data)
assert cfg.is_series_excluded(test_series_desc) == expect_exclude
assert cfg.is_series_description_excluded(test_series_desc) == expect_exclude
25 changes: 23 additions & 2 deletions pixl_dcmd/src/pixl_dcmd/main.py
@@ -57,13 +57,31 @@


def _should_exclude_series(dataset: Dataset, cfg: PixlConfig) -> bool:
"""
Check whether the dataset series should be excluded based on its description
and number.
"""
series_description = dataset.get("SeriesDescription")
if cfg.is_series_excluded(series_description):
if cfg.is_series_description_excluded(series_description):
logger.debug("FILTERING OUT series description: {}", series_description)
return True

series_number = dataset.get("SeriesNumber")
if cfg.is_series_number_excluded(series_number):
logger.debug("FILTERING OUT series number: {}", series_number)
return True

Codecov / codecov/patch warning: added lines #L71 - L72 in pixl_dcmd/src/pixl_dcmd/main.py were not covered by tests.

return False


def _should_exclude_manufacturer(dataset: Dataset, cfg: PixlConfig) -> bool:
manufacturer = dataset.get("Manufacturer")
should_exclude = not cfg.is_manufacturer_allowed(manufacturer=manufacturer)
if should_exclude:
logger.debug("FILTERING out manufacturer: {}", manufacturer)

Codecov / codecov/patch warning: added line #L81 in pixl_dcmd/src/pixl_dcmd/main.py was not covered by tests.
return should_exclude


def anonymise_dicom_and_update_db(
dataset: Dataset,
*,
@@ -125,10 +143,13 @@
)

# Do before anonymisation in case someone decides to delete the
# Series Description tag as part of anonymisation.
# Series Description or Manufacturer tags as part of anonymisation.
if _should_exclude_series(dataset, config):
msg = "DICOM instance discarded due to its series description"
raise PixlSkipInstanceError(msg)
if _should_exclude_manufacturer(dataset, config):
msg = "DICOM instance discarded due to its manufacturer"
raise PixlSkipInstanceError(msg)

Codecov / codecov/patch warning: added lines #L151 - L152 in pixl_dcmd/src/pixl_dcmd/main.py were not covered by tests.
if dataset.Modality not in config.project.modalities:
msg = f"Dropping DICOM Modality: {dataset.Modality}"
raise PixlSkipInstanceError(msg)
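
The checks above run before anonymisation and signal a skip by raising an exception. This is a minimal sketch of that pattern, using dictionaries in place of DICOM datasets. `SkipInstanceError`, `check_instance` and `count_skips` are hypothetical stand-ins, and tallying skips by exception message is an assumption about how the caller's `skipped_instance_counts` is filled.

```python
from __future__ import annotations

from collections import defaultdict


class SkipInstanceError(Exception):
    """Hypothetical stand-in for PixlSkipInstanceError."""


def check_instance(instance: dict, excluded: list[str], modalities: list[str]) -> None:
    """Raise SkipInstanceError if the instance should not be anonymised."""
    description = instance.get("SeriesDescription") or ""
    if any(filt.upper() in description.upper() for filt in excluded):
        raise SkipInstanceError("DICOM instance discarded due to its series description")
    if instance.get("Modality") not in modalities:
        raise SkipInstanceError(f"Dropping DICOM Modality: {instance.get('Modality')}")


def count_skips(instances: list[dict], excluded: list[str], modalities: list[str]) -> dict:
    """Tally skipped instances by reason (assumed bookkeeping, not from the PR)."""
    skipped = defaultdict(int)
    for instance in instances:
        try:
            check_instance(instance, excluded, modalities)
        except SkipInstanceError as exc:
            skipped[str(exc)] += 1
    return dict(skipped)


instances = [
    {"SeriesDescription": "Localizer", "Modality": "CT"},
    {"SeriesDescription": "Chest", "Modality": "MR"},
    {"SeriesDescription": "Chest", "Modality": "CT"},
]
print(count_skips(instances, ["loc"], ["CT", "PT"]))
```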
36 changes: 36 additions & 0 deletions projects/configs/despiad.yaml
@@ -0,0 +1,36 @@
# Copyright (c) 2024 University College London Hospitals NHS Foundation Trust
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

project:
name: "despiad"
modalities:
- "CT"
- "PT"

tag_operation_files:
base:
- "base.yaml"
- "ct.yaml"
- "pet.yaml"
- "despiad.yaml"
manufacturer_overrides: []

min_instances_per_series: 1
series_filters: []
series_number_filters: []
allowed_manufacturers: ".*"

destination:
dicom: "none"
parquet: "none"
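
The four filtering values in this config equal the defaults declared on `PixlConfig` in this diff, so as written the project filters nothing by series or manufacturer. This is a minimal sketch using a standard-library dataclass; `FilterSettings` is a hypothetical mirror of those fields, not the pydantic model itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FilterSettings:
    """Hypothetical stdlib mirror of the filtering fields on PixlConfig."""

    min_instances_per_series: int = 1
    # default_factory gives each instance its own list.
    series_filters: list[str] = field(default_factory=list)
    series_number_filters: list[str] = field(default_factory=list)
    allowed_manufacturers: str = ".*"


# Values copied from the despiad project config above.
despiad = FilterSettings(
    min_instances_per_series=1,
    series_filters=[],
    series_number_filters=[],
    allowed_manufacturers=".*",
)
print(despiad == FilterSettings())  # True
```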
102 changes: 102 additions & 0 deletions projects/configs/tag-operations/ct.yaml
@@ -0,0 +1,102 @@
# Copyright (c) University College London Hospitals NHS Foundation Trust
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

- name: Conversion Type
group: 0x0008
element: 0x0064
op: keep
- name: Spacing Between Slices
group: 0x0018
element: 0x0088
op: keep
- name: Data Collection Diameter
group: 0x0018
element: 0x0090
op: keep
- name: Reconstruction Diameter
group: 0x0018
element: 0x1100
op: keep
- name: Distance Source to Detector
group: 0x0018
element: 0x1110
op: keep
- name: Distance Source to Patient
group: 0x0018
element: 0x1111
op: keep
- name: Gantry Detector Tilt
group: 0x0018
element: 0x1120
op: keep
- name: Table Height
group: 0x0018
element: 0x1130
op: keep
- name: Rotation Direction
group: 0x0018
element: 0x1140
op: keep
- name: Exposure Time
group: 0x0018
element: 0x1150
op: keep
- name: X-Ray Tube Current
group: 0x0018
element: 0x1151
op: keep
- name: Exposure
group: 0x0018
element: 0x1152
op: keep
- name: Filter Type
group: 0x0018
element: 0x1160
op: keep
- name: Generator Power
group: 0x0018
element: 0x1170
op: keep
- name: Convolution Kernel
group: 0x0018
element: 0x1210
op: keep
- name: Revolution Time
group: 0x0018
element: 0x9305
op: keep
- name: Single Collimation Width
group: 0x0018
element: 0x9306
op: keep
- name: Total Collimation Width
group: 0x0018
element: 0x9307
op: keep
- name: Table Speed
group: 0x0018
element: 0x9309
op: keep
- name: Table Feed per Rotation
group: 0x0018
element: 0x9310
op: keep
- name: Spiral Pitch Factor
group: 0x0018
element: 0x9311
op: keep
- name: Slice Location
group: 0x0020
element: 0x1041
op: keep
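
Each entry in the tag scheme identifies a DICOM tag by its group and element numbers. This is a minimal sketch of how such entries map to the usual `(gggg,eeee)` notation, assuming the YAML loader yields integers for the hex literals; `tag_key` is a hypothetical helper and the two entries are copied from the list above.

```python
from __future__ import annotations


def tag_key(group: int, element: int) -> str:
    """Format a (group, element) pair in the usual DICOM tag notation."""
    return f"({group:04X},{element:04X})"


operations = [
    {"name": "Spacing Between Slices", "group": 0x0018, "element": 0x0088, "op": "keep"},
    {"name": "Convolution Kernel", "group": 0x0018, "element": 0x1210, "op": "keep"},
]

# Collect the tags that the scheme marks as `keep`.
kept = {tag_key(op["group"], op["element"]) for op in operations if op["op"] == "keep"}
print(sorted(kept))  # ['(0018,0088)', '(0018,1210)']
```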
42 changes: 42 additions & 0 deletions projects/configs/tag-operations/despiad.yaml
@@ -0,0 +1,42 @@
# Copyright (c) University College London Hospitals NHS Foundation Trust
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

- name: Study Date
group: 0x0008
element: 0x0020
op: keep
- name: Series Date
group: 0x0008
element: 0x0021
op: keep
- name: Acquisition Date
group: 0x0008
element: 0x0022
op: keep
- name: Series Time
group: 0x0008
element: 0x0031
op: keep
- name: Acquisition Time
group: 0x0008
element: 0x0032
op: keep
- name: Station Name
group: 0x0008
element: 0x1010
op: keep
- name: Patient's Birth Date
group: 0x0010
element: 0x0030
op: keep
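
The project config lists `despiad.yaml` after `base.yaml`, `ct.yaml` and `pet.yaml`. This is a minimal sketch of how several schemes could be combined, assuming that a later file overrides an earlier one for the same tag. That precedence rule, the `merge_schemes` helper and the `replace` operation in the example base entry are assumptions for illustration, not taken from this PR.

```python
from __future__ import annotations


def merge_schemes(*schemes: list[dict]) -> list[dict]:
    """Merge tag schemes; later schemes win for the same (group, element)."""
    merged: dict[tuple[int, int], dict] = {}
    for scheme in schemes:
        for op in scheme:
            merged[(op["group"], op["element"])] = op
    return list(merged.values())


# Hypothetical base entry; the real base.yaml is not shown in this diff.
base = [{"name": "Study Date", "group": 0x0008, "element": 0x0020, "op": "replace"}]
# Entry copied from the despiad tag operations above.
despiad = [{"name": "Study Date", "group": 0x0008, "element": 0x0020, "op": "keep"}]

merged = merge_schemes(base, despiad)
print(merged[0]["op"])  # keep
```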