Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration for DESPIAD project #572

Open
wants to merge 35 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
54eb64f
Add initial anonymisation config for ct
p-j-smith Dec 12, 2024
2730317
Add initial anonymisation config for pet
p-j-smith Dec 12, 2024
ff2a8c3
Add config for despiad
p-j-smith Dec 12, 2024
f20d5b8
anonymnise all resources before notifying the export api
p-j-smith Dec 12, 2024
d13b916
remove despaid.yaml from project config
p-j-smith Dec 12, 2024
6097314
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 12, 2024
8a4a793
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 16, 2024
01bf793
generate label based on patient id and study count in xnat project
p-j-smith Dec 17, 2024
5a47136
Use pseudo-anonymised StudyInstanceUID for xnat experiment label
p-j-smith Dec 17, 2024
28f92a6
Fix XNAT destination
p-j-smith Dec 17, 2024
901eaf1
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 19, 2024
4fd226a
remove changes related to grouping resources before notifying export api
p-j-smith Dec 19, 2024
c2fcbe4
remove duplicated tags
p-j-smith Dec 19, 2024
5d3c69d
Add series_number_filters and allowed_manufacturers parameters to pix…
p-j-smith Jan 2, 2025
32dd84f
clarify docstring of _import_study_from_raw
p-j-smith Jan 6, 2025
7867eff
Add min_instances_per_series parameter to project config
p-j-smith Jan 6, 2025
7215191
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 6, 2025
70d6794
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 7, 2025
0b90ddf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 8, 2025
4cd4c7e
Keep study date and patient dob for despiad
p-j-smith Jan 8, 2025
397c6b1
Changes after reviewing the PET data for DESPIAD (#592)
davecash75 Jan 13, 2025
b6bcb3c
Add Radiopharmaceutical Start DateTime to pet.yaml
p-j-smith Jan 13, 2025
ffeb70d
remove blank lines from ct.yaml
p-j-smith Jan 13, 2025
f6095a8
remove tab from config file
p-j-smith Jan 13, 2025
b231a60
filter series number by manufacturer
p-j-smith Jan 14, 2025
5a9c52e
Add allowed_manufacturers for all test configs
p-j-smith Jan 14, 2025
f7d94e1
Count number of instances skipped due to series having too few instances
p-j-smith Jan 15, 2025
1891df0
move get_series_to_skip to dcmd
p-j-smith Jan 15, 2025
9597559
Add philips and carestream as allowed manufacturers for test project
p-j-smith Jan 15, 2025
855f82d
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 15, 2025
5ea6418
Update description of project config in readme
p-j-smith Jan 15, 2025
aa170a6
Check _should_exclude_manufacurer before _should_exclude_series
p-j-smith Jan 21, 2025
f1eed49
filter out instance if manufacturer tag is missing
p-j-smith Jan 21, 2025
46d2109
allow all manufacturers for existing projects
p-j-smith Jan 21, 2025
09748bf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 21, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,13 @@ The configuration file defines:

- Project name: the `<project-slug>` name of the Project
- The DICOM dataset modalities to retain (e.g. `["DX", "CR"]` for X-Ray studies)
- The minimum number of instances required by a series (defaults to 1). May be set higher than 1 to filter out
series with a single screenshot containing patient identifiable data
- A list of series description filters (e.g. `['loc', 'pos']`). Series with descriptions matching any of these
filters will be skipped
- A list of allowed manufacturers. By default, no manufacturers are allowed. For each manufacturer:
- A regex to identify the allowed manufacturer (e.g. `^philips`)
- A list of series numbers to exclude for the given manufacturer (e.g. `[3, 4]`)
- The [anonymisation operations](/pixl_dcmd/README.md#tag-scheme-anonymisation) to be applied to the DICOM tags,
by providing a file path to one or multiple YAML files.
We currently allow two types of files:
Expand Down
2 changes: 1 addition & 1 deletion cli/src/pixl_cli/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
"password": config("PIXL_DB_PASSWORD"),
"database": config("PIXL_DB_NAME"),
},
} # type: dict
}


class APIConfig:
Expand Down
13 changes: 13 additions & 0 deletions orthanc/orthanc-anon/plugin/pixl.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
from pixl_dcmd.dicom_helpers import get_study_info
from pixl_dcmd.main import (
anonymise_dicom_and_update_db,
get_series_to_skip,
parse_validation_results,
write_dataset_to_bytes,
)
Expand Down Expand Up @@ -331,6 +332,7 @@ def _anonymise_study_instances(
Return a list of the bytes of anonymised instances, and the anonymised StudyInstanceUID.
"""
config = load_project_config(project_name)
series_to_skip = get_series_to_skip(zipped_study, config.min_instances_per_series)
anonymised_instances_bytes = []
skipped_instance_counts = defaultdict(int)
dicom_validation_errors = {}
Expand All @@ -339,6 +341,17 @@ def _anonymise_study_instances(
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)

if dataset.SeriesInstanceUID in series_to_skip:
logger.debug(
"Skipping series {} for study {} due to too few instances",
dataset.SeriesInstanceUID,
study_info,
)
key = "DICOM instance discarded as series has too few instances"
skipped_instance_counts[key] += 1
continue

try:
anonymised_instance, instance_validation_errors = _anonymise_dicom_instance(
dataset, config
Expand Down
69 changes: 63 additions & 6 deletions pixl_core/src/core/project_config/pixl_config_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@

from __future__ import annotations

import re
from enum import Enum
from pathlib import Path
from typing import Any, Optional
Expand Down Expand Up @@ -61,6 +62,16 @@
modalities: list[str]


class Manufacturer(BaseModel):
"""
An allowed manufacturer for a project.
Also defines which series numbers to exclude for this manufacturer.
"""

regex: str = "no manufacturers allowed ^"
exclude_series_numbers: list[str] = []


class TagOperationFiles(BaseModel):
"""Tag operations files for a project. At least a base file is required."""

Expand Down Expand Up @@ -133,20 +144,66 @@
"""Project-specific configuration for Pixl."""

project: _Project
series_filters: Optional[list[str]] = None
min_instances_per_series: Optional[int] = 1
series_filters: Optional[list[str]] = [] # pydantic makes a deep copy of the empty default list
allowed_manufacturers: list[Manufacturer] = [Manufacturer()]
tag_operation_files: TagOperationFiles
destination: _Destination

def is_series_excluded(self, series_description: str) -> bool:
def is_series_description_excluded(self, series_description: str | None) -> bool:
"""
Return whether this config excludes the series with the given description
Return whether this config excludes the series with the given description.

Do a simple case-insensitive substring check - this data is ultimately typed by a human, and
different image sources may have different conventions for case conversion.

:param series_description: the series description to test
:returns: True if it should be excluded, False if not
"""
if self.series_filters is None or series_description is None:
if not self.series_filters or series_description is None:
return False
# Do a simple case-insensitive substring check - this data is ultimately typed by a human,
# and different image sources may have different conventions for case conversion.

return any(
series_description.upper().find(filt.upper()) != -1 for filt in self.series_filters
)

def is_series_number_excluded(self, manufacturer: str, series_number: str | None) -> bool:
"""
Return whether this config excludes the series with the given number for the given
manufacturer.

:param manufacturer: the manufacturer to test
:param series_number: the series number to test
:returns: True if it should be excluded, False if not
"""
if not self.is_manufacturer_allowed(manufacturer) or series_number is None:
return False

Check warning on line 180 in pixl_core/src/core/project_config/pixl_config_model.py

View check run for this annotation

Codecov / codecov/patch

pixl_core/src/core/project_config/pixl_config_model.py#L180

Added line #L180 was not covered by tests

exclude_series_numbers = self._get_manufacturer(manufacturer).exclude_series_numbers
return any(series_number.find(filt) != -1 for filt in exclude_series_numbers)

def is_manufacturer_allowed(self, manufacturer: str) -> bool:
"""
Check whether the manufacturer is in the allow-list.

:param manufacturer: name of the manufacturer
:returns: True is the manufacturer is allowed, False if not
"""
for manufacturer_config in self.allowed_manufacturers:
if re.search(rf"{manufacturer_config.regex}", manufacturer, flags=re.IGNORECASE):
return True
return False

Check warning on line 195 in pixl_core/src/core/project_config/pixl_config_model.py

View check run for this annotation

Codecov / codecov/patch

pixl_core/src/core/project_config/pixl_config_model.py#L195

Added line #L195 was not covered by tests

def _get_manufacturer(self, manufacturer: str) -> Manufacturer:
"""
Get the manufacturer configuration for the given manufacturer.

:param manufacturer: name of the manufacturer
:returns: Manufacturer configuration
:raises: ValueError: if the manufacturer is not allowed
"""
for manufacturer_config in self.allowed_manufacturers:
if re.search(rf"{manufacturer_config.regex}", manufacturer, flags=re.IGNORECASE):
return manufacturer_config
msg = f"Manufacturer {manufacturer} is not allowed by project {self.project.name}"
raise ValueError(msg)

Check warning on line 209 in pixl_core/src/core/project_config/pixl_config_model.py

View check run for this annotation

Codecov / codecov/patch

pixl_core/src/core/project_config/pixl_config_model.py#L208-L209

Added lines #L208 - L209 were not covered by tests
2 changes: 1 addition & 1 deletion pixl_core/tests/project_config/test_project_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -181,4 +181,4 @@ def test_series_filtering(base_yaml_data, series_filters, test_series_desc, expe
if series_filters is not None:
base_yaml_data["series_filters"] = series_filters
cfg = PixlConfig.model_validate(base_yaml_data)
assert cfg.is_series_excluded(test_series_desc) == expect_exclude
assert cfg.is_series_description_excluded(test_series_desc) == expect_exclude
69 changes: 65 additions & 4 deletions pixl_dcmd/src/pixl_dcmd/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import typing
from functools import lru_cache
from io import BytesIO
from zipfile import ZipFile

import requests
from core.exceptions import PixlSkipInstanceError
Expand All @@ -26,7 +27,7 @@
anonymize_dataset,
)
from loguru import logger
from pydicom import DataElement, Dataset, dcmwrite
from pydicom import DataElement, Dataset, dcmread, dcmwrite

from core.project_config.pixl_config_model import PixlConfig
from pixl_dcmd._database import (
Expand Down Expand Up @@ -56,14 +57,71 @@
return buffer.read()


def get_series_to_skip(zipped_study: ZipFile, min_instances: int) -> set[str]:
"""
Determine which series to skip based on the number of instances in the series.

If a series has fewer instances than `min_instances`, add it to a set of series to skip.

Args:
zipped_study: ZipFile containing the study
min_instances: Minimum number of instances required to include a series

"""
if min_instances <= 1:
return set()

Check warning on line 72 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L71-L72

Added lines #L71 - L72 were not covered by tests

series_instances = {}
for file_info in zipped_study.infolist():
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)
if dataset.SeriesInstanceUID not in series_instances:
series_instances[dataset.SeriesInstanceUID] = 1
continue
series_instances[dataset.SeriesInstanceUID] += 1

Check warning on line 82 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L74-L82

Added lines #L74 - L82 were not covered by tests

return {

Check warning on line 84 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L84

Added line #L84 was not covered by tests
series for series, count in series_instances.items() if count < min_instances
}


def _should_exclude_series(dataset: Dataset, cfg: PixlConfig) -> bool:
"""
Check whether the dataset series should be exlucded based on its description
and number.
"""
series_description = dataset.get("SeriesDescription")
if cfg.is_series_excluded(series_description):
if cfg.is_series_description_excluded(series_description):
logger.debug("FILTERING OUT series description: {}", series_description)
return True

manufacturer = dataset.get("Manufacturer")
series_number = dataset.get("SeriesNumber")
if cfg.is_series_number_excluded(
manufacturer=manufacturer, series_number=series_number
):
logger.debug(

Check warning on line 104 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L104

Added line #L104 was not covered by tests
"FILTERING OUT series number: {} for manufacturer: {}",
series_number,
manufacturer,
)
return True

Check warning on line 109 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L109

Added line #L109 was not covered by tests

return False


def _should_exclude_manufacturer(dataset: Dataset, cfg: PixlConfig) -> bool:
manufacturer = dataset.get("Manufacturer")
if manufacturer is None:
logger.debug("FILTERING out as manufacturer tag is missing")

Check warning on line 117 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L117

Added line #L117 was not covered by tests

should_exclude = not cfg.is_manufacturer_allowed(manufacturer=manufacturer)
if should_exclude:
logger.debug("FILTERING out manufacturer: {}", manufacturer)

Check warning on line 121 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L121

Added line #L121 was not covered by tests
return should_exclude


def anonymise_dicom_and_update_db(
dataset: Dataset,
*,
Expand Down Expand Up @@ -125,9 +183,12 @@
)

# Do before anonymisation in case someone decides to delete the
# Series Description tag as part of anonymisation.
# Series Description or Manufacturer tags as part of anonymisation.
if _should_exclude_manufacturer(dataset, config):
msg = "DICOM instance discarded due to its manufacturer"
raise PixlSkipInstanceError(msg)

Check warning on line 189 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L188-L189

Added lines #L188 - L189 were not covered by tests
if _should_exclude_series(dataset, config):
msg = "DICOM instance discarded due to its series description"
msg = "DICOM instance discarded due to its series description or number"

Check warning on line 191 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L191

Added line #L191 was not covered by tests
raise PixlSkipInstanceError(msg)
if dataset.Modality not in config.project.modalities:
msg = f"Dropping DICOM Modality: {dataset.Modality}"
Expand Down
4 changes: 3 additions & 1 deletion pixl_dcmd/tests/test_main.py
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,9 @@ def ids_for_parameterised_test(val: pathlib.Path) -> str:


@pytest.mark.parametrize(
("yaml_file"), PROJECT_CONFIGS_DIR.glob("*.yaml"), ids=ids_for_parameterised_test
("yaml_file"),
PROJECT_CONFIGS_DIR.glob("*.yaml"),
ids=ids_for_parameterised_test,
)
def test_anonymise_and_validate_dicom(caplog, request, yaml_file) -> None:
"""
Expand Down
39 changes: 39 additions & 0 deletions projects/configs/despiad.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Copyright (c) 2024 University College London Hospitals NHS Foundation Trust
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

project:
name: "despiad"
modalities:
- "CT"
p-j-smith marked this conversation as resolved.
Show resolved Hide resolved
- "PT"

tag_operation_files:
base:
- "base.yaml"
- "ct.yaml"
- "pet.yaml"
- "despiad.yaml"
manufacturer_overrides: []

allowed_manufacturers:
- regex: ".*"
exclude_series_numbers: []

min_instances_per_series: 1

series_filters: []

destination:
dicom: "xnat"
parquet: "none"
6 changes: 6 additions & 0 deletions projects/configs/ms-pinpoint.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@ tag_operation_files:
- "ms-pinpoint.yaml"
manufacturer_overrides: ["mri.yaml"]

allowed_manufacturers:
- regex: ".*"
exclude_series_numbers: []

min_instances_per_series: 1

series_filters:
- "localizer"
- "localiser"
Expand Down
6 changes: 6 additions & 0 deletions projects/configs/prognosis-ai.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@ tag_operation_files:
- "ion-neuro-db.yaml"
manufacturer_overrides: ["mri.yaml"]

allowed_manufacturers:
- regex: ".*"
exclude_series_numbers: []

min_instances_per_series: 1

series_filters:
- "localizer"
- "localiser"
Expand Down
Loading
Loading