-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: overloaded 'SeqDict.from_sam()' method to support SAM files (#176) #183
base: main
Are you sure you want to change the base?
Conversation
TimD1
commented
Sep 4, 2024
- in addition to pysam SAM headers and SAM header dicts, SeqDicts can now be initialized from SAM file paths and pysam SAM files
- added basic tests for this new functionality
- in addition to pysam SAM headers and SAM header dicts, SeqDicts can now be initialized from SAM file paths and pysam SAM files - added basic tests for this new functionality
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #183 +/- ##
==========================================
+ Coverage 89.35% 89.45% +0.10%
==========================================
Files 18 18
Lines 2094 2095 +1
Branches 464 461 -3
==========================================
+ Hits 1871 1874 +3
Misses 146 146
+ Partials 77 75 -2 ☔ View full report in Codecov by Sentry. |
2f19451
to
bdf1284
Compare
@@ -446,28 +449,41 @@ def to_sam_header( | |||
|
|||
@staticmethod | |||
@overload | |||
def from_sam(header: pysam.AlignmentHeader) -> "SequenceDictionary": ... | |||
def from_sam(data: Path) -> "SequenceDictionary": ... # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion Could you exclude overloaded stubs from coverage checks, instead of adding the no cover
pragma to each line?
the list of sequences returned by `pysam.AlignmentHeader#to_dict()["SQ"]`.""" | ||
if isinstance(header, pysam.AlignmentHeader): | ||
return SequenceDictionary.from_sam(header=header.to_dict()["SQ"]) | ||
the list of sequences returned by `pysam.AlignmentHeader.to_dict()["SQ"]`.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue Could you update the docstring to reflect the changes introduced by this PR?
"""
Create a `SequenceDictionary` from a SAM file or its header.
Args:
data: The input may be any of:
- a path to a SAM file
- an open `AlignmentFile`
- the `AlignmentHeader` associated with an `AlignmentFile`
- the contents of the header's `SQ` fields, as returned by `AlignmentHeader.to_dict()`
Returns:
A `SequenceDictionary` mapping reference names to their metadata.
|
||
infos: List[SequenceMetadata] = [ | ||
SequenceMetadata.from_sam(meta=meta, index=index) for index, meta in enumerate(header) | ||
SequenceMetadata.from_sam(meta=meta, index=index) for index, meta in enumerate(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue Can you make the above series of if
statements an if-elif
chain instead, and include a final branch to verify that data
is a list of dicts here?
I'd also suggest deferring the return
to the end of the chain, e.g.
seq_dict: SequenceDictionary
if isinstance(data, pysam.AlignmentHeader):
seq_dict = ...
elif isinstance(data, pysam.AlignmentFile):
seq_dict = ...
elif isinstance(data, Path):
seq_dict = ...
else:
# assuming `data` is a `list[dict[str, Any]]`
try:
infos = ...
seq_dict = SequenceDictionary(infos=infos)
except:
raise ValueError(f"Could not parse sequence information from data: {data}")
return seq_dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion You could probably avoid this file with SamBuilder
and a temp file?
@@ -333,7 +332,10 @@ def test_sequence_dictionary_to_and_from_sam() -> None: | |||
header = pysam.AlignmentHeader.from_dict( | |||
header_dict={"HD": {"VN": "1.5"}, "SQ": mapping, "RG": [{"ID": "foo"}]} | |||
) | |||
|
|||
samfile = Path(__file__).parent / "data" / "sequence.dict" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion it's odd to have a sam file without a .sam
extension, if we keep the data file can we change its extension?