Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: overloaded 'SeqDict.from_sam()' method to support SAM files (#176) #183

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

TimD1
Copy link
Contributor

@TimD1 TimD1 commented Sep 4, 2024

  • in addition to pysam SAM headers and SAM header dicts, SeqDicts can now be initialized from SAM file paths and pysam SAM files
  • added basic tests for this new functionality

 - in addition to pysam SAM headers and SAM header dicts, SeqDicts can now be initialized
   from SAM file paths and pysam SAM files
 - added basic tests for this new functionality
@TimD1 TimD1 requested review from nh13 and tfenne as code owners September 4, 2024 15:52
Copy link

codecov bot commented Sep 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.45%. Comparing base (c13d3cc) to head (bdf1284).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #183      +/-   ##
==========================================
+ Coverage   89.35%   89.45%   +0.10%     
==========================================
  Files          18       18              
  Lines        2094     2095       +1     
  Branches      464      461       -3     
==========================================
+ Hits         1871     1874       +3     
  Misses        146      146              
+ Partials       77       75       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TimD1 TimD1 force-pushed the 176_td_add-read-seq-dict branch from 2f19451 to bdf1284 Compare September 4, 2024 17:19
@@ -446,28 +449,41 @@ def to_sam_header(

@staticmethod
@overload
def from_sam(header: pysam.AlignmentHeader) -> "SequenceDictionary": ...
def from_sam(data: Path) -> "SequenceDictionary": ... # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion Could you exclude overloaded stubs from coverage checks, instead of adding the no cover pragma to each line?

nedbat/coveragepy#970 (comment)

https://coverage.readthedocs.io/en/latest/config.html

the list of sequences returned by `pysam.AlignmentHeader#to_dict()["SQ"]`."""
if isinstance(header, pysam.AlignmentHeader):
return SequenceDictionary.from_sam(header=header.to_dict()["SQ"])
the list of sequences returned by `pysam.AlignmentHeader.to_dict()["SQ"]`."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue Could you update the docstring to reflect the changes introduced by this PR?

    """
    Create a `SequenceDictionary` from a SAM file or its header.

    Args:
        data: The input may be any of:
            - a path to a SAM file
            - an open `AlignmentFile`
            - the `AlignmentHeader` associated with an `AlignmentFile`
            - the contents of the header's `SQ` fields, as returned by `AlignmentHeader.to_dict()`

    Returns:
        A `SequenceDictionary` mapping reference names to their metadata.


infos: List[SequenceMetadata] = [
SequenceMetadata.from_sam(meta=meta, index=index) for index, meta in enumerate(header)
SequenceMetadata.from_sam(meta=meta, index=index) for index, meta in enumerate(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue Can you make the above series of if statements an if-elif chain instead, and include a final branch to verify that data is a list of dicts here?

I'd also suggest deferring the return to the end of the chain, e.g.

seq_dict: SequenceDictionary

if isinstance(data, pysam.AlignmentHeader):
    seq_dict = ...
elif isinstance(data, pysam.AlignmentFile):
    seq_dict = ...
elif isinstance(data, Path):
    seq_dict = ...
else:
    # assuming `data` is a `list[dict[str, Any]]`
    try:
        infos = ...
        seq_dict = SequenceDictionary(infos=infos)
    except:
        raise ValueError(f"Could not parse sequence information from data: {data}")

return seq_dict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion You could probably avoid this file with SamBuilder and a temp file?

@@ -333,7 +332,10 @@ def test_sequence_dictionary_to_and_from_sam() -> None:
header = pysam.AlignmentHeader.from_dict(
header_dict={"HD": {"VN": "1.5"}, "SQ": mapping, "RG": [{"ID": "foo"}]}
)

samfile = Path(__file__).parent / "data" / "sequence.dict"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion it's odd to have a sam file without a .sam extension, if we keep the data file can we change its extension?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a class/module method to read in a .dict file to a SequenceDictionary
3 participants