SourceApp is designed to do competitive read mapping. That is, we should only count each read's best alignment, and only if that alignment passes user-specified criteria (e.g., percent identity). Although not especially common, tied matches occur when a read has equally scoring best alignments to more than one subject.
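To make "best alignment" and "tie" concrete, here is a minimal sketch of competitive best-hit selection that keeps every subject tied at a read's top score instead of picking one at random. It assumes alignments arrive as hypothetical `(read_id, subject_id, percent_identity, score)` tuples and that a single percent-identity cutoff (`min_pid`, also hypothetical) stands in for the user-specified criteria; this is not SourceApp's actual code.

```python
from collections import defaultdict


def best_hits(alignments, min_pid=95.0):
    """Return {read_id: sorted list of subjects tied at that read's best score}.

    `alignments` is an iterable of hypothetical
    (read_id, subject_id, percent_identity, score) tuples. Only alignments
    with percent_identity >= min_pid are considered.
    """
    passing = defaultdict(list)
    for read_id, subject_id, pid, score in alignments:
        if pid >= min_pid:
            passing[read_id].append((subject_id, score))

    best = {}
    for read_id, hits in passing.items():
        top = max(score for _, score in hits)
        # Keep every subject that reaches the top score, not just one.
        best[read_id] = sorted({subj for subj, score in hits if score == top})
    return best


alns = [
    ("r1", "contigA", 99.0, 150),
    ("r1", "contigB", 99.0, 150),  # tied with contigA
    ("r2", "contigA", 98.0, 140),
    ("r2", "contigC", 97.0, 120),
    ("r3", "contigC", 90.0, 150),  # fails the identity cutoff, dropped
]
hits = best_hits(alns)
ties = {read: subs for read, subs in hits.items() if len(subs) > 1}
print(hits)  # {'r1': ['contigA', 'contigB'], 'r2': ['contigA']}
```

Reads with a single entry are unambiguous; reads with several entries are exactly the ties this issue is about.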
This isn't a huge deal when the tied subjects are simply different contigs (or different regions of the same contig) from the same genome, or even different genomes from the same source category. It's a different story when a read's tied matches belong to genomes from different source categories. What should we do about that? Right now, `--remove-crx` is a step in `sourceapp_build.py` that serves as a sort of stopgap for this issue (the idea being that if we remove genomes belonging to the same cluster, ties like this are less likely to occur).
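The distinction above (harmless same-source ties vs. problematic cross-source ties) can be checked per read once the tied subjects are known. A minimal sketch, assuming a hypothetical `subject_source` dict that maps each subject (contig) ID to its source category; the category names are placeholders:

```python
def classify_tie(subjects, subject_source):
    """Label a read's tied best hits as unique, same_source, or cross_source.

    `subject_source` is a hypothetical mapping of subject (contig) ID to
    source category.
    """
    if len(subjects) == 1:
        return "unique"
    sources = {subject_source[s] for s in subjects}
    return "same_source" if len(sources) == 1 else "cross_source"


# Placeholder mapping: g1 and g2 are genomes in sourceA, g3 is in sourceB.
subject_source = {
    "g1_c1": "sourceA",
    "g1_c2": "sourceA",
    "g2_c1": "sourceA",
    "g3_c1": "sourceB",
}
print(classify_tie(["g1_c1"], subject_source))           # unique
print(classify_tie(["g1_c1", "g1_c2"], subject_source))  # same_source (same genome)
print(classify_tie(["g1_c1", "g2_c1"], subject_source))  # same_source (different genomes)
print(classify_tie(["g1_c1", "g3_c1"], subject_source))  # cross_source
```

Only the `cross_source` reads affect per-category abundances, so counting them gives a direct measure of how much a random tie-break could matter for a given sample.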
Either way, depending on the read mapper used, the primary alignment is usually just selected at random when there are ties like this. Should we handle this differently, perhaps by retaining information on tied matches and deriving error bounds from it (e.g., 2% ± 0.2%)?
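One possible way to turn retained tie information into bounds, sketched under stated assumptions rather than as a settled design: for each source category, a lower bound counts only reads whose tied hits all fall in that category, an upper bound counts every read with at least one tied hit there, and a point estimate splits each tied read equally among its categories. The input shape (`read_sources`, a dict from read ID to the set of source categories among its tied best hits) and the use of counted reads as the denominator are both assumptions.

```python
from collections import Counter


def source_bounds(read_sources):
    """Per-source abundance with bounds derived from tied matches.

    `read_sources` (hypothetical) maps each counted read to the set of
    source categories among its tied best hits.
    Returns {source: (lower, split, upper)} as fractions of counted reads:
      lower - reads whose tied hits all fall in this source
      split - each read divided equally among its tied sources
      upper - reads with at least one tied hit in this source
    """
    n = len(read_sources)
    lower, split, upper = Counter(), Counter(), Counter()
    for sources in read_sources.values():
        for s in sources:
            upper[s] += 1
            split[s] += 1 / len(sources)
        if len(sources) == 1:
            lower[next(iter(sources))] += 1
    return {s: (lower[s] / n, split[s] / n, upper[s] / n) for s in upper}


reads = {"r1": {"A"}, "r2": {"A"}, "r3": {"A", "B"}, "r4": {"B"}}
bounds = source_bounds(reads)
for source, (lo, mid, hi) in sorted(bounds.items()):
    print(f"{source}: {mid:.1%} ({lo:.1%}-{hi:.1%})")
# A: 62.5% (50.0%-75.0%)
# B: 37.5% (25.0%-50.0%)
```

Note that these bounds need not be symmetric around the point estimate, so reporting them as a range may be more faithful than a single "±" value.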