How does sourceapp handle tied matches? #2

bglindner · 2024-01-19T23:03:05Z

SourceApp is designed to do competitive read mapping. That is, we should count only each read's best alignment, and only if that alignment meets the user-specified criteria (e.g., percent identity). Although probably not very common, tied matches occur when a read has equally best-scoring alignments to multiple subjects.
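To make the definition of a tie concrete, here is a minimal sketch, not SourceApp's actual code. It assumes alignments are available as `(read_id, subject, score, percent_identity)` tuples; the function name `best_hits`, the tuple layout, and the 95% default are all hypothetical. It keeps each read's top-scoring alignment(s), then drops any that fail the identity cutoff, so a read mapped to more than one subject is a tie.

```python
from collections import defaultdict


def best_hits(alignments, min_identity=95.0):
    """Return {read_id: [subjects tied for the best score]}.

    alignments: iterable of (read_id, subject, score, percent_identity).
    The tuple layout and the default threshold are illustrative assumptions.
    """
    by_read = defaultdict(list)
    for read_id, subject, score, identity in alignments:
        by_read[read_id].append((score, identity, subject))

    result = {}
    for read_id, hits in by_read.items():
        top_score = max(score for score, _, _ in hits)
        # Keep only top-scoring alignments that also pass the identity cutoff.
        kept = sorted(
            subject
            for score, identity, subject in hits
            if score == top_score and identity >= min_identity
        )
        if kept:  # reads whose best alignment fails the cutoff are not counted
            result[read_id] = kept
    return result


example = [
    ("r1", "gA_c1", 100, 99.0),
    ("r1", "gB_c1", 100, 98.5),  # ties with gA_c1 on score
    ("r1", "gC_c1", 90, 99.9),   # lower score, ignored
    ("r2", "gA_c2", 80, 90.0),   # best hit fails the identity cutoff
    ("r3", "gB_c2", 70, 97.0),
]
tied = {r: s for r, s in best_hits(example).items() if len(s) > 1}
```

Here `tied` contains only `r1`, the one read with more than one best-scoring subject.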

This isn't a huge deal when the tied subject sequences are simply different contigs (or regions of the same contig) belonging to the same genome, or even to different genomes in the same source category. It's a different matter when the tied matches belong to genomes in different source categories. What should we do about that? Right now, we have --remove-crx as a step in sourceapp_build.py, which serves as a sort of stopgap for this issue (the idea being that if we remove genomes belonging to the same cluster, this is less likely to occur).

Either way, depending on the read mapper used, the primary alignment is usually just selected at random when there are ties like this. Should we handle this differently, perhaps by retaining information on tied matches and deriving some sort of error bounds from it (e.g., 2% ± 0.2%)?
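One possible way to turn retained ties into bounds, offered only as a sketch for discussion: for each source category, the lower bound counts reads whose tied best hits all fall in that category, and the upper bound counts reads with at least one tied best hit in it. The function name `category_bounds`, the input shapes, and the category labels below are hypothetical.

```python
from collections import Counter


def category_bounds(hits, category_of, total_reads):
    """Return {category: (lower_fraction, upper_fraction)}.

    hits: {read_id: [subjects tied for the best score]} (assumed shape).
    category_of: {subject: source category} (assumed shape).
    total_reads: number of reads used as the denominator.
    """
    lower = Counter()  # reads assigned to the category unambiguously
    upper = Counter()  # reads that could belong to the category
    for subjects in hits.values():
        categories = {category_of[s] for s in subjects}
        for category in categories:
            upper[category] += 1
        if len(categories) == 1:
            lower[next(iter(categories))] += 1
    return {
        c: (lower[c] / total_reads, upper[c] / total_reads) for c in upper
    }


# Hypothetical example: 10 reads in total, 4 of which mapped.
example_hits = {
    "r1": ["a1", "a2"],  # tie within one category: harmless
    "r2": ["a1", "b1"],  # tie across categories: widens both intervals
    "r3": ["b1"],
    "r4": ["a2"],
}
example_categories = {"a1": "human", "a2": "human", "b1": "cow"}
bounds = category_bounds(example_hits, example_categories, total_reads=10)
```

A point estimate with a "+/-" could then be reported as the midpoint and half-width of each interval. Ties within a single category contribute no uncertainty under this scheme, which matches the observation that only cross-category ties are a concern.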

bglindner added the "documentation" (Improvements or additions to documentation) and "enhancement" (New feature or request) labels on Jan 19, 2024
