Generating a genome index is very slow when the genome doesn't contain one of the nucleotides. #6

jiyongkun · 2024-06-11T09:00:52Z

When using meRanGs mkbsidx, SAindex (k-mer index) generation takes much longer for a C->T or G->A converted genome, which, in hindsight, should have been obvious. The SAindex contains the positions of all k-mers (k=14) in the suffix array. With one base missing, most k-mers are absent from the genome, which slows down the calculation significantly.
Please see the STAR issue: alexdobin/STAR#1263
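
To put a rough number on "a lot of k-mers will be missing", here is a minimal sketch that only counts possible k-mers over a 4-letter versus a 3-letter alphabet. The function names are hypothetical and made up for illustration; this does not model how STAR or meRanGs actually build the SAindex, only the size of the k-mer space.

```python
def possible_kmers(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


def max_present_fraction(k: int, full: int = 4, reduced: int = 3) -> float:
    """Upper bound on the fraction of k-mer slots that can be occupied
    when the genome only uses `reduced` of the `full` nucleotides."""
    return possible_kmers(reduced, k) / possible_kmers(full, k)


def convert_c_to_t(seq: str) -> str:
    """In-silico C->T conversion (illustrative stand-in for a bisulfite genome)."""
    return seq.upper().replace("C", "T")


def distinct_kmers(seq: str, k: int) -> set:
    """Set of distinct k-mers that actually occur in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    k = 14  # the k quoted in this issue
    print("4-letter k-mers:", possible_kmers(4, k))
    print("3-letter k-mers:", possible_kmers(3, k))
    print("max fraction present: %.4f" % max_present_fraction(k))
```

For k=14 the bound comes out below 2%, so at least about 98% of the 14-mer slots cannot occur in a C->T or G->A genome, which is consistent with the slowdown described above.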
