-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems of generating splice junctions files from regtools #6
Comments
Hi santata, I'll leave this issue open, I've been thinking about alternative pipelines, for times when Regtools doesn't seem to work. Will keep you updated. |
Dear CraigIDent, Thank you for your suggestions. I identified the cause of the issue. I was using the Partek Flow platform to align my RNA-seq reads with the HISAT2 aligner. The default setting in Partek Flow is unstranded, so the output BAM files do not contain the "XS" attributes. There are two options in HISAT2 that generate "XS" attributes in the output BAM files: the --dta option and the --rna-strandness option. I tried both options, and while both produced BAM files with "XS" attributes, the --dta option resulted in significantly fewer alignments. This aligns with the HISAT2 manual, which notes that the --dta option is tailored for transcript assemblers, such as StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites, which leads to fewer alignments. In my case, the --rna-strandness option worked well for both versions of Regtools (v1.0.0 and v0.0.5). For Regtools v1.0.0, the appropriate command is:
whereas, for v0.0.5, the command is:
Both versions of Regtools produced the same results on my test BAM file. I did notice that the Regtools junction extract tool (in both versions v0.0.5 and v1.0.0) counts overlapping reads as double support for a junction, which has been discussed in this post: griffithlab/regtools#187. According to the developer's response in that thread, this issue has been addressed in the latest update (v1.1.0). However, I have been unable to find a way to download the latest update of Regtools. Thank you again for your help. Best regards, |
Dear CraigIDent, While Regtools (v1.0.0) seems to work with my test BAM file generated with HISAT2 from a test FASTQ file containing only 10 reads, it malfunctions when processing the actual BAM file from the full FASTQ dataset. The output .bed file contains a total of 606,976 junctions, with 223,254 marked with "-", 231,447 marked with "+", and 152,275 marked with "?". I am encountering the same issue as I previously posted, so I'll leave this open for now. Best regards, |
I have RNA-seq data from two conditions (control and splicing factor knockout) and would like to use the SpliSER tool to analyze differential splice site usage between these conditions. To do so, I need to generate splice junction files in .bed format, which will serve as part of the input for the SpliSER tool.
I'm using regtools 0.5.0 for data from HISAT2, and I know the strandedness of my RNA-seq data is fr-firststrand/RF. According to the regtools options, I tried using
regtools junctions extract -s 1 indexed_alignments.bam
. In the 6th column of the output .bed file, I see some junctions labeled with "+", some with "-", and some with "?". I observe an apparently random alternation of + and - in the 6th column of the bed file.I also tried using
regtools junctions extract -s 0 indexed_alignments.bam
, since someone from the forum suggested this option will make regtools use the -XS tag from the bam files of HISAT2 alignment, but I see similar patterns in the 6th column of the output.bed file, with some junctions labeled with "+", some with "-", and some with "?". I'm unsure what the correct approach is in this case.I appreciate any of your input.
The text was updated successfully, but these errors were encountered: