ME_filter1.py is not working #11

Open
s-weissbach opened this issue Aug 31, 2020 · 3 comments


@s-weissbach

Hello,
I am trying to run MicroExonator on paired-end RNA-seq data, but I can't resolve an issue with ME_filter1.py.
I did everything as described in the manual. My config.yaml file:
Genome_fasta : /data/resources/mouse/genome/GRCm38.p6.genome.fa
Gene_anontation_bed12 : /data/resources/mouse/genome/mm10_UCSC_knownGene.bed
GT_AG_U2_5 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_5.good.matrix
GT_AG_U2_3 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_3.good.matrix
conservation_bigwig : /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw
working_directory : /data/MicroExonator
ME_len : 30
Optimize_hard_drive : T
min_number_files_detected : 3
paired_samples : /data/MicroExonator/paired_samples.txt
Then I started with
snakemake -s MicroExonator.skm --use-conda -k -j 32
Which led to more or less the same error for every single input file:
Error in rule Round1_filter:
jobid: 159
RuleException:
CalledProcessError in line 56 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 ' returned non-zero exit status 1.
File "/data/MicroExonator/rules/Round1_post_processing.skm", line 56, in __rule_Round1_filter
File "/home/stephan/anaconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
output: Round1/Scnn_1_R2.sam.row_ME.filter1
Removing output files of failed job Round1_filter since they might be corrupted:
Round1/Scnn_3_R2.sam.row_ME.filter1
conda-env: /data/MicroExonator/.snakemake/conda/f2d123d5
Job failed, going on with independent jobs.

I figured out that I had not installed all the dependencies, since they weren't listed in the installation instructions. I installed them manually, following the import statements in ME_filter1.py. Since Biopython no longer supports Python 2, I switched everything to Python 3 and removed the non-working print statements from the script.
Now I can get a few lines of output by calling ME_filter1.py manually, but at some point the script crashes. The command:
python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/SST_1_R1.sam.row_ME Round1/SST_1_R1.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30
Working output:
D00535:22:CBLNNANXX:2:2202:18520:19601 CGCCAGCCAGAGCAGGCCCGCCGGCCCCTCAGTGTTGCCACAGACAACATGATGCTGGAGTTTTACAAGAAGGATGGCCTTAGGAAAATCCAAAGCATGGG GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBBB@? chr11:65012098-65021940|uc007jku.1|100_100|76M18I7M 101 chr11|-|65012906|6S18M9016N77M 95 True 18 CCTTAGGAAAATCCAAAG 1 78.9604235156872 1.0 chr11_-_65012906_65012924 78.9604235156872 1.0 [...]
and then the error:
Traceback (most recent call last):
  File "src/ME_filter1.py", line 372, in <module>
    main(sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], int(sys.argv[7]))
  File "src/ME_filter1.py", line 364, in main
    print(read, seq, qual, tag_alingment, t_score, genome_alingment, g_score, same_ME, len(DR_corrected_micro_exon_seq_found), DR_corrected_micro_exon_seq_found, len(micro_exons), max(U2_scores), max(TOTAL_mean_conservation), micro_exons_coords, ",".join(map(str, U2_scores)), ",".join(map(str, TOTAL_mean_conservation)))
TypeError: '>' not supported between instances of 'NoneType' and 'float'
How can I fix this? Can you please provide a full list of needed dependencies?
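
Note on the traceback above: the `TypeError` most likely comes from one of the `max()` calls on line 364 receiving a list that holds `None` next to floats. Python 2 tolerated that comparison (it ordered `None` before every number), whereas Python 3 raises. Below is a minimal sketch of a `None`-tolerant replacement, assuming the `None` entries simply mean "no score available". `safe_max` is a hypothetical helper for illustration, not part of MicroExonator, and the sample values are made up.

```python
def safe_max(values, default=None):
    """Return the largest non-None value, or `default` if none is present."""
    present = [v for v in values if v is not None]
    return max(present) if present else default


# Hypothetical scores: Python 3 refuses to order None against a float,
# so the built-in max() raises as soon as it reaches the None entry.
scores = [78.96, None, 1.0]
try:
    max(scores)
except TypeError as err:
    print("plain max() failed:", err)

print("safe_max:", safe_max(scores))
```

Whether dropping the `None` entries is the right behaviour depends on what they represent in the script, so treat this as a workaround to investigate rather than a fix.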

@geparada
Collaborator

geparada commented Sep 6, 2020

Hello,

Dependencies are resolved automatically by snakemake when you run it with --use-conda, which is why I do not list them in the documentation. With this flag, snakemake creates the conda environments dynamically from the YAML files located in MicroExonator/envs/. ME_filter1.py in particular uses pybedtools.yaml; take a look at this file to see which dependencies it needs.

Looking at:

'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 '

I can see that snakemake successfully created the environment from pybedtools.yaml. You can try to activate this environment with:

conda activate /data/MicroExonator/.snakemake/conda/f2d123d5

This should have all the dependencies installed.

What really caught my attention is that snakemake then tries to run the script with python3:

python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1

This should be run with python2; I am not sure why python2 is not being called correctly in your run. I will update conda on my machine and see if I also get this error. If this keeps causing problems, I can update all the code to python3, but it will take me a while. Could you perform a snakemake dry-run with -np and check whether the command generated for this step uses python3 instead of python2?

Thanks for reporting this,
Guillermo
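
Note: the dry-run check requested above can be scripted. This is a rough sketch, assuming the text printed by `snakemake -s MicroExonator.skm -np` has been captured in a string. `interpreters_for_script` is a hypothetical helper that reports which token directly precedes a given script in each printed command, and the sample command is shortened for illustration.

```python
import shlex


def interpreters_for_script(dry_run_text, script="src/ME_filter1.py"):
    """Collect the token that directly precedes `script` on each line."""
    found = set()
    for line in dry_run_text.splitlines():
        if script not in line:
            continue
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a log line: fall back to whitespace splitting.
            tokens = line.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if tok == script:
                found.add(prev)
    return found


# Shortened, made-up command in the shape of the one quoted in this thread.
sample = (
    "python3 src/ME_filter1.py genome.fa Round1/a.sam.row_ME "
    "Round1/a.sam 30 > Round1/a.sam.row_ME.filter1"
)
print(interpreters_for_script(sample))
```

If the returned set contains `python3` for this rule, the generated command is not using `python2` as intended.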

@s-weissbach
Author

s-weissbach commented Sep 9, 2020

Hey,

thanks for the reply. I just reinstalled everything and removed Anaconda, which was also installed, but I still get the following error messages:
Error in rule hisat2_Genome_index:
rule download_fastq:
input: download/VIP_W4_R2.download.sh
output: FASTQ/VIP_W4_R2.fastq
jobid: 800
wildcards: sample=VIP_W4_R2
priority: -10
resources: get_data=1
/usr/bin/bash: activate: No such file or directory
jobid: 591
and later:
/usr/bin/bash: activate: No such file or directory
RuleException:
CalledProcessError in line 25 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; hisat2-build /data/resources/mouse/genome/GRCm38.p6.genome.fa data/Genome ' returned non-zero exit status 127.
File "/data/MicroExonator/rules/Round1_post_processing.skm", line 25, in __rule_hisat2_Genome_index
File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 31 of /data/MicroExonator/rules/Round2_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; bowtie-build data/Genome data/Genome ' returned non-zero exit status 127.
File "/data/MicroExonator/rules/Round2_post_processing.skm", line 31, in __rule_bowtie_Genome_index
File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run

I used the same config file as before and ran the command:
snakemake -s MicroExonator.skm --use-conda -k -j 32

Everything is installed according to the provided manual. When I run the test command by adding the -np flag everything works without an error. Is there a way to fix this?

Best,
Stephan

@geparada
Collaborator

Sorry for the delay,

I think this is a bug in the Optimize_hard_drive : T feature. Now that we have finally published MicroExonator in Genome Biology, I am going to address issues more actively. Please let me know whether switching this to Optimize_hard_drive : F solves the issue for now (deleting this line may also work).

I will close this issue once I manage to fix Optimize_hard_drive : T. I have now made a lot of changes to optimise disk usage in other ways, so this feature is not that relevant anymore, but I will still try to fix it soon.

Best,
Guillermo
