Problem when running LMM tutorial #109
On Thu, 23 Jan 2020 at 14:51, Marco Galardini wrote:
Hi @flashton2003, what I can see so far is a slight difference in the number of input kmers:
16630178 loaded variants
1028222 filtered variants
15601956 tested variants
15601956 printed variants
versus the tutorial:
15167239 loaded variants
1042215 filtered variants
14125024 tested variants
14124993 printed variants
I'm not sure that really matters, though. I will try to find some time to run the tutorial again on my end and see if I can diagnose this difference. For what it's worth, I don't see any mistake on your part, and recent re-runs of pyseer on my own data gave the same results as runs with previous versions.
I've seen numeric errors before (giving wildly inaccurate p-values) when I had an older version of numpy or an inconsistent conda installation. I also recall a QQ plot of roughly that shape.
If possible, it might be worth running the pyseer step in a clean conda environment and/or on a different machine to see whether you get the expected results.
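The summary counts quoted above can be checked quickly. In both runs, loaded minus filtered equals tested, so the difference enters at the input stage (about 1.46 million more k-mers in this run), and the tutorial printed 31 fewer variants than it tested; the thread does not say why. A minimal sketch of that arithmetic, with the numbers copied from the comment and the dictionary keys as my own labels:

```python
# Summary counts copied from the comment above: this run vs. the tutorial.
RUNS = {
    "this run": {"loaded": 16630178, "filtered": 1028222,
                 "tested": 15601956, "printed": 15601956},
    "tutorial": {"loaded": 15167239, "filtered": 1042215,
                 "tested": 14125024, "printed": 14124993},
}


def check_counts(counts):
    """Return (loaded - filtered - tested, tested - printed) for one run."""
    unaccounted = counts["loaded"] - counts["filtered"] - counts["tested"]
    not_printed = counts["tested"] - counts["printed"]
    return unaccounted, not_printed


for name, counts in RUNS.items():
    unaccounted, not_printed = check_counts(counts)
    print(f"{name}: unaccounted={unaccounted}, tested but not printed={not_printed}")

# How many more variants were loaded in this run than in the tutorial.
extra_loaded = RUNS["this run"]["loaded"] - RUNS["tutorial"]["loaded"]
print(f"extra loaded variants: {extra_loaded}")
```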
In terms of the environment: sorry, I should have mentioned that it's running in a fresh conda env with pyseer v1.3.4. Other packages installed in the env:
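Because numerical packages are the main suspect here, it can help to record their exact versions in each environment being compared (`conda list` in the activated environment gives the full picture). A small stdlib sketch using `importlib.metadata`; the package list is my own guess at the relevant numerical stack, not pyseer's official requirements:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages worth comparing between a "bad" and a "good" env.
PACKAGES = ["pyseer", "numpy", "scipy", "pandas", "statsmodels", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}\t{ver or 'not installed'}")
```

Running this in both environments and diffing the output narrows down which package versions differ.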
@flashton2003 great, thanks for the info. I'm away for the next week, but will try to replicate this as soon as I get a chance. It must be an issue with one of the numerical packages, but hopefully we can work it out and prevent it in future.
Hi @flashton2003, could you clone the repository, activate your environment and try running the unit tests? That could help us diagnose the problem, or its source, a bit better. The tests can be run like this, from the root of the repo:
I will see if I can reproduce this behavior by mimicking your environment.
Hi @mgalardini, OK, I've done that. First, I'm not quite sure whether this is the intended output of the pytest command:
Then all the tests passed; this was the output:
The elastic net tests (31-34) might have a problem, and a couple of others have warnings, but the LMM tests all passed (though, as you point out, the unit tests didn't run; I think @mgalardini would have more idea what went wrong there). Would you be able to run the original command again with this clone using
I've re-run with
Hi @flashton2003, maybe pytest is not the best way to make sure you are running the tests correctly; from the repository's root, try
Hi @mgalardini and @johnlees, thanks for all the insights shared here. I'm also having trouble with inconsistent results when running the tutorial; see below:
$ pyseer --phenotypes resistances.pheno --vcf snps.vcf.gz --load-m mash_mds.pkl --lineage --print-samples > penicillin_SNPs.txt
$ pyseer --phenotypes resistances.pheno --vcf snps.vcf.gz --load-m mash_mds.pkl --lineage --print-samples > penicillin_SNPs.txt
After updating numpy:
No observations of 26_2219533_C in selected samples
real 25m48.461s
$ time pyseer --phenotypes resistances.pheno --vcf snps.vcf.gz --load-m mash_mds.pkl --min-af 0.02 --max-af 0.98 > penicillin_SNPs.txt
real 16m57.107s
time python $(which phylogeny_distance.py) --lmm core_genome_aln.tree > phylogeny_K.tsv
or
ls -1 assemblies/*.contigs_velvet.fa | xargs -n 1 basename | cut -d '.' -f1 > samples.txt
time similarity_pyseer --vcf snps.vcf.gz samples.txt > gg.snps.txt
real 4m37.942s
$ time pyseer --lmm --phenotypes resistances.pheno --kmers fsm_kmers.txt.gz --similarity phylogeny_K.tsv --output-patterns kmer_patterns.txt --cpu 2 > penicillin_kmers.txt
real 401m47.563s
$ time pyseer --lmm --phenotypes resistances.pheno --kmers fsm_kmers.txt.gz --similarity gg.snps.txt --output-patterns kmer_patterns2.txt --cpu 4 > penicillin_kmers2.txt
real 119m38.755s
I also can't seem to get unitig-counter to install on Mac:
$ conda install unitig-counter unitig-caller
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
and use the search bar at the top of the page.
I would appreciate any ideas on how to work around these.
Thanks John. It turns out that the sample names (IDs) were in decimal-point format, some ending in zeros. In some cases the trailing zeros were dropped, creating inconsistencies. Adding a character at the end of the names, so that they are always treated as text, helped.
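The failure mode described above is easy to reproduce: if an ID such as `12.10` is parsed as a number anywhere along the way, it comes back as `12.1` and no longer matches the same sample elsewhere. A stdlib-only sketch with made-up IDs, showing the mismatch and the workaround of appending a non-numeric character (the `_s` suffix is an arbitrary, hypothetical choice):

```python
import csv
import io

# Hypothetical phenotype table whose sample IDs look like decimal numbers.
PHENO_TSV = "samples\tpenicillin\n12.10\t1\n12.2\t0\n7.300\t1\n"


def read_ids_as_text(handle):
    """Return the first column verbatim (the csv module never converts types)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return [row[0] for row in reader]


def read_ids_as_numbers(handle):
    """Mimic a reader that infers a float type for the ID column."""
    return [str(float(sample)) for sample in read_ids_as_text(handle)]


def protect_id(sample, suffix="_s"):
    """Workaround from the thread: append a non-digit so the ID stays text."""
    return sample + suffix


text_ids = read_ids_as_text(io.StringIO(PHENO_TSV))
numeric_ids = read_ids_as_numbers(io.StringIO(PHENO_TSV))
lost = sorted(set(text_ids) - set(numeric_ids))  # IDs that no longer match
print(text_ids)     # ['12.10', '12.2', '7.300']
print(numeric_ids)  # ['12.1', '12.2', '7.3']
print(lost)         # ['12.10', '7.300']
```

If you load these tables with pandas yourself, passing `dtype=str` to `pandas.read_csv` is the usual way to keep such IDs as text; which step converted the IDs in this case is not established in the thread.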
Hello,
We are running through the tutorial for the linear mixed model and are having some issues.
We downloaded the data from FigShare and ran fsm-lite (we can't see how to get the fsm-lite version, but it was a fresh conda install):
fsm-lite -l ../2020.01.21/fsm_file_list.txt -s 6 -S 610 -v -t 2020.01.21.fsm_kmers.tmp | gzip -c - > ../2020.01.21/2020.01.21.fsm_kmers.txt.gz
There were 16630178 kmers.
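The file list passed to fsm-lite with `-l` can be generated rather than written by hand. A sketch, assuming the format used in the pyseer tutorial (one `sample_id<TAB>path` pair per line) and the `*.contigs_velvet.fa` naming seen elsewhere in this thread; the function name is hypothetical:

```python
from pathlib import Path


def fsm_file_lines(paths):
    """Build 'sample_id<TAB>path' lines for fsm-lite's -l option.

    The sample ID is the file name up to its first '.', mirroring
    `basename | cut -d '.' -f1`.
    """
    lines = []
    for path in sorted(Path(p) for p in paths):
        sample_id = path.name.split(".")[0]
        lines.append(f"{sample_id}\t{path}")
    return lines


if __name__ == "__main__":
    assemblies = Path("assemblies").glob("*.contigs_velvet.fa")
    print("\n".join(fsm_file_lines(assemblies)))
```

It could be run as, e.g., `python make_fsm_list.py > fsm_file_list.txt` (script name hypothetical). Splitting on the first `.` truncates sample IDs that themselves contain dots, so decimal-style IDs need a different rule.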
Generated the phylogeny distance from the tree in the FigShare data:
python scripts/phylogeny_distance.py --lmm core_genome_aln.tree > phylogeny_K.tsv
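As I understand it, `--lmm` makes `phylogeny_distance.py` output a similarity matrix rather than distances, scoring each pair of tips by the branch length they share from the root (the root-to-MRCA distance). I haven't checked this against the script's source, so treat the following as a sketch of that idea on a tiny hand-written tree (the `TREE` dictionary and its branch lengths are made up), not as the script's implementation:

```python
# Each node maps to (parent, length of the branch leading to it); root has no parent.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "A": ("n1", 0.25),
    "B": ("n1", 0.75),
    "C": ("root", 1.0),
}


def path_to_root(node, tree):
    """List of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def depth(node, tree):
    """Sum of branch lengths from the root down to `node`."""
    return sum(tree[n][1] for n in path_to_root(node, tree))


def shared_branch_length(a, b, tree):
    """Root-to-MRCA distance of a and b; for a == b this is the tip's depth."""
    ancestors_of_a = set(path_to_root(a, tree))
    for node in path_to_root(b, tree):  # walks upward, so the first hit is the MRCA
        if node in ancestors_of_a:
            return depth(node, tree)
    raise ValueError("nodes are not in the same tree")


leaves = ["A", "B", "C"]
K = [[shared_branch_length(i, j, TREE) for j in leaves] for i in leaves]
for name, row in zip(leaves, K):
    print(name, row)
```

Tips A and B share the 0.5 branch above their common ancestor, while C shares nothing with either, so K is block-structured by clade, which is the kind of relatedness the LMM is meant to absorb.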
Then, ran pyseer:
pyseer --lmm --phenotypes ../resistances.pheno --kmers 2020.01.21.fsm_kmers.txt.gz --similarity phylogeny_K.tsv --output-patterns kmer_patterns.txt --cpu 4 > penicillin_kmers.txt
Which reported this:
Counted the patterns to get the significance threshold, which was 5.52E-08. Then filtered for significant k-mers:
cat <(head -1 penicillin_kmers.txt) <(awk '$4<5.52E-08 {print $0}' penicillin_kmers.txt) > significant_kmers.txt
Which reports no significant k-mers.
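For reference, the threshold step amounts to a Bonferroni correction over the number of distinct patterns written by `--output-patterns`. If α = 0.05, a threshold of 5.52E-08 implies roughly 0.05 / 5.52E-08 ≈ 906,000 patterns. A minimal sketch of both steps, assuming one pattern per line in the patterns file and filtering on the same 4th column as the awk command above:

```python
ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(pattern_lines, alpha=ALPHA):
    """alpha divided by the number of distinct non-empty patterns."""
    n_patterns = len({line.strip() for line in pattern_lines if line.strip()})
    if n_patterns == 0:
        raise ValueError("no patterns found")
    return alpha / n_patterns


def significant_rows(result_lines, threshold, pvalue_col=3):
    """Keep the header plus rows whose p-value column is below the threshold.

    pvalue_col=3 is the 4th tab-separated column, i.e. awk's $4.
    """
    header, *rows = result_lines
    kept = [header]
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        try:
            if float(fields[pvalue_col]) < threshold:
                kept.append(row)
        except (IndexError, ValueError):
            continue  # skip malformed rows or non-numeric p-values
    return kept
```

With real files, the line iterators from `kmer_patterns.txt` and `penicillin_kmers.txt` would be passed in. For millions of patterns, holding them all in a set may be memory-heavy; `sort -u kmer_patterns.txt | wc -l` gives the same count with less memory.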
Also, the qq_plot looks like this:
We've run through it all a couple of times in case we made a blunder somewhere, but can't figure out where we've gone wrong.
Any help much appreciated!
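Since the QQ plot image isn't reproduced here, it may help to recall what is being plotted: observed -log10 p-values against those expected under a uniform null. Points from a well-calibrated test lie near the diagonal, and the wildly inaccurate p-values mentioned earlier in the thread would show up as a systematic departure from it. A minimal stdlib sketch of the coordinates; the (i - 0.5)/n plotting positions are one common choice, assumed here:

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs, most significant first.

    Values outside (0, 1] are dropped. Expected quantiles use (i - 0.5) / n.
    """
    ps = sorted(p for p in pvalues if 0 < p <= 1)
    n = len(ps)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    observed = [-math.log10(p) for p in ps]
    return list(zip(expected, observed))
```

Feeding it the p-value column from `penicillin_kmers.txt` and plotting the pairs with any plotting library reproduces the usual QQ plot.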