[Question]: How to improve the number of bins/MAGs from soil? #171
Hi Edward,

Based on the qfilter plot, your samples seem to have decent sequencing depth (over 10 Gbp/sample), although increasing depth in future experiments may help recover more genomes, especially from complex samples. Base quality also looks good.

Your assemblies come to around 40 Mbp per sample. Unfortunately, the average contig length is quite low, at about 1,600 bp, although this is not uncommon for complex/soil samples. Since the smallest binnable contig is 1,000 bp, you are likely losing a lot of sequence at the binning stage because of the fragmented assemblies.

One helpful metric is to concatenate your MAGs, map your short reads to the concatenated MAGs, and check what percentage of reads from each sample ends up in MAGs. Similarly, you can map your short reads to the assembly to see what percentage of reads gets assembled. For a general idea of the values to expect, have a look at Supplementary Figure 4 of the metaGEM paper. You may also want to experiment with the assembly presets/parameters to see whether you can get a longer average contig length and/or a larger assembly.

With an average bacterial genome size of about 4 Mbp, a 40 Mbp assembly holds roughly 10 genomes' worth of sequence, and you are also losing sequence to short contigs, so I think your 4-5 genomes are reasonable for these samples.

If those two samples are from the same biological material, you could indeed co-assemble them to improve assembly quality. There is currently no co-assembly rule in the main Snakefile, but have a look at this example code where I co-assembled some samples, as well as the MEGAHIT repo/wiki. You just have to list all the R1 files and all the R2 files in the megahit call:

metaGEM/workflow/rules/Snakefile_experimental.smk.py, lines 21 to 65 in 8609ad6
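The read-mapping checks and the co-assembly call described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of metaGEM and not the code in the linked Snakefile: `fraction_reads_mapped` and `megahit_coassembly_cmd` are hypothetical helper names, the SAM input is assumed to come from mapping reads to the concatenated MAGs (or to the assembly) with any aligner that writes SAM (e.g. bowtie2), and the file names are hypothetical. The MEGAHIT options used (`-1`, `-2`, `--presets`, `-t`, `-o`) are standard MEGAHIT flags; the `meta-large` preset is the one MEGAHIT documents for large, complex metagenomes such as soil, but treat it as a starting point to tune.

```python
import shlex
from typing import Iterable, List

# Hypothetical helpers illustrating the suggestions above; not part of metaGEM.


def fraction_reads_mapped(sam_lines: Iterable[str]) -> float:
    """Return the fraction of reads whose primary alignment is mapped.

    Assumes SAM text, e.g. from mapping reads to the concatenated MAGs or to
    the assembly. Each mate of a pair is its own record, so mates are counted
    separately. Header lines and secondary/supplementary records are skipped
    so that every read is counted exactly once.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd SAM column
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        total += 1
        if not flag & 0x4:  # 0x4 = read unmapped
            mapped += 1
    return mapped / total if total else 0.0


def megahit_coassembly_cmd(r1: List[str], r2: List[str], out_dir: str,
                           threads: int = 16,
                           preset: str = "meta-large") -> List[str]:
    """Build a MEGAHIT command that co-assembles several paired-end samples.

    All R1 files go to -1 and all R2 files to -2 as comma-separated lists,
    in matching order. The preset and thread count are assumptions to tune.
    """
    if len(r1) != len(r2):
        raise ValueError("expected one R2 file per R1 file")
    return [
        "megahit",
        "-1", ",".join(r1),
        "-2", ",".join(r2),
        "--presets", preset,
        "-t", str(threads),
        "-o", out_dir,  # MEGAHIT expects this directory not to exist yet
    ]


if __name__ == "__main__":
    # Hypothetical file names for two runs of the same biological sample.
    cmd = megahit_coassembly_cmd(
        ["run1_R1.fastq.gz", "run2_R1.fastq.gz"],
        ["run1_R2.fastq.gz", "run2_R2.fastq.gz"],
        "coassembly",
    )
    print(shlex.join(cmd))
```

Multiplying the returned fraction by 100 gives a percentage; running it once on the SAM from the MAG mapping and once on the SAM from the assembly mapping gives the two numbers to compare against the paper's supplementary figure. The command is built as a list rather than a single string so it can be passed directly to `subprocess.run` without shell-quoting issues.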
Finally, you may want to bin your contigs using coverage across more samples, which has been shown to improve results.

Hope this helps!
Hi.
I am nearing the MAG classification step for samples extracted from soil around plant roots, but I only seem to be getting around 4 or 5 MAGs. The sequencing was done on a NovaSeq, which is not ideal because it produces short reads, but we hoped we would still get better coverage. Is there any way to adjust the pipeline to tolerate more error so that more MAGs get assembled? We also have replicate samples, and I saw in another issue that samples could perhaps be merged. I've attached the visualisation files to give more data. These two samples are also the same, just from different sequencing runs.
assemblyVis.pdf
binningVis.pdf
qfilterVis.pdf