produce the --replicons input content based on the flye assembly_info.txt #256

splaisan · 2023-10-31T14:55:04Z

The tool works great using Docker, but the naming is not very nice when starting from a Flye assembly, where contigs are named arbitrarily.

I found --replicons great in that regard, but it requires creating a TSV file up front, which is not easy when looping through many assemblies in an integrated pipeline (dozens of assemblies in a row).

Would it be possible to create the --replicons input file internally from the Flye assembly_info.txt file, which contains the columns #seq_name, length, cov. and circ., taking the largest contig as the chromosome and the others as plasmids?
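
A minimal Python sketch of the heuristic described above (largest contig as chromosome, all others as plasmids). It is not part of Bakta. Several things are assumptions to check against the Bakta and Flye documentation: that the replicon table columns are original sequence id, new sequence id, type, topology and name, with "-" for an empty name; that the circ. column of assembly_info.txt holds Y or N; and that no header row is needed. The function names and the new ids ("chromosome", "plasmid_1", ...) are hypothetical.

```python
import csv
import io


def replicons_from_flye_info(info_text):
    """Build replicon table rows from the text of a Flye assembly_info.txt.

    Heuristic from this thread: the longest contig is called the chromosome,
    every other contig a plasmid. Column layout of the output is an assumption.
    """
    records = []
    for line in info_text.splitlines():
        # Skip blank lines and the '#seq_name ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        name, length, circ = fields[0], int(fields[1]), fields[3]
        records.append((name, length, circ == "Y"))

    # Longest contig first.
    records.sort(key=lambda r: r[1], reverse=True)

    rows = []
    for i, (name, _length, circular) in enumerate(records):
        if i == 0:
            new_id, rtype = "chromosome", "chromosome"
        else:
            new_id, rtype = f"plasmid_{i}", "plasmid"
        topology = "circular" if circular else "linear"
        rows.append([name, new_id, rtype, topology, "-"])
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text without a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

In a pipeline, one would read assembly_info.txt for each assembly, write the result of to_tsv() to a file, and pass that file to --replicons. Calling every small contig a plasmid is only the heuristic proposed here; unplaced linear fragments would be mislabeled by it.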

That would produce GenBank files that are closer to submission quality.

My current workaround is to re-run Bakta after all genomes are assembled, once I have built the --replicons input file by hand.

thanks


oschwengers · 2023-11-10T15:58:53Z

Hi @splaisan ,
thanks for reaching out and asking. Indeed, it would be very nice if Bakta could directly use the circularity information from Flye. This is already possible for Unicycler assemblies, for which Bakta extracts circularity information from the FASTA headers. Whenever an assembled sequence has a circular=true tag in its FASTA header description, Bakta uses that information in the annotation process and in the output files.

I totally see your point here and I'd very much like to address this. However, I'm a bit reluctant to do so with Flye-specific parameters, because there are other assemblers too, and assembler-specific options would soon clutter Bakta's usage. I guess the better approach would be to ask the Flye developers to put the required information into the FASTA header, so that Bakta can use the approach that is already implemented. This would have the added bonus that circularity information on sequences produced by Flye would be stored along with the sequences themselves, instead of in additional txt files without a standardized format. To this end, I've opened an issue in the Flye repo: mikolmogorov/Flye#647
Maybe, you would like to endorse this?

splaisan · 2023-11-10T16:49:17Z

Can you please give an example of a FASTA header that would work?
It is really easy to add a script in between to adapt the Flye headers and make them compatible. Once I have this done, I will share it (bash / bioawk most likely) on the issue page.
Thanks a lot for the info.

oschwengers · 2023-11-10T17:02:46Z

Sure. This is a recent example from a Unicycler assembly:
>1 length=4635742 depth=1.00x circular=true

In this case, Bakta is able to extract this information and mark this sequence as complete and circular.
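
The header adaptation discussed above could be sketched as follows, in Python rather than bash/bioawk. This is an illustration under assumptions, not a tested pipeline step: it assumes the circ. column of Flye's assembly_info.txt holds Y for circular contigs, and it only appends the circular=true tag shown in the Unicycler example. The function names are hypothetical.

```python
def circular_contigs(info_text):
    """Return the names of contigs marked circular in assembly_info.txt text.

    Assumes the fourth column (circ.) is 'Y' for circular contigs.
    """
    names = set()
    for line in info_text.splitlines():
        # Skip blank lines and the '#seq_name ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if fields[3] == "Y":
            names.add(fields[0])
    return names


def tag_fasta_headers(fasta_lines, circular):
    """Yield FASTA lines, appending ' circular=true' to circular contig headers."""
    for line in fasta_lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            parts = header[1:].split()
            seq_id = parts[0] if parts else ""
            # Avoid tagging a header twice if the script is re-run.
            if seq_id in circular and "circular=true" not in header:
                header += " circular=true"
            yield header + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line
```

Typical use would be to read assembly_info.txt into a string, open assembly.fasta, and write the lines yielded by tag_fasta_headers() to a new FASTA file that is then given to Bakta.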

splaisan added the enhancement New feature or request label Oct 31, 2023
