Read length matters: identifying the phiStx2 att site
HPA has released a third WGS sequence of an isolate of the 2011 HUSEC outbreak strain. Sequencing was done on a 454 GS Junior system, which typically yields read lengths of 400-500 nt. This is 4- to 5-fold longer than with either Ion Torrent or Illumina. The initial assembly of the data comprises only 13 contigs, with the largest contig being longer than 2.6 Mb, or about half the chromosome size.
With this assembly it is possible to unambiguously identify the location of the stx2-carrying prophage in the chromosome. The stx2-carrying phage is integrated into the wrbA gene, a well-known attachment site for stx-carrying bacteriophages. The attachment site is located in the first (and largest) contig of the HPA assembly. Integration of the phage interrupts the wrbA gene.
The phage attachment site within the wrbA gene is not occupied in strain 55989 (CU928145.2).
Preliminary feature table:
feature   scaffold       start   end     strand  note
wrbA      scaffold00001  498032  498591  -1      truncated
int_wrbA  scaffold00001  498607  499941  -1
xis_wrbA  scaffold00001  499970  500269  -1
stxA2     scaffold00001  522514  523473   1
stxB2     scaffold00001  523485  523754   1
B7MQ08    scaffold00001  549237  550503   1
B7MQ09    scaffold00001  550573  558954   1
wrbA      scaffold00001  559460  559513  -1      truncated
The site-specific integrase and excisionase genes are located at one end of the prophage.
Two bacteriophage genes of unknown function (highly similar to UniProt accessions B7MQ08 and B7MQ09 from ED1a) reside next to the other end of the prophage. Allelic variants of B7MQ08 and B7MQ09 are also found at a second site within scaffold00001, pointing to another prophage insertion site. B7MQ09 encodes a large protein of 2793 aa, which may be a phage structural protein.
The multiple sequence alignment of B7MQ09 shows that several contigs from the AFOB and AFOG assemblies match this large gene only partially. One can easily guess that the assembly process for short reads is fooled by the presence of two slightly different targets.
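As a quick sanity check, the quoted length of 2793 aa can be compared with the B7MQ09 coordinates in the preliminary feature table. The sketch below assumes 1-based inclusive coordinates and that the annotated span includes the stop codon; the table does not state its conventions, so both are assumptions, and the function name is only illustrative.

```python
def protein_length(start, end, includes_stop=True):
    """Protein length in aa from CDS coordinates.

    Assumes 1-based, inclusive coordinates (an assumption, not stated
    in the feature table).
    """
    nt = end - start + 1
    if nt % 3 != 0:
        raise ValueError(f"span of {nt} nt is not a multiple of 3")
    codons = nt // 3
    # Drop the stop codon if the annotated span includes it.
    return codons - 1 if includes_stop else codons


# B7MQ09 coordinates on scaffold00001, taken from the feature table.
print(protein_length(550573, 558954))
```

Under these assumptions the coordinates agree with the 2793 aa stated in the text.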
The approximate size of prophage phiStx2 is:
559460-498591 = 60869 nt
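The same subtraction can be written as a small Python sketch. It treats the prophage as the region between the two truncated wrbA fragments of the feature table, which is only an approximation because the exact att boundaries are not determined here. The function name is hypothetical.

```python
def approx_prophage_size(left_wrba_end, right_wrba_start):
    """Approximate prophage size in nt.

    Computed as the distance from the end of the left wrbA fragment
    to the start of the right wrbA fragment.
    """
    if right_wrba_start <= left_wrba_end:
        raise ValueError("right fragment must start after the left fragment ends")
    return right_wrba_start - left_wrba_end


# Coordinates of the two truncated wrbA fragments on scaffold00001.
size = approx_prophage_size(left_wrba_end=498591, right_wrba_start=559460)
print(f"{size} nt")  # 60869 nt
```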
A multiple sequence alignment of the wrbA gene clearly reveals that the Life Technologies AFOB00000000 assembly is not reliable at the attachment site. It contains three contigs matching the wrbA gene: two of them (AFOB01000188.1, AFOB01000143.1) represent the interrupted gene, while the third contig (AFOB01000030.1) shows an uninterrupted gene (though with a distorted reading frame). The third contig must now be regarded as an artefact of the assembly process, in which CU928145.2 (which contains the uninterrupted wrbA gene) was used as a skeleton.
I'm wondering if we will see conventional Sanger sequencing in the next round of this sequencing race.
BGI has solved the problem of assembling short reads by creating an additional set of Illumina paired-end reads (spacer sizes 500 bp, 2 kb, 6 kb; see [https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki/Assemblies]). In this way BGI was able to assemble the first complete genome of the 2011 outbreak strain.
As in the HPA assembly, the prophage harbouring stx2 is integrated into the wrbA gene.
feature   sequence            start    end      strand  note
wrbA      TY-2482_chromosome  5166557  5167116  -1      truncated
int_wrbA  TY-2482_chromosome  5167132  5168466  -1
xis_wrbA  TY-2482_chromosome  5168495  5168794  -1
stxA2     TY-2482_chromosome  5191039  5191998   1
stxB2     TY-2482_chromosome  5192010  5192279   1
B7MQ08    TY-2482_chromosome  5217762  5219027   1
B7MQ09    TY-2482_chromosome  5219097  5227478   1
wrbA      TY-2482_chromosome  5228139  5228181  -1      truncated
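To compare the two assemblies, the approximate prophage size can be computed for the BGI coordinates in the same way as for the HPA scaffold, and the start coordinates of each feature can be compared between the tables. The following Python sketch does both. The coordinates are copied from the two feature tables; the `_left`/`_right` labels for the wrbA fragments and the function names are added here for illustration only.

```python
# (start, end) per feature, copied from the two feature tables.
HPA = {
    "wrbA_left": (498032, 498591),
    "int_wrbA": (498607, 499941),
    "xis_wrbA": (499970, 500269),
    "stxA2": (522514, 523473),
    "stxB2": (523485, 523754),
    "B7MQ08": (549237, 550503),
    "B7MQ09": (550573, 558954),
    "wrbA_right": (559460, 559513),
}
BGI = {
    "wrbA_left": (5166557, 5167116),
    "int_wrbA": (5167132, 5168466),
    "xis_wrbA": (5168495, 5168794),
    "stxA2": (5191039, 5191998),
    "stxB2": (5192010, 5192279),
    "B7MQ08": (5217762, 5219027),
    "B7MQ09": (5219097, 5227478),
    "wrbA_right": (5228139, 5228181),
}


def approx_prophage_size(table):
    """Distance from the end of the left wrbA fragment to the start of the right one."""
    return table["wrbA_right"][0] - table["wrbA_left"][1]


def start_offsets(a, b):
    """Per-feature difference between the start coordinates in b and in a."""
    return {name: b[name][0] - a[name][0] for name in a}


print("HPA:", approx_prophage_size(HPA), "nt")
print("BGI:", approx_prophage_size(BGI), "nt")
for name, offset in start_offsets(HPA, BGI).items():
    print(f"{name:12s} {offset}")
```

A constant offset across features would indicate that the region is collinear between the two assemblies; features whose offset differs mark places where the assemblies or annotations disagree and deserve a closer look at the sequence level.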