First, I downloaded Araport11 from www.arabidopsis.org.
Then I used cgat-apps to convert the downloaded GFF3 file to GTF:
$ cat Araport11_GFF3_genes_transposons.201606.gff | cgat gff32gtf > Araport.gtf
This ran without problems.
I saw that input GTF files must be sorted, so I sorted Araport.gtf:
$ cat Araport.gtf | cgat gtf2gtf > Araport_sort.gtf --method=sort --sort-order=gene+transcript
Then I filtered the sorted file with the longest-transcript filter:
$ cat Araport_sort.gtf | cgat gtf2gtf > Araport_longest.gtf --method=filter --filter-method=longest-transcript
The problem is that gtf2gtf does not always select the longest transcript. For gene AT1G01030, for example, the summed exon length of transcript AT1G01030.1 (1525 + 380 = 1905 bp) is greater than that of AT1G01030.2 (706 + 750 + 380 = 1836 bp), yet AT1G01030.2 is the transcript kept in Araport_longest.gtf.
Here is the AT1G01030 gene in the original Araport_sort.gtf:
Chr1 Araport11 three_prime_UTR 11649 11863 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:three_prime_UTR:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:three_prime_UTR:1";
Chr1 Araport11 exon 11649 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:exon:3"; Parent "AT1G01030.1"; Name "NGA3:exon:3";
Chr1 Araport11 CDS 11864 12940 . - 0 gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:CDS:2"; Parent "AT1G01030.1"; Name "NGA3:CDS:2";
Chr1 Araport11 five_prime_UTR 12941 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:five_prime_UTR:2"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:five_prime_UTR:2";
Chr1 Araport11 exon 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:exon:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:exon:1";
Chr1 Araport11 five_prime_UTR 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1"; ID "AT1G01030:five_prime_UTR:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:five_prime_UTR:1";
Chr1 Araport11 exon 11649 12354 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:4"; Parent "AT1G01030.2"; Name "NGA3:exon:4";
Chr1 Araport11 three_prime_UTR 11649 11863 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:three_prime_UTR:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:three_prime_UTR:1";
Chr1 Araport11 CDS 11864 12354 . - 2 gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:CDS:3"; Parent "AT1G01030.2"; Name "NGA3:CDS:3";
Chr1 Araport11 CDS 12424 12940 . - 0 gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:CDS:1"; Parent "AT1G01030.2"; Name "NGA3:CDS:1";
Chr1 Araport11 exon 12424 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:2"; Parent "AT1G01030.2"; Name "NGA3:exon:2";
Chr1 Araport11 five_prime_UTR 12941 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:five_prime_UTR:2"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:five_prime_UTR:2";
Chr1 Araport11 exon 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:exon:1";
Chr1 Araport11 five_prime_UTR 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:five_prime_UTR:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:five_prime_UTR:1";
And here is the same gene in Araport_longest.gtf:
Chr1 Araport11 exon 11649 12354 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:4"; Parent "AT1G01030.2"; Name "NGA3:exon:4";
Chr1 Araport11 CDS 11864 12354 . - 2 gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:CDS:3"; Parent "AT1G01030.2"; Name "NGA3:CDS:3";
Chr1 Araport11 CDS 12424 12940 . - 0 gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:CDS:1"; Parent "AT1G01030.2"; Name "NGA3:CDS:1";
Chr1 Araport11 exon 12424 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:2"; Parent "AT1G01030.2"; Name "NGA3:exon:2";
Chr1 Araport11 exon 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2"; ID "AT1G01030:exon:1"; Parent "AT1G01030.2,AT1G01030.1"; Name "NGA3:exon:1";
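The exon sums can be checked independently of cgat. Below is a minimal sketch, not part of cgat-apps: the helper names `exon_lengths` and `longest_per_gene` are made up for illustration, and it assumes that "longest" means the largest sum of exon lengths, which may not be how gtf2gtf defines it internally. It splits on any whitespace so that records pasted from this report also parse, which assumes the first eight columns contain no spaces.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def exon_lengths(gtf_lines):
    """Sum exon lengths per (gene_id, transcript_id) from GTF records."""
    totals = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Nine columns; the ninth (attributes) keeps its internal spaces.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive.
        totals[(attrs["gene_id"], attrs["transcript_id"])] += end - start + 1
    return dict(totals)


def longest_per_gene(totals):
    """Pick, for each gene, the transcript with the largest exon sum."""
    best = {}
    for (gene, transcript), length in totals.items():
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return best


# Exon records for AT1G01030, taken from Araport_sort.gtf above
# (attributes shortened to the two that matter here).
sample = [
    'Chr1 Araport11 exon 11649 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1";',
    'Chr1 Araport11 exon 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.1";',
    'Chr1 Araport11 exon 11649 12354 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2";',
    'Chr1 Araport11 exon 12424 13173 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2";',
    'Chr1 Araport11 exon 13335 13714 . - . gene_id "AT1G01030"; transcript_id "AT1G01030.2";',
]

totals = exon_lengths(sample)
print(totals[("AT1G01030", "AT1G01030.1")])  # 1905
print(totals[("AT1G01030", "AT1G01030.2")])  # 1836
print(longest_per_gene(totals))  # {'AT1G01030': ('AT1G01030.1', 1905)}
```

To run it on a whole file, pass an open file handle instead of `sample`, for example `exon_lengths(open("Araport_sort.gtf"))`, and compare the result with the transcripts present in Araport_longest.gtf.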