I am working with Whole Exome Sequencing (WES) data from 147 trios (441 samples total). I have merged VCF files containing structural variants (SVs) for these samples and now need to calculate population frequencies to identify common and rare variants. Specifically, I want to focus on variants present in 10% or less of the population. Can I use your pipeline to estimate this frequency and later rank my variants?
What tools and methods are best suited for calculating SV frequencies in WES data? (I tried BCFtools without success.)
What filtering criteria should be applied to ensure high-quality SV calls before frequency calculation?
Thanks in advance,
HP.
Hi,
Because the data contain trios, I would not recommend estimating AF directly from all 441 individuals, since related samples are not independent. Instead, I would use something like SVAFotate to look up frequencies in public databases such as gnomAD, 1KGP, or CCDG. I also think keeping only calls with FILTER=PASS is important. May I ask what types of SVs are in your WES data? Are they primarily inferred CNVs, or coding DUP/DEL calls with breakpoints? It would also be helpful to know which tool you used to generate the SV calls.
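If a within-cohort estimate is still wanted, one rough option is to count carriers among the parents only, so that each trio does not contribute related genotypes. The sketch below is not part of any pipeline mentioned in this thread. It assumes a multi-sample VCF with a GT field, a PASS value in the FILTER column, and a hypothetical `founders` set of parent sample IDs taken from your pedigree file. It reports the fraction of genotyped founders carrying each SV rather than a true allele frequency.

```python
def carrier_frequency(vcf_lines, founders, max_freq=0.10):
    """Return {variant_id: carrier fraction} for PASS records whose
    carrier fraction among genotyped founders is <= max_freq.

    `founders` is a hypothetical set of parent sample IDs (an assumption,
    e.g. read from a pedigree file); children are ignored.
    Records with zero founder carriers are kept with a value of 0.0.
    """
    rare = {}
    sample_idx = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF header line.
            sample_idx = {
                name: i for i, name in enumerate(fields)
                if i >= 9 and name in founders
            }
            continue
        if fields[6] != "PASS":  # FILTER column
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_pos = fmt.index("GT")
        called = carriers = 0
        for i in sample_idx.values():
            gt = fields[i].split(":")[gt_pos]
            alleles = gt.replace("|", "/").split("/")
            if all(a == "." for a in alleles):
                continue  # genotype missing, do not count this sample
            called += 1
            if any(a not in ("0", ".") for a in alleles):
                carriers += 1
        if called == 0:
            continue
        freq = carriers / called
        if freq <= max_freq:
            key = fields[2] if fields[2] != "." else f"{fields[0]}:{fields[1]}"
            rare[key] = freq
    return rare
```

A typical call would pass an open file handle, for example `carrier_frequency(open("merged.vcf"), founders=parent_ids)`, where `merged.vcf` and `parent_ids` are placeholder names. Samples with a missing genotype are excluded from the denominator, so records that lost most of their GT values during merging will give unreliable fractions.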
To clarify, I’m using the structural variant caller DYSGU [https://github.com/kcleal/dysgu] and my VCF files contain DUP/DEL/INS/TRA/INV variants with breakpoints. For reference, I’ve attached a test.vcf file to give you a clearer picture of the data I’m handling.
I’ve been trying to estimate population frequency within my cohort of 441 samples to retain the variants common to at least 90% of the population, which is why I didn't use SVAFotate to estimate the AF. I opted for this approach instead of calculating allele frequency (AF) because, after merging the VCF files, I seem to lose a lot of information, particularly the GT fields. You can see this issue in the attached test file.
Let me know if there’s any additional info you need or if you have suggestions to handle this better. I really appreciate your insights and look forward to your advice!
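To put a number on the GT loss described above, a small check like the following can report, for each record in the merged file, what fraction of samples has no genotype. This is only a sketch under the assumption that the merged file is a standard multi-sample VCF. The function name is made up for illustration and is not part of dysgu or any other tool.

```python
def missing_gt_fraction(vcf_lines):
    """Return {variant_id: fraction of samples whose GT is missing}.

    Hypothetical helper for illustration. A sample counts as missing when
    the record has no GT key in FORMAT or when its GT is '.', './.' or empty.
    """
    fractions = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        samples = fields[9:]  # sample columns follow FORMAT
        if not samples:
            continue
        fmt = fields[8].split(":")
        gt_pos = fmt.index("GT") if "GT" in fmt else None
        missing = 0
        for sample in samples:
            values = sample.split(":")
            if gt_pos is None or gt_pos >= len(values):
                gt = "."
            else:
                gt = values[gt_pos]
            alleles = gt.replace("|", "/").split("/")
            if all(a in (".", "") for a in alleles):
                missing += 1
        key = fields[2] if fields[2] != "." else f"{fields[0]}:{fields[1]}"
        fractions[key] = missing / len(samples)
    return fractions
```

Records with a high missing fraction are the ones where a cohort-level frequency would be least trustworthy, so it may be worth inspecting them before any frequency-based filtering.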