Use new Frameshift VEP plugin instead of Downstream plugin for frameshift peptide sequence predictions #634
This PR removes the dependency on the VEP Downstream plugin and instead uses a custom Frameshift plugin, which annotates frameshift variants with the full mutated peptide sequence rather than only the mutated tail downstream of the mutation. This fixes an error that occurred when using VEP version 100 and above with the Downstream plugin, caused by the downstream peptide sequence starting at a non-deterministic position after the mutation, probably due to left-shifting.
The protein length change, which previously came from the Downstream plugin, is now calculated by comparing the length of the mutated frameshift peptide sequence to the wildtype peptide sequence (from the Wildtype plugin).
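The length comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `protein_length_change`, its parameters, and the handling of a trailing `*` stop marker are all assumptions made for the example.

```python
def protein_length_change(wildtype_protein: str, frameshift_protein: str) -> int:
    """Return the mutated peptide length minus the wildtype peptide length.

    Hypothetical helper: assumes both inputs are plain amino-acid strings,
    e.g. the values annotated by the Wildtype and Frameshift plugins.
    """
    # Assumption: a trailing '*' marks the stop codon and is not a residue.
    wildtype = wildtype_protein.rstrip("*")
    mutated = frameshift_protein.rstrip("*")
    return len(mutated) - len(wildtype)


# A frameshift that extends the protein yields a positive change,
# an early stop yields a negative one.
extended = protein_length_change("MKTAYIAK", "MKTAYRRLQW")
truncated = protein_length_change("MKTAYIAK*", "MKTA*")
```

Because both sequences are full-length peptides, the difference can be taken directly, without first locating where the mutated tail begins.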
Test data was mostly updated by reannotating the VCFs with VEP 95 and the new Frameshift plugin. For short test VCFs without frameshift sequences, the VCFs were updated manually to fix the CSQ header and entries, since the new field would be empty for missense variants and in-frame indels. Some tests were removed (i.e. ones that tested that the error for the missing leading wildtype amino acid worked). For the main pVACseq tests, some transcripts included in the test data had changes to the transcript sequence, leading to different mutated frameshift predictions. This, in turn, necessitated updates to the mock test data from IEDB and Blast. As a result, the filtered epitope list is significantly different from previous tests, but the core functionality being tested remains the same.
Closes #596 and #576