All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Added ability to parse second upstream / downstream region in `IlluminaBarcodeParser` by adding `upstream2` and `downstream2` parameters. Also modified `IlluminaBarcodeParser` so that reads will only be parsed if they are long enough to fully cover the region containing the barcodes and specified upstream / downstream sequences. Based on docs, this is how it was supposed to function before but did not. Additionally, this adds another row ("reads too short") to the fates from the barcode parser, as well as the `outer_flank_fates` option to report just failing the additional upstream and downstream regions.
- Change default color of heatmaps made by `CodonVariantTable` due to current one being obsolete.
- Remove obsolete `guide=False` from some `plotnine` plots in examples / tests (this was removed in `plotnine` version 0.13).
- Replace deprecated `scipy` functions like `flip` with `numpy` alternatives (fixes [this issue](#86)).
- Re-format code with latest version of `black`.
- Lint with `ruff` rather than `flake8`.
- Add `pyarrow` as dependency as required by `pandas`.
- Tweaks to work with new versions of `pandas` and `plotnine`.
- Test with GitHub Actions rather than Travis CI
- Added `primary_target_only` option to `prob_escape`.
- Use `altair` version 5.0.0rc1
- Added `**kwargs` to `fit_models` to propagate arbitrary model fitting options.
- Fixed test failures caused by deprecations and output changes.
- Add `CodonVariantTable.prob_escape` to calculate escape probabilities from an absolute standard, and a notebook illustrating its use.
- Gaps (`-`) are now valid characters
- Formatted code with `black` and added to Travis tests.
- Increased width of docs
- Added `inverse_simpson_index`
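
For readers unfamiliar with the statistic, the inverse Simpson index of a set of counts is one over the sum of squared frequencies. The sketch below is a standalone illustration of that formula only; the function name `inverse_simpson` is hypothetical and is not the signature of the package's `inverse_simpson_index`.

```python
def inverse_simpson(counts):
    """Return 1 / sum(p_i ** 2), where p_i are the frequencies of `counts`.

    Hypothetical helper for illustration; not the package's own function.
    """
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("counts must contain at least one positive value")
    return 1.0 / sum((c / total) ** 2 for c in positive)


# Four equally abundant barcodes behave like four "effective" barcodes,
# while a single dominant barcode pulls the index toward 1.
even = inverse_simpson([10, 10, 10, 10])
skewed = inverse_simpson([97, 1, 1, 1])
```

An even population gives an index equal to the number of categories, and skew lowers it.
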
- Added `allowed_aa_muts` parameter to `simulate_CodonVariantTable`.
- Require Python >= 3.8
- Moved BinaryMap to its own package at https://github.com/jbloomlab/binarymap
- Require Python >= 3.7
- Increase `pandas` version requirement
- Fix plotting bugs with `plotnine` 0.8.0 by removing unused categories in Categoricals
- Fixed bug in `extra_cols` to `CodonVariantTable.from_variant_count_df` implementation.
- `CodonVariantTable.from_variant_count_df` accepts `extra_cols` parameter.
- Fixed datatype for `maxpoints` default in `barcodes.rarefyBarcodes`
- Discourage and stop testing multiple latent phenotypes.
- Remove from docs and stop testing `predict_variants.ipynb` as this doesn't seem to be a common use case.
- Updated compilation arguments for Windows.
- Pass Travis tests.
- `pdb_utils` module with `reassign_b_factor` function.
- Classify amino-acid mutations as single-nucleotide accessible: added `constants.SINGLE_NT_AA_MUTS` and `utils.single_nt_accessible`.
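
As a rough illustration of what "single-nucleotide accessible" means, the sketch below lists the amino acids reachable from a codon by changing exactly one nucleotide under the standard genetic code. The names `CODON_TO_AA` and `single_nt_neighbors` are hypothetical and do not reflect the signatures of `constants.SINGLE_NT_AA_MUTS` or `utils.single_nt_accessible`.

```python
from itertools import product

# Standard genetic code, codons enumerated with nucleotides in TCAG order.
NTS = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(NTS, repeat=3), AAS)}


def single_nt_neighbors(codon):
    """Amino acids (or '*' for stop) one nucleotide change away from `codon`.

    Hypothetical helper for illustration; the original amino acid is excluded.
    """
    codon = codon.upper()
    reachable = set()
    for i, nt in enumerate(codon):
        for alt in NTS:
            if alt != nt:
                mutant = codon[:i] + alt + codon[i + 1:]
                reachable.add(CODON_TO_AA[mutant])
    reachable.discard(CODON_TO_AA[codon])
    return reachable


# Methionine (ATG) can reach only a handful of amino acids in one step.
from_atg = single_nt_neighbors("ATG")
```

Under this definition, a mutation such as M to W from codon `ATG` is not single-nucleotide accessible because no one-step neighbor of `ATG` encodes W.
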
- Made compatible with `biopython` 1.78 by fixing import of `ambiguous_dna_values` to be from `Bio.Data.IUPACData`.
- Unpin `plotnine` now that this bug is fixed.
- Only test on Python 3.7.
- Bug fix in `filter_by_subs_observed`.
- `CodonVariantTable.escape_scores` now computes score type `frac_escape`.
- Added `filter_by_subs_observed`.
- `CodonVariantTable.escape_scores` now requires specification of score type, and implements a new score type of log fraction escape. The output of this method is also slightly changed.
- Bug in calculation of variance in `CodonVariantTable.escape_scores`.
- Fixed bug in `CodonVariantTable.escape_scores` that sometimes gives null escape scores.
- Added `CodonVariantTable.escape_scores`
- Added `CodonVariantTable.add_frac_counts`
- Added `CodonVariantTable.plotCountsPerVariant`
- `CodonVariantTable.classifyVariants` requires instructions on how to handle non-primary targets.
- Added capability of having other "reference" targets in a `CodonVariantTable`.
- `simulate.rand_seq` generates unique sequences.
- `plotCumulMutCoverage` now has y-axis that extends from 0 to 1.
- In `CodonVariantTable` plotting, by default do not label facets for library when just one library, and add `one_lib_facet` parameter to plotting functions.
- Made compatible with `pandas` >= 1.0
- Show estimates data frame for `bottlenecks.estimateBottleneck` doctest.
- Remove use of deprecated `scipy.array` in favor of `numpy.array`.
- The ability to fit multiple latent phenotypes in the global epistasis models. This adds the `n_latent_phenotypes` flag to `AbstractEpistasis` models, and changes calls to certain methods / properties of that abstract model class and its concrete subclasses.
- The concept of "bottleneck" likelihoods in global epistasis models, implemented in `BottleneckLikelihood`.
- The `bottlenecks` module to estimate bottlenecks.
- Added `AbstractEpistasis.aic` property.
- Added `globalepistasis.fit_models`
- Added `MultiLatentSigmoidPhenotypeSimulator`.
- An equals (`__eq__`) comparison operation to `BinaryMap`.
- Added `n_pre` and `n_post` attributes to `BinaryMap`. This changes the initialization to add new parameters, `n_pre_col`, `n_post_col`, and `cols_optional`.
- `BinaryMap` objects can now be deep copied (they don't have a compiled regex as an attribute).
- The `expand` option to `BinaryMap` to have maps encode all possible characters at each site.
- Fixed bug in `AbstractEpistasis.preferences` with `returnformat` of 'tidy'. Previously the wildtype was set incorrectly for missing values.
- The new `AbstractEpistasis.single_mut_effects` method.
- Options `returnformat` and `stringency_param` to `AbstractEpistasis.preferences` and `utils.scores_to_prefs`.
- `AbstractEpistasis.preferences` and `utils.scores_to_prefs` return site as integer.
- Errors related to using `pandas.query` for `nan` values. Not sure of the cause, but the errors are fixed now.
- Eliminated the default log base for conversion of scores / phenotypes. This is because base 2 gave excessively flat preferences, and the choice of a base is something that the user should need to think about. Added explanation about the consequences of this choice to docs and examples.
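
To see why the base matters, the sketch below assumes that preferences at a site are proportional to `base ** score` and normalized to sum to one. That relationship and the helper name `prefs_from_scores` are assumptions made for illustration, not a statement of how `utils.scores_to_prefs` is implemented.

```python
def prefs_from_scores(scores, base):
    """Convert per-amino-acid scores at one site to preferences.

    Assumes preference is proportional to base ** score (illustrative only).
    """
    weights = {aa: base ** score for aa, score in scores.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}


# Made-up scores for three amino acids at a single site.
scores = {"A": 0.0, "G": -1.0, "P": -3.0}

prefs_base2 = prefs_from_scores(scores, base=2)
prefs_base10 = prefs_from_scores(scores, base=10)
```

With these made-up scores, base 2 spreads the preferences more evenly across amino acids than base 10 does, which is the "flat preferences" behavior described above.
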
- The preferences returned by `scores_to_prefs` and `AbstractEpistasis.preferences` are now naturally sorted by site.
- The new `AbstractEpistasis.preferences` method gets amino-acid preferences from phenotypes.
- Added `utils.scores_to_prefs`.
- The `ispline` module now uses a simple dict-implemented cache rather than `methodtools.lru_cache`. This fixes excess memory usage and allows objects to be pickled.
- `AbstractEpistasis` internally clears the cache via `__getstate__` to reduce size of pickled objects. This avoids pickled models being huge. Also added the `clearcache` option to `AbstractEpistasis.fit` to serve a similar purpose of memory savings.
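
The general pattern described in these entries, a plain dict cache that is emptied in `__getstate__`, can be sketched as follows. The class `SquareTable` is a hypothetical toy and is not taken from the package; it only shows how the state handed to `pickle` can omit cached values.

```python
import pickle


class SquareTable:
    """Hypothetical toy class with a dict cache that is dropped on pickling."""

    def __init__(self, scale):
        self.scale = scale
        self._cache = {}  # plain dict rather than an lru_cache wrapper

    def value(self, x):
        # Compute once per argument, then reuse the stored result.
        if x not in self._cache:
            self._cache[x] = self.scale * x * x
        return self._cache[x]

    def __getstate__(self):
        # pickle stores whatever this returns, so swap in an empty cache.
        state = self.__dict__.copy()
        state["_cache"] = {}
        return state


table = SquareTable(scale=2)
for i in range(1000):
    table.value(i)

# Compare the serialized size of the state with and without cached values.
full_size = len(pickle.dumps(table.__dict__))
slim_size = len(pickle.dumps(table.__getstate__()))
```

Because the cache only holds values that can be recomputed, dropping it costs some recomputation after unpickling but keeps the serialized object small.
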
- Added additional forms of likelihood function to the global epistasis models. This involves substantial re-factoring of the epistasis models in `globalepistasis`. In particular, the `MonotonicSplineEpistasis` and `NoEpistasis` classes no longer are fully concrete subclasses of `AbstractEpistasis`. Instead, there are also likelihood calculation subclasses (`GaussianLikelihood` and `CauchyLikelihood`), and the concrete subclasses inherit from both an epistasis function and likelihood calculation subclass. So for instance, what was previously `MonotonicSplineEpistasis` (with Gaussian likelihood assumed) is now `MonotonicSplineEpistasisGaussianLikelihood`. Note that this is an API-breaking change.
- Added the `narrow_bottleneck.ipynb` notebook to demonstrate use of the Cauchy likelihood for analysis of experiments with a lot of noise.
- Added the `predict_variants.ipynb` notebook to demonstrate prediction of variant phenotypes using global epistasis models.
- Added `simulate.codon_muts`.
- Some minor fixes to `codonvariant_sim_data.ipynb`.
- Added `utils.tidy_to_corr`.
- Added `binarymap` module.
- Added `globalepistasis` module.
- Added `ispline` module.
- Order of rows in data frames from `CodonVariantTable.func_scores`.
- Updated `codonvariant_sim_data.ipynb` to be smaller and fit global epistasis models, and move plot formatting examples to a new dedicated notebook.
- Changed `SigmoidPhenotypeSimulator` so that the enrichment is a sigmoidal function of the latent phenotype, and the observed phenotype is the log (base 2) of the enrichment. This change harmonizes the simulator with the definitions in the new `globalepistasis` module. Also changed the input to the `latentPhenotype` and `observedPhenotype` methods. Note that these are backwards-compatibility breaking changes.
- Removed use of deprecated `Bio.Alphabet`
- Capabilities to parse barcodes from Illumina data: FASTQ readers and `IlluminaBarcodeParser`.
- `CodonVariantTable.numCodonMutsByType` method to get numerical values for codon mutations per variant.
- Can specify names of columns when initializing a `CodonVariantTable`.
- `CodonVariantTable.func_scores` now takes `libraries` rather than `combine_libs` argument.
- Added `CodonVariantTable.add_sample_counts_df` method.
- Added `CodonVariantTable.plotVariantSupportHistogram` method.
- Added `CodonVariantTable.avgCountsPerVariant` and `CodonVariantTable.plotAvgCountsPerVariant` methods.
- Add custom `plotnine` theme in `plotnine_themes` and improved formatting of plots from `CodonVariantTable`.
- Added `sample_rename` parameter to `CodonVariantTable` plotting methods.
- Added `syn_as_wt` to `CodonVariantTable.classifyVariants`.
- Added `rand_seq` and `mutate_seq` to `simulate` module.
- Changed how `variant_call_support` is set in `simulate_CodonVariantTable`.
- Better xlimits on `CodonVariantTable.plotCumulMutCoverage`.
- Docs / formatting in Jupyter notebooks.
- Fixed bugs that arose when `pandas` updated to 0.25 (related to `groupby` no longer dropping empty categories).
- Bugs in `CodonVariantTable` histogram plots when `samples` set.
Initial release. Ported code from `dms_tools2` and made some improvements.