Releases: nanxstats/tinytopics
tinytopics 0.7.2
New features
- Add `TorchDiskDataset` class to support using `.pt` or `.pth` files as inputs for `fit_model()` and `fit_model_distributed()` (#38). Similar to `NumpyDiskDataset` added in tinytopics 0.6.0, this class also loads data in memory-mapped mode, so datasets larger than system memory can be used for training.
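Memory mapping lets the operating system page data in from disk only when it is accessed, so a dataset object can index rows without reading the whole file into RAM. The standard-library sketch below illustrates that idea. `MmapRowDataset` is a hypothetical class, not part of tinytopics, and it assumes a headerless file of raw float32 values rather than the `.pt`/`.pth` format that `TorchDiskDataset` reads.

```python
import mmap
import os
import tempfile
from array import array


class MmapRowDataset:
    """Hypothetical map-style dataset over a raw float32 matrix on disk.

    Assumes a headerless, row-major file of native-endian float32 values with
    `n_cols` values per row. This is NOT the .pt/.pth (or .npy) format; it only
    illustrates memory-mapped, on-demand row access.
    """

    def __init__(self, path: str, n_cols: int) -> None:
        self.row_bytes = n_cols * array("f").itemsize
        with open(path, "rb") as f:
            # Read-only mapping: the OS pages data in only when it is touched.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._n_rows = len(self._mm) // self.row_bytes

    def __len__(self) -> int:
        return self._n_rows

    def __getitem__(self, index: int) -> list[float]:
        if not 0 <= index < self._n_rows:
            raise IndexError(index)
        start = index * self.row_bytes
        row = array("f")
        # Copy only this row's bytes out of the mapping.
        row.frombytes(self._mm[start:start + self.row_bytes])
        return row.tolist()

    def close(self) -> None:
        self._mm.close()


# Usage: write a 2 x 3 matrix to a temporary file, then read row 1 lazily.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "matrix.f32")
with open(path, "wb") as f:
    array("f", [1, 2, 3, 4, 5, 6]).tofile(f)

ds = MmapRowDataset(path, n_cols=3)
n_rows, second_row = len(ds), ds[1]  # 2, [4.0, 5.0, 6.0]
ds.close()
os.remove(path)
os.rmdir(tmpdir)
```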
tinytopics 0.7.1
Documentation
- Add distributed training speed and cost metrics on 8x A100 (40 GB SXM4) to the distributed training article (#34). This supplements the existing 1x H100 and 4x H100 metrics.
Testing
tinytopics 0.7.0
New features
- Add `fit_model_distributed()` to support distributed training using Hugging Face Accelerate. See the distributed training article for details (#32).
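In typical data-parallel training, each process keeps a full model replica, trains on its own shard of the samples, and gradients are averaged across processes before each optimizer step. The framework-free sketch below shows that split and the averaging. It mirrors the unshuffled behavior of PyTorch's `DistributedSampler` and is only an illustration: Accelerate's own data sharding and tinytopics' internals may differ, and `shard_indices` and `average_gradients` are hypothetical names.

```python
import math


def shard_indices(n_samples: int, world_size: int, rank: int) -> list[int]:
    """Hypothetical helper: sample indices assigned to process `rank`.

    Mirrors the unshuffled split of torch.utils.data.DistributedSampler:
    wrap around to pad the index list so it divides evenly, then take
    every `world_size`-th index starting at `rank`.
    """
    per_rank = math.ceil(n_samples / world_size)
    total = per_rank * world_size
    base = list(range(n_samples))
    padded = (base * math.ceil(total / n_samples))[:total]
    return padded[rank:total:world_size]


def average_gradients(per_rank_grads: list[list[float]]) -> list[float]:
    """Hypothetical helper: what a mean all-reduce leaves on every process."""
    world_size = len(per_rank_grads)
    return [sum(values) / world_size for values in zip(*per_rank_grads)]


# 10 samples over 4 processes: each rank gets 3 indices (two are repeats).
shards = [shard_indices(10, 4, rank) for rank in range(4)]
# [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

Padding keeps every process on the same number of steps, at the cost of a few samples being seen twice per epoch.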
Improvements
tinytopics 0.6.0
New features
- `fit_model()` now supports using a PyTorch `Dataset` as input, in addition to in-memory tensors. This allows fitting topic models on data larger than GPU VRAM or system RAM. The `NumpyDiskDataset` class is added to read `.npy` document-term matrices from disk on demand (#26).
Documentation
- Add a memory-efficient training article demonstrating the new features for fitting topic models on large datasets (#27).
tinytopics 0.5.1
tinytopics 0.5.0
Improvements
- Significantly increase the speed of `generate_synthetic_data()` by using direct mixture sampling, which leverages the properties of multinomial distributions (#21). This change makes simulating data at the scale of 100K x 100K more feasible. Although the approaches before and after are mathematically equivalent, data generated with the same seed will be bitwise different between previous versions and this version onward.
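One way to see why the two approaches agree: if a document has topic proportions `l` and each topic `k` has a term distribution `F[k]`, then every token is independently word `j` with probability `p_j = sum_k l_k * F[k][j]`. The token counts are therefore a single multinomial draw with probabilities `p`, so there is no need to sample a topic per token. The pure-Python sketch below contrasts the two samplers under that assumption; the function names and the `l_row`/`F` layout are hypothetical, not tinytopics' implementation. With NumPy, the direct version reduces to one `numpy.random.Generator.multinomial(n_tokens, p)` call per document.

```python
import random
from collections import Counter


def mixture_probs(l_row: list[float], F: list[list[float]]) -> list[float]:
    """Per-token word probabilities for one document: p_j = sum_k l_k * F[k][j]."""
    return [sum(l_k * F[k][j] for k, l_k in enumerate(l_row)) for j in range(len(F[0]))]


def sample_two_stage(l_row: list[float], F: list[list[float]], n_tokens: int, rng: random.Random) -> list[int]:
    """Naive generative process: draw a topic for each token, then a word from that topic."""
    counts = [0] * len(F[0])
    for _ in range(n_tokens):
        k = rng.choices(range(len(l_row)), weights=l_row)[0]
        j = rng.choices(range(len(F[k])), weights=F[k])[0]
        counts[j] += 1
    return counts


def sample_direct(l_row: list[float], F: list[list[float]], n_tokens: int, rng: random.Random) -> list[int]:
    """Direct mixture sampling: draw all tokens from the mixed word distribution at once."""
    p = mixture_probs(l_row, F)
    hits = Counter(rng.choices(range(len(p)), weights=p, k=n_tokens))
    return [hits[j] for j in range(len(p))]


L = [0.7, 0.3]                          # topic proportions of one document
F = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]  # term distribution of each topic
counts = sample_direct(L, F, 1000, random.Random(42))
```

Both samplers produce counts with the same distribution, but the direct one never touches the per-token topic assignments, which is where the speedup comes from.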
tinytopics 0.4.1
Documentation
- Use `pip` and `python3` consistently in command line instructions.
tinytopics 0.4.0
Breaking changes
- tinytopics now requires Python >= 3.10 to use PEP 604 style shorthand syntax for union and optional types (#14).
Typing
- Refactor type hints to use more abstract base classes, making them less tied to specific implementations (#14).
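Annotating parameters with abstract base classes from `collections.abc` (for example `Sequence` or `Mapping` instead of `list` or `dict`) states only the behavior a function relies on, so tuples, read-only mappings, and other implementations are accepted too. `label_topics` below is a hypothetical helper used only to illustrate the pattern.

```python
from collections.abc import Mapping, Sequence
from types import MappingProxyType


# Hypothetical helper: annotating with abstract base classes instead of
# list/dict documents that any sequence or mapping is acceptable.
def label_topics(names: Sequence[str], counts: Mapping[str, int]) -> list[str]:
    """Pair each topic name with its count (0 if missing)."""
    return [f"{name} ({counts.get(name, 0)})" for name in names]


# A tuple and a read-only mapping proxy both satisfy the hints.
labels = label_topics(("sports", "politics"), MappingProxyType({"sports": 3}))
# ['sports (3)', 'politics (0)']
```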
Testing
- Add unit tests for all functions using pytest, with a GitHub Actions workflow to run tests under Linux and Windows (#18).
Improvements
- Update articles to simplify import syntax using `import tinytopics as tt` (#16).
- Close the specific figure handles created by plot functions instead of the current figure (#18).
Bug fixes
- Plot functions now correctly use color palette inputs given as a string or a list when specified, instead of calling them as functions (#18).
tinytopics 0.3.0
Improvements
- Refactor the code to use a more functional style and add type hints to improve code clarity (#9).
tinytopics 0.2.0
New features
- Add `scale_color_tinytopics()` to support coloring an arbitrary number of topics (#4).
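One generic way to produce colors for any number of topics is to space hues evenly around the HSV color wheel with the standard-library `colorsys` module. This is a hypothetical sketch, not necessarily how `scale_color_tinytopics()` builds its palette, and evenly spaced hues grow harder to tell apart as the number of topics increases.

```python
import colorsys


def evenly_spaced_colors(n: int, saturation: float = 0.65, value: float = 0.85) -> list[str]:
    """Hypothetical helper: n hex colors with hues spread evenly around the HSV wheel."""
    colors = []
    for i in range(n):
        # Hue i / n places the n colors at equal angular steps.
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b))))
    return colors


palette = evenly_spaced_colors(12)  # 12 distinct colors, one per topic
```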
Improvements
- Simplify hyperparameter tuning by adopting modern stochastic gradient methods. `fit_model()` now uses a combination of the AdamW optimizer (with weight decay) and the cosine annealing (with warm restarts) scheduler (#2).
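AdamW applies weight decay separately from the adaptive gradient step, and cosine annealing with warm restarts repeatedly decays the learning rate along a half cosine and then resets it. PyTorch provides these as `torch.optim.AdamW` and `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`. The pure-Python function below only reproduces the per-epoch learning-rate formula documented for that scheduler; `cosine_warm_restarts_lr` is a hypothetical name, and the argument values are illustrative rather than tinytopics' defaults.

```python
import math


def cosine_warm_restarts_lr(
    epoch: int, base_lr: float, T_0: int, T_mult: int = 1, eta_min: float = 0.0
) -> float:
    """Hypothetical helper: learning rate at an integer epoch.

    lr = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2,
    where T_cur counts epochs since the last restart and the cycle length
    T_i starts at T_0 and is multiplied by T_mult after each restart.
    """
    if T_0 < 1 or T_mult < 1:
        raise ValueError("T_0 and T_mult must be >= 1")
    T_i, T_cur = T_0, epoch
    while T_cur >= T_i:  # skip past completed cycles
        T_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))


schedule = [round(cosine_warm_restarts_lr(e, 0.01, T_0=4), 5) for e in range(9)]
# [0.01, 0.00854, 0.005, 0.00146, 0.01, 0.00854, 0.005, 0.00146, 0.01]
```

The periodic resets give the optimizer repeated chances to leave poor regions, which reduces how carefully a single fixed learning rate has to be tuned.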
Bug fixes
- Fix "Structure plot" y-axis range issue by adding a `normalize_rows` argument to `plot_structure()` that normalizes rows so they all sum exactly to 1, and by explicitly setting the y-axis limit to [0, 1] (#1).
Documentation
- Add text data topic modeling example article (#7).