Releases: nanxstats/tinytopics

tinytopics 0.7.2

05 Jan 07:27
ab01f81

New features

  • Add TorchDiskDataset class to support using .pt or .pth files as inputs for fit_model() and fit_model_distributed() (#38). Similar to NumpyDiskDataset added in tinytopics 0.6.0, this class also uses memory-mapped mode to load data so that datasets larger than system memory can be used for training.

tinytopics 0.7.1

30 Dec 05:05
a516757

Documentation

  • Add distributed training speed and cost metrics on 8x A100 (40 GB SXM4) to the distributed training article (#34). This supplements the existing 1x H100 and 4x H100 metrics.

Testing

  • Add unit tests for fit_model_distributed() (#35).
  • Add pytest-cov to development dependencies (#35).

tinytopics 0.7.0

28 Dec 08:59
5fd26ba

New features

  • Add fit_model_distributed() to support distributed training using Hugging Face Accelerate. See the distributed training article for details (#32).

Improvements

  • Use tqdm.auto for better progress bar visuals when used in notebooks (#30).
  • Move dataset classes and loss functions into dedicated modules to improve code structure and reusability (#31).

tinytopics 0.6.0

26 Dec 09:15
4872817

New features

  • fit_model() now supports using a PyTorch Dataset as input, in addition to in-memory tensors. This allows fitting topic models on data larger than GPU VRAM or system RAM. The NumpyDiskDataset class is added to read .npy document-term matrices from disk on demand (#26).

Documentation

tinytopics 0.5.1

25 Dec 03:35
aa75155

Documentation

  • Add badges for CI tests and mkdocs workflows to README.md (#24).
  • Add PyTorch management guide link for uv to README.md (735fcca).

Maintenance

  • Use hatchling 1.26.3 in pyproject.toml to work around rye publish errors (c56387c).

tinytopics 0.5.0

08 Dec 23:47
b78f953

Improvements

  • Increased the speed of generate_synthetic_data() significantly by using direct mixture sampling, which leverages the properties of multinomial distributions (#21).

    This change makes simulating data at the scale of 100K x 100K more feasible. Although the approaches before and after are mathematically equivalent, the data generated with the same seed in previous versions and this version onward will be bitwise different.
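    One way to read "direct mixture sampling": drawing a topic for each word and then a term from that topic gives the same distribution as drawing every word directly from the mixture of topic-term distributions. The following is a minimal pure-Python sketch under that assumption. The helpers mixture_probs() and sample_doc_counts() and the toy inputs are hypothetical and are not the code used by generate_synthetic_data().

```python
import random
from collections import Counter


def mixture_probs(theta, beta):
    """Collapse topic weights and topic-term rows into one term distribution."""
    n_terms = len(beta[0])
    return [
        sum(theta[k] * beta[k][j] for k in range(len(theta)))
        for j in range(n_terms)
    ]


def sample_doc_counts(theta, beta, doc_length, rng):
    """Draw term counts for one document directly from the mixture."""
    probs = mixture_probs(theta, beta)
    draws = rng.choices(range(len(probs)), weights=probs, k=doc_length)
    counts = Counter(draws)
    return [counts.get(j, 0) for j in range(len(probs))]


rng = random.Random(42)
theta = [0.7, 0.3]  # hypothetical topic weights for one document
beta = [
    [0.5, 0.5, 0.0, 0.0],  # hypothetical term probabilities for topic 1
    [0.0, 0.0, 0.5, 0.5],  # hypothetical term probabilities for topic 2
]
counts = sample_doc_counts(theta, beta, doc_length=1000, rng=rng)
```

    Because the sampling path differs from a per-word topic draw, the same seed yields different counts even though the distribution is unchanged, which is consistent with the note above about bitwise differences.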

tinytopics 0.4.1

04 Dec 20:24

Documentation

  • Use pip and python3 in command line instructions consistently.

tinytopics 0.4.0

30 Nov 05:39
b4a9e0e

Breaking changes

  • tinytopics now requires Python >= 3.10 to use PEP 604 style shorthand syntax for union and optional types (#14).

Typing

  • Refactor type hints to use more abstract base classes, making them less tied to specific implementations (#14).

Testing

  • Add unit tests for all functions using pytest, with a GitHub Actions workflow to run tests under Linux and Windows (#18).

Improvements

  • Update articles to simplify import syntax using import tinytopics as tt (#16).
  • Close the specific figure handles created in plot functions instead of the current figure (#18).

Bug fixes

  • Plot functions now correctly use string and list type color palette inputs when specified, instead of calling them as functions (#18).

tinytopics 0.3.0

11 Nov 23:01
80e5816

Improvements

  • Refactor the code to use a more functional style and add type hints to improve code clarity (#9).

tinytopics 0.2.0

27 Oct 03:47

New features

  • Add scale_color_tinytopics() to support coloring an arbitrary number of topics (#4).

Improvements

  • Simplify hyperparameter tuning by adopting modern stochastic gradient methods. fit_model() now uses a combination of the AdamW optimizer (with weight decay) and the cosine annealing (with warm restarts) scheduler (#2).
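    To make the scheduler's behavior concrete, here is a minimal pure-Python sketch of a cosine annealing schedule with warm restarts at a fixed period. The function cosine_warm_restart_lr() and the values base_lr=0.01 and period=10 are hypothetical; the settings fit_model() actually uses are not stated here. In PyTorch, the corresponding building blocks are torch.optim.AdamW and torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.

```python
import math


def cosine_warm_restart_lr(step, base_lr, period, min_lr=0.0):
    """Learning rate at `step` for cosine annealing restarted every `period` steps."""
    position = step % period  # steps since the most recent restart
    cosine = 0.5 * (1.0 + math.cos(math.pi * position / period))
    return min_lr + (base_lr - min_lr) * cosine


# Learning rates over 25 steps: decay within each period, then jump back up.
schedule = [cosine_warm_restart_lr(s, base_lr=0.01, period=10) for s in range(25)]
```

    The rate decays along a half cosine within each period and resets to base_lr at every restart, which reduces the need to hand-tune a single decay horizon.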

Bug fixes

  • Fix "Structure plot" y-axis range issue by adding a normalize_rows argument to plot_structure() for normalizing rows so that they all sum exactly to 1, and explicitly setting the y-axis limit to [0, 1] (#1).
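    The row normalization described above can be sketched in a few lines of plain Python. The helper normalize_rows_to_one() and the toy matrix are hypothetical and only illustrate the idea; they are not the code inside plot_structure(). Sums equal 1 up to floating-point rounding.

```python
def normalize_rows_to_one(matrix):
    """Rescale each row so its entries sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


weights = [[0.2, 0.3, 0.6], [1.0, 1.0, 2.0]]  # hypothetical topic proportions
rows = normalize_rows_to_one(weights)
```

    With every row summing to 1, stacked bars fill the [0, 1] range on the y-axis without overshooting or leaving gaps.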

Documentation

  • Add text data topic modeling example article (#7).