
[1.13.0]

Fixed

  • Transformer models no longer silently ignore --num-embed. An error is now raised
    if --num-embed != --transformer-model-size (see the sketch after this list).
  • Fixed the --rnn-attention-in-upper-layers option, which was previously not passed
    correctly to the decoder.
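
The practical consequence of the first fix is that the embedding size and the Transformer model size must now agree. A minimal sketch of the implied check (the function name and error text are assumptions, not Sockeye's actual code):

```python
def check_transformer_sizes(num_embed: int, transformer_model_size: int) -> None:
    """Hypothetical check: fail loudly instead of silently ignoring --num-embed."""
    if num_embed != transformer_model_size:
        raise ValueError(
            "Transformer models require --num-embed (%d) to equal "
            "--transformer-model-size (%d)." % (num_embed, transformer_model_size)
        )

check_transformer_sizes(num_embed=512, transformer_model_size=512)  # passes
check_transformer_sizes(num_embed=256, transformer_model_size=512)  # raises ValueError
```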

Removed

  • Removed RNN parameter (un-)packing and support for FusedRNNCells (removed the
    --use-fused-rnns flag). These were unused, not correctly initialized, performed worse
    than regular RNN cells, and made the code considerably more complex. RNN models
    trained with previous versions are no longer compatible.
  • Removed the lexical biasing functionality (Arthur et al., 2016) (removed the arguments
    --lexical-bias and --learn-lexical-bias).

[1.12.2]

Changed

  • Updated to MXNet 0.12.1, which includes an important
    bug fix for CPU decoding.

[1.12.1]

Changed

  • Removed the dependency on the sacrebleu pip package; sacrebleu is now imported
    directly from contrib/.

[1.12.0]

Changed

  • Transformers now always apply the linear output transformation after combining
    attention heads, even if input and output depths do not differ (see the sketch below).
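
In other words, the concatenated head outputs are always multiplied by an output matrix, even when its shape would make it optional. A minimal NumPy sketch of the combine-and-project step (shapes and names are assumptions, not Sockeye's implementation):

```python
import numpy as np

batch, seq_len, heads, depth_per_head = 2, 5, 8, 64
model_size = heads * depth_per_head  # input depth == output depth in this case

# Per-head attention outputs, as produced by multi-head attention.
head_outputs = np.random.rand(batch, seq_len, heads, depth_per_head)

# Combine the heads by concatenating along the depth dimension.
combined = head_outputs.reshape(batch, seq_len, model_size)

# The linear output transformation W_o is now applied unconditionally,
# even though input and output depth are identical here.
W_o = np.random.rand(model_size, model_size)
projected = combined @ W_o
assert projected.shape == (batch, seq_len, model_size)
```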

[1.11.2]

Fixed

  • Fixed a bug where vocabulary slice padding defaulted to the CPU context, which
    affected decoding on GPUs with very small vocabularies (illustrated below).
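
A sketch of the bug class in MXNet (names are assumptions and a GPU is assumed to be available; this is not Sockeye's code): an NDArray created without an explicit ctx lands on the CPU, so combining it with GPU-resident arrays fails or forces a device copy.

```python
import mxnet as mx

ctx = mx.gpu(0)  # the context decoding actually runs on

pad_cpu = mx.nd.zeros((1, 4))           # buggy: defaults to mx.cpu()
pad_gpu = mx.nd.zeros((1, 4), ctx=ctx)  # fixed: allocated in the decoding context
```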

[1.11.1]

Fixed

  • Fixed an issue with the use of ignore in CrossEntropyMetric::cross_entropy_smoothed,
    which affected runs with the Eve optimizer and label smoothing (sketched below).
    Thanks @kobenaxie for reporting.
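
For reference, label-smoothed cross-entropy with an ignored label behaves roughly as follows. A minimal NumPy sketch (the names and the smoothing scheme are assumptions, not Sockeye's implementation):

```python
import numpy as np

def cross_entropy_smoothed(probs, labels, alpha=0.1, ignore_label=0):
    """Label-smoothed cross-entropy that excludes positions labeled ignore_label."""
    vocab = probs.shape[-1]
    # Smoothed target: (1 - alpha) on the true label, alpha spread over the rest.
    smooth = np.full_like(probs, alpha / (vocab - 1))
    smooth[np.arange(len(labels)), labels] = 1.0 - alpha
    ce = -(smooth * np.log(probs + 1e-12)).sum(axis=-1)
    # The crucial part: ignored positions (e.g. padding) contribute zero loss.
    mask = labels != ignore_label
    return (ce * mask).sum() / max(mask.sum(), 1)

probs = np.full((3, 5), 0.2)   # uniform predictions over a 5-word vocabulary
labels = np.array([1, 2, 0])   # the last position is padding (ignore_label=0)
print(cross_entropy_smoothed(probs, labels))
```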

[1.11.0]

Added

  • Lexicon-based target vocabulary restriction for faster decoding (sketched below).
    New CLI for top-k lexicon creation: sockeye.lexicon. New translate CLI argument
    --restrict-lexicon.
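
The idea behind the restriction, as a small Python sketch (the data structures and token ids below are hypothetical, not Sockeye's implementation): the output softmax is computed only over the target ids that a top-k lexicon associates with the tokens of the source sentence, plus a few always-included special symbols.

```python
# Hypothetical top-k lexicon: source token -> ids of its k most likely translations.
TOPK_LEXICON = {
    "das": [4, 11],
    "haus": [17, 203, 9],
}
ALWAYS_INCLUDED = [0, 1, 2, 3]  # e.g. <pad>, <s>, </s>, <unk>

def restricted_vocab(source_tokens):
    """Union of the top-k target ids of all source tokens plus special symbols."""
    ids = set(ALWAYS_INCLUDED)
    for token in source_tokens:
        ids.update(TOPK_LEXICON.get(token, []))
    return sorted(ids)

# Decoding then scores only this slice of the target vocabulary.
print(restricted_vocab(["das", "haus"]))  # -> [0, 1, 2, 3, 4, 9, 11, 17, 203]
```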