NLPlay is a toolbox / repository centralizing implementations of key NLP algorithms in one place, to tackle Text Classification, Sentiment Analysis & Question Answering problems. The idea is to offer a collection of ready-to-use algorithms & building blocks, so that people can quickly benchmark and customize these different model architectures on standard datasets or on their own data.
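To give a concrete feel for the kind of baseline the toolbox centralizes, here is a minimal, self-contained sketch of the first model listed below (TFIDF n-grams + an SGD linear classifier), written with plain scikit-learn rather than NLPlay's own API; the dataset choice and hyperparameters are illustrative assumptions.

```python
# Illustrative TFIDF n-grams + SGD linear baseline (scikit-learn), not NLPlay's own API.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Word unigrams + bigrams weighted by TFIDF, fed to a linear model trained with SGD
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True),
    SGDClassifier(alpha=1e-5),
)
model.fit(train.data, train.target)
print("Test accuracy:", accuracy_score(test.target, model.predict(test.data)))
```

This kind of linear baseline is usually the first benchmark to beat before moving to the neural architectures listed below.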
Models :
- TFIDF Ngrams + SGD Linear Model : A Statistical Interpretation of Term Specificity and Its Application in Retrieval - 1972
- FastText : Bag of Tricks for Efficient Text Classification - 2016
- NBSVM : Baselines and Bigrams: Simple, Good Sentiment and Topic Classification - 2012
- DAN : Deep Unordered Composition Rivals Syntactic Methods for Text Classification - 2015
- MLP : A model based on an embedding layer and a configurable pooling & feed-forward neural network on top (see the sketch after this list)
- NBSVM++ : Baselines and Bigrams: Simple, Good Sentiment and Topic Classification - 2012 - Source : FastAI
- CharCNN : Character-level Convolutional Networks for Text Classification - 2015
- TextCNN : Convolutional Neural Networks for Sentence Classification - 2014 - Source : Galsang
- TextRCNN : Recurrent Convolutional Neural Networks for Text Classification - 2015
- AttentiveConvNet : Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms - 2017 - Source : Tencent
- LEAM : Joint Embedding of Words and Labels for Text Classification - 2018
- EXAM : Explicit Interaction Model towards Text Classification - 2018 - !UNDER DEVELOPMENT!
- DPCNN : Deep Pyramid Convolutional Neural Networks for Text Categorization - 2017 - Source : Cheneng
- QRNN : Quasi-Recurrent Neural Networks - 2016 - Source : Dreamgonfly
- SWEM : Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms - 2018
- SRU : Simple Recurrent Units for Highly Parallelizable Recurrence - 2017 - Source : Asappresearch
- LSTM/BiLSTM : Long Short-Term Memory - 1997, Neural Architectures for Named Entity Recognition - 2016
- GRU/BiGRU : Neural Machine Translation by Jointly Learning to Align and Translate - 2014
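As referenced in the MLP entry above, several of these models boil down to an embedding layer, a pooling step and a feed-forward head. The sketch below shows that pattern in plain PyTorch with masked mean pooling; the layer sizes, dropout and pooling choice are assumptions, not NLPlay's exact configuration.

```python
import torch
import torch.nn as nn

class PooledMLP(nn.Module):
    """Embedding layer -> masked mean pooling -> feed-forward classifier.
    Illustrative sketch only; NLPlay's MLP model exposes configurable pooling."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, num_classes),
        )
        self.pad_idx = pad_idx

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        mask = (token_ids != self.pad_idx).unsqueeze(-1).float()
        emb = self.embedding(token_ids) * mask         # zero out padding positions
        pooled = emb.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)  # mean over real tokens
        return self.classifier(pooled)                 # logits: (batch, num_classes)

logits = PooledMLP(vocab_size=10_000)(torch.randint(1, 10_000, (4, 20)))
```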
Optimizers :
- AdaBelief : AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients - 2020 - Source : juntang-zhuang
- AdaBound : Adaptive Gradient Methods with Dynamic Bound of Learning Rate - 2019 - Source : Luolc
- DiffGrad : diffGrad: An Optimization Method for Convolutional Neural Networks - 2019 - Source : Less Wright
- Lookahead : Lookahead Optimizer: k steps forward, 1 step back - 2019 - Source : lonePatient (see the sketch after this list)
- QHAdam : Quasi-hyperbolic momentum and Adam for deep learning - 2019 - Source : FacebookResearch
- RAdam : On the Variance of the Adaptive Learning Rate and Beyond - 2020 - Source : LiyuanLucasLiu
- Ranger : An Adaptive Remote Stochastic Gradient Method for Training Neural Networks - 2019 - Source : Less Wright
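The Lookahead entry's title is essentially the algorithm: run the inner optimizer for k fast steps, then pull a set of slow weights a fraction alpha of the way towards the fast weights and restart from them. Below is a minimal wrapper sketch around any PyTorch optimizer; the class name, defaults and the small subset of the optimizer interface shown here are assumptions, not the referenced implementation's API.

```python
import torch

class Lookahead:
    """Minimal 'k steps forward, 1 step back' wrapper around a PyTorch optimizer."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.optimizer = base_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # One slow-weight snapshot per parameter of the inner optimizer
        self.slow_weights = [[p.clone().detach() for p in group["params"]]
                             for group in base_optimizer.param_groups]

    def zero_grad(self):
        self.optimizer.zero_grad()

    def step(self, closure=None):
        loss = self.optimizer.step(closure)            # one fast step
        self.step_count += 1
        if self.step_count % self.k == 0:              # every k steps: 1 step back
            for group, slow_group in zip(self.optimizer.param_groups, self.slow_weights):
                for p, slow in zip(group["params"], slow_group):
                    slow += self.alpha * (p.data - slow)   # slow <- slow + alpha * (fast - slow)
                    p.data.copy_(slow)                     # fast <- slow
        return loss

# Usage: wrap any inner optimizer, e.g. Adam
model = torch.nn.Linear(10, 2)
optimizer = Lookahead(torch.optim.Adam(model.parameters(), lr=1e-3), k=5, alpha=0.5)
```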
Activation functions :
- Mish : Mish: A Self Regularized Non-Monotonic Neural Activation Function - 2019 - Source : Diganta Misra (see the sketch after this list)
- Swish/SwishPlus : Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning - 2019 - Source : Geffnet
- LiSHT/LightRelu : LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks - 2019 - Source : Less Wright
- Threshold Relu : An improved activation function for deep learning - Threshold Relu, or TRelu - 2019 - Source : Less Wright
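The first three activations above have compact closed-form definitions; the sketch below writes them as plain PyTorch functions. The thresholded variants (SwishPlus, LightRelu, TRelu) add offsets/thresholds that are not reproduced here, see the cited sources for those.

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish(x) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))

def swish(x: torch.Tensor) -> torch.Tensor:
    # Swish(x) = x * sigmoid(x)   (beta fixed to 1 here)
    return x * torch.sigmoid(x)

def lisht(x: torch.Tensor) -> torch.Tensor:
    # LiSHT(x) = x * tanh(x)
    return x * torch.tanh(x)
```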
Losses :
- FocalLoss : Focal Loss for Dense Object Detection - 2017 - Source : mbsariyildiz (see the sketch after this list)
- LabelSmoothingLoss : Rethinking the Inception Architecture for Computer Vision - 2015 - Source : OpenNMT
- Supervised Contrastive Loss: Supervised Contrastive Learning - 2020 - Source : Yonglong Tian
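As an illustration of the first entry, here is a compact multi-class focal loss in PyTorch: the standard cross-entropy term is scaled by (1 - p_t)^gamma so that easy, well-classified examples contribute less. The default gamma=2 and the optional per-class alpha weights are common choices, not necessarily those of the referenced source.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss sketch: FL = (1 - p_t) ** gamma * CE.
    `alpha` is an optional (num_classes,) tensor of per-class weights."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, weight=alpha, reduction="none")
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()

# Example: 4 samples, 3 classes
loss = focal_loss(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]))
```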
Datasets :
- Sentiment analysis : IMDB, MR
- Question classification : TREC6, TREC50
- Text classification : 20 Newsgroups, AGNews, Amazon Review Polarity, Amazon Review Full, DBpedia, Yelp Review Polarity, Yelp Review Full, Sogou News, Yahoo Answers
Utilities :
- parlib : Parallel processing for large lists (i.e. corpus pre-processing), Pandas DataFrames or Series, using joblib (see the sketch after this list)
- DSManager / WordVectorsManager : Automatic reference and download of key datasets & pretrained vectors (GloVe, FastText...)
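parlib's exact signatures are not listed here, so the sketch below only illustrates the underlying joblib pattern it relies on: split a large list into chunks and map a pre-processing function over the chunks in parallel worker processes. The helper names and the chunk size are hypothetical.

```python
from joblib import Parallel, delayed

def _apply_to_chunk(func, chunk):
    # Worker-side helper: apply the function to every item of one chunk
    return [func(item) for item in chunk]

def parallel_map(func, items, n_jobs=-1, chunk_size=10_000):
    """Hypothetical helper showing the pattern: chunk a large list, then
    process the chunks in parallel with joblib and flatten the results."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    results = Parallel(n_jobs=n_jobs)(delayed(_apply_to_chunk)(func, chunk) for chunk in chunks)
    return [item for chunk in results for item in chunk]

def tokenize(doc):
    return doc.lower().split()

corpus = ["Some RAW Text", "Another document"] * 1000
tokens = parallel_map(tokenize, corpus, chunk_size=500)
```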
- Include additional Models :
- HAN : Hierarchical Attention Networks for Document Classification - 2016
- SIF : A Simple but Tough-to-Beat Baseline for Sentence Embeddings - 2016
- USIF : Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline - 2018
- RE2 : Simple and Effective Text Matching with Richer Alignment Features - 2019
- BiMPM : Bilateral Multi-Perspective Matching for Natural Language Sentences - 2017
- MaLSTM/MaGRU : Siamese Recurrent Architectures for Learning Sentence Similarity - 2016
- Include additional Datasets :
- Others :
- Include Nvidia Apex - Mixed Precision, to improve GPU memory footprint on Turing/Volta/Ampere architectures
- Include support of Google TPU for training & inference via PyTorch/XLA
- Include a Cross-Validation mechanism
- Include Metrics (F1, AUC...) + Confusion Matrix
- Include automatic EDA reporting features
- Include a Streamlit app to easily explore & debug model prediction errors and identify potential root causes (i.e. tokenization, unseen tokens, sentence length, class confusion...)
- Include Microsoft NNI for Hyperparameter Tuning (TPE, SMAC, Hyperband, BOHB...)
- Include MLflow for Experiment tracking