Auto-complete is a feature that suggests relevant completions based on the user's input. It works best in domains with a limited vocabulary. The following language models were explored for this task:
- N-grams Language Model
- Word Level RNN Language Model
- Subword Level RNN Language Model
- Character Level RNN Language Model
- Transformer Language Model
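To make the simplest of these approaches concrete, here is a minimal sketch of next-word suggestion with a bigram (n = 2) language model. This is an illustration only, not the implementation used in the project: the toy corpus, function name, and ranking by raw bigram counts are all assumptions for demonstration.

```python
from collections import Counter, defaultdict

# Toy corpus for illustration only -- the project used a large Arabic news corpus.
corpus = "the cat sat on the mat the cat ran on the road".split()

# Count how often each word follows each preceding word (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def suggest(prefix, k=3):
    """Return up to k most frequent words seen after the prefix's last word."""
    last = prefix.split()[-1]
    return [word for word, _ in bigram_counts[last].most_common(k)]

print(suggest("the cat"))  # words observed after "cat": "sat" and "ran"
```

Higher-order n-grams work the same way but condition on the last n-1 words instead of only the last one, which is why they need much more data to avoid unseen contexts.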
A 1.5-billion-word Arabic corpus dataset was used for this task. For this paper, the researcher chose ten sources to build the corpus. Several news websites were tested before the final selection was made; the popularity of a website, its news source, or its number of readers was not the criterion. Other criteria and technical reasons determined which news sources were used to build the corpus.
Of these ten sources, only two were used due to limited resources: Almasralyoum, which was used to train all of our models, and youm7, which was used only for fine-tuning the Transformer alongside Almasralyoum.
The Transformer was the model used to build an API with FastAPI, a modern and easy-to-use web framework. To try out the application, follow these steps:
- Install all Python libraries that the notebooks depend on:
  `pip install -r requirements.txt`
- Download the trained model from this link: Finetuned-Transformer
- Clone the Arabert repo:
  `git clone https://github.com/aub-mind/arabert.git`
- Run the server:
  `uvicorn transformer_fastapi:app --reload`
- Navigate to your local host:
  `http://localhost:8000/docs`
- Assign the prefix text and the number of words to be predicted.
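Once the server is running, the API can also be called programmatically instead of through the interactive docs page. The sketch below only constructs the request URL: the endpoint path (`/predict`) and the parameter names (`prefix`, `num_words`) are assumptions for illustration, so check the schema shown at `http://localhost:8000/docs` for the actual names exposed by `transformer_fastapi`.

```python
from urllib.parse import urlencode

def build_request_url(prefix, num_words, base="http://localhost:8000/predict"):
    """Build the auto-complete request URL.

    NOTE: "/predict", "prefix", and "num_words" are hypothetical names;
    the real endpoint and parameters are listed in the FastAPI docs page.
    """
    return f"{base}?{urlencode({'prefix': prefix, 'num_words': num_words})}"

url = build_request_url("القاهرة هي", 3)
print(url)
# Send it with any HTTP client once uvicorn is running, e.g.:
#   requests.get(url).json()
```

FastAPI generates the `/docs` page from the app's route signatures automatically, so the page is the authoritative reference for the request and response format.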