Audio & NLP Lab – NepaliGPT: Nepali Language Generative Pretrained Transformer Model

This model is an experiment in developing a language generation model for Nepali. It is a causal language model that predicts the next token given a context in the Nepali language.
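A minimal usage sketch follows, assuming the model is hosted on the Hugging Face Hub and loadable through the transformers library; the model ID "AudioNLPLab/NepaliGPT" is a placeholder, not the confirmed repository name.

```python
# Minimal generation sketch. "AudioNLPLab/NepaliGPT" is a placeholder model ID,
# not confirmed by this card; substitute the actual Hub repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AudioNLPLab/NepaliGPT"  # placeholder, assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a Nepali prompt and sample a continuation.
prompt = "नेपाल एक सुन्दर देश हो"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The sampling settings (max_new_tokens, top_k) are illustrative defaults, not values taken from this card; they can be tuned for longer or more deterministic generations.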

Dataset Used

A corpus of approximately 9.3 GB was collected from different sources on the internet. The sources include:

  • Nepali books found online.
  • Nepali news articles from Nepali news portals.
  • Nepali text collected from various open-source Nepali NLP datasets.

Hyperparameters Used

  • Learning rate → 2e-5
  • Weight Decay → 0.01
  • Number of training epochs → 5
  • bf16 → True
  • Base Model Architecture → GPT-2
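As a sketch, these settings map onto a Hugging Face TrainingArguments configuration as shown below; the output directory name and the surrounding model/dataset setup are assumptions for illustration, not details taken from this card.

```python
# Sketch of the listed hyperparameters as Hugging Face TrainingArguments.
# The output_dir is illustrative; model, tokenizer, and dataset setup are assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nepaligpt-checkpoints",  # illustrative path, assumption
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    bf16=True,  # bfloat16 mixed-precision training
)
```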

Training Results

It achieves the following results on the evaluation set:

Training Loss    Validation Loss    Perplexity
3.3968           3.2705             26.3245
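The reported perplexity is consistent with the standard definition as the exponential of the validation cross-entropy loss, which can be verified directly:

```python
# Perplexity as the exponential of the validation loss.
import math

validation_loss = 3.2705
print(f"{math.exp(validation_loss):.4f}")  # ~26.3245, matching the reported value
```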