This model is an experiment in developing a language generation model for the Nepali language: a causal language model that predicts the next tokens given a context in Nepali.
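A minimal generation sketch using the Hugging Face `transformers` library. The model path below is a placeholder, not a confirmed checkpoint name, and the sampling parameters are illustrative assumptions:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# "path/to/nepali-gpt2" is a placeholder; substitute the actual checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/nepali-gpt2")
model = AutoModelForCausalLM.from_pretrained("path/to/nepali-gpt2")

# Encode a Nepali prompt and sample a continuation token by token.
inputs = tokenizer("नेपाल एक सुन्दर देश हो", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```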
## Dataset Used
A large corpus of about 9.3 GB was collected from different sources on the internet, including:
- Nepali Books found online.
- Nepali News Articles from Nepali news portals.
- Nepali text collected from different open-source Nepali NLP datasets.
## Hyperparameters Used
- Learning rate → 2e-5
- Weight Decay → 0.01
- Number of training epochs → 5
- bf16 → True
- Base Model Architecture → GPT-2
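The hyperparameters above can be expressed as a training configuration for the Hugging Face `Trainer` API. This is a sketch under the assumption that `Trainer` was used (the card does not include the training script), and `output_dir` is a hypothetical path:

```python
from transformers import TrainingArguments

# Hyperparameters as listed in this card; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="nepali-gpt2",
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    bf16=True,
)
```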
## Training Results
The model achieves the following results on the evaluation set:
| Training Loss | Validation Loss | Perplexity |
|---|---|---|
| 3.3968 | 3.2705 | 26.3245 |
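The reported perplexity is consistent with the validation loss, since perplexity is the exponential of the (natural-log) cross-entropy loss:

```python
import math

# Perplexity = exp(cross-entropy loss); the reported 26.3245
# follows directly from the validation loss of 3.2705.
val_loss = 3.2705
perplexity = math.exp(val_loss)
print(round(perplexity, 2))  # ≈ 26.32
```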