Audio & NLP Lab – NepaliGPT: Nepali Language Generative Pretrained Transformer Model

This model is an experiment in developing a language generation model for Nepali. It is a causal language model that predicts the next token given a context in the Nepali language.
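A minimal usage sketch follows, assuming the model is hosted on the Hugging Face Hub and loadable through the transformers library; the model ID "AudioNLPLab/NepaliGPT" is a placeholder, not the confirmed repository name.

```python
# Minimal generation sketch. "AudioNLPLab/NepaliGPT" is a placeholder model ID,
# not confirmed by this card; substitute the actual Hub repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AudioNLPLab/NepaliGPT"  # placeholder, assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a Nepali prompt and sample a continuation.
prompt = "नेपाल एक सुन्दर देश हो"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The sampling settings (max_new_tokens, top_k) are illustrative defaults, not values taken from this card; they can be tuned for longer or more deterministic generations.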

Dataset Used

A corpus of approximately 9.3 GB was collected from different sources on the internet. The sources include:

  • Nepali books found online.
  • Nepali news articles from Nepali news portals.
  • Nepali text collected from various open-source Nepali NLP datasets.

Hyperparameters Used

  • Learning rate → 2e-5
  • Weight Decay → 0.01
  • Number of training epochs → 5
  • bf16 → True
  • Base Model Architecture → GPT-2
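As a sketch, these settings map onto a Hugging Face TrainingArguments configuration as shown below; the output directory name and the surrounding model/dataset setup are assumptions for illustration, not details taken from this card.

```python
# Sketch of the listed hyperparameters as Hugging Face TrainingArguments.
# The output_dir is illustrative; model, tokenizer, and dataset setup are assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nepaligpt-checkpoints",  # illustrative path, assumption
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    bf16=True,  # bfloat16 mixed-precision training
)
```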

Training Results

It achieves the following results on the evaluation set:

Training Loss    Validation Loss    Perplexity
3.3968           3.2705             26.3245
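The reported perplexity is consistent with the standard definition as the exponential of the validation cross-entropy loss, which can be verified directly:

```python
# Perplexity as the exponential of the validation loss.
import math

validation_loss = 3.2705
print(f"{math.exp(validation_loss):.4f}")  # ~26.3245, matching the reported value
```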