Each model is trained for its specified number of iterations. Settings shared by all models:

  LR          = 0.005
  CONTEXT     = 6
  HIDDEN      = 256
  tokenizer   = no punctuation
  random seed = 1234

Final average loss by number of training iterations:

  untrained = 6.987885
  10k       = 5.628099
  50k       = 5.014620
  100k      = 4.293952
  250k      = 2.538245
  500k      = 0.901952
  1mil      = 0.268483
  2mil      = 0.176046
  3mil      = 0.149419
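The losses above can be easier to compare as drops between consecutive checkpoints. Here is a minimal Python sketch that does this. The names FINAL_LOSSES and loss_drops are hypothetical and not part of any training code. Only the numbers come from the results above, with "untrained" stored as 0 iterations.

```python
# Final average losses copied from the results above, keyed by training
# iterations. "untrained" is stored as 0; the dict name is illustrative only.
FINAL_LOSSES = {
    0: 6.987885,
    10_000: 5.628099,
    50_000: 5.014620,
    100_000: 4.293952,
    250_000: 2.538245,
    500_000: 0.901952,
    1_000_000: 0.268483,
    2_000_000: 0.176046,
    3_000_000: 0.149419,
}


def loss_drops(losses):
    """Return (start, end, drop) for each pair of consecutive checkpoints."""
    steps = sorted(losses)
    return [(a, b, losses[a] - losses[b]) for a, b in zip(steps, steps[1:])]


if __name__ == "__main__":
    # Print how much the loss fell between each pair of checkpoints.
    for start, end, drop in loss_drops(FINAL_LOSSES):
        print(f"{start:>9,} -> {end:>9,}: {drop:.6f}")
```

With these figures, the loss falls by about 0.63 between 500k and 1mil iterations but by only about 0.027 between 2mil and 3mil.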