this is a very simple language model written by Ryan Alport for demo use

in order to run it, clone the repo, then simply compile an executable from each of train.c and generate.c, e.g. `cc train.c -o train` and `cc generate.c -o generate` (add `-lm` if your toolchain wants it for the math functions). no linker fuss, no cross-platform quirks (yay!)

training expects a file called "suess.txt" in the working directory as the training corpus
during training, the model will continually write itself out to a file called model.bin
generation expects to find model.bin and initializes itself from it
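
the exact on-disk format is whatever train.c writes, so treat this as a sketch of the pattern only: assuming (my assumption here, not the real layout) that the weights live in one flat float array, the model.bin round trip looks roughly like this:

```c
#include <stdio.h>

/* hypothetical layout: one flat array of n floats.
   the real format is whatever train.c actually writes. */
int save_model(const char *path, const float *w, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(w, sizeof *w, n, f);
    fclose(f);
    return written == n ? 0 : -1;  /* called over and over during training */
}

int load_model(const char *path, float *w, size_t n) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;             /* generation can't start without model.bin */
    size_t got = fread(w, sizeof *w, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}
```

training would call something like save_model("model.bin", w, n) every so often, and generation would call load_model once at startup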

see TRAINED/ for pretrained models; to use one, move it into the executable's directory and rename it 'model.bin'

i have found that training for more than about 1 million steps stops helping, but feel free to adjust. constants like CONTEXT, LR, temperature, hidden size, etc. are all adjustable as well (sketched below)
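
for reference, a knob block might look like the following. CONTEXT, LR, temperature, and hidden size are the names mentioned above, but the exact macro names and values here are placeholders, not what train.c actually contains:

```c
/* illustrative values only -- the real defaults live in train.c */
#define CONTEXT     8         /* how many previous characters the model sees */
#define HIDDEN      64        /* hidden layer width */
#define LR          0.01f     /* learning rate */
#define TEMPERATURE 0.8f      /* sampling temperature at generation time */
#define MAX_STEPS   1000000   /* ~1 million steps; more hasn't helped */
```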

my training text of choice is a number of works by Dr. Seuss (under 40 KB)

the idea here is to train a simple feed-forward model on a small vocabulary, keeping things simple enough to demonstrate the basic principles of LLMs without worrying about scale
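
to make "simple feed-forward" concrete, here is a minimal sketch of what such a model's forward pass and sampling can look like: treat the last CONTEXT characters as one-hot inputs, run one hidden layer, and softmax over the vocabulary. this illustrates the architecture class, not the actual code in train.c or generate.c, and every name and size in it is made up:

```c
#include <math.h>
#include <stdlib.h>

#define CONTEXT     8      /* previous characters fed to the model */
#define VOCAB       64     /* small character vocabulary */
#define HIDDEN      64     /* hidden layer width */
#define TEMPERATURE 0.8f   /* >1 = more random, <1 = more conservative */

/* hypothetical parameters, learned during training */
float W1[HIDDEN][CONTEXT * VOCAB], b1[HIDDEN];  /* input -> hidden */
float W2[VOCAB][HIDDEN], b2[VOCAB];             /* hidden -> output */

/* probs[v] = P(next char = v | last CONTEXT chars) */
void forward(const int *ctx, float *probs) {
    float h[HIDDEN];
    for (int j = 0; j < HIDDEN; j++) {
        float s = b1[j];
        /* inputs are one-hot, so only CONTEXT weights per unit are active */
        for (int t = 0; t < CONTEXT; t++)
            s += W1[j][t * VOCAB + ctx[t]];
        h[j] = tanhf(s);
    }
    float max = -1e30f, sum = 0.0f;
    for (int v = 0; v < VOCAB; v++) {
        float s = b2[v];
        for (int j = 0; j < HIDDEN; j++)
            s += W2[v][j] * h[j];
        probs[v] = s;
        if (s > max) max = s;
    }
    /* temperature-scaled softmax, stabilized by subtracting the max logit */
    for (int v = 0; v < VOCAB; v++) {
        probs[v] = expf((probs[v] - max) / TEMPERATURE);
        sum += probs[v];
    }
    for (int v = 0; v < VOCAB; v++)
        probs[v] /= sum;
}

/* draw the next character index from probs (one way generation can work) */
int sample(const float *probs) {
    float r = (float)rand() / (float)RAND_MAX, c = 0.0f;
    for (int v = 0; v < VOCAB; v++) {
        c += probs[v];
        if (r < c) return v;
    }
    return VOCAB - 1;   /* guard against float rounding */
}
```

generation then just slides the window: append the sampled character to the context and call forward again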

the model does not implement a transformer or attention; for the purposes of this demonstration, i feel they would increase the scope too much. that said, i am not opposed to extending this, or to explaining how those tools would improve our model

expect the output to be somewhat nonsensical; this project is built to demonstrate basic principles and to show that even a model this basic can generate new text that follows patterns similar to the text on which it was trained

the accompanying presentation can be found here:
https://iheartcomputer.club/projects/baby-lm/

reach out to me at [email protected] with any questions at all!