Machine Learning Blogs. Decoder-only language models like Llama are usually trained u