A recipe for frontier model post-training
Interconnects - A podcast by Nathan Lambert
Apple, Meta, and Nvidia all agree -- synthetic data, iterative training, human preference labels, and lots of filtering.

This is AI-generated audio with Python and 11Labs.

Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/frontier-model-post-training

Timestamps:
00:00 Llama 3.1 post-training and the new normal for RLHF
01:18 A new standard pipeline
01:45 Human preference data
02:59 Scaling RLHF
05:03 Synthetic data
06:10 The new normal
06:51 Data quality is king
07:18 Apple confirms the new normal

Figures:
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_018.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_020.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_031.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_033.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_035.png

Get full access to Interconnects at www.interconnects.ai/subscribe