{"id":230672,"date":"2024-04-30T18:25:11","date_gmt":"2024-04-30T10:25:11","guid":{"rendered":"https:\/\/magicalbits.net\/?p=230672"},"modified":"2024-04-30T18:25:11","modified_gmt":"2024-04-30T10:25:11","slug":"andrej-karpathy-on-x-congrats-to-aiatmeta-on-llama-3-release-%f0%9f%8e%89-https-t-co-fsw615ze8s-notes-releasing-8b-and-70b-both-base-and-finetuned-models-strong-performing-in-their-model-c","status":"publish","type":"post","link":"https:\/\/magicalbits.net\/?p=230672","title":{"rendered":"Andrej Karpathy on X: &#8220;Congrats to @AIatMeta on Llama 3 release!! \ud83c\udf89 https:\/\/t.co\/fSw615zE8S Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we&#8217;ll see when the rankings come in @ @lmsysorg :)) 400B"},"content":{"rendered":"<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Congrats to <a href=\"https:\/\/twitter.com\/AIatMeta?ref_src=twsrc%5Etfw\">@AIatMeta<\/a> on Llama 3 release!! \ud83c\udf89<a href=\"https:\/\/t.co\/fSw615zE8S\">https:\/\/t.co\/fSw615zE8S<\/a><br \/>Notes:<\/p>\n<p>Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we&#39;ll see when the rankings come in @ <a href=\"https:\/\/twitter.com\/lmsysorg?ref_src=twsrc%5Etfw\">@lmsysorg<\/a>  :))<br \/>400B is still training, but already encroaching\u2026<\/p>\n<p>&mdash; Andrej Karpathy (@karpathy) <a href=\"https:\/\/twitter.com\/karpathy\/status\/1781028605709234613?ref_src=twsrc%5Etfw\">April 18, 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<blockquote><p>Scaling laws. Very notably, 15T is a very very large dataset to train with for a model as &#8220;small&#8221; as 8B parameters, and this is not normally done and is new and very welcome. The Chinchilla &#8220;compute optimal&#8221; point for an 8B model would be train it for ~200B tokens. (if you were only interested to get the most &#8220;bang-for-the-buck&#8221; w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual but personally, I think extremely welcome. Because we all get a very capable mod<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Scaling laws. Very notably, 15T is a very very large dataset to train with for a model as &#8220;small&#8221; as 8B parameters, and this is not normally done and is new and very welcome. The Chinchilla &#8220;compute optimal&#8221; point for an 8B model would be train it for ~200B tokens. (if you were only interested [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-230672","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"jetpack_featured_media_url":"","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230672","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=230672"}],"version-history":[{"count":1,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230672\/revisions"}],"predecessor-version":[{"id":230673,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230672\/revisions\/230673"}],"wp:attachment":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=230672"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=230672"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=230672"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}