{"id":230941,"date":"2025-04-01T14:43:18","date_gmt":"2025-04-01T06:43:18","guid":{"rendered":"https:\/\/magicalbits.net\/?p=230941"},"modified":"2025-04-01T14:43:18","modified_gmt":"2025-04-01T06:43:18","slug":"tracing-the-thoughts-of-a-large-language-model-anthropic","status":"publish","type":"post","link":"https:\/\/magicalbits.net\/?p=230941","title":{"rendered":"Tracing the thoughts of a large language model \\ Anthropic"},"content":{"rendered":"<blockquote><p>It turns out that, in Claude, refusal to answer is the default behavior: we find a circuit that is &#8220;on&#8221; by default and that causes the model to state that it has insufficient information to answer any given question. However, when the model is asked about something it knows well\u2014say, the basketball player Michael Jordan\u2014a competing feature representing &#8220;known entities&#8221; activates and inhibits this default circuit (see also this recent paper for related findings). This allows Claude to answer the question whe<\/p><\/blockquote>\n<p>Source: <em><a href=\"https:\/\/www.anthropic.com\/research\/tracing-thoughts-language-model\">Tracing the thoughts of a large language model \\ Anthropic<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It turns out that, in Claude, refusal to answer is the default behavior: we find a circuit that is &#8220;on&#8221; by default and that causes the model to state that it has insufficient information to answer any given question. However, when the model is asked about something it knows well\u2014say, the basketball player Michael Jordan\u2014a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-230941","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"jetpack_featured_media_url":"","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230941","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=230941"}],"version-history":[{"count":1,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230941\/revisions"}],"predecessor-version":[{"id":230942,"href":"https:\/\/magicalbits.net\/index.php?rest_route=\/wp\/v2\/posts\/230941\/revisions\/230942"}],"wp:attachment":[{"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=230941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=230941"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/magicalbits.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=230941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}