The only thing that we have come up with (so far!) that fully explains this picture is that the hypothesis is correct: the model is rapidly learning to recognise examples even just seeing them once. Let’s work through each part of the loss curve in turn…