No, the article is badly worded. Earlier models already had reasoning skills with some rudimentary CoT, but they leaned more heavily into it for this model.
My guess is they didn’t train it on a 10-trillion-word corpus (which is expensive and has diminishing returns) but rather on a heavily curated RLHF dataset.
Same. I remember playing the original on an Amstrad in the 90s and it was already mind-blowing. I was so happy they remade it, and even happier that they barely changed anything about it.