It seems Mistral finally released their own version of Small 3.1 2503 with CoT reasoning patterns embedded. Before this, the best CoT finetune of Small was DeepHermes with DeepSeek's R1 distill patterns. According to the technical report, Mistral baked in their own reasoning patterns for this one, so it's not just another DeepSeek distill finetune.

HuggingFace

Blog

Magistral technical research academic paper

  • brucethemoose@lemmy.world
    19 days ago

    I don’t want to be ungrateful complaining that they don’t give us everything.

    For sure.

    But I guess it’s still kinda… interesting? Like you’d think Qwen3, Gemma 3, Falcon H1, Nemotron 49B and such would pressure them to release Medium, but I guess there are factors that help them sell it.

    As stupid as this is, they’re European and specifically not Chinese. In the business world, there’s this mostly irrational fear that the DeepSeek or Qwen weights by themselves will jump out of their cage and hack you, heh.