
This article was published on February 21, 2024

AI model Poro sets new milestones for multilingual LLMs in Europe

Silo AI is proving that "we can build competitive models for low-resource languages"



Named after the Finnish word for “reindeer,” Poro is the first of a family of open-source multilingual LLMs. The startup is building the models alongside the University of Turku and the EU’s High Performance Language Technologies (HPLT) project.

Poro is a 34.2-billion-parameter model designed to process English, Finnish, and code. It was trained on a dataset of 1 trillion tokens.

“What we are proving with Poro is that we can build competitive models for low-resource languages, like Finnish,” Peter Sarlin, co-founder and CEO of Silo AI, told TNW.

Sarlin explained that high-resource languages like English dominate generic LLMs. As a result, such models handle low-resource languages well enough for translation, but fail to represent the language and culture of a specific country.


According to the startup, Poro outperforms all existing open-source language models in Finnish, including Mistral, FinGPT, Llama, and the BLUUMI 176 billion parameter model.

To achieve this, the team used a novel training approach that pairs Finnish with high-resource languages. It determined optimal data reuse frequencies for the low-resource language and integrated translated text pairs between Finnish and English. The method relies on cross-lingual signals to strengthen the model's grasp of the connections between the languages, boosting performance in Finnish without compromising it in English.
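To make the idea concrete, here is a minimal sketch of how such a cross-lingual training mix might be assembled. The corpora, the reuse factor, and the pairing format are all illustrative assumptions, not Silo AI's actual recipe:

```python
import random

def build_training_mix(english, finnish, code, translation_pairs,
                       finnish_reuse=4, seed=0):
    """Combine corpora into one shuffled document stream.

    Low-resource Finnish text is repeated `finnish_reuse` times
    (oversampling, standing in for a tuned data reuse frequency),
    and English-Finnish translation pairs are joined into single
    documents so the model sees aligned cross-lingual signal.
    """
    mix = list(english) + list(code)
    mix += list(finnish) * finnish_reuse  # reuse the low-resource data
    # Concatenate aligned sentences so both languages share one context.
    mix += [f"{en}\n{fi}" for en, fi in translation_pairs]
    random.Random(seed).shuffle(mix)
    return mix

docs = build_training_mix(
    english=["The reindeer ran."],
    finnish=["Poro juoksi."],
    code=["print('hi')"],
    translation_pairs=[("The reindeer ran.", "Poro juoksi.")],
)
```

In a real pretraining pipeline the same principle applies at the level of billions of tokens, with the reuse frequency chosen so the low-resource data is repeated without overfitting.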

Poro has also achieved another milestone: it’s the first multilingual model that has been trained on a EuroHPC supercomputer. “This is proof that we’re able to train LLMs on the AMD-based LUMI supercomputer, instead of an NVIDIA-based supercomputer,” Sarlin said.

A step towards European sovereignty

Open-source multilingual LLMs are key to ensuring language diversity, cultural representation, and democratic access in artificial intelligence. They’re also critical for Europe’s AI sovereignty.

“From a commercial perspective, these models build a baseline and infrastructure that allows European companies to innovate on top,” Sarlin noted. “This way companies can create IP, create competitive edge, and [create] great business that ensures that value stays in Europe with them.”

Poro is available for free under the Apache 2.0 License, which allows both commercial and research use. Silo AI is currently working on the Nordic languages (Swedish, Norwegian, Danish, and Icelandic), and plans to expand to all other official languages of the EU.

