New MIT algorithm automatically deciphers lost languages

A new AI system can automatically decipher a lost language that’s no longer understood — without knowing its relationship to other languages.

Researchers at MIT CSAIL developed the algorithm in response to the rapid disappearance of human languages. Most of the languages that have existed are no longer spoken, and at least half of those remaining are predicted to vanish in the next 100 years.

The new system could help recover them. More importantly, it could preserve our understanding of the cultures and wisdom of their speakers.

[Read: What audience intelligence data tells us about the 2020 US presidential election]

The algorithm works by harnessing key principles from historical linguistics, such as the predictable ways in which languages use sound substitutions. The researchers give the example of a word with a “p” in a parent language possibly changing to a “b” in its descendent, but probably not to a “k” due to the difference in pronunciation.

The 💜 of EU tech

The latest rumblings from the EU tech scene, a story from our wise ol' founder Boris, and some questionable AI art. It's free, every week, in your inbox. Sign up now!

These kinds of patterns are then turned into computational constraints. This allows the model to segment words from an ancient language and map them to a related language.

The algorithm can also identify different language families. For example, their method suggested that Iberian is not related to Basque, supporting recent scholarship.

The project was led by MIT Professor Regina Barzilay, who last month won a $1 million award from the world’s largest AI society for her pioneering work on drug development and breast cancer detection.

She now wants to expand the work to identifying the semantic meaning of words — even if we don’t know how to read them.

“For instance, we may identify all the references to people or locations in the document which can then be further investigated in light of the known historical evidence,” Barzilay said in a statement.

“These methods of ‘entity recognition’ are commonly used in various text processing applications today and are highly accurate, but the key research question is whether the task is feasible without any training data in the ancient language.”

You can read the full research study here.

Story by Thomas Macaulay

Managing editor

Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he e (show all) Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he enjoys playing chess (badly) and the guitar (even worse).

Get the TNW newsletter

Get the most important tech news in your inbox each week.

New MIT algorithm automatically deciphers lost languages

Get the TNW newsletter

Also tagged with

Bananas, champagne, and robots: Why automation still needs humans

Engineering’s AI reality check

Discover TNW All Access

When the machines started talking to each other

Kembara closes €750M first close to fuel growth of European deep tech startups