A collection of news articles in many languages
We began by assembling a labelled dataset of news articles written in six European languages. Think of it as a broad selection of news pieces on important topics such as climate change and global events. We collected 1,612 articles and annotated each one with detailed notes on what kind of news it is and how, if at all, it tries to persuade readers.
Our goal was to have a mix of languages and topics so we could see how news works in different places. We then identified the persuasion techniques used in each of these articles.
Smart computer models as news detectives
We needed capable computer programs to help us analyse the news. We used advanced models such as XLM-RoBERTa, which handle many languages and can be trained to find persuasion techniques such as emotionally charged wording ("Loaded Language") or arbitrary labels ("Name Calling"). These models can also transfer what they have learned from one language to another, so they can work regardless of the language of the article.
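To make the task concrete, here is a minimal toy sketch of what "finding persuasion techniques" means as a multi-label decision: an article can use several techniques at once, so each label is accepted or rejected independently. The cue-word lists and the `detect_techniques` helper are invented for illustration only; the real system fine-tunes a transformer such as XLM-RoBERTa rather than matching keywords.

```python
# Hypothetical cue words per technique (not taken from the real dataset).
TECHNIQUE_CUES = {
    "Loaded Language": {"disaster", "catastrophic", "shameful"},
    "Name Calling": {"crooked", "traitor", "clown"},
}

def detect_techniques(text: str) -> list[str]:
    """Return every technique whose cue words appear in the text.

    Multi-label decision: each label is accepted or rejected on its
    own, rather than picking a single best class for the article.
    """
    words = {w.strip(".,!?\"").lower() for w in text.split()}
    return sorted(
        label
        for label, cues in TECHNIQUE_CUES.items()
        if words & cues  # at least one cue word present
    )
```

A real model replaces the keyword test with a learned score per label, but the output shape is the same: zero, one, or several technique labels per article.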
The models are also good at determining whether an article is opinion, factual reporting, or satire, and which techniques it uses to persuade readers. This helps us, and others, understand how media try to influence public opinion and how they present different subjects.
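Genre works differently from persuasion techniques: each article gets exactly one genre, so this is a single-label decision where the classes compete and the best one wins. The sketch below shows that contrast with invented cue lists; the real system uses a fine-tuned transformer, not keyword counts.

```python
# Hypothetical cue words per genre (illustrative only).
GENRE_CUES = {
    "opinion": {"believe", "should", "outrageous"},
    "reporting": {"announced", "according", "officials"},
    "satire": {"allegedly", "miracle", "definitely"},
}

def classify_genre(text: str) -> str:
    """Score each genre and return the single best one."""
    words = [w.strip(".,!?\"").lower() for w in text.split()]
    scores = {
        genre: sum(w in cues for w in words)
        for genre, cues in GENRE_CUES.items()
    }
    # Single-label decision: the genres compete, one winner.
    return max(scores, key=scores.get)
```

In a neural classifier the scores come from a softmax over the three classes instead of cue counts, but the "pick one winner" logic is the same.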
Learning and improving for the future
While we have made a lot of progress, we know there is more to do. Our project is not perfect, and human judgement can introduce bias. But we have worked hard to create clear annotation rules and checks to keep everything as fair as possible.
Originally Published: 27 Mar 2025
Last Updated: 07 Nov 2025
| Knowledge service | Metadata | Text Mining |
