Language Technology Resources

This page gives you an overview of Linguistic Resources and Tools (multilingual software, parallel corpora, and more).

The data releases are in line with the general effort of the European Commission to support multilingualism, language diversity and the re-use of Commission information.

The JRC has developed Language Technology (text mining, computational linguistics) tools for more than twenty languages and has been analysing up to 300,000 online news articles per day since 2004, creating valuable meta-data in the process. Some of this software and meta-data has been released publicly, starting in 2006 with the large-scale multilingual parallel corpus JRC-Acquis, which covers twenty-two languages. The JRC also helps distribute the linguistic resources produced by other European Union organisations. The most notable features of these resources are their high degree of multilinguality and the fact that the texts are parallel (i.e. the corpora consist of texts and their manually produced translations). For comparative details, see the journal publication An overview of the European Union’s highly multilingual parallel corpora (PDF).

Resource list

The resources listed below (not exhaustive) are useful to academia and industry for research and development into highly multilingual text analysis tools, and especially into cross-lingual applications.

To better understand the background of our work, you may want to have a look at a list of the publications produced by the Language Technology team (PDF).

Originally Published	30 Apr 2018
Knowledge service | Metadata	Text Mining and Analysis (TMA)
Digital Europa Thesaurus (DET)	Data
