Competence Centre on Composite Indicators and Scoreboards

Our expertise on statistical methodologies and in developing sound composite indicators provides policy-makers with the ‘big picture’ for informed policy decisions and progress monitoring.

Last updated: 05 Sep 2022

Socioeconomic Tracker

Modern economies and societies produce massive datasets that call for new modelling techniques. With the growing adoption of unconventional (big) data services and systems, this is a timely topic for research communities working on social and economic analysis. Exploring such a large volume of real-time information yields new insights that are potentially useful to policy-makers when nowcasting and forecasting economic and social indicators.

In particular, measuring the informational content of text in economic and social news helps market participants and societies adjust their perceptions and expectations of the dynamics of future events and trends, which in turn significantly influences policy-makers’ perceptions and decisions. Social media and news may provide a larger set of information than standard, lower-frequency socio-economic indicators. To this end, this dashboard analyses three distinct datasets that together offer insights on different socio-economic trends: the Global Database of Events, Language and Tone (GDELT) dataset, the DNA dataset, and Google Trends data on Europeans’ searches for topics relevant to this analysis.

News data from GDELT

GDELT is an open Big Data platform of news collected worldwide. It provides translation for over 65 languages and extracts people, locations, organizations, counts, quotes, images and millions of themes based on topical taxonomies commonly used by practitioners, such as the World Bank Topical Taxonomy. It also measures thousands of emotional dimensions by means of dictionaries popular in the literature, such as the Harvard IV-4 Psychosocial Dictionary or WordNet-Affect.

We report the GDELT tone extracted for the specific topic of interest within the selected European countries. The GDELT tone score expresses whether a message conveys a positive or negative sentiment with respect to the selected topic identified in the text; GDELT calculates it by averaging the tone scores of the terms contained in the text using the VADER sentiment lexicon. We report the Tone in each sample period of interest, which is the average tone score weighted by the number of articles and outlets available in that period. We also report the Popularity rate of the topic in each sample period, measured as the number of articles on the selected topic in the period, weighted by the number of available articles and outlets.
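The two indicators can be illustrated with a minimal sketch. The exact weighting scheme is not spelled out here, so the code below makes an assumption: Tone is taken as the article-count-weighted mean of per-outlet average tone, and the Popularity rate as the share of topic articles among all available articles. The `OutletBucket` structure and both function names are hypothetical and are not part of GDELT.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class OutletBucket:
    """One outlet's coverage in one sample period (hypothetical structure)."""
    avg_tone: float       # mean tone of the outlet's articles on the topic
    topic_articles: int   # number of articles on the selected topic
    total_articles: int   # all articles available from the outlet in the period


def weighted_tone(buckets: Sequence[OutletBucket]) -> Optional[float]:
    """Average tone, weighted by the number of topic articles per outlet (assumed weighting)."""
    total = sum(b.topic_articles for b in buckets)
    if total == 0:
        return None  # no coverage of the topic in this period
    return sum(b.avg_tone * b.topic_articles for b in buckets) / total


def popularity_rate(buckets: Sequence[OutletBucket]) -> Optional[float]:
    """Share of topic articles among all available articles (assumed normalisation)."""
    available = sum(b.total_articles for b in buckets)
    if available == 0:
        return None  # nothing was collected in this period
    return sum(b.topic_articles for b in buckets) / available


# Illustrative numbers only, not real GDELT data.
period = [OutletBucket(-2.0, 30, 1000), OutletBucket(1.0, 10, 1000)]
print(weighted_tone(period))    # -1.25
print(popularity_rate(period))  # 0.02
```

Returning `None` for an empty period keeps "no coverage" distinct from a neutral tone of zero.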

Newspaper sentiment

News published in online and printed newspapers contains information about past events, current developments and future expectations of socio-economic activity. We study millions of news articles using sentiment analysis. We compute timely indicators on various topics related to socio-economic activities for the EU27 countries, relying on a fine-grained, aspect-based sentiment analysis method particularly suited to the economic and financial lexicon. For each topic and country, we report the daily sentiment and volume observed in the news.
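The final aggregation step can be sketched as follows, assuming each article has already been assigned a sentiment score by the aspect-based method. Sentiment is taken here as the plain mean of article scores and volume as the article count. The record layout and the function name are hypothetical, and the dashboard's own aggregation may differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, date, float]  # (topic, country, day, sentiment score)


def daily_sentiment_and_volume(
    records: Iterable[Record],
) -> Dict[Tuple[str, str, date], Tuple[float, int]]:
    """Group scored articles by (topic, country, day); return (mean sentiment, volume)."""
    groups = defaultdict(list)
    for topic, country, day, score in records:
        groups[(topic, country, day)].append(score)
    return {
        key: (sum(scores) / len(scores), len(scores))
        for key, scores in groups.items()
    }


# Illustrative records only, not real data.
articles = [
    ("inflation", "IT", date(2022, 3, 1), -0.4),
    ("inflation", "IT", date(2022, 3, 1), 0.2),
    ("inflation", "DE", date(2022, 3, 1), 0.1),
]
for key, (sentiment, volume) in daily_sentiment_and_volume(articles).items():
    print(key, round(sentiment, 3), volume)
```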

Web searches data

Google Search data are available through Google Trends, which returns the Search Volume Index (SVI) of both queries and topics. Results are normalized to the time and location of a query: for each time range (daily, weekly or monthly) and geography (country or ISO 3166-2 subdivision), each data point is divided by the total searches to obtain relative popularity. Depending on the type of access, the resulting numbers are then either (i) scaled to a range of 0 to 100 based on a query's proportion to all searches on all queries (if the Google Trends end-point is used); or (ii) multiplied by 10 million (if the private non-commercial Google Trends API is used). In both cases, numbers are calculated on a uniformly distributed random sample of Google web searches made since 2004 and updated once a day, so there may be some variance between similar requests. Google also provides the top 25 (when available) queries and topics related to any given topic or query. These top queries and topics are the ones most frequently searched by users within the same session for any given time and geography. Finally, Google search algorithms merge topics and queries into higher-level classifiers called 'categories'. A full list of categories is available (here).
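The normalisation described above can be illustrated with a short sketch. It does not call any Google service. It assumes that raw query counts and total search counts are known for each period, which Google does not actually expose. It also assumes that the 0 to 100 scale is anchored so the highest share in the series maps to 100. All function names are hypothetical.

```python
from typing import List, Sequence


def relative_popularity(query_counts: Sequence[int],
                        total_counts: Sequence[int]) -> List[float]:
    """Divide each data point by the total searches for the same time and geography."""
    return [q / t if t else 0.0 for q, t in zip(query_counts, total_counts)]


def scale_0_100(shares: Sequence[float]) -> List[int]:
    """Variant (i): rescale so the peak share equals 100 (assumed anchoring)."""
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


def scale_by_factor(shares: Sequence[float],
                    factor: int = 10_000_000) -> List[float]:
    """Variant (ii): multiply each share by 10 million."""
    return [s * factor for s in shares]


# Illustrative counts only: three weeks of searches in one country.
shares = relative_popularity([50, 100, 25], [10_000, 10_000, 10_000])
print(scale_0_100(shares))      # [50, 100, 25]
print(scale_by_factor(shares))
```

Because variant (i) is relative to the peak of the requested series, values from two separate requests are not directly comparable, whereas variant (ii) keeps a fixed scaling factor.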


Tracking socio-economic activities in European countries with unconventional data | Proceedings of the 2022 ACM Conference on Information Technology for Social Good