THE COMPUTATIONAL LINGUIST
Alongside photography, I’ve always been passionate about my work as a research scientist in Computational Linguistics, the field that teaches computers to process and understand human language. I found the ideal environment for this work at the European Commission’s Joint Research Centre (JRC), located near Lago Maggiore in Northern Italy. I consider myself fortunate to have been active from the pioneering years of the field right through to the era of widely used language technology applications.
This section of my website is dedicated to that important chapter of my life. It explains what computational linguistics is and what motivated my work, and it highlights key projects, experiences, and resources I helped create.
What is Computational Linguistics?
Put simply, Computational Linguistics (CL) focuses on enabling computers to handle human language. It powers software that helps people retrieve information efficiently, access content in multiple languages, and communicate across linguistic boundaries. Typical applications include:
- Machine Translation
- Document Categorisation
- Information Extraction
- Multi-document Summarisation
- Sentiment Analysis.
CL overlaps closely with fields such as Text Mining, Natural Language Processing (NLP), and Language Engineering. The methods used range from rule-based (symbolic) approaches to Machine Learning and Artificial Intelligence.
My Specialisations
My main areas of expertise within CL include:
- Multilinguality and cross-lingual information access
- Developing text mining tools for many languages with limited human effort
- Fusing and linking information across languages
- Giving users access to foreign-language information.
Career Highlights
In 1998, I joined the Joint Research Centre (JRC) of the European Commission, where I served as a senior scientist at the Competence Centre for Text Mining and Analysis.
One of my key contributions was to the development of the Europe Media Monitor (EMM), a publicly accessible media monitoring platform. As of mid-2019, EMM analysed over 320,000 online news articles daily in about 70 languages, sourced from approximately 12,000 outlets.
EMM capabilities include:
- Grouping related articles
- Categorising content into thousands of subject domains
- Extracting and disambiguating entities (persons, organisations, places, products, events)
- Identifying and translating direct speech quotations
- Tracking news over time
- Linking stories across languages.
EMM enables users to explore diverse perspectives by comparing how the same story is reported across countries and languages. It promotes transparency, cultural understanding, and media literacy. Main public applications include:
- NewsExplorer
EMM is used by a wide range of institutions, including EU bodies, EU Member State authorities, UN sub-organisations, the African Union, and the Organisation of American States.
▶︎ Read: An introduction to the Europe Media Monitor family of applications
Career Path
- Ph.D. in Computational Linguistics / Machine Translation, University of Manchester Institute of Science and Technology (UMIST), UK, 1994
- Previous roles:
  - Sharp Laboratories of Europe, Oxford (UK)
  - Kyushu Institute of Technology, Japan
  - Institute for Applied Information Science (IAI), Saarbrücken (Germany)
  - African Union, Addis Ababa (Ethiopia).
Scientific Publications and Outreach
I co-authored around 130 international peer-reviewed publications, many of which are accessible via:
Open Language Resources
A central goal in my work was to create and freely share large-scale multilingual resources to accelerate research in language technology. These include:
- Parallel Corpora: JRC-Acquis, DGT-Acquis, Digital Corpus of the European Parliament
- Translation Memories: DGT-TM, ECDC-TM, EAC-TM
- Text Categorisation Tool: JRC Eurovoc Indexer (JEX)
- Name Variant Resource: JRC-Names
▶︎ Read: An overview of the European Union’s highly multilingual parallel corpora
Keywords for Search and Discovery
Computational Linguistics, Text Mining, Natural Language Processing, Information Extraction, Named Entity Recognition (persons, organisations, locations, events, and more), Document Clustering, (Multi-label) Categorisation, Summarisation, Terminology Extraction, Quotation Recognition, Opinion Mining (Sentiment Analysis), Multilingual Linguistic Resources.
In 2019, I took early retirement from the JRC to devote more time to photography and other creative pursuits.

Photo by Ricardo Rodrigues da Silva