THE COMPUTATIONAL LINGUIST
Besides photography, I have always felt a strong passion for my work as a research scientist in computational linguistics. I found a perfect nurturing ground for this passion at the European Commission’s Joint Research Centre (JRC), based in the Lago Maggiore area in Northern Italy. I consider myself lucky to have worked in the field during its challenging and exciting pioneering years and up to the point when working applications became commonly used.
I want to dedicate one page on this photo website to this important part of my life. I want to say a few words on what Computational Linguistics is, what has driven me and what my major experiences are.
Put simply, Computational Linguistics (CL) is the field of study that teaches computers to deal with human language. CL produces software that helps people retrieve information quickly and efficiently and access information even if it is written in other languages. CL helps people communicate and educate themselves. Example software applications are Machine Translation, document categorisation, information extraction, multi-document summarisation and sentiment analysis. The term Computational Linguistics is closely related to Text Mining, Natural Language Processing (NLP) and Language Engineering. Methods used include rule-based or symbolic methods, Machine Learning and Artificial Intelligence.
My specialisations within Computational Linguistics are multilinguality and cross-lingual information access. One challenge is the development of text mining software for large numbers of foreign languages with reasonable human effort. Another is to fuse, merge and link information found in different languages. A third challenge is to give people access to information written in foreign languages.
In 1998, I joined the Joint Research Centre (JRC) of the European Commission, where I last worked as a senior scientist at the Competence Centre for Text Mining and Analysis. The main project was the development of the publicly accessible media monitoring platform Europe Media Monitor (EMM), which processes both traditional print media and social media posts. EMM gathers and analyses a daily average of 320,000 online news articles in about 70 languages from about 12,000 online news sources (status mid-2019). EMM automatically groups related articles, categorises them into thousands of subject domain categories, extracts and disambiguates information such as mentions of persons, organisations, locations, products and events, identifies direct speech quotations from and about people and translates them into English. EMM tracks news stories over time and links related stories across languages. EMM helps people educate themselves by showing multiple news reports on the same story, including those written in other languages. EMM is a fundamentally democratic tool that contributes to heightened transparency and awareness. EMM has the potential to improve understanding of other countries’ and peoples’ views. The main public EMM applications are NewsBrief, NewsExplorer and the Medical Information System MedISys. Besides the public users, EMM is used by EU institutions, EU Member State authorities, various United Nations sub-organisations, the African Union, the Organisation of American States and many more. (Read: An introduction to the Europe Media Monitor family of applications).
Career History: I received my Ph.D. in 1994 in the field of Computational Linguistics/Machine Translation from the University of Manchester Institute of Science and Technology (UMIST) in England. Before joining the European Commission's Joint Research Centre in 1998, I worked at the Sharp Laboratories of Europe in Oxford (UK), at the Kyushu Institute of Technology in Japan and at the Institute for Applied Information Science IAI in Saarbrücken (Germany). I also spent some time working with the African Union in Addis Ababa (Ethiopia).
Scientific publications: I have co-authored around 130 international peer-reviewed scientific publications, most of which are accessible via the JRC's Publications Repository and via my Google Scholar profile. I was invited to be a keynote speaker at various scientific conferences and workshops.
It was important for me to create and freely distribute large-scale and highly multilingual Language Technology resources in order to speed up Research & Development efforts in the field. These resources include the parallel corpora JRC-Acquis, DGT-Acquis, and the Digital Corpus of the European Parliament; the Translation Memories DGT-TM, ECDC-TM, EAC-TM; the multi-label document categorisation software JRC Eurovoc Indexer (JEX); as well as the multilingual name variant resource JRC-Names (Read: An overview of the European Union's highly multilingual parallel corpora).
Keywords describing my main scientific interests: Computational Linguistics; Text Mining; Natural Language Processing; Information Extraction; Named Entity Recognition (persons, organisations, geographic locations, events and more); Document Clustering and (Multi-label) Categorisation; Summarisation; Terminology Extraction; Quotation Recognition; Opinion Mining (Sentiment Analysis); Multilingual Linguistic Resources.
Photo by Ricardo Rodrigues da Silva