Dr. Tuomo Hiippala


I am a tenure track Associate Professor in English Language and Digital Humanities (2023–) in the Department of Languages at the University of Helsinki, where I have worked since 2018. I also hold a Title of Docent in Multimodality Research and Digital Methods at the University of Turku.

I am a member of the Digital Geography Lab, where I worked as a post-doctoral researcher for the year 2017, and affiliated with the Helsinki Institute of Sustainability Science and Helsinki Institute of Urban and Regional Studies.

In 2015–2016, I worked as a post-doctoral researcher at the Centre for Applied Language Studies at the University of Jyväskylä.

I hold a PhD in English philology from the University of Helsinki (2014), supervised by Eija Ventola.

I am an alumni of Young Academy Finland (2018–2022). Since 2015, I am also a member of Sport-Verein „Werder“ v. 1899 e. V. 😉

A photograph of me by Veikko


I do research on multimodal communication, that is, how multiple forms of expression interact and co-operate with each other in different communicative situations.

To exemplify, spoken language, gestures, posture and gaze are constantly coordinated in face-to-face interaction, while magazines, newspapers, websites and other page-based texts organize written language, photographs, illustrations, diagrams, information graphics and other modes of expression into coherent layouts.

In both cases, communication builds on appropriate combinations of different forms of expression. Theories of multimodality attempt to explain how such appropriate combinations are formed and how they become understandable in context.

I am particularly interested in how theories of multimodality can inform research on artificial intelligence, and conversely, how artificial intelligence can support empirical research on multimodal communication. In addition, I am also interested in applications of natural language processing and computer vision in the humanities and beyond.

You can find out more about my current projects, doctoral and post-doctoral researchers working under my supervision, recent publications and teaching resources below.

For a comprehensive and up-to-date list of my publications, please see my profile on the University of Helsinki research portal.

Get in touch

If you have questions or comments about my work, feel free to reach out to me! The same applies to prospective MA students (University of Helsinki) wishing to write their thesis on multimodality or a related topic.

You can reach me via e-mail at tuomo.hiippala@helsinki.fi and find me on GitHub, ResearchGate and Mastodon. My ORCID is 0000-0002-8504-9422.


Digital Sweatshops as Research Infrastructure? Critical and Practical Perspectives on Crowdsourcing in Academic Research (CROWDINFRA)


This project studies microtask crowdsourcing as a part of the academic research infrastructure. In this context, microtask crowdsourcing refers to distributing small tasks that require human intelligence to masses of non-expert workers on online platforms.

We examine the role of crowdsourcing, focusing especially on the role of platforms, and seek to rethink the role of crowdsourcing as a form of participatory research governed by fair and transparent algorithms. The project is funded by a four-year grant from Kone Foundation.

For more information, see the project page.

Laypersons as Experts: Crowdsourcing Descriptions of Multimodal Communication for Empirical Research (CROWDSRC)

2021–2024 PUBLICATION #1

A part of the crowdsourcing pipeline illustrated.

This project develops crowdsourcing methods for empirical multimodality research. Crowdsourcing is a technique that involves breaking complex tasks into piecemeal work and distributing this effort to a large pool of workers on online platforms.

We develop annotation tasks that are motivated by theories of multimodality, which are then distributed to non-expert workers. The resulting descriptions are then converted into systematically annotated corpora using statistical and computational methods.

This project is funded by a three-year grant from the University of Helsinki Research Funds, which is used to support the work of one doctoral researcher.

For more information, see the project page.

Mapping the Linguistic Landscape of the Helsinki Metropolitan Area (MAPHEL)


A map showing the number of unique languages per a 250m grid cell in the Helsinki Metropolitan Area.

This project studies which languages are used in the Helsinki Metropolitan Area and the users of different languages move around the region by combining official register data and social media data.

The project, which is conducted in collaboration with other members of the Digital Geography Lab, combines methods from natural language processing and geoinformatics with measurements developed in ecology and information sciences. The quantitative and spatial analyses are complemented by ethnographic approaches.

The research project is funded by a three-year project grant from the Emil Aaltonen Foundation, which is used to support the work of one doctoral researcher and one post-doctoral researcher.

For more information, see the project page.

Enhancing the AI2D Diagrams Dataset using Rhetorical Structure Theory (AI2D-RST)


An example of a diagram from the AI2D dataset

We developed a new annotation schema for describing how diagrams in primary school science textbooks combine multiple modes of expression and applied the schema to 1000 diagrams in an existing dataset of diagrams.

The new corpus, AI2D-RST, is based on AI2D Diagrams dataset, which has been developed at the Allen Institute for Artificial Intelligence for tasks such as automatic diagram understanding and visual question answering.

AI2D-RST is intended to support research on the computational processing of diagrams and their multimodal structure. The discourse relations in diagrams are described using Rhetorical Structure Theory.

Methods for Empirical Multimodality Research


An example of showing the application of SIFT, a computer vision algorithm, to an image of the Lutheran Cathedral in Helsinki.

My long-term interest is to move research on multimodality towards a more empirically-responsible direction by establishing a closer bond between theories and data. As a part of this effort, I have studied how computational techniques from the fields of computer vision and natural language processing could be used to describe multimodal communication at scale.

Some publications emerging from this work can be found here, here and here.

This work has been supported by City of Helsinki Urban Facts (Helsingin kaupungin tietokeskus) and the Finnish Cultural Foundation (Suomen Kulttuurirahasto).


I currently supervise the following doctoral and post-doctoral researchers:

My former supervisees are listed below:

Prospective doctoral researchers: I do not reply to mass e-mails about doctoral positions. If you wish to pursue a PhD under my supervision, you need to first convince me that I am the right supervisor for you. You also need to tell me how you plan to finance your doctoral research. Note that I do not currently have the possibility to support PhD students financially. Any funded positions will be announced on this website.

Prospective post-doctoral researchers: If you are interested in pursuing a post-doctoral research project under my supervision, please get in touch. Note that I do not currently have funding for hiring post-doctoral researchers, but I am happy to support applications for post-doctoral fellowships e.g. from Marie Skłodowska-Curie Actions, if our research interests are aligned.


Corpus-based insights into multimodality and genre in primary school science diagrams

Visual Communication / 2023 / DOI PDF

This article demonstrates how unsupervised machine learning algorithms can be used to discover patterns that characterise particular types of diagrams in the AI2D-RST corpus.

In addition, the article proposes a novel method for quantifying information about the spatial organisation of different expressive resources.

Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities

Co-authored with Helmiina Hotti and Rosa Suviranta / Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature at COLING 2022, Gyeongju / 2022 / URL PDF GITHUB

This system demonstration paper introduces a tool for fair and reproducible microtask crowdsourcing on the Toloka platform. The tool allows building complex crowdsourcing pipelines using YAML configuration files.

In addition to increasing transparency by using explicit configuration files that may be shared in academic publications, the tool incorporates other mechanisms that mitigate ethical issues associated with crowdsourcing.

The development of the tool has been supported by the Helsinki Institute for Social Sciences and Humanities.

Introducing the diagrammatic semiotic mode

Co-authored with John A. Bateman / Diagrams 2022: Diagrammatic Representation and Inference / 2022 / DOI PDF

This article, co-authored with John A. Bateman (Bremen University), discusses diagrams from a multimodal perspective, drawing on the AI2D-RST corpus for data.

We propose that diagrams may be treated as a full-blown mode of communication, which allows integrating diverse expressive resources such as natural language and various forms of depiction, and provides mechanisms that support their interpretation.

The article won the award for best paper at the 13th International Conference on the Theory and Application of Diagrams.

Mapping urban linguistic diversity with social media and population register data

Co-authored with Tuomas Väisänen, Olle Järv and Tuuli Toivonen / Computers, Environment and Urban Systems / 2022 / DOI GITHUB PDF

In this article, we study the diversity of languages in the Helsinki Metropolitan Area by combining social media and register data. Using measurements of diversity developed in ecology and information sciences, we show that the two data sources provide radically different views into the diversity of languages. Our analyses also demonstrate that linguistic diversity is a spatio-temporal phenomenon.

Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora

Co-authored with John A. Bateman / Digital Scholarship in the Humanities / 2022 / DOI GITHUB PDF

In this article, we analyse the layout and visual modes of expression in the AI2D-RST corpus by combining multimodality theory and distant viewing, as proposed in this article.

By combining statistical and computational methods, we show that particular diagram types are characterised by specific layout patterns. Our analyses also reveal the diversity of visual expressive resources used in the diagrams.

Distant viewing and multimodality theory: Prospects and challenges

Multimodality & Society / 2021 / DOI PDF

This article discusses the prospects and challenges of combining multimodality theory with distant viewing, a recent framework proposed in the field of digital humanities, advocates the use of computational methods to enable large-scale analysis of visual and multimodal materials.

I argue that multimodality theory is well-positioned to support this effort by providing descriptive schemas that impose structure on the materials under analysis.

Exploring human–nature interactions in national parks with social media photographs and computer vision

Co-authored with Tuomas Väisänen, Vuokko Heikinheimo and Tuuli Toivonen / Conservation Biology / 2021 / DOI GITHUB PDF

In this article, we report on a study of human–nature interactions in Finnish national parks, which we examined by applying computer vision to Flickr photographs. We propose an approach named semantic clustering for detecting activities, which combines both computer vision and clustering algorithms. We then compared the activities among domestic and foreign visitors.

AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

Co-authored with Malihe Alikhani, Jonas Haverinen, Timo Kalliokoski, Evanfiya Logacheva, Serafina Orekhova, Aino Tuomainen, Matthew Stone & John Bateman / Language Resources and Evaluation / 2021 / DOI CORPUS GITHUB PDF

This publication introduces a multimodal corpus that contains 1000 primary school science diagrams. The corpus builds on the crowdsourced annotations available in the Allen Institute for Artificial Intelligence Diagrams Dataset (AI2D) to add multiple layers of expert annotations for diagram type, logical structure and relations between elements, as proposed in this publication.

Mapping the languages of Twitter in Finland: Richness and diversity in space and time

Co-authored with Tuomas Väisänen, Olle Järv and Tuuli Toivonen / Neuphilologische Mitteilungen / 2020 / DOI GITHUB PDF

We studied the language choices of Twitter users in Finland by detecting languages in their tweet histories and estimating their home locations at the levels of regions and municipalities based on geotagged tweets.

The analyses revealed a multilingual platform: the vast majority of Twitter users in Finland use more than one language to communicate on the platform.

A multimodal perspective on data visualization

Data Visualization in Society / Amsterdam University Press / 2020 / DOI

This chapter, published open-access in a volume edited by Helen Kennedy and Martin Engebretsen, presents a multimodal perspective on data visualization, building on the approach presented in our recent textbook on multimodality.

Social media data for conservation science: a methodological overview

Co-authored with Tuuli Toivonen, Vuokko Heikinheimo, Christoph Fink, Anna Hausmann, Olle Järv, Henrikki Tenkanen and Enrico Di Minin / Biological Conservation / 2019 / DOI GITHUB

A photograph of elephants crossing the road in a national park in Africa, which have been segmented from the image by a computer vision algorithm.

This article provides a thorough overview of how social media data can be used for various purposes in the field of conservation science.

My contribution involved writing the sections on computer vision and natural language processing, and illustrating their use in examples.

A framework for investigating illegal wildlife trade on social media with machine learning

Co-authored with Enrico Di Minin, Christoph Fink and Henrikki Tenkanen / Conservation Biology / 2019 / DOI

This article proposes a set of techniques that combine computer vision, natural language processing and machine learning for tracking illegal wildlife trade on social media platform.

This article was preceded by a call to action published in Nature Ecology and Evolution. The HELICS Lab continues this work.

Systemic-Functional Linguistics and Computation: New Directions, New Challenges

Co-authored with John Bateman, Daniel McDonald, Daniel Couto-Vale and Eugeniu Costetchi / Cambridge Handbook of Systemic Functional Linguistics / Cambridge University Press / 2019 / PREPRINT DOI

This chapter, published in a handbook edited by Geoff Thompson, Wendy L. Bowcher, Lise Fontaine and David Schöntal, considers the relationship between the currently dominant statistical paradigm in natural language processing and systemic-functional linguistics, while also touching upon issues of multimodality.

Exploring the linguistic landscape of geotagged social media content in urban environments

Co-authored with Anna Hausmann, Henrikki Tenkanen and Tuuli Toivonen / Digital Scholarship in the Humanities / 2019 / PREPRINT DOI GITHUB

This article, co-authored with Anna Hausmann, Henrikki Tenkanen and Tuuli Toivonen, examines the distribution of languages in Instagram posts geotagged to the Senate Square, Helsinki, over a period of four and half years.

The results showed that the languages and their respective proportions change over time, but English remains the dominant language, as the language gains users from many other linguistic groups. Finns, for example, post in English half the time.

Multimodality: Foundations, Research and Analysis - A Problem-Oriented Introduction

Co-authored with John Bateman and Janina Wildfeuer / De Gruyter / 2017 / PUBLISHER

We co-authored a problem-based introduction to multimodal analysis with John Bateman (Bremen University) and Janina Wildfeuer (University of Groeningen).

This book is intended to provide a foundational introduction to multimodality, that is, how multiple modes of communication work together in different texts and situations.

We adopt a problem-based approach, showing how the foundation provided by the textbook can be used to derive appropriate methods for every situation in which multimodality needs to be of concern.

We illustrate the application of these methods using a large number of case studies dealing with different multimodal phenomena.

Enhancing the AI2 Diagrams dataset using Rhetorical Structure Theory

Co-authored with Serafina Orekhova / Proceedings of the 11th International Conference on Language Resources and Evaluation, Miyazaki / 2018 / PDF GITHUB

Example of a diagram from the AI2D dataset

This conference paper, co-authored with research assistant Serafina Orekhova, outlines our plan for improving the AI2D Diagrams dataset as a part of the project.

In addition to sketching out some of the rhetorical relations necessary for describing diagrams, the paper reports on a preliminary study on inter-annotator agreement in applying these relations to diagrams in the AI2D dataset.

Media evolution and genre expectations

Co-edited with Chiao-I Tseng / Discourse, Context & Media / 2017 / JOURNAL

We edited a special section for Discourse, Context & Media with Chiao-I Tseng (Bremen University), which brings together two concepts, media and genre.

These concepts are frequently invoked across a wide range of disciplines broadly concerned with discourse (and multimodality), but rarely brought into productive dialogue with each other.

The special section contains invited contributions from leading researchers, whose articles offer different perspectives to these key concepts.

Recognizing military vehicles in social media images using deep learning

Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics, Beijing / 2017 / PDF GITHUB

Screenshot of the tool

This conference paper presents a system for detecting military vehicles in social media images, which I developed with non-governmental (i.e. citizen journalists) and intergovernmental organizations in mind, as such organizations may have limited resources available for monitoring social media in conflict zones. The system draws on recent advances in applying deep neural networks to computer vision tasks, while also making extensive use of openly available libraries, models and data.

An overview of research within the Genre and Multimodality framework

Discourse, Context & Media / 2017 / PDF DOI

This review article provides an overview of the research conducted within the Genre and Multimodality framework, which has been used to describe documents and other multimodal artefacts over the last 15 years. In addition to reviewing the previous work, the article introduces the central theoretical concepts of the framework – medium, mode and genre – and their application to the study of multimodality.

The multimodality of digital longform journalism

Digital Journalism / 2017 / PDF DOI CORPUS

Screenshot © The New York Times

In this article, which grew out of the additional research conducted for my book, I examined how digital longform journalism combines written language, news photography, short videos and other modes of communication using a corpus of 12 longform articles published between 2012 and 2013.

Individual and collaborative semiotic work in document design

Hermes – Journal of Language and Communication in Business / 2016 / PDF PUBLISHER

Screenshot of a diagram

This article discusses how different specialists contribute to document design, presenting a case study focusing on the production of Finnair's corporate annual reports.

By interviewing a project manager and a graphic designer and contrasting the findings from the interview with a multimodal analysis of the annual report, I attempt to trace how their contributions shape the design of the document.

Helsingin kaupungin matkailuesitteiden multimodaalinen korpus [A multimodal corpus of tourist brochures advertising Helsinki, Finland]

Terra / 2016 / PDF CORPUS

Screenshot of a figure

Tämä datankuvausartikkeli esittelee väitöskirjaani varten kootun multimodaalisen korpuksen, joka sisältää 58 aukeamaa Helsingin kaupungin vuosina 1967–2008 julkaisemista englanninkielisistä matkailuesitteistä.

This article describes the multimodal corpus that I compiled for my doctoral dissertation. The corpus contains 58 double-pages from English-language tourist brochures published by the city of Helsinki between 1967 and 2008.

Semi-automated annotation of page-based documents within the Genre and Multimodality framework

Proceedings of the 10th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities at ACL 2016, Berlin / 2016 / PDF PUBLISHER GITHUB

Screenshot of the annotator

This conference paper presents the annotator I've developed for automating the description of multimodal documents using the Genre and Multimodality model. The annotator leverages open source libraries such as OpenCV, NLTK and Tesseract for generating the annotation. The work on the annotator continues: feel free to get in touch if you wish to contribute to its development.

Aspects of multimodality in higher education monographs

Multimodality in Higher Education / Brill / 2016 / PDF DOI

Learning to read academic texts is a key skill required by any student or researcher. In this chapter, I aimed to show how multimodal analysis can be used to reveal different types of structures typically found in academic texts. The chapter appeared in a volume edited by Arlene Archer (University of Cape Town) and Esther Breuer (University of Cologne).

Combining computer vision and multimodal analysis: a case study of layout symmetry in bilingual in-flight magazines

Building Bridges for Multimodal Research: International Perspectives on Theories and Practices of Multimodal Analysis / Peter Lang / 2015 / PDF

In this article, I examined how bilingual documents use layout to signal the reader that their contents are semantically equivalent. Recently, I've been interested in computer vision and machine learning: I applied some of these techniques to sketch a method for studying layout symmetry in large data sets. The article was published in a volume edited by Janina Wildfeuer (Bremen University).

The Structure of Multimodal Documents: An Empirical Approach


This book, which is largely based on my doctoral dissertation, develops a framework for studying how language, images and other modes of communication work together in page-based documents. In addition to revising and rewriting each chapter, two new chapters have been added, which focus on the page as a unit of analysis in a multimodal document and show how the framework can be applied to digital media.

In addition to conducting an extensive, empirical study of the Helsinki tourist brochures published between 1967 and 2008, I examine how digital longform journalism, exemplified by publications such as The New York Times' "Snow Fall", combines photography, short videos, cinematic transitions and written language.

Multimodal genre analysis

Interactions, Images, and Texts: A Reader in Multimodality / De Gruyter Mouton / 2014 / PDF DOI

Drawing on the research conducted for my doctoral dissertation, I also contributed a chapter to a handbook edited by Sigrid Norris (Auckland University of Technology) and Carmen Daniela Maier (Aarhus University), which outlines how the concept of genre may be used to describe and compare different documents.

The central argument of the chapter is that in order for the concept of genre to be useful, the concept needs to be tied to multimodal structures identified in the document.

The chapter also shows how the visualisation tools developed for my dissertation can reveal different types of structure commonly found in multimodal documents.

Modelling the structure of a multimodal artefact

University of Helsinki / 2013 / PDF URN CORPUS

In my doctoral dissertation, I developed a framework for studying how language, image and other communicative resources work together in page-based documents. To test the framework, I collected a set of tourist brochures published between 1967 and 2008, which advertised the city of Helsinki.

I compiled the brochures into a multimodal corpus, describing their content, layout, rhetorical organisation and appearance using the Genre and Multimodality model.

I then investigated the corpus for patterns that characterised the tourist brochures as a document genre. I also explored how these patterns have changed over time as a result of new production technologies.

The interface between rhetoric and layout in multimodal artefacts

Literary & Linguistic Computing / 2013 / PDF DOI

In 2011, I gave a two-minute lightning talk about my research at the InterFace 2011 digital humanities conference. On the basis of my presentation, I was invited to contribute to a special section of Literary & Linguistic Computing. This paper develops some ideas about layout and how it guides the interpretation of its contents. I later explored these issues in greater detail in my doctoral dissertation.

Reading paths and visual perception in multimodal research, psychology and brain sciences

Journal of Pragmatics / 2012 / PDF DOI

This article emerged from a side project during my doctoral research. After reading several articles on visual perception, while simultaneously annotating a multimodal corpus containing a rich description of document structure, I wanted to find a way to interface the corpus with an eye-tracker. The article proposes a way of doing so and outlines some research questions that could be investigated using such a combination.

The localisation of advertising print media as a multimodal process

Multimodal Texts from Around the World: Linguistic and Cultural Insights / Palgrave Macmillan / 2012 / PDF DOI

My first publication ever, a chapter in a volume edited by Wendy Bowcher (Sun Yat-Sen University). The chapter discusses localisation, that is, how documents are translated into another language and adapted to the target culture. I argue that the process of localisation must pay equal attention to visual communication and multimodality.


Tekstilingvistiikkaa kääntäjille

2015 / PDF

Lyhyt johdanto tekstilingvistiikkaan kevään 2015 luentokurssilta.

Aineisto, joka esittelee tekstilingvistiikan keskeisiä käsitteitä, on lisensoitu Creative Commons Nimeä - Ei Kaupallinen 4.0 Kansainvälinen -lisenssillä.

Voit siis vapaasti ladata ja jakaa aineistoa, sekä muokata sitä tarpeen mukaan, kunhan mainitset aineiston alkuperän.

Applied Language Technology

2020– / WEB

I have developed two courses on applied language technology for absolute beginners, which are accompanied by interactive learning materials and a YouTube channel with short explanation videos.