Research Projects
I advise MSc theses at the University of Helsinki (primarily at the Department of Digital Humanities; other departments are also possible) and supervise research projects, previously through Aalto CS-E4875 and CS-E4000. Most projects are about machine learning/deep learning techniques, natural language processing, and their applications in healthcare and social good. Deep learning methods include recent neural architectures and learning paradigms such as multitask learning and federated learning. Projects focus on multilingual NLP (e.g., multilingual instruction tuning of large language models and machine translation), healthcare (e.g., clinical text analytics, electronic health records, and AI-assisted diagnosis), and social good (e.g., mental disorder detection from social media, toxicity/abusive text/cyberbullying detection, and sentiment analysis).
Available Topics
Topic 1: AI for Social Good
Background: AI for social good (AI4SG) is a research field that uses AI to tackle important social, environmental, and public health challenges. This topic focuses on the social and public health challenges. Specifically, it develops deep learning methods for proactive social care, such as early detection of mental illness (e.g., anxiety, depression, and suicidal ideation) expressed in social media, which can raise early alerts for effective prevention. Abusive text detection is another possible task under this topic: it classifies social text that may contain toxic content such as hate speech, cyberbullying, aggressiveness, and offensiveness.
Prerequisites
Each topic has similar prerequisites, including but not limited to:
- Good knowledge of deep learning;
- Programming skills with deep learning frameworks (e.g., PyTorch);
- Experience with LaTeX typesetting and Linux servers.
Past Projects
- Deep learning for medical code assignment from clinical notes (2020-2022)
- Deep model fusion in federated learning (2020-2021)
- Conversational/multimodal sentiment analysis (2020-2021)
- NLP for mental health (e.g., depression detection and suicidal ideation detection) (2021-2023)
- Adverse drug event detection and extraction (2021-2022)
- Multilingual complex named entity recognition at SemEval shared tasks (2021-2022)
- Risk adjustment for healthcare plan payment (2019-2020)
MSc Thesis Advising
- Ya Gao (MSc, Aalto University, now PhD candidate at Aalto University). Joint entity and relation extraction via contrastive learning on knowledge-augmented graph embeddings, 2023.
- Tuulia Denti (MSc, Aalto University, jointly with HUS, now Data Analyst at HUS). Natural Language Processing with Topic Models for Clinical Texts of Prostate Cancer Patients, 2022.
- Wei Sun (MSc, Aalto University, jointly with HUS, now PhD candidate at KU Leuven, Belgium). Extracting Medical Entities from Radiology Reports with Ontology-based Distant Supervision, 2022.
BSc Thesis Supervision
Previous Projects
- Risk adjustment for health plan payment (2019 Winter)
- Deep learning for cyberbullying detection (2020 Summer)
- Pretrained language models for diagnosis code prediction (2020 Summer)
- Federated learning (2020 Fall)
- Depression detection from social content (2021 Spring)
- Biomedical text classification (2022 Spring)
Published Project Reports
The following project reports were published in scientific venues after revision of the original reports.
- Ya Gao, Shaoxiong Ji, Tongxuan Zhang, Prayag Tiwari and Pekka Marttinen. Contextualized Graph Embeddings for Adverse Drug Event Detection. ECML-PKDD, 2022.
- Aapo Pietiläinen and Shaoxiong Ji. AaltoNLP at SemEval-2022 Task 11: Ensembling Task-adaptive Pretrained Transformers for Multilingual Complex NER. Proceedings of the International Workshop on Semantic Evaluation (SemEval), 2022.
- Luna Ansari, Shaoxiong Ji, Qian Chen, and Erik Cambria. Ensemble Hybrid Learning Methods for Automated Depression Detection. IEEE Transactions on Computational Social Science, 2022.
- Wei Sun, Shaoxiong Ji, Erik Cambria, and Pekka Marttinen. Multitask Recalibrated Aggregation Network for Medical Code Prediction. ECML-PKDD, 2021.