My research focuses on training and using large language models (LLMs) to answer the following language-related questions:
(1) Trustworthy LLMs: How to build models that generate factual and attributable content (Cao and Wang, EMNLP 2021; Liu et al., EMNLP 2024)? And how to calibrate their confidence based on what they know and what they don't know (Liu et al., ICLR 2024)? I share my thoughts on the factuality of GenAI in this article and on AI safety and its usage in this webinar.
(2) Reasoning: How to train models with improved reasoning skills using self-verification and step-wise rewards (Zhang et al., ACL findings 2024; Khalifa et al., EMNLP findings 2023)?
(3) Evaluating LLMs: How to evaluate models' performance on challenging and in-the-wild tasks beyond traditional benchmarks built on multiple-choice questions or short reference answers (Jabbour et al., arXiv 2025; Zhang et al., arXiv 2025; Bayat et al., arXiv 2025)? How to evaluate specific properties of LLMs, such as long-context understanding (Zou et al., arXiv 2025)?
(4) Narrative understanding: How are human values reflected in storytelling, and how does that influence the target audience (Zhang et al., NAACL 2024; Wu et al., EMNLP 2023)? And can LLMs discern the values underlying human perspectives in narratives (Lee et al., arXiv 2025)?
For core natural language processing (NLP) problems, I have been building summarization systems for long documents (Huang et al., NAACL 2021) and multi-source inputs (Peper et al., NAACL 2024), and developing controllable generation techniques (Liu et al., ACL 2023).
I am also interested in building AI applications with domain impact, including developing argument mining models (Hua and Wang, ACL findings 2022) to support the building of writing assistants (Nair et al., EMNLP 2024), and using information extraction and sentiment analysis models to understand how the media informs and persuades the public by selecting and packaging information (Fan et al., EMNLP 2019).