My research activities focus on several core NLP problems: (1) text summarization that distills salient content from large collections of lengthy documents and produces concise summaries, (2) controllable and factual natural language generation that enables knowledge transformation so that machines can communicate with humans trustworthily, (3) reasoning that enables large language models to perform accurate logical and mathematical reasoning, (4) narrative understanding that reveals how human values are reflected in the storytelling process, and (5) argument mining that makes sense of argumentative content and structure. These directions share the same objective of dismantling barriers to information consumption and knowledge acquisition for the general public. I also establish and lead interdisciplinary collaborations, e.g., in computational social science and in AI for education.
• To solve the first problem, our work enhances the document encoder with a knowledge graph encoder to connect relevant entities and events as well as maintain global context, such as topic flows (Huang et al., ACL 2020). We further design a question answering-based reward to drive the model to better capture entity-related knowledge using reinforcement learning. This work is the first to employ graph neural networks to explicitly summarize and encode entity-centered knowledge for abstractive summarization.
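The question answering-based reward can be sketched as follows; the function names, the exact-match reward, and the REINFORCE-style objective are illustrative simplifications, not the paper's exact formulation.

```python
def qa_reward(summary_answers, reference_answers):
    # Fraction of entity-centered questions answered correctly from the
    # generated summary (exact match is a simplifying assumption).
    correct = sum(1 for s, r in zip(summary_answers, reference_answers) if s == r)
    return correct / len(reference_answers)

def reinforce_loss(summary_log_prob, reward, baseline):
    # REINFORCE-style objective: minimizing this raises the probability of
    # summaries that answer more questions correctly (positive advantage).
    return -(reward - baseline) * summary_log_prob
```

In this view, the reward signal supplements the usual likelihood training by explicitly crediting summaries that preserve entity-related knowledge.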
• To address the second challenge of handling long documents, we propose an efficient encoder-decoder attention mechanism and conduct the first systematic study of efficient Transformers for long document summarization (Huang et al., NAACL 2021). In our model, the encoder-decoder attention heads follow a strided pattern with varying starting positions, which preserves the ability to emphasize important tokens while reducing computational and memory costs. Our model can process documents ten times longer than previous models can handle and produces more informative summaries. Our recent study further formulates long document summarization as a hierarchical question-summary generation process to support varying information needs (Cao and Wang, ACL 2022).
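A minimal sketch of the strided pattern, assuming a fixed stride shared by all heads and a per-head starting offset (head grouping and other details are simplified here):

```python
def strided_attention_positions(head_idx, seq_len, stride):
    # Encoder positions attended by one decoder head: a strided pattern whose
    # starting offset varies per head, so each head touches only ~seq_len/stride
    # positions while the heads jointly cover the whole input.
    start = head_idx % stride  # varying starting position per head (assumption)
    return list(range(start, seq_len, stride))
```

With stride s, each head's attention cost drops by roughly a factor of s, which is what allows much longer inputs at fixed memory.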
• Finally, to mitigate the model “hallucination” problem, we design a contrastive learning formulation that teaches a summarizer to expand the margin between factual summaries (i.e., positive samples) and their incorrect peers (i.e., negative samples), improving summary faithfulness and factuality (Cao and Wang, EMNLP 2021). Prior methods for reducing errors in summaries largely use three types of remedies: running a separate error correction model, removing noisy training samples, or building new architectures on top of the Transformer. Our system is trained end-to-end without modifying the model architecture. We design four strategies to construct negative samples: editing reference summaries by rewriting entity- and relation-anchored text, and using system-generated summaries that may contain unfaithful errors. Our model improves summary quality, especially on more abstractive outputs.
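The margin-expanding idea can be illustrated as follows; representing each summary by a scalar model score and using a fixed margin are assumptions of this sketch, not the paper's exact loss.

```python
def contrastive_margin_loss(pos_scores, neg_scores, margin=1.0):
    # Max-margin contrastive objective: push every factual (positive) summary's
    # score above every unfaithful (negative) summary's score by at least
    # `margin`. Scores might be length-normalized log-likelihoods under the
    # summarizer (an assumption of this sketch).
    losses = [max(0.0, margin - (p - n)) for p in pos_scores for n in neg_scores]
    return sum(losses) / len(losses)
```

The loss is zero once all positives clear all negatives by the margin, so training pressure concentrates on pairs the model still confuses.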
• We propose neural NLG frameworks that use traditional generation components, such as content planning and style selection, to promote the control of content (Hua and Wang, ACL 2018; Hua and Wang, EMNLP 2020) and linguistic style (Hua and Wang, EMNLP 2019) of the produced text. In Hua et al., ACL 2019, we study the task of counterargument generation, where our goal is to generate an argument to refute a given statement on a controversial issue. Our model performs sentence-level content planning via talking point selection and ordering, and style-controlled surface realization based on model predicted language style to produce the final output. We also augment our generation model with passages retrieved from a large-scale search engine, which indexes 12 million articles from Wikipedia and four popular English news media of varying ideological leanings. This ensures our system has access to reliable evidence, high-quality reasoning, and diverse opinions from different sources.
• We further address the challenge of producing coherent long-form text (Hua et al., ACL 2021), where even large pretrained language models still fall short due to the lack of effective content planning and control. One potential issue with employing an explicit content planning component lies in the need for separate training signals, which are often unavailable. We therefore propose an end-to-end generation framework based on mixed language models that conducts content selection and ordering as text is produced, without requiring ground-truth content planning labels. Concretely, at each decoding step, our system selects which content to reflect and predicts a word based on probabilities marginalized over all language models. This system can be built upon large pretrained models and offers an interface for interpreting system decisions via the predicted content selection scores.
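The marginalization step can be sketched as below, assuming each component language model exposes a next-word distribution over a shared vocabulary (all names here are illustrative):

```python
def mixture_next_word_probs(selector_probs, component_probs):
    # Marginalize next-word probabilities over mixed language models:
    #   p(word) = sum_k p(component k) * p(word | component k).
    # selector_probs[k] is the content-selection score for component k;
    # component_probs[k] maps each word to that component's probability.
    vocab = component_probs[0].keys()
    return {w: sum(s * lm[w] for s, lm in zip(selector_probs, component_probs))
            for w in vocab}
```

Because the selection scores appear explicitly in the mixture, they double as an interpretable record of which content the model chose at each step.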
• I aim to answer the question “What determines the outcome of a debate?” (Wang et al., TACL 2017). Most efforts to predict the persuasiveness of debates have focused on linguistic features of the debate speech or on simple measures of topic control. In an ideal setting, however, we would expect the winning side to prevail based on the strength and merits of their arguments, not on their skillful deployment of linguistic style. I hypothesize that, within a debate, some topics are inherently more persuasive when deployed by one side than the other, such as the execution of innocents for those opposed to the death penalty, or the gory details of a murder for those in favor of it. I thus develop a latent variable model that simultaneously infers the latent persuasive strength of debate topics and how it differs between opposing sides, while also capturing the interactive dynamics between topics of different strengths and the linguistic structures with which these topics are presented.
• Moreover, I design data-efficient methods, such as transfer learning and active learning (Hua and Wang, ACL Findings 2022), to parse peer reviews, a cornerstone of scientific discovery.
• Our study (Fan et al., EMNLP 2019) finds that the most important and subtle way media shape the views of their readers is through content selection or omission. We therefore examine and detect media bias that may arise from the systematic manipulation of news via the selection and organization of content in each article. We annotated news stories with bias spans and released the first dataset of this kind, containing 300 articles from media of different ideological leanings.
• I also aim to create general-purpose tools for analyzing ideological content, to be used by researchers and practitioners in the broader community. We thus study pretraining techniques to create representations that better discern embedded ideological content across different genres of text (Liu et al., NAACL Findings 2022). Concretely, for pretraining, we design an ideology objective operating over clusters of same-story articles, which pulls together articles with similar ideology and contrasts them with articles of different ideology. Our model outperforms strong baselines on 8 out of 11 ideology prediction and stance detection tasks, using datasets covering congressional speeches, news articles, social media comments, and tweets.
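The cluster-based contrastive objective can be illustrated with an InfoNCE-style loss over precomputed representation similarities; treating one same-ideology article as the positive and the remaining cluster members as negatives is a simplification of the paper's objective.

```python
import math

def ideology_contrastive_loss(anchor_sims, positive_idx, temperature=0.1):
    # InfoNCE-style loss over one cluster of same-story articles: the positive
    # shares the anchor's ideology; other cluster members act as negatives.
    # anchor_sims[i] is the similarity between the anchor's representation and
    # article i's (precomputed; an assumption of this sketch).
    exps = [math.exp(s / temperature) for s in anchor_sims]
    return -math.log(exps[positive_idx] / sum(exps))
```

Grouping articles by story before contrasting them controls for topic, so the objective isolates ideological framing rather than subject matter.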