Lu Wang, CSE, University of Michigan

U.S. government reports with 6,153 human-annotated question-summary hierarchies.
DATA
This corpus is distributed together with:
HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization
Shuyang Cao and Lu Wang
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022.

Peer reviews from the AMPERE dataset that are annotated with argument relations of SUPPORT and ATTACK.
DATA
This corpus is distributed together with:
Efficient Argument Structure Extraction with Transfer Learning and Active Learning
Xinyu Hua and Lu Wang
Findings of the Association for Computational Linguistics (Findings of ACL), 2022.

3.7 million news articles from 11 media outlets with different ideological leanings, plus 1.1 million story clusters, each containing articles on the same story from different media outlets.
DATA
This corpus is distributed together with:
POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection
Yujian Liu, Xinliang Frederick Zhang, David Wegsman, Nick Beauchamp, and Lu Wang
Findings of the Association for Computational Linguistics (Findings of NAACL), 2022.

5k open-ended questions labeled with a new question type ontology, plus open-ended question generation datasets collected from Reddit and Yahoo.
DATA
This corpus is distributed together with:
Controllable Open-ended Question Generation with A New Question Type Ontology
Shuyang Cao and Lu Wang
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2021.

19.5k U.S. government reports with expert-written long-form abstractive summaries.
DATA
This corpus is distributed together with:
Efficient Attentions for Long Document Summarization
Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, and Lu Wang
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021.

NLG datasets for arguments and Wikipedia articles.
DATA
This corpus is distributed together with:
Sentence-Level Content Planning and Style Specification for Neural Text Generation
Xinyu Hua and Lu Wang
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.

News articles annotated with lexical and informational bias at the phrase and sentence levels.
DATA
This corpus is distributed together with:
In Plain Sight: Media Bias through the Lens of Factual Reporting
Lisa Fan, Marshall White, Eva Sharma, Ruisi Su, Prafulla Kumar Choubey, Ruihong Huang, and Lu Wang
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), short paper, 2019.

Arguments from ChangeMyView (CMV), paired with relevant arguments collected from mainstream media of different ideological leanings.
DATA
This corpus is distributed together with:
Argument Generation with Retrieval, Planning, and Realization
Xinyu Hua, Zhe Hu, and Lu Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

1.3 million U.S. patent documents paired with abstractive summaries.
DATA
This corpus is distributed together with:
BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
Eva Sharma, Chen Li, and Lu Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), short paper, 2019.

Peer reviews collected from machine learning conferences, annotated with the argument types EVALUATION, REQUEST, FACT, REFERENCE, and QUOTE.
DATA
This corpus is distributed together with:
Argument Mining for Understanding Peer Reviews
Xinyu Hua, Mitko Nikolov, Nikhil Badugu, and Lu Wang
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), short paper, 2019.

Arguments and counter-arguments related to politics and policy, collected from reddit.com/r/changemyview.
DATA
This corpus is distributed together with:
Neural Argument Generation Augmented with Externally Retrieved Evidence
Xinyu Hua and Lu Wang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018.

Twitter conversations collected from TREC and from the 2016 U.S. election.
DATA (.zip)
README (.txt)
This corpus is distributed together with:
Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse
Xingshan Zeng, Jing Li, Lu Wang, Nicholas Beauchamp, Sarah Shugars, and Kam-Fai Wong
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2018.

Arguments collected from idebate.org, annotated with different types of supporting arguments.
DATA
This corpus is distributed together with:
Understanding and Detecting Supporting Arguments of Diverse Types
Xinyu Hua and Lu Wang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), short paper, 2017.

Movie critic reviews and their consensus summaries from Rotten Tomatoes.
Online arguments from idebate.org.
DATA (.zip)
README (.txt)
This corpus is distributed together with:
Neural Network-Based Abstract Generation for Opinions and Arguments
Lu Wang and Wang Ling
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016.

New York Times, CNN, and BBC news articles and user comments on four major events that happened in 2014.
New York Times news articles and user comments in 2013.
DATA (.zip)
README (.txt)
This corpus is distributed together with:
Socially-Informed Timeline Generation for Complex Events
Lu Wang, Claire Cardie, and Galen Marchetti
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

Dispute and non-dispute discussions from Wikipedia talk pages.
DATA (.zip)
README (.txt)
This corpus is distributed together with:
A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection
Lu Wang and Claire Cardie
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.