Wikipedia Disputed Discussion Corpus (first released in January 2016)
URL: http://www.ccs.neu.edu/home/luwang/

This corpus is distributed together with:

A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection
Lu Wang and Claire Cardie.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), short paper, 2014.

==== Content ====
I. Description of the datasets
II. Contact

==== I. Description of the datasets ====

There are two JSON files under the directory /PATH/TO/dispute_wikipedia:

1) dispute_discussions.json
It contains 3609 disputed discussions collected from Wikipedia talk pages. The data collection procedure is described in our ACL 2014 paper.

2) non_dispute_discussions.json
It contains 3609 discussions sampled from Wikipedia talk pages that had never been tagged as disputed at the time this dataset was collected.

>> Data structure

Each file contains a list of discussions. Each discussion has the following fields:

_page_name: the name of the Wikipedia talk page, e.g. Discussion:1066 Granada massacre
_page_ID: a unique integer
_discussion_name: the discussion title on the Wikipedia talk page, e.g. Islam and antisemitism
_discussion_ID: a unique integer
_lines: the content of the discussion. Each line has the fields "_line_number", "_text", "_user", and "_timestamp". User and timestamp are parsed according to Wikipedia signature templates (https://en.wikipedia.org/wiki/Wikipedia:Signatures).

==== II. Contact ====

Should you have any questions, please contact luwang@ccs.neu.edu (Lu Wang).
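As a usage note for the data structure described in Section I, the following is a minimal Python sketch of how the files might be read. It is not part of the original distribution. It assumes that each file is a single top-level JSON list, that "_lines" is a list of objects, and that "_user" may be missing or empty when a signature could not be parsed. The helper names load_discussions and summarize, and the sample record (including its placeholder timestamp), are made up for illustration.

```python
import json
from collections import Counter


def load_discussions(path):
    """Load one corpus file (assumed: a top-level JSON list of discussions)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(discussion):
    """Return (title, number of lines, lines per user) for one discussion."""
    lines = discussion.get("_lines", [])
    # Skip lines without a parsed user (an assumption about missing values).
    users = Counter(line["_user"] for line in lines if line.get("_user"))
    return discussion.get("_discussion_name"), len(lines), users


# Made-up record in the documented shape; not taken from the corpus.
# The "_timestamp" value is a placeholder, since the real format is not
# specified in this README.
sample = [{
    "_page_name": "Discussion:Example page",
    "_page_ID": 1,
    "_discussion_name": "Example thread",
    "_discussion_ID": 10,
    "_lines": [
        {"_line_number": 1, "_text": "First comment.",
         "_user": "UserA", "_timestamp": "placeholder"},
        {"_line_number": 2, "_text": "A reply.",
         "_user": "UserB", "_timestamp": "placeholder"},
        {"_line_number": 3, "_text": "A follow-up.",
         "_user": "UserA", "_timestamp": "placeholder"},
    ],
}]

title, n_lines, per_user = summarize(sample[0])
print(title, n_lines, dict(per_user))
```

To run the same summary on the real data, replace the sample with load_discussions("/PATH/TO/dispute_wikipedia/dispute_discussions.json") and loop over the returned list.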