Active Learning and Crowdsourced Datasets (Sentiment Analysis For Tweets)

Terms of use

Please cite the following papers if you use any part of this dataset:

@article{DBLP:journals/pvldb/MozafariSFJM14,
   author = {Barzan Mozafari and
            Purnamrita Sarkar and
            Michael J. Franklin and
            Michael I. Jordan and
            Samuel Madden},
   title = {Scaling Up Crowd-Sourcing to Very Large Datasets: {A} Case for Active Learning},
   journal = {{PVLDB}},
   volume = {8},
   number = {2},
   pages = {125--136},
   year = {2014},
}

@article{DBLP:journals/corr/abs-1209-3686,
   author = {Barzan Mozafari and
            Purnamrita Sarkar and
            Michael J. Franklin and
            Michael I. Jordan and
            Samuel Madden},
   title = {Active Learning for Crowd-Sourced Databases},
   journal = {CoRR},
   volume = {abs/1209.3686},
   year = {2012},
}

Questions

For all inquiries, please contact mozafari AT umich.edu

Datasets

The format of each dataset is described in the corresponding .desc file. The binary files need to be loaded with MATLAB. For convenience, I have included a MATLAB file, CrowdManager.m, that loads the datasets.

CrowdManager.m
face4.crd
face4.csv
face4.desc
face4.details
face4.vis
index.html
README
tweets100k.desc
tweets100k.details
tweets100k.dict
tweets100k.vis
tweets10k.crd
tweets10k.desc
tweets10k.details
tweets10k.dict
tweets10k.vis
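
If MATLAB is not at hand, the files listed above can still be given a quick look before loading anything. The Python sketch below is only an illustration and is not part of the dataset tooling. It assumes that the .desc files are plain text and that face4.csv is an ordinary comma-delimited text file; neither is confirmed here, so check the .desc file first. The helper names read_description and preview_csv are hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def read_description(desc_path):
    """Return the contents of a .desc file (assumed to be plain text)."""
    # errors="replace" keeps the read from failing on unexpected bytes.
    return Path(desc_path).read_text(encoding="utf-8", errors="replace")


def preview_csv(csv_path, n_rows=5):
    """Return the first n_rows rows of a CSV file as lists of strings.

    Assumes a comma-delimited file; no header handling or type
    conversion is attempted, since the column layout is given
    in the matching .desc file.
    """
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        return list(islice(csv.reader(f), n_rows))
```

For example, with the files in the current directory, print(read_description("face4.desc")) shows the format description and preview_csv("face4.csv") returns the first five rows. The binary files still need MATLAB and CrowdManager.m.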