Tasks
01. English all words
02. Italian all words
03. Basque lexical sample
04. Catalan lexical sample
05. Chinese lexical sample
06. English lexical sample
07. Italian lexical sample
08. Romanian lexical sample
09. Spanish lexical sample
10. Automatic subcategorization acquisition
11. Multilingual lexical sample
12. WSD of WordNet glosses
13. Semantic Roles
14. Logic Forms
15. Swedish lexical sample
16. Semantic roles for Swedish
The figures next to each task refer to the number of teams who responded to the call for interest in participation. Senseval-3 is still open to all. The call for participation will come out in February 2004.
English all words [64 teams]
As we did for Senseval-2, we will tag approximately 5000 words
of coherent Penn Treebank text with WordNet 1.7.1 tags. We will tag
all of the predicating words and the head words of their arguments,
and as many adjectives and adverbs as we can. We will do
double-blind tagging with adjudication.
Coordinator: Martha Palmer mpalmer@cis.upenn.edu
Italian all words [7 teams]
In addition to the lexical sample task, we propose an "all words" task for
Italian.
Each participant will be provided with a relatively small set extracted from
the Italian Treebank, consisting of about 5000 words.
The content words (nouns, verbs, adjectives, and a small set of proper nouns)
will be semantically tagged according to the sense repository of
ItalWordNet. Participants in the Italian All Words task can obtain
"ItalWordNet for Senseval-3" from ELDA (Evaluations and Language resources
Distribution Agency) by contacting Ms Valérie Mapelli at mapelli@elda.fr, who
will provide information on the licensing and delivery procedure.
Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)
Basque lexical sample [8 teams]
We propose a "Lexical-Sample" task for Basque in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*senses+7*multiwords) and a comparatively very large set of unlabelled
examples (ten times more when possible) for around 40 words. The test set
will consist of one third of 75+15*senses+7*multiwords. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those taking advantage of the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory
will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7
will also be provided). This task will be coordinated with other
lexical-sample tasks (Catalan, English, Italian, Romanian, Spanish) in order
to share around 10 of the target words.
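The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name, its parameters, and the rounding of the one-third test split (rounded down here) are assumptions, not part of the task definition.

```python
def lexical_sample_sizes(senses, multiwords=0, base=75, per_sense=15, per_multiword=7):
    """Return (total, training, test) example counts for one target word.

    Implements total = 75 + 15*senses + 7*multiwords with a 2/3 : 1/3 split.
    Rounding the test share down is an assumption; the task text does not say
    how fractional counts are handled.
    """
    total = base + per_sense * senses + per_multiword * multiwords
    test = total // 3        # one third held out as test data
    train = total - test     # the remaining two thirds are labelled training data
    return total, train, test


# Hypothetical target word with 4 senses and 1 multiword expression.
total, train, test = lexical_sample_sizes(senses=4, multiwords=1)
print(total, train, test)  # 142 95 47
```

Setting `multiwords=0` gives the 75+15*#senses variant, and the coefficients are exposed as parameters because other lexical-sample tasks weight multiword expressions differently.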
Coordinator: Eneko Agirre eneko@si.ehu.es
Catalan lexical sample [8 teams]
We propose a "Lexical-Sample" task for Catalan in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*#senses) and a comparatively very large set of unlabelled
examples (ten times more, when possible) for around 45 words. The test set
will consist of one third of 75+15*#senses. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those taking advantage of the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory,
which is specially developed for the task, will be manually linked to
WordNet 1.6 (automatic links to WordNet 1.7 will also be provided).
This task will be coordinated with other lexical-sample tasks (Basque,
English, Italian, Romanian, Spanish) in order to share around 10 of the
target words.
Coordinators:
Lluís Màrquez (lluism@lsi.upc.es),
M. Antonia Marti (amarti@ub.edu),
Mariona Taule (mtaule@uoc.edu)
Chinese lexical sample [16 teams]
The mainland Chinese lexical sample task will consist of three sets of data: dictionary, training data, and test data. The dictionary will contain entries for 20 different Chinese words. For each word, several senses will be defined based on the HowNet knowledge base. For each sense, the dictionary entry will list an id for the sense, a part of speech tag, a definition, and an English translation, as well as some additional information regarding the sense distinctions. Training data will consist of 20-100 examples per word, with more examples for words with a larger number of senses. Two sets of training data will be provided: one with part of speech tagging information included, and one without. A part of speech tagging system will also be provided. Evaluation data will consist of about half the number of examples in the training data.
Coordinator:
PengYuan Liu, pyliu@mtlab.hit.edu.cn
English lexical sample [65 teams]
The goal of this task is to create a framework for the evaluation of systems that perform Word Sense Disambiguation. The data will be collected via the Open Mind Word Expert (OMWE) interface. To ensure reliability, we collect at least two tags per item, and conduct inter-tagger agreement and replicability tests. Previous evaluations have shown the high quality and usefulness of the OMWE data. By the time Senseval-3 takes place, we expect to have enough data for about 60 ambiguous nouns, adjectives, and verbs. Part of the test data will be created by lexicographers from the Department of Linguistics at UNT. Another part of the test data will be extracted from the sense-tagged corpus collected over the Web.
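As an illustration of what an inter-tagger agreement check over two tags per item might look like, here is a minimal sketch. The task description does not say which agreement measure is used; raw observed agreement and Cohen's kappa are shown here as common choices, and the sense labels are hypothetical.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of items on which the two taggers chose the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: probability that both taggers pick the same sense
    # if each tagged independently according to their own label frequencies.
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both taggers used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sense labels for four instances of one target word.
tagger_1 = ["bank.1", "bank.2", "bank.1", "bank.1"]
tagger_2 = ["bank.1", "bank.2", "bank.2", "bank.1"]
print(observed_agreement(tagger_1, tagger_2))  # 0.75
print(cohen_kappa(tagger_1, tagger_2))         # 0.5
```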
We will use WordNet 1.7.1 as the sense inventory for nouns and adjectives, and Wordsmyth for verbs. We will provide sense maps to enable both fine-grained and coarse-grained evaluations.
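To show how a sense map can support both evaluation granularities, here is a minimal sketch. It is not the official Senseval scorer: it assumes one gold sense and one answer per instance and reports plain accuracy, and the sense identifiers and map entries are hypothetical.

```python
def accuracy(gold, predicted, sense_map=None):
    """Share of instances answered correctly.

    Without a sense_map the comparison is fine-grained (exact sense match).
    With a sense_map, each fine sense is first replaced by its coarse group,
    so confusing two senses in the same group still counts as correct.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty answer lists of equal length")

    def coarsen(sense):
        return sense_map.get(sense, sense) if sense_map else sense

    correct = sum(coarsen(g) == coarsen(p) for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical map from fine senses to coarse groups.
sense_map = {"bank.1": "bank.A", "bank.2": "bank.A", "bank.3": "bank.B"}
gold = ["bank.1", "bank.2", "bank.3", "bank.1"]
system = ["bank.1", "bank.1", "bank.3", "bank.2"]

print(accuracy(gold, system))             # 0.5 (fine-grained)
print(accuracy(gold, system, sense_map))  # 1.0 (coarse-grained)
```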
A mapping between Wordsmyth and WordNet verb entries is now available, and it is included in the English lexical sample training/test data distribution.
Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Adam Kilgarriff, Adam.Kilgarriff@itri.brighton.ac.uk
Tim Chklovski, timc@mit.edu
Italian lexical sample [11 teams]
We propose a "Lexical-Sample" task for Italian in order to evaluate supervised and
semi-supervised learning systems for WSD. Each participant will be provided with a
relatively small set of labelled examples (two thirds of 75+15*#senses) and a
comparatively very large set of unlabelled examples (ten times more, when possible)
for around 45 words. The test set will consist of one third of 75+15*#senses.
We target two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those taking advantage of the unlabelled data), but
unsupervised systems can also participate, of course.
The sense inventory, called "Italian MultiWordNet for Senseval-3", has been
specially developed for the task. This task will be coordinated with other
lexical-sample tasks (Basque, English, Catalan, Romanian, Spanish) in order to share around 10 of the target words.
Participants in the Italian Lexical Sample task can get "Italian MultiWordNet for Senseval-3" free of charge by contacting Alessandro Vallin (vallin@itc.it), who will send the license agreement form and the information needed to download the resource.
Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)
Romanian lexical sample [8 teams]
We propose a lexical sample task for Senseval-3 that addresses the Romanian language. We will select about 50 words, covering all open class parts of speech, with various degrees of ambiguity, and for each such word collect a set of examples from a large Romanian corpus. The number of examples per word will be determined using the 15n+10m+75 formula used during Senseval-1 and Senseval-2 (n = number of senses, m = number of multi-word expressions). The senses and multi-word expressions for each ambiguous word will be taken from the new Romanian WordNet, or DEX (a widely recognized dictionary of the Romanian language). The data will be collected via the Open Mind Word Expert (Romanian edition). A comparatively very large set of unlabelled examples (ten times more, when possible) will also be provided. This task will be coordinated with other lexical-sample tasks (Basque,
Catalan, English, Italian, Spanish) in order to share around 10 of the target words.
Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Vivi Nastase, vnastase@site.uottawa.ca
Dan Tufis, tufis@racai.ro
Tim Chklovski, timc@mit.edu
Spanish lexical sample [18 teams]
[webpage]
We propose a "Lexical-Sample" task for Spanish in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*#senses) and a comparatively very large set of unlabelled
examples (ten times more, when possible) for around 45 words. The test set
will consist of one third of 75+15*#senses. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those taking advantage of the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory,
which is specially developed for the task, will be manually linked to
WordNet 1.6 (automatic links to WordNet 1.7 will also be provided).
This task will be coordinated with other lexical-sample tasks (Basque,
Catalan, English, Italian, Romanian) in order to share around 10 of
the target words.
Coordinators:
Lluís Màrquez (lluism@lsi.upc.es),
M. Antonia Marti (amarti@ub.edu),
Mariona Taule (mtaule@uoc.edu)
Automatic subcategorization acquisition [35 teams]
[webpage]
This task involves evaluating word sense disambiguation (WSD) systems in
the context of automatic subcategorization acquisition. The task will be
restricted to a set of 30 verbs. These are "hard" verbs: high in frequency
and with multiple senses. The participants will be given the list of verbs
in advance to allow a training phase (no training data will be made
available). We will provide the test corpus. This will contain around 1000
instances of each verb, which the participants will be expected to
annotate with WordNet 1.7.1 senses. After receiving the sense-annotated
data, we will map the detected WordNet senses to our senses, which are
based on broad Levin-style verb classes. We will feed the sense-annotated
data from each system to Anna Korhonen's subcategorization acquisition
software. The acquired frames will be evaluated against manually obtained
gold standard frames, which will yield a ranking of the WSD systems.
Coordinators:
Judita Preiss (Judita.Preiss@cl.cam.ac.uk)
Anna Korhonen (Anna.Korhonen@cl.cam.ac.uk)
Multilingual lexical sample [23 teams]
The goal of this task is to create a framework for the evaluation of systems that perform Machine Translation, with a focus on the translation of ambiguous words. The task will be very similar to the lexical sample task, except that rather than using the sense inventory from a dictionary we will follow the suggestion of Resnik and Yarowsky and use the translations of the target words into a second language as the "inventory". The contexts will be in English, and the tags for the target words will be their translations in a second language.
We plan to select words with various degrees of "interlingual-ambiguity", to create a complete picture of the various problems that may appear in this task. At the moment, we plan on two pairs of languages, English-French, and English-Hindi, with an estimated number of about 50 ambiguous words per language pair. The data will be collected via the Open Mind Word Expert (bilingual edition).
Coordinators:
Tim Chklovski, timc@mit.edu
Rada Mihalcea, rada@cs.unt.edu
Ted Pedersen, tpederse@d.umn.edu
Amruta Purandare, pura0010@d.umn.edu
Word-Sense Disambiguation of WordNet Glosses [36 teams]
[webpage]
Trial data: available from the task webpage
In connection with WordNet 2.0 (George Miller et al.) and eXtended WordNet
(XWN, Dan Moldovan et al.), a large number of the WordNet glosses are being
hand-tagged. Each content word (noun, verb, adjective, and adverb) is being
labelled with its WordNet sense. This manual effort is time-consuming and
labor-intensive. The Senseval-3 task is to perform this tagging automatically
using all hand-tagged glosses from XWN as the test set, with the hand-tagging
also serving as the gold standard for evaluation. The task will be performed as
an "all-words" task, except that no context will be provided.
However, it is expected that participants will make use of additional WordNet
information (synset, the WordNet hierarchy, and other WordNet relations) in
their disambiguation. This task is intended to promote the exploitation of
ordinary dictionary definitions in machine-readable dictionaries.
Coordinator: Ken Litkowski (ken@clres.com)
Automatic Labeling of Semantic Roles [36 teams]
[webpage]
Trial data: available from the task webpage
Word-sense disambiguation has frequently been criticized as a task in
search of a reason. Heretofore, the focus of disambiguation has been on
the sense inventory and has not examined the major reason why we would
have lexical knowledge bases: how the meanings would be represented and
thus made available for use in natural language processing applications. An
important baseline study for automatic labelling of semantic roles
(following the FrameNet paradigm) has recently appeared in the
literature ("Automatic Labeling of Semantic Roles" by Daniel Gildea and
Daniel Jurafsky). The FrameNet project has put together a body of
hand-labeled data and this study has put together a set of suitable
metrics for evaluating the performance of an automatic system. The
proposed Senseval-3 task would call for the development of systems to
meet the same objectives as the Gildea and Jurafsky study. The data for
this task would be a sample of the FrameNet hand-annotated data.
Evaluation of systems would follow the metrics of the Gildea and
Jurafsky study.
Coordinator: Ken Litkowski (ken@clres.com)
Identification of Logic Forms in English [26 teams]
[webpage]
[mailing list]
Trial data: available from the task webpage
Automated reasoning is a major goal of humankind, but lately
little attention has been paid to the task of automatically creating
reliable logic forms. Natural language based representations are more
powerful when predicates are disambiguated. This task is complementary to
the mainstream task in Senseval. The goal is to transform English
sentences into a first order logic notation. A predicate corresponds to
each content word, conjunction, and preposition; arguments have
syntactic values. Guidelines and examples of logic forms will be provided
to participants. The performance of the systems will be evaluated at
sentence and predicate level, using precision and recall measures
determined against the gold standard, which will consist of logic forms
created by human annotators.
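The predicate-level and sentence-level scoring described above might be sketched as follows. This is an assumption-laden illustration, not the task's scorer: each logic form is represented as a plain list of predicate strings, the predicate notation is invented for the example, and sentence-level scoring is taken to mean an exact match of the whole form.

```python
from collections import Counter


def predicate_precision_recall(gold_forms, system_forms):
    """Precision and recall over predicates, pooled across all sentences."""
    matched = gold_total = system_total = 0
    for gold, system in zip(gold_forms, system_forms):
        g, s = Counter(gold), Counter(system)
        matched += sum((g & s).values())  # predicates present in both forms
        gold_total += sum(g.values())
        system_total += sum(s.values())
    precision = matched / system_total if system_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    return precision, recall


def sentence_accuracy(gold_forms, system_forms):
    """Fraction of sentences whose logic form matches the gold form exactly."""
    exact = sum(Counter(g) == Counter(s) for g, s in zip(gold_forms, system_forms))
    return exact / len(gold_forms)


# Hypothetical logic forms for two sentences (notation is illustrative only).
gold = [["dog(x1)", "bark(e1, x1)"], ["cat(x1)", "sleep(e1, x1)"]]
system = [["dog(x1)", "bark(e1, x1)"], ["cat(x1)", "sleep(e1, x2)", "quiet(x1)"]]

precision, recall = predicate_precision_recall(gold, system)
print(round(precision, 2), round(recall, 2))  # 0.6 0.75
print(sentence_accuracy(gold, system))        # 0.5
```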
Coordinator: Vasile Rus (vasile@cs.iusb.edu)
Swedish lexical sample [4 teams]
canceled?
A lexical sample task for Swedish, similar in spirit to the Swedish task organized for Senseval-2.
Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se
Identification of Semantic Roles in Swedish [2 teams]
canceled?
We propose to organize a task based on "semantic roles", using
labels such as "Agent", "Recipient", "Material", "Phenomenon",
"Location", etc. This type of semantic role annotation
requires syntactically tagged texts, which we are
willing to provide from our treebank for the task (so that potential
participants will use a uniform syntactic annotation).
Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se