Sentence Alignment and Word Alignment: Projects, Papers, Evaluation, etc.
Alignment Projects
- ARCADE I
http://www.up.univ-mrs.fr/veronis/arcade/
The first stage included a competition among 6 systems for sentence alignment. ARCADE II is to start soon.
Some studies on evaluation. Evaluation measures include standard precision, recall and F-measure.
Languages: French.
- EGYPT
http://www.clsp.jhu.edu/ws99/projects/mt
Includes a word-alignment visualization tool, Cairo.
Languages: Arabic, French, Czech, Timorese.
- MULTEXT-EAST
http://nl.ijs.si/ME/CD/mte-home.html
Sentence alignment of Orwell's 1984 in Eastern European languages.
Languages: Bulgarian, Czech, Estonian, Hungarian, Lithuanian, Latvian, Romanian, Russian, Serbo-Croatian, Slovene.
- PLUG
http://numerus.ling.uu.se/~corpora/plug/pwa/
Word alignment tool available free of charge (binaries).
Demo available for Swedish-English corpora.
Languages: Swedish.
- GIZA++
http://www-i6.informatik.rwth-aachen.de/Colleagues/och/software/GIZA++.html
Improvement over GIZA; includes an implementation of the models in F. Och, H. Ney. "Improved Statistical Alignment Models".
Parallel Texts. Data for Text Alignment
Guidelines for Word Alignment
Guidelines for Sentence Alignment
Software for Word Alignment
Software for Sentence Alignment
Papers on Word Alignment Evaluation
- Word Alignment for Languages with Scarce Resources. Joel Martin, Rada Mihalcea and Ted Pedersen. Proceedings of the ACL 2005 Workshop on "Building and Using Parallel Texts: Data Driven Machine Translation and Beyond", Ann Arbor, MI, June 2005. [pdf]
- An Evaluation Exercise for Word Alignment. Rada Mihalcea and Ted Pedersen. Proceedings of the HLT-NAACL 2003 Workshop on "Building and Using Parallel Texts: Data Driven Machine Translation and Beyond", Edmonton, Canada, May 2003. [pdf]
- Evaluation of word alignment systems. Lars Ahrenberg, Magnus Merkel, Anna Sågvall Hein and Jörg Tiedemann. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece, 31 May - 2 June, 2000, Volume III: 1255-1261. [pdf]
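The precision, recall and F-measure mentioned above for ARCADE, and used in several of the papers listed here, can be illustrated with a minimal sketch. It assumes an alignment is represented as a set of (source position, target position) links and that a gold-standard set of links is available; the function name `evaluate_alignment` and the toy data are hypothetical and are not taken from any of the listed tools or papers.

```python
from typing import Iterable, Set, Tuple

# A link pairs a source word position with a target word position (assumed representation).
Link = Tuple[int, int]


def evaluate_alignment(predicted: Iterable[Link],
                       gold: Iterable[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of predicted links against gold links."""
    predicted_set: Set[Link] = set(predicted)
    gold_set: Set[Link] = set(gold)

    # Links proposed by the system that also appear in the gold standard.
    correct = len(predicted_set & gold_set)

    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0

    if precision + recall == 0.0:
        return precision, recall, 0.0

    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical toy data for one sentence pair.
    gold_links = [(0, 0), (1, 2), (2, 1), (3, 3)]
    predicted_links = [(0, 0), (1, 2), (2, 2)]
    p, r, f = evaluate_alignment(predicted_links, gold_links)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Individual evaluations may weight links differently or distinguish several kinds of gold-standard links; this sketch covers only the plain set-based measures.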
Some Papers on Word Alignment
- See the papers from the shared task on word alignment from the HLT-NAACL 2003 Workshop on "Building and Using Parallel Texts: Data Driven Machine Translation and Beyond"
http://www.cs.unt.edu/~rada/wpt
- See Jean Veronis's bibliography (up to 1998)
http://www.up.univ-mrs.fr/~veronis/biblios/ptp.htm
- Using Similarity Scoring to Improve the Bilingual Dictionary for Sub-sentential Alignment, Katharina Probst and Ralf Brown, Proceedings of ACL 2002. [ps]
(Evaluation done with traditional precision and recall)
- Improved Statistical Alignment Models, F. Och and H. Ney, Proceedings of ACL 2000. [ps]
- A Comparison of Alignment Models for Statistical Machine Translation, F. Och and H. Ney, Proceedings of COLING 2000. [ps]
- A comprehensive bilingual word alignment system. Application to disparate languages: Hebrew and English; Y. Choueka, et al., in Veronis, J., "Parallel Text Processing: Alignment and use of translation corpora." (Kluwer Academic, 2000).
- A knowledge-lite approach to word alignment; L. Ahrenberg, et al., in Veronis, J., "Parallel Text Processing: Alignment and use of translation corpora." (Kluwer Academic, 2000).
- From sentences to words and clauses; S. Piperidis, et al., in Veronis, J., "Parallel Text Processing: Alignment and use of translation corpora." (Kluwer Academic, 2000).
- Bracketing and aligning words and constituents in parallel text using Stochastic Inversion Transduction Grammars; D. Wu, in Veronis, J., "Parallel Text Processing: Alignment and use of translation corpora." (Kluwer Academic, 2000).
- A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts, Lars Ahrenberg, Mikael Andersson, Magnus Merkel, COLING-ACL 1998. [ps]
- Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora, Éric Gaussier, Proceedings of COLING-ACL 1998. [ps]
- Line 'em up: Advances in Alignment Technology and Their Impact on Translation Support Tools, Elliott Macklovitch, Marie-Louise Hannan, 2nd AMTA, Montreal, Canada, 1996. [ps]
(Includes some evaluation and error analysis)
- S. Vogel, H. Ney and C. Tillmann. "HMM-Based Word Alignment in Statistical Translation". In Procs. of COLING'96, pp. 836-841, Copenhagen, Denmark, Aug. 1996.
- Aligning Noisy Parallel Corpora Across Language Groups: Word Pair Feature Matching by Dynamic Time Warping, Pascale Fung, Kathleen McKeown, AMTA 1994. [ps]
- I. Dagan, K. Church and W. Gale. Robust bilingual word alignment for machine aided translation. In Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, pp. 1-8, 1993.
- Several of Dan Melamed's publications
http://www.cs.nyu.edu/~melamed/pubs.html
MT Related Events
Maintained by Rada Mihalcea. Last updated: March 13, 2005.
Thanks to those who contributed updates and suggestions (listed in chronological order): Tanya Harvey Ciampi, Ruprecht von Waldenfels.