NEWS ITEM
19.1.2009
ICLT Lecture on correcting a PoS-tagged corpus
The next talk in the Icelandic Centre for Language Technology (ICLT)
seminar series will be given at Reykjavik University, Kringlan 1, room
K5, on Tuesday, January 20th, starting at 12:00. The speaker is Hrafn
Loftsson, Assistant Professor at Reykjavik University. The title of
his talk is "Correcting a PoS-tagged corpus using three complementary
methods." The talk will be given in English if anyone in the audience
does not understand Icelandic.
The quality of the part-of-speech (PoS) annotation in a corpus is
crucial for the development of PoS taggers. In this talk, we experiment
with three complementary methods for automatically detecting errors in
the PoS annotation of the Icelandic Frequency Dictionary corpus. The
first two methods are language-independent, and we argue that the third
method can be adapted to other morphologically complex languages. Once
possible errors have been detected, we examine each error candidate and
hand-correct the corresponding PoS tag if necessary. Overall, based on
the three methods, we hand-correct the PoS tagging of 1,334 tokens
(0.23% of the tokens) in the corpus. Furthermore, we re-evaluate
existing state-of-the-art PoS taggers on Icelandic text using the
corrected corpus.
Hrafn Loftsson graduated with a BSc degree in Computer Science from the
University of Iceland in 1989. He received an MSc degree in Computer
Science and Operations Research from Pennsylvania State University in
1992 and a PhD in Natural Language Processing from the University of
Sheffield in 2007. Hrafn is an Assistant Professor in the School of
Computer Science at Reykjavik University and sits on the board of the
ICLT.