Machine learning techniques for word sense disambiguation

Doctoral thesis by Gerard Escudero Bakx

In the natural language processing (NLP) community, word sense disambiguation (WSD) has been described as the task of selecting the appropriate meaning (sense) for a given word in a text or discourse, where this meaning is distinguishable from other senses potentially attributable to that word. These senses can be seen as the target labels of a classification problem, so machine learning (ML) is a possible way to tackle it. This work studies the application of machine learning algorithms and techniques to the WSD task.

The first issue treated is the adaptation of alternative ML algorithms to deal with word senses as classes. A comparison of these methods is then performed under the same conditions. The evaluation measures applied to compare their performance are the typical precision and recall, together with agreement rates and kappa statistics.

The second topic explored is the cross-corpora application of supervised machine learning systems for WSD, to test their generalisation ability across corpora and domains. The results obtained are very disappointing, seriously questioning the possibility of constructing a general enough training corpus (labelled or unlabelled), and the way its examples should be used to develop a general-purpose word sense tagger.

The use of unlabelled data to train classifiers for word sense disambiguation is a very challenging line of research towards a really robust, complete and accurate word sense tagger. For this reason, the next topic treated in this work is the application of two bootstrapping approaches to WSD: Transductive Support Vector Machines and the Greedy Agreement bootstrapping algorithm by Steven Abney.

During the development of this research we have been interested in the construction and evaluation of several WSD systems. We have participated in the last two editions of the English Lexical Sample task of sen
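The evaluation measures named in the abstract (precision, recall, agreement rate and the kappa statistic) can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names are hypothetical, the kappa shown is assumed to be Cohen's kappa for two taggers, and abstentions are assumed to be represented as `None`.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Precision and recall for a tagger that may abstain (None = no answer).

    Assumption: precision is computed over attempted examples only,
    recall over all examples.
    """
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def agreement_rate(a, b):
    """Fraction of examples on which two taggers assign the same sense."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = agreement_rate(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each tagger's own sense distribution.
    expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers always output the same single sense
    return (observed - expected) / (1 - expected)


# Hypothetical sense labels for four occurrences of one ambiguous word.
tagger_a = ["s1", "s1", "s2", "s2"]
tagger_b = ["s1", "s2", "s2", "s2"]
print(agreement_rate(tagger_a, tagger_b), cohen_kappa(tagger_a, tagger_b))
```

Kappa is useful alongside raw agreement because two taggers that both favour the most frequent sense will agree often by chance alone.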

 

Academic details of the doctoral thesis «Machine learning techniques for word sense disambiguation»

  • Thesis title:  Machine learning techniques for word sense disambiguation
  • Author:  Gerard Escudero Bakx
  • University:  Politécnica de Catalunya
  • Thesis defence date:  13/07/2006

 

Supervision and examination committee

  • Thesis supervisor
    • Lluís Márquez Villodre
  • Committee
    • Committee chair: Horacio Rodríguez Hontoria
    • Mark Stevenson (member)
    • Walter Daelemans (member)
    • Eneko Agirre Bengoa (member)

 
