Lumping, Splitting, and Natural Language Processing
by Orin Hargraves
Abstract:
Ever since machine-readable dictionaries (MRDs) became available, there have been attempts
to incorporate them into natural language processing (NLP) tasks, including information
extraction, question answering, and summarization. The assumption has been that
NLP software, by incorporating a dictionary, should be able to take advantage
of the thousands of man-hours of lexical analysis that is represented by a
dictionary, particularly to help with the central NLP task of word sense
disambiguation. In practice, however, MRDs have proved disappointing to many in
the NLP field and have even been denounced as useless by some. This paper looks at
1) what elements of dictionary data are of use to an NLP system;
2) how dictionary definition structure can abet or hinder NLP;
3) some features of contemporary dictionaries that make them particularly good or bad candidates for use in NLP.
Consideration is also given to what an ideal MRD for NLP would look like, and
whether such a dictionary could serve the needs of both human and machine users.
Finally, several current tools available to lexicographers are surveyed
with a view to their usefulness in bridging the gap between NLP and dictionary
databases more effectively.
Orin Hargraves, Feb 5, 2010, 1:26 PM