Saturday, May 22, 2010

Natural Language Processing & Applications: Morphology

1 Introduction
In this module, one use of the word ‘grammar’ is to refer to systematic patterns both
within words and within sentences. The grammar of a language thus includes both the
morphology of words and the syntax of sentences. Languages tend to ‘trade off’ these
components against one another. Some Indo-European languages (Russian being a notable
example) have retained the complex morphology which is thought to have been a feature of
the original Indo-European language. Others, such as English and Persian (Farsi), have less
complex morphology but more rigid syntax (rules ordering elements within sentences). Thus
the word order in the English sentence:
The dog saw the cat.
is absolutely fixed. Swapping ‘the dog’ and ‘the cat’ produces an equally grammatical sentence
but one which has a different meaning. Any other word order is unacceptable (although
words such as ‘the’ can be omitted in ‘shorthand’ forms, such as newspaper headlines). In
Modern Greek, the equivalent sentence (transcribed into the Latin alphabet) is:
O skilos eide ti gata.
To say the cat saw the dog the sentence must become:
I gata eide to skilo.
When the dog is the ‘subject’ of the sentence it is ‘o skilos’; when it is the ‘object’ of the
sentence it becomes ‘to skilo’. Similarly ‘i gata’ changes to ‘ti gata’ as it moves from subject to
object. Thus in Greek if we take the first sentence and reverse the order of ‘o skilos’ and ‘ti
gata’, giving:
Ti gata eide o skilos.
the sentence still means that the dog saw the cat. The three elements of the sentence (subject,
verb and object) can be placed in any order without altering the basic meaning. The
morphology of the words (primarily but not exclusively their endings) makes clear who is doing what
to whom. Word order merely emphasizes different elements. To get the effect of the last
sentence above in English, we would need to completely re-cast it. Two possibilities are:
It was the cat that the dog saw.
As for the cat, the dog saw it.
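The point that Greek morphology, rather than word order, identifies subject and object can be illustrated with a small Python sketch. It is only a toy: the lookup table CASE_FORMS and the function assign_roles are hypothetical names introduced here, the table covers only the four noun forms quoted above, and any word that is not part of a listed noun phrase is simply assumed to be the verb.

```python
# Hypothetical toy lexicon: transliterated noun phrase -> (English gloss, role).
# Only the four forms quoted in the text are covered.
CASE_FORMS = {
    "o skilos": ("dog", "subject"),
    "to skilo": ("dog", "object"),
    "i gata": ("cat", "subject"),
    "ti gata": ("cat", "object"),
}


def assign_roles(sentence):
    """Return a dict of roles for a simple subject/verb/object sentence.

    Roles are read off the form of each noun phrase, not its position.
    Any word outside a known noun phrase is assumed to be the verb.
    """
    words = sentence.lower().rstrip(".").split()
    roles = {}
    i = 0
    while i < len(words):
        phrase = " ".join(words[i:i + 2])  # article + noun
        if phrase in CASE_FORMS:
            gloss, role = CASE_FORMS[phrase]
            roles[role] = gloss
            i += 2
        else:
            roles["verb"] = words[i]
            i += 1
    return roles
```

Applied to the first Greek sentence and to its reordered version, the function assigns the same roles (subject: dog, object: cat), because it looks only at the form of each noun phrase. A comparable table for English would be useless, since ‘the dog’ has the same form in both roles and only position distinguishes them.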
The history of many Indo-European languages seems to be one in which a combination of
complex morphological differences with fairly free word order has given way to smaller
morphological differences accompanied by rigid word ordering. Classical Latin or Greek and
Modern Russian stand at one end of this spectrum; Modern Greek somewhere in the middle;
French and English at the other end.
Some definitions are needed. Words can often be divided into morphemes. Thus ‘inputs’
(whether noun or verb) can be divided into three morphemes: in + put + s. Note that ALL
parts of the word are called morphemes: ‘put’ is the root morpheme to which the two
affixes ‘in’ and ‘s’ are added. Affixes can be prefixes, infixes or suffixes, i.e. can be added to
the start of a word, inserted into a word or added to the end of a word. In English, infixes are
rare, although they are common in other languages (Arabic for example).
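The division in + put + s can be sketched as a simple search over affix lists. The following Python fragment is a minimal illustration, not a real morphological analyser: the lists PREFIXES, SUFFIXES and ROOTS and the function segment are hypothetical and contain only a handful of entries, and infixes are not handled at all.

```python
# Hypothetical toy inventories, just large enough for the examples.
PREFIXES = ("in", "un", "re")
SUFFIXES = ("s", "ing", "ed")
ROOTS = {"put", "do", "play"}


def segment(word):
    """Split a word into [prefix, root, suffix] morphemes, omitting empty parts.

    Tries every optional prefix and optional suffix from the toy lists and
    accepts the first split whose remaining middle part is a known root.
    Returns None if no such split exists.
    """
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)]
            if root in ROOTS:
                return [m for m in (prefix, root, suffix) if m]
    return None
```

With these lists, segment("inputs") yields the three morphemes in, put and s, while a word whose root is not listed yields None. Handling infixes would need a different search, since the affix would have to be located inside the root rather than at either edge.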
Some authors define morpheme as the smallest unit of meaning. There are some objections to
this definition, which is why I never use it.
• Units which alter meaning are not necessarily morphemes. The pluralization of man to...
