Probabilistic Grammars and Data Oriented Parsing

Formerly: Statistical Models of Natural Language Processing (NLP)

T-C222 and TW-8528


**Instructors: Khalil Sima'an (parts I and II) and Detlef Prescher (part II)**

A course for the following students:

- **Taalwetenschap, Geesteswetenschappen (Linguistics)**
- **Kunstmatige Intelligentie, Specialisatie Taalverwerking (Artificial Intelligence)**
- **Master of Logic (MoL), ILLC**

PLACE: B.235, Nieuwe Achtergracht 166

- *This course will be given in English.*
- Note that the course structure in 2003 will be __different from preceding years__ because of the semester system (department of Linguistics). This means that the course

  1) spans a longer period (February - May)

  2) consists of two parts: an *introductory* part (3 Feb. - 21 March) and an *advanced* part (31 March - 23 May).

**For MoL students:** *a research project in this direction demands* __both parts__.

See also

http://www.science.uva.nl/onderwijs/studieprogramma/studiegidsen/CORE/00/3C/D6E.HTML

**Description:**

When computational models of language processing are not constructed in a purely linguistic context, but aim at being relevant for practical applications or for psychological theory, they ought to be able to perform tasks like disambiguation and prediction. For this reason, increasingly many models take statistical properties of a sample corpus into account when they process new input. This course will give an overview of the most important techniques used in statistical language processing. The course starts out by giving a short introduction to probability theory, information theory and Bayesian learning, and continues with the following topics: n-gram statistics and Markov models, smoothing techniques (Good-Turing and Katz Discounting), Hidden Markov Models (HMMs), application of HMMs to part-of-speech tagging, Stochastic Context-Free Grammars (SCFGs) and stochastic parsing algorithms. The last two meetings will be devoted to more advanced topics in stochastic parsing, possibly including: Bilexical-Dependency models and Data-Oriented Parsing (DOP) models.

SLIDES OF THE LECTURES (being updated): see also OLD SLIDES

Primary books:

First half of the course:

* Daniel Jurafsky and James H. Martin. "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition". Prentice-Hall, 2000.

Second half of the course:

* Chris Manning and Hinrich Schütze. "Foundations of Statistical Natural Language Processing",

MIT Press. Cambridge, MA: May 1999. (see http://nlp.stanford.edu/fsnlp/ )

Secondary books (some chapters):

* Eugene Charniak. "Statistical Language Learning", Cambridge, Mass, MIT Press, 1993.

* Tom Mitchell. "Machine Learning", McGraw-Hill Series in Computer Science, 1997.

Other material (see also list below):

* Stanley Chen and Joshua Goodman. "An Empirical Study of Smoothing Techniques for Language Modeling". Technical report TR-10-98, Harvard University, August 1998.

see also http://research.microsoft.com/~joshuago/

A correction of a small error in the statement of the Katz formula in this report can be found here.

* Various recent articles which will be supplied as handouts during the course.

* R. Bod. "Beyond Grammar", Stanford CA, CSLI, 1998.

* R. Bod, R. Scha and K. Sima'an (editors), Data Oriented Parsing, CSLI Publications, 2003.

**Some sources available online (list is being expanded):**

Krenn, B. and Samuelsson, Ch. (1997). "The Linguist's Guide to Statistics".

Place: http://www.coli.uni-sb.de/~krenn/edu.html

Abney, S. (1996). "Statistical Methods and Linguistics".

Place: http://www.sfs.nphil.uni-tuebingen.de/~abney/Abney_95c.ps.gz

Charniak, E. (1997). "Statistical Techniques for Natural Language Parsing". AI Magazine.

Place: http://www.brown.edu/~ec/papers/aimag97.ps

Shannon, C.E. (1948). "A Mathematical Theory of Communication".

Place: http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html

Pereira, F. (2000). "Formal Grammar and Information Theory: Together Again?" Philosophical Transactions of the Royal Society, 358(1769):1239-1253, April 2000.

Place: http://www.cis.upenn.edu/~pereira/bib.html