Page for 2004/2005 (see later years at http://staff.science.uva.nl/~simaan).

Lectures (Slides, Reading, Homework etc) | Reading Material (Books etc) | Description | Lecturer: Khalil Sima'an | MID-TERM PROJECTS | Student Presentations!! (Material etc.) | Final Projects | Grades

Place and Time:
Blok A, Sep 6 till Oct 22: Friday 9:00-11:00, room P.018.
Blok B, Nov 5 till Dec 17: Friday 11:00-13:00, room I.203, except for Dec 10: Friday 11:00-13:00, room A.728.

Course and Reading material: see special page.

- Introduction and Motivation (Why probabilistic models for language and speech processing?)

**Read** chapters 1 and 2 of Manning & Schütze or Jurafsky & Martin; chapters 1 and 2 of Mitchell.

Also read this paper: **Empirical validity and technological viability: Probabilistic models of Natural Language Processing.**

- Probability Theory, Statistics, Machine Learning and Objective Functions

**Read** chapters 1 and 2 from Manning & Schütze; read also chapter 1 of Krenn and Samuelsson.

HOMEWORK: a pdf file containing the homework.

- Word-prediction, sentence probability (without structure), N-grams and Markov models

**Read** chapter 6 of Jurafsky and Martin, or sections 6.1-6.3 + 9.1 from Manning and Schütze.
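The sentence-probability model from this reading (a first-order Markov chain over words) can be sketched as follows; the count tables below are invented purely for illustration, not taken from any course corpus:

```python
# Bigram sentence probability under raw maximum-likelihood estimates.
from collections import defaultdict

# Hypothetical bigram counts, with sentence-boundary markers <s> and </s>.
bigram_counts = {
    ("<s>", "the"): 3, ("the", "dog"): 2, ("the", "cat"): 1,
    ("dog", "barks"): 2, ("cat", "sleeps"): 1,
    ("barks", "</s>"): 2, ("sleeps", "</s>"): 1,
}

# History (unigram) counts derived from the bigram table.
unigram_counts = defaultdict(int)
for (w1, _), c in bigram_counts.items():
    unigram_counts[w1] += c

def sentence_probability(words):
    """P(w_1..w_n) ~= prod_i P(w_i | w_{i-1}) under a first-order Markov model."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        c = bigram_counts.get((w1, w2), 0)
        if unigram_counts[w1] == 0 or c == 0:
            return 0.0  # unseen bigram: probability zero under raw MLE
        p *= c / unigram_counts[w1]
    return p

print(sentence_probability(["the", "dog", "barks"]))  # 2/3
```

Note that raw MLE assigns probability zero to any sentence containing an unseen bigram; the later lecture on smoothing addresses exactly this.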

**Homework:** Exercises 6.3, 6.4 and 6.5 of Jurafsky and Martin (pages 232-233); combine them into one program: do it!

- POS tagging and Markov Models: standard generative POS taggers (includes slides of next lecture)

**Read** chapter 8 (Jurafsky and Martin) about POS tagging in general (you may skip section 8.6).

On HMMs: read from chapter 9 (Manning and Schütze) only sections 9.1, 9.2, 9.3.1 and 9.3.2.

Further on evaluation of taggers: read section 10.6 (Manning and Schütze).

**Homework:** Build a POS tagger based on the standard architecture of a generative stochastic Markov tagger.

At your disposal are a training set and a test set of tagged sentences (to be obtained from the lecturer).

The goal is to vary the architecture of the tagger by adding different context to the language and/or the lexical model, in order to observe the effect on tagging accuracy on the test set.

Here is the general architecture:

P(t_1 ... t_n | w_1 ... w_n) = prod_i P(t_i | Hl) * P(w_i | Hx)

where Hl and Hx are the conditioning contexts for the language and the lexical models respectively.

Here are some suggestions for different instantiations of Hl and Hx

A. Hl = <t_{i-1}> and Hx = <t_i>

B. Hl = <t_{i-2}, t_{i-1}> and Hx = <t_i>

C. Hl = <t_{i-2}, t_{i-1}> and Hx = <t_i, t_{i-1}>

D. Hl = <t_{i-2}, t_{i-1}> and Hx = <t_i, SUFFIX(w_{i-1})> where SUFFIX(word) = last 3 letters of "word"

To build these models you use a training set of tagged sentences, from which you extract the count tables that fit each model.

Having built these four taggers, you test them: strip the tags from the test set, leaving only the words, and run each of the taggers on it. Then compare the output of each tagger to the manually tagged test set (the original, including the tags).

Report the precision of each tagger as: the count of correct tags divided by the total count of tags in the test set.
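As one possible starting point, here is a minimal sketch of model A (Hl = <t_{i-1}>, Hx = <t_i>) with Viterbi decoding. The training corpus and tag set below are invented; the real exercise uses the data obtained from the lecturer:

```python
# Bigram generative tagger: P(t_i | t_{i-1}) * P(w_i | t_i), trained by counting.
from collections import defaultdict

train = [  # hypothetical tagged sentences: lists of (word, tag) pairs
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
]

trans = defaultdict(int)      # C(t_{i-1}, t_i): language model counts
emit = defaultdict(int)       # C(t_i, w_i): lexical model counts
hist = defaultdict(int)       # how often each tag occurs as a history
tag_count = defaultdict(int)  # how often each tag occurs at all

for sent in train:
    prev = "<s>"
    for word, tag in sent:
        trans[(prev, tag)] += 1
        emit[(tag, word)] += 1
        hist[prev] += 1
        tag_count[tag] += 1
        prev = tag

def viterbi(words, tags):
    """Best tag sequence under prod_i P(t_i | t_{i-1}) * P(w_i | t_i)."""
    best = {"<s>": (1.0, [])}          # tag -> (best probability, best path)
    for w in words:
        new = {}
        for t in tags:
            e = emit.get((t, w), 0) / tag_count[t]
            if e == 0:
                continue               # tag t never emitted w in training
            for pt, (p, path) in best.items():
                tr = trans.get((pt, t), 0) / max(hist[pt], 1)
                score = p * tr * e
                if score > 0 and score > new.get(t, (0.0, None))[0]:
                    new[t] = (score, path + [t])
        best = new
    return max(best.values())[1] if best else []

tags = {"DT", "NN", "VB"}
print(viterbi(["the", "cat", "barks"], tags))  # ['DT', 'NN', 'VB']
```

Models B-D differ only in which conditioning contexts are counted (richer Hl and Hx tables); precision is then computed exactly as described above, correct tags divided by total tags on the test set.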

- (same slides as 4) HMM implemented as SFST; tagging algorithms; Forward/Backward + application of Markov models in spelling correction.

**Read** chapter 10 of Manning and Schütze, and on spelling correction from Jurafsky and Martin: chapter 5 (till section 5.6) and chapter 6.
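The Forward algorithm mentioned in this lecture sums the probability of an observation sequence over all hidden-state paths. A minimal sketch, assuming a small hand-specified HMM (all states, observations and probability tables here are invented):

```python
# Forward algorithm for a toy two-state HMM.
states = ["H", "C"]                              # hypothetical hidden states
start = {"H": 0.6, "C": 0.4}                     # initial state probabilities
trans = {("H", "H"): 0.7, ("H", "C"): 0.3,       # P(next | current)
         ("C", "H"): 0.4, ("C", "C"): 0.6}
emit = {("H", "1"): 0.2, ("H", "2"): 0.4, ("H", "3"): 0.4,
        ("C", "1"): 0.5, ("C", "2"): 0.4, ("C", "3"): 0.1}

def forward(obs):
    """Total probability of obs, summing over all state sequences."""
    alpha = {s: start[s] * emit[(s, obs[0])] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans[(p, s)] for p in states) * emit[(s, o)]
                 for s in states}
    return sum(alpha.values())

print(forward(["3", "1", "3"]))
```

Replacing the sum over predecessors by a max (and keeping backpointers) turns this same recursion into Viterbi decoding.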

**Homework:** finish the exercise given in the preceding lecture (number 5 above).

- Dealing with Unseen Events: Methods for Smoothing Maximum-Likelihood N-gram Statistics (Monday 20 October, to compensate for the first lecture)

**Read** 1) chapter 6 from Manning and Schütze (or sections [6.1-6.6] from Jurafsky and Martin);

2) up to page 15 of Joshua Goodman and Stanley Chen, *An empirical study of smoothing techniques for language modeling*, Technical Report TR-10-98, Harvard University, August 1998. Print from http://research.microsoft.com/~joshuago/

A correction of a small error in the statement of Katz's formula in this report can be found here.
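Among the methods surveyed in the Chen & Goodman report, add-one (Laplace) smoothing is the simplest baseline. A sketch with invented counts, contrasting the raw maximum-likelihood estimate with the smoothed one:

```python
# Add-one (Laplace) smoothing for bigram probabilities, on toy counts.
bigrams = {("the", "dog"): 2, ("the", "cat"): 1}
unigrams = {"the": 3, "dog": 2, "cat": 1}
V = len(unigrams)  # vocabulary size (in practice includes all word types)

def p_mle(w1, w2):
    """Raw maximum-likelihood estimate: zero for any unseen bigram."""
    return bigrams.get((w1, w2), 0) / unigrams[w1]

def p_addone(w1, w2):
    """Add one to every bigram count; the denominator grows by V."""
    return (bigrams.get((w1, w2), 0) + 1) / (unigrams[w1] + V)

print(p_mle("the", "mouse"), p_addone("the", "mouse"))  # 0.0 vs 1/6
```

Add-one famously shifts far too much probability mass to unseen events, which is why the report spends its pages on Good-Turing, Katz back-off and interpolation instead.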

**MID-TERM PROJECT: COUNTS FOR 40% of the final mark for this subject (SEE TOP OF THIS PAGE)**

**Parsing, Phrase-Structure and Probabilistic Context-Free Grammars**

Read chapters 9 and [10.1-10.4] of Jurafsky and Martin; for the CYK algorithm, read chapters 1 and 2 of Charniak's book.

**Tabular Parsing Algorithms for CFGs and Probabilistic Context-Free Grammars**

Read chapters 9 and [10.1-10.4] of Jurafsky and Martin; for the CYK algorithm, read chapters 1 and 2 of Charniak's book.

**Probabilistic Context-Free Grammars, Viterbi-like Disambiguation and Treebank PCFGs**

Read chapters 9 and [10.1-10.4] of Jurafsky and Martin; for the CYK algorithm, read chapters 1 and 2 of Charniak's book.

Read also: Tree-bank grammars, Technical Report CS-96-02, Department of Computer Science, Brown University (1996). Here are an abstract and postscript version.

- Probability Estimation and the Maximum-Likelihood Principle (lecture by Dr. Detlef Prescher)
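The tabular (CYK) parsing algorithm from the Charniak reading above can be sketched for a PCFG in Chomsky Normal Form; the toy grammar and sentence below are invented for illustration:

```python
# Probabilistic CYK: best parse probability for a CNF PCFG.
binary = {("S", ("NP", "VP")): 1.0,    # rule -> probability
          ("VP", ("V", "NP")): 1.0}
lexical = {("NP", "she"): 0.4, ("NP", "fish"): 0.6, ("V", "eats"): 1.0}

def cyk(words):
    """Probability of the best parse of words rooted in S (0.0 if none)."""
    n = len(words)
    # chart[i][j][A] = best probability of nonterminal A spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # fill in lexical rules
        for (A, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][A] = p
    for span in range(2, n + 1):                  # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for (A, (B, C)), p in binary.items():
                    if B in chart[i][k] and C in chart[k][j]:
                        cand = p * chart[i][k][B] * chart[k][j][C]
                        if cand > chart[i][j].get(A, 0.0):
                            chart[i][j][A] = cand
    return chart[0][n].get("S", 0.0)

print(cyk(["she", "eats", "fish"]))  # 0.24
```

Keeping backpointers alongside the best probabilities yields the Viterbi parse tree itself, which is the "Viterbi-like disambiguation" of the lecture title.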

- Treebank Parsing with PCFGs (students' presentations: see special page!)

**Transforms on Phrase-Structure for Improved PCFG Parsing** (students' presentations: see special page!)

**Data Oriented Parsing** (students' presentations: see special page!)

**(Reserve lecture:) Information Theory, Communication, Compression and Error Minimization**

**Read** chapters 1 and 2 from Manning & Schütze; read also chapter 1 of Krenn and Samuelsson.

**FINAL PROJECT: COUNTS FOR 40% of the final mark for this subject (COMING SOON)**

THE LAST 20% of the final mark will be based on participation in the discussions in the last two lectures.