Motivation. General overview of the nature of our data: treebanks and parallel corpora. Hidden variables/regularities in treebanks: derivations (generating rules or nodes at each step) and node labels. In parallel corpora: word alignment (translation lexicon), ITG structures, node labels, generic edit operators on trees, etc. A road map for the course: data and learning structured models. Parsing: treebanks and learning how to parse; the explosion of the parse space and the disambiguation problem. Defining probabilistic models over parse trees; probabilistic grammars.
Manning & Schütze, Section 3.2 and Chapter 11. Chapter 12 includes material for the next lecture.
Alternative: from Jurafsky & Martin (J & M), read Chapters 9, 10 and 12, and, as formal background, Secs. 13.0-13.3.

Khalil Sima'an 2012-10-29