Next: Lecture schedule
Up: Elements of Language Processing
Previous: Description of course
Reading material
This is an extended reading list. The exact reading for each lecture is given with
that lecture.
- Primary books (the first book is the main source; the second is a reference for basic notions):
- Chris Manning and Hinrich Schütze.
"Foundations of Statistical Natural Language Processing".
MIT Press, Cambridge, MA, May 1999. (See http://nlp.stanford.edu/fsnlp/)
- Daniel Jurafsky and James H. Martin. "Speech and Language Processing:
An Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition". Prentice-Hall, 2000. The chapters we cite are from the 2000
edition. If you have the new edition, note that it differs from the 2000 edition: some material
appears only in the new edition, and we will state this explicitly.
- Eugene Charniak. "Statistical Language Learning". MIT Press, Cambridge, MA, 1993.
- Other useful references:
- Tom Mitchell. "Machine Learning", McGraw-Hill Series in Computer Science, 1997.
- Simple Good-Turing and Good-Turing Smoothing Without Tears (William Gale paper)
http://www.grsampson.net/AGtf.html
- Joshua Goodman and Stanley Chen. "An empirical study of smoothing techniques for language
modeling". Technical report TR-10-98, Harvard University, August 1998.
See http://research.microsoft.com/~joshuago/. A correction of a small error in the
statement of the Katz formula in this report is also available.
- Why Probabilistic Models in NLP and Computational Linguistics?
- Fernando Pereira. Formal grammar and information theory: Together again?
Philosophical Transactions of the Royal Society, 358(1769):1239-1253, April 2000.
http://www.cis.upenn.edu/
- K. Sima'an. Empirical validity and technological viability: Probabilistic models of Natural
Language Processing. In R. Bernardi and M. Moortgat (eds.), Linguistic Corpora and Logic
Based Grammar Formalisms, CoLogNET Area 6, 2003.
http://staff.science.uva.nl/~simaan/D-Papers/colognet02.ps
- Statistical Parsing:
- Black et al. 1992. Towards history-based grammars: using richer models for probabilistic parsing.
http://www.aclweb.org/anthology/H/H92/H92-1026.pdf
- Michael Collins. Three Generative, Lexicalized Models of Statistical Parsing. Proceedings of ACL'97
http://staff.science.uva.nl/~simaan/Collins97.ps
- Eugene Charniak. Tree-bank grammars, Technical Report CS-96-02, Department of Computer Science, Brown University (1996).
Abstract and PostScript version: ftp://ftp.cs.brown.edu/pub/techreports/96/cs96-02.ps.Z
- Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics...
http://www.cog.brown.edu/
- Jelinek, Lafferty, Magerman, Mercer, Ratnaparkhi, Roukos: Decision Tree Parsing using a Hidden Derivation Model.
HLT 1994. http://acl.ldc.upenn.edu/H/H94/H94-1052.pdf
- Frederick Jelinek, John D. Lafferty: Computation of the Probability of Initial Substring Generation
by Stochastic Context-Free Grammars. Computational Linguistics 17(3): 315-323 (1991).
http://portal.acm.org/ft_gateway.cfm?id=971768&type=pdf&coll=GUIDE&dl=GUIDE,&CFID=108090232&CFTOKEN=26940591.
- Data Oriented Parsing:
- R. Bod, R. Scha and K. Sima'an (editors), Data Oriented Parsing, CSLI Publications, 2003 (book).
- Remko Scha: "Taaltheorie en taaltechnologie; competence en performance."
In: R. de Kort and G.L.J. Leerdam (eds.): Computertoepassingen in de Neerlandistiek.
Almere: LVVN, 1990, pp. 7-22. [Translated into English as:
"Language Theory and Language Technology; Competence and Performance."]
http://cf.hum.uva.nl/computerlinguistiek/scha/IAAA/rs/Leerdam.html
- Statistical Machine Translation:
- The mathematics of statistical machine translation: parameter estimation.
Brown, Della Pietra, Della Pietra and Mercer. Highly recommended!
http://portal.acm.org/ft_gateway.cfm?id=972474&type=pdf&CFID=116692157&CFTOKEN=39247122.
- A statistical approach to machine translation.
Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek,
John D. Lafferty, Robert L. Mercer, and Paul S. Roossin.
Computational Linguistics, 16(2):79-85, June 1990.
http://portal.acm.org/ft_gateway.cfm?id=92860&type=pdf&coll=GUIDE&dl=GUIDE&CFID=53991625&CFTOKEN=77629815
- Och and Ney: A Systematic Comparison of Various Statistical Alignment Models.
http://staff.science.uva.nl/~deoskar/SSLPhomepage/och-ney.pdf
- Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002).
"BLEU: a method for automatic evaluation of machine translation". In ACL-2002:
40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.9416&rep=rep1&type=pdf
- P. Koehn, F.J. Och, and D. Marcu (2003). Statistical phrase based translation.
In Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of
the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).
www.isi.edu/~marcu/papers/phrases-hlt2003.pdf
- Christoph Tillmann: A Unigram Orientation Model for Statistical Machine Translation
http://staff.science.uva.nl/~deoskar/SSLPhomepage/p101-tillmann.pdf
- Och and Ney 2004. The Alignment Template Approach to Statistical Machine Translation.
Computational Linguistics, 30(4), December 2004.
- Furthermore, the NLTK-Lite toolkit provides an excellent suite of tools for exploration:
http://nltk.sourceforge.net/lite/doc/en/
Khalil Sima'an
2012-10-29