Statistics is Empirical Evidence
The Rest is Belief



Why Teach Natural Language Processing (NLP) or Computational Linguistics (CL) in Computer Science (CS)?


While CL and NLP have been mainstream CS topics for some time now in many places around the world, some CS colleagues still wonder why CL/NLP squarely belong in the CS curriculum. Those who are unfamiliar with CL/NLP may find suitable motivation in the following arguments.

  1. How else can we build such fantastic computer programs that perform everyday tasks such as text/speech translation or summarization, take spoken/written orders from humans, conduct intelligent search for information in large text collections, etc.? The CL and NLP communities develop theories, models and techniques of such human cognitive skills, implement them in computer systems and empirically evaluate these theories and models against the gold standard of human language use. While this study is multidisciplinary by definition, topics such as statistical modeling of incomplete data also belong in CS (e.g., under machine learning).
  2. The concept of a language is central to Computer Science: formal languages (i.e., sets of strings) constitute the mathematical foundations of computer programs. Crucially, natural language processing can be seen as a means for reviewing how this apparatus is put to the test, and how it can prove insufficient for real-world tasks (e.g., cognitive capacities like visual and language perception, predicting physical phenomena like the weather, or modeling expert behavior such as medical diagnosis). I think that this is instructive for a computer science student!
  3. As formal languages are the paradigm in "traditional" Computer Science, natural languages may constitute a paradigm for real-world tasks in "modern" Computer Science. Uncertainty is the main problem in such complex tasks. In NLP, uncertainty manifests itself at all levels in the form of ambiguity (both in the input to and the output from a system). I think that language ambiguity constitutes a natural, intuitive manifestation of the problem of uncertainty in general. Solutions to ambiguity in NLP are based on probabilistic models, and on the interplay between learning algorithms, statistics and linguistic knowledge, which might be an instructive example of how to devise systems that mimic human behavior, as opposed to traditional, synthetic systems that carry their test of success within themselves.
  4. Linguistic constructs (e.g., grammars) are not too wild for computer scientists. It is exactly the combination of such knowledge-based constructs with probability and statistics that makes NLP so interesting: it offers an excellent example that, despite the "arbitrariness" of linguistic categories, these categories might still be good approximations of prior knowledge, especially when this prior knowledge is incorporated in a probabilistic model and weighted by statistical estimation from actual data (see the sketch after this list).
  5. In short, it is all about models that can predict future events by analogy to events encountered in the past, i.e., it is models that learn from experience (data; see also Machine Learning) that are so important for natural language processing.
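
To make points 3 and 4 a little more concrete, here is a minimal sketch (in Python, with a made-up toy corpus and tag set that are purely illustrative, not taken from any of the readings below) of how linguistic categories and statistics interact: part-of-speech tags play the role of the knowledge-based constructs, and relative-frequency estimation from tagged data supplies the weights that resolve an ambiguous word.

    # Minimal sketch: resolve lexical ambiguity with a probabilistic model.
    # We estimate P(tag | word) by relative frequency from a tiny,
    # hypothetical tagged corpus, then pick the most probable tag.
    from collections import Counter, defaultdict

    # Hypothetical tagged corpus: (word, part-of-speech tag) pairs.
    tagged_corpus = [
        ("the", "DET"), ("saw", "NOUN"), ("is", "VERB"), ("sharp", "ADJ"),
        ("she", "PRON"), ("saw", "VERB"), ("the", "DET"), ("dog", "NOUN"),
        ("he", "PRON"), ("saw", "VERB"), ("a", "DET"), ("bird", "NOUN"),
    ]

    # Count how often each word occurs with each tag.
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1

    def most_probable_tag(word):
        """Return the tag maximizing the relative-frequency estimate P(tag | word)."""
        tag_counts = counts[word]
        total = sum(tag_counts.values())
        if total == 0:
            return None  # unseen word: a real system would need smoothing here
        # argmax over tags of count(word, tag) / count(word)
        return max(tag_counts, key=lambda t: tag_counts[t] / total)

    if __name__ == "__main__":
        # "saw" is ambiguous between NOUN and VERB; the statistics decide.
        print(most_probable_tag("saw"))    # VERB (2 of its 3 occurrences)
        print(most_probable_tag("sharp"))  # ADJ

A real tagger would of course condition on context and smooth the estimates for unseen words; the point of the sketch is only that the ambiguity of "saw" is decided by counts estimated from data, rather than by a hand-written rule.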

Khalil Sima'an


Some readings (Why Probabilistic Models in NLP):