Computational morphosyntax (5r)

Code: 570519
Language: English
Year: 2020/2021
Course: Master courses
Speciality: Linguistics
Credits: 5.00
Start / End: 12/01/2021 - 16/03/2021
Semester: 2nd
Schedule: Tuesdays, 17:30 - 20:30
Location: Online. Please email the instructor for the link.
Goals: 

This course has two objectives. On the theoretical side, students learn the main techniques used in the computational treatment of morphology and syntax (i.e., the treatment of words and of the strings of words that form sentences). On the practical side, students become acquainted with the basics of Python as a scripting language and learn how to handle the text files to be processed. During the course, students will study, write and use text tokenisers, morphosyntactic taggers and syntactic processors.

Requirement: if you are new to programming or to Python, you must acquire some knowledge of the Python programming language (Python 3) before the course starts. Contact your tutor or the course instructor for details on suitable tutorials.

Structure and Contents: 
1. Regular expressions and finite-state automata. Basics of Python
2. Text tokenisation and minimum edit distance. Regular expressions for tokenisation
3. Language models based on n-grams. Extracting information from tokenised texts
4. Morphosyntactic tagging. Training taggers
5. Formal grammars. Tagging texts
6. Parsing. Extracting information from tagged texts
7. Features and unification. Implementing a unification grammar fragment for agreement
8. Statistical parsing. Implementing a unification grammar fragment for PP-attachment and wh-constructions
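As an informal illustration of the kind of script involved in items 2 and 3, the following is a minimal sketch of regular-expression tokenisation followed by bigram counting. It uses only the Python 3 standard library. The pattern and the names `TOKEN_PATTERN`, `tokenise` and `bigram_counts` are assumptions made for this example, not course material; the practical assignments themselves are based on NLTK.

```python
import re
from collections import Counter

# Hypothetical pattern, for illustration only: a token is either a word
# (optionally with internal apostrophes or hyphens) or a single
# punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenise(text):
    """Lower-case the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bigram_counts(tokens):
    """Count each pair of adjacent tokens (the raw counts behind a bigram model)."""
    return Counter(zip(tokens, tokens[1:]))


tokens = tokenise("The cat sat. The cat slept.")
counts = bigram_counts(tokens)
print(tokens)
print(counts.most_common(3))
```

Dividing a bigram count by the count of its first token would give the unsmoothed conditional probability estimate used in a simple bigram language model.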
Methodology: 
1. Reading and discussion of the relevant chapters of the reference book (Jurafsky & Martin, 2009)
2. Practical assignments based on NLTK (Bird, Klein & Loper, 2014)
3. Final essay
Assessment: 
Practical exercises: 20%
Participation in class discussions: 10%
Final essay / final exam: 70%
If the final mark for the course is between 3 and 4.9, re-evaluation is possible on the basis of revised practical exercises and a new final essay.
Bibliography: 
  • Basic items:
  • Jurafsky, Daniel & Martin, James H. (2009), Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice Hall. (A new edition is currently being prepared.)
  • Bird, Steven; Klein, Ewan & Loper, Edward (2014), Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. http://www.nltk.org/book/ (An earlier printed version of this book, based on Python 2, was published by O’Reilly in 2009.)
  • Jurafsky, Daniel & Manning, Christopher D. (2014), Natural Language Processing lectures, YouTube. https://www.youtube.com/playlist?list=PLQiyVNMpDLKnZYBTUOlSI9mi9wAErFtFm...
  • Python tutorials: there are plenty of Python tutorials on the web. Two that may be useful: https://www.pythonprogramming.net/python-fundamental-tutorials/ (for beginners) and http://www.diveintopython3.net/ (advanced content).
  • Other recommended readings:
  • Manning, Christopher D. & Schütze, Hinrich (1999), Foundations of Statistical Natural Language Processing. The MIT Press.