This document describes a variety of possible natural language processing projects that can be undertaken using NLTK.
The NLTK team welcomes contributions of good student projects, and some past projects (e.g. the Brill and HMM taggers) have been incorporated into the toolkit.
This section describes the project assessment requirements for 433-460 Human Language Technology at the University of Melbourne. Project assessment has three components: an oral presentation (5%), a written report (10%), and an implementation (20%).
Students will give a 10-minute oral presentation to the rest of the class in the second-last week of semester. This will be evaluated for the quality of content and presentation.
Students should submit a ~5-page written report, with approximately one page covering each of the following points:
The report should be prepared using the Python docutils and doctest packages. These are easily learnt and ideally suited to creating reports with embedded program code, and they have been used for all NLTK-Lite documentation. For a detailed example, see the text source for the NLTK tagging chapter.
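As a rough illustration of how doctest can check the code embedded in a report, the sketch below runs the interactive examples found in a small reStructuredText fragment. The fragment, the `tokenize` function, and the `check_report` helper are all hypothetical and invented for this example; only the `doctest` calls come from the Python standard library.

```python
import doctest

# Hypothetical report fragment in reStructuredText (illustrative only).
# Lines starting with ">>>" are examples that doctest will execute.
REPORT = '''
Tokenization
------------

The system splits text on whitespace before tagging:

    >>> def tokenize(text):
    ...     return text.split()
    >>> tokenize("the cat sat")
    ['the', 'cat', 'sat']
    >>> len(tokenize("a b c d"))
    4
'''

def check_report(source, name="report"):
    """Run every doctest example embedded in `source`.

    Returns a (failed, attempted) pair of example counts.
    """
    parser = doctest.DocTestParser()
    # Empty globals dict: the examples must define everything they use.
    test = parser.get_doctest(source, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted

failed, attempted = check_report(REPORT)
print(f"{attempted} examples run, {failed} failed")
```

For a report kept in its own file, the standard library's `doctest.testfile` performs the same check directly on the file, so the examples in the report are verified each time it is revised.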
Marks will be awarded for the basic implementation and for various kinds of complexity, as described below:
- we are able to run the system
- we can easily test the system (interface is usable, output is appropriately detailed and clearly formatted)
- we can easily work out how the system is implemented (understandable code, inline documentation; you can assume we read the report first)
- the system implements NLP algorithms (i.e. relevant to the subject, re-using existing NLP algorithms wherever possible instead of reinventing the wheel)
- the NLP algorithms are correctly implemented
- the system addresses a non-trivial problem
- the system combines multiple HLT components as appropriate
- appropriate training data is used (effort in obtaining and preparing the data will be considered)
- the system permits exploration of the problem domain and the algorithms (e.g. through appropriate parameterization)
- a range of system configurations/modifications are explored (e.g. classifiers trained and tested using different parameters)
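The last two criteria, parameterization and exploring a range of configurations, can be sketched as a small parameter sweep. The example below is a minimal illustration using only the standard library rather than NLTK itself: the toy data, the frequency-cutoff unigram tagger, and the names `train_unigram`, `accuracy`, and `sweep` are all assumptions made for this sketch, and a real project would use a proper corpus and held-out test set.

```python
from collections import Counter, defaultdict

# Toy tagged data, invented for illustration only.
TRAIN = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("the", "DT"),
         ("cat", "NN"), ("sleeps", "VBZ"), ("a", "DT"), ("dog", "NN"),
         ("runs", "VBZ")]
TEST = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("a", "DT"),
        ("bird", "NN")]

def train_unigram(tagged, min_count):
    """Map each word to its most frequent tag, keeping only words whose
    most frequent tag was seen at least `min_count` times."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    model = {}
    for word, tag_counts in counts.items():
        tag, n = tag_counts.most_common(1)[0]
        if n >= min_count:
            model[word] = tag
    return model

def accuracy(model, tagged, default_tag):
    """Fraction of tokens tagged correctly; unknown words get `default_tag`."""
    correct = sum(model.get(word, default_tag) == tag for word, tag in tagged)
    return correct / len(tagged)

def sweep(train, test, min_counts, default_tags):
    """Train and evaluate one tagger per (min_count, default_tag) setting."""
    results = {}
    for min_count in min_counts:
        model = train_unigram(train, min_count)
        for default_tag in default_tags:
            results[(min_count, default_tag)] = accuracy(model, test, default_tag)
    return results

results = sweep(TRAIN, TEST, min_counts=(1, 2, 3), default_tags=("NN", "DT"))
for (min_count, default_tag), acc in sorted(results.items()):
    print(f"min_count={min_count} default={default_tag} accuracy={acc:.2f}")
```

Exposing the cutoff and the default tag as arguments, rather than hard-coding them, is what makes this kind of comparison cheap to run and to report.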
About this document...
This chapter is a draft from Natural Language Processing [http://nltk.org/book.html], by Steven Bird, Ewan Klein and Edward Loper, Copyright © 2008 the authors. It is distributed with the Natural Language Toolkit [http://nltk.org/], Version 0.9.5, under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License [http://creativecommons.org/licenses/by-nc-nd/3.0/us/].
This document is