NLTK Book Review Assignment (Ewan Klein - Jul 25, 2007 8:32 am) Your Book Reviews have now been graded, and most of you should be able to see some feedback from me if you navigate back to the Coursework page which describes the assignment. Please let me know if you have any problems. There was some wonderfully critical feedback in what you wrote, so thanks to all of you for doing such an excellent job! As mentioned, this completes the assignments for this course. The final grades will not be delivered via Coursework, but using some additional software specifically for the Linguistics Institute, so we will have to wait a day or two before that comes online. Assignment 4: step 6 (tree_corpus) (Edward Loper - Jul 17, 2007 10:33 pm) A few of you have asked me about what output you should expect in step 6. This is what you'd like to see: Parses generated by your grammar for sentences in sent_corpus: Parses generated by your grammar, but not listed in tree_corpus: (none) Parses listed in tree_corpus, but not generated by your grammar: (none) The first line is telling you that it will list the parses of sentences in sent_corpus; but there aren't any in that corpus at the moment, so it doesn't print any. The second and third part are looking at the parses that you put in tree_corpus, and automatically checking for errors of two types: 1. trees that your grammar produced that it wasn't supposed to 2. trees that your grammar was supposed to, but didn't. In the output above, it's saying it found none of either of these two errors -- so your grammar is working as intended! The idea here is that if you choose to continue developing your grammar, the program can automatically check for errors for you, and let you know if it finds any. As long as the last two lines say (none), that means your grammar isn't making any mistakes. -Edward Assignment 4: What to do if it says "Grammar does not cover some of your input words" (Edward Loper - Jul 17, 2007 2:21 pm) If you are working on assignment 4, and you get the following error message: ValueError: Grammar does not cover some of the input words then it means that your example sentences contain at least one word that's not generated by *any* rule in your grammar. Here are some hints to help you figure out what word(s) might be causing trouble: * Word strings are case sensitive -- so if your sentences include both "The" and "the", then you'll need rules that generate both "The" and "the." I.e., something like: Det -> 'the' | 'The' | 'a' | 'A' | 'an' | 'An' Alternatively, you can just write all words in sent_corpus using lower case (i.e., don't capitalize the first letter in each sentence.) * If the sentences you put in sent_corpus contain punctuation, then your grammar will have to generate that punctuation. For this exercise, it's probably best to just leave punctuation out. (I.e., don't put a period at the end of each sentence) * Words in the grammar rules should be surrounded by quotes. I.e., your lexical rules should look like this: Det -> 'the' | 'a' | 'an' and *NOT* like this: Det -> the | a | an (If you write it this way, nltk will think that "the" and "a" and "an" are constituent labels, just like Det and NP.) * If your sentences include morphological variants of a word (e.g., singular & plural), then your grammar will need to generate each variant. E.g., if you want to include both "cat" and "cats," then you'll need rules that generate each of these -- it's not enough to just include "cat." If you have any other questions about the assignment, feel free to email me. -Edward Assignment 4 is now available (Edward Loper - Jul 16, 2007 5:25 pm) Assignment 4 is now available under "Assignments" on the Coursework page. In this assignment, you will explore grammar development. This assignment should require minimal programming skill -- if you have any trouble running the code, please let us know. The assignment is due Wednesday night (11:59pm). Assignments 2 and 3 (Steven Bird - Jul 15, 2007 3:22 pm) * Grades for assignment 2 are now available via the coursework system (some comments were also provided but I'm not sure if they will come through to you). * Assignment 3 is due today; please remember to name your submission surname.py, and include your full name inside the file. * Please submit a Python program, not an IDLE session or an MS Word file. Course notes and assignments (Steven Bird - Jul 14, 2007 12:21 pm) * Notes for the remainder of the course are now available in the bookstore, for those of you who find it convenient to have hardcopy to work from (contents: Part II, selections from Part III, index, and bibliography of the NLTK book). * Assignment 3 was given out in the last class; please access it via the "Assignments" section of the coursework site; it is due 10pm tomorrow (Sunday). * Grades for assignment 2 will be available tomorrow. Assignment 2 and Extra Class (Steven Bird - Jul 10, 2007 4:58 pm) * Assignment 2 was announced in yesterday's class; please access it via the "Assignments" section of the coursework site; it is due 10pm tomorrow (Wednesday). * There will be an extra session at 1:30pm tomorrow in Cordura 100, to help you learn Python; it will be driven by your questions, and no new material will be covered; come along and work on your assignment! * Remember that the course notes (part I of the book) contain detailed explanations and worked examples, and you'll find the assignments much easier if you work through these and try some of the exercises. * Hopefully you'll soon be past the steepest part of the learning curve and enjoying the fun of Python programming! Other online introductory Python textbooks (Steven Bird - Jul 9, 2007 1:49 pm) There are many useful resources for learning Python available on the web. Here are two that we recommend for people who are new to programming: * Non-Programmer's Tutorial for Python: http://en.wikibooks.org/wiki/Non-Programmer's_Tutorial_for_Python/Contents * How to Think Like a Computer Scientist: Learning with Python: http://www.ibiblio.org/obp/thinkCSpy/ Assignments and Office Hours (Steven Bird - Jul 9, 2007 12:37 pm) * assignment 1 has been graded and you should be able to access your grade via the Gradebook link; we will review the assignment in today's class; * assignment 2 and the NLTK Book review assignment are available via the Assignments link; we will work on assignment 2 in today's class; * office hours this week are Monday directly after class (3:15-4:15), and Tuesday 1:30-3:15pm; * we will try to arrange an extra class on Wednesday afternoon for those who are still learning basic Python programming, time and venue to be announced. Enrolling in this course (Steven Bird - Jul 7, 2007 12:11 pm) It is now possible to enroll in this course, but only until Sunday. Please use the regular enrollment process. The course presumes basic knowledge of Python, gained from reading this chapter and completing a selection of the exercises (http://nltk.org/doc/en/programming.pdf). The first class was about tokenization and you can read up on what we covered (http://nltk.org/doc/en/words.pdf). If you're enrolling the course for credit you need to submit the first assignment this weekend; for details please see our earlier announcement. If you're already enrolled, please let others know that enrollments are still open. Also, if you haven't yet submitted the first assignment please do so this weekend, even if you were unable to complete it. Getting help (Steven Bird - Jul 6, 2007 8:30 am) Discussion forum: The chatroom is useful for spontaneous interaction, but please use the NLTK-Users mailing list for more detailed discussions. Assessment: If you're struggling with the first assignment, ask for help in the chatroom or mailing list, and plan to submit whatever you've been able to do. This will help us see any common difficulties you're having. Office hours: These will continue to be posted on the coursework website, other times by appointment. Note that our office is Cordura 222, not 212 as advertised in some documentation. Laptops in class: If its inconvenient to carry your laptop to class, why not share one between two -- it might even be a more effective way to learn (cf "pair programming"). Office Hours (Steven Bird - Jul 5, 2007 5:57 pm) Office hours this week will be 1:30-3:15 Friday in Cordura 222, or else by appointment. Materials from today's class (Steven Bird - Jul 5, 2007 4:54 pm) Presentation slides and a Python log from today's class are now available from the LSA-325 website at http://coursework.stanford.edu/ Assignment 1 (Steven Bird - Jul 5, 2007 12:53 pm) Write a Python program to read the texts of the Presidential Inaugural Address Corpus (corpus.inaugural) and use a frequency distribution (nltk.FreqDist) to count the number of occurrences of 'man', 'men', 'he' vs 'woman', 'women', 'she'. Your program should output lines of the form: , for example: 3 0 1977-Carter 11 2 1981-Reagan 10 2 1985-Reagan Submit your program via your personal drop-box on the coursework website, by the end of Friday 6 July. Please post any questions to the NLTK Chatroom. Installing NLTK on your laptop (Ewan Klein - Jul 4, 2007 2:37 pm) Intro to Computational Lingustics will be based on NLTK-Lite 0.8. If you have not yet installed this release on your laptop, please go to the NLTK Installation page and follow the instructions there. Don't forget to install the corpora zip files as well. course materials (Steven Bird - Jul 4, 2007 10:37 am) Welcome to Intro to Computational Linguistics. Our initial coursework materials will be chapters 1, 3, 4, 5 of the NLTK book, available in hardcopy in the Stanford bookshop (under LSA 110P). If you're unfamiliar with Python please read chapter 2 and complete as many of the exercises as possible. See you soon!