Teaching the Elephant to Read
Course Outline
Introduction
An NLP Crash Course
NLP and Big Data: NLTK, Dumbo, and Hadoop
Task 0 - Organizing LARGE bodies of text
Task 1 - Tokenization and Segmentation
Task 2 - Tagging and Stemming
Task 3 - Parsing
Conclusions
If you need help setting up the virtual machine, go here for detailed
setup and installation directions.
The VM's username is "hadoop" and the password is "password".
git clone https://github.com/bbengfort/strata-teaching-the-elephant-to-read.git