Module-1
Introduction to NLP, Regular Expressions, Regular Expressions in Practical NLP, Word Tokenization, Word Normalization and Stemming, Sentence Segmentation.
Module-2
Defining Minimum Edit Distance, Computing Minimum Edit Distance, Backtrace for Computing Alignments, Minimum Edit Distance in Computational Biology, Weighted Minimum Edit Distance.
Module-3
Introduction to N-grams, Estimating N-gram Probabilities, Evaluation and Perplexity, Generalization and Zeros.
Module-4
Add-One Smoothing, Interpolation, Good-Turing Smoothing, Kneser-Ney Smoothing.
Module-5
The Spelling Correction Task, The Noisy Channel Model of Spelling, Real-Word Spelling Correction.
Module-6
State-of-the-Art Systems, What is Text Classification, Text Classification and Naive Bayes, Formalizing the Naive Bayes Classifier, Naive Bayes: Relationship to Language Modelling, Precision, Recall, and the F-measure, Text Classification Evaluation, Practical Issues in Text Classification.
Module-7
What is Sentiment Analysis, Sentiment Analysis: A Baseline Algorithm, Sentiment Lexicons, Learning Sentiment Lexicons, Generative vs Discriminative Models, Making Features from Text for Discriminative NLP Models.
Module-8
Feature-Based Linear Classifiers, Building a Maxent Model, Generative vs Discriminative Models: The Problem of Overcounting Evidence, Introduction to Information Extraction, Evaluation of Named Entity Recognition, Sequence Models for Named Entity Recognition.