INFO-B 443 Natural Language Processing
3 credits
- Prerequisite(s): INFO-B 210 OR CSCI-A 204 OR CSCI-C 200 OR CSCI 23000; Recommended: Statistics (ECON-E 270 or PBHL-B 280 or PBHL-B 300 or PBHL-B 301 or PBHL-B 302 or PSY-B 305 or SPEA-K 300 or STAT-I 301 or STAT-I 350) OR INFO-I 415
- Delivery: On-Campus, Online
- Semesters offered: Fall (Check the schedule to confirm.)
Description
This course introduces the theory and methodology of natural language understanding and generation. Topics include stemming, lemmatization, part-of-speech tagging, parsing, and machine translation. Using specialized libraries, students develop applications for topic modeling, sentiment analysis, and text summarization.
Program Learning Outcomes Supported
- A1: Data Literacy - Distinguish between data, information, and knowledge.
- A2: Data Literacy - Recognize that data can have value and play a key role in society by providing opportunities to expand knowledge, to innovate, and to influence.
- B1: Data Science - Organize, visualize, and analyze large, complex datasets using descriptive statistics and graphs to make decisions.
- B5: Data Science - Identify, assess, and select appropriately among data analytics methods and models for solving real-world problems, weighing their advantages and disadvantages.
- B6: Data Science - Understand data science concepts, techniques, and tools to support big data analytics.
- C4: Information Science - Understand the characteristics of various data types generated and used by a variety of disciplines, subdisciplines, research communities, and government organizations.
Learning Outcomes
- Extract information from text automatically using concepts and methods from natural language processing (NLP) including stemming, n-grams, POS tagging, and parsing.
- Develop speech-based applications that use speech analysis (phonetics, speech recognition, and synthesis).
- Analyze the syntax, semantics, and pragmatics of a statement written in a natural language.
- Develop a conversational agent that uses natural language understanding and generation.
- Apply machine learning algorithms to natural language processing.
- Write scripts and applications in Python to carry out natural language processing using libraries such as NLTK, Gensim, and spaCy.
- Design NLP-based AI systems for question answering, text summarization, and machine translation.
- Evaluate the performance of NLP tools and systems.
Profiles of Learning for Undergraduate Success (PLUS) Alignment
Instructors align their courses with the Profiles of Learning for Undergraduate Success. The profiles provide students various opportunities to deepen disciplinary understanding, participate in engaged learning, and refine what it means to be a well-rounded, well-educated person prepared for lifelong learning and success.
- P2.1 Problem Solver – Thinks critically.
- P2.3 Problem Solver – Analyzes, synthesizes, and evaluates.
- P3.2 Innovator – Creates/designs.
Course Overview
Module 0: Introduction to Course/Getting Started
- Course Basics and Course Navigation
- Course Structure and Schedule
- Accessibility Acknowledgement
- Writing Resources and Student Engagement Roster
- What is Zoom @IU?
- How to Create a Video
Module 1: Introduction to Natural Language Processing
- What is a Natural Language?
- What is Natural Language Processing?
- Language Syntax and Structure
- Applications of Natural Language Processing
Module 2: Python Programming Review
- Intensive review of Python programming
- Introduction to Lambda Functions
- Using Google Colaboratory
Module 3: Python for NLP
- Working with Text Data
- Introduction to Text Processing and Analysis
Module 4: Intro to Text Preprocessing and Wrangling (Part 1)
- Text cleaning and Tokenization
- Removal of special characters
- Case conversion
- Correcting spelling
- Removal of stopwords
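As an illustration of the Module 4 steps, here is a minimal sketch that uses only the Python standard library. The function name `preprocess` and the tiny stopword set are assumptions made for this example, not course materials. Spelling correction is left out because it normally needs a dictionary or a dedicated library.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch;
# libraries such as NLTK ship much larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

def preprocess(text):
    """Lowercase, strip special characters, tokenize, and drop stopwords."""
    text = text.lower()                        # case conversion
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # removal of special characters
    tokens = text.split()                      # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # removal of stopwords

tokens = preprocess("The QUICK brown fox, and the lazy dog!")
print(tokens)
```

The order of the steps matters: lowercasing first lets a lowercase stopword list match "The" as well as "the".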
Module 5: Intro to Text Preprocessing and Wrangling (Part 2)
- Stemming
- Lemmatization
- Introduction to spaCy
Module 6: Text Syntax and Structure
- Part-of-Speech (POS) Tagging
- Shallow Parsing/Chunking
- Dependency and Constituency Parsing
Module 7: Feature Engineering
- Bag of Words model
- Bag of N-Grams model
- TF-IDF model
- Document Similarity
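The Module 7 topics can be sketched in plain Python as follows. This is one possible formulation, not necessarily the one used in the course: the helper names (`ngrams`, `tfidf_vectors`, `cosine`) are hypothetical, and the unsmoothed weighting tf × log(N/df) is an assumption. Libraries such as scikit-learn apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # bag-of-words counts for this document
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["birds", "fly"]]
vecs = tfidf_vectors(docs)
print(ngrams(docs[0], 2))        # bag-of-n-grams features with n = 2
print(cosine(vecs[0], vecs[1]))  # documents sharing terms: between 0 and 1
print(cosine(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Passing the output of `ngrams` to `tfidf_vectors` instead of single tokens gives a TF-IDF weighted bag-of-n-grams representation.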
Module 8: Classification of Text
- What is Text Classification?
- Automated Text Classification
- Classification Models
Module 9: Summarization and Topic Models
- Summarization and Information Extraction
- Topic Modeling with Gensim and scikit-learn
- Automated Document Summarization
Module 10: Text Similarity
- Essential Concepts of Text Similarity
- Analyzing Term Similarity
- Analyzing Document Similarity
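One common way to measure term similarity is Levenshtein edit distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one term into another. The sketch below is offered only as an illustration, on the assumption that edit distance is among the measures covered. The module may use other metrics.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means more similar terms. Dividing by the length of the longer term gives a normalized score between 0 and 1.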
Module 11: Clustering
- Cluster Analysis
- Clustering Applications
Module 12: Semantic Analysis
- Introduction to Semantic Analysis
- WordNet
- NER Tagger
- Analysis of Semantic Representations
Module 13: Sentiment Analysis
Module 14: Final Project Preparation
Module 15: Final Project Presentation and Final Exam
Policies and Procedures
Please be aware of the following linked policies and procedures. Note that individual instructors will have stipulations specific to their courses.