Courses

INFO-B 443 Natural Language Processing

3 credits

Prerequisite(s): INFO-B 210 OR CSCI-A 204 OR CSCI-C 200 OR CSCI 23000; Recommended: Statistics (ECON-E 270 or PBHL-B 280 or PBHL-B 300 or PBHL-B 301 or PBHL-B 302 or PSY-B 305 or SPEA-K 300 or STAT-I 301 or STAT-I 350) OR INFO-I 415
Delivery: On-Campus, Online
Semesters offered: Fall (Check the schedule to confirm.)

Description

This course introduces the theory and methodology of natural language understanding and generation. Topics include stemming, lemmatization, part-of-speech tagging, parsing, and machine translation. Using specialized libraries, students develop applications for topic modeling, sentiment analysis, and text summarization.

Program Learning Outcomes Supported

A1: Data Literacy - Distinguish between data, information, and knowledge.
A2: Data Literacy - Recognize that data can have value and play a key role in society by providing opportunities to expand knowledge, to innovate, and to influence.
B1: Data Science - Organize, visualize, and analyze large, complex datasets using descriptive statistics and graphs to make decisions.
B5: Data Science - Identify, assess, and select appropriately among data analytics methods and models for solving real-world problems, weighing their advantages and disadvantages.
B6: Data Science - Understand data science concepts, techniques, and tools to support big data analytics.
C4: Information Science - Understand the characteristics of various data types generated and used by a variety of disciplines, subdisciplines, research communities, and government organizations.

Learning Outcomes

Extract information from text automatically using concepts and methods from natural language processing (NLP) including stemming, n-grams, POS tagging, and parsing.
Develop speech-based applications that use speech analysis (phonetics, speech recognition, and synthesis).
Analyze the syntax, semantics, and pragmatics of a statement written in a natural language.
Develop a conversational agent that uses natural language understanding and generation.
Apply machine learning algorithms to natural language processing.
Write scripts and applications in Python to carry out natural language processing using libraries such as NLTK, Gensim, and spaCy.
Design NLP-based AI systems for question answering, text summarization, and machine translation.
Evaluate the performance of NLP tools and systems.

Profiles of Learning for Undergraduate Success (PLUS) Alignment

Instructors align their courses with the Profiles of Learning for Undergraduate Success. The profiles provide students various opportunities to deepen disciplinary understanding, participate in engaged learning, and refine what it means to be a well-rounded, well-educated person prepared for lifelong learning and success.

P2.1 Problem Solver – Thinks critically.
P2.3 Problem Solver – Analyzes, synthesizes, and evaluates.
P3.2 Innovator – Creates/designs.

Course Overview

Module 0: Introduction to Course / Getting Started

Course Basics and Course Navigation
Course Structure and Schedule
Accessibility Acknowledgement
Writing Resources and Student Engagement Roster
What is Zoom @IU?
How to Create a Video

Module 1: Introduction to Natural Language Processing

What is:
- A Natural Language?
- Natural Language Processing?
Language Syntax and Structure
Applications of Natural Language Processing

Module 2: Python Programming Review

Intensive review of Python programming
Introduction to Lambda Functions
Using Google Colaboratory

Module 3: Python for NLP

Working with Text Data
Introduction to Text Processing and Analysis

Module 4: Intro to Text Preprocessing and Wrangling (Part 1)

Text cleaning and tokenization
Removal of special characters
Case conversion
Spelling correction
Removal of stopwords
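
As an illustration of the Module 4 topics, the following is a minimal sketch of a preprocessing pipeline using only the Python standard library. It is not taken from the course materials: the function name `preprocess` and the tiny `STOPWORDS` set are assumptions made for this example, and spelling correction is left out because it normally relies on a dictionary or a dedicated library.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch;
# real stopword lists, such as those shipped with NLTK, are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, strip special characters, tokenize, and drop stopwords."""
    text = text.lower()                        # case conversion
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # removal of special characters
    tokens = text.split()                      # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # removal of stopwords


print(preprocess("The quick, brown fox is in the garden!"))
# prints ['quick', 'brown', 'fox', 'garden']
```

Whitespace tokenization is the simplest possible choice here; library tokenizers handle contractions, punctuation, and non-ASCII text far more carefully.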

Module 5: Intro to Text Preprocessing and Wrangling (Part 2)

Stemming
Lemmatization
Introduction to spaCy

Module 6: Text Syntax and Structure

Part-of-Speech (POS) Tagging
Shallow Parsing / Chunking
Dependency and Constituency Parsing

Module 7: Feature Engineering

Bag of Words model
Bag of N-Grams model
TF-IDF model
Document Similarity
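
The feature-engineering topics in Module 7 can be sketched in plain Python as follows. This is an illustrative example under stated assumptions, not the course's own code: the helper names (`ngrams`, `tfidf_vectors`, `cosine`) are hypothetical, and the TF-IDF weighting uses raw term counts with an unsmoothed `log(N / df)` inverse document frequency, so the numbers will differ from those produced by libraries such as scikit-learn.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict.

    Assumes raw term frequency and idf = log(N / df), without smoothing.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # bag-of-words counts for this document
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply".split(),
]
vecs = tfidf_vectors(docs)

print(ngrams(docs[0], 2)[:2])   # prints [('the', 'cat'), ('cat', 'sat')]
# The two cat sentences share terms; the finance sentence shares none.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # prints True
```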

Module 8: Classification of Text

What Is Text Classification?
Automated Text Classification
Classification Models

Module 9: Summarization and Topic Models

Summarization and Information Extraction
Topic Modeling
- Gensim
- Scikit-Learn
Automated Document Summarization

Module 10: Text Similarity

Essential Concepts of Text Similarity
Analyzing Term Similarity
Analyzing Document Similarity
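
To make the distinction between term similarity and document similarity concrete, here is a minimal sketch using two common measures. The choice of measures is an assumption for illustration (the course may cover others): Levenshtein edit distance compares two terms character by character, and Jaccard similarity compares two documents as sets of tokens. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(doc_a, doc_b):
    """Share of distinct tokens that two tokenized documents have in common."""
    set_a, set_b = set(doc_a), set(doc_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty documents are identical
    return len(set_a & set_b) / len(set_a | set_b)


print(levenshtein("kitten", "sitting"))                       # prints 3
print(jaccard("the cat sat".split(), "the cat ran".split()))  # prints 0.5
```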

Module 11: Clustering

Cluster Analysis
Clustering Applications

Module 12: Semantic Analysis

Introduction to Semantic Analysis
WordNet
NER Tagger
Analysis of Semantic Representations

Module 13: Sentiment Analysis

Module 14: Final Project Preparation

Module 15: Final Project Presentation and Final Exam

Policies and Procedures

Please be aware of the following linked policies and procedures. Note that instructors may add stipulations specific to their individual courses.

Luddy School of Informatics, Computing, and Engineering
