GRA 4164 Text as Data

Course code: 
GRA 4164
Department: 
Economics
Credits: 
6
Course coordinator: 
Leif Anders Thorsrud
Vegard Høghaug Larsen
Course name in Norwegian: 
Text as Data
Product category: 
Master
Portfolio: 
MSc in Data Science for Business
Semester: 
2024 Autumn
Active status: 
Active
Level of study: 
Master
Teaching language: 
English
Course type: 
One semester
Introduction

Curious about how ChatGPT and other Large Language Models actually work? This course brings you one step closer to that understanding by introducing Natural Language Processing applied in a business setting.

Business decisions increasingly depend on quantitative analysis and data, and data can come in many formats. In today's data-rich world, traditional data, where information is represented as numbers in ordered tables, is far outweighed by the information embedded in textual data sources. Examples include this course description, newspaper articles, social media postings, customer reviews, books, and financial reports.

However, to extract business-relevant information from textual data, we need to know how to turn something potentially very high-dimensional and unstructured into something useful for quantitative analysis. The methods for doing this belong to Natural Language Processing (NLP).
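As a rough illustration of turning unstructured text into numbers, the sketch below builds a small document-term matrix with one row per document and one column per word. It is a minimal example using only the Python standard library; the function names and the two customer reviews are hypothetical and invented for illustration, and they are not taken from the course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted set of all distinct tokens across documents.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical customer reviews, invented for illustration only.
reviews = [
    "Great product, great price",
    "Poor service and slow delivery",
]
vocabulary, matrix = document_term_matrix(reviews)
```

Each review becomes a row of word counts that standard statistical tools can work with. The number of columns grows with the vocabulary, which is why textual data quickly becomes high dimensional.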

This introductory course in NLP is designed to provide business students with a comprehensive foundation in NLP, enabling them to leverage language-based technologies to enhance communication, customer engagement, decision-making, and innovation within various industries and fields. Themes covered include sentiment analysis, topic modelling, classification and trend analysis. 

Through a blend of theoretical concepts and hands-on practical exercises, students will explore the fundamental principles of NLP, its applications, and its impact on modern business practices. Prior exposure to programming, statistics and machine learning is an advantage, but not a requirement.

Learning outcomes - Knowledge

By the end of the course, the student will:

  • Have a solid understanding of how NLP can be strategically integrated into business processes, leading to improved efficiencies and competitive advantages.
  • Be able to discuss and differentiate between different NLP methodologies, and identify their strengths and weaknesses in relation to different business cases.
Learning outcomes - Skills

By the end of the course, the student will:

  • Be proficient in selecting and applying relevant NLP tools and algorithms.
  • Have the ability to create empirical models using text as data and apply them strategically to drive business growth, foster innovation, and gain a competitive edge in the digital era.
General Competence

By the end of the course, the student will:

  • Have a broader understanding of how different data sources can be used to create business value.
  • Have hands-on and practical training in executing different NLP pipelines in a business case setting.
Course content
  • Introduction to Natural Language Processing
    • Historical context and key milestones in NLP developments
    • Text processing, tokenization, and language modelling
    • Business case examples
    • Data sources
  • Text preprocessing
    • Cleaning and preparing textual data for analysis
    • Techniques for text normalization, stemming, and lemmatization
  • Algorithms
    • Supervised methods
      • Boolean/Dictionary-based
      • Regression/classification-based
    • Unsupervised methods
      • Topic models
      • Embeddings
  • AI and Large Language Models (LLM)
    • Building basic NLP pipelines for various business scenarios
    • Fine-tuning LLMs for specific tasks
  • Case studies and real-world applications
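To give a flavour of the dictionary-based methods listed above, here is a minimal sketch of a sentiment score computed from word lists. The word lists, the function name, and the scoring rule (positive hits minus negative hits, divided by the number of tokens) are assumptions made for illustration. Real sentiment lexicons are far larger, and the course may use different tools.

```python
import re

# Hypothetical word lists, invented for illustration; real lexicons are much larger.
POSITIVE = {"great", "good", "fast", "excellent"}
NEGATIVE = {"poor", "bad", "slow", "broken"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)
```

For example, `sentiment_score("Great product, fast delivery")` counts two positive words among four tokens. A simple rule like this ignores negation and context, which is one reason regression-based and embedding-based methods are also on the list.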
Teaching and learning activities

The course will use a blend of instructional methods, with 2/3 of the time dedicated to lectures and 1/3 to case discussions, videos, presentations and feedback on assignments.

Students are expected to prepare for the lectures by reading assigned materials and to participate actively in the discussion of the lecture topics.

The course requires Python programming, and most assignments will involve practical work using real-world data. Students must therefore have their own computer. However, dedicated software and hardware relevant for the course and for solving some of the assignments will be provided by BI.

 

Software
Python: It has a rich ecosystem of libraries and tools for text processing and machine learning.

Jupyter Notebook: An open-source web application that allows the creation and sharing of documents containing live code, equations, visualizations, and narrative text.

Software tools
Software defined under the section "Teaching and learning activities".
Additional information

-

Qualifications

All courses in the Master's programme will assume that students have fulfilled the admission requirements for the programme. In addition, courses in the second, third and/or fourth semester can have specific prerequisites and will assume that students have followed normal study progression. For double degree and exchange students, please note that equivalent courses are accepted.

Disclaimer

Deviations in teaching and exams may occur if external conditions or unforeseen events call for this.

Required prerequisite knowledge

Students will benefit from knowing some programming (Python), but it is not mandatory. Similarly, having had courses in mathematics and statistics, and knowledge of linear algebra, is an advantage.

Assessments
Duration type : 
Semester(s)
Exam category: 
Submission
Form of assessment: 
Submission PDF
Exam/hand-in semester: 
First Semester
Weight: 
100
Grouping: 
Group/Individual (1 - 2)
Comment: 
The submission at the end of the semester consists of four assignments completed during the semester. During the semester, students will be given the opportunity to present and get feedback on each of the assignments.
Exam code: 
GRA 41641
Grading scale: 
ECTS
Resit: 
Examination when next scheduled course
Type of Assessment: 
Ordinary examination
Total weight: 
100
Student workload
Activity | Duration | Comment
Teaching | 24 Hour(s) |
Seminar groups | 12 Hour(s) |
Group work / Assignments | 60 Hour(s) |
Student's own work with learning resources | 64 Hour(s) |
Sum workload: 160

A course of 1 ECTS credit corresponds to a workload of 26-30 hours. Therefore, a course of 6 ECTS credits corresponds to a workload of at least 160 hours.