GRA 4164 Text as Data

Course code:

GRA 4164

Department:

Economics

Credits:

6

Course coordinator:

Leif Anders Thorsrud

Vegard Høghaug Larsen

Course name in Norwegian:

Text as Data

Product category:

Master

Portfolio:

MSc in Data Science for Business

Semester:

2024 Autumn

Active status:

Active

Level of study:

Master

Teaching language:

English

Course type:

One semester

Introduction

Curious about how ChatGPT and other Large Language Models actually work? This course brings you one step closer to that understanding by introducing Natural Language Processing applied in a business setting.

Business decisions increasingly depend on quantitative analysis and data. Data can come in many formats. In today’s data-rich world, traditional data, where information is represented as numbers in ordered tables, is far outweighed by the information embedded in textual data sources. Examples include this course description, newspaper articles, social media posts, customer reviews, books, and financial reports.

However, to extract business-relevant information from textual data, we need to know how to turn something potentially very high-dimensional and unstructured into something useful for quantitative analysis. This is called Natural Language Processing (NLP).
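
To make the idea of turning unstructured text into numbers concrete, the sketch below builds a simple bag-of-words document-term matrix, where each row is a document and each column counts one word. It is a minimal illustration, not course material: the function names and the two sample reviews are invented for this example, and only the Python standard library is used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical customer reviews, invented for illustration.
reviews = [
    "Great product, great price",
    "Poor service and poor delivery",
]
vocabulary, matrix = document_term_matrix(reviews)
print(vocabulary)
for row in matrix:
    print(row)
```

Even with two short reviews the matrix has seven columns, one per distinct word. With a realistic corpus the vocabulary grows to many thousands of columns, which is why textual data is described as high-dimensional.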

This introductory course in NLP is designed to provide business students with a comprehensive foundation in NLP, enabling them to leverage language-based technologies to enhance communication, customer engagement, decision-making, and innovation within various industries and fields. Themes covered include sentiment analysis, topic modelling, classification and trend analysis.

Through a blend of theoretical concepts and hands-on practical exercises, students will explore the fundamental principles of NLP, its applications, and its impact on modern business practices. Prior exposure to quantitative subjects such as programming, statistics and machine learning is an advantage, but not a requirement.

Learning outcomes - Knowledge

By the end of the course, the student will:

Have a solid understanding of how NLP can be strategically integrated into business processes, leading to improved efficiencies and competitive advantages.
Be able to discuss and differentiate between different NLP methodologies, and identify their strengths and weaknesses in relation to different business cases.

Learning outcomes - Skills

By the end of the course, the student will:

Be proficient in selecting and applying relevant NLP tools and algorithms.
Have the ability to create empirical models using text as data and apply them strategically to drive business growth, foster innovation, and gain a competitive edge in the digital era.

General Competence

By the end of the course, the student will:

Have a broader understanding of how different data sources can be used to create business value.
Have hands-on and practical training in executing different NLP pipelines in a business case setting.

Course content

Introduction to Natural Language Processing
- Historical context and key milestones in NLP developments
- Text processing, tokenization, and language modelling
- Business case examples
- Data sources
Text preprocessing
- Cleaning and preparing textual data for analysis
- Techniques for text normalization, stemming, and lemmatization
Algorithms
- Supervised methods
  - Boolean/Dictionary-based
  - Regression/classification-based
- Unsupervised methods
  - Topic models
  - Embeddings
AI and Large Language Models (LLMs)
- Building basic NLP pipelines for various business scenarios
- Fine-tuning LLMs for specific tasks
Case studies and real-world applications
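
As a rough illustration of how the preprocessing and dictionary-based items in the outline above fit together, the sketch below normalizes a text, applies a toy suffix-stripping stemmer, and scores sentiment against two small word lists. Everything here is an assumption made for the example: the word lists, the function names and the stemming rule are invented, and the stemmer is a crude stand-in for established stemmers and lemmatizers rather than the method taught in the course.

```python
import re

# Tiny hand-made dictionaries of word stems, invented for illustration.
# Real applications rely on much larger, domain-specific word lists.
POSITIVE = {"good", "great", "strong", "improv"}
NEGATIVE = {"bad", "poor", "weak", "declin"}


def normalize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip one common English suffix (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    stems = [crude_stem(token) for token in normalize(text)]
    positive = sum(stem in POSITIVE for stem in stems)
    negative = sum(stem in NEGATIVE for stem in stems)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


# Hypothetical sentences, invented for illustration.
print(sentiment_score("Sales improved and margins are strong"))
print(sentiment_score("Weak demand and declining revenue"))
```

The score lies between -1 and 1. A dictionary approach like this is transparent and needs no training data, but it ignores context such as negation, which is one reason the outline also lists regression-based, classification-based and embedding-based methods.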

Teaching and learning activities

The course will use a blend of instructional methods, with 2/3 of the time dedicated to lectures and 1/3 to case discussions, videos, presentations and feedback on assignments.

Students are expected to prepare for the lectures by reading the assigned materials and to participate actively in the discussion of the lecture topics.

The course requires the use of Python programming, and most assignments will involve practical work using real-world data. Students must therefore have their own computer. However, dedicated software and hardware relevant for the course and for solving some of the assignments will be provided by BI.

Software
Python: It has a rich ecosystem of libraries and tools for text processing and machine learning.

Jupyter Notebook: An open-source web application that allows the creation and sharing of documents containing live code, equations, visualizations, and narrative text.

Software tools

Software defined under the section "Teaching and learning activities".

Additional information

Qualifications

All courses in the Master's programme will assume that students have fulfilled the admission requirements for the programme. In addition, courses in the second, third and/or fourth semester can have specific prerequisites and will assume that students have followed normal study progression. For double degree and exchange students, please note that equivalent courses are accepted.

Disclaimer

Deviations in teaching and exams may occur if external conditions or unforeseen events call for this.

Required prerequisite knowledge

Students will benefit from knowing some programming (Python), but it is not mandatory. Similarly, having had courses in mathematics and statistics, and knowledge of linear algebra, is an advantage.

Assessments

Duration type:

Semester(s)

Exam category:

Submission

Form of assessment:

Submission PDF

Exam/hand-in semester:

First Semester

Weight:

100

Grouping:

Group/Individual (1 - 3)

Comment:

The submission at the end of the semester consists of the four assignments given during the semester. During the semester, students will be given the opportunity to present and get feedback on each of the assignments.

Exam code:

GRA 41641

Grading scale:

ECTS

Resit:

Examination at the next scheduled offering of the course

Type of Assessment:

Ordinary examination

Total weight:

100

Student workload

Activity	Duration	Comment
Teaching	24 Hour(s)
Seminar groups	12 Hour(s)
Group work / Assignments	60 Hour(s)
Student's own work with learning resources	64 Hour(s)

Sum workload:

160

A course of 1 ECTS credit corresponds to a workload of 26-30 hours. Therefore, a course of 6 ECTS credits corresponds to a workload of at least 160 hours.

