GRA 4164 Text as Data
Curious about how ChatGPT and other Large Language Models actually work? This course brings you one step closer to that understanding by introducing Natural Language Processing applied in a business setting.
Business decisions are becoming more and more dependent on quantitative analysis and data. Data can come in many formats. In today’s data-rich world, traditional data, in which information is represented as numbers in ordered tables, is far outweighed by the large amount of information embedded in textual data sources. Examples include this course description, newspaper articles, social media postings, customer reviews, books, and financial reports.
However, to extract business-relevant information from textual data, we need to know how to turn something potentially very high-dimensional and unstructured into something useful for quantitative analysis. This is called Natural Language Processing (NLP).
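As a rough illustration of what turning unstructured text into something quantitative can look like, the sketch below builds a simple word-count table (a "bag of words") using only the Python standard library. The function names and the two example sentences are hypothetical and chosen for illustration; they are not course material, and real pipelines typically use dedicated libraries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical example documents, not course data.
docs = ["Great product, great price", "Poor service and poor product"]
vocabulary, matrix = document_term_matrix(docs)

for doc, row in zip(docs, matrix):
    print(doc, "->", dict(zip(vocabulary, row)))
```

Each document becomes a row of numbers with one column per distinct word, which is why text data quickly becomes high-dimensional as the vocabulary grows.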
This introductory course in NLP is designed to provide business students with a comprehensive foundation in NLP, enabling them to leverage language-based technologies to enhance communication, customer engagement, decision-making, and innovation within various industries and fields. Themes covered include sentiment analysis, topic modelling, classification and trend analysis.
Through a blend of theoretical concepts and hands-on practical exercises, students will explore the fundamental principles of NLP, its applications, and its impact on modern business practices. Prior exposure to quantitative subjects within programming, statistics and machine learning is an advantage, but not a requirement.
By the end of the course, the student will:
- Have a solid understanding of how NLP can be strategically integrated into business processes, leading to improved efficiencies and competitive advantages.
- Be able to discuss and differentiate between different NLP methodologies, and identify their strengths and weaknesses in relation to different business cases.
By the end of the course, the student will:
- Be proficient in selecting and applying relevant NLP tools and algorithms.
- Have the ability to create empirical models using text as data and apply them strategically to drive business growth, foster innovation, and gain a competitive edge in the digital era.
By the end of the course, the student will:
- Have a broader understanding of how different data sources can be used to create business value.
- Have hands-on and practical training in executing different NLP pipelines in a business case setting.
- Introduction to Natural Language Processing
- Historical context and key milestones in NLP developments
- Text processing, tokenization, and language modelling
- Business case examples
- Data sources
- Text preprocessing
- Cleaning and preparing textual data for analysis
- Techniques for text normalization, stemming, and lemmatization
- Algorithms
- Supervised methods
- Boolean/Dictionary-based
- Regression/classification-based
- Unsupervised methods
- Topic models
- Embeddings
- Supervised methods
- AI and Large Language Models (LLM)
- Building basic NLP pipelines for various business scenarios
- Fine-tuning LLMs for specific tasks
- Case studies and real-world applications
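To give a flavour of the dictionary-based methods listed in the outline above, here is a minimal sketch of a sentiment score computed from word lists. The word lists, the function name, and the example reviews are assumptions made for illustration only; dictionaries used in practice are far larger and domain-specific.

```python
import re

# Hypothetical word lists for illustration only; real dictionaries are far larger.
POSITIVE = {"good", "great", "excellent", "growth"}
NEGATIVE = {"bad", "poor", "loss", "decline"}


def sentiment_score(text):
    """Return (positive matches - negative matches) / token count, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


# Hypothetical example reviews, not course data.
reviews = ["Great service and excellent growth", "Poor results and a bad decline"]
scores = [sentiment_score(review) for review in reviews]

for review, score in zip(reviews, scores):
    print(f"{score:+.2f}  {review}")
```

Dividing by the token count keeps scores comparable across texts of different lengths; a plain count would favour longer documents.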
The course will use a blend of instructional methods, with 2/3 of the time dedicated to lectures and 1/3 to case discussions, videos, presentations and feedback on assignments.
Students are expected to prepare for the lectures by reading assigned materials and to participate actively in the discussion of the lecture topics.
The course requires programming in Python, and most assignments will involve practical work using real-world data. Students must therefore have their own computer. However, dedicated software and hardware relevant for the course and for solving some of the assignments will be provided by BI.
Software
Python: It has a rich ecosystem of libraries and tools for text processing and machine learning.
Jupyter Notebook: An open-source web application that allows the creation and sharing of documents containing live code, equations, visualizations, and narrative text.
All courses in the Master's programme will assume that students have fulfilled the admission requirements for the programme. In addition, courses in the second, third and/or fourth semester can have specific prerequisites and will assume that students have followed normal study progression. For double degree and exchange students, please note that equivalent courses are accepted.
Disclaimer
Deviations in teaching and exams may occur if external conditions or unforeseen events call for this.
Students will benefit from knowing some programming (Python), but it is not mandatory. Similarly, having had courses in mathematics and statistics, and knowledge of linear algebra, is an advantage.
Assessments | |
---|---|
Duration type | Semester(s) |
Exam category | Submission |
Form of assessment | Submission PDF |
Exam/hand-in semester | First Semester |
Weight | 100 |
Grouping | Group/Individual (1 - 3) |
Comment | Submission at the end of the semester consists of 4 previous assignments. During the semester, students will be given the opportunity to present and get feedback on each of the assignments. |
Exam code | GRA 41641 |
Grading scale | ECTS |
Resit | Examination when next scheduled course |
Activity | Duration | Comment |
---|---|---|
Teaching | 24 Hour(s) | |
Seminar groups | 12 Hour(s) | |
Group work / Assignments | 60 Hour(s) | |
Student's own work with learning resources | 64 Hour(s) | |
A course of 1 ECTS credit corresponds to a workload of 26-30 hours. Therefore, a course of 6 ECTS credits corresponds to a workload of at least 160 hours.