EBA 3500 Data Analysis with Programming
This course introduces the basics of statistics and machine learning in the context of Python. It covers:
- Inferential statistics, such as the bootstrap, p-values, and confidence intervals.
- Methods for constructing and evaluating statistical estimators.
- The fundamentals of the two most important regression models: linear regression and logistic regression.
Additionally, the course introduces students to the Python packages NumPy, SciPy, and statsmodels.
Upon completion of this course, students will be able to:
- Understand, explain, and use fundamental statistical concepts such as:
  - Population values and estimators.
  - Construction of estimators using maximum likelihood.
  - Unbiased estimators.
  - Mean squared error.
  - Statistical tests.
  - Type I and type II errors.
  - p-values.
  - Confidence intervals.
  - Basic bootstrap.
  - Rudimentary exploratory data analysis.
  - Data summaries such as the mean, standard deviation, and kurtosis.
- Understand the key differences between statistics and machine learning.
- Understand how to evaluate the performance of machine learning models.
- Understand and explain the fundamentals of linear regression modeling and logistic regression.
- Understand and explain the role of ANOVA and omnibus tests in science.
Upon completion of this course, students will be able to:
- Perform vectorized computations in NumPy.
- Run statistical simulations using NumPy and SciPy.
- Perform reasonable exploratory data analysis in Python, including constructing graphs.
- Create novel estimators for population values.
- Choose between estimators in a principled way.
- Calculate confidence intervals, p-values, and test statistics in both traditional and new settings.
- Program bootstrap procedures for statistical inference.
- Implement complex linear regressions and logistic regressions in Python and interpret their output.
- Conduct simple ANOVA experiments.
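As an illustration of what several of these skills look like in practice (vectorized NumPy computation, simulation, and a programmed bootstrap), here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is not taken from the course material: the function name `bootstrap_ci`, its parameters, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def bootstrap_ci(data, statistic=np.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.size
    # Draw all resamples at once: each row of `idx` indexes one bootstrap sample.
    idx = rng.integers(0, n, size=(n_resamples, n))
    # Apply the statistic along each row, so no explicit Python loop is needed.
    estimates = statistic(data[idx], axis=1)
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Simulated data (an assumption for this sketch): 200 draws from a normal distribution.
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

low, high = bootstrap_ci(sample)
print(f"95% percentile bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several ways to build a bootstrap interval; it is used here because it is the shortest to write down.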
Upon completion of the course, students will have stronger competence in:
- Making sound statistical judgments.
- Using online resources to solve problems.
- Working independently on difficult problems.
- Reading and understanding technical documentation.
The course covers the following topics:
- NumPy and SciPy.
- Statistical simulation in Python.
- Exploratory data analysis.
- Statistical models and the bootstrap.
- Unbiased estimators and the efficiency of estimators.
- Construction of estimators.
- Confidence intervals.
- Hypothesis tests and p-values.
- The t-test.
- Foundations of machine learning.
- Linear regression.
- Inference for linear regression.
- Linear regression using categorical covariates (ANOVA).
- Binary regression, such as logistic regression.
The course uses the following methods for teaching and learning:
- Video lectures.
- Lab sessions where the teacher and TAs help students with problems.
- Homework exercises.
Starting in the autumn of 2023, the form of evaluation in this course has changed from two exam codes (EBA 35001 and EBA 35002) to one exam code (EBA 35003).
A re-sit examination in the former exam codes EBA 35001 and EBA 35002 is offered in autumn 2023 and, for the last time, in spring 2024.
Higher Education Entrance Qualification
Disclaimer
Deviations in teaching and exams may occur if external conditions or unforeseen events call for this.
EBA 3400 Programming, data extraction and visualisation, EBA 1180 Mathematics for Data Science, or equivalent courses.
Assessments | |
---|---|
Exam category | Submission |
Form of assessment | Handin - all file types |
Weight | 100 |
Grouping | Group/Individual (1 - 3) |
Duration | 1 Week(s) |
Comment | Home exam. |
Exam code | EBA 35003 |
Grading scale | ECTS |
Resit | Examination every semester |
Activity | Duration | Comment |
---|---|---|
Digital resources | 21 Hour(s) | |
Seminar groups | 36 Hour(s) | |
Examination | 25 Hour(s) | |
Student's own work with learning resources | 118 Hour(s) | |
A course of 1 ECTS credit corresponds to a workload of 26-30 hours. Therefore, a course of 7.5 ECTS credits corresponds to a workload of at least 200 hours.