# EBA 3500 Data Analysis with Programming


The students will learn the most important techniques in applied statistics and data analysis, with an emphasis on linear regression and logistic regression. Students are given hands-on experience with data analysis projects, and will gain further knowledge in working with data, using descriptive statistics to motivate models, and using models to turn data into actionable knowledge. Data examples and applications will be given.

After completing the course, the student should know:

- What statistical models are, and how they can be used for prediction, modelling, and explanation.
- What differentiates machine learning from statistics.
- The basic mathematical underpinnings of the linear regression model.
- The scope and limitations of a multiple linear regression model.
- The interpretation of the logistic regression model.
- How categorical covariates can be used in a regression context.
- The basics of ANOVA.
- The common set of assumptions underlying the consistency and asymptotic normality of the estimated regression coefficients.
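As an illustration of the basic mathematics behind linear regression, the least squares estimator can be written as `beta_hat = (X'X)^{-1} X'y`. The NumPy sketch below is not taken from the course material. It computes this estimator on simulated data with hypothetical true coefficients and prints the estimation error for increasing sample sizes; the error typically shrinks as `n` grows, which is what consistency means in practice. The helper names `ols` and `simulate` are illustrative only.

```python
import numpy as np


def ols(X, y):
    """Least squares estimate beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations."""
    # Solving (X'X) b = X'y avoids forming the inverse of X'X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


def simulate(n, beta, rng):
    """Simulate y = X beta + e with an intercept column, standard normal covariates and errors."""
    k = len(beta) - 1  # number of covariates besides the intercept
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
    y = X @ beta + rng.standard_normal(n)
    return X, y


rng = np.random.default_rng(2024)
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients

# Consistency in action: the largest coefficient error typically shrinks as n grows.
for n in (50, 500, 50_000):
    X, y = simulate(n, beta, rng)
    beta_hat = ols(X, y)
    error = np.max(np.abs(beta_hat - beta))
    print(f"n={n:>6}  beta_hat={np.round(beta_hat, 3)}  max abs error={error:.4f}")
```

The simulation assumes the textbook setting (independent observations, errors with mean zero and constant variance). Changing `simulate` to break one of these assumptions is a simple way to explore when the estimator stops behaving well.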

After completing the course, the students will:

- Possess basic skills in the Python packages numpy, pandas, and statsmodels.
- Develop further skills in descriptive statistics and data visualization.
- Be able to choose appropriate exploratory tools for getting an overview of large datasets.
- Be able to turn a practical question into a question that can be addressed with multiple linear regression.
- Be able to use statistical tools to decide on a course of action for a given practical problem.
- Further develop programming skills in Python.
- Perform simulation studies to assess how well a statistical method performs.
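A simulation study of this kind can be quite small. The sketch below estimates how often a nominal 95% confidence interval for a regression slope actually contains the true slope. It assumes normal errors, a single standard normal covariate, and the normal-approximation interval `beta_hat ± 1.96 × standard error`; the function name `coverage_of_slope_ci` and the default coefficients are hypothetical.

```python
import numpy as np


def coverage_of_slope_ci(n, reps, rng, beta=(1.0, 2.0)):
    """Share of simulated 95% confidence intervals for the slope that contain the true slope.

    Model (an assumption of this sketch): y = b0 + b1 * x + e, with x and e standard normal.
    The interval is the normal approximation beta_hat_1 +/- 1.96 * se(beta_hat_1).
    """
    b0, b1 = beta
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = b0 + b1 * x + rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ (X.T @ y)
        resid = y - X @ beta_hat
        sigma2_hat = resid @ resid / (n - X.shape[1])  # unbiased estimate of the error variance
        se_slope = np.sqrt(sigma2_hat * XtX_inv[1, 1])
        lower = beta_hat[1] - 1.96 * se_slope
        upper = beta_hat[1] + 1.96 * se_slope
        hits += int(lower <= b1 <= upper)
    return hits / reps


rng = np.random.default_rng(1)
print(coverage_of_slope_ci(n=100, reps=2000, rng=rng))
```

A coverage close to 0.95 suggests the interval behaves as advertised under these assumptions. Rerunning the study with, for example, an error variance that depends on `x` is one way to see how a violated assumption affects the method.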

Students will understand that, in many situations, a statistical analysis will help in making better decisions, but they will also be aware that statistical methods can be applied wrongly and lead to false conclusions.

The following topics will be covered, using Python as the statistical analysis system:

- Python programming using packages such as NumPy, SciPy, statsmodels, pandas, and matplotlib.
- Data visualization with matplotlib.
- Basic data manipulation using pandas and NumPy.
- How to make simple simulation studies.
- Multiple linear regression in statsmodels.
- Logistic regression in statsmodels.

Teaching will be done using a blend of lectures and applied data projects in Python.


Higher Education Entrance Qualification

**Disclaimer**

Deviations in teaching and exams may occur if external conditions or unforeseen events call for this.

EBA 3400 Programming, data extraction and visualisation, EBA 1180 Mathematics for Data Science or equivalent courses.

Assessments | EBA 35001 | EBA 35002 |
---|---|---|
Exam category | Submission | Submission |
Form of assessment | Written submission | Written submission, invigilation |
Weight (%) | 40 | 60 |
Grouping | Group/Individual (1-3) | Individual |
Support materials | | BI-approved exam calculator; simple calculator; bilingual dictionary |
Duration | 1 week | 3 hours |
Grading scale | ECTS | ECTS |
Resit | Examination every semester | Examination every semester |
Comment | | All exams must be passed to obtain a final grade in the course. |


Activity | Duration | Comment |
---|---|---|
Teaching | 36 Hour(s) | Lectures and blended learning with projects for students. |
Feedback activities and counselling | 9 Hour(s) | Project-work under supervision. |
Examination | 15 Hour(s) | Work related to the home exam. |
Student's own work with learning resources | 75 Hour(s) | |
Group work / Assignments | 62 Hour(s) | |
Examination | 3 Hour(s) | Final exam |

A course of 1 ECTS credit corresponds to a workload of 26-30 hours. Therefore, a course of 7.5 ECTS credits corresponds to a workload of at least 200 hours.
