Introduction to Machine Learning

Lecturer: Marco Steenbergen

Modality: In person

Week 1: 12-16 August 2024

 

Workshop contents and objectives

Machine learning is of ever greater importance in data science applications in academia, government, and industry. It is not just another set of techniques; it is an entirely new way of thinking about data. The main objective of this course is to familiarize you with this way of thinking. What is it? What criteria are used to ensure its validity? How can social scientists take advantage of machine learning? What algorithms are available? We start by discussing the general machine learning workflow and by familiarizing you with the tidymodels package in R. In the following days, we delve into the most powerful machine learning algorithms currently available, focusing on both predictive performance and interpretability. By the end of the course, you should know how to use those algorithms in your own work. You should also know the logic and jargon of machine learning so that you can interact with computer and data scientists.

 

Workshop design

The course combines lectures with individual and group exercises. At the end of each day, there is time to discuss individual projects (clinic format).

 

Detailed lecture plan (daily schedule)

Day 1.
Morning: Objectives and workflows of machine learning (lecture); introductory example in R (lecture).
Afternoon: tidymodels in R (exercise); over-fitting, the lasso, and elastic nets (lecture); tuning models (lecture); R practice (exercise); clinic (one-on-one).

Day 2.
Morning: Classification and regression trees (lecture); variable importance (lecture); interpretation (lecture); R practice (exercise).
Afternoon: Bagging and random forests (lecture); R practice (exercise); how to read a machine learning paper (lecture); clinic (one-on-one).

Day 3.
Morning: Boosting with an emphasis on xgboost (lecture); R practice (exercise).
Afternoon: Stacking (lecture); R practice (exercise); presentation of machine learning papers (group work); clinic (one-on-one).

Day 4.
Morning: Feedforward neural networks (lecture).
Afternoon: R practice (exercise); interpretable machine learning (lecture); clinic (one-on-one).

Day 5.
Morning: R practice (exercise); advanced techniques in deep learning (demonstration).
Afternoon: Open session for topics requested by students (a survey will be sent ahead of the term), Q&A, or further clinics.

 

Class materials

Recommended: Kuhn, Max and Julia Silge. 2022. Tidy Modeling with R: A Framework for Modeling in the Tidyverse. O’Reilly. ISBN: 978-1492096481

 

Prerequisites

Prior knowledge of regression and R is highly recommended.

 

Recommended readings or preliminary materials

Notes on data visualization in R will be sent ahead of the course.