Introduction to Machine Learning
Lecturer: Marco Steenbergen
Modality: In person
Week 1: 12-16 August 2024
Workshop contents and objectives
Machine learning is of ever greater importance in data science applications in academia, government, and industry. It is not just another set of techniques; it is an entirely new way of thinking about data. The main objective of this course is to familiarize you with this way of thinking. What is it? What criteria are used to ensure its validity? How can social scientists take advantage of machine learning? What algorithms are available? We start by discussing the general machine learning workflow and by familiarizing you with the tidymodels package in R. In the following days, we delve into the most powerful machine learning algorithms currently available, focusing on both predictive performance and interpretability. By the end of the course, you should know how to use those algorithms in your own work. You should also know the logic and jargon of machine learning so that you can interact with computer and data scientists.
Workshop design
The course combines lectures with individual and group exercises. At the end of each day, there is time to discuss individual projects (clinic format).
Detailed lecture plan (daily schedule)
Day 1.
Morning: Objectives and workflows of machine learning (lecture); introductory example in R (lecture).
Afternoon: tidymodels in R (exercise); over-fitting, the lasso, and elastic nets (lecture); tuning models (lecture); R practice (exercise); clinic (one-on-one).
Day 2.
Morning: Classification and regression trees (lecture); variable importance (lecture); interpretation (lecture); R practice (exercise).
Afternoon: Bagging and random forests (lecture); R practice (exercise); how to read a machine learning paper (lecture); clinic (one-on-one).
Day 3.
Morning: Boosting with an emphasis on xgboost (lecture); R practice (exercise).
Afternoon: Stacking (lecture); R practice (exercise); presentation of machine learning papers (group work); clinic (one-on-one).
Day 4.
Morning: Feedforward neural networks (lecture).
Afternoon: R practice (exercise); interpretable machine learning (lecture); clinic (one-on-one).
Day 5.
Morning: R practice (exercise); advanced techniques in deep learning (demonstration).
Afternoon: Open session, which can be used for topics requested by students (a survey will be sent ahead of the term), Q&A, or further clinics.
Class materials
Recommended: Kuhn, Max and Julia Silge. 2022. Tidy Modeling with R: A Framework for Modeling in the Tidyverse. O’Reilly. ISBN: 978-1492096481
Prerequisites
Prior knowledge of regression and R is highly recommended.
Recommended readings or preliminary materials
Notes on data visualization in R will be sent ahead of the course.