
Data Science Workshop

Psychology Research Experience Program (PREP) provides mentoring and experience to undergraduates who have an interest in a scientific psychology career. LUCID partnered with PREP to create a hands-on data science workshop series. LUCID & AI + Society graduate students will facilitate the data science workshops.

The 2023 data science workshop will be held on Wednesdays from 3:30 to 5:00 p.m. in WID room 3330.

Overview:

In the workshops students will be introduced to data-science environments, concepts, and applications. LUCID & AI+ Society facilitators will introduce a series of data science concepts via online materials and hands-on sessions. PREP Students will work through examples and demos with guidance from LUCID & AI+ Society graduate students.

Goals:

For PREP students to gain a sense of 1) how to work with an R or Python integrated development environment, 2) the kinds of things one can do with a range of data-science tools, and 3) how to continue learning about and working with these tools in the future. Note that the goal is not specifically to teach programming in R, Python, or any other language, but to show how to work interactively with and adapt notebooks that carry out common data-science tasks, and to give a general sense of what the methods are used for and how they might be applied to one’s own data.

Materials and Session Outlines:

This will be updated with materials and facilitator outlines as they become available.

Schedule:

Date   Facilitator        Topic
7/5    Tim Rogers         Intro to Data Science
7/12   Kushin Mukherjee   Running Web Experiments
7/19   Sid Suresh         How can cognitive scientists use Deep Learning? Using pre-trained models to perform psychology experiments
7/26   Sarah Sant’Ana     Cross Validation


Running Web Experiments: A soup-to-nuts tutorial


PREP 2023
Kushin Mukherjee


This tutorial will walk you through how to design and run behavioral experiments in the web browser. Once you build an experiment you can have participants do your experiment online through MTurk, Prolific, or any other crowdsourcing platform. You can also have participants come into the lab and complete these experiments on computers running the experiment locally.

This tutorial will (time permitting) have 3 parts: (1) Setting up the necessary accounts on OSF, Github, and DataPipe. (2) Designing a simple experiment using jsPsych and hosting it using Github pages. (3) Running yourself through your spiffy new experiment and looking at your data.

Preparation

Before we begin, we will need to create some accounts on some websites. If you plan to stick around and do Psychology research moving forward, you’ll almost definitely need these accounts down the line too. I recommend setting them up using an email you are confident that you’ll have access to forever (so maybe not a university email if it expires when you graduate).

Ideally use the same email for all 3 accounts.

Github

  1. Make a new account here – https://github.com/ 

OSF (Open Science Framework)

  1. Make a new account here – https://osf.io/ 

DataPipe

  1. Make a new account here – https://pipe.jspsych.org/

 

View Tutorial Here (materials will be available after 7/12)

Cross Validation

Cross Validation with Sarah Sant’Ana

Cross validation is a common resampling technique used in machine learning studies. Broadly, cross validation involves splitting data into multiple training and testing subsets to increase generalizability of the model building and evaluation processes. There are multiple types of cross validation (e.g. k-fold, bootstrapped), but all serve two primary purposes:

  • To select the best model configurations (e.g. what type of statistical model will perform best, which sets of features will perform best, covariate selection, hyperparameter tuning, outlier identification approaches, predictor transformations, and more).
  • To evaluate the expected performance of our models in new data (i.e. on individuals who were never used in model building/selection)
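
As a rough illustration of that second purpose, here is a minimal Python sketch (using scikit-learn rather than the R workflow from the session) that estimates out-of-sample accuracy with 5-fold CV. The dataset and model are just placeholders.

# Minimal k-fold cross-validation sketch (scikit-learn; illustrative only)
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)                     # placeholder dataset
model = make_pipeline(StandardScaler(), LogisticRegression())  # any estimator works here

# Each fold is held out once; the model is refit on the remaining folds
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())

Because every observation is used for testing exactly once, the mean score is a less optimistic estimate of performance in new data than accuracy on the training set.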

Why should I use cross validation? 

You should use cross validation if..

  • You are fitting a statistical model with hyperparameters that need tuning (e.g. elastic-net logistic regression, random forests, svm)
  • You are considering multiple combinations of model configurations (e.g.  features, statistical algorithms, data transformations)
  • You want to consider a large number of predictive features or you do not want to rely on theory to guide identification of predictive features
  • You want to build predictive models that will generalize well to new data (i.e. you want your model to be applied in some way)

List of ideas, concepts, or tools that are associated with this topic

  • R/RStudio (especially the caret, tidymodels, and parsnip packages)
  • Python
  • Common types of cross validation (CV): bootstrapped CV, k-fold CV, nested CV
  • Basic knowledge of linear and logistic regression
  • Bias/variance trade offs in model fitting and evaluation
  • Generalizability of predictive models (why it’s important, how to prioritize it, and how to assess it)

In preparation for our meeting, please review the following materials:

During the meeting:

  • Plan on a discussion about prediction vs explanation in psychological research. I want to help you think of how you might apply cross validation in your work if you are interested 😊
  • I will be walking us through the attached Cross Validation Markdown document (open the link, then download the file; Google will default to opening it as a Google Doc, which is not functional) to provide you some code for implementing cross validation. No need to read this beforehand, but you can have it open during the session if you’d like to follow along.
  • Feel free to send me any questions beforehand or ask during the session! Happy to talk research, data science, or grad school as would feel beneficial to you all. My email is skittleson@wisc.edu

Additional Materials (not required, just for your reference)

Books

Online tutorials (blogs and code examples):

  • This is an R Markdown file written by the creator of the caret package in R (one of the most used machine learning packages in R to date). It explains how to tune the various types of hyperparameters using CV within caret’s train function. Even if you don’t plan to use R, it is helpful for seeing what types of parameters are tuned for different models, and it provides examples of creating and evaluating search grids, alternate performance metrics, and more. Model training and tuning
  • This is a nice (but lengthy) R Markdown example of approaching a classic machine learning problem (product price estimation) and showcases hyperparameter tuning of a couple of different algorithms (and their comparison): Product Price Prediction: A Tidy Hyperparameter Tuning and Cross Validation Tutorial. This is geared towards a  more advanced beginner – It still walks you through everything, but incorporates more robust data cleaning and exploration before model fitting.

Videos:

  • This video is a good walkthrough using K-fold cross-validation in python to select optimal tuning parameters, choose between models, and select features: Selecting the best model in scikit-learn using cross-validation
  • A short 4-minute tutorial about how to tune various types of statistical learning models within cross validation using the caret package in R. It doesn’t discuss much of the theory and is more appropriate for application-focused users who are just trying to figure out how to implement parameter tuning within CV: R Tutorial – Hyperparameter tuning in caret

Papers:

  • This paper describes the impact of using different CV types for parameter selection and model evaluation: Bias in error estimation when using cross-validation for model selection. This requires an intermediate-level understanding of using CV for parameter selection. Many people using machine learning in applied contexts are using improper CV methods that bias their model performance estimates. We should be using nested CV (or bootstrap CV with a separate validation set) if we are planning to select model parameters and generate trustworthy performance metrics.
  • A really cool preprint that describes sources of bias in ML resampling methods due to incorrect application in psychological research: https://psyarxiv.com/2yber/. A more intermediate-level read because it requires some understanding of multiple types of CV methods.

Using pre-trained models to perform psychology experiments

Using CNNs to run psychophysics experiments

With Sid Suresh 

Humans can quickly pool information from across many individual objects to perceive ensemble properties, like the average size or color diversity of objects. Such ensemble perception in humans is thought to occur extremely efficiently and automatically. We’ll learn how to run experiments on a CNN to understand whether ensemble representations of average size emerge in these networks.

The goal:

(1) Understand how we can design and run a psychophysics experiment using a pre-trained Convolutional Neural Network.

List of ideas/concepts/tools that are associated with this topic

Ensemble representations

Convolutional neural networks

Python

Google Colab

Linear Regression

Logistic Regression

Prepare for the LUCID/PREP Data Science Workshop on Computational Vision Models

Colab Notebook

Here is the notebook we’ll be working together on this Wednesday – https://github.com/siddsuresh97/prep_tutorial/blob/main/tutorial.ipynb

Resources and Sessions from 2022:


Mixed Linear Models

Introduction to Mixed Linear Models with Melissa Schoenlein 

Linear models are a type of analysis used to evaluate fully independent data.

Mixed effects models are a type of analysis used to evaluate data with non-independence that cannot otherwise be analyzed with regular linear regression.

 

What is non-independence/non-independent data?

Non-independence occurs when two or more data are connected (correlated) in some way. For example, you run an experiment collecting ratings on interest in math. Your participants make these ratings at the start of the semester, in the middle of the semester, and then again at the end of the semester. Each of these participants has three data points. These data points are non-independent since they are from the same person and thus are related in ways beyond the experimental procedure. In other words, data points from one participant are more likely to be more similar to each other than data points from two different participants.

Non-independence can exist beyond repeated measures at the participant level to any items occurring within “units”, including classrooms, family members, cities, etc.

 

Why/when should I use mixed linear models?

Using regular linear regression when data is non-independent can lead to inflated Type 1 error rates (saying you have a significant result when you actually don’t!), less statistical power, and potentially inaccurate effects. A mixed linear model should be used anytime there is a non-independent relationship in the data.
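
For readers who prefer Python, here is a minimal sketch of the same idea using statsmodels (the workshop itself uses lme4 in R). The file and column names are hypothetical.

# Random-intercept model in Python with statsmodels; roughly analogous to
# lme4's lmer(interest ~ time + (1 | participant)) in R. Names are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("math_interest.csv")   # hypothetical file: participant, time, interest

# groups= gives each participant their own random intercept, accounting for
# the non-independence of repeated ratings from the same person
model = smf.mixedlm("interest ~ time", data=df, groups=df["participant"])
result = model.fit()
print(result.summary())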

 

List of ideas/concepts/tools that are associated with this topic

Hierarchical modeling, mixed modeling, linear mixed effects models, multilevel models, etc.

Nonindependence

Fixed versus random effects

lme4 package in R

 

 

Prepare for the LUCID/PREP Data Science Workshop on mixed models:

In preparation for our meeting on Wednesday 6/29, please watch, read, and download the following materials.

  1. Watch videos 2, 3, 11, and 16 from this multi-part video series providing a general overview of mixed models, when to use them, and how to interpret them. Total time for these 4 videos: ~ 12 minutes.
  2. Read through this online tutorial that provides a walkthrough of code and output of basic linear mixed effects models in R and why we use them. We will work together through some examples in R during the workshop, so this tutorial will provide a good foundation for being ready to apply the code to different contexts.
  3. Install the following packages in R: lme4, ggplot2

 

 

 

Optional additional resources if you’re interested in learning more:

Tutorials: A very short cheat sheet of using the lme4 package in R to analyze mixed models.

A Github repo with a 3-part workshop aimed at providing tutorials and exercises to learn how to do mixed models in R. The first part is a general intro to R. The second part is about statistical modeling (generally) in R. Then part 3 is mixed models in R.

A similar, but less comprehensive, tutorial demonstrating mixed models in both R and Python.

 

Papers: This paper provides guidelines for how to create linear mixed effects models, including steps on how to decide what random effects to include and how to address convergence issues with a large number of parameters.

 

Jake Westfall, a former quantitative psychologist who now works in data science/analytics in industry, has curated a list of 13 helpful readings on mixed linear models.

Computational Vision Models

Computational Models of Vision
Convolutional neural networks are a family of neural network models that have been historically used for image recognition. Over time, these models have come to be adopted as good cognitive neuroscientific models of the human visual system and there is much work connecting the computations of these models to the computations that occur in the human visual system.
The goal of our workshop will be 2-fold:
(1) To expose the audience to Google Colab, a way to run python notebooks on the cloud and share them with others.
(2) Introduce the audience to the PyTorch Image Models library and learn how to use its vast library of pre-existing models (somewhat from scratch)
Neural network features provide a powerful feature set for analyzing any kind of image-based data. And these models can be modified to perform a wide variety of tasks beyond image recognition!
List of ideas/concepts/tools that are associated with this topic
Convolutional neural networks
Transformers
Python
Jupyter notebooks
Prepare for the LUCID/PREP Data Science Workshop on Computational Vision Models
To familiarize yourself with Convolutional Neural Networks before the workshop I recommend watching this lecture that does a great job of giving a big picture view of the topic along with some necessary details – https://www.youtube.com/watch?v=iaSUYvmCekI
If you’re interested, here’s a video on an alternative neural network family known as Transformers, which are now the state of the art models when it comes to computer vision – https://www.youtube.com/watch?v=HZ4j_U3FC94
We might get to these if we have time on Wednesday!
Colab Notebook
Last but not least, this is the notebook we’ll be working together on this Wednesday – https://colab.research.google.com/drive/16En1-7C9A14VpCGHPIuhPlU0ygc8-svk#scrollTo=bZSF5AQWW-D3
It might not be tidy and ready to go before Wednesday’s workshop, so take a peek at your own risk!

Using pre-trained models to perform psychology experiments

Using CNNs to run psychophysics experiments

Humans can quickly pool information from across many individual objects to perceive ensemble properties, like the average size or color diversity of objects. Such ensemble perception in humans is thought to occur extremely efficiently and automatically. We’ll learn how to run experiments on a CNN to understand whether ensemble representations of average size emerge in these networks.

The goal:

(1) Understand how we can design and run a psychophysics experiment using a pre-trained Convolutional Neural Network.

List of ideas/concepts/tools that are associated with this topic

Ensemble representations

Convolutional neural networks

Python

Google Colab

Linear Regression

Logistic Regression

Prepare for the LUCID/PREP Data Science Workshop on Computational Vision Models

Colab Notebook

Here is the notebook we’ll be working together on this Wednesday – https://github.com/siddsuresh97/prep_tutorial/blob/main/tutorial.ipynb

Designing experiments with jsPsych & data cleaning in R

We will be covering jsPsych and making a basic experiment. See your email for the zip file with basic materials for learning jsPsych.
We’ll be going over the code and running/compiling it together.
We’ll be using an HTML editor for this (really any code-editing software should work, even Notepad and the like). If you’re interested in trying a useful and simple editor, I like using Brackets (https://brackets.io/).

Natural Language Processing

LUCID/PREP workshop
The goal of the session:
We will use the resources and tools below and see how we can go from an idea/question to a presentable web-based interactive solution.
In this session, we will evaluate what moral stance GPT-3 takes when discussing contemporary issues.
To do so, we will chat about the Natural Language Processing (NLP) model GPT-3, how to use the GPT-3 API, a quick synopsis of moral foundations theory, and a way to build a web-based app and host it online.
 
Natural Language Processing Model
Quick synopsis of OpenAI’s Natural Language Processing model GPT-3:
Original paper if you are interested:
Create a free account if you want to use GPT3 API and have some fun with it:
Moral Foundation
Paper on the extended moral foundation:
(lightly skim through the abstract, introduction, and figures to get a general idea)
Build an Interactive Web-based app and host (Python backend)
A lightweight Python web framework that provides tools and features that help us create web applications
A cloud platform service that helps us deploy, manage, and scale modern apps
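
The web framework and cloud platform aren’t named above (the links were lost), but assuming a Flask-style framework, a minimal sketch of the kind of app described here might look like the following. The /ask route and the classify_moral_stance helper are hypothetical placeholders for whatever language-model call the session wires in.

# Minimal sketch of a web app with a Python backend (assuming a Flask-style framework)
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify_moral_stance(text: str) -> str:
    # Placeholder: the real app would call a language-model API (e.g., GPT-3) here
    return "placeholder stance for: " + text

@app.route("/ask", methods=["POST"])
def ask():
    prompt = request.json.get("prompt", "")
    return jsonify({"stance": classify_moral_stance(prompt)})

if __name__ == "__main__":
    app.run(debug=True)   # a cloud platform would run this behind its own web server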

Resources and Sessions from 2021:


Natural Language Processing (NLP)

Session 1: Introduction to Natural Language Processing

Prepared by Laura Stegner, stegner@wisc.edu

What is Natural Language Processing?

Natural Language Processing (NLP) can be broadly thought of as the computational tools used to help computers understand and manipulate spoken or written natural language to do useful things. This goal can be achieved with the help of various NLP tasks, such as:

  • Part-of-speech tagging
  • Speech recognition
  • Word sense disambiguation
  • Sentiment analysis
  • Natural language generation
  • Named entity recognition
  • Co-reference resolution
    Each of the above tasks is briefly described in this article by IBM.

Practically, NLP is present in our everyday lives. Some common examples include autocorrect, autocomplete, related search terms in a web engine, email filtering, smart agents (e.g. Siri or Alexa), and machine translation (e.g. Google Translate). It is also useful in business applications such as to analyze reviews or to create automated calling systems and chat bot assistants.

When would I want to use NLP?

While NLP is being readily implemented in everyday products, it is also greatly useful in data science. NLP can be used to convert messy, unstructured natural language responses (such as interview data or open responses to survey questions) into more structured, processable data forms. Using NLP techniques to analyze data can serve to speed up processing time and also eliminate inconsistencies from manual analysis.

Preparation

Prior to our meeting, please review the following materials:

Also think about the following. We will have a discussion related to some of these topics 🙂

  • Times you have encountered NLP in either your research or your daily life.
  • Situations where you don’t currently use NLP, but where it could come in handy, and how.
  • Why we should care about the ethical considerations of NLP in data science.

Additionally, install the following packages in Python 3:

  • nltk: pip3 install nltk==3.3 or python3 -m pip install nltk==3.3
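
Once installed, a quick sanity check like the one below (a minimal sketch, not part of the official materials) confirms that nltk can handle a basic task such as tokenization:

# Quick check that nltk is installed and can tokenize text
import nltk
nltk.download("punkt")        # one-time download of the tokenizer models

from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLP turns messy survey responses into structured data. It shows up everywhere!"
print(sent_tokenize(text))    # splits the text into sentences
print(word_tokenize(text))    # splits the text into word-level tokens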

Additional Reading / Reference

Chowdhury, G.G. (2003), Natural language processing. Ann. Rev. Info. Sci. Tech., 37: 51-89. https://doi-org.ezproxy.library.wisc.edu/10.1002/aris.1440370103

Hovy, D., & Spruit, S. L. (2016, August). The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 591-598). https://www.aclweb.org/anthology/P16-2096.pdf

Leidner, J. L., & Plachouras, V. (2017, April). Ethical by design: Ethics best practices for natural language processing. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing (pp. 30-40). https://www.aclweb.org/anthology/W17-1604.pdf

Tutorial and demo materials are on Laura’s github site:

https://github.com/lstegner/nlp-tutorial-PREP2021/tree/main/tutorial-materials

Regularization

Introduction to Regularization with Kendra Wyant

What is Regularization?

Regularization is a type of regression that imposes a penalty on the coefficients in complex models. This penalty reduces overfitting by introducing some bias into the model. As we see with the bias-variance tradeoff, introducing some bias can reduce variance in model predictions on new data, making the model more generalizable.

Types of regularization (a short Python sketch of all three follows the list)
  • Ridge regression: variables with minor contributions have their coefficients shrunk close to zero. However, all the variables are kept in the model. This is useful when, according to domain knowledge, all variables need to be incorporated in the model.
  • Lasso regression: the coefficients of some less contributive variables are forced to be exactly zero. Only the most significant variables are kept in the final model.
  • Elastic net regression: a combination of ridge and lasso regression. It shrinks some coefficients toward zero (like ridge regression) and sets some coefficients to exactly zero (like lasso regression).
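
The workshop demo is in R (the install list below includes glmnet and tidymodels), but the same three penalties are available in Python’s scikit-learn. A minimal, illustrative sketch on toy data:

# Ridge, lasso, and elastic net fit to the same toy regression problem (illustrative only)
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)                     # shrinks coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)                     # sets some coefficients exactly to zero
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)   # mixes the two penalties

print("nonzero ridge coefficients:", (ridge.coef_ != 0).sum())
print("nonzero lasso coefficients:", (lasso.coef_ != 0).sum())
print("nonzero elastic net coefficients:", (enet.coef_ != 0).sum())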

List of Related Topics/Ideas

We won’t be able to cover all of these topics due to time, but I will provide resources and code for anyone who is interested in exploring these further or using them in their own research. I am also happy to chat more outside the workshop!

  • Prediction vs Explanation in Psychology
  • Overfitting
  • Bias/variance tradeoff
  • Test and training sets
  • Cross-validation and resampling

Preparation

Watch:

StatQuest Youtube Series

  1. Machine learning fundamentals – bias and variance (6:35) https://www.youtube.com/watch?v=EuBBz3bI-aA
  2. Ridge regression clearly explained (20:26) – https://www.youtube.com/watch?v=Q81RR3yKn30
  3. Lasso regression clearly explained (8:18) – https://www.youtube.com/watch?v=NGf0voTMlcs&t
  4. Elasticnet regression clearly explained (5:18) – https://www.youtube.com/watch?v=1dKRdX9bfIo

Optional: Machine Learning Fundamentals: Cross Validation (6:04) https://www.youtube.com/watch?v=fSytzGwwBVw

Read:

  1. Skim the first 10 pages of Yarkoni and Westfall (2017)
  2. Read this blog post on overfitting https://www.ibm.com/cloud/learn/overfitting

Install Software:

  • We will be using R and RStudio
  • Install the following packages in RStudio:
    install.packages("tidyverse")
    install.packages("tidymodels")
    install.packages("kableExtra")
    install.packages("skimr")
    install.packages("naniar")
    install.packages("doParallel")
    install.packages("mlbench")
    install.packages("vip")
    install.packages("Matrix")
    install.packages("glmnet")

Additional Resources

Coding

Machine learning resources

I am looking forward to meeting all of you on Wednesday. Please don’t hesitate to reach out about anything (kpaquette2@wisc.edu). I am happy to talk about data science, PREP, Madison, grad school, and more!

Demo script and other resources can be found on Kendra’s github site:

https://github.com/KendraPaquette/intro_to_regularization

Mixed Linear Models

Introduction to Mixed Linear Models with Melissa Schoenlein

Mixed linear models are a type of analysis used to evaluate data with non-independence that cannot otherwise be analyzed with regular linear regression.

What is non-independence/non-independent data?

Non-independence occurs when two or more data are connected (correlated) in some way. For example, you run an experiment collecting ratings on interest in math. Your participants make these ratings at the start of the semester, in the middle of the semester, and then again at the end of the semester. Each of these participants has three data points. These data points are non-independent since they come from the same person and thus are related in ways beyond the experimental procedure (i.e. points from one participant are more likely to be more similar to each other than data points from two different participants).

Non-independence can exist beyond repeated measures at the participant level to any items occurring within “units”, including students in classrooms, family members, etc.

Why/when should I use mixed linear models?

Using regular linear regression when data is non-independent can lead to inflated Type 1 error rates, less statistical power, and potentially inaccurate effects. A mixed linear model should be used anytime there is a non-independent relationship in the data.

List of ideas/concepts/tools that are associated with this topic

Hierarchical modeling, mixed modeling, linear mixed effects models, multilevel models, etc.

Nonindependence

Fixed versus random effects

lme4 package in R

Preparation:

In preparation for our virtual workshop Wednesday 7/7 at 4pm CST, please watch and read the following materials.

  1. Watch videos 1-3, 11, and 16 from this multi-part video series providing a general overview of mixed models, when to use them, and how to interpret them (totals ~ 12 minutes). Video 11 focuses on repeated measures models, which will be the focus of our workshop.
  2. Skim through this online tutorial that provides a walkthrough of code and output of basic linear mixed effects models in R and why we use them.
  3. Skim through this very short cheat sheet of using the lme4 package in R to analyze mixed models.
  4. Install the following packages in R: lme4, ggplot2

Optional additional resources if you’re interested in learning more:

Videos:

A high-level video overview of mixed models (mostly framed in terms of hierarchical models). The first half of the video describes when/why someone would use these models. The second half starts to touch on the equations/math for these models.

Tutorials:

A Github repo with a 3-part workshop aimed at providing tutorials and exercises to learn how to do mixed models in R. The first part is a general intro to R. The second part is about statistical modeling (generally) in R. Then part 3 is mixed models in R.

A similar, but less comprehensive, tutorial demonstrating mixed models in both R and Python.

Papers:

This paper provides guidelines for how to create linear mixed effects models, including steps on how to decide what random effects to include and how to address convergence issues with a large number of parameters.

Jake Westfall, a former quantitative psychologist who now works in data science/analytics in industry, has curated a list of 13 helpful readings on mixed linear models.

 

Workshop will be led by Melissa Schoenlein. I can be reached at schoenlein@wisc.edu if there are any issues accessing these materials or if there are any questions (about the workshop, the PREP program, the department, or anything!). Looking forward to meeting this year’s PREPsters!

Principal Component Analysis

Introduction to Principal Component Analysis with Vince Frigo

Install or download the following files: PCA files 
(If you have any issues downloading the individual files on the google drive, try the zip file!)
R packages: psych, ggplot2
subscripts:  get_demog.r, get_survdat.r, plot_alresp.r, plot_fload.r, plot_fcenters.r, plot_kfc.r, delna.r
anonymized data:  qualtrics_alldat_500.csv
Read and explore this tutorial: www.datacamp.com/tutorials/pca-analysis-r
If you have time feel free to explore the whole tutorial, but before our meeting at 4p on 7/14 please:
  1. Read through the PCA introduction
  2. Try a simple PCA
  3. Use PCA results to plot
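
If you would like to see the same steps in Python for comparison (the workshop itself uses R and the psych package), here is a minimal sketch with scikit-learn on a placeholder dataset:

# Rough Python analogue of the tutorial steps: run a PCA and plot the first two components
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)             # placeholder data standing in for the survey CSV
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to scale, so standardize first

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
print("variance explained:", pca.explained_variance_ratio_)

plt.scatter(scores[:, 0], scores[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()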

Optimization 

Optimization with Scott Sievert

Watch Scott’s Intro Video: box.stsievert.com/prep

The following can also be found on Scott’s github: github.com/stsievert/PREP21

Hello PREP students! My learning objectives are to answer these questions:

  • Why should I care about optimization?
  • What are the basics of optimization? How do I get a better solution?
  • Where does optimization fail?

“Optimization” is producing a model that accurately represents data, aka “fitting” a model to data. Importantly, the choice of “model” and “data” are perhaps more important than the specific method of fitting the model to the data. In short, optimization is what happens with this code:

from sklearn.linear_model import LinearRegression

estimator = LinearRegression()

# X and y are stand-ins for other data; they could easily come from a CSV
X = [[1, 2], [3, 4]]
y = [3, 5]

# "fitting" finds the model parameters that best match the data
estimator.fit(X, y)

In this lesson, I’ll try to open up the black box that happens when you call fit. I’ve selected about an hour’s worth of video for you to watch, and will try to highlight some relevant issues in person.

Note: optimization is heavy in mathematics. I will try to illustrate optimization without relying on mathematics.

Background

What’s optimization?

Optimization is a process to “fit” a “model” to “data.”

  • Data, typically some features and a label for each example.
  • A model which will try to predict the label from a feature vector.
  • A loss function that characterizes how poorly the model is performing for a specific example.

“Fitting” means “can the model accurately predict an unseen example?” Here are some good background videos on the components above:

  • What’s optimization? youtube.com/watch?v=x6f5JOPhci0 (10:08) provides a general overview of optimization methods (and the tradeoffs of those methods) and some common issues in optimization in a real-world example.
  • How are machine learning (ML) and optimization related? youtube.com/watch?v=NzwMV2b7jbQ (10:31) introduces ML models and how they are found. In addition, what is the primary goal given noisy/non-standard examples?

How are models found?

The videos above provide a general overview of machine learning/optimization and a general idea of what happens inside fit. Now, let’s get into some specifics on how to find the best model for the models mentioned in “Mixed Linear Models”:

This is enough background to understand my examples. In the examples, I’ll highlight some issues with optimization, including data size, noise, and loss functions.
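
To make that concrete, here is a small NumPy sketch (an illustration added here, not part of Scott’s materials) of the kind of loop that hides inside fit for linear regression: repeatedly nudge the weights downhill on the squared-error loss.

# Gradient descent on mean squared error for linear regression (illustrative sketch)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=100)    # noisy labels

w = np.zeros(2)                                # initial guess
lr = 0.1                                       # step size
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)      # gradient of the mean squared error
    w -= lr * grad                             # step downhill

print(w)   # should land close to w_true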

Demo

The videos above are all the material you need for the demo. To follow along with my demos, visit github.com/stsievert/PREP21/blob/master/README.md

Want to learn more?

This material is not required for the example.

Here are some other useful videos:

Also, I would skim Chapter 7 of “Shape” by Jordan Ellenberg (23 pages). It’s light reading, and it stitches together a good story about optimization. The author is a mathematics professor at UW–Madison and is experienced with optimization.

This chapter can be found in the #prep channel in the LUCID Slack workspace: wisc-lucid.slack.com

In addition, I’ve written a blog series on optimization that tries to introduce the math behind optimization:

  1. “Least squares and regularization,” which steps through the basics of linear regression stsievert.com/blog/2015/11/19/inverse-part-1/
  2. “Finding sparse solutions to linear systems,” which examines a particular type of regularization (and has some fancy interactive widgets to understand what the minimization is doing) stsievert.com/blog/2015/12/09/inverse-part-2/
  3. “Gradient descent and physical intuition for heavy-ball acceleration with visualization”, which looks at a method to modify optimization methods. stsievert.com/blog/2016/01/30/inverse-3/

Convolutional Neural Networks

CNNs with Lowell Thompson

In this week’s session we will be learning about neural networks, focusing primarily on convolutional neural networks (CNNs). CNNs have become a useful tool for the development of self-driving cars, object and face recognition software, medical imaging analysis (e.g., MRI), and many other areas. These models can be simple to build using tools like TensorFlow and PyTorch, the latter of which we’ll use for our demo. Their inner workings, however, combine nearly all of the tools introduced throughout this workshop, including linear regression, regularization, optimization, and dimensionality reduction. I hope to provide a brief introduction to CNNs, give you some hands-on experience with a pre-built model, and then provide some time for discussion.
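
As a taste of what working with a pre-built model looks like, here is a minimal PyTorch/torchvision sketch (a generic example, not the workshop’s actual demo) that loads a pre-trained CNN and classifies a single image; the image path is a placeholder.

# Classify one image with a pre-trained CNN (torchvision); the file path is a placeholder
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(pretrained=True)   # newer torchvision versions use the weights= argument
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")     # placeholder image
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))   # add a batch dimension
print(logits.argmax(dim=1))                        # index of the predicted ImageNet class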

Session outline:

  • Introduction to Neural Networks
  • Introduction to CNNs
  • Demo session with a pre-built CNN
  • Discussion

Preparation for the workshop:

If you have trouble viewing any of the materials, please let me know (lwthompson@wisc.edu).

Resources and Sessions from 2020:


Support Vector Machines

Session 1: Support Vector Machines (SVM) with Kushin Mukherjee

Support Vector Machines (SVMs) deal with a fundamentally simple problem – how do we divide up datapoints using some form of meaningful decision boundary in a supervised learning setting? This approach gets its name from support vectors, a subset of the labeled data points whose dot products help in determining the decision boundary.


In contrast to approaches like simple neural networks or least-squares classifiers, SVMs have two overall advantages that are important to consider together:

  1. They do not get stuck in local minima. If the data are linearly separable, the algorithm will always find the same ‘best’ decision boundary.
  2. If the data aren’t linearly separable, the SVM approach supports transforming the dot products into a space where the data are linearly separable. This is what’s known as the ‘kernel trick’ in SVMs.

(Note: While I do distinguish the SVM approach from simple neural networks, it has been shown that there are specific classes of neural networks that are equivalent to kernel methods such as those used in SVMs. Here’s a brief summary – What are the Mathematical Relationship between Kernel Methods and Neural Networks)

List of ideas/concepts/tools that are associated with this topic

  • Classification
  • Supervised learning
  • Linear separability
  • Kernel methods

Preparation for meeting: 

First Watch: Patrick Winston’s lecture on SVMs is one of the easiest to follow and assumes a very minimal background in linear algebra and multivariable calculus: Youtube

Try this out second! You will need Jupyter and the necessary libraries installed. A Python-based implementation of SVM using scikit-learn: Stackabuse
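
If you would like a preview of the scikit-learn workflow before working through that tutorial, a minimal sketch (toy data, default RBF kernel) looks like this:

# Minimal SVM classifier with scikit-learn (toy data; illustrative only)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # the RBF kernel handles data that aren't linearly separable
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)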

Additional Optional Resources: 

Videos:

One might also like Andrew Ng’s 2018 lecture on the same topic, which is a bit more recent, but SVMs haven’t changed much over the past decade: Youtube (start from 46:20)

Online tutorials:

To get a stronger grasp on the mathematics behind SVMs and do some ‘hands-on’ work with them I recommend this site: SVM Tutorial

Here’s another jupyter notebook based python implementation of SVMs using scikit-learn: Learnopencv

Applied Papers:

The following is useful for seeing how these tools are used in cognitive science more broadly.

Here are 2 papers that employ SVMs in NLP and cognitive neuroscience settings

Shallow semantic parsing of sentences using SVMs: aclweb

Effective functional mapping of fMRI data using SVMs: ncbi

Theory Papers:

The original SVM paper by Vladimir Vapnik: image.diku

Jupyter Notebooks Tutorial

Jupyter Notebooks Online Tutorial with Pablo Caceres

The following is a great resource to watch/read at your own pace, and feel free to contact Pablo with any questions.

Blogpost format (with video-lessons embedded)
Video-lesson format playlist with Pablo’s explanations
Jupyter Notebook format on GitHub

Unix Shell Tutorial

UNIX Shell Tutorial with Pablo Caceres
Blogpost Format (dark background): https://pabloinsente.github.io/intro-unix-shell
There are instructions to follow along for Windows and Mac/Linux users, and an online option too. Following along is optional; you can just read if you would like.
Here is a presentation with resources for Shell, Git, and IDEs by Pablo Caceres: Things that are good to know for Data Science Beginners

R Markdown

Introduction to R Markdown with Gaylen Fronk

R Markdown provides an authoring framework for data science in R. With a single R Markdown file, you can not only write, save, and execute your code but also communicate your process and results with an audience using high-quality, reproducible output formats. 

More detail about R Markdown

R Markdown builds off tools already available in R and RStudio to provide an integrated environment for processing, coding, and communicating. An R Markdown file can include text, chunks of code, images, links, figures, and tables. While you’re working in your RStudio environment, your file operates similarly to a normal R script (a .R file) – you can write, edit, and evaluate code to work with your data. At any point, you can “knit” your file. Knitting runs, evaluates, and compiles your R Markdown file into your desired output (e.g., HTML, PDF) to create a single document that includes all the components of your written file plus the results. This knit file is ready for high-quality scientific communication with any audience. If you’ve ever seen nice examples of R code and output online, it was probably made using R Markdown.

Why should I use R Markdown? 

R Markdown is particularly helpful if…

  • You already work in R or RStudio and would like some additional tools at your disposal
  • You value reproducible output
  • You would like to be able to share your work with people who are less familiar with R (or coding more generally)

R Markdown combines the data wrangling and analytic tools of R with high-class scientific communication. It can become your one-stop-shop for sharing your data science.

Prepare for the LUCID/PREP Data Science Workshop on R Markdown:

In preparation for our video meeting next week (Wednesday 7/1 at 4pm CST), please watch, read, or review the following materials.

  1. Begin with this 1-minute video of what’s possible with R Markdown.
  2. Read Chapter 1 (Installation) from R Markdown: the Definitive Guide (Note: you should have R & RStudio installed prior to our workshop. Confirm in advance that you can open these applications.)
  3. Read Chapter 2 (Basics) from R Markdown: the Definitive Guide
  4. Read this section of Chapter 3 (Outputs: HTML) from R Markdown: The Definitive Guide
  5. Review this cheat sheet and have it handy for our meeting

Optional additional resources if you’re interested in learning more:

  • This paper from the Statistics area of arXiv.org discusses how R Markdown can improve data science communication workflow. It’s perfect for people interested in understanding why R Markdown may be beneficial and receiving examples of its use-cases. 
  • This online book contains lessons on R Markdown basics, specific output formats, in-line and chunk code, tables, interactive websites, presentations, using multiple coding languages, and more. It’s perfect for someone looking for a comprehensive (yet still quite succinct) tutorial on using R markdown
  • The Communication section from the R for Data Science online book includes several chapters on R markdown (the tidyverse’s preferred method for statistical and scientific communication) 
  • This online code from GitHub Gist provides an example/walkthrough of using R Markdown.

A note from Gaylen:

If you have questions about these materials or other questions you’d like answered during our workshop, you can submit them via this form. Please try to do this by Tuesday 6/30 at 5pm CST so that I can aggregate questions in advance.

Workshop will be led by Gaylen Fronk. You can email me at gfronk@wisc.edu if you have problems accessing these materials or installing R/RStudio. Looking forward to meeting you all!

Regression using Jupyter Notebooks

Optimization and model regularization with Owen Levin

Linear regression and many other model fitting problems can be viewed mathematically as solutions to optimization problems.  We’ll explore how this can help generalize our models as well as how we can introduce regularization to emphasize fitting models with special properties.
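
As one concrete instance of that framing (a sketch added for illustration, not Owen’s notebook): ridge regression minimizes squared error plus a penalty on the weights, and for this particular loss the minimizer can even be written down in closed form.

# Ridge regression as an optimization problem: minimize ||Xw - y||^2 + lam * ||w||^2.
# For this loss the minimizer has a closed form: w = (X^T X + lam * I)^(-1) X^T y.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.2 * rng.normal(size=50)

lam = 1.0   # larger values penalize large weights more heavily
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(w)    # the penalty keeps the fitted weights small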

Overview:

  • linear regression as an optimization problem
    • introduce loss functions
  • curve fitting as optimization
  • Is a perfect fit actually perfect? (wacky zero loss examples)
  • model regularization
    • small weights
    • sparsity

Preparation:

1. If you haven’t already downloaded Anaconda or another Python distribution, please do so.

2. View this video: Owen’s Regression Intro

3. Jupyter Notebook: Optimization & Regularization

Or check out the github repository for session 4 on Optimization and Regularization. github.com/pabloinsente/LUCID_data_workshop

Mixed Linear Models

Introduction to Mixed Linear Models with Melissa Schoenlein

Mixed linear models are a type of analysis used to evaluate data with non-independence that cannot otherwise be analyzed with regular linear regression.

What is non-independence/non-independent data?

Non-independence occurs when two or more data are connected (correlated) in some way. For example, you run an experiment collecting ratings on interest in math. Your participants make these ratings at the start of the semester, in the middle of the semester, and then again at the end of the semester. Each of these participants has three data points. These data points are non-independent since they come from the same person and thus are related in ways beyond the experimental procedure (i.e. points from one participant are more likely to be more similar to each other than data points from two different participants).

Non-independence can exist beyond repeated measures at the participant level to any items occurring within “units”, including students in classrooms, family members, etc.

Why/when should I use mixed linear models?

Using regular linear regression when data is non-independent can lead to inflated Type 1 error rates, less statistical power, and potentially inaccurate effects. A mixed linear model should be used anytime there is a non-independent relationship in the data.

List of ideas/concepts/tools that are associated with this topic

Hierarchical modeling, mixed modeling, linear mixed effects models, multilevel models, etc.

Nonindependence

Fixed versus random effects

lme4 package in R

Preparation:

In preparation for our video Wednesday 7/15 at 4pm CST, please watch and read the following materials.

  1. Watch videos 1-3, 11, and 16 from this multi-part video series providing a general overview of mixed models, when to use them, and how to interpret them (totals ~ 12 minutes). Video 11 focuses on repeated measures models, which will be the focus of our workshop.
  2. Skim through this online tutorial that provides a walkthrough of code and output of basic linear mixed effects models in R and why we use them.
  3. Skim through this very short cheat sheet of using the lme4 package in R to analyze mixed models.
  4. Install the following packages in R: lme4, ggplot2

Optional additional resources if you’re interested in learning more:

Videos:

A high-level video overview of mixed models (mostly framed in terms of hierarchical models). The first half of the video describes when/why someone would use these models. The second half starts to touch on the equations/math for these models.

Tutorials:

A Github repo with a 3-part workshop aimed at providing tutorials and exercises to learn how to do mixed models in R. The first part is a general intro to R. The second part is about statistical modeling (generally) in R. Then part 3 is mixed models in R.

A similar, but less comprehensive, tutorial demonstrating mixed models in both R and Python.

Papers:

This paper provides guidelines for how to create linear mixed effects models, including steps on how to decide what random effects to include and how to address convergence issues with a large number of parameters.

Jake Westfall, a former quantitative psychologist who now works in data science/analytics in industry, has curated a list of 13 helpful readings on mixed linear models.

 

Workshop will be led by Melissa Schoenlein. I can be reached at schoenlein@wisc.edu if there are any issues accessing these materials or if there are any questions (about the workshop, the PREP program, the department, or anything!). Looking forward to meeting this year’s PREPsters!

Data Visualization with Python in Jupyter Notebooks

Data Visualization with Python in Jupyter Notebooks with Pablo Caceres

Introduction
In this tutorial I will introduce Altair, which is a declarative statistical visualization library for Python based on Vega and Vega-Lite.
Altair provides an elegant and consistent API for statistical graphics. This library is built on top of the Vega-Lite high-level grammar for interactive graphics, which is based on the “grammar of graphics” idea proposed by Leland Wilkinson. Altair’s key strength is that it provides a clear mental model based on a set of graphical primitives and carefully designed combinatorial rules that yield an ample space of graphical displays, avoiding the constraints of chart taxonomies.
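
As a flavor of that API (a minimal sketch, not the tutorial itself), an Altair chart is built from data, a mark, and a set of encodings:

# Minimal Altair example: data -> mark -> encodings (illustrative only)
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "score": [52, 60, 65, 71, 80, 88],
    "group": ["a", "a", "a", "b", "b", "b"],
})

chart = (
    alt.Chart(df)
    .mark_point()
    .encode(x="hours_studied", y="score", color="group", tooltip=["hours_studied", "score"])
    .interactive()               # pan/zoom, in the spirit of Vega-Lite interactivity
)
chart.save("scatter.html")       # charts also render inline in Jupyter notebooks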
Preparation
Optional resources
We likely will not have enough time to follow along during the workshop, but you can find instructions to either run the examples online or to install the required packages and run the examples locally here:   https://github.com/pabloinsente/pydata_altair_tutorial

Cross Validation

Cross Validation with Sarah Sant’Ana

Cross validation is a common resampling technique used in machine learning studies. Broadly, cross validation involves splitting data into multiple training and testing subsets to increase generalizability of the model building and evaluation processes. There are multiple types of cross validation (e.g. k-fold, bootstrapped), but all serve two primary purposes:

  • To select the best model configurations (e.g. what type of statistical model will perform best, which sets of features will perform best, covariate selection, hyperparameter tuning, outlier identification approaches, predictor transformations, and more).
  • To evaluate the expected performance of our models in new data (i.e. on individuals who were never used in model building/selection)

Why should I use cross validation? 

You should use cross validation if..

  • You are fitting a statistical model with hyperparameters that need tuning (e.g. elastic-net logistic regression, random forests, svm)
  • You are considering multiple combinations of model configurations (e.g.  features, statistical algorithms, data transformations)
  • You want to consider a large number of predictive features or you do not want to rely on theory to guide identification of predictive features
  • You want to build predictive models that will generalize well to new data (i.e. you want your model to be applied in some way)

List of ideas, concepts, or tools that are associated with this topic

  • R/RStudio (especially the caret, tidymodels, and parsnip packages)
  • Python
  • Common types of cross validation (CV): bootstrapped CV, k-fold CV, nested CV
  • Basic knowledge of linear and logistic regression
  • Bias/variance trade offs in model fitting and evaluation
  • Generalizability of predictive models (why it’s important, how to prioritize it, and how to assess it)

In preparation for our meeting next Tuesday, please review the following materials:

During the meeting on Tuesday

  • Plan on a discussion about prediction vs explanation in psychological research. I want to help you think of how you might apply cross validation in your work if you are interested 😊
  • I will be walking us through the attached Cross Validation Markdown document (open the link, then download the file; Google will default to opening it as a Google Doc, which is not functional) to provide you some code for implementing cross validation. No need to read this beforehand, but you can have it open during the session if you’d like to follow along.
  • Feel free to send me any questions beforehand or ask during the session! Happy to talk research, data science, or grad school as would feel beneficial to you all. My email is skittleson@wisc.edu

Additional Materials (not required, just for your reference)

Books

Online tutorials (blogs and code examples):

  • This is an R Markdown file written by the creator of the caret package in R (one of the most used machine learning packages in R to date). It explains how to tune the various types of hyperparameters using CV within caret’s train function. Even if you don’t plan to use R, it is helpful for seeing what types of parameters are tuned for different models, and it provides examples of creating and evaluating search grids, alternate performance metrics, and more. Model training and tuning
  • This is a nice (but lengthy) R Markdown example of approaching a classic machine learning problem (product price estimation) and showcases hyperparameter tuning of a couple of different algorithms (and their comparison): Product Price Prediction: A Tidy Hyperparameter Tuning and Cross Validation Tutorial. This is geared towards a  more advanced beginner – It still walks you through everything, but incorporates more robust data cleaning and exploration before model fitting.

Videos:

  • This video is a good walkthrough using K-fold cross-validation in python to select optimal tuning parameters, choose between models, and select features: Selecting the best model in scikit-learn using cross-validation
  • A short 4-minute tutorial about how to tune various types of statistical learning models within cross validation using the caret package in R. It doesn’t discuss much of the theory and is more appropriate for application-focused users who are just trying to figure out how to implement parameter tuning within CV: R Tutorial – Hyperparameter tuning in caret

Papers:

  • This paper describes the impact of using different CV types for parameter selection and model evaluation: Bias in error estimation when using cross-validation for model selection. This requires an intermediate-level understanding of using CV for parameter selection. Many people using machine learning in applied contexts are using improper CV methods that bias their model performance estimates. We should be using nested CV (or bootstrap CV with a separate validation set) if we are planning to select model parameters and generate trustworthy performance metrics.
  • A really cool preprint that describes sources of bias in ML resampling methods due to incorrect application in psychological research: https://psyarxiv.com/2yber/. A more intermediate-level read because it requires some understanding of multiple types of CV methods.

Neural Networks

Neural Networks with Ray Doudlah

Machine learning and artificial intelligence technology is growing at an impressive rate. From robotics and self-driving cars to augmented reality devices and facial recognition software, models that make predictions from data are all around us. Many of these applications implement neural networks, a computational model inspired by the brain.

With recent advancements in computing power and the explosion of big data, we can now implement large models that are capable of learning how to accomplish a task by themselves, just by looking at the data you feed them. These deep learning models learn to extract features that the model finds important for accomplishing the task.
In this week’s session we will be learning about neural networks, and get to play with a convolutional neural network, a model that is used in machine vision, object recognition, and self-driving cars. The topic of neural networks is very broad, so my goal is to give you a brief overview and provide you with enough resources so that you can learn more about specific models that may be applicable to your scientific work. I also want to give you some hands-on practice with running a pre-built model so you can get an intuition for what these models are doing under the hood.
Session outline:
  • Introduce neural networks and their general architecture
  • Introduce convolutional neural networks
  • Implement a convolutional neural network to solve a handwriting recognition task (a generic sketch follows below)
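
For reference, a small CNN of the kind used for handwritten-digit recognition might be defined in PyTorch like this (a generic sketch, not the session’s actual model):

# A small convolutional network for 28x28 grayscale digit images (generic sketch)
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)   # ten digit classes

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = SmallCNN()
dummy = torch.zeros(1, 1, 28, 28)   # one fake digit image
print(model(dummy).shape)           # torch.Size([1, 10]): one score per digit class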
Preparation for the workshop:

Overleaf

Overleaf by Glenn Palmer

LaTeX is a typesetting system that can be used to write academic papers and create professional-looking documents. Users type in plain text format, but mark up the text with tagging conventions, and the nicely-formatted result is shown in an output file. Overleaf is an online platform that can be used to create and edit LaTeX documents. You can share and simultaneously edit documents with collaborators, similar to the way you collaborate on a Google Doc.

For a high-level overview of LaTeX, Overleaf, and the resources below watch this video:

Videos

  • This playlist of videos is a good starting place. They were made by a company called ShareLaTeX, which recently merged with Overleaf. These videos give a good idea of how to get started using LaTeX with an online editing system.

Online tutorials

  • For more detail, and/or for a range of written tutorials, the Overleaf documentation page has a wide range of information to help get started, or to answer specific questions you might have as you get used to using LaTeX.

Cheat sheet

  • For a quick reference as you’re writing, this cheat sheet includes a bunch of commands for various formatting options, with a focus on writing scientific papers.

Resources and Sessions from 2019:


Introduction to Data Science with R

Session 1: Introduction to data science with R with Tim Rogers

This session will introduce you to working with data in an “integrated development environment” or IDE using the freely available and widely-used software package R. We will briefly discuss what is meant by the term “data science,” why data science is increasingly important in Psychology and Neuroscience, and how it differs from traditional statistical analysis. We will then get a sense for how IDEs work by building, from data generated in the workshop, an interactive graph showing the structure of your mental semantic network.

Preparation for the workshop: (TO DO before arriving on Tuesday!)
– Install R, R Studio, and Swirl on your laptop following the instructions here: swirlstats.com/students
– Start Swirl as instructed at the website and install the first course module by following the prompts
– Run yourself through the first course module

Time to complete: 45-60 minutes. Feel free to work with a partner or in groups!

Overview:
We learned how to create semantic clusters from lists of animals. Tim created this Semantic Network Demo where you can view the interactive graph and get the code that was used to generate the semantic clusters. The demo walks through the process of building and visualizing graphs.

Using Github & Jupyter Notebooks

Session 2: Using Github, JuPyteR notebooks in several data science environments with Pablo Caceres

In this session we will set up several data science tools and environments: ATOM text editor, Python with Anaconda, Jupyter Notebooks/lab, IRKernel (to run R on Jupyter), Git (Mac/Linux) or GitBash (Windows), GitHub account, GitHub Repository, and a folder system. Then we will go over the basics of how to open, run and test each tool.

Preparation for the workshop:

– Download and install Atom text editor from atom.io
– Download and install Git* from git-scm.com/downloads
Windows users: when installing git, make sure you have the ‘GitBash Here’ selected.
Overview: 
Pablo created a Github Repository for the workshop. If you click on session 2 you will find all the topics that Pablo covered in this session. Stay tuned as we plan to update this repository with more content.

Fitting & Evaluating Linear Models

Session 3: Fitting and evaluating linear models with John Binzak

This session will introduce you to working with linear regression models using R. We will briefly discuss why linear regression is useful for Psychology and Educational research, using the topic of numerical cognition as an example. We will play an educational game to generate our own data in the workshop, form predictions, and test those predictions by modeling gameplay performance. Through this exercise we will cover how to fit linear regression models, assess the fit of those models, plot linear relationships, and draw statistical inferences.

Preparation for the workshop: 
– Be ready to use R, RStudio, and Swirl on your laptop following the instructions here: swirlstats.com/students
– Install the “Regression Models” swirl module using the following commands in R:
> library(swirl)
> swirl::install_course("Regression Models")
> swirl()

– Run yourself through lessons 1-6 (Introduction-MultiVar Examples) and continue based on your interest.

Time to complete: 45-60 minutes. Feel free to work with a partner or in groups!

Optimization & Model Regularization

Session 4: Optimization and model regularization with Owen Levin

Linear regression and many other model fitting problems can be viewed mathematically as solutions to optimization problems.  We’ll explore how this can help generalize our models as well as how we can introduce regularization to emphasize fitting models with special properties.

  • linear regression as an optimization problem
    • introduce loss functions
  • curve fitting as optimization
  • Is a perfect fit actually perfect? (wacky zero loss examples)
  • model regularization
    • small weights
    • sparsity

Preparation: If you haven’t already downloaded Anaconda or another Python distribution, please do so.

Overview: Please check out the github repository for session 4 on Optimization and Regularization. github.com/pabloinsente/LUCID_data_workshop

Pattern Recognition & Varieties of Machine Learning

Session 5: Pattern recognition and varieties of machine learning with Ashley Hou

Owen and Ashley will be co-facilitating this session.

This session will introduce basic concepts in machine learning. We will first discuss an overview of the steps involved in the machine learning process and the two main categories of machine learning problems. Then, we will walk through examples in both supervised and unsupervised learning, specifically classification using SVMs (discussing the regularization perspective) and clustering using the k-means clustering algorithm. We will conclude with brief discussion on other popular machine learning algorithms, when to use them, and good resources to learn more.
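
For the unsupervised half, a minimal k-means sketch with scikit-learn (toy data, added here for illustration) looks like this:

# Minimal k-means clustering example (scikit-learn; illustrative only)
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # toy 2-D data with 3 clusters

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)    # one center per discovered cluster
print(kmeans.labels_[:10])        # cluster assignment for the first 10 points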

Preparation for the workshop:
  1. Review session 4’s overview.
  2. Have a working Python 3 distribution with scikit-learn, matplotlib, numpy, pandas, and Jupyter Notebook installed.

Cross-Validation

Session 6: Cross-validation with Sarah Sant’Ana

Today’s session will introduce the concept of cross validation. Using instructional videos from the Datacamp Machine Learning toolbox, we will walk through basic examples of cross validation in R using the caret package. We will be using two publicly available data sets in R for example code.

Our goals for this session are :

1) Learn why cross validation is important
2) Learn the basic steps of k-fold cross validation and repeated k-fold cross validation
3) Provide you with basic code to use on your own

Preparation for the workshop:

– Be ready to use R and RStudio

– Read Yarkoni & Westfall (2017) through page 5. You can stop reading at “Balancing Flexibility and Robustness: Basic Principles of Machine Learning.” The purpose of this article is just to get you thinking about the discussion we will have during the session – it is not necessary to have a crystal clear understanding!

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122.

Neural Networks

Session 7: Neural Networks with Ray Doudlah

Machine learning and artificial intelligence technology is growing at an impressive rate. From robotics and self-driving cars to augmented reality devices and facial recognition software, models that make predictions from data are all around us. Many of these applications implement neural networks, which basically allows the computer to analyze data similar to the way the human brain analyzes data.

With recent advancements in computing power and the explosion of big data, we can now implement large models that perform end-to-end learning (deep learning). This means that we can create a model, feed it tons and tons of data, and the model will learn features from the data that are important for accomplishing the task.

Session outline:
• Introduce the simplest neural network, the perceptron
• Discuss the general architecture for neural networks
• Implement a neural network to solve a handwriting recognition task
• Introduce deep learning (convolutional neural networks)
• Implement a deep neural network to solve a handwriting recognition task

Preparation for the workshop:

  1. Watch the following videos:
  2. Pull session 7 materials from GitHub

Bayesian Inference

Session 8: Bayesian Inference: estimating unobservable variables with Lowell Thompson

This session will focus on introducing the utility of a common statistical method known as Bayesian inference. We’ll focus first on Bayes’ Theorem and learn how it relates to our understanding of perception as an inverse problem. Since the majority of research in perception relies on various psychophysical methodologies to assess behavior, we’ll also walk through how you might generate your own experiments in Python using a package called PsychoPy. After obtaining some data, we’ll look at a specific example that illustrates the utility of Bayesian inference in modeling our own behavioral data. Lastly, we’ll go over Bayesian inference in the broader context of data science.
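
As a tiny numerical illustration of the theorem (added here, not part of the session materials): with a Gaussian prior over a stimulus orientation and a Gaussian likelihood for a noisy measurement, the posterior is just their normalized product, and its mean sits between the prior and the data.

# Bayes' rule on a grid: posterior over orientation = prior * likelihood, normalized
import numpy as np

theta = np.linspace(-45, 45, 901)                        # candidate orientations (degrees)
prior = np.exp(-0.5 * (theta / 15.0) ** 2)               # prior belief: orientations near 0 deg
measurement = 10.0                                       # noisy observed orientation
likelihood = np.exp(-0.5 * ((theta - measurement) / 5.0) ** 2)

posterior = prior * likelihood
posterior /= np.trapz(posterior, theta)                  # normalize so it integrates to 1

print("posterior mean:", np.trapz(theta * posterior, theta))   # pulled from 10 toward 0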

Session Outline:

  1. Introduce Bayes Theorem
  2. Understand the utility of Bayesian inference in a variety of contexts
  3. Learn the basics of Psychopy to create basic experiments
  4. Use your own data from an orientation discrimination task to illustrate how Bayesian inference can be used.

Preparation: Please try and install Psychopy on your computer prior to the session, and try running one of their tutorials to make sure it works: Psychopy