## Minds, Machines & Society

Free Registration at lucid.wisc.edu/events

Posted in Events

## How Can Machine Learning Improve Educational Technologies?

by Blake Mason, John Vito Binzak and Fangyun (Olivia) Zhao

As computer-based technologies continue to establish a significant presence in modern education, it becomes increasingly important to understand how to improve the ways that these technologies present educational content and adapt to learners. Achieving these goals in the design of cognitive tutors, instructional websites, and educational games leads to difficult questions. For example, what are the best activities and examples that will help learners understand the educational content? In what order should these examples be given? How can these activities focus learners’ attention in productive ways? These are difficult questions for instructional designers to answer on their own. Here at UW-Madison, educational researchers are teaming up with computer science experts in machine learning to test theories about how design decisions can optimize the learning technologies. There are many ways that machine learning techniques can be used to improve the design and effectiveness of educational technologies.

##### Here we outline two interesting examples:

Visual Representations of the Water Molecule

One example of LUCID students applying machine learning techniques to educational problems is our ongoing project studying how students perceive chemical properties from molecular diagrams. To succeed in chemistry courses, students need to develop perceptual fluency with visual representations of molecules to understand how they convey important properties.

A challenge for designing instructional interventions to support this learning is that acquiring perceptual fluency is a form of implicit learning that occurs in ways students are not consciously aware of. This form of knowledge is therefore difficult to articulate, and we cannot rely on traditional methods to pinpoint what students do and do not understand. To get around this issue, we can design experiments to see how advanced and novice students see visual representations of molecules differently. First, we take a long list of molecules and record all of the visual features of each of the molecules.

ChemTutor Interface

Then, we have students judge the similarity of molecules presented three at a time: “Is molecule a more similar to b or to c?” Finally, using a specific form of machine learning called metric learning, we see which visual features predict students’ similarity judgments, and thus detect which features students attend to when viewing visual representations of molecules. By comparing the results of chemistry experts and novice students, we hope to build a better understanding of how perceptual fluency changes with experience. In the ongoing ChemTutor project at UW, we hope to use this knowledge to develop new cognitive tutors capable of providing adaptive feedback that helps students identify and focus on key visual features of molecular diagrams.
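To make the triplet idea concrete, here is a minimal sketch of one simple kind of metric learning. It is not the ChemTutor project's actual method: the feature vectors, the simulated "student" who attends only to the first feature, and the function names are all hypothetical. The sketch learns one non-negative weight per visual feature so that the weighted distance agrees with the "a is more similar to b than to c" judgments.

```python
import math
import random


def weighted_sq_dist(w, x, y):
    """Squared distance between x and y with one weight per feature."""
    return sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))


def learn_feature_weights(items, triplets, steps=300, lr=0.5):
    """Fit non-negative feature weights to triplets (a, b, c), each meaning
    'item a was judged more similar to item b than to item c'.
    Uses gradient descent on a logistic loss over the distance margin."""
    n_features = len(items[0])
    w = [1.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for a, b, c in triplets:
            xa, xb, xc = items[a], items[b], items[c]
            # Positive margin means the current metric disagrees with the judgment.
            margin = weighted_sq_dist(w, xa, xb) - weighted_sq_dist(w, xa, xc)
            p = 1.0 / (1.0 + math.exp(-margin))
            for k in range(n_features):
                grad[k] += p * ((xa[k] - xb[k]) ** 2 - (xa[k] - xc[k]) ** 2)
        # Average the gradient, take a step, and clip weights at zero.
        w = [max(0.0, wk - lr * gk / len(triplets)) for wk, gk in zip(w, grad)]
    return w


# Simulated data: 30 "molecules" with 3 made-up visual features each.
rng = random.Random(0)
items = [[rng.random() for _ in range(3)] for _ in range(30)]

# A simulated student who judges similarity using feature 0 only.
triplets = []
for _ in range(300):
    a, b, c = rng.sample(range(len(items)), 3)
    if abs(items[a][0] - items[b][0]) > abs(items[a][0] - items[c][0]):
        b, c = c, b  # reorder so that b is the item judged closer to a
    triplets.append((a, b, c))

weights = learn_feature_weights(items, triplets)
print([round(wk, 2) for wk in weights])
```

In this toy setup the largest learned weight lands on the feature the simulated student actually used, which is the sense in which the fitted metric reveals what a viewer attends to.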

Another example of creating interactive models for teaching and learning combines the strengths of eye-tracking technologies with machine learning algorithms. Current educational software focuses heavily on features that attempt to attract students’ interest and raise motivation. We are interested in developing a tool that learns from and adapts to students’ habits, such as where they tend to look on the screen and how they become distracted. Using eye-tracking technology, we can interpret gaze fixation to understand where students focus their attention, and then customize instructional materials accordingly. In addition to making better educational technologies, this work is also important for researchers studying human attention. Specifically, researchers are interested in understanding how changes in gaze fixation relate to shifts in attention, and in using this data to develop models that predict gaze behaviors. Through multiple phases of development, this project demonstrates how improving education in powerful ways can involve everything from research on low-level cognitive processes to software development and user testing.

Posted in LUCID, Machine Learning

## Interactive Science Communication

##### by Scott Sievert and Purav “Jay” Patel

Scientific research is still communicated with static text and images despite innovations in learning theories and technology. We are envisioning a future in which interactive simulations and visualizations are used to enhance how complex methods, procedures, and results are taught to other scientists and non-scientists.

### Problem

Scientists in various fields ask questions about different things, but they share the same basic mode of communication. Nearly all research is communicated to other scientists, journalists, and the public in the form of static text and images encapsulated in journal articles and conference papers. These papers can be found in print and online for a high price. Recently, the “open science” movement has focused on broadening the number of people who can access these papers by eliminating the access fees for these (often) pricey papers. But there’s a problem. Even if all scientific papers are free and convenient for everyone to physically access, they will still not be cognitively accessible. In other words, the number of people who can understand the meaning of most scientific papers (even superficially) will be low.

### Consequences

We believe that research should be conveyed directly to the public. At the moment, the traditional text and images approach makes it difficult to do so. Because of this, science journalism acts as an intermediary, translating the jargon for the public to appreciate (see phys.org, newspapers, and educational videos).

But this is not always a good thing. Science journalism is often misleading (see John Oliver’s takedown) because its focus is to hook the audience and drive viewership and readership. Major change is needed because scientific literacy shapes how people understand and act on issues involving the environment, their bodies, and voting. Scientists need better ways to communicate their ideas and learn about others’ work. By one estimate, the total volume of scientific research (measured in the number of published papers) doubles every nine years.

For young and seasoned scientists in any field, this is a major bottleneck toward producing great science. How can the best discoveries be made if scientists are unable to keep up with the latest findings? But how can technical research be communicated effectively without “dumbing it down?” Below, we explore some promising ideas.

### Suggestion 1: Interactive Simulations of Experimental Tasks

One problem with traditional research papers is their use of dense text and cluttered visuals to convey the materials used in a study. For example, I led a study during my master’s to understand how typical undergraduates comprehend irrational numbers. This study used four tasks, each including subsections. When I wrote up the scientific paper and sent it to the journal Cognitive Science, there was a constraint: use only text and images. Of course, I had the option of including hyperlinks to the website hosting my experimental simulations. And I had the option of uploading the simulations to another website (Open Science Framework – osf.io). I did both of these things, but felt that most readers wouldn’t go through the trouble of visiting those websites. When I got the reviews back from the editor of the journal, I found clues in their comments that no one actually used these websites. The best way to communicate the experimental tasks and procedures effectively is to directly embed them into scholarly articles. Some programmers, user interface designers, and scientists in computer science have played with this idea. I was particularly inspired by these three examples:

What do these examples all have in common? In each case, the authors could have used blocks of clearly written text to communicate complex visuospatial relationships. Instead, they chose to bring the concepts to life and allow users to manipulate the information in ways that are more natural. It is as if an expert is beside you, drawing and explaining otherwise obscure ideas.

So far, prototypes of interactive scientific papers have focused on math and computing (formal sciences). At the start of my first semester of an Educational Psychology PhD, I wondered how these ideas could apply to psychological papers like the one I wrote for my master’s thesis. With the help of three Computer Science students in a Human-Computer Interaction class, I set about embedding experimental simulations in a webpage containing my scientific article. The paper was titled How the Abstract Becomes Concrete: Irrational Numbers are Understood Relative to Natural Numbers and Perfect Squares. I started by reducing the length of the paper to 10% of the original size (start small!). I helped develop each simulation corresponding to different sections of the tasks.

This is what the original section of my scientific paper looked like. Though it makes sense to most people in the field of numerical cognition, it is hard for people outside of the field to say with confidence that the description is clear.

Now let’s have a look at the interactive simulation that enhances the text:

Traditional scientific papers give users a third-person glance into experiments, whereas we aim to provide users a first-person view into what the authors did. From a technological perspective, this change is fairly simple. Existing experimental code can be embedded into digital documents, saving authors valuable time. Moreover, any extra effort required can pay off significantly down the road. The enhanced paper is now easier for scientists, friends, family members, potential employers, collaborators, students, autodidacts, and many others to understand.

Currently, most experimental tasks live inside hard-to-access supplementary materials documents that are usually not accessed. The work that goes into creating these documents often doesn’t bear fruit. By directly embedding experimental simulations we can “show and tell” what happened and rescue authors from wasting time on supplementary materials.

It is not difficult to imagine how to extend this to other kinds of tasks with different kinds of input. In the interactive article mentioned earlier, the complexity of the tasks varies from simple keyboard responses (Z or M key) to mouse clicks and text input. It is also not difficult to consider embedding simulations of experimental tasks in fields like neuroscience and education. Health science fields like physical therapy and surgery may benefit from videos and simulations to practice new therapeutic procedures.

### Suggestion 2: Interactive Data Visualizations

Interactive visualizations are useful because they allow the user to see the results of some experiment or function as parameters under their control are changed. This is super useful for data exploration and explanation and it’s picking up a lot of steam. These tools can allow users to interact with the results of scientific experiments or analyze how a particular method performs. Examples of tools that can create these interactive figures are ipywidgets and Holoviews.

These tools allow simple creation of interactive tools in Python, a high-level language. They make data exploration easy and can be embedded in a variety of contexts. In the future, one could imagine generating visualizations that show group-level trends with the option to quickly change the visual so that individual differences across participants are transparent. For instance, consider an interactive visualization showing a density plot of participants’ scores on a cognitive task. By clicking buttons or dragging a slider, the user could abstract out more general information in the form of a violin plot, then a boxplot. Abstracting out even further, the user could change the original plot to a bar plot or table. Given that journals and conferences adhere to strict (and problematic) word and page limits, creative interactive data visualizations capable of communicating vast amounts of information in small spaces would improve the quality of scientific articles. Authors would not need to question which one of a dozen analyses and results to focus on. By using rich multimedia, more information can be presented without overwhelming users.
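The "levels of abstraction" idea can be sketched without any plotting library. The snippet below is only an illustration: the `summarize` function, its level names, and the simulated scores are hypothetical. It computes the numbers that a density plot, a boxplot, and a bar plot would each display for the same data. In a notebook, a widget such as a dropdown or slider could be wired to the `level` argument; that wiring is left out here.

```python
import random
import statistics


def summarize(scores, level, bins=10):
    """Return the numbers behind one view of the same scores.

    level: "density" (histogram counts), "boxplot" (five-number summary),
           or "bar" (mean and standard error).
    """
    if level == "density":
        lo, hi = min(scores), max(scores)
        width = (hi - lo) / bins or 1.0  # avoid a zero width if all scores are equal
        counts = [0] * bins
        for s in scores:
            counts[min(int((s - lo) / width), bins - 1)] += 1
        return {"bin_width": width, "counts": counts}
    if level == "boxplot":
        q1, median, q3 = statistics.quantiles(scores, n=4)
        return {"min": min(scores), "q1": q1, "median": median,
                "q3": q3, "max": max(scores)}
    if level == "bar":
        mean = statistics.fmean(scores)
        sem = statistics.stdev(scores) / len(scores) ** 0.5
        return {"mean": mean, "sem": sem}
    raise ValueError(f"unknown level: {level}")


# Simulated scores for 200 participants on a made-up cognitive task.
rng = random.Random(42)
scores = [rng.gauss(70, 10) for _ in range(200)]

for level in ("density", "boxplot", "bar"):
    print(level, summarize(scores, level))
```

Each step down the list throws away detail: the histogram keeps the shape of the distribution, the five-number summary keeps only its spread, and the bar view keeps a single mean with its uncertainty.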

One scientific journal that uses interactive graphics heavily is Distill. The team at Distill works with authors to create a set of interactive graphics for their article that highlight certain points. This has lowered the barrier to reading and understanding their papers. Interactive media can convey subtle points that are not immediately obvious.

### Conclusion

This example can be extended easily. Communication in science is critical, but surprisingly difficult. Scientists need to communicate their methods, results and experimental design so other scientists can reproduce their results. Allowing quicker and more accurate communication between scientists can yield more robust experimental results and fuel collaboration. These ideas are espoused by arXiv, a digital, open-access scientific publication platform developed by physicists. More recently, websites like the Open Science Framework (OSF) and Authorea have offered tools that enable open-access dissemination of scholarly research. The OSF archives preprints, experimental materials, and data across different fields for anyone to access. Authorea is a new digital authoring tool for scientists to collaborate online and embed interactive figures into their manuscripts. This tool also helps format manuscripts for the needs specified by journals and conferences, saving tedious editing. It is possible that one of these platforms (or another like it) will play a large role in developing the scientific documents of the future.

In addition to the sources listed above, consider checking out the following examples of multimedia-enhanced scholarly communication:

1. Distill.pub description of benefits: https://distill.pub/about/
2. “Building blocks of interactive computing”, the introduction of JupyterLab: https://www.youtube.com/watch?v=Ejh0ftSjk6g
3. Jupyter widgets, which has live (!) examples of ipywidgets and other libraries: http://jupyter.org/widgets.html
4. Explorable Explanation: http://worrydream.com/ExplorableExplanations/
5. Interactive PhD Thesis: http://tomasp.net/coeffects/
6. Journal of Visualized Experiments: https://www.jove.com/

Posted in LUCID, Machine Learning, Resources, Uncategorized

## Singular Value Decomposition (SVD)

By Lowell Thompson and Ashley Hou

This is a collaborative tutorial aimed at simplifying a common machine learning method known as singular value decomposition. Learn how these techniques impact computational neuroscience research as well!

Singular value decomposition is a method to factorize an arbitrary $m \times n$ matrix, $A$, into two orthogonal matrices, $U$ and $V$, and a diagonal matrix, $\Sigma$, so that $A = U\Sigma V^T$. The diagonal entries of $\Sigma$, called singular values, are non-negative and arranged in decreasing order. The columns of $U$ and $V$ are the left and right singular vectors, respectively. Therefore, we can express $U \Sigma V^T$ as a weighted sum of outer products of the corresponding left and right singular vectors, $\sum_i \sigma_i u_i v_i^T$.

In neuroscience applications, we have a matrix $R$ of the firing rate of a given neuron, where the first dimension represents different frontoparallel motion directions and the second represents disparity (a measure that helps determine the depth of an object). Relationships between preferences for direction and depth could serve as a potential mechanism underlying 3D motion perception. SVD can be used to determine whether a neuron’s joint tuning function for these properties is separable or inseparable. Separability entails a constant relationship between the two properties: a particular direction preference will be maintained across all disparity levels, and vice versa. If this were the case, then all of the vectors in the firing matrix could be described in terms of a single linearly independent vector, or function. This is also known as a rank 1 matrix.

Using SVD, we can approximate $R$ by $\sigma_1 u_1 v_1^T$, which is obtained by truncating the sum after the first singular value. This is a rank-1 approximation of $R$. If $R$ is fully separable in direction and disparity, only the first singular value will be non-zero, indicating the matrix is of rank 1, and $\sigma_1 u_1 v_1^T$ will then equal $R$ exactly. In general, the closer $R$ is to being separable, the more dominant the first singular value $\sigma_1$ will be over the other singular values, and the closer the approximation $\sigma_1 u_1 v_1^T$ will be to the original matrix $R$.
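For readers who want to try this outside the interactive figures, here is a minimal NumPy sketch. The tuning curves are made up for illustration (they are not recorded data, and the specific cosine and Gaussian shapes are assumptions); only `numpy.linalg.svd` is the real workhorse. It builds one separable and one inseparable firing-rate matrix and measures how well $\sigma_1 u_1 v_1^T$ reproduces each.

```python
import numpy as np

# Hypothetical stimulus grid: 8 motion directions by 7 disparity levels.
directions = np.deg2rad(np.arange(0, 360, 45))
disparities = np.linspace(-1.5, 1.5, 7)


def rank1_relative_error(R):
    """Relative Frobenius error of the rank-1 approximation sigma_1 * u_1 * v_1^T."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    approx = s[0] * np.outer(U[:, 0], Vt[0, :])
    return np.linalg.norm(R - approx) / np.linalg.norm(R)


# Separable neuron: firing rate = (direction tuning) x (disparity tuning).
direction_tuning = 1.0 + np.cos(directions - np.pi / 2)
disparity_tuning = np.exp(-(disparities + 0.3) ** 2 / 0.5)
R_separable = 20.0 * np.outer(direction_tuning, disparity_tuning)

# Inseparable neuron: the preferred disparity shifts with motion direction,
# so opposite directions prefer opposite-signed disparities.
preferred = 0.5 * np.cos(directions)
R_inseparable = 20.0 * np.exp(
    -(disparities[None, :] - preferred[:, None]) ** 2 / 0.5
)

err_separable = rank1_relative_error(R_separable)
err_inseparable = rank1_relative_error(R_inseparable)
print(f"separable:   {err_separable:.2e}")
print(f"inseparable: {err_inseparable:.2e}")
```

The separable matrix is recovered up to floating-point rounding, while the inseparable one leaves a clearly non-zero residual because a single outer product cannot capture a disparity preference that moves with direction.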

Below is an interactive example that can help you visualize this concept. On the left is the representation of a neuron’s joint tuning function, given by a matrix whose rows and columns are defined by the “Direction” and “Disparity” properties. The values within each cell of the matrix are an example neuron’s firing rate for the given combination of these properties. On the right, we have altered the representation of the matrix by plotting the firing rate of the neuron across different disparity levels for each direction of lateral motion. The peak firing rate is deemed the neuron’s disparity preference at that particular frontoparallel motion direction. Notice that regardless of motion direction, this neuron maintains a similar, slightly negative disparity preference (preferring objects near the observer). These cell types are predominantly found in the middle temporal (MT) cortex of rhesus macaques, an area of the brain that seems to be specialized for both 2D and 3D motion processing (Smolyanskaya, Ruff, & Born, 2013; Sanada & DeAngelis, 2014).

Using the slider at the bottom of the graph, you can manipulate the example neuron’s separability. As you move the slider to the far right side, representing the largest degree of inseparability for this example, the disparity tuning curves develop a peculiar pattern. That is, for directions of motion that are nearly opposite to one another (~180 degrees apart), the disparity preference of the neuron is flipped. These types of neurons are predominantly found in an area that lies just above MT in the cortical hierarchy, the medial superior temporal area (MST). Cells of this type have been deemed “direction-dependent disparity selectivity” (DDD) neurons, and are potentially useful in differentiating self-motion from object-motion, although this hypothesis has not been critically evaluated (Roy et al., 1992b; Roy & Wurtz, 1990; Yang et al., 2011).

Another plot displayed below illustrates how the singular values of a matrix change depending on the cell’s separability. Notice that as the cell becomes less separable, the magnitude of the first singular value decreases, and the contribution of the other singular values begins to increase. The inset plot illustrates this using a common metric for evaluating separability, known as the degree of inseparability. This is simply the ratio of the first singular value to the sum of all the singular values.
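The ratio described here takes only a couple of lines to compute. The sketch below assumes a made-up tuning model in which a hypothetical `flip_strength` parameter plays the role of the slider; how the actual interactive figure generates its matrices is not specified in this post.

```python
import numpy as np

# Hypothetical stimulus grid: 8 motion directions by 7 disparity levels.
directions = np.deg2rad(np.arange(0, 360, 45))
disparities = np.linspace(-1.5, 1.5, 7)


def first_singular_value_ratio(R):
    """Ratio of the first singular value to the sum of all singular values."""
    s = np.linalg.svd(R, compute_uv=False)
    return s[0] / s.sum()


def tuning_matrix(flip_strength):
    """Made-up joint tuning: the preferred disparity moves with cos(direction).

    flip_strength = 0 gives identical disparity tuning for every direction
    (separable); larger values make opposite directions prefer opposite
    disparities (inseparable).
    """
    preferred = flip_strength * np.cos(directions)
    return np.exp(-(disparities[None, :] - preferred[:, None]) ** 2 / 0.5)


ratios = {
    strength: first_singular_value_ratio(tuning_matrix(strength))
    for strength in (0.0, 0.25, 0.5)
}
for strength, ratio in ratios.items():
    print(f"flip_strength={strength:.2f}  ratio={ratio:.3f}")
```

With `flip_strength` at zero every row of the matrix is the same curve, so the matrix has rank 1 and the ratio is 1; once the preference starts to flip, the other singular values take a share and the ratio drops below 1.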

Lastly, we’ve provided another interactive graph where the left portion is the same example neuron from the previous graph. On the right is the prediction generated from $\sigma_1 u_1 v_1^T$. As you move the slider to the right, increasing the degree of inseparability, you’ll notice the prediction becomes increasingly dissimilar to the actual firing matrix.

Posted in LUCID, Machine Learning, Resources

## What is a Computational Cognitive Model?

By Rui Meng and Ayon Sen

A computational cognitive model explores the essence of cognition and various cognitive functionalities by developing a detailed, process-based understanding of them, specified in the form of computational models.

Sylvain Baillet discusses various aspects of cognitive computation models

A computational model is a mathematical model that uses computation to study complex systems. Typically one sets up a simulation with the desired parameters and lets the computer run. One then looks at the output to interpret the behavior of the model.

Computational cognitive models are computational models used in the field of cognitive science. Models in cognitive science can be generally categorized into computational, mathematical or verbal-conceptual models. At present, computational modeling appears to be the most promising approach in many respects and offers more flexibility and expressive power than other approaches.

Computational models are mostly process-based theories, i.e., they try to answer how human performance comes about and by what psychological mechanism.

A general structure of a Neural Network

Among the different computational cognitive models, neural networks are by far the most commonly used connectionist model today. The prevailing connectionist approach today was originally known as parallel distributed processing (PDP). It is an artificial neural network approach that stresses the parallel nature of neural processing and the distributed nature of neural representation. It is now common to fully equate PDP and connectionism.

A general structure for the Recurrent Neural Network

Neural networks were inspired by the structure of human brains. One particular variant of the neural network is called the recurrent neural network. It embodies the philosophy that learning requires remembering. Recurrent neural networks are also becoming a popular computational model for cognitive sciences.

Try it yourself: Play with a neural network to see how it works. A Neural Network Playground

#### Real World Example: Optimizing Teaching

Computational cognitive models are used for tasks that are hard or impossible to do with lab experiments, e.g., when too many people would be needed for the experiment to be feasible. For example, let us assume a teacher wants to teach 100 problems to children in class, but due to time constraints she can only teach 30 of them. The teacher would like to select the 30 problems from which the children learn the most, i.e., after which they perform well on all 100 problems. Note that there are C(100, 30) ≈ 2.94e25 possible question sets. To evaluate the question sets, the teacher would need to teach each question set to one group of children and evaluate their performance. If there are 30 children in each class, the total number of children required would be 30 × 2.94e25 ≈ 8.81e26, which is far more than could ever be recruited. This makes identifying an optimal question set by experiment infeasible.
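The counts in this example are easy to check with Python's standard library; `math.comb` computes the binomial coefficient exactly.

```python
import math

n_question_sets = math.comb(100, 30)      # ways to choose 30 of 100 problems
children_needed = 30 * n_question_sets    # one class of 30 children per question set

print(f"question sets:   {n_question_sets:.2e}")
print(f"children needed: {children_needed:.2e}")
```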

This task can be simplified if a cognitive computational model of children for that particular task can be devised. Then the teacher only needs to test the question sets on the cognitive model to figure out which one is the best. This saves a lot of time and is feasible.
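As a rough sketch of what "testing question sets on the model" could look like, the snippet below uses a deliberately simple, hypothetical cognitive model: each problem requires a few skills, a simulated child learns every skill in the problems they are taught, and they can then solve any problem whose skills they have all learned. Both this model and the greedy search are illustrative assumptions, not a description of any particular study.

```python
import random

rng = random.Random(1)
N_PROBLEMS, N_SKILLS, BUDGET = 100, 60, 30

# Hypothetical problem bank: each problem requires 1 to 3 of the skills.
problems = [
    frozenset(rng.sample(range(N_SKILLS), rng.randint(1, 3)))
    for _ in range(N_PROBLEMS)
]


def simulated_score(taught):
    """Toy cognitive model: number of the 100 problems the simulated child
    can solve after being taught the problems whose indices are in `taught`."""
    learned = set()
    for i in taught:
        learned |= problems[i]
    return sum(1 for skills in problems if skills <= learned)


def greedy_teaching_set(budget):
    """Repeatedly add the problem that most improves the simulated score."""
    chosen = []
    remaining = set(range(N_PROBLEMS))
    for _ in range(budget):
        best = max(sorted(remaining), key=lambda i: simulated_score(chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


chosen = greedy_teaching_set(BUDGET)
baseline = rng.sample(range(N_PROBLEMS), BUDGET)
print("greedy set score:", simulated_score(chosen))
print("random set score:", simulated_score(baseline))
```

The search queries the model a few thousand times, which takes well under a second, whereas each of those queries would otherwise be a classroom experiment. A greedy search is not guaranteed to find the best possible set; it is used here only because it is short and easy to follow.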

Posted in LUCID Library, Resources

## Martina Rau on Learning with Visuals

LUCID faculty member Martina Rau, an Educational Psychology professor, director of the Learning, Representations, & Technology Lab, and Computer Sciences affiliate, is interested in educational technologies that support more effective learning with visuals.

While we generally think of visuals as helpful tools, Rau highlights the fact that visuals can be confusing if students do not know how to interpret visuals, construct visuals, or make connections among multiple visuals.

She created a video blog, Learning with Visuals, that aims to translate the research conducted on campus and share findings on teaching and learning with visuals in a more accessible and useful form. She aims to help students looking for effective study strategies, parents wanting to help their children learn, teachers who notice that students often have difficulties understanding visuals, and of course prospective researchers in educational psychology.

##### Involving Students from Multiple Disciplines in Research

Here she discusses the importance of students from multiple disciplines learning from each other and gaining an understanding of what it takes to conduct research in a different field.

##### Collaborating with Visuals

Here she discusses how educational technologies can support students to more effectively collaborate with visuals.

##### Translating Research into Everyday Language

Here she discusses the importance of finding new ways to learn and teach with visuals as well as how she plans to make her research accessible to non-scientists, such as teachers, parents, and students.

Posted in LUCID Library, Resources

## Poolmate: Pool-Based Machine Teaching

Poolmate provides a command-line interface to algorithms for searching teaching sets among a candidate pool. Poolmate is designed to work with any learner which can be communicated with through a file-based API.

Developed by Ara Vartanian (aravart@cs.wisc.edu), Scott Alfeld (salfeld@amherst.edu), Ayon Sen (ayonsn@cs.wisc.edu) and Jerry Zhu (jerryzhu@cs.wisc.edu). Poolmate details can be found on aravart or GitHub.

If machine learning is to discover knowledge, then machine teaching is to pass it on.

-Jerry Zhu

Machine teaching is an inverse problem to machine learning. Given a learning algorithm and a target model, machine teaching finds an optimal (e.g. the smallest) training set. For example, consider a “student” who runs the Support Vector Machine learning algorithm. Imagine a teacher who wants to teach the student a specific target hyperplane in some feature space (never mind how the teacher got this hyperplane in the first place). The teacher constructs a training set D=(x1,y1) … (xn, yn), where xi is a feature vector and yi a class label, to train the student. What is the smallest training set that will make the student learn the target hyperplane? It is not hard to see that n=2 is sufficient with the two training items straddling the target hyperplane. Machine teaching mathematically formalizes this idea and generalizes it to many kinds of learning algorithms and teaching targets. Solving the machine teaching problem in general can be intricate and is an open mathematical question, though for a large family of learners the resulting bilevel optimization problem can be approximated.
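The two-point claim can be checked directly. The sketch below assumes the student is a hard-margin linear SVM, for which the solution on exactly two oppositely labeled points has a simple closed form, so no optimization library is needed. The target hyperplane values and the function names are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def teach_two_points(target_w, target_b):
    """Teacher: build a 2-item training set straddling w.x + b = 0."""
    norm_sq = dot(target_w, target_w)
    # p is the point on the target hyperplane closest to the origin.
    p = [-target_b * wk / norm_sq for wk in target_w]
    x_pos = [pk + wk / norm_sq for pk, wk in zip(p, target_w)]
    x_neg = [pk - wk / norm_sq for pk, wk in zip(p, target_w)]
    return [(x_pos, +1), (x_neg, -1)]


def hard_margin_svm_two_points(training_set):
    """Student: closed-form hard-margin SVM for one positive and one negative
    point. The maximum-margin boundary is the perpendicular bisector of the
    segment joining them."""
    (x_pos, _), (x_neg, _) = training_set
    d = [a - b for a, b in zip(x_pos, x_neg)]
    scale = 2.0 / dot(d, d)
    w = [scale * dk for dk in d]
    b = 1.0 - dot(w, x_pos)
    return w, b


target_w, target_b = [2.0, -1.0], 0.5   # hypothetical target hyperplane
training_set = teach_two_points(target_w, target_b)
learned_w, learned_b = hard_margin_svm_two_points(training_set)
print("training set:", training_set)
print("learned:", learned_w, learned_b)
```

Because the teacher places the two items at distance 1/‖w‖ on either side of the target, the student recovers not just the same boundary but the same (w, b). Any pair of points mirrored across the hyperplane would give the same boundary with a different scaling of w.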

Machine teaching can have impacts in education, where the “student” is really a human student, and the teacher certainly has a target model (i.e. the educational goal). If we are willing to assume a cognitive learning model of the student, we can use machine teaching to reverse-engineer the optimal training data, which will be the optimal, personalized lesson for that student. We have shown feasibility in a preliminary cognitive study to teach categorization. Another application is in computer security, where the “teacher” is an attacker and the learner is any intelligent system that adapts to inputs. More details are available in this research overview: Machine Teaching

Machine teaching is the problem of finding an optimal training set given a machine learning algorithm and a target model. In addition to generating fascinating mathematical questions for computer scientists to ponder, machine teaching holds the promise of enhancing education and personnel training.

Jerry Zhu is a LUCID faculty member and CS professor; Ayon Sen is a LUCID trainee and CS graduate student.

Poolmate is based upon work supported by the National Science Foundation under Grant No. IIS-0953219. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Posted in Resources

## LUCIDtalks: Machine Learning with Ayon Sen

As part of our LUCID program, the graduate students meet weekly and discuss topics that are interesting to both the computational and behavioral graduate students.

In this video, Ayon provides an overview of machine learning in 20 minutes. He explains linear and non-linear learners, overfitting and the bias-variance trade-off, and provides resources for those interested in learning more about machine learning.

Posted in LUCID, LUCID Library, Machine Learning

## Apply to LUCID!

### We are currently accepting applications for Fall 2018!

#### Details for how to apply to the LUCID program at UW–Madison:

2. Apply to a Ph.D. program in one of the four core departments in LUCID:

Psychology

Educational Psychology

Electrical and Computer Engineering (ECE)

Computer Sciences (CS)

3. Apply to LUCID:

Email your statement of interest to ceiverson(at)wisc(dot)edu