**I always prefer video to a pile of reading materials, but I am not sure which one helps me learn better.** — *Lin Dong*

‘Reinforcement learning’ has become a buzzword in the machine learning and artificial intelligence communities. Its applications range from winning video games to driving autonomous cars.

If you are not familiar with reinforcement learning, here is what it is. First of all, it is a sub-area of machine learning. In supervised learning, the task is to learn to predict or classify something from a training dataset. For example, you want to decide whether an image shows an apple. You are given labelled pictures, some showing apples and some not, and from them you get an idea of what an apple looks like. In reinforcement learning, the task is not simply to predict or classify but to learn what to do to maximize our reward in a complex dynamic system. In this setting, we don’t have a nicely labelled training set to teach us. So what should we do? Well, we can learn through trial-and-error interactions with the system. This is like making an apple pie. Try different kinds of apples and various amounts of sugar. You may puke several times, but eventually you will learn to make a perfect-tasting apple pie.
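To make the trial-and-error idea concrete, here is a minimal sketch of the apple pie example. It uses an epsilon-greedy strategy: most of the time bake the recipe that has tasted best so far, and occasionally try a random one. The recipe names, tastiness values, and noise level are made up for illustration, and the setting is deliberately simplified (nothing changes between pies), so treat it as a toy rather than a full reinforcement learning problem.

```python
import random

# Hypothetical recipes and their true average tastiness.
# The cook does not know these numbers and must discover them by tasting.
RECIPES = [
    "tart apples, little sugar",
    "sweet apples, lots of sugar",
    "mixed apples, some sugar",
]
TRUE_TASTINESS = [0.2, 0.5, 0.9]


def bake_and_taste(recipe_index, rng):
    """Return a noisy taste score (the reward) for one pie."""
    return rng.gauss(TRUE_TASTINESS[recipe_index], 0.5)


def learn_by_trial_and_error(n_pies=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy: mostly exploit the best-looking recipe, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(RECIPES)       # pies baked per recipe
    estimates = [0.0] * len(RECIPES)  # running average taste score per recipe
    for _ in range(n_pies):
        if rng.random() < epsilon:
            choice = rng.randrange(len(RECIPES))  # explore a random recipe
        else:
            choice = max(range(len(RECIPES)), key=lambda k: estimates[k])
        reward = bake_and_taste(choice, rng)
        counts[choice] += 1
        # Incremental update of the running average for the chosen recipe.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, counts


estimates, counts = learn_by_trial_and_error()
best = max(range(len(RECIPES)), key=lambda k: estimates[k])
print("Best recipe found:", RECIPES[best])
```

The occasional random choice is what keeps the cook from settling too early on a mediocre recipe that happened to taste good once.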

You may wonder how this is related to education. Think of students taking a course to learn some skills. The complex system is the interaction between the instructor and the students, as well as how the students learn the material. The reward of this system is how much the students actually learn.

Nowadays, the common practice in education is a one-size-fits-all method. That is, each student in a course is treated identically in all teaching activities: the same content, the same way of teaching, and the same tests. However, some students may learn better from a video illustration, whereas others may learn more from a well-organized handout. Likewise, some students may perform better on a project while others are better at exams. Therefore, a better strategy would be to develop a personalized educational scheme that takes into account the inherent differences between students, and the scheme should be able to change dynamically according to feedback from the student.

The study process can be formally modeled as a Markov decision process. Each student entering the course has his or her own initial status, which may include the student’s own characteristics and previous proficiency level. The process starts with an assessment (A), say a quiz. The assessment matters because instructors normally cannot read minds: they need a quiz to estimate how much the student really understands the content. The result of this quiz is observed (X) and serves as an estimate of the student’s true proficiency level. Then the instructor gives an intervention (I) by choosing one of the teaching resources for this student. This intervention leads the student to a new proficiency level. The cycle then repeats, and the data triples (A, X, I) accumulate until the student reaches the end of the course. What we care about most is the final assessment result, which is a measurement of the student’s final proficiency level after the course.
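The loop described above can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: proficiency is placed on a 0 to 1 scale, there are two hypothetical resources (`video` and `handout`), quiz noise and learning rates are invented, and the two `choose_*` functions are placeholder rules, not an optimal policy.

```python
import random
from dataclasses import dataclass

# Hypothetical assessments and the noise (standard deviation) of their scores.
QUIZ_NOISE = {"short_quiz": 0.20, "long_quiz": 0.08}


@dataclass
class Student:
    prefers: str        # resource this student learns best from (hidden from the instructor)
    proficiency: float  # true proficiency in [0, 1] (hidden from the instructor)


def observe(student, assessment, rng):
    """Observed result X: a noisy estimate of the true proficiency, clipped to [0, 1]."""
    noisy = student.proficiency + rng.gauss(0.0, QUIZ_NOISE[assessment])
    return min(1.0, max(0.0, noisy))


def intervene(student, resource):
    """Intervention I: move the student to a new proficiency level.

    The new level depends only on the current level and the chosen resource,
    which is the Markov property.
    """
    rate = 0.30 if resource == student.prefers else 0.10  # assumed learning rates
    student.proficiency += rate * (1.0 - student.proficiency)


def choose_assessment(history):
    """Placeholder assessment rule: always give the short quiz."""
    return "short_quiz"


def choose_intervention(history, assessment, score):
    """Placeholder intervention rule based only on the latest score."""
    return "video" if score < 0.5 else "handout"


def run_course(student, n_stages=5, seed=0):
    """Run one student through the course, accumulating (A, X, I) triples."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_stages):
        a = choose_assessment(history)
        x = observe(student, a, rng)
        i = choose_intervention(history, a, x)
        intervene(student, i)
        history.append((a, x, i))
    final_score = observe(student, "long_quiz", rng)  # final assessment
    return history, final_score


student = Student(prefers="video", proficiency=0.3)
history, final_score = run_course(student)
for triple in history:
    print(triple)
print("Final assessment:", round(final_score, 2))
```

Note that the instructor-side functions never read `student.proficiency` or `student.prefers` directly. They only see the history of triples and the latest score, which mirrors the point that instructors cannot read minds.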

This process is slightly different from ordinary Markov decision processes in the sense that there are two completely different decisions to make: how to assess the student’s understanding and how to select the teaching resources for the student. Therefore, our goal is to maximize the final outcome by finding the optimal policy of both assessment and instructor’s intervention.
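One way to write this goal down is shown here. The notation is an assumption introduced for illustration: $S_0$ is the student's initial status, $T$ is the number of stages in the course, $H_t$ is the history available before stage $t$, $\pi^{A}_t$ and $\pi^{I}_t$ are the assessment and intervention rules at stage $t$, and $Y$ is the final assessment result.

$$
\begin{aligned}
H_t &= (S_0, A_1, X_1, I_1, \ldots, A_{t-1}, X_{t-1}, I_{t-1}), \\
A_t &= \pi^{A}_t(H_t), \qquad I_t = \pi^{I}_t(H_t, A_t, X_t), \qquad t = 1, \ldots, T, \\
(\pi^{A*}, \pi^{I*}) &= \arg\max_{\pi^{A},\, \pi^{I}} \; \mathbb{E}_{\pi^{A}, \pi^{I}}\left[\, Y \,\right].
\end{aligned}
$$

The intervention rule sees the result $X_t$ of the assessment just given, while the assessment rule has to be chosen before that result is known.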

The next step is how to solve for the optimal policy using the accumulated data of students. We will use approximate dynamic programming, a tool from reinforcement learning, to learn the optimal teaching plan. Check out my next post for details!

Lin is a PhD Candidate whose research interests include dynamic treatment regimes, reinforcement learning, and survival analysis. Her current research focuses on shared decision making in resource allocation problems. We thought this posting was a great excuse to get to know a little more about her, so we asked her a few questions!

**Q: What do you find most interesting/compelling about your research?**

A: I can always simulate fake subjects and manipulate their imaginary behaviors. In a larger view, I may change the world of education.

**Q: What do you see as the biggest or most pressing challenges in your research area?**

A: Inference is hard. That’s why the world needs statisticians.

**Q: Explain, as you might to a child, that just because mommy and daddy are splitting up it doesn’t mean they love him any less. This is *not* his fault, but, if we’re being honest, he didn’t help.**

A: The poor kid’s name is Snow.

“Snow, come here!”

Snow comes to Daddy.

“Kid, here is something you need to know. You know that daddy and mommy both fear cold weather, right? Well, two people who both hate the cold cannot live together, because they make each other colder. Now it is winter and snowy. You know, it is cold now, but it is not because of the snow outside. Snow just does not help warm up the weather. So daddy and mommy have to split for a while.”
