Propaganda in the Twitterverse

Khuzaima Hameed, PhD Student

Propaganda has become a hot topic since the 2016 U.S. presidential election cycle. Since then, we’ve learned of the significant efforts that state actors have made to misinform, misguide, and manipulate the institutions that we trust. Looking at Twitter, our president’s vocation and pastime of choice, we can see numerous reports of Russia aiming to gain an advantage by manipulating Americans’ behavior. And there is a common thread between these incidents—that Russian actors used accounts to promote inauthentic agendas.

An unsettling thought is that many of us did not discern that these accounts were fake. Consider this: many of us are passionate about certain topics and can occasionally be vocal about them in debates with others. Now imagine you’re following such a debate only to realize that one of the accounts was not tweeting from a coffee shop in St. Louis, but rather an office building in St. Petersburg. All of a sudden, your trust in Twitter wavers, and it brings to light the tricky problem of identifying propaganda online.

Sure enough, Twitter is making an effort to maintain users’ trust and to take down offending accounts, but identifying such accounts is not a static process. If Twitter identifies content as propaganda, a bad actor can adjust their content so that it goes undetected. Of course, Twitter will respond in kind and work to identify the newly changed content as propaganda. This back-and-forth is not unlike a pattern that occurs in computer security, where both security systems and hackers become more and more sophisticated as they discover, learn, and attempt to outsmart one another. Likewise, a successful propaganda classifier will account for this dynamic and should be robust to being exploited. This isn’t a consideration in your typical classification problem of, say, classifying animals as cats or not cats. Like, cats aren’t trying to trick you into thinking they’re not cats (although this would make for a potentially adorable plot in a Blade Runner-esque sci-fi thriller).

One possible weakness in classifying propaganda directly is assuming that the labels being used are correct, e.g., that a text identified as propaganda is in fact propaganda. As far as I know, there aren’t any propaganda experts who can identify propaganda with 100% accuracy! So the task becomes managing errors in labeled data. I’ve had the wonderful pleasure of using our lab’s greatest minds to label Tweets. Using a majority vote, we can produce labels, with the understanding that our lab’s implicit biases play a role in the resulting label. I will not say what those biases are, but let’s just say any disparagement towards Oreos will be duly mislabeled as propaganda. From there, one can use robust methods like γ-divergence to reduce the bias induced by mislabeling.
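For concreteness, the majority-vote step can be sketched in a few lines of Python; the tweet IDs and annotations below are made up purely for illustration:

```python
from collections import Counter

def majority_label(votes):
    """Return the label receiving the most votes (ties broken arbitrarily)."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical annotations: three lab members label each tweet.
annotations = {
    "tweet_1": ["propaganda", "propaganda", "ordinary"],
    "tweet_2": ["ordinary", "ordinary", "ordinary"],
}
labels = {tweet: majority_label(votes) for tweet, votes in annotations.items()}
```

Even this tiny sketch makes the weakness visible: if two of the three annotators share a bias, the majority vote inherits it, which is exactly why robust downstream methods matter.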

Another important question is what covariate information to collect, i.e., what attributes of Tweets can help us distinguish propaganda from ordinary content. For instance, we can analyze the text of a Tweet, which can give us useful information like sentiment or topics. But extracting information from text is generally difficult to do across multiple languages. An alternative is to explore the behavior of a Tweet on Twitter. For instance, how do users engage with a Tweet? A natural way to formulate engagement is through network analysis. Indeed, there are observed differences between so-called misinforming users and ordinary users, such as the tendency for misinforming users to sit within the core of retweet networks. These differences provide an alternative means of identifying propaganda. Another advantage of using network features is that propagandists rely on engaging users, and masking that engagement would require more than simply changing the content of their Tweets. Selecting a robust feature set is crucial in dealing with propagandists’ ever-changing methods.
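To make the "core of the network" idea concrete, here is a pure-Python sketch that computes each account's k-core number in a tiny invented retweet network using the standard peeling algorithm (in practice a library such as networkx does this for you):

```python
def core_numbers(adj):
    """k-core numbers via iterative peeling: repeatedly remove the
    node of lowest remaining degree, recording the running maximum."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        v = min(remaining, key=degree.get)  # cheapest node to peel next
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Toy (invented) retweet network, treated as undirected.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
core = core_numbers(adj)  # a, b, c form a 2-core; d sits on the periphery
```

An account with a high core number is densely embedded in mutual engagement, which is the sort of structural signal that is much harder for a propagandist to fake than the wording of a Tweet.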

There are a myriad of other possible issues that arise from attempting to identify propaganda on Twitter, like sampling bias from using APIs, missing data, and reliable early detection. These issues bring with them a myriad of opportunities to understand the phenomenon of propaganda better, as well as to protect both our institutions and our trust in each other.

Khuzaima is a PhD Candidate whose research interests include machine learning and mobile health. His current research focuses on optimal treatment regimes on partially observed spatial networks. We asked a fellow Laber-Labs colleague to ask Khuzaima a probing question —

Q: Draw a graph of Alex’s productivity as a function of the length of his hair.  Justify your answer. 

I know this is a pressing question that has stumped scientists for decades, long before Alex joined the lab. But Alex is a complex individual. Much beyond his productive acuity, there is an abundance of dimensions to Alex’s constantly evolving character (“hair-acter?”). And lying below his flowing locks of hair are the answers to all of our questions. Above is a small sample of our discoveries to date. Each panel displays a picture of Alex, and his—err—hair as a heatmap of the designated attribute (the top of his head represents the shortest length of hair). As for an explanation, the relationship of Alex’s hair and these qualities is nothing short of magic, and as Dr. Laber would tell you, a magician never explains his tricks.

Where, oh where are the statisticians?

Conor Artman, PhD Student 

In the timeless words of Alyssa Edwards, “I’m back, back, back, back, back again.”

It’s been a while, so let me remind you about my research—I am interested in modeling illicit network behaviors. In my last blog post, I described some of the challenges faced in such problems and argued that taking a “bottom-up” approach based on agent-based models (ABMs) lends itself well to identifying a solution. Agent-based models are not unique to studying illicit networks; many different fields use ABMs. You might even be very familiar with ABMs and not realize it! For example, if you are an operations researcher studying traffic management, an economist designing macroeconomic simulations for “what-if” market scenarios, a financial analyst designing bots to trade automatically for you, or even a marketing analyst studying how news propagates through social networks on Twitter, you may be readily familiar with ABMs. If you are an ecologist, you might know them as Individual-Based Models (IBMs). In short—ABMs are used to model a variety of complex phenomena!
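Before going on: if you have never seen one, an ABM can be surprisingly little code. Here is a toy opinion-dynamics sketch in Python, not any of the models mentioned above, just the bare pattern shared by all of them: agents with local state, a step rule, and an aggregate quantity we track:

```python
import random

class Agent:
    """An agent with one piece of local state: a binary opinion."""
    def __init__(self, opinion):
        self.opinion = opinion

def step(agents, rng):
    """One tick: every agent adopts the opinion of a randomly chosen agent."""
    snapshot = [a.opinion for a in agents]
    for a in agents:
        a.opinion = rng.choice(snapshot)

rng = random.Random(0)
agents = [Agent(rng.randint(0, 1)) for _ in range(100)]
for _ in range(50):
    step(agents, rng)
share = sum(a.opinion for a in agents) / len(agents)  # aggregate we track
```

Real ABMs differ only in scale and in how elaborate the step rule is; the traffic, macroeconomic, and social-network models above all follow this same simulate-and-aggregate loop.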

If you were reading that closely, you might have noticed that I never mentioned statisticians! In fact, there is NO statistical theory for developing, analyzing, and assessing ABMs! This is particularly puzzling, as in all of the fields mentioned it is extremely common for statisticians to help design simulations or assist in operationalizing mental models to generate or assess empirical evidence — think of chemometrics, econometrics, psychometrics, geostatistics, demography, astrostatistics, biostatistics, or (more recently) reinforcement learning as examples. But for some reason, statisticians have been largely out to lunch when it comes to ABMs.

The good news is that their absence does not (completely) wreak havoc: to validate an ABM as being “good” in some qualitative sense, all you need is a pair of human eyes attached to a human brain!

The bad news is that all you need is a pair of human eyes attached to a human brain!

Currently, to validate massive ABMs (imagine simulating the entire U.S. economy, for instance), modelers typically bring in an expert in their field of study to eyeball the simulation as it runs in real time and to basically give a thumbs-up or thumbs-down. If the ABM is reasonable enough in some heuristic sense to satisfy a minimum standard of face-validity (i.e., does the model’s behavior even seem plausible, as compared to our understanding of how the world works currently?), it gets a thumbs-up. Otherwise, thumbs-down. On one hand, this is an essential feature of any model. Would you trust a traffic simulation model that concludes optimal traffic behavior is for all vehicles to never drive? Or a weather simulation claiming tomorrow’s forecast includes car-sized hail and 400-degree Celsius heat? Absolutely not! On the other hand, this is a disturbingly low bar, given how ubiquitous ABMs are for informing high-consequence decisions. (As an easy example, the Federal Reserve still uses ABMs to study trajectories of outcomes under different policy changes.) If this approach seems like a reasonable place to stop in assessing model validity, let me illustrate why this is a bad idea from the perspective of cognitive psychology.

Even though we have a whole part of our brain that has evolved over many millions of years to give us shockingly powerful and efficient visual pattern classification, it is surprisingly easy to find and exploit “bugs” in our cognition. A perfect and easy example is the Müller-Lyer illusion. In the picture below, all three line segments are exactly the same length, but if you consider the topmost picture, it looks as though each of the lines could be different lengths. Looking at the bottom-most picture, we can see that this is not the case.

This point is easy to discount, so let me reinforce it: even though we have an extremely efficient and powerful set of machinery dedicated to visual processing in our brains, we can easily trick a simple assessment of length with what amounts to stick figures.
Now imagine you have to watch a richly complex set of agents interacting over time, with agents possibly following entangled, nonlinear behavioral rules, for an hour or so. How confident would you really be in claiming that the ABM passes the standard of ‘face-validity’, given that we know how easy it is to trick our visual processing?

I think it would be conservative to say that this process is error-prone. And while I don’t have the space to discuss this further, these biases in our cognition are exacerbated by other well-known biases such as confirmation bias, the availability heuristic, base-rate neglect, and motivated reasoning.

Keeping this idea in mind, face-validity offers a necessary but insufficient condition for validating complex ABMs. Even if face-validity with our eyes were sufficient, this addresses only one part of a multiplex problem—let’s say I have 3 ABMs that produce very similar data. How do I decide which one is “best” among them? How do I know when an ABM I have coded up is “good enough” to represent the complex real-world phenomenon it seeks to study? When should I replicate an episode of an ABM simulation, and when do I know I have enough data to stop altogether? And let’s say I run a massive ABM simulating something like a city-wide evacuation plan under threat of a dirty bomb, or an Ebola outbreak in Liberia—what is a principled, standardized way that we could use to analyze both ABMs? Depending on the analyst, one scientific question may be operationalized into a longitudinal study on the data produced by an ABM, or the same question may be operationalized as a reinforcement learning problem. In that case, even the evidence drawn from the same simulation diverges across analysts. All of these remain open questions, but they are at the very least open questions that historically lie in statisticians’ wheelhouse—which makes it all the more bizarre that we haven’t seen much activity from statisticians!

Luckily, there have been some exceptions. An exciting direction in statistical research explores Approximate Bayesian Computation (ABC) and emulators in the context of ABMs. Viewing ABMs through the lens of emulators, one can start to see a direction for a principled theory of ABMs that looks akin to a linear model—this would be great, as linear models have a huge literature of tools at their disposal that give us direct ways to answer the kinds of questions posed earlier, but to find out we clearly need more statisticians jumping into the fray. We are currently exploring this avenue, and I’ll update you on our findings in my next post!
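To give a flavor of ABC, here is a minimal rejection-ABC sketch in Python. The "model" is just a normal distribution standing in for an expensive ABM, and the prior, summary statistic, and tolerance are all invented for illustration:

```python
import random
import statistics

def simulate(theta, n, rng):
    """Stand-in for an expensive ABM: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def rejection_abc(observed, n_draws, tol, rng):
    """Keep prior draws whose simulated summary lands within tol of the data's."""
    obs_mean = statistics.mean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5, 5)  # draw a candidate from a flat prior
        sim_mean = statistics.mean(simulate(theta, len(observed), rng))
        if abs(sim_mean - obs_mean) < tol:
            accepted.append(theta)
    return accepted

rng = random.Random(1)
observed = simulate(2.0, 200, rng)  # "real" data, true theta = 2
posterior = rejection_abc(observed, 5000, tol=0.1, rng=rng)
# The accepted draws approximate the posterior and concentrate near 2.
```

The appeal for ABMs is that this loop never needs a likelihood, only the ability to forward-simulate, which is exactly what an ABM provides.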

Conor is a PhD Candidate whose research interests include reinforcement learning, dynamic treatment regimes, statistical learning, and predictive modeling. His current research focuses on pose estimation for predicting online sex trafficking. We asked a fellow Laber-Labs colleague to ask Conor a probing question —

Q: If you were to form a cult, what common interest would it be based around and how would you recruit followers?

A: One idea is a cult that works behind the scenes to run a society called Bertrand’s Tea Salon! Or maybe something more superficially prestigious-sounding, like the Bertrand Society of Empiricists or some nonsense. The selection criteria? The depth of specious reasoning! Our order would pore over all scientific literature to find those who are truly worthy. Once selected, our group would cultivate membership under the guise of a rigorous peer-review process, where recruitment takes the form of flattery via email. The name is in homage to Bertrand Russell’s teapot analogy. Directly from Wikipedia:

“Russell’s teapot is an analogy, formulated by the philosopher Bertrand Russell (1872–1970), to illustrate that the philosophic burden of proof lies upon a person making unfalsifiable claims, rather than shifting the burden of disproof to others.
Russell specifically applied his analogy in the context of religion.[1] He wrote that if he were to assert, without offering proof, that a teapot, too small to be seen by telescopes, orbits the Sun somewhere in space between the Earth and Mars, he could not expect anyone to believe him solely because his assertion could not be proven wrong.”
As a fine example of such a claim that would merit entry into my cult, consider the statement, “[Insert phenomenon here] would exist even if we didn’t have statistics!” For example, “The endowment effect would exist even if we didn’t have statistics to observe it!”, or “Priming and the availability heuristic would exist even if we didn’t have a way to measure it!”
Now you may ask, “Why, Conor, why is this such a crock of shit?” And I’ll tell you why: The assertion that a phenomenon would still exist, after originally not knowing it existed and then discovering that it exists after a rigorous scientific process, is a marvelous exercise in hindsight-biased counterfactual reasoning.
One way to think of this would be the now-beaten-to-death question, “If a tree falls in the woods without anyone around to see it, does it still make a sound?” After having observed trees, the sounds they make when they fall, and generally coming to understand their behavior, it seems like a completely reasonable assertion that trees, in general, should probably make noise when they fall even when unobserved. But, this is very different from saying that after never having observed trees, never having observed their sounds, and having never observed their general behavior when they fall, that we can assert that trees behave the same way. Why? Because if we’ve never observed trees before, we have no experience or data to draw from, and we can’t construct an inductive argument.
Put another way, imagine all of human history, knowledge, and data is erased tomorrow, and we forget any and all discoveries the human race has made. From our perspective in this erasure-reality, we don’t know any of the results we previously knew, and we don’t have any evidence for our conjectures and intuition. Even if for some reason we had some great intuition about some phenomenon existing, we could not assert that this phenomenon exists without some form of evidence. Is it possible that some scientific process or phenomenon still persists regardless of whether we’ve gone to the trouble of precisely observing it? Of course. Is that the same thing as then being able to assert that it still does exist, when we don’t have any data available? Absolutely not! And why is that? Russell’s teapot! If I told you that I and a group of special other individuals “knew” there is a massive sentient teapot orbiting the moon at such an angle that no one has ever observed it except us, then clearly the burden of proof is on us to demonstrate this.
So, if I tried to assert the truth of some result from the erasure-reality, but now without any proof, then from our new (mind-wiped) perspective this would be equivalent to stating the research question, simply claiming that it’s true, and calling it a day.
In any case, this would be an A+ way to receive an induction ceremony into Bertrand’s Tea Salon.
In summary, my recruitment method is an appeal to human vanity and insecurity under the guise of peer review, and one-by-one my legion of specious scientists would grow day-by-day, fueled by humankind’s congenital well-spring of self-deception and self-interest!

The puzzle of asymptotic argument

Jesse Clifton, PhD Student 

Those who have taken an introductory course in probability or statistics will be familiar with two famous theorems: the law of large numbers and the central limit theorem. These theorems are both statements about what happens to the average of a sample of data as the amount of data goes to infinity. The law of large numbers says that, under mild conditions, the sample average will converge to the population-level average. The central limit theorem says that, under mild conditions, the sample average not only converges to the population average as data accumulates, but its distribution under repeated sampling, once suitably centered and scaled, converges to a normal distribution; we may then derive further nice properties which obtain as the amount of data we have goes to infinity.
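Both theorems are easy to watch in action. This Python sketch draws repeated samples from a deliberately skewed population and looks at how their averages behave:

```python
import random
import statistics

rng = random.Random(0)
n, reps = 50, 2000

# Each replicate: the average of n draws from a skewed population
# (exponential with mean 1 and standard deviation 1).
sample_means = [
    statistics.mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

center = statistics.mean(sample_means)   # LLN: close to the population mean, 1
spread = statistics.stdev(sample_means)  # CLT: close to 1 / sqrt(n), about 0.14
```

A histogram of `sample_means` would already look bell-shaped at n = 50, even though the underlying population is strongly skewed; that is the central limit theorem doing its work.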

These two theorems are examples of asymptotic results.  Asymptotic results tell us about the behavior of (certain functions of) data as the amount of data goes to infinity. Asymptotic arguments form much of the backbone of statistical theory; they allow us to identify, for instance, which of a set of candidate estimators of some quantity will (asymptotically) be closest to that quantity, on average.  In my field of sequential decision-making, asymptotic arguments about the rewards gained by decision-making algorithms in the limit of infinite data are also sometimes used to argue for using one algorithm over others.

And yet, we never have an infinite amount of data!  What, then, leads us to trust asymptotic theory as a guide to the real world of finite sample sizes?  As Geyer puts it:

We know that asymptotics often works well in practical problems because we can check the asymptotics by computer simulation (perhaps what Le Cam meant by “checked on the case at hand”), but conventional theory doesn’t tell us why asymptotics works when it does. It only tells us that asymptotics works for sufficiently large [sample size] n, perhaps astronomically larger than the actual n of the actual data. So that leaves a theoretical puzzle.

  • Asymptotics often works.
  • But it doesn’t work for the reasons given in proofs.
  • It works for reasons too complicated for theory to handle.

I think the obvious but overlooked point that asymptotics logically implies nothing about finite data sets — and yet, empirically, does seem to track finite-sample properties — should lead us to adopt a more nuanced attitude towards asymptotic argumentation. I propose that we regard an asymptotic framework as a model of what happens in finite samples. A model is something that we use to home in on (what we hope to be) the most important aspects of a phenomenon we’re interested in while throwing away (what we hope to be) details that have little effect on our final conclusions. What does this mean in the context of an asymptotic argument? Taking the example of the central limit theorem, the “important aspects” of the phenomenon are its approximately normal distribution in large samples, and the “unimportant details” are all the complicated mathematical terms that are present in finite samples but disappear as the sample size goes to infinity.

And as with models in all scientific fields, some are better than others at capturing the phenomenon of interest.  In the case of asymptotics, certain asymptotic frameworks are better at capturing the finite-sample properties of estimators than others.  For instance, standard asymptotic theory allows some estimators to be “superefficient” even though they are known to perform badly in finite samples.  But adopting a different asymptotic framework — the “moving parameter” framework — leads to more realistic conclusions about the behavior of these estimators (see the discussion of Pollard).  
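The textbook example of superefficiency (which may or may not be the specific one the Pollard discussion has in mind) is Hodges' estimator of a normal mean, and its finite-sample misbehavior is easy to see in simulation:

```python
import random
import statistics

def hodges(xs):
    """Hodges' estimator of a normal mean: snap the sample mean to 0
    whenever it is already within n**(-1/4) of 0."""
    m = statistics.mean(xs)
    return 0.0 if abs(m) < len(xs) ** -0.25 else m

def mse(estimator, theta, n, reps, rng):
    """Monte Carlo mean squared error at a fixed true mean theta."""
    errs = [
        (estimator([rng.gauss(theta, 1.0) for _ in range(n)]) - theta) ** 2
        for _ in range(reps)
    ]
    return statistics.mean(errs)

rng = random.Random(0)
n = 100
# At theta = 0, Hodges beats the plain sample mean (superefficiency)...
hodges_at_0 = mse(hodges, 0.0, n, 2000, rng)
mean_at_0 = mse(statistics.mean, 0.0, n, 2000, rng)
# ...but at a theta near the shrinkage threshold it pays dearly.
hodges_near = mse(hodges, n ** -0.25, n, 2000, rng)
mean_near = mse(statistics.mean, n ** -0.25, n, 2000, rng)
```

Pointwise asymptotics sees only the first comparison; a moving-parameter analysis, which lets theta shrink with n, is what exposes the second.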

Nevertheless, even these more sophisticated asymptotic frameworks are still models of the finite-sample phenomenon of interest.  And there is no guarantee that the predictions of these models will hold true in any given situation.  On this point of view, asymptotic theory should be taken down from its privileged position in academic statistics and set on par with other modes of statistical argumentation (such as simulations, correspondence with existing scientific theory, etc.) as a valuable but fallible source of qualitative insight.

Jesse is a PhD candidate working with Laber Labs. His research interests include reinforcement learning and artificial intelligence. Jesse is a returning author to our blog — check out his first post about exploration and exploitation. We thought this posting was a great excuse to get to know a little more about him, so we asked him:

Q: Write a haiku about your favorite or most preferred player in the game Super Smash Bros Brawl.

The calm,
Cool face of the turnip
Asked me for a kiss.

(With apologies to Langston Hughes.)

League of Legends: A verdant jungle of statistical opportunity

Alex Cloud, PhD Student

A good data set is a gold mine for a statistician. A good data set provides an opportunity to put one’s favorite statistical methods to the test or to try out new ones. In fact, a good data set is worth even more: it presents novel challenges and questions, which can in turn inspire creative approaches to problem solving that meld statistical, mathematical, and other ideas that may have never been realized otherwise.

At Doran’s Lab, the undergraduate data science research group at Laber Labs, we have found our gold mine, and it sits atop the worldwide phenomenon League of Legends. Released by Riot Games in 2009, League of Legends (or, “League”) is a 5-vs-5 competitive online game with a massive following and a professional scene of unprecedented maturity. In keeping with Riot’s openness toward their player base, they maintain a public API that lets anyone with a free account make requests to obtain detailed game data. In our case, we were given permission to make requests at a high volume, which has allowed us to fill up almost nine terabytes of hard drive space in about a year; we collect over 100,000 games every day and have logged data from well over 3 million accounts in North America. This data is ripe for insights.

After a year of collaborative work we’ve still only scratched the surface of the possibilities presented by the data, which contains information as detailed as (1) minute-to-minute updates on player location, gold, and experience (which are the primary resources in the game) and (2) exact times and details of important game events like item purchases, champion, building, and elite monster kills. If you’re interested in reading about these projects, check out our awesome website at

Since our undergraduate data analysts have already done an excellent job writing up their work on past projects, I’d like to highlight three examples of unanticipated learning that resulted from a challenge in our data. These are a few tiny examples of times when the data demanded a new approach, although I stress that there were many more challenges and the bulk of the good stuff is contained in the linked articles themselves!

Game prediction. The project: for 600,000 games, given which 5 of League’s 141 champions are on blue team and which 5 are on red team, estimate the probability that the blue team will win using a procedure that is transparent and interpretable to humans. The challenge: we wanted to include terms to measure potential synergies between champions, or effects where a champion “counters” another (e.g., an already-strong fighter might be even stronger against a defenseless mage). In statistical language, we wanted to estimate pairwise interaction effects. The only issue is, there are (141 choose 2) times 2, or about 20,000 possible interactions, and storing a table with this many columns and 600,000 rows became a computational nightmare. This data challenge caused us to look into methods for storing sparse matrices, which allowed us to develop a highly efficient solution.
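The exact encoding we used isn't spelled out here, but the storage arithmetic is easy to sketch. Assuming (for illustration) one synergy column per unordered champion pair per team, each game touches only 20 of roughly 20,000 columns, which is exactly the regime where sparse formats (scipy.sparse, in practice) pay off:

```python
from itertools import combinations

N = 141
PAIR_COL = {pair: k for k, pair in enumerate(combinations(range(N), 2))}
N_COLS = 2 * len(PAIR_COL)  # one synergy column per pair, per team: 19,740

def sparse_row(blue, red):
    """Encode one game as a {column: value} dict instead of a dense row."""
    row = {}
    for pair in combinations(sorted(blue), 2):
        row[PAIR_COL[pair]] = 1                  # blue-team pair present
    for pair in combinations(sorted(red), 2):
        row[len(PAIR_COL) + PAIR_COL[pair]] = 1  # red-team pair present
    return row

# Hypothetical champion IDs for one game.
row = sparse_row(blue=[0, 3, 17, 42, 99], red=[1, 5, 23, 77, 140])
density = len(row) / N_COLS  # 20 nonzeros out of 19,740 columns
```

At a density of about 0.1%, storing only the nonzero entries shrinks 600,000 games from a dense nightmare to something that fits comfortably in memory.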

Critical strike smoothing: The project: in the game, a player can issue the command for their champion to automatically attack another unit. Based on items, these attacks have a chance to deal double damage (a “critical strike,” or “crit,” for short), in a process that is known, perhaps surprisingly, to be non-independent. This data is not available through the API, so we needed to collect it ourselves by reading in attacks as “crits” or “non-crits” from video recorded from the game. The challenge: what seemed like a basic task not worthy of the “computer vision” moniker proved surprisingly difficult, as slight variations in the game state required to obtain video footage made it difficult to apply simple heuristics to detect attacks. In the end, we learned the importance of deliberate pre-processing and systematic validation and error detection.

Champion location metrics: The project: to develop useful and easily-interpreted metrics for measuring ways players move around the map, for example, how much a champion “roams” to apply pressure to different strategic regions. The challenge: while the League of Legends map sits comfortably on a square grid and movement can be measured by traditional Euclidean distance, it turns out that naively recording “total distance traveled” (as estimated by the minute-to-minute location updates we have access to) provides little useful information, as certain areas of the game’s map are expected to be traversed often and other regions represent highly contested and noteworthy places. This forced us to think creatively about new ways to measure movement.

These are just a few examples, and there are more every day for everyone at Doran’s Lab. If you’d like to jump in to experience the magic yourself, check out our publicly available, curated datasets here: and follow us on Twitter @DoransLab to hear more soon.

Alex is a PhD candidate working with Laber Labs. His research interests include reinforcement learning and interpretable models. We thought this posting was a great excuse to get to know a little more about him, so we asked him:

Q: What is your favorite statistics-themed limerick?

A: This is a rap, not a limerick– because we live in 2019. It’s about the baddest motherf—er to ever enroll in the graduate statistics program at NC State.


Zhen Li, PhD Student

Any adult of a certain age knows the video game Pac-Man! It is a classic “pursuit-evasion” game, where the player maneuvers Pac-Man around a maze to gobble up as many dots as possible before being captured by a ghost. Pursuit-evasion problems arise in the real world as well, with applications such as missile guidance systems. Unfortunately, learning to optimally coordinate pursuer behaviors so as to minimize the time to capture of the evader is challenging because the action space for the pursuers can be quite large and the available information is noisy. Consequently, previous approaches have relied primarily on heuristics. However, we have developed a variant of Thompson sampling for pursuit-evasion that performs favorably relative to competitors in terms of time-to-capture.

Thompson sampling is an online decision algorithm that balances exploration and exploitation in sequential decision making using a Bayesian approach. At each time point in a pursuit-evasion problem, we carry out the following steps:

(i) computing the posterior distribution over the space of possible evader strategies and the posterior distribution of the evader’s location;
(ii) sampling a strategy from the posterior distribution; and
(iii) using model-based reinforcement learning to estimate the optimal pursuer strategy.
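The full pursuit-evasion algorithm is involved, but the sample-from-the-posterior-then-act loop at its heart is easiest to see in a deliberately simplified setting: a Bernoulli multi-armed bandit with Beta priors. This is a stand-in for illustration, not our actual pursuer algorithm:

```python
import random

def thompson_bandit(true_probs, horizon, rng):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    k = len(true_probs)
    alpha = [1] * k  # posterior successes + 1, per arm
    beta = [1] * k   # posterior failures + 1, per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample one plausible success probability per arm from its posterior...
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # ...act greedily with respect to the sampled model...
        a = samples.index(max(samples))
        reward = 1 if rng.random() < true_probs[a] else 0
        total_reward += reward
        # ...then update the chosen arm's posterior with the outcome.
        alpha[a] += reward
        beta[a] += 1 - reward
    return total_reward

rng = random.Random(0)
total = thompson_bandit([0.2, 0.5, 0.8], horizon=1000, rng=rng)
# The sampler concentrates on the 0.8 arm, so total lands well above the
# roughly 500 a uniformly random player would collect.
```

In the pursuit-evasion version, "arms" become evader strategies, the Beta posteriors become posteriors over strategies and locations, and the greedy step becomes model-based planning.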

The proposed algorithm performs favorably relative to competitors in terms of time-to-capture in a suite of simulation experiments, including pursuit-evasion over a grid and the coordination of ghost behavior in the classic arcade game Pac-Man.

In the Pac-Man game implemented in JavaScript, the ghosts (pursuers) could follow one of several pursuit strategies. We devised three strategies for Pac-Man. The first is a pure random walk. The second is defined such that when the distance between Pac-Man and one of the ghosts is less than some value d, Pac-Man moves in the opposite direction of the closest ghost; otherwise, Pac-Man moves uniformly at random. The third is the same as the second except that when the distance between Pac-Man and every ghost is at least d, Pac-Man moves toward the closest Pac-Dot with probability p and moves randomly otherwise. We compared our method with a search strategy in which the ghosts know Pac-Man’s exact location and select their actions to minimize the distance to Pac-Man. The results show that our method performs slightly better than this benchmark strategy. In future work, we plan to derive theoretical results for our algorithm and to extend it beyond pursuit-evasion to general multi-agent games and other applications. We expect to see results soon!

Zhen is a PhD candidate working with Laber Labs. His research interests include optimal treatment regimes, machine learning, and Bayesian inference. We thought this posting was a great excuse to get to know a little more about him, so we asked:

Q: Explain what a p-value is, from the perspective of a deranged professor who’s clearly had enough, can’t take it any more, and is having a career-ending meltdown in front of their stoic, uncomprehending students.

A: We want to evaluate the free throw percentage of Jack, who loves playing basketball. We are interested in whether his free throw percentage is over 50%. Thus, we set the null hypothesis that Jack’s free throw percentage is not greater than 50%. For simplicity, we have him attempt two free throws and consider two cases: (i) Jack makes both free throws, or (ii) Jack misses both. The p-value is the probability of a result at least as good as the observed one, assuming his free throw percentage is exactly 50%. In case (i), the only result at least as good as making both shots is making both shots. If his free throw percentage is 50%, the probability of making both is 50% x 50% = 25%, so the p-value in case (i) is 25%. In case (ii), every possible result is at least as good as missing both shots, so that probability is 100%, and the p-value in case (ii) is 100%. The p-value helps us decide whether to accept the null hypothesis (here, that Jack’s free throw percentage is not greater than 50%). Always remember the rule: the smaller the p-value, the less inclined we are to accept the null hypothesis. Thus, in case (i), since the p-value is 25% (relatively small), we may be reluctant to accept the null hypothesis and instead conclude that Jack’s free throw percentage is over 50%. In case (ii), since the p-value is 100% (very large), we accept the null hypothesis and conclude that Jack’s free throw percentage is not greater than 50%. We can see that the use of the p-value matches our intuition.
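The professor's arithmetic is easy to check with a few lines of Python:

```python
from math import comb

def p_value(made, attempts, p=0.5):
    """P(a result at least as good as `made` successes | true percentage is p),
    i.e. the upper tail of a Binomial(attempts, p) distribution."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(made, attempts + 1)
    )

case_i = p_value(2, 2)   # makes both: 0.5 * 0.5 = 0.25
case_ii = p_value(0, 2)  # misses both: every outcome qualifies, so 1.0
```

The same function scales to a more persuasive experiment: with more attempts, making most of them drives the p-value far below 25%.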

That’s all. I’m ready to be fired.

Connection between Causal Inference and off-policy in Reinforcement Learning

Lili Wu, PhD Student

Recently I have been kind of excited because I suddenly realized that there are some amazing similarities between causal inference and off-policy methods in reinforcement learning. Who “stole” from the other?

Through some of the previous blogs, you may already know a bit about reinforcement learning (such as Lin’s blog about reinforcement learning in education), which is now very popular in artificial intelligence. Reinforcement learning tries to find a sequential decision strategy that maximizes long-term cumulative rewards. Okay, now I am going to jump directly into what off-policy means!

To understand “off-policy,” it is probably easiest to start with “on-policy.” First, a policy is a “mapping” from a state to an action. For example, clinicians use a patient’s health information (the “state”) to recommend a treatment (the “action”). Clearly, in this example, we care about how good a policy is at determining the best action. But how do we evaluate that? One way is to follow the policy, record the outcomes, and then use some measure to evaluate how good the policy is. Such a procedure is called “on-policy” – the policy is always followed. However, it may not be possible to always follow a policy. For instance, clinicians cannot give some treatments to patients because the treatments may be dangerous; some clinicians may only follow a conservative policy; or we may only have access to observational data that did not follow a specific policy. This is where “off-policy” methods come in! They deal with the situation where we want to learn about one policy (the “target policy”) while following some other, different policy (the “behavior policy”). Most off-policy methods use a general technique known as “importance sampling” to estimate the expected value. The importance sampling ratio is the relative probability of the trajectory under the target and behavior policies, so we can use it to reweight the outcomes observed under the behavior policy to estimate what they would have been had we followed the target policy, and thus measure the “goodness” of the target policy.
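Here is a minimal one-step sketch of that reweighting idea. Everything in it is invented for illustration: two hypothetical treatments with made-up mean outcomes, a behavior policy that rarely picks the better one, and a target policy that always picks it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step example: two treatments with true mean outcomes.
true_mean = np.array([1.0, 2.0])

# Behavior policy: the clinician picks treatment 1 only 30% of the time.
b = np.array([0.7, 0.3])
# Target policy we want to evaluate: always pick treatment 1.
pi = np.array([0.0, 1.0])

n = 100_000
actions = rng.choice(2, size=n, p=b)
rewards = true_mean[actions] + rng.normal(0, 1, n)

# Importance sampling: reweight each observed outcome by pi(a) / b(a).
weights = pi[actions] / b[actions]
estimate = np.mean(weights * rewards)
print(estimate)  # close to 2.0, the true value of the target policy
```

The ratio `pi(a) / b(a)` upweights the outcomes the target policy would have chosen and zeroes out the rest, so data collected under one policy can evaluate another.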

Okay, if you know about causal inference, you may already be familiar with our patient example. You only have observational data, but you want to use that data to learn the effect of some idealized treatment. In the language of our AI method, we can regard the idealized treatment as the target policy and the treatment actually assigned in the observational data as the behavior policy. Now – can we reweight the outcomes? Yes! There is a method called “inverse probability weighting,” proposed by Horvitz and Thompson in 1952, which does the same thing as importance sampling — reweight the outcomes! See, this is the connection between the two!

Nowadays, there are more and more connections between causal inference and reinforcement learning. Statisticians bring reinforcement learning ideas like Q-learning into the causal inference framework, for example in dynamic treatment regimes. Computer scientists draw inspiration from causal inference to estimate policy values, for example the doubly robust estimator for off-policy evaluation. I am excited about these connections: causal inference has a long history and has built a large body of good work, while reinforcement learning is attracting more and more attention and has a lot of interesting ideas. Can the two fields draw more inspiration from each other? I think this is an emerging area with many possibilities to explore! I am looking forward to working on it and finding more!

Lili is a PhD candidate working with Laber Labs. Her research interests include reinforcement learning and local linear models. We thought this posting was a great excuse to get to know a little more about her, so we asked her:

Q: Do you have a motto?



(When hopes are won, oh! drink your fill in utmost delight.

And never leave your wine-cup empty in moonlight! )


(Heaven has made us talents, we’re not made in vain.

A thousand gold coins spent, more will turn up again.)

— from 《将进酒(Invitation to Wine)》by 李白 (Li, Bai)

Uncovering the Truth

Zekun (Jack) Xu, PhD Candidate

We almost never observe the absolute truth. In fact, there are entire industries driven by the single motivation of distorting it! How many times a day do you hear “You can look years younger!”? However, the distortion is not always intentional – has your GPS ever shown you driving through a nearby field? Or maybe your FitBit didn’t realize your leisurely walk was not a nap? Such distortions happen all of the time, and it can be hard to know what is true.

In our day to day lives, we, as the observers, must recognize that what we see or hear is not all that is there. We must continuously dig deeper to understand what is real. Frankly, it can be exhausting!

Fortunately, in science, there are tools that help us estimate the truth based on what we observe! In statistics, a class of models has been developed for just this purpose — the latent (or hidden) state models. Most of the models in this class are based on either the so-called dynamic linear model or the hidden Markov model. Both models date back to the 1960s [1][2], but they are still popular. And, in my opinion, they are the coolest generative models!

In the dynamic linear model, we assume that the data we observe over time are a noisy realization of a latent true process. For instance, to monitor air quality, the EPA records hourly concentration values for a variety of airborne particles (O3, NO2, etc.). However, due to measurement error from the device and its operation, the recorded data are a noisy version of the true values. A similar example is a navigation system, where the GPS coordinates received from the satellite deviate from the actual coordinates to some extent – hence your appearing to drive through a corn field! In both cases, a dynamic linear model can be used to filter out the noise and make predictions.
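As a sketch of how such filtering works, here is a tiny simulated local-level dynamic linear model with a hand-rolled Kalman filter. All the numbers (noise variances, series length) are invented for illustration; real dynamic linear model software is far more general.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D local-level model: the latent position follows a
# random walk, and each observation is the position plus noise.
T, q, r = 200, 0.05, 1.0      # steps, state noise var, obs noise var
truth = np.cumsum(rng.normal(0, np.sqrt(q), T))
obs = truth + rng.normal(0, np.sqrt(r), T)

# Kalman filter for the local-level model.
m, P = 0.0, 1.0               # initial state mean and variance
filtered = []
for y in obs:
    P = P + q                 # predict: uncertainty grows each step
    K = P / (P + r)           # Kalman gain: trust in the new data
    m = m + K * (y - m)       # update the estimate toward the data
    P = (1 - K) * P
    filtered.append(m)
filtered = np.array(filtered)

# The filtered track sits much closer to the truth than the raw data.
print(np.mean((obs - truth) ** 2), np.mean((filtered - truth) ** 2))
```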


In the hidden Markov model, we assume that there is more than one underlying data-generating mechanism, or so-called state. Consider, for instance, physical activity data gathered through wearable devices like the FitBit and iWatch. Those data do not contain activity labels; they provide only the intensity during wear time, which is driven by the actual (hidden) activity state. In fact, one of my current research projects is to identify interesting patterns in human activity data measured by continuously worn wearable devices. Based on those data, we want to be able to determine whether a subject is doing high-intensity activity (e.g., running), medium-intensity activity (e.g., walking), or low-intensity activity (e.g., resting) during different times of the day. We can use this information to compare lifestyles between subjects. This is an interesting topic, especially in this era of artificial intelligence. For example, we might be able to build “smart” wearable devices based on this model that provide personalized suggestions about healthy lifestyle choices. This framework is useful both for prediction and for modeling the effect of activity on health outcomes.
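As an illustration, here is a toy two-state hidden Markov model for activity intensity, decoded with the Viterbi algorithm. The states, emission means, and transition probabilities are all made up; real activity data would need richer models.

```python
import numpy as np

# Hypothetical two-state HMM: state 0 = "rest" (low mean intensity),
# state 1 = "active" (high mean intensity), Gaussian emissions.
means, sd = np.array([1.0, 5.0]), 1.0
log_A = np.log(np.array([[0.9, 0.1],    # states tend to persist
                         [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))

def log_lik(y):
    """Gaussian log-likelihood of intensity y under each state."""
    return -0.5 * ((y - means) / sd) ** 2

def viterbi(obs):
    """Most likely hidden state sequence given observed intensities."""
    delta = log_init + log_lik(obs[0])
    back = []
    for y in obs[1:]:
        scores = delta[:, None] + log_A   # best path into each state
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_lik(y)
    states = [int(delta.argmax())]
    for b in reversed(back):              # trace the best path back
        states.append(int(b[states[-1]]))
    return states[::-1]

obs = np.array([0.9, 1.2, 0.8, 4.8, 5.3, 5.1, 1.1])
print(viterbi(obs))  # -> [0, 0, 0, 1, 1, 1, 0]
```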

To paraphrase from Plato’s idealism, truth is an abstraction of the external world that we live in. All that we observe is a projection of the ideal world into reality. It is great to have some tools that aim to uncover the truth from the observation, but care must be taken regarding when those tools are applicable.

[1] Kalman, Rudolph Emil. “A new approach to linear filtering and prediction problems.” Journal of Basic Engineering 82.1 (1960): 35-45.

[2] Baum, Leonard E., and Ted Petrie. “Statistical inference for probabilistic functions of finite state Markov chains.” The Annals of Mathematical Statistics 37.6 (1966): 1554-1563.

Jack is a PhD candidate working with Laber Labs. His research interests include wearable computing and hidden Markov models. Currently he is working on hidden Markov models with applications in veterinary data. We thought this posting was a great excuse to get to know a little more about him, so we asked him:

Q: What are the five qualities that great PhDs and great artists share?


  1. Honesty. Falsification, fabrication, and plagiarism are despicable in both professions.
  2. Curiosity. Great PhDs and great artists are highly motivated to ask questions and seek answers.
  3. Creativity. Groundbreaking work almost always originates from out-of-the-box thinking.
  4. Detail orientation. Great PhDs and great artists demand perfection in every teeny-tiny detail in their work.
  5. Perseverance. Success in both professions requires persistence in spite of obstacles and setbacks.


The Two-Outcome Problem

Daniel Luckett, PhD
Post-doctoral Fellow, UNC

If you’re like me, you often find yourself struggling to select an outfit for a party. Finding the garments to provide the right balance between comfort and style is a difficult task. If you focus all your attention on looking sharp, you’ll find yourself showing up to the party in a suit and tie, while if you focus all your attention on being comfortable, you’ll find yourself showing up in sweatpants. In the absence of a professional stylist to consult, striking the right balance is tricky. I often find myself settling on a compromise: I’ll wear jeans, but I’ll pick out my nicest pair (what some refer to as “grad student formal”).

The difficulty in this decision making process lies in the fact that there are two competing goals. While the optimal decision for each goal may be obvious (wear a suit for style and sweatpants for comfort), there’s no decision that is simultaneously optimal for both. The right balance between the two goals isn’t obvious and the optimal balance may vary across individuals. Even if you could ask an individual directly, most people wouldn’t be able to articulate how they weight the trade-off between style and comfort. In this case, it might seem that the best solution would be to hire a team of professional stylists and observe how they select outfits for a number of individuals, doing their best to balance style and comfort using their expertise. Then, one could try to emulate the decisions of the professionals.

Similar themes show up in medical decision making. A large body of statistical literature has focused on estimating decision rules for assigning treatment to optimize a clinical outcome. However, this idea creates a disconnect with what actually happens in the clinic; much like we all have to select outfits to balance the trade-off between comfort and style, physicians often must make treatment decisions to balance the trade-off between multiple outcomes. Suppose you were a mental health professional treating a patient with bipolar disorder. You know that prescribing an antidepressant may help your patient control their symptoms of depression. However, you’ve recently read research articles, like the one by Gabriele Leverich and colleagues, indicating that antidepressants may induce manic episodes [1]. The value that each patient places on symptoms of depression and symptoms of mania is unknown and may vary from patient to patient. How can we use data to learn decision rules for treatment that balance two outcomes in a meaningful way?

A recent project that I have worked on approaches the two-outcome problem through the lens of utility functions. We assume that there exists some unknown utility function (a possibly patient-dependent function of the two outcomes) that physicians seek to optimize, perhaps subconsciously, when selecting treatments. The physician will not always be able to assign a patient the best treatment for that patient’s utility function, but we can assume that they are successful with some probability. In observational data, where treatment decisions are not randomized, this assumption allows us to model clinician decisions, estimate a patient-specific utility function, and estimate an optimal decision rule for the estimated utility function.

This idea represents a new way of thinking about observational data. Randomized controlled trials are widely considered the gold standard for medical research, and many statistical methods are designed to take observational data and apply transformations that allow us to perform analyses as if treatment decisions were randomized. However, the statistical method we’ve been developing for this project handles observational data differently: by recognizing that when treatment decisions are not randomized, there may be information to be gleaned from the decisions themselves. This can be viewed as a form of inverse reinforcement learning, where we observe decisions made by someone with expertise, attempt to discern the goals of the expert, and finally, attempt to learn policies that will achieve the expert’s goals. This idea is similar in spirit to imitation learning, covered in more detail in a previous post on this blog entitled “The Computer is Watching!” by Eric Rose.

We applied our method to data from the observational component of the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) study. By observing treatment decisions made in the clinic and assuming that physicians are implicitly acting with the intent to balance each patient’s depression symptom score and mania symptom score, we were able to construct a composite “depression-mania” score and estimate a decision rule for determining which patients should receive an antidepressant in order to optimize this composite score. We estimated that applying the resulting decision rule in a population would achieve a 7% improvement in the composite score compared to standard practice. Much like observing the actions of a professional stylist could help us all improve our fashion sense, in the future we may be able to use observed actions of experienced physicians to help us systematically construct better decision rules for assigning treatment.

[1] Leverich, G. S. et al., (2006). “Risk of switch in mood polarity to hypomania or mania in patients with bipolar depression during acute and continuation trials of venlafaxine, sertraline, and bupropion as adjuncts to mood stabilizers.” American Journal of Psychiatry, 163(2), 232-239.

Daniel recently completed his PhD in biostatistics at UNC! Congratulations, Daniel!!  We thought this posting was a great excuse to get to know a little more about him, so we asked him:

Q: Provide a list of five do’s and don’ts that apply both to effective teaching in STEM and dealing with a wild bear.


  1. Do be patient. Sometimes all you need to do is wait for a student’s understanding to catch up or wait for the bear to wander away.
  2. Do document your experiences. You’ll want to study your notes to become a better teacher, and you’ll want that bear photo to show off to your friends.
  3. Do recognize that there are multiple approaches to problem solving. Do you play dead? Or fight back?
  4. Do report any concerns to a department chair/park ranger.
  5. Do look for ways to improve for next time. There might be other bears out there, and you have another lecture on Thursday.


  1. Don’t be distracted while on the job. Keep your attention on your students/surroundings.
  2. Don’t be afraid to use technology. You should use every resource you can to be successful.
  3. Don’t try to go it alone. Have your students work in teams, and hike with your friends.
  4. Don’t carry food with strong scents. It’s distracting for everyone.
  5. Don’t run. You can’t outrun a bear, and your students will be confused.

To the festival and back: A journey of fun and Statistics


Yeng Saanchi, PhD Candidate

The Science and Engineering Festival is a biennial event that attracts numerous exhibits from people in the Science, Technology, Engineering, and Mathematics (STEM) fields and is held at the Walter E. Washington Convention Center in Washington DC. The purpose of the festival is to enlighten children, especially teens, about the great work that is being done in STEM and to engender interest in these fields.

This year, the festival took place from April 6 to April 8. A few of us from Laber Labs took a trip to DC to exhibit a computer game called Laser Foxes, which was developed by some members of the lab. The game is built primarily on a statistical concept called classification — a term used for problems that involve predicting a label from among several possible labels based on some known features.

Now to a simple description of the game. As the name suggests, the game involves two foxes shooting lasers at each other while navigating a series of blocks, where the blue fox is controlled by the human player and the orange fox by the computer. A player can adopt one of four possible strategies at any point in the game. There is the Camper, who generally stays in one place; the Aggressor, who actively seeks out his opponent; the Forager, who searches for tools within the game to aid his mission; and the Evader, who actively avoids his opponent. Throughout the course of the game, the computer, or artificial intelligence (AI), adjusts its strategy to counter what it predicts the human player’s current strategy to be. This prediction is based on the observed past moves of the human player. As the game goes on, the AI gets “smarter” at predicting and responding to the human’s strategy. In effect, the AI learns to beat the human player by observing the human’s plays.

The first day of the festival was a sneak peek and was open to only middle schoolers. The turnout was great. As soon as the doors opened, busloads of children, together with their teachers, disembarked at the entrance to the convention center. It was heartwarming to see how excited the kids were to play the game and how good some of them were at beating the AI, which is quite a feat in the game of Laser Foxes. Some of the children and many of the parents were actually quite interested in the theory behind the game and thought it was really cool to be able to create a computer game using Statistics. We gave out stuffed foxes as a memento as well as information cards with details about the American Statistical Association and Laber Labs. During brief lulls in the traffic of kids waiting to play the game, some of us played the game and even managed to win at the expert level. Alas, some of us still managed to lose every game at the easy level.

Attendance on the second day was overwhelming. This time the festival was open to the public. By early afternoon, we had run out of the over one thousand stuffed animals we had brought with us. Up until the very end of the day’s session, we had children playing the game, some of whom had to be convinced by their parents that they could return the next day before they agreed to leave. In between shifts, we took in some of the sights in DC, since for some of us it was our first visit. My first sight of the city of Washington DC, with its primarily white-stone buildings, brought to mind Gondor, the make-believe city in Tolkien’s Lord of the Rings that was the prominent kingdom of the race of men in Middle-earth. It was impossible to see the museums because of the long queue of people waiting to enter at each one we attempted to visit. However, we managed to see the cherry blossom trees, the Lincoln Memorial, and the White House.

The third day of the festival dawned bright and clear, and we were looking forward to another day of Laser Foxes with the kids. Our last day went really well. The turnout was remarkably good and the kids were as excited as ever. It was with mixed feelings that we packed up at the end of the day, sad that the festival was over but glad at the same time that for a few days we had been able to share our love of Statistics through a fun game called Laser Foxes. All in all, it was a trip to remember!

Yeng is a PhD Candidate whose research interests include predictive modeling and variable selection. We asked a colleague to ask Yeng a probing question —

Q: Classic good news / bad news situation. First, the good news. You’ve accidentally been buried alive in a casket with enough air to keep you alive for three days. Now the bad news. You’re bored and need to occupy your thoughts until you suffocate so you decide to write a short story about a squirrel who befriends an anthropomorphic acorn but struggles against eating him during the long winter when food supplies are running low. Tell us that story.

Winter had lasted far longer than Mr. Toad, the squirrel, had anticipated. Ten months of snow and ice was something no one could have foreseen. “What have those infernal humans done this time to cause such a drastic imbalance in weather?” he wondered for the umpteenth time. As he sat and contemplated this, he also wondered how he was going to survive if the weather did not warm up soon. He was down to his penultimate acorn. The last acorn happened to be a particularly fat one that he thought could last him at least a week. Three days went by and still winter persisted. Mr. Toad was now down to the shiny and seemingly delectable acorn he had saved for last. He stared at the acorn sadly, reluctant to mar its smooth surface with teeth marks, but alas, he was too hungry to resist for long. Muttering an apology under his breath, he opened his mouth to take a nibble. Just before his teeth could make contact with the acorn, he heard a squeaky voice say, “Please sir, I beg of you, don’t eat me.” To say Mr. Toad was startled would be a gross understatement. He was positively stupefied and not a little terrified. He had assumed all this while that he was the only living being in his dwelling. He knew without a doubt that the voice must have come from the acorn, but it was too much. A living, breathing, talking acorn? Who had ever heard of such a thing? Not even his Grandpa Turtle, the famous squirrel who had sailed the seven seas with Captain Blood, had ever mentioned such a phenomenon. “What devilry is this?” he wondered. “Has hunger made me delusional?” After a few minutes of silence, he mustered courage and asked, “Who are you?” In a breathy, chocolaty voice, the acorn replied, “My name is Mr. Mulberry.” “How is it that you can talk?” asked Mr. Toad. And so began the tale of how the acorn became anthropomorphic. But that is a story for another time. Suffice it to say that by the time Mr. Mulberry was done telling his tale, the ice had begun to melt. Spring had arrived!
Mr. Toad and Mr. Mulberry, now fast friends, ventured outdoors for the first time in almost a year, breathing in the fresh smell of spring. The End.

This is Yeng’s second post! To learn more about her research, check out her first article here!

Exploration and Exploitation Trade-off

Jesse Clifton, PhD Student 

Most people want to eat as many delicious meals as possible throughout the course of their life. What’s the optimal strategy for accomplishing this goal? Every time you decide on a restaurant, you have a choice between going to the best restaurant that you know or trying somewhere new. If you always go to your favorite restaurant and never try anything new, you’re likely to miss out on even better dishes at new places you’ve never been to. But if you always try a new restaurant, you’ll eat a lot of meals that aren’t as good as your current favorite. Maximizing the number of delicious meals over a lifetime means balancing this trade-off.

The exploration-exploitation trade-off is a dilemma we face in sequential decision-making: each time we have a decision to make, should we try to learn more about the world or stick with what we currently think is the best decision? Acting to learn — exploration — gives you more information to help achieve your goals in the long run, but you lose out on gains from going with your current best guess. When you exploit, you give up the chance to learn something new.

The exploration-exploitation trade-off arises in many problems studied in Laber Labs: decision-making in artificial intelligence, design of optimal medical treatment regimes, and effectively preventing the spread of disease are a few examples. I’m currently researching exploration and exploitation in cases where there is a huge number of choices available to the actor. For example, when public health decision-makers were trying to stop the spread of the recent Ebola epidemic, they had to decide whether to treat (given limited resources) each of dozens or hundreds of locations. All possible combinations of decisions to treat or not treat each location add up to an astronomical number of possible decisions, so this is an example of a large action-space problem.

To explore effectively in large action-spaces, I’m looking into variants of an old technique called Thompson sampling. In Thompson sampling, we maintain a probability distribution over models of the environment, expressing our uncertainty about which model is correct, and continually update this distribution over the models’ parameters as data arrive. To explore, we sample one model from this distribution and try to make the best decision acting as if that model were true. However, we also exploit effectively, because — as we get more data — the distribution will concentrate on the most accurate models and, therefore, lead to reasonable decisions.

Continuing the Ebola example, our models might be epidemiological models of how disease spreads between locations. As we observe the actual spread of disease, we update our uncertainty (probability distribution) over the parameters of these disease models. Each time we need to make a decision, we sample a single model from this probability distribution and try to act optimally according to this sampled model.
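To make the sample-then-act loop concrete, here is a minimal Thompson sampling sketch for a two-armed Bernoulli “bandit.” The arms and their true success probabilities are invented for illustration; a real epidemic model would be far richer than two coins.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-arm example: each arm's true success probability
# is unknown to the learner.
true_p = np.array([0.4, 0.6])
successes = np.ones(2)   # Beta(1, 1) prior on each arm's probability
failures = np.ones(2)

pulls = np.zeros(2)
for _ in range(5000):
    # Explore: sample one model (a probability per arm) from the
    # posterior, then exploit by acting as if that model were true.
    sampled = rng.beta(successes, failures)
    arm = int(sampled.argmax())
    reward = rng.random() < true_p[arm]
    successes[arm] += reward      # posterior update with new data
    failures[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # the better arm (index 1) receives most of the pulls
```

Early on, the wide posteriors make both arms plausible, so both get tried; as data accumulate, the posterior concentrates on the truth and the better arm dominates.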

So much for our brief introduction to Thompson sampling. While the techniques of formal sequential decision-making may be less relevant to our everyday lives, the exploration-exploitation trade-off crops up in many of the decisions we make under uncertainty. Simply being aware of the costs and benefits of exploring and exploiting may help you to maximize your own payoffs in the long run.

Jesse is a PhD Candidate whose research interests include reinforcement learning and artificial intelligence. His current research focuses on finite approximations to infinite-horizon decision problems. We asked a fellow Laber-Labs colleague to ask Jesse a probing question —

Q: Suppose you made a significant discovery in the course of your research that could lead to the development of an Artificial Intelligent Digital Assistant (AIDA) which could result in medical breakthroughs that we have up until now only been able to dream about. However, there’s a 0.01% chance that AIDA could develop a mind of HER own, work toward the annihilation of the human race and succeed. Would you publish your research or would you destroy it so that it never sees the light of day? Perhaps, the discovery of a cure to cancer is worth the risk?

A: Assuming we ought to maximize expected value, consider that the expected number of lives saved by not turning on AIDA is 0.0001 * (Expected number of people ever to live if AIDA doesn’t destroy the world). The latter is astronomically large, given that if civilization survives it may spread through the galaxy and beyond and persist until the heat death of the universe. This dwarfs the good that would come from medical breakthroughs, unless we expect these medical breakthroughs to be a necessary condition for civilization’s colonization of the universe.

This leaves out some considerations, such as scenarios where an AIDA-like discovery is made by someone else even if I don’t share my findings. But altogether, on the (debatable) assumption that saving astronomically many lives in expectation is good, I would destroy my research.