I’m kidding. Your time in graduate school can be challenging, but like so many things in life, it’s how you take on those challenges that matters. My resolve to succeed was tested when I jumped to the conclusion that two semesters of research would never see the light of day. I had a bit of an identity crisis. I questioned my life decisions. I was bitter and resentful. But getting through moments like these has made me realize the intrinsic value of a PhD.
Everyone has some theory of the world in which they conceptualize themselves. At least I’d like to think so. When someone or something dear to us objects to our theory or lurks outside our structure, chaos ensues. Luckily, I had the fortune to have such a formative experience and gain this perspective through the wide gamut of projects in our lab. I even had a short stint as a data scientist this summer at a local start-up. Over the past year, the projects I’ve worked on include:
- Monitoring food safety violation rates.
- Using digital mammography to predict breast cancer.
- Text mining Twitter data to identify incidences of food poisoning.
- Developing a means to detect age from facial and body markers.
- Reconciling disparate data sources.
- Building a simulation tool to illuminate the benefits and costs of microtransit.
If you aren’t familiar with these topics, let me assure you that this year was a random walk through research areas: no one topic naturally flowed from the last. In hindsight, it was interesting to encounter the broad divide among statisticians in understanding the nature of these problems and the best approaches to solve them (and no, this is not another one of those frequentists vs. Bayesians posts). If there is a common thread to these projects, it would be that we have some set of inputs, x, to which we hope to apply some statistical magic so that we arrive at the response of interest, y. But what magic do we use? How do we get from x to y?
In one camp are those who generally assume that the input data is generated by some stochastic model and can be fit using a class of parametric models. By applying this template, we can elegantly conduct our hypothesis tests, arrive at our confidence intervals, and get the asymptotics we desire. This tends to be the lens provided by our core curriculum. The strength of this approach lies in its simplicity. However, with the rise of Bayesian methods, Markov chains, etc., this camp is beginning to lose the “most interpretable” designation. Moreover, what if the data model doesn’t hold? What if it doesn’t emulate nature at all?
In the other camp are the statisticians whose magic relies on the proverbial “black box” to get from x to y. They use algorithmic methods, such as trees, forests, neural nets and SVMs, which can achieve high prediction rates. I must admit, most of the projects I’ve worked on fall in this camp. But despite its many advantages, there are issues: multiplicity, interpretability and dimensionality, to name a few. Case in point, the team I worked with in the digital mammography project was provided a pilot data set of 500 mammograms from 58 patients with and without breast cancer. Our goal was to design a model that can flag whether a patient has cancer. But how can we make rich inferences from such limited training data? Some argue our algorithmic models can be sufficiently groomed to learn representations that match those of a human mind; in this case, those of a radiologist identifying mammograms associated with patients at risk of breast cancer. However, our team tinkered with a variety of adaptations of often-cited convolutional neural nets, and none of the variations was able to fully capture the representations we desired in identifying radiological features. The tools at hand were simply not designed to achieve the objective; grooming was not the solution.
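To make the two camps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration and none of it comes from the projects above: "nature" is a made-up sine curve with noise, the data-model camp is represented by a straight line fit by least squares, and the algorithmic camp is represented by a k-nearest-neighbors average (a stand-in for trees, forests or neural nets, chosen only because it fits in a few lines).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "nature": y depends on x nonlinearly, plus Gaussian noise.
def nature(x):
    return math.sin(x) + random.gauss(0, 0.1)

train = [(x, nature(x)) for x in [random.uniform(0, 6) for _ in range(200)]]
test = [(x, nature(x)) for x in [random.uniform(0, 6) for _ in range(200)]]

# Camp 1: assume the data model y = a + b*x + noise and estimate a, b
# by ordinary least squares (closed form for one input).
def fit_line(data):
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Camp 2: no data model; predict by averaging the responses of the
# k training points whose x is closest to the query point.
def knn_predict(data, x0, k=10):
    nearest = sorted(data, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

# Mean squared prediction error on held-out data.
def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

a, b = fit_line(train)
mse_line = mse(lambda x: a + b * x, test)
mse_knn = mse(lambda x: knn_predict(train, x), test)

print(f"linear model: intercept={a:.2f}, slope={b:.2f}, test MSE={mse_line:.3f}")
print(f"10-nearest neighbors: test MSE={mse_knn:.3f}")
```

The line hands us two numbers we can interpret and test, but because the assumed model does not match this particular "nature", its predictions suffer. The neighbor average predicts well here, yet it offers no slope or intercept to reason about. That trade-off is the tension between the two camps in miniature, not a verdict on either.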
So now that I’ve come through this experience and am again looking forward, I have to ask: in which camp do I fall? Perhaps it’s not either-or; perhaps it’s not even a combination, but something entirely new. Whatever the path forward is, I’m excited to be playing a part. It’s been an intense year, but the level of intellectual growth and personal self-discovery made it all the more worthwhile.
Leo Breiman. Statistical modeling: The two cultures. Statistical Science, 2001.
Joyce is a PhD Candidate whose research focuses on machine learning. We asked a fellow Laber-Labs colleague to ask Joyce a probing question —
Q: Propose a viable strategy to Kim Jong-un on how to take over the world in the next 5 years. — Marshall Wang
With just nuclear capability, NK is left with a route with low odds but high payout. They should continue to do missile tests that inflate their nuclear capability. Kim should also ramp up the disparaging comments against Trump, for Trump’s inaction insinuates that Americans would never use an atomic bomb. Such comments would likely not draw sanctions from strong allies, namely China and Russia. Kim should then leak intel on a planned nuclear weapon launch as close to the SK border as possible. If stars align, Trump could justify nuking NK, but the collateral damage in SK would likely draw political ire from the global community. If successful to this point, the US would fall into great political turmoil as Trump would be demonized as worse than Putin. Kim would then need the US and Russia to somehow engage with one another in WW3. While they are preoccupied on that front, Kim could start a violent civil war within Korea. This may involve bombing highly populated areas in SK, though the US would HAVE to be preoccupied fiercely on other fronts AND China would have to be involved, perhaps on the Eastern front, for Kim to execute such a scheme successfully. NK should at this time revert to a defensive strategy in order to move towards a united Korea. Over the course of a few years, Kim should hope both sides take heavy casualties, provide help to China when it can, and win over other Asian allies of the US. And since Korea is among the most advanced nations in technological production, these years would leave a destructive gap in technological progress.
This is Joyce’s second post! To learn more about her research, check out her first article here!