One of the topics we discuss in our forthcoming book EVERYDATA is how, over time, as more scientific studies are replicated and tested, supposed "statistical" findings are often overturned. Recently, there have been numerous articles in the popular media about sitting and sedentary work leading to increased death rates among Americans. This insightful article in the Washington Post discusses a new epidemiological study suggesting that the fear mongering may not live up to what is in the data. Never fear, I am not giving up my FITBIT--but I get in my steps for general fitness, not out of fear that my desk job will lead me to an early grave.
Bethesda, Maryland? According to a recent article by Value Penguin, Bethesda is the most educated city in America.
Of course, as consumers of everydata, we wanted to dig a little deeper into the methodology behind these rankings. According to Value Penguin, they focused on "the 25-and-older population of 3,000 cities in this country, rewarding those with the highest percentage of post-high school degrees earned." Based on a government survey, they then ranked cities on two dimensions--attended college and earned an advanced degree. In the methodology section, we think the description contains a typo (they state that they combined no high school degree, high school degree, and college degree into attended college), but putting that to the side, a few observations:
(1) I wondered how the relative weightings were done to reach the composite score. It appears that earning a college degree and earning an advanced degree are two metrics given equal weight (though that is not quite apparent). Are all advanced degrees created equal? It appears an associate's degree is treated exactly the same as an MBA or a PhD. How does the bucketing affect the ultimate rankings?
(2) Does the ultimate ranking metric (which is based on the "score" derived from the underlying methodology) really do a good job of creating an ordinal ranking among these cities? It could be that this list is very useful as an indicator of highly educated locations, but that drawing finer distinctions as to whether Palo Alto or Bethesda or Newton is really the most educated city is less useful.
(3) Finally, what do we make of this outcome? Does this suggest that if we want to be more highly educated, we should all move to Bethesda, MD? I don't think so. It would be interesting to correlate these education statistics with relative income. It would not be surprising to find that many of the cities on this list are among the most affluent and are located near major universities.
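To make the weighting question in point (1) concrete, here is a minimal sketch of how the choice of weights alone can reorder a ranking. The city names, percentages, and weights are all made up for illustration; they are not Value Penguin's data, and the `rank` helper is a hypothetical stand-in, not their actual scoring formula.

```python
# Hypothetical figures for illustration only; these are NOT Value Penguin's data.
cities = {
    "City A": {"college": 0.83, "advanced": 0.45},
    "City B": {"college": 0.80, "advanced": 0.52},
    "City C": {"college": 0.85, "advanced": 0.40},
}

def rank(cities, w_college, w_advanced):
    """Return city names ordered by a weighted composite score, highest first."""
    scores = {
        name: w_college * v["college"] + w_advanced * v["advanced"]
        for name, v in cities.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Same underlying data, two different (assumed) weighting schemes.
equal_weights = rank(cities, 0.5, 0.5)
college_heavy = rank(cities, 0.9, 0.1)
print("Equal weights:  ", equal_weights)
print("College-heavy:  ", college_heavy)
```

With these invented numbers, City B tops the list under equal weights and drops to last when the college share counts for 90% of the score. Nothing about the cities changed, only the weighting, which is why the composite method matters when reading a "most educated" ranking.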
Is the "wild" salmon you bought at the supermarket truly wild?
According to a new study, it may depend - on the time of year.
According to Gizmodo, "The last study was done during the summer. This recent study? It took place during the winter... Essentially, the last study found almost no salmon fraud because there was a plentiful supply of salmon. When that supply dried up with the season, salmon fraud was suddenly everywhere."
Timing matters. Significant changes in the data can sometimes be explained by when the data was sampled. Keep this in mind the next time you're comparing multiple data points.
As the Washington Post and others reported, Yelp is now posting food safety alerts for restaurants with poor food safety scores. For example, if you check out Bai Thong Thai in San Francisco, you'll see the big red box we've posted here.
From an everydata perspective, this new data raises all sorts of interesting questions:
- Is there a correlation between a restaurant's ranking from diners, and its food safety score from local officials?
- Will the food safety warnings change diners' behavior? (An assistant professor from Harvard will be studying the effects of the warnings, according to the Post.)
- How timely is the data? As Food Safety Magazine asked, "If a restaurant quickly resolves their food safety issues, how soon will Yelp take down the alert? Days later? Weeks? Months?"
- How accurate are past rankings in forecasting the future?
- Will a restaurant that advertises on Yelp be treated any differently than a non-advertiser when it comes to these food safety warnings? We certainly have no reason to believe that they would - but it's a question that a smart consumer of data should ask.
And that, my friends, is your daily recommended serving of everydata.
It may not surprise regular readers of Everydata that a true data geek tracks lots of nutrition data in his everyday life. Recently, my favorite app, Lose It!, made this recommendation:
Basically, the app takes your daily calorie intake, and looks for patterns. In this case, on days that shrimp is on the menu, the daily calories are lower. As a sound consumer of Everydata, does that automatically mean that "try shrimp more often" will translate to weight loss? This is a great example of a correlation, but let's dig deeper.
This is a very small sample size of days, and this statistic is based on a relative comparison of days eating shrimp to days not eating shrimp. The potential for other factors here is quite significant. For example, I know that I eat shrimp early in the week and on weekends because that's when I food shop. By the end of the week, I've eaten all the shrimp in the fridge and so tend to eat it less on Thursdays and Fridays. Also, what if shrimp days were correlated with the days I go to the gym? I always work out on Mondays, and look at the calorie intake on Mondays...it's generally much lower. This is a great example of an "omitted" variable that could be driving the results.
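The omitted-variable story can be sketched with a few lines of code. The food log below is entirely made up for illustration (it is not my actual Lose It! data), and it assumes the simplest possible world: calories depend only on whether it is a gym day, and shrimp just happens to land on gym days more often.

```python
from statistics import mean

# Hypothetical food log, invented for illustration:
# each row is (ate_shrimp, went_to_gym, calories).
log = (
    [(True, True, 1800)] * 3      # shrimp on gym days
    + [(True, False, 2200)] * 1   # shrimp on a rest day
    + [(False, True, 1800)] * 1   # no shrimp on a gym day
    + [(False, False, 2200)] * 3  # no shrimp on rest days
)

def shrimp_gap(rows):
    """Mean calories on shrimp days minus mean calories on non-shrimp days."""
    shrimp = [cal for ate, _, cal in rows if ate]
    no_shrimp = [cal for ate, _, cal in rows if not ate]
    return mean(shrimp) - mean(no_shrimp)

raw_gap = shrimp_gap(log)                              # all days pooled
gym_gap = shrimp_gap([r for r in log if r[1]])         # gym days only
rest_gap = shrimp_gap([r for r in log if not r[1]])    # rest days only
print(raw_gap, gym_gap, rest_gap)
```

Pooled together, shrimp days look 200 calories lighter. Compare gym days to gym days and rest days to rest days, though, and the gap is zero in both groups. In this invented log the gym, not the shrimp, is doing all the work.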
The point is not to pick on Lose It! predictions or to disparage the health benefits of my favorite shellfish. I really enjoy getting these little statistical tidbits about my eating habits and thinking about them. But a little knowledge of basic statistical concepts goes a long way in knowing how to interpret these numbers.
A recent study in the Journal of the American Medical Association explores a fascinating fact: Elephants, despite their large body size, have cancer mortality rates of about 4.8%, compared to humans, whose rates are between 11 and 25%. The authors study a sample of various large mammals, including elephants, and one of their key findings is that elephants appear to have 20 or more copies of a key cancer-blocking gene (TP53), whereas humans have only one.
I find this research fascinating, and despite the fact I have no background in medicine, I found I could read the research and noticed a few things from the everydata perspective. First, the methodologies, sample sizes, and statistical precision of all the estimates are clearly stated. Second, the authors do not appear to overstate their results--explaining that IF their results can be replicated, this could have significant implications for future cancer research.
The study received quite a bit of popular press attention, including this article on CBS News.
We conclude our collection of Fall infographics with a special look at the baseball post-season and the World Series, also known as the Fall Classic. This great infographic from ESPN shows regular season MVPs whose team went on to win the World Series. It doesn't happen as often as you'd think!