Data Journalism - How Nerds Might Save The News From Itself.


There's a very refreshing trend in the media these days called "data journalism". Essentially, news outlets hire people trained in empirical analysis to write articles (mostly online) about the news and political issues.

Here are links to a couple of my favorite ones:

New York Times' new "The Upshot":

http://www.nytimes.com/upshot/

Washington Post's "The Monkey Cage", which is run by several prominent political scientists:

http://www.washingtonpost.com/blogs/monkey-cage/

Nate Silver's now-famous 538:

http://fivethirtyeight.com/

Pollster.com is unfortunately affiliated with HuffPost, but they provide some of the best polling analysis on the web:

http://www.huffingtonpost.com/news/pollster/

Ezra Klein has a new website called "Vox," which isn't exactly data journalism (they don't do their own models), but it's empirically based policy analysis and news reporting.

For me, this is an incredibly important development in news reporting because it allows reporters and commentators to delegate the analytical part of the news to people who are actually trained for it. Commentators don't have to speculate about whether the economy or candidate characteristics are a bigger influence on an election...the models and data tell us what effect each one has. Reporters and commentators are then free to provide context and understanding for the data and to link the analysis to broader issues in their stories.

This isn't the dominant mode of news reporting yet, but hopefully it will be in the coming years.

The importance of data journalism is that it provides a reality check against the partisan windbags who make overblown and unrealistic predictions or claims. On the left, you have the Democratic partisans claiming that they will keep the Senate and pick up seats in the House. But all of the forecasts say that there's a good chance of a GOP Senate majority and that the GOP will pick up seats in the House. So journalists and reporters have the facts to dispute or at least contextualize the overblown predictions from Democrats this season.

The biggest disappointment for me is that we do not have any data journalists on the conservative side (unless I'm missing someone). The closest would be Jay Cost, who did not complete his Ph.D. and just doesn't have the methodological skills of an Andrew Gelman or John Sides (The Monkey Cage) or a Nate Silver (538). This is why we got the "skewed polls" idiocy in 2012. There were plenty of people like Nate Silver explaining why that was nonsense, but he was dismissed as a liberal (which he is) even though his data analysis is perfectly objective and non-partisan. If Fox News or some other conservative media outlet cranked up a data journalism unit, it would not be so easy for conservatives to ignore reality and continuously hear "louder echoes of their own voices". Nobody would have taken the "skewed polls" thing seriously if a conservative-affiliated data guy had been explaining why it's crap.

If anyone knows of a data journalism site that is affiliated with conservatives, then by all means post it. But I've been looking for a while and can't find any.

Anyway, thought that I'd make this thread as a way for people to post articles of interest from these data journalism websites and/or discuss this new trend in news reporting.

Here's one from The Upshot discussing media bias and slant in news reporting:

Media Slant: A Question of Cause and Effect

Consumers of the news, both from television and print, sometimes feel that they are getting not just the facts but also a sizable dose of ideological spin. Yet have you ever wondered about the root cause of the varying political slants of different media outlets?

That is precisely the question that a young economist, Matthew Gentzkow, has been asking. A professor at the Booth School of Business at the University of Chicago, Mr. Gentzkow was recently awarded the John Bates Clark Medal by the American Economic Association for the best economist under the age of 40. (Full disclosure: As one of the association’s vice presidents, I was among those who voted to give him this award.) His main contributions have been to our understanding of the economics of the media industry.

One study, of which he was a co-author with Jesse Shapiro, a University of Chicago colleague, examined the political slant of more than 400 daily newspapers nationwide. The first step in their analysis, which was published in 2010, was simply to measure the slant of each paper. But that itself was no easy task.

When you listen to Sean Hannity of Fox News and Rachel Maddow of MSNBC, for example, you probably have no trouble figuring out who leans right and who leans left. But social scientists like Mr. Gentzkow and Mr. Shapiro need more than subjective impressions. They require objective measurement, especially when studying hundreds of news outlets. Here the authors were devilishly clever.

Mr. Gentzkow and Mr. Shapiro went to the Congressional Record and used a computer algorithm to find phrases that were particularly associated with the rhetoric of politicians of the two major political parties. They found that Democrats were more likely than Republicans to use phrases like “minimum wage,” “oil and gas companies” and “wildlife refuge.” Republicans more often referred to “tax relief,” “private property rights” and “economic growth.” While Democrats were more likely to mention Rosa Parks, Republicans were more likely to mention the Grand Ole Opry.

With specific phrases associated with political stands, the researchers then analyzed newspaper articles from 2005 to determine which papers leaned left and which leaned right. (They looked only at news articles and excluded opinion columns.) That is, they computed an objective, if imperfect, measure of political slant based on the choice of language.
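
To make the phrase-counting idea concrete, here is a minimal sketch, in Python, of how a slant index like this could be computed. The short phrase lists, the scoring formula, and the toy articles are illustrative assumptions, not Gentzkow and Shapiro's actual code; their study selected its phrases statistically from the Congressional Record and used a more careful estimator.

```python
import re

# Illustrative phrase lists only; the real study derived its phrases
# algorithmically from the Congressional Record rather than by hand.
DEM_PHRASES = ["minimum wage", "oil and gas companies", "wildlife refuge"]
REP_PHRASES = ["tax relief", "private property rights", "economic growth"]

def count_phrases(text, phrases):
    """Count total occurrences of the given phrases in one article."""
    text = text.lower()
    return sum(len(re.findall(re.escape(p), text)) for p in phrases)

def slant_index(articles):
    """Crude slant score for one paper's news articles: +1 means only
    Republican-associated phrases, -1 only Democratic-associated phrases,
    0 an even mix (or no partisan phrases at all)."""
    dem = sum(count_phrases(a, DEM_PHRASES) for a in articles)
    rep = sum(count_phrases(a, REP_PHRASES) for a in articles)
    total = dem + rep
    return 0.0 if total == 0 else (rep - dem) / total

# Toy example: two hypothetical papers scored from one article each
paper_a = ["Lawmakers debated raising the minimum wage near the wildlife refuge."]
paper_b = ["The governor credited tax relief and economic growth for the surplus."]
print(slant_index(paper_a))  # -1.0, i.e. leans toward Democratic rhetoric
print(slant_index(paper_b))  #  1.0, i.e. leans toward Republican rhetoric
```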

To confirm the validity of their measure, Mr. Gentzkow and Mr. Shapiro showed that it was correlated with results from subjective surveys of readers. For example, both the computer algorithm and newspaper readers rated The San Francisco Chronicle as a distinctly liberal paper, and The Washington Times and The Daily Oklahoman as distinctly conservative ones. Both measures put The New York Times as moderately left of center and The Wall Street Journal as moderately right.

With a measure of political slant in hand, the researchers then analyzed its determinants. That is, they examined why some papers write in a way that is more consistent with liberal rhetoric while others are more conservative.

A natural hypothesis is that a media outlet’s perspective reflects the ideology of its owner. Indeed, much regulatory policy is premised on precisely this view. Policy makers sometimes take a jaundiced view of media consolidation on the grounds that high levels of cross-ownership reduce the range of political perspectives available to consumers.

From their study of newspapers, however, Mr. Gentzkow and Mr. Shapiro find little evidence to support this hypothesis. After accounting for confounding factors like geographic proximity, they find that two newspapers with the same owner are no more likely to be ideologically similar than two random papers. Moreover, they find no correlation between the political slant of a paper and the owner’s ideology, as judged by political donations.

So, if not the owner’s politics, what determines whether a newspaper leans left or right? To answer this question, Mr. Gentzkow and Mr. Shapiro focus on regional papers, ignoring the few with national scope, like The Times. They find that potential customers are crucial.

If a paper serves a liberal community, it is likely to lean left, and if it serves a conservative community, it is likely to lean right. In addition, once its political slant is set, a paper is more likely to be read by households who share its perspective.

Religiosity also plays a role in the story, and it helps Mr. Gentzkow and Mr. Shapiro sort out cause and effect. They find that in regions where a high percentage of the population attends church regularly, there are more conservatives, and newspapers have a conservative slant. They argue that because newspapers probably don’t influence how religious a community is, the best explanation is that causation runs from the community’s politics to the newspaper’s slant, rather than the other way around.

The bottom line is simple: Media owners generally do not try to mold the population to their own brand of politics. Instead, like other business owners, they maximize profit by giving customers what they want.

These findings speak well of the marketplace. In the market for news, as in most other markets, Adam Smith’s invisible hand leads producers to cater to consumers. But the findings also raise a more troubling question about the media’s role as a democratic institution. How likely is it that we as citizens will change our minds, or reach compromise with those who have differing views, if all of us are getting our news from sources that reinforce the opinions we start with?

N. Gregory Mankiw is a professor of economics at Harvard.

Here is an interesting article about a polling firm in Georgia - InsiderAdvantage - that has been getting a lot of press recently.

We take a closer look at one pollster's attempt to combine automated phone polls with Internet surveys. Another poll shows a big racial divide on Donald Sterling's punishment. And the turnout wars continue. This is HuffPollster for Friday, May 2, 2014.

REVIEWING INSIDERADVANTAGE'S UNUSUAL NEW METHODS - In the mid-2000s, telephone polls conducted using automated, recorded-voice methodology (sometimes referred to as IVR) built a record of accuracy in forecasting election outcomes. They did so at a time when the percentage of Americans who use only cell phones was still in the single digits. As the "cell only" population surged, however -- 38 percent of U.S. adults now have wireless service only -- automated pollsters faced a mortal threat, since federal law prohibits dialing cell phones with "auto dialers." [National Journal, CDC]

Over the last two years, most of the well-known automated pollsters have started to supplement their landline samples with interviews conducted over the Internet, mostly using non-random panels of individuals who volunteered to complete online surveys. Those pollsters have officially disclosed few details of their methodology. [Previously: HuffPollster reported details on SurveyUSA and PPP]

A new poll released on Thursday by the Georgia-based pollster InsiderAdvantage and new partner OpinionSavvy produced a flurry of questions on Twitter about its methodology. InsiderAdvantage's polling combines automated, recorded-voice survey calls to landline telephones with interviews completed over the Internet. OpinionSavvy's Matt Towery, Jr., a PhD candidate at Georgia State University (and son of InsiderAdvantage CEO Matt Towery), provided more information on their methods to HuffPost.

Towery Jr. explains that their online sample combines interviews from an online panel, which he did not specify, with respondents intercepted via Facebook. He sampled voters from Facebook by placing advertising on the social networking site that invited users in Georgia to click through and complete a survey. The online interviews were among all registered voters and not screened to cell only or any other subpopulation. Neither data source can be considered a random sample of Georgia voters.

Thursday's poll story included this description of how they combine the telephone and online samples: "Over multiple iterations (preserving original ratios), the online and telephone polls were integrated and subsequently resampled randomly. The poll was weighted for age, gender, and political affiliation." HuffPollster found that description puzzling, as did several other pollsters. Towery explains that they started with a larger sample (a total of 1,474 interviews), and randomly selected the 737 interviews used to produce the final results, setting a 2:1 ratio of telephone to online surveys (491 telephone and 246 online surveys). [Fox 5 Atlanta, @MysteryPollster]

Why a 2:1 ratio of phone to internet? Towery: "The 2:1 relationship is based on experimentation. We've been trying online vs. phone polling and a combination thereof for a while now, and there are obvious demographic biases in each. For online polls, the 30-44 male demographic is most evident (apart from pre-screened samples, which are heavily female). The 2:1 ratio is an expression of a general ideal population of voters; in this case, it applies to GOP primary voters in Georgia. The 2:1 relationship might change, depending on the jurisdiction; since we have run polls identical to this one in the past few months, I have been able to narrow this ratio to 2:1, based on prior observations. Given this ratio (for this poll, at least), weighting is almost unnecessary."

Why sample from a sample? Why not just weight? "My answer is that it is basically the same, but slightly more thorough. Here's why: (a) It's a matter of consistency: If I were to weight a poll according to the 2:1 relationship, then we are effectively eliminating a large number of online respondents, while the telephone respondents remain untouched (or vice versa). While this will not inherently result in any bias, oversampling, etc., I prefer to perform the same procedure to each sample as a matter of consistent transformation of the data. (b) Randomization is almost never a bad thing: I've used this sample-in-sample technique previously in social scientific research to reduce larger datasets for analytical purposes, as well as to mitigate the damages of potential clustering. The same goes for polling data: it's possible that clusters around a latent variable are present, and further randomization decreases the chances that the clusters will present themselves in your results. Sure, stratification can help, but if a variable is truly latent, a randomized sample-in-sample is a more effective solution. I am happy to give up 50% of the respondents for a more representative sample with a slightly higher margin of error."
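
For readers trying to picture the mechanics, here is a minimal sketch of the "sample-in-sample" step Towery describes: draw a random subsample from the two interview pools that enforces the 2:1 phone-to-online ratio (roughly 491 phone and 246 online interviews out of 737). The pool sizes, field names, and random seed are assumptions for illustration, not InsiderAdvantage's actual data or code, and the weighting for age, gender, and party would follow as a separate step.

```python
import random

def blend_samples(phone, online, n_total=737, phone_share=2/3, seed=1):
    """Randomly subsample two interview pools to a fixed total size while
    enforcing a roughly 2:1 phone-to-online ratio (491 phone and 246
    online out of 737, as in the Georgia poll described above)."""
    rng = random.Random(seed)
    n_phone = round(n_total * phone_share)  # 491
    n_online = n_total - n_phone            # 246
    blended = rng.sample(phone, n_phone) + rng.sample(online, n_online)
    rng.shuffle(blended)
    return blended

# Toy pools; the sizes are arbitrary stand-ins for the larger sample of
# 1,474 interviews, and real respondents would carry age, gender, party,
# and the horse-race answers used for weighting and toplines.
phone_pool = [{"mode": "phone", "id": i} for i in range(1000)]
online_pool = [{"mode": "online", "id": i} for i in range(474)]
final = blend_samples(phone_pool, online_pool)
print(len(final), sum(r["mode"] == "phone" for r in final))  # 737 491
```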

So what should we make of this method? - We reached out to two prominent survey methodologists:

-Natalie Jackson (who will soon join HuffPost as our senior data scientist): "I don't like the idea of deleting data under most circumstances. Throwing data out is not only wasting time and money, it is theoretically contrary to what survey researchers are trying to do: we beg people to talk to us, to let their opinions be heard. To then 'randomly' and systematically delete large numbers of the opinions we work so hard to get is to essentially say those opinions don't count. With the current abysmal state of response rates and cooperation rates, I don't think we can afford the implication that some opinions provided to us aren't counted in our field...More generally, blending an online sample from multiple sources is a complex process on its own; adding that to a phone sample and then resampling the whole thing (plus weighting) makes the process pretty opaque. The farther we get from basic sampling principles, the more questions there are to be answered regarding representativeness and validity."

Charles Franklin (director of the Marquette Law School Poll and co-founder of the original Pollster.com): "When polls abandon probability sampling they lose the theory (and theorems) that prove samples can be generalized to populations. There is not yet an accepted theory for how to generalize from non-probability samples, including internet samples, though there are a number of interesting approaches being tested. Some of these rely on weighting by a variety of demographic information. Others rely on estimating relationships in the sample and then applying that model to a known population (usually from census data or voter lists.) And some have remarkably ad hoc approaches. Those based on explicit models can be replicated and tested in a variety of settings but how well they work is an empirical question. The more ad hoc the approach the more impossible it becomes to assess. In effect we have polls with no theoretical basis to claim legitimacy. Maybe they work. Maybe they don’t. We don’t know."
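
As one concrete illustration of the "weighting by a variety of demographic information" that Franklin mentions, here is a minimal sketch of simple cell-based post-stratification: each respondent is weighted by the population share of their demographic cell divided by that cell's share of the sample. The cells and target shares are made-up numbers, and real pollsters typically rake iteratively over several variables at once.

```python
from collections import Counter

def poststratify(sample, targets):
    """Weight each respondent by population share / sample share of their
    demographic cell, so the weighted sample matches the target mix."""
    counts = Counter(r["cell"] for r in sample)
    n = len(sample)
    for r in sample:
        r["weight"] = targets[r["cell"]] / (counts[r["cell"]] / n)
    return sample

# Toy example with made-up shares: an unweighted sample that is half
# younger men gets pulled toward a population where they are only 30%.
sample = ([{"cell": "male_18_44"} for _ in range(50)] +
          [{"cell": "female_45_plus"} for _ in range(50)])
targets = {"male_18_44": 0.30, "female_45_plus": 0.70}
weighted = poststratify(sample, targets)
print(weighted[0]["weight"], weighted[-1]["weight"])  # 0.6 1.4
```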

To sum up: people read blogs as news, and Acworth supports the trend.

It's better than watching the cable "news" channels and only hearing partisans screaming at one another. These blogs are reporting on the news more accurately than a lot of the journalists out there. And the articles are based on valid, scientific data analysis. What's wrong with that?

It's better than watching the cable "news" channels and only hearing partisans screaming at one another. These blogs are reporting on the news more accurately than a lot of the journalists out there. And the articles are based on valid, scientific data analysis. What's wrong with that?

even that information can be misleading or suspect

didn't we have a thread comparing U.S. income inequality to European income inequality recently with statistics to back it up?

before I make my point, you know I believe our middle class is being damaged.

however, the comparison was extremely flawed even tho backed by those facts/statistics because it didn't bother to compare the differences in the lifestyles of the poor in the U.S. vs. Europe or the differences between the lifestyles of the middle class in the U.S. vs. Europe.

even that information can be misleading or suspect

didn't we have a thread comparing U.S. income inequality to European income inequality recently with statistics to back it up?

before I make my point, you know I believe our middle class is being damaged.

however, the comparison was extremely flawed even tho backed by those facts/statistics because it didn't bother to compare the differences in the lifestyles of the poor in the U.S. vs. Europe or the differences between the lifestyles of the middle class in the U.S. vs. Europe.

I agree with you that it can sometimes be misleading. But that's why it's so important to have fact-based discussions. People from both sides can point to flaws in the conclusions drawn from the data or otherwise argue over the interpretation of the data. But at least everybody is operating on the same numbers.

The biggest problem I have right now with political discourse and news reporting is that there are no shared facts. Politicians can make up stuff and it gets reported as "news" without reporters questioning the factual reality of it. When reporters have people actually trained in data analysis to cite in their story explaining that the politician made up the claim, then there's at least some possibility of a reality check against the partisan rhetoric.

And it's a far sight better than the punditry and commentary we see today, which is largely rooted in outright myths, fantasies, or anecdotes.

Here's a great example of why these data journalism sites are needed. This is a story from Politico.com's "Chief Political Columnist" discussing why you can't believe polls:

http://www.politico....1211/70717.html

No need to read the whole thing, but it's the usual nonsense from someone who knows nothing about basic statistics. A lot of blogs like The Monkey Cage were all over him about his ignorance, but the editor at Politico should have nixed the article completely. There's nobody at Politico.com with even the minimal statistical education needed to prevent Simon from humiliating himself. Having a team of data journalists in every news agency would improve the quality of news reporting and commentary.

I agree with you that it can sometimes be misleading. But that's why it's so important to have fact-based discussions. People from both sides can point to flaws in the conclusions drawn from the data or otherwise argue over the interpretation of the data. But at least everybody is operating on the same numbers.

The biggest problem I have right now with political discourse and news reporting is that there are no shared facts. Politicians can make up stuff and it gets reported as "news" without reporters questioning the factual reality of it. When reporters have people actually trained in data analysis to cite in their story explaining that the politician made up the claim, then there's at least some possibility of a reality check against the partisan rhetoric.

And it's a far sight better than the punditry and commentary we see today, which is largely rooted in outright myths, fantasies, or anecdotes.

I agree...you can't have a meaningful debate without the foundation being facts

Try an experiment some time. Watch the big three broadcast networks' news broadcasts. Switch back and forth between the three. You will notice that they are reporting the exact same stories and providing commentary that is exactly alike. Now, it's not remarkable that they are reporting the same stories, because they can't control what happens, but it is remarkable that they all come to the same conclusions. Plus, it's remarkable when they all ignore the same stories.

It's also remarkable how many of their news stories involve attacking the other news sources that report the stories they ignore.

It's better than watching the cable "news" channels and only hearing partisans screaming at one another. These blogs are reporting on the news more accurately than a lot of the journalists out there. And the articles are based on valid, scientific data analysis. What's wrong with that?

So it's not that blogs have gotten better; it's that cable news channels have gotten that bad.

So it's not that blogs have gotten better; it's that cable news channels have gotten that bad.

No, it's that we now have blogs devoted specifically to analyzing the news and public policy from an empirical perspective. There have been very few data analysis specialists in the media until the last few years, and most of that can be attributed to Nate Silver's election forecasts. Now we have over a dozen of these types of websites, many of them holding prominent places at major news outlets such as the New York Times and the Washington Post. It's a good thing that these outlets are giving prominence to the new data journalism.

I still don't understand why you think it's a bad thing that people are going to these places for information about politics and public policy.

I still don't understand why you think it's a bad thing that people are going to these places for information about politics and public policy.

First rule of journalism is not to be part of the news.

Experts who are both journalist and "researcher" have an incentive to manipulate data to serve their personal agenda, and worse, people will mistakenly accept their slanted opinions as indisputable facts. It breeds a false sense of legitimacy.

For example, I would be skeptical of anyone recommending a blogger/family doctor for health advice. Is his recommendation supposed to improve my health or get website hits?

First rule of journalism is not to be part of the news.

Experts who are both journalist and "researcher" have an incentive to manipulate data to serve their personal agenda, and worse, people will mistakenly accept their slanted opinions as indisputable facts. It breeds a false sense of legitimacy.

For example, I would be skeptical of anyone recommending a blogger/family doctor for health advice. Is his recommendation supposed to improve my health or get website hits?

That's the point of my posts...we have a group of people doing objective data analysis that reporters can then rely upon in their news reporting. We need these same kind of people doing it for conservative affiliated news organizations, as well. Do you think that the people on the links that I listed above are manipulating things to promote an agenda?

That's the point of my posts...we have a group of people doing objective data analysis that reporters can then rely upon in their news reporting. We need these same kind of people doing it for conservative affiliated news organizations, as well. Do you think that the people on the links that I listed above are manipulating things to promote an agenda?

It doesn't matter if they're manipulating data on purpose or not. A true researcher should not have dual interests; they should only be concerned with gathering accurate data. It's the reason why global warming research got derailed and lost its credibility; they were corrupted by political interests, and no matter what evidence they present, no one can take them seriously.

There shouldn't be a need to have researchers working for both liberal and conservative news organizations. Both sides should be able to use data from an independent, neutral source.

It doesn't matter if they're manipulating data on purpose or not. A true researcher should not have dual interests; they should only be concerned with gathering accurate data. It's the reason why global warming research got derailed and lost its credibility; they were corrupted by political interests, and no matter what evidence they present, no one can take them seriously.

There shouldn't be a need to have researchers working for both liberal and conservative news organizations. Both sides should be able to use data from an independent, neutral source.

What are you talking about "gathering data"? Have you even visited the places that I'm talking about to see what they're doing? It's like you've got this square peg and are running around trying to hammer it into a round hole.

Most of the links above use public data (Census data, opinion polls, economic indicators) that are gathered by independent third parties. They analyze the data to answer questions about public policy and things in the news. It's sort of the political version of "Mythbusters".

You should read some of the articles from those sites before you criticize them, because your criticism of this new trend makes zero sense.

What are you talking about "gathering data"? Have you even visited the places that I'm talking about to see what they're doing? It's like you've got this square peg and are running around trying to hammer it into a round hole.

Most of the links above use public data (Census data, opinion polls, economic indicators) that are gathered by independent third parties. They analyze the data to answer questions about public policy and things in the news. It's sort of the political version of "Mythbusters".

You should read some of the articles from those sites before you criticize them, because your criticism of this new trend makes zero sense.

Just the fact that you need to have both conservative and liberal "experts" analyzing the same data because they will come away with opposing conclusions is the problem. Their opinions aren't fact, despite your obsession with presenting them as such in threads. It's exactly what I said: you're projecting false legitimacy onto these bloggers.

The process from research to analysis should be left to the third party, not co-opted mid-process by blogger/experts. It will just create an environment where both sides can be proven correct by the same statistic. It's bad journalism.

example:

Researcher: We found this new chemical present in fish.

Data Journalist: I'm an expert in chemicals. They are bad for you. Ban fish!

Researcher: wait, wait! we're not done yet!

Just the fact that you need to have both conservative and liberal "experts" analyzing the same data because they will come away with opposing conclusions is the problem. Their opinions aren't fact, despite your obsession with presenting them as such in threads. It's exactly what I said: you're projecting false legitimacy onto these bloggers.

The process from research to analysis should be left to the third party, not co-opted mid-process by blogger/experts. It will just create an environment where both sides can be proven correct by the same statistic. It's bad journalism.

example:

Researcher: We found this new chemical present in fish.

Data Journalist: I'm an expert in chemicals. They are bad for you. Ban fish!

Researcher: wait, wait! we're not done yet!

Okay, you really don't know what you're talking about here and it's obvious that you haven't looked at the websites I linked in the OP. Was it just "opinion" when Nate Silver forecasted the 2012 election? Was he simply coming away "with opposing conclusions"? Or was he looking at the data accurately while most other people were filtering the data through their partisan lenses?

No, he objectively analyzed the data and used that analysis to predict the outcome with a lot of accuracy. Because he's not some opinion blogger pontificating about politics. He's a data journalist who was reporting his research and analyses on the New York Times website.

Go back and read my posts again...I didn't say that we need "both conservative and liberal experts analyzing the same data". I said that we need data analysts WORKING FOR BOTH liberal and conservative news outlets, because currently some right-winger caught up in the "skewed polls" nonsense is just going to dismiss the data analysis from Silver because it's posted on the New York Times.

The original analysis that is presented on these data journalism websites is objective and scientifically valid. But most of the time, they are not collecting any data themselves. And a lot of the time, they are actually presenting a series of peer-reviewed journal articles that have been written by academics, making such research accessible to the general public in a way that has hardly ever happened in the history of modern journalism.

The "example" that you gave shows how little you know about the work being done by these people. Again, go read their work before you make a comment like that.
