Sunday, 23 August 2015

What data journalism means to me

So I told someone recently I was a data journalist.

"Define 'data journalism'," they said.

"No," I replied.

"Why not?" they asked.

"Because it's an incredibly tedious thing to do," I said. "And anyone who thinks it is important to define data journalism precisely is likely to be doing it really, really badly."

I'm paraphrasing. In real life I'm not that much of an asshole. But it's basically what I said, and certainly what I believe.

The fact is, I am not interested in precisely defining data journalism. I am interested in practising data journalism, and how other people practise it. 

I am interested in brilliant content - content that is valuable; content that is unique; content that people actually want to read, share and engage with.

I am interested in how we harvest the sea of information that is now available to anyone with a computer, tablet or mobile phone. 

Information being pumped out, in real time, in the guise of data feeds. 

Information being published by governments in the form of spreadsheets that no one seems to bother to read. 

Information - stories, scandals even - hiding in plain sight.

Now, okay. If we can't construct any kind of definition of data journalism, then it becomes a meaningless concept. 

I wouldn't want that. 

But I'm with Wittgenstein on these things. I think data journalism is defined primarily by practice. I also believe it is a classic Wittgensteinian 'vague concept'. 

Like sport. 

Some things are clearly sports (rugby, say). Some things are clearly not sports (having a bath, for example). And then there are things in between. 

Is darts a sport? It has certain 'sporty' features, like precision improved with training, competition, etc etc. 

It has 'non-sporty' features, too, in that it isn't physically taxing, and (at least if you are an amateur) drinking beer can make you better at it.

So is darts a sport? 

Dunno. Don't care. There is no categorical answer because the concept 'sport' isn't precisely defined.

That doesn't mean there is no such thing as sport. Or that darts is any greater, or lesser, an activity, for being, or not being, a sport.

Is Phil Taylor a sportsman?


Is Phil Taylor amazing?


Same with data journalism. Some things are data journalism. Some things are not data journalism. Some things are like darts: borderline. Are they data journalism? Are they not?

I'm not interested in that. I'm interested in whether they are amazing.

Having said all that, I am interested in the ways people try to over-define data journalism; to exclude things as not really data journalism at all.

I'm interested in this because I think it can teach us valuable lessons.

On the one hand, you have those people who say: "That isn't really data journalism."

We got this, for example, when we did our 'Pick your horse with data' gadget for the Grand National. This was, clearly, a bit of fun: a bit of fun aimed at people (like me) who don't just want to stick a pin in a list of names, but also don't want to spend hours and hours poring over every horse's performance in every race on every type of going etc etc.

It was a bit of fun.

It involved programming. And algorithms. And visualisation.

But it was definitely a bit of fun.

Was it 'journalism'?


Was it fun? Was it original? Did people love it?


Then, at the other end, you have those who say: "This isn't really data journalism."

Often this is simply code for: it wasn't visibly complicated enough for me. 

Or: there wasn't enough really difficult maths in it for me.

These seem odd complaints. It seems to me that if we want to connect with a mass audience then the difficult stuff should happen in the background; swans' frantic feet below the water, if you like, allowing for the serene progress above.

Also, I think people confuse 'statistics journalism' with 'data journalism'. 

I define 'data' the classical way: as information. There is nothing intrinsically mathematical about it.

For me, data journalism is essentially about applying advanced techniques for finding and interrogating information in the service of journalism in a digital age. 

That might involve scraping websites, knowing how to properly use spreadsheets, having programming skills, visualising data, and using and understanding data feeds.

Data journalism might involve all of these at once. Or more often some. Or sometimes none at all.

A final point. When Claire Miller and I set up the data unit, we defined two different work streams.

One was 'news', as classically understood. A perishable commodity with a 'line' or hook. We wanted to use data journalism techniques to create front-page news.

The other was 'resources'. By this we meant data journalism projects or tools which would be enduringly useful and which our readers could use to find and explore relevant information about their local schools, hospitals, crime rates or whatever.

I said at the start of this post I defined data journalism by practice.

For me, these splashes - based on some quite sophisticated statistical analysis by Claire of datasets that are not usually put together - were very much 'data journalism':


But then so was this one, based on a round-robin FOI that generated responses collated into a master spreadsheet that could then be sorted to generate a very simple but newsworthy line:

And so was this data scrape of sheet music by the data unit's Patrick Scott to find the modern singer with the greatest vocal range:

And so was Rob Grant's painstaking processing of more than a million records held by the Commonwealth War Graves Commission, which his technical savvy allowed him to turn into incredibly detailed (and moving) insights into local losses in World War One, such as this one for Liverpool:

All of these are 'data journalism'. And I'm proud of every one. 

Only I'm not proud of them because they are data journalism. I'm proud because they are bloody good.
