The Evolving Newsroom is a series of Q&As with important names in the data journalism field, discussing how the newsroom is evolving to better incorporate data and data-driven journalism. Next, I spoke with Simon Rogers, editor of the Guardian’s Data Store and Data Blog.
Ændrew Rininsland: To start with, would you mind describing a typical workday?
Simon Rogers: We start by thinking about the news of the day — what’s going on today, what can we add to, what can we complement, and is there stuff we can get stories out of by investigating and looking at it? Because there aren’t very many of us, we have to be very focused; we can’t look at everything. We also have different specializations — James [Ball] may focus on more investigative stuff, I might pick up some of the more mainstream stories. It really varies, depending on what we’re interested in. A news editor once said that the way to choose what you do each day is to look at things that interest you, because if it interests you, it will interest somebody else. I really believe that; you have to pick up things you can really add to.
We’ll look at that, and often then it goes into getting ahold of the data. Often we try to aim stuff for lunchtime, and the reason for doing that is because the web has a sort of midday phenomenon, even with all of our U.S. readers. If you can get stuff up at lunchtime or early afternoon, you’re going to get more hits. So, there’s a kind of time thing there, and a lot of stuff we do is very quick; we don’t have a lot of time to do stories. We don’t really spend ages doing stuff, whereas traditionally data journalism does spend a long time doing stuff. We don’t do that, we do things as quickly as humanly possible. But then what we do is we open up the data set and people can take it and do things in a better way. So there’s a lot of faffing around with the data, making it work properly, getting rid of all those merged columns and getting it to do what you want it to do. When you’re at that point, you can start looking at it and thinking what does this show us, what does this tell us — and that’s when it gets really interesting, when you can visualize it and make it work for you.
Then what we do is ask “What are we going to do with this?” Is it going to be interesting as a story? Is it going to work best as a graphic or visualization? The great thing about data journalism is it’s such a versatile format — what you do can end up in so many different results. You could just be printing the numbers, or you could be printing lots of things much different than that. It’s what makes it interesting — what you produce can be anything from a story to a visualization, an animation to anything else from that. Once we’ve done that, we kind of concentrate on getting stuff up and out there. But that’s not the end of it because being the Guardian, it’s very much an open organisation and we will try to put stuff out as much as possible on social media, Twitter, Tumblr, so on, see what people can do with it, and see what they like about it, what they don’t like about it and so on. So we try to make it a live process, where we respond to people as well.
Q: How does the Data Store make use of social media?
A: We do it partly because they’re all traffic drivers — we often get about 50% of our traffic via Twitter, for instance. So it really works for us because fewer people use the front page and it’s really hard to get stuff on the front page; if you can’t do that, how are people going to find what you’re doing? So we rely quite a lot on social media, and some things do really, really well on social media without ever having been near the front page — the other day, we reproduced a graphic from the 1920s showing what people thought life would be like in the 1950s, and it didn’t go anywhere near the front page until it started doing really, really well. So that stuff really matters for us. We’ve recently started doing a lot of stuff on Tumblr and Facebook, and because we haven’t been doing it that long, we don’t yet know how it’s going to go. But that could work out really interestingly, we just have to see what happens with that. That kind of thing, the increasing of that, is where people are going to share. Stuff going viral by word of mouth, that really matters for people. That’s what’s really interesting to us there.
Q: That community often feeds back with new content; how often does that community help produce new visualizations or other content?
A: It really varies; if you look at the comment field, some things do really well — we did the poverty map of England and it got 300 comments in about half a day. That’s one instance. When it comes to feeding back into the visualizations, people take a bit longer. There are only about 1,400 active members of the Flickr group, but they are pretty active — they’ll normally post one or two things a day there. So it really varies, but I’ve noticed the frequency of stuff has increased over time.
Q: Talking a bit more about some of the Data Store’s outside contributors, what are some of the risks and rewards of having outside contributors?
A: I guess it can be risky, but newspapers have always used columnists and people who don’t work for them to write. There’s no difference in having somebody visualize something for you, who doesn’t work for you. What you often get are new takes and new ways of seeing things you wouldn’t normally get otherwise. You have to make sure stuff’s right, or if it’s not right you do stuff about it — that’s always a risk. We had somebody the other day we had to correct, because it wasn’t adequately sourced, and that happens sometimes. But it’s a risk worth taking because you get associated with being the place to see that kind of stuff.
Q: In terms of the positioning of the Data Store in the broader context of the Guardian’s newsroom and workflow, how would you describe it? Are you more of an island or more integrated into the overall workflow?
A: It’s really important that we’re part of the workflow of the paper as a whole. We sit on the news desk, we work closely with the news editors to suggest stories, we’ll see things they missed and so on. It’s really an interesting thing for us to be in our place. If we were too isolated, it’d be quite easy to just put your head down and worry about the latest scraper you’re working on, but it’s really much more important to be part of the routine of the desk. So now when people are, say, discussing the budget next week, other people will kind of just make sure we’re included in those discussions, which I’m really pleased about, because that’s how it should be.
Q: How do other journalists in the newsroom perceive the work that you do?
A: I think before Wikileaks, it was seen as slightly odd — it’s still seen as slightly odd, really — but less so. I think it’s always like that with journalists: as soon as you get a story out of something, then it makes a difference, or as soon as you get content out of something — now increasingly what we do is stuff beyond stories that just does really well on the site — then people can see it’s important. They can see you can add stuff to what they’re doing — often journalists might be faced with a really hideous spreadsheet and they might need help with that. I try not to turn anybody down; we probably all do a bit too much of that, but I think it’s important to be part of that process for as many people as possible.
Q: Has there been a movement towards computer-assisted reporting (CAR) and if so, has it been a gradual evolutionary process or more of a faster, revolutionary change?
A: I love this thing about definitions we always have to have; we talk about data journalists but also data scientists, data engineers and so on. I think CAR has been going for ages; Heather Brooke has been doing it for ages. What’s interesting now, I think, is how this really has become just part of the process for many people; it’s less unusual than it used to be, it’s less weird. What you have is a few tipping points that have kind of pushed it there — Wikileaks would be one, MPs’ expenses would be another — not necessarily in terms of what it achieved, but just for the idea that you could do it. Obviously the coverage of the riots was another. So these things add up and they create a whole which is more supportive, more interested in the entire area.
Q: You talked a bit about definitions and how people define things — how do your team members define themselves? Do they see themselves as journalists, data journalists, data engineers, what?
A: Very much as a journalist; that’s my background, James’ background too. We’re reporters, and often what we’re doing is very pure journalism — it’s not colour reporting and all that kind of stuff people enjoy doing. There’s a strong history of people reporting on facts and trying to base their work and analysis on that, and that’s what we’re trying to do. I very much see us as journalists. It’s just journalism that’s changing all the time.
Q: In a previous interview, Conrad Quilty-Harper from the Telegraph mentioned the boundary changes map that you and they collaborated somewhat on. How frequently do you collaborate with other data desks?
A: It’s more — I think we’re less… argumentative about stuff. For instance, the Telegraph had this really nice code which put two maps next to each other. So we pinched that, but we credited them for it. I’ve run a post before that Conrad Quilty-Harper wrote for the Telegraph site, some investigative work he did, and we ran that as well and said where it’s from. I think we’re less — obviously we’re a bit competitive in the same way that all journalists are competitive — but I think we’ll also recognize when other people have done good work and try to show that. A lot of the stuff we show on the site is graphics and things that have appeared elsewhere that we like. I’m about to put a post up now that’s graphics from CNN and the New York Times and all sorts of people. I think there’s a whole sort of process there that’s interesting for people, that we’re doing much more of.
Q: So, how would you characterize your relationship with other data desks?
A: It’s quite a small world, people tend to know each other. You tend to know who the other people are because you tend to bump into them at events or conferences or things. It’s such a weird area that people who do it tend to stand out. I think it’s a good thing, it’s much more collaborative. Open journalism is what we’re about, right? This fits in totally with that. That said, the people we collaborate with might not be other journalists, they might be from developer groups or interesting organisations around the world.
Q: What part of the job is your favourite?
A: When you come across something that nobody else has found, “That is just a good idea, we should just do that,” — I love that aspect. And at the end of it, when you’ve done all the boring stuff and can see what it looks like; when you throw a load of locational data into a [Google] Fusion map and suddenly it creates a pattern that you haven’t seen before, that’s really lovely. That moment of realization that what you’ve got is actually interesting beyond just a number, that actually tells you something about that world — that works for me.
Q: On the flip-side of that, what’s your least favourite aspect?
A: God, it’s getting stuff out of PDFs; vlookups — I’m quite good at vlookups now, but I hate doing them when you’ve got like 650 departments or constituencies and you’re having to change a third of them. All that boring, kind of making data work together stuff — but it’s part of it, it’s part of the process, and I don’t know if you can improve the tools to make that easier. Google Refine does a lot of that, but I think at the end of the day, you’re going to end up with something that can be quite tedious, but it gets you to a point where you can tell a story you’d never be able to otherwise.
Q: How do you see the field of data journalism changing in the immediate future?
A: I think what will happen is it will become much more commonplace to the extent that what we do will not be that unusual — everyone will be doing it. And that’s not a bad thing, that’s a good thing. When I started reporting, we didn’t have Internet access; people laughed at the idea of having Internet access on your machine. It wasn’t until 1998 that I worked at a place where we had Internet access all the time. But now can you imagine a journalist not using the Internet? I think it’s just the same process; it will become part of what other people do.
Q: There seem to be two viewpoints: on one hand, it’s a growing part of what everyone does, and on the other, it’s a specialist skill restricted to those with a particular technical literacy…
A: I suppose there’s such a lot of variety within this world; everybody’s not doing the same thing. We republish data, and also, I’m very keen that we do stuff as much as possible so it’s part of the news routine, and gets on the front as much as possible. Whereas other people will spend months doing something, but what they produce will be amazing. The New York Times will work for a long time on stuff before they put it out, whereas we’ll do stuff really, really quickly, partly because we know it does well with the traffic. It’s an interesting thing.
Q: In terms of journalists outside of the Data Store adopting some of these CAR techniques, how fast do you see that progression going?
A: It’s really changed since even last year. I mean, you have Polly Curtis doing Reality Check; essentially she’s doing what we’d call data journalism, but it’s just reporting for her. I think that’s what we’re going to see; I’ve noticed it much more frequently. I mean, we work in an organisation where most people in the office don’t have access to Excel because it’s expensive, so it’s limited, but it’s increasing, especially with young reporters coming in, for whom it’s just part of what they do; they don’t see it as anything special.
Q: Talking a bit more broadly about other movements surrounding data journalism, how does the open data movement inform the work that you do?
A: I suppose it’s been a kind of catalyst for a lot of this stuff. We launched in 2009, just after data.gov was launched, just before data.gov.uk launched, so we’re really very much part of that timing. That’s really interesting for us, and it’s worked out quite well. That’s been lucky, and that’s maybe why we’re as well-known as we are in this field, just the timing. So it’s important to us, but I think it kind of helps that you can ring up government departments now and they can’t refuse to give you stuff; it’s very rare that people refuse to give you stuff now. So even during the riots, when we asked for those court records, people just said yes. That there’s a right to this information is really important.
Q: Are there any technologies you can see moving more towards the ubiquity that, say, maps now have? Also, how do you see geo stuff evolving into the future?
A: I think what we’ll see is much more disaggregated, raw stuff, where you can see stuff at a quite granular level. Especially geo stuff, we’re noticing that. We’re getting bigger data sets that are very, very local. And that’s really good. Certainly we’re in a place where we can do stuff with that.
Q: Do you see there being a separation between the print product and the online product in terms of the work that you do? If so, how does that vary?
A: I think now we’re very much part of the piece; often a story won’t launch until a data blog post is ready to go with it, which is great. It’s very much part of what we do. Often we’ll say “Get the data online” and also what we often do is produce graphics or visualizations for the paper. The Guardian is a multi-platform product, so we have space on all those platforms.