Friday, February 1, 2013

Netflix data tracking


video behind the scenes of Netflix tracking
http://whatsthebigdata.com/2012/12/24/data-science-at-netflix-with-elastic-mapreduce/





Netflix data tracking
http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/ 

How Netflix is turning viewers into puppets

"House of Cards" gives viewers exactly what Big Data says we want. This won't end well


I hit the pause button roughly one-third of the way through the first episode of “House of Cards,” the political drama premiering on Netflix Feb. 1. By doing so, I created what is known in the world of Big Data as an “event” — a discrete action that could be logged, recorded and analyzed. Every single day, Netflix, by far the largest provider of commercial streaming video programming in the United States, registers hundreds of millions of such events. As a consequence, the company knows more about our viewing habits than many of us realize. Netflix doesn’t know merely what we’re watching, but when, where and with what kind of device we’re watching. It keeps a record of every time we pause the action — or rewind, or fast-forward — and how many of us abandon a show entirely after watching for a few minutes.

Netflix might not know exactly why I personally hit the pause button — I was checking on my sick son, home from school with the flu — but if enough people pause or rewind or fast-forward at the same place during the same show, the data crunchers can start to make some inferences. Perhaps the action slowed down too much to hold viewer interest — bored now! — or maybe the plot became too convoluted. Or maybe that sex scene was just so hot it had to be watched again. If enough of us never end up restarting the show after taking a break, the inference could be even stronger: maybe the show just sucked.
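
To make the idea concrete, here is a minimal sketch of how pause and rewind events could be grouped by playback position to find moments where many viewers react at once. Everything in it is an assumption for illustration: the record layout, the viewer IDs, the `hotspots` function and its thresholds are hypothetical, and nothing here describes Netflix's actual systems.

```python
# Hypothetical event records: (viewer_id, title, action, position_seconds).
events = [
    ("u1", "House of Cards S1E1", "pause", 1025),
    ("u2", "House of Cards S1E1", "pause", 1030),
    ("u3", "House of Cards S1E1", "rewind", 1041),
    ("u4", "House of Cards S1E1", "pause", 240),
    ("u5", "House of Cards S1E1", "pause", 1060),
]


def hotspots(events, title, bucket_seconds=60, min_viewers=3):
    """Count distinct viewers per time bucket and keep only the busy buckets.

    Returns a dict mapping bucket start (in seconds) to the number of
    distinct viewers who paused, rewound or fast-forwarded in that bucket.
    """
    viewers_per_bucket = {}
    for viewer, event_title, _action, position in events:
        if event_title != title:
            continue
        # Round the playback position down to the start of its bucket.
        bucket = (position // bucket_seconds) * bucket_seconds
        viewers_per_bucket.setdefault(bucket, set()).add(viewer)
    return {
        bucket: len(viewers)
        for bucket, viewers in sorted(viewers_per_bucket.items())
        if len(viewers) >= min_viewers
    }


busy_moments = hotspots(events, "House of Cards S1E1")
```

With this sample data, four different viewers interact during the minute starting at 1,020 seconds, while the lone pause at 240 seconds is filtered out as noise. The count says where viewers reacted, not why.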

In 2012, for the first time ever, Americans watched more movies legally delivered via the Internet than on physical formats like Blu-Ray discs or DVDs. The shift signified more than a simple switch in formats; it also marked a major difference in how much information the providers of online programming can gather about our viewing habits. Netflix is at the forefront of this sea change, a pioneer at the intersection of Big Data and entertainment media. With “House of Cards,” we’re getting our first real glimpse at what this new world will look like.

For at least a year, Netflix has been explicit about its plans to exploit its Big Data capabilities to influence its programming choices. “House of Cards” is one of the first major test cases of this Big Data-driven creative strategy. For almost a year, Netflix executives have told us that their detailed knowledge of Netflix subscriber viewing preferences clinched their decision to license a remake of the popular and critically well-regarded 1990 BBC miniseries. Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
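
The reasoning described above is essentially an audience-overlap calculation. A rough sketch follows, assuming viewing histories are available as sets of subscriber IDs; the names, the data and the `overlap_share` helper are all invented for illustration and are not drawn from anything Netflix has published.

```python
# Hypothetical viewing histories: each key maps to the set of subscribers
# who watched that title or that group of titles.
watched = {
    "house_of_cards_1990": {"ann", "bob", "cat", "dan"},
    "spacey_films": {"ann", "bob", "cat", "eve", "fay"},
    "fincher_films": {"ann", "bob", "dan", "eve"},
}


def overlap_share(base, other):
    """Return the fraction of `base` viewers who also appear in `other`."""
    if not base:
        return 0.0
    return len(base & other) / len(base)


fans = watched["house_of_cards_1990"]
spacey_share = overlap_share(fans, watched["spacey_films"])
fincher_share = overlap_share(fans, watched["fincher_films"])

# Subscribers who show up in all three groups: the presumed core audience.
core_audience = fans & watched["spacey_films"] & watched["fincher_films"]
```

In this toy data, three of the four fans of the original miniseries also watched Spacey films, three of the four watched Fincher films, and two subscribers fall into all three groups. A high overlap suggests a likely audience, though it does not by itself show that those viewers would enjoy the combination.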

“We know what people watch on Netflix and we’re able with a high degree of confidence to understand how big a likely audience is for a given show based on people’s viewing habits,” Netflix communications director Jonathan Friedland told Wired in November. “We want to continue to have something for everybody. But as time goes on, we get better at selecting what that something for everybody is that gets high engagement.”

The strategy has advantages that go beyond the assumption of built-in popularity. Netflix also believes it can save big on marketing costs because its recommendation engine will do all the heavy lifting. Already, Netflix claims that 75 percent of its subscribers are influenced by its suggestions about what they will like.

“We don’t have to spend millions to get people to tune into this,” Steve Swasey, Netflix’s V.P. of corporate communications, told GigaOm last March. “Through our algorithms we can determine who might be interested in Kevin Spacey or political drama and say to them, ‘You might want to watch this.’”

And maybe we will. Early reviews for “House of Cards” are promising. It will be fascinating to find out how many people gorge themselves on all 13 episodes this upcoming weekend. (Netflix data shows that’s how we like to consume our TV series now — in great gulps and marathons — so that’s how it will give them to us.) But one does end up wondering: What will the Big Data approach mean for the creative process? If Netflix perfects the job of giving us exactly what we want, when and how will we be exposed to things that are new and different, the movies and TV shows we would never imagine we might like unless given the chance? Can the auteur survive in an age when computer algorithms are the ultimate focus group? And just how many political dramas starring Kevin Spacey can we stand, anyway?

The scope of the data collected by Netflix from its 29 million streaming video subscribers is staggering. Every search you make, every positive or negative rating you give to what you just watched, is piped in along with ratings data from third-party providers like Nielsen. Location data, device data, social media references, bookmarks. Every time a viewer logs on he or she needs to be authenticated. Every movie or TV show also has its own associated licensing data. The logistics involved with handling every bit of information generated by Netflix viewers — and making sense of it — are pure geek wizardry. 

Netflix doesn’t just know that you are more likely to be watching a thriller on Saturday night than on Monday afternoon, but it also knows what you are more likely to be watching on your tablet as compared to your phone or laptop; or what people in a particular ZIP code like to watch on their tablets on a Sunday afternoon. Netflix even tracks how many people start tuning out when the credits start to roll.

Correlating the raw numbers of Kevin Spacey fans who also like David Fincher and have a fondness for British political dramas is just the beginning. Netflix knows enough about what you are watching to judge specific aspects of content as well. Last summer senior data scientist Mohammad Sabah reported at a conference that Netflix was capturing specific screen shots to analyze in-the-moment viewing habits, and the company was “looking to take into account other characteristics.”

What could those characteristics be? GigaOm’s report of the Sabah presentation speculated that “it could make a lot of sense to consider things such as volume, colors and scenery that might give valuable signals about what viewers like.”

Netflix chief content officer Ted Sarandos has said that all that data means that Netflix has a very “addressable audience.” Unlike the traditional broadcast networks or cable companies, Netflix doesn’t have to rely on shoveling content out into the wild and finding out after the fact what audiences want or don’t want. They believe they already know.

Of course, data-centric decisions don’t guarantee hit-making success. Kevin Spacey’s participation isn’t bulletproof (see “Fred Claus”) and even David Fincher can’t claim a perfect record. (“Alien 3,” anyone?) Netflix’s ambition to challenge HBO as a destination for quality original programming will require fabulous craftsmanship to go along with the Big Data filters. All the Big Data in the world can’t rule out, once and for all, the possibility of a bomb.

But that goes without saying. The interesting and potentially troubling question is how a reliance on Big Data might funnel craftsmanship in particular directions. What happens when directors approach the editing room armed with the knowledge that a certain subset of subscribers are opposed to jump cuts or get off on gruesome torture scenes or just want to see blow jobs? Is that all we’ll be offered? We’ve seen what happens when news publications specialize in just delivering online content that maximizes page views. It isn’t always the most edifying spectacle. Do we really want creative decisions about how a show looks and feels to be made according to an algorithm counting how many times we’ve bailed out of other shows?

For years Netflix has been analyzing what we watched last night to suggest movies or TV shows that we might like to watch tomorrow. Now it is using the same formula to prefabricate its own programming to fit what it thinks we will like. Isn’t the inevitable result of this that the creative impulse gets channeled into a pre-built canal?

It’s certainly possible to overstate the case here. One could argue that Netflix’s strategy is only a slightly more sophisticated version of what’s already been in place for, well, forever. We wouldn’t be seeing teenage vampires or zombies every time we turn on the TV if the money that bankrolls the content creation business hadn’t already decided that’s what we want to see. Actors who have the fortune to appear in hit movies or TV shows get more parts to play. So what else is new?

But there’s a level of specificity made possible by Big Data that suggests we’re headed into new territory. “House of Cards” is just one symptom of a society-wide shift. The Obama campaign used the same kind of number crunching to target voters with more accuracy than any political campaign had ever accomplished before. Online advertisers are also gathering vast amounts of detailed information about us from our smartphones, our Facebook likes and our Google searches.

The sheer amount of data available to crunch is already phenomenal and is growing at an extraordinary rate. Last summer, at a panel discussion that included several significant players in the emerging Big Data universe, Michael Karasick, a V.P. at IBM Research, estimated that there is “a thousand exabytes of data on the planet anywhere.” An exabyte is one quintillion bytes, or 1 billion gigabytes. That’s a lot of ones and zeroes all by itself, but the mind-boggling part of the equation is that Karasick predicted that just two years from now there will be 9,000-10,000 exabytes of data on the planet.
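
For a sense of scale, the arithmetic behind those figures can be checked in a few lines. The sketch below assumes decimal (SI) units, in which a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes, and it assumes steady year-over-year growth, which the estimate itself does not specify.

```python
# Decimal (SI) unit sizes, in bytes.
GIGABYTE = 10**9
EXABYTE = 10**18

# How many gigabytes fit in a single exabyte.
gigabytes_per_exabyte = EXABYTE // GIGABYTE

# The quoted estimate: a thousand exabytes of data on the planet.
current_bytes = 1_000 * EXABYTE

# The quoted two-year forecast: 9,000 to 10,000 exabytes.
forecast_low_eb, forecast_high_eb = 9_000, 10_000

# Implied yearly growth factor if growth were the same in both years
# (an assumption): the square root of the two-year multiple.
annual_growth_low = (forecast_low_eb / 1_000) ** 0.5
annual_growth_high = (forecast_high_eb / 1_000) ** 0.5
```

One exabyte works out to a billion gigabytes, and a thousand exabytes is 10^21 bytes, also known as a zettabyte. Growing nine- to tenfold in two years would mean roughly tripling every year.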

The companies that figure out how to generate intelligence from that data will know more about us than we know ourselves, and will be able to craft techniques that push us toward where they want us to go, rather than where we would go by ourselves if left to our own devices. I’m guessing this will be good for Netflix’s bottom line, but at what point do we go from being happy subscribers to mindless puppets?

Andrew Leonard is a staff writer at Salon. On Twitter, @koxinga21.