Monday, May 9, 2011

The $1,000,000 Netflix Challenge

Netflix "has more than 7 million subscribers, who have the option to rate movies on a scale of 1 to 5. To encourage users to keep their subscriptions active, Netflix rolled out Cinematch, which used those ratings to help customers find new movies they'd like. When a user logs in, the service suggests "Movies You'll Love" — a list of films that the algorithm guesses will get a high rating from that particular user." To improve its recommendations, Netflix offered a $1 million prize in 2006 to the first team to create an algorithm 10% more accurate than Cinematch. After thousands of hours of crowdsourced labor went into the challenge, Netflix paid a team of 7 statisticians, machine-learning experts and engineers $1 million for its algorithm on September 21, 2009. That may not be surprising given that the team was led by AT&T Research engineers and worked on the algorithm for three years, but it does raise the question, "What did they create and why was it so valuable?"

The Netflix Challenge

Netflix views its business as more than renting videos to customers: "we create demand for content and we help you find great movies that you'll really like." Given this business model, the company created the Netflix Challenge. The competition began with hundreds of competitors trying their hand at recommendation algorithms. After a year, progress had slowed to a crawl, with entries only halfway to the 10% goal. Then an unemployed psychologist/engineer quickly rose to the top 5 by using psychological principles to improve his algorithm. After months of little improvement, the top two teams combined and cleared the winning threshold. This triggered a 30-day scramble by the other teams to beat that algorithm before the contest closed. Thirty lower-ranked teams then combined to become 'The Ensemble' and turned in better results than the top team. "By pooling a larger number of competitors — all of whose algorithms performed more poorly than those of the top two teams — The Ensemble produced the best results. It’s a profound lesson in the power of the crowd." The Ensemble ultimately lost due to the award criteria, but the competition is rife with strategic implications that will reverberate for years.
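
To see why pooling weaker entries can beat a stronger one, here is a minimal sketch of prediction blending. It assumes a plain equal-weight average and made-up ratings; the function names and numbers are hypothetical, and the actual teams used far more elaborate blending than this. Accuracy is measured by root-mean-square error (RMSE), the metric the contest used to score entries.

```python
import math

def rmse(predictions, actuals):
    """Root-mean-square error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def blend(prediction_sets, weights=None):
    """Weighted average of several models' predictions, rating by rating.

    Equal weights are an assumption for illustration only.
    """
    if weights is None:
        weights = [1.0 / len(prediction_sets)] * len(prediction_sets)
    return [sum(w * p for w, p in zip(weights, column))
            for column in zip(*prediction_sets)]

# Hypothetical data: four true ratings and two imperfect models.
actual = [4.0, 2.0, 5.0, 3.0]
model_a = [4.5, 2.5, 4.5, 2.5]  # off by half a star on every rating
model_b = [3.5, 2.5, 5.5, 2.5]  # also off by half a star, in different directions

blended = blend([model_a, model_b])

print(round(rmse(model_a, actual), 3))  # 0.5
print(round(rmse(model_b, actual), 3))  # 0.5
print(round(rmse(blended, actual), 3))  # 0.354
```

Neither model is better than the other, yet the average beats both, because their errors cancel wherever they disagree. Where both models err in the same direction, blending does not help.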

Data Mining Insights (How The Teams Created Better Recommendations)

  • Temporal Effects: "viewers in general tend to rate movies differently on Fridays versus Mondays, and certain users are in good moods on Sundays, and so on."
  • Rating Immediately vs. By Memory: "As it turns out, people who rate a whole slew of movies at one time tend to be rating movies they saw a long time ago. The data showed that people employ different criteria to rate movies they saw a long time ago, as opposed to ones they saw recently — and that in addition, some movies age better than others, skewing either up or down over time."
  • Anchoring: "If a customer watches three movies in a row that merit four stars — say, the Star Wars trilogy — and then sees one that's a bit better — say, Blade Runner — they'll likely give the last movie five stars. But if they started the week with one-star stinkers like the Star Wars prequels, Blade Runner might get only a 4 or even a 3. Anchoring suggests that rating systems need to take account of inertia — a user who has recently given a lot of above-average ratings is likely to continue to do so."
  • Customer Profiling: Classifying viewer preferences into "low brow" and "high brow", or by languages spoken, preference for modern movies or classics, etc.
  • Nearest Neighbor Classification: People like movies that are similar to other movies they like. (Visualization of 5000 movies provided)
  • Weight Recent Ratings and Movie Selections More Heavily: Recent ratings are better indications of tastes than old ratings. For example, movies watched during college may not reflect your tastes as a young professional.
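
Two of the ideas above, nearest neighbors and weighting recent ratings more heavily, can be combined in a few lines. The following is a rough sketch, not any team's actual method: the similarity scores, the one-year half-life, and all function names are assumptions chosen for illustration.

```python
def recency_weight(age_days, half_life_days=365.0):
    """Exponential decay: a rating half_life_days old counts half as much."""
    return 0.5 ** (age_days / half_life_days)

def predict_rating(user_history, similarity_to_target, k=2, half_life_days=365.0):
    """Predict a rating from the k rated movies most similar to the target.

    user_history: list of (movie, rating, age_days) tuples for one user.
    similarity_to_target: dict mapping movie -> similarity in [0, 1]
    (hypothetical precomputed values).
    """
    # Keep only the k nearest neighbors of the target movie.
    neighbors = sorted(
        user_history,
        key=lambda item: similarity_to_target.get(item[0], 0.0),
        reverse=True,
    )[:k]

    weighted_sum = 0.0
    weight_total = 0.0
    for movie, rating, age_days in neighbors:
        # Similar movies and recent ratings both count for more.
        weight = (similarity_to_target.get(movie, 0.0)
                  * recency_weight(age_days, half_life_days))
        weighted_sum += weight * rating
        weight_total += weight

    if weight_total == 0.0:
        return None  # nothing similar enough to base a guess on
    return weighted_sum / weight_total

# Hypothetical user: loved one similar movie today, was lukewarm on
# another equally similar movie a year ago.
history = [("Star Wars", 5, 0), ("Blade Runner", 3, 365), ("Titanic", 1, 30)]
similarity = {"Star Wars": 0.8, "Blade Runner": 0.8, "Titanic": 0.1}

print(round(predict_rating(history, similarity), 3))  # 4.333
```

Without the recency weighting, the two neighbors would count equally and the prediction would be a flat 4.0; the decay pulls it toward the more recent five-star rating.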

Benefits and Quantification

"Netflix has said that winning a 10 percent improvement on its recommendation algorithm for $1 million would be a tremendous bargain."
  • Lock-in Effect: After a customer has rated a thousand movies, Netflix is generating finely tuned recommendations and doesn't recommend previously watched movies. Will the user reenter those ratings on a different website that can't generate better results?
  • Generates Better Return on DVD Inventory: From The Long Tail: "Historically, Blockbuster has reported that about 90% of the movies they rent are new theatrical releases. Online they're more niche: about 70% of what they rent from their website is new releases and about 30% is back catalog. That's not true for Netflix. About 30% of what [Netflix] rents is new releases and about 70% is back catalog and it's not because we have a different subscriber." This allows Netflix to promote underused inventory and generate returns on existing investments.
  • Reduces Stock Outs and Long Wait Times: By spreading customer demand over a broader variety and number of titles, Netflix smooths out spikes in demand for any single title and thereby avoids long wait times.
  • Increases Revenues: Encourages household accounts to change to individual accounts so that their recommendations are personalized. Also encourages migration from the streaming subscription to the more expensive "streaming and snail mail" package. After all, what good are your insightful recommendations if you can't access the non-streaming ones?
  • Customer Experience Enhancement: Never watch a bad movie again. Titles delivered to you.
  • Customer Loyalty: Better recommendations mean fewer customer defections to competitors (Blockbuster or Redbox).

CrowdSourcing Commentary
This challenge may be the best proof ever of the wisdom of crowds.
"the Netflix Prize competition has proffered hard proof of a basic crowdsourcing concept: Better solutions come from unorganized people who are allowed to organize organically. But something else happened that wasn’t entirely expected: Teams that had it basically wrong — but for a few good ideas — made the difference when combined with teams which had it basically right, but couldn’t close the deal on their own." "Ironically, the most outlying approaches — the ones farthest away from the mainstream way to solve a given problem — proved most helpful towards the end of the contest, as the teams neared the summit."

Team Calibre Commentary

One article pointed out that crowdsourcing attracts talent that is otherwise unavailable because such people are already employed elsewhere. The rather formidable winning team included 3 Research Engineers from AT&T Labs, a Senior Scientist from Yahoo Research in Israel, two machine learning consultants from Austria, and two Canadian engineers from the University of Toronto. Even setting this team's credentials aside, remember that there were thousands of teams competing from around the world.

Intellectual Property Commentary

At first I was dumbfounded that Netflix required only a license to the winning algorithm at the end of the competition, rather than all of the intellectual property relating to it. I later realized that if the winning team had been required to hand over its intellectual property, finishing second (and avoiding the hand-it-over requirement) would have become the optimal strategy for participants. Accordingly, Netflix let participants retain the intellectual property, which in turn interested AT&T enough to dedicate three scientists to the contest for years.

"This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize." By Jordan Ellenberg. Wired Magazine Issue 16-03. 25FEB2008.

The Long Tail. By Chris Anderson. Page 109. Copyright 2006. Hyperion Books, New York, NY.

"$1 Million Netflix Prize So Close, They Can Taste It." By Eliot Van Buskirk. 17JUN2009.

"Winning Teams Join to Qualify for $1 Million Netflix Prize." By Eliot Van Buskirk. 26JUN2009.

"How the Netflix Prize Was Won." By Eliot Van Buskirk. 22SEP2009.

"Netflix Prize: It Ain’t Over ’til It’s Over." By Eliot Van Buskirk. 12AUG2009.