Wednesday, March 14, 2012

Crowdsourcing Analytics: Restraints On Growth


Is Crowdsourced Analytics Effective?


I came across an interesting question on Quora the other day about the relative lack of growth in crowdsourced analytics over the past decade. The successes of the Goldcorp Challenge and the Netflix Challenge should have expanded the use of crowdsourced analytics competitions, so why hasn’t there been an explosion of such competitions? Even though participants describe the competitions as a goldmine for the analytical techniques they develop, their use still hasn't taken off.
Although his team did not win, the CEO of data analytics company Opera Solutions said the company got "a $10 million payoff internally from what we’ve learned" by applying the improved modeling and analysis techniques it created for the contest to work for its paying clients. And Netflix got a pretty sweet return on its investment. Companies like Netflix could expect to pay $1 million to hire five researchers for a year. As Darren Vengroff, a former lead researcher for Amazon's recommendation engine, said in a Forbes article, Netflix "spent the same amount and got thousands, probably millions of engineer-years."

Emergence of Competitive Markets and Evidence of Growth


Some readers may point to the Heritage Health Prize’s $3 million reward as evidence that crowdsourcing is growing, but it is only the third multi-million dollar award offered in the past ten years. Several sites now advertise themselves as marketplaces for crowdsourced talent and together display more than a hundred competitions, but crowdsourced analytics hasn’t caught fire as a discipline. Procter & Gamble uses Innocentive, and Kaggle is used by Ford and Microsoft, so what is holding back the marketplace?



Restraints on the Growth of Crowdsourced Analytics


Prestige and Marketing: The winners of the Goldcorp Challenge earned less in prize money than they spent to win the competition. They were more interested in the free publicity it would generate for their startup business, and wanted to use their victory as marketing collateral. Similarly, the runner-up in the Netflix Challenge used its success to launch Opera Solutions, a consultancy focused on big data, which now leads the scoreboard in the Heritage Health Prize. Such prestige results only from interesting, well-funded competitions with many participants, and those in turn demand large prizes.

To generate interest from top talent, the prize must be large enough to create lots of publicity.

Prize Money and ROI: Companies are attracted to crowdsourcing by the obscene ROI, and they can get acceptable improvements for much smaller prize amounts. The prizes offered on Kaggle and other sites range from $10,000 to $30,000, which is 95-99% smaller than the Goldcorp Challenge prize and 99% smaller than the Heritage Health Prize. At this size, participants aren’t attracted by the chance of winning unless they’re living in emerging economies. The prize amount determines whether the best data scientists in the world will take part or whether the competition will attract just the hobbyists. Companies earn a better ROI on small prizes and may be content with the quality of results they receive, so effort on a grander scale is not required.
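For readers who want to check the size gap themselves, here is a quick back-of-the-envelope sketch in Python. It uses only figures quoted in this post (the $3 million Heritage Health Prize and the $10,000 to $30,000 range for typical prizes); the function name `percent_smaller` is just an illustrative helper, and the Goldcorp prize is left out because its amount isn't stated here.

```python
def percent_smaller(small_prize: float, large_prize: float) -> float:
    """Return how much smaller small_prize is than large_prize, in percent."""
    return (1 - small_prize / large_prize) * 100


# Figures quoted in this post (not independently verified here).
HERITAGE_HEALTH_PRIZE = 3_000_000
TYPICAL_PRIZES = (10_000, 30_000)

for prize in TYPICAL_PRIZES:
    gap = percent_smaller(prize, HERITAGE_HEALTH_PRIZE)
    print(f"${prize:,} is {gap:.1f}% smaller than ${HERITAGE_HEALTH_PRIZE:,}")
```

On those numbers, even the top of the typical range is about 99% smaller than the Heritage Health Prize, which is consistent with the comparison above.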

Access to Proprietary Data and Creation of Intellectual Property: As in the Goldcorp Challenge, the winners of the Netflix Challenge earned less in prize money than they spent to win the competition. AT&T Labs dedicated three data scientists to the Netflix Challenge for three years, not because the project was profitable, but because the resulting intellectual property was valuable to them.

Legal Risks: Netflix was ready to sponsor a second challenge, but reconsidered after a class-action lawsuit was filed against it for disclosing customer information. The situation became even more problematic after security researchers de-anonymized some customer records by cross-referencing data from the Netflix Challenge with an unaffiliated IMDB-like rating website.

Most Winners Are Corporate Sponsored: The biggest competitions were won, or appear likely to be won, by corporate-sponsored talent. If Opera Solutions invests to win every notable competition as a marketing tactic, it discourages non-corporate participation in the future.

Winner’s Curse: The competition rules usually require the winner to grant the sponsor a free license (if not the patent itself). The runner-up, though, retains exclusive rights to their intellectual property. The question then is,
“Why not deliberately be the runner up to retain patent rights?”

Competitive Differentiation: Since the competition is public and your competitors could simply buy the analytical method from the participants afterward, it doesn't create a competitive advantage. Accordingly, companies can’t differentiate themselves with crowdsourced analytics. At best, a company can increase its ROI and remove analytics from the arena as a competitive advantage. Crowdsourced analytics is great for destroying a competitor’s analytical advantage, but doesn't necessarily create one.