Is Crowdsourced Analytics Effective?
I came across an interesting question on Quora the other day about the relative lack of growth in crowdsourced analytics over the past decade. The successes of the Goldcorp Challenge and the Netflix Challenge should have expanded the use of crowdsourced analytics competitions, so why hasn't there been an explosion of such competitions? Even though participants describe the competitions as a goldmine for the analytical techniques they developed, their use still hasn't taken off.
Although his team did not win, the CEO of data analytics company Opera Solutions said the company got "a $10 million payoff internally from what we've learned" by applying the improved modeling and analysis techniques it created for the contest to work for its paying clients. And Netflix got a pretty sweet return on its investment. A company like Netflix could expect to pay $1 million to hire five researchers for a year. As Darren Vengroff, a former lead researcher for Amazon's recommendation engine, said in a Forbes article, Netflix "spent the same amount and got thousands, probably millions of engineer-years."
Emergence of Competitive Markets and Evidence of Growth
Some readers may point to the Heritage Health Prize's $3 million reward as evidence that crowdsourcing is growing, but it is only the third multi-million dollar award offered in the past ten years. Sites such as Innocentive and Kaggle now advertise themselves as marketplaces for crowdsourced talent and display more than a hundred competitions in total, but crowdsourced analytics hasn't caught fire as a discipline. Procter & Gamble uses Innocentive, and Kaggle is used by Ford and Microsoft, so what is holding back the marketplace?
Restraints on the Growth of Crowdsourced Analytics
Prestige and Marketing: The winner of the Goldcorp Challenge earned less in prize money than it spent to win the competition. The team was more interested in the free publicity the win would generate for its startup business and wanted to use the victory as marketing collateral. Similarly, the runner-up in the Netflix Challenge used its success to promote Opera Solutions, a consultancy focused on big data, which now leads the scoreboard in the Heritage Health Prize. Such prestige only results from interesting, well-funded competitions with many participants, which in turn demand large prizes. To generate interest from top talent, the prize must be large enough to create lots of publicity.
Prize Money and ROI: Companies are attracted to crowdsourcing by the obscene ROI, and they can get acceptable improvements for much smaller prize amounts. The prizes offered on Kaggle and other sites range from $10,000 to $30,000, which is 95-99% smaller than the Goldcorp Challenge prize and 99% smaller than the Heritage Health Prize. At this size, participants aren't attracted by the chance of winning unless they live in emerging economies. The prize amount determines whether a competition draws the best data scientists in the world or just the hobbyists. Companies earn a better ROI on small prizes and may be content with the quality of results they receive, so effort on a grander scale is not required.
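As a rough check on the prize-size comparison, here is a minimal Python sketch that sets the $10,000 to $30,000 range against the $3 million Heritage Health Prize. The function name percent_smaller and the constant names are made up for illustration, and the Goldcorp prize is left out because its amount isn't stated in this post.

```python
def percent_smaller(small_prize, large_prize):
    """Return how much smaller small_prize is than large_prize, in percent."""
    return 100.0 * (1.0 - small_prize / large_prize)


# Figures taken from the text above; the names are illustrative only.
HERITAGE_HEALTH_PRIZE = 3_000_000
TYPICAL_PRIZE_RANGE = (10_000, 30_000)

for prize in TYPICAL_PRIZE_RANGE:
    gap = percent_smaller(prize, HERITAGE_HEALTH_PRIZE)
    print(f"${prize:,} is {gap:.1f}% smaller than ${HERITAGE_HEALTH_PRIZE:,}")
```

Under these figures, the typical prizes come out at roughly 99.0% to 99.7% smaller than the Heritage Health Prize, which is in line with the 99% quoted above.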
Access to Proprietary Data and Creation of Intellectual Property: As with the Goldcorp Challenge, the winners of the Netflix Challenge earned less in prize money than they spent to win the competition. AT&T Labs dedicated three data scientists to the Netflix Challenge for three years because the resulting intellectual property was valuable to them, not because the project was profitable.
Legal Risks: Netflix was ready to sponsor a second challenge but reconsidered after a class-action lawsuit was filed against it for disclosing customer information. The situation became even more problematic after security researchers de-anonymized some customer records by cross-referencing data from the Netflix Challenge with an unaffiliated IMDB-like rating website.
Most Winners Are Corporate Sponsored: The biggest competitions were won, or appear likely to be won, by corporate-sponsored talent. If Opera Solutions invests to win every notable competition as a marketing tactic, it discourages non-corporate participation in the future.
Winner's Curse: The competition rules usually require the winner to give the sponsor a free license (if not the patent itself). The runner-up, however, retains exclusive rights to its intellectual property. The question then is, "Why not deliberately be the runner-up to retain patent rights?"
Competitive Differentiation: Since the competition is public and competitors could simply buy the analytical method from the participants afterward, winning it doesn't create a competitive advantage. Accordingly, companies can't differentiate themselves with crowdsourced analytics. At best, a company can increase its ROI and remove analytics from the arena as a source of competitive advantage. Crowdsourced analytics is great for destroying a competitor's analytical advantage, but it doesn't necessarily create one.