Baidu, China's leading internet search company, has released some of its AI (artificial intelligence) code less than a week after former Google CEO Eric Schmidt said technology companies need to start working together on AI if humans want to get the most out of machines.
Schmidt, now executive chairman of Alphabet, Google's parent company, claimed last Monday that AI has the potential to fix some of the world's "hard problems," including population growth, climate change, human development, and education. For this to happen, however, he stressed that companies must collaborate on AI and share their breakthroughs with the academic community.
The now-public code has been used to build a Baidu speech-recognition system called Deep Speech 2, which can recognise certain short sentences better than humans. The technology is useful for Baidu because many of its millions of customers often prefer to use its services by voice, since typing Chinese characters on a smartphone can be difficult.
Baidu's "Warp-CTC" tool can plug into existing machine learning frameworks being developed by startups and other companies to significantly speed up their AI development efforts. MIT Technology Review reports that a machine learning startup called Nervana, which offers a deep-learning framework to companies that don't have the know-how or resources to develop their own, is already using Warp-CTC in its software.
Yahoo data dump
Last Thursday, Yahoo gave machine learning scientists access to a huge dataset in a bid to help them develop computer programs that can think and learn for themselves.
"Data is the life-blood of research in machine learning," said Suju Rajan, director of personalisation science at Yahoo Labs. "However, access to truly large-scale datasets is a privilege that has been traditionally reserved for machine learning researchers and data scientists working at large companies — and out of reach for most academic researchers."
The dataset is a collection of anonymised user interactions with the news feeds on websites such as Yahoo News and Yahoo Sports. Yahoo says it contains 110 billion events and totals 13.5 terabytes, more than 10 times the size of the largest dataset previously released.
Google and Facebook have also published AI code, research, and datasets that help machine learning scientists.
China's internet giants have been slower off the mark, possibly because they see their code as important intellectual property that gives them a competitive advantage over their rivals.