China’s Alibaba Just Beat the US in a Global Machine Battle
Each year, Jim Gray held a battle of the machines.
This was a battle of speed and time and energy, and it involved some of the top minds in the world of hardcore computer science. Who could build a system that could analyze the most data in 60 seconds? Who could sort 100 terabytes the quickest? Who could sort 100 terabytes—aka 100,000 gigabytes—using the least amount of electricity?
Gray—the legendary computer scientist who won the Turing Award for his work with computer databases—was lost at sea in 2007, mourned across the computer science community and beyond. But in the years since, others have continued his battle of the machines. Today, as we move so rapidly into the age of cloud computing, this competition doesn’t just pit one machine against another. It pits an army of machines against so many other armies.
In recent years, researchers at Microsoft—where Gray was working when he died—have topped several of these contests. Last year, a top prize went to a team that includes one of the top engineers at Google. Researchers from the University of California at Berkeley have also fared well. But this year, there was a new winner: Alicloud, which sorted 100 terabytes of data in a mere six-and-a-half minutes, crushing the previous record of 23-and-a-half minutes.
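To put those times in perspective, this short Python sketch converts each record into an average sorting rate. It uses only the figures quoted above and assumes decimal units (1 terabyte = 1,000 gigabytes, matching "100,000 gigabytes"). The rates are whole-run averages, not peak throughput.

```python
# Back-of-the-envelope GraySort arithmetic using the article's figures.
# ASSUMPTION: decimal units, so 100 TB = 100,000 GB.

DATA_GB = 100_000        # 100 TB expressed in gigabytes
NEW_RECORD_MIN = 6.5     # Alicloud's reported time, in minutes
OLD_RECORD_MIN = 23.5    # previous record, in minutes


def throughput_gb_per_s(data_gb: float, minutes: float) -> float:
    """Average sort throughput in GB/s over the whole run."""
    return data_gb / (minutes * 60)


new_rate = throughput_gb_per_s(DATA_GB, NEW_RECORD_MIN)
old_rate = throughput_gb_per_s(DATA_GB, OLD_RECORD_MIN)
speedup = OLD_RECORD_MIN / NEW_RECORD_MIN

print(f"new record: {new_rate:.0f} GB/s")
print(f"old record: {old_rate:.0f} GB/s")
print(f"speedup:    {speedup:.1f}x")
```

Run as written, it reports roughly 256 GB/s for the new record against about 71 GB/s for the old one, a speedup of about 3.6 times.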
Alicloud, or Aliyun, is the cloud computing arm of Chinese tech giant Alibaba. It’s analogous to Amazon Web Services or Microsoft Azure or the Google Cloud Platform. It serves up a sweeping set of online services where any company or independent coder can build and run websites, smartphone apps, and virtually any other software—without setting up hardware in their own data center.
Such “public cloud” services represent the future of information technology. A new report from research outfit Forrester deems the public cloud a “hyper-growth market,” predicting that this market will grow to $191 billion by 2020. Here in the States, Amazon is the king of cloud computing, with revenues of about $6 billion a year, and the two big challengers are Microsoft and Google. But these are hardly the only players. A New York-based upstart called Digital Ocean is challenging the big names, and Alicloud is very much on the rise in China.
The company’s recent victory on the GraySort benchmark—where systems compete to sort 100 terabytes in the shortest amount of time—is merely a sideshow in its larger evolution. But the win shows that Alicloud has the engineers and the desire and, well, the hardware to compete in this rapidly growing market. Alicloud is following in the footsteps of Amazon and Microsoft and Google, and at least in China, it’s intent on eclipsing these American giants.
Amazon and Microsoft offer their own cloud computing services in China, though government restrictions require them to serve those services through local partners. But as Alicloud chief architect Hong Tang will tell you, his company is the market’s dominant player.
According to Tang, the company’s infrastructure spans “hundreds of thousands” of machines. It serves about 1.8 million customers. And the company’s revenues now top $100 million a year. He acknowledges that this is small compared to Amazon’s overall numbers. But judging from independent data compiled by the UK-based research outfit Netcraft, Alicloud is growing at a remarkable rate. According to Netcraft, it now houses more public websites than all but three other operations on earth—and more than any other outfit in China. By comparison, Microsoft claims about 50,000 Azure customers in the country.
Google’s Chinese Twin
Alicloud didn’t just top the GraySort competition. It also took the gold in the MinuteSort, organizing 7.7 terabytes of data in the allotted 60 seconds. It did both using a data-crunching program it calls FuxiSort. Tang and his team built this tool from scratch, in the C++ programming language. It’s (roughly) analogous to Hadoop, the open source standard for crunching data across dozens, hundreds, or even thousands of machines.
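The article doesn’t describe FuxiSort’s internals, but large sorts of this kind are commonly built as a range-partitioned sort: sample the input to pick split keys, route every record to the machine that owns its key range, then have each machine sort its share locally. The Python sketch below simulates that general pattern in a single process, with lists standing in for machines. The function names are hypothetical, and this is an illustration of the technique, not FuxiSort’s or Hadoop’s actual code. It sorts bare 10-byte keys, whereas real GraySort records are 100 bytes long with a 10-byte key.

```python
import random
from bisect import bisect_right


def choose_splitters(keys, num_workers, sample_size=1_000, seed=0):
    """Pick num_workers - 1 split keys from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_workers
    # Evenly spaced quantiles of the sample become range boundaries.
    return [sample[int(step * i)] for i in range(1, num_workers)]


def partition(keys, splitters):
    """Shuffle phase: route each key to the worker owning its range."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # bisect_right returns 0 for keys below the first splitter and
        # len(splitters) for keys at or above the last one.
        buckets[bisect_right(splitters, key)].append(key)
    return buckets


def distributed_sort(keys, num_workers):
    """Range-partitioned sort, simulated in one process (hypothetical)."""
    splitters = choose_splitters(keys, num_workers)
    buckets = partition(keys, splitters)
    # Each "worker" sorts its own bucket independently (in parallel, in a
    # real cluster). Ranges are disjoint and ordered, so concatenating the
    # sorted buckets yields a globally sorted result with no merge step.
    return [key for bucket in buckets for key in sorted(bucket)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Random 10-byte keys, echoing the 10-byte keys of GraySort records.
    keys = [rng.getrandbits(80).to_bytes(10, "big") for _ in range(50_000)]
    result = distributed_sort(keys, num_workers=8)
    print(result == sorted(keys))
```

Because each bucket covers a disjoint key range, no final merge is needed. The price is that a poorly chosen set of split keys leaves some machines with far more data than others, which is why the splitters come from a random sample of the actual input rather than fixed guesses.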
But, says George Porter, an assistant professor of computer science at the University of California, San Diego, who has reviewed Alicloud’s public paper on FuxiSort, the software is designed to use computing power more efficiently, to use available hardware to the fullest. According to Porter, FuxiSort seems to operate much like TritonSort, a platform he developed alongside Googlers Michael Conley and Amin Vahdat, the man who oversees Google’s worldwide computer network. TritonSort topped the GraySort competition last year, alongside a system based on an open source tool called Spark.
Porter points out, however, that FuxiSort took the prize this year in part because it used so many more machines than TritonSort (about 3,100 processors versus only 186 processors). “They were 3.6 times faster than we were,” Porter says. “But they used almost 17 times more servers.” He says that he and his team had access to only a limited number of the highest-powered machines on Amazon’s cloud service, whereas Alicloud could draw on a much larger pool of high-powered machines through its own cloud service.
In other words, Alicloud has not just the software but also the hardware needed to compete in the larger market. And that’s the real point. It’s not in the business of winning benchmark competitions. It’s in the business of selling access to computing power and online software.
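Porter’s point can be made concrete by dividing the speedup by the difference in cluster size. This Python sketch does that with the article’s rounded figures. It treats “processors” and “servers” as the same unit, an assumption since the two figures use different words, and it ignores that the two teams ran on different hardware, so it is a rough gauge rather than a fair per-machine benchmark.

```python
# Rough per-node comparison of FuxiSort and TritonSort on GraySort.
# Inputs are the rounded figures quoted in the article. ASSUMPTION:
# "processors" and "servers" are treated as the same unit of hardware.

SPEEDUP = 3.6        # FuxiSort's speed relative to TritonSort, per Porter
FUXI_NODES = 3_100   # approximate FuxiSort cluster size
TRITON_NODES = 186   # TritonSort cluster size


def per_node_ratio(speedup: float, nodes_a: float, nodes_b: float) -> float:
    """Throughput per node of system A relative to system B.

    Both systems sorted the same 100 TB, so total throughput scales with
    the speedup; dividing by the node-count ratio normalizes per node.
    """
    return speedup / (nodes_a / nodes_b)


node_ratio = FUXI_NODES / TRITON_NODES
ratio = per_node_ratio(SPEEDUP, FUXI_NODES, TRITON_NODES)

print(f"node-count ratio: {node_ratio:.1f}x")
print(f"per-node throughput, FuxiSort vs TritonSort: {ratio:.2f}x")
```

With these inputs, Alicloud used about 16.7 times as many nodes, so each of its nodes delivered roughly 0.22 times, or about a fifth of, the throughput of a TritonSort node.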
With this in mind, could something like FuxiSort prove useful in the marketplace? Perhaps. According to Porter, it’s particularly well suited to crunching data across a relatively small number of machines. This could help small organizations with shallow pockets. “There’s a lot of people that want to do Big Data processing on a smaller scale,” Porter says. “It would be great if they had access to this Big Data computing but with much fewer resources. It would democratize [the technology], make it available to a much larger group of people.”
Even Hong Tang will tell you that Alicloud is imitating the American cloud giants. “We’ve built a very general, large scale cloud computing infrastructure,” he says, “very similar to Google’s infrastructure.” The Google infrastructure, you see, is the ideal all other cloud companies aspire to. But it was Amazon that created the cloud market, by realizing it should offer its infrastructure to the rest of the world via the Internet. And when Alicloud launched its own cloud service back in 2011, it was really imitating Jeff Bezos and company.
Like Amazon and Google and Microsoft, Alicloud offers raw computing power and data storage space as well as a wide range of pre-built software, including data analysis tools akin to FuxiSort. These services let companies run their businesses without having to build much infrastructure of their own.
Tang studied at the University of California, Santa Barbara, and later worked at Yahoo, whose role in the rise of cloud computing is under-appreciated. “Yahoo has been really innovative in the Big Data space,” says Porter. “Not only have they built some interesting products. They’ve been really active in creating communities around those products.” Now, under Tang’s leadership, Alicloud is very much a part of that same movement, alongside Amazon and Google and Microsoft as well as Yahoo. It has a GraySort trophy to prove it.