Facebook Just Proved It Isn’t Hooli From Silicon Valley
Yann Collet is the real-life Richard Hendricks. Which means he isn’t like Richard Hendricks at all.
Richard Hendricks is that fictional computer programmer at the heart of Silicon Valley, HBO’s rollicking parody of the Northern California tech scene. As the show begins, he creates a new data-compression algorithm—a way of squeezing text, software, sounds, photos, and videos into far smaller digital packages—and pretty soon, a giant Internet company called Hooli is battling for control of this creation. With a better data compression algorithm, you see, Hooli can pack more data into fewer machines and send it across the Internet at faster speeds. That means Hooli can save lots of money. And if it sells the algorithm to others, it can make lots of money too.
Like Richard Hendricks, Yann Collet is what you might call a data-compression genius. He once worked in the marketing department at Orange, the French equivalent of AT&T, but in his spare time, he built compression algorithms. He created one called LZ4, and pretty soon, it caught the eye of a giant Internet company. In the summer of 2015, Facebook hired Collet, moving him from Paris to its headquarters in Menlo Park, and he continued work on his new algorithm, Zstandard. But Facebook isn’t keeping this algorithm to itself. It’s not trying to create a product and sell it for beaucoup bucks. It’s giving the code away.
Silicon Valley gets so many things right about Silicon Valley (that’s part of its unique charm), and one thing it gets right is that data compression is enormously important to the operation of the Internet. But the real giants of the Internet view compression quite differently from the folks at Hooli. In the modern age, they don’t nurture this kind of fundamental technology behind closed doors and then sell it for a profit. They open source the code, allowing anyone to use it and even modify it. In the end, this is more valuable than the money they could make by selling a product. It can streamline the operation of the Internet as a whole, and if that happens, the wider world of software engineers will improve the technology in ways no single company ever could on its own. That’s why Facebook is giving away Zstandard.
Today, the company open sourced the first official version of Zstandard, a particularly fast data-compression algorithm. The moment is largely symbolic—earlier “beta” versions of Zstandard were already open source—but the symbol is important. This is how the company generally operates, freely sharing the software and even the hardware designs that underpin its online empire, so that it can feed the evolution of the Internet as a whole. If the Internet is healthier, the thinking goes, so too is Facebook. The company’s hope is that Zstandard will live up to its name, that it will become a standard way of compressing files, that the rest of the industry will work to expand and improve it. “We need strong tools, and by open sourcing this compression algorithm, we make it strong,” Collet says.
Facebook is hardly alone. Open source software is now fundamental to the Internet, and open source hardware is finding a role as well. In the Valley, open source is the norm for operating systems, databases, web serving software, AI engines, and, yes, compression algorithms. Recently, both Apple and Google open sourced their own super-fast compression tools, hoping to streamline the Internet in ways Zstandard does not.
One of the reasons to open source a data compression algorithm is that if everyone uses it, it becomes easier to use. If one system sends a compressed file to some other system, the receiving system can decompress the data and open it up. “Imagine if the English language was jealously guarded. We wouldn’t be able to use it to communicate,” says Daniel Horn, an engineer at file-sharing startup Dropbox who works on compression. “Compression becomes very valuable if people agree on it.” That’s what Google hopes to engender with its open source algorithm, Brotli. It wants a new compression standard for web browsers, so that any website can more quickly deliver data to people everywhere. If you run the world’s largest Internet search engine, that is a very good thing. Ultimately, it can even boost the bottom line.
According to Facebook vice president of engineering Jay Parikh, Facebook is already using Zstandard in parts of its own online empire, and it plans on gradually expanding its use. Zstandard is a “lossless” compression standard, meaning the algorithm can compress and decompress without losing even tiny pieces of the data, and it can decompress at unusually fast speeds. As Parikh explains, this saves CPU power. And since Facebook’s data is spread across thousands upon thousands of machines, that’s a big deal. “Given the scale we’re operating at,” Parikh says, “we really want to improve the state of the art.”
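What “lossless” means in practice can be shown in a few lines. The sketch below is only an illustration: it uses Python’s built-in zlib module as a stand-in, not Zstandard itself, and the round_trip helper and the sample text are made up for the example.

```python
import zlib


def round_trip(data: bytes, level: int = 6) -> tuple[bytes, bytes]:
    """Compress data, then decompress it again.

    zlib is a stand-in here (an assumption of this sketch); any
    lossless compressor should satisfy the same round-trip property.
    """
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical sample input: a short sentence repeated many times.
text = b"Zstandard is a lossless compression standard. " * 200
compressed, restored = round_trip(text)

print("original size:  ", len(text))
print("compressed size:", len(compressed))
print("identical after decompressing:", restored == text)
```

The compressed copy is much smaller than the input, yet decompressing it returns every byte of the original. That is the property a lossless format has to guarantee.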
But Parikh and Collet want this tool to improve even more, and that’s why they’re open sourcing it. Yes, there are other open source algorithms that improve on the current state of lossless compression, including Brotli. But Brotli is designed for data shipped to and from web browsers. Zstandard, Collet says, is designed for the world of apps. Companies and coders can use it in almost any situation.
That said, Zstandard is best used with text and software files, not photos or videos. The reality is that Internet photos and videos are already compressed in a way that doesn’t lend itself to additional lossless compression. On Silicon Valley, that’s why Hooli wants the Hendricks algorithm for itself: the code does the previously impossible. You could argue that if a real-world algorithm cracked lossless video compression in much the same way, a real-world Hooli would want the code for itself too. After all, video takes up so much more space than text, and it’s the future of the Internet. But your argument may not hold up.
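The point about already-compressed files is easy to check with a rough experiment. The sketch below again uses Python’s built-in zlib as a stand-in rather than Zstandard, and a made-up word list in place of real text. It compresses some pseudo-text once, then feeds the compressed output back through the same compressor.

```python
import random
import zlib

# Hypothetical vocabulary for generating filler text; ordinary prose
# should behave in a broadly similar way.
WORDS = ["data", "compression", "internet", "facebook", "algorithm", "video",
         "photo", "text", "software", "standard", "open", "source",
         "machine", "speed", "file", "code"]

rng = random.Random(0)  # fixed seed so the run is repeatable
text = " ".join(rng.choice(WORDS) for _ in range(20000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass: plain text in
twice = zlib.compress(once, 9)  # second pass: already-compressed bytes in

print("original bytes:  ", len(text))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```

The first pass shrinks the text dramatically. The second pass has almost nothing left to squeeze, because the first pass already removed the redundancy a lossless compressor relies on. Photos and videos on the Internet arrive in roughly that state.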
At a recent Dropbox hack week, Daniel Horn and other engineers built a system that shows how a Hendrickian compression tool just might be possible. And they open sourced it. Meanwhile, Collet says parts of Zstandard could eventually lead to a system suited to photo and video. And Zstandard is open source too. “Every human work is a work in progress,” says Horn. “What if someone can pick up the torch and make something even better?”