Internet routers hitting 512K limit, some become unreliable
From performance issues at hosting provider Liquid Web to outages at eBay and LastPass, large networks and websites suffered a series of disruptions and outages on Tuesday. Some Internet engineers are blaming the disruptions on a novel technical issue that impacts older Internet routers.
At the heart of the issue, the number of routable networks on the Internet has outgrown the memory that infrastructure hardware, typically routers and switches, sets aside for deciding how to route data through the Internet. For the first time, the lists of routable networks, also called Border Gateway Protocol (BGP) tables, surpassed a significant power of two: two to the 19th power, or 512K (524,288) entries. Many older routers limit their use of a specialized, and expensive, type of memory known as ternary content-addressable memory (TCAM) to 512K entries by default.
When the tables outgrew the space allotted for them, the routers shut down or slowed.
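To make the arithmetic concrete, the short Python sketch below models a routing table with a hard entry limit. It is an illustration only: the class name `FixedForwardingTable` and its overflow behavior are hypothetical and do not describe any vendor's implementation.

```python
IPV4_TCAM_LIMIT = 2 ** 19  # 524,288 entries, the "512K" default described above


class FixedForwardingTable:
    """Hypothetical model of a hardware table with a hard entry limit."""

    def __init__(self, capacity=IPV4_TCAM_LIMIT):
        self.capacity = capacity
        self.hardware = set()   # prefixes that fit in the fast memory
        self.overflow = set()   # prefixes that arrived after it filled up

    def install(self, prefix):
        """Return True if the prefix sits in hardware, False if it overflowed."""
        if prefix in self.hardware:
            return True
        if prefix in self.overflow:
            return False
        if len(self.hardware) < self.capacity:
            self.hardware.add(prefix)
            return True
        self.overflow.add(prefix)
        return False


# A tiny capacity stands in for 512K so the overflow is easy to see.
table = FixedForwardingTable(capacity=4)
results = [table.install(f"10.0.{i}.0/24") for i in range(6)]
```

In this toy model the first four prefixes land in the fast table and the last two do not. What a real router does with the routes that do not fit depends on the platform.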
Hosting provider Liquid Web, for example, blamed an outage on Tuesday on the 512k issue, posting a short statement on Twitter: “As ISP’s have recovered from #512k active bgp routes being reached, many of our customers affected by these carrier issues have regained the ability to reach their sites.” An outage at password service LastPass could partly be blamed on the issue, according to CEO Joe Siegrist. “We suspect the outage was caused by the 512k BGP issue, but we do not have confirmation yet from our provider. There were other factors at play in our particular situation.”
Other outages reportedly impacted eBay, Comcast, and Time Warner.
The issues appear to be isolated, according to an analysis by network-performance monitoring service Renesys, a subsidiary of Dyn. Each provider’s routing tables are a bit different, because what each provider counts as a legitimate network varies, so only a small fraction of BGP tables have exceeded 512K entries and most routers have not been affected. Over the coming weeks, as more tables hit that limit, the network problems could multiply, said Jim Cowie, chief scientist at Dyn.
“In terms of looking at the overall stability of the Internet and taking its temperature, we really have not seen its temperature rising,” Cowie said. “But as 512K becomes the norm, as it will in the next few weeks, the temperature will rise a little bit as we find out where all of these (outdated) systems live.”
Older routers have less TCAM to work with. Unlike typical RAM, where software supplies an address and reads back its contents, content-addressable memory lets software supply the content and get back the address where it is stored. Such lookups are much faster in binary CAM than in RAM. Ternary CAM adds a third, “don’t care” state for each bit, typically expressed as a mask, so that a single entry can match a whole range of values, such as every address in a network prefix.
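The masked look-up can be sketched in a few lines of Python. This is a software stand-in, not how the hardware works: a real TCAM compares the key against every entry at once, while the loop below checks them one at a time. The function names, next-hop labels, and example prefixes are made up for illustration.

```python
import ipaddress


def ternary_match(key, value, mask):
    """True when key equals value on every bit the mask marks as significant."""
    return (key & mask) == (value & mask)


def make_entry(cidr, next_hop):
    """Turn a prefix such as 192.0.2.0/24 into (value, mask, length, next hop)."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.netmask), net.prefixlen, next_hop)


def lookup(entries, address):
    """Return the next hop of the longest matching prefix, or None."""
    key = int(ipaddress.ip_address(address))
    best = None
    for value, mask, prefixlen, next_hop in entries:
        if ternary_match(key, value, mask):
            if best is None or prefixlen > best[0]:
                best = (prefixlen, next_hop)
    return None if best is None else best[1]


entries = [
    make_entry("0.0.0.0/0", "default"),
    make_entry("192.0.2.0/24", "peer-a"),
    make_entry("192.0.2.128/25", "peer-b"),
]

print(lookup(entries, "192.0.2.200"))   # matches all three; the /25 is longest
print(lookup(entries, "192.0.2.5"))     # matches the /24 and the default
print(lookup(entries, "198.51.100.1"))  # matches only the default
```

Each entry occupies one slot no matter how many addresses it covers, which is why the size of the table tracks the number of routable networks rather than the number of addresses.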
The issue appears to have surprised many network engineers. Network-hardware vendors did not give much warning as to the dangers of the default configuration of older routers, and corporate executives likely put off resolving the issue, said Vess Bakalov, co-founder and CTO of network-monitoring firm SevOne.
“It was a bit of a failure across the board, starting with the vendors, who should have been asking their customers to upgrade sooner,” Bakalov said. “It should not have been something that came out of left field at this stage in the Internet’s life.”
In May, Cisco published an advisory warning of the issue and describing workarounds for four of its product lines.
Overall, Internet experts do not believe the issue will dramatically impact Internet operations. Only older routers and switches are affected, and most can be reconfigured to assign more memory to routing IPv4 traffic, but at the expense of supporting the next generation of networking, IPv6.
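The trade-off can be shown with some rough arithmetic. The figures in the Python sketch below are assumptions chosen for illustration, not specifications of any particular router: a shared pool of 1,024K slots, one slot per IPv4 route, and two slots per IPv6 route.

```python
TOTAL_SLOTS = 1024 * 1024   # assumed size of the shared pool, illustration only
IPV4_COST = 1               # assumed slots consumed by one IPv4 route
IPV6_COST = 2               # assumed slots consumed by one IPv6 route


def ipv6_capacity(ipv4_routes, total=TOTAL_SLOTS):
    """IPv6 routes that still fit after reserving room for ipv4_routes."""
    remaining = total - ipv4_routes * IPV4_COST
    if remaining < 0:
        raise ValueError("IPv4 allocation exceeds the shared pool")
    return remaining // IPV6_COST


before = ipv6_capacity(512 * 1024)  # IPv4 share left at 512K
after = ipv6_capacity(768 * 1024)   # IPv4 share raised to 768K

print(before, after)
```

Under these assumptions, raising the IPv4 share from 512K to 768K entries cuts the room left for IPv6 routes in half.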
“I would like to say that it will help the adoption of IPv6, because it may bring some attention to the fact that IPv4 addresses are being exhausted, but IPv6 has completely failed to take off,” Cowie said. “If anything, it really shows you that IPv4 has really never been healthier.”