How Netflix Manages Security in the Age of DevOps

By Rachael King

At Netflix Inc., engineers deploy code thousands of times per day. To maintain this pace, the streaming media company uses an approach to software development called DevOps, which promotes close collaboration between the engineers who write code and those who deploy it to production.

It’s a pace that’s not for the faint of heart, particularly when it comes to security. Netflix has more than 62 million subscriber accounts to protect worldwide. “The majority of security practitioners view DevOps as a huge threat,” said Gene Kim, co-author of a book about IT and DevOps called “The Phoenix Project.” But companies such as Netflix are using automation and other approaches to mitigate potential security problems while maintaining high velocity, he said.

It’s a problem Jason Chan wrestled with when he began working at Netflix in April 2011 as a cloud security architect and started building a team. The IT environment was starting to get quite big and the company was moving fast. “The only realistic way of maintaining security in an environment that grows so rapidly and changes so quickly is to make it automation first,” said Mr. Chan, who is now director of engineering in cloud security at the company.

Netflix has since introduced a variety of security tools such as Security Monkey, Scumblr and Fully Integrated Defense Operation that automate everything from finding compromised subscriber accounts to responding to security incidents. “When you move faster, it’s just logical that the ability for humans to operate effectively diminishes,” he said.

Security Monkey, which Netflix built and open sourced in June 2014, examines the security of the company’s own internal configurations. Because Netflix runs its infrastructure on Amazon Web Services, the tool continuously monitors and tracks AWS security configurations. Its rules engine alerts Netflix when a configuration changes in a way that someone needs to review. A developer might, for example, have created a firewall rule that allows access from a suspicious IP address, or an access control policy that makes a storage resource world-readable, he said.
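Security Monkey’s actual rules are not described here. As a minimal sketch of the two kinds of checks Mr. Chan mentions, the Python below assumes configurations arrive as dictionaries shaped like the security-group and bucket-ACL responses that AWS’s boto3 SDK returns. The blocklist, the function names and the finding format are hypothetical.

```python
import ipaddress

# Hypothetical blocklist of networks treated as suspicious (a documentation range).
SUSPICIOUS_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Grantee URI that S3 uses for "everyone" (anonymous public access).
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def audit_security_group(group):
    """Flag ingress rules open to the whole internet or to a suspicious network."""
    findings = []
    for perm in group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            net = ipaddress.ip_network(ip_range["CidrIp"], strict=False)
            if net.prefixlen == 0:
                findings.append(f"{group['GroupId']}: ingress open to the internet ({net})")
            elif any(net.overlaps(bad) for bad in SUSPICIOUS_NETWORKS):
                findings.append(f"{group['GroupId']}: ingress from suspicious network {net}")
    return findings


def audit_bucket_acl(bucket_name, acl):
    """Flag ACL grants that give anonymous users read access to a bucket."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("URI") == ALL_USERS_URI and grant.get("Permission") in ("READ", "FULL_CONTROL"):
            findings.append(f"{bucket_name}: world-readable via {grant['Permission']} grant")
    return findings
```

A rule that allows SSH from 203.0.113.7/32 would be reported as ingress from a suspicious network, while a grant of READ to the AllUsers group would mark the bucket as world-readable. Keeping each rule as a small, independent function makes it easy to add checks without touching the code that collects configurations.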

There are three components to Security Monkey. The first, Watcher, monitors a given AWS account and type of AWS resource for configuration changes. The second component, Notifier, lets teams know when something has changed. The third piece, Auditor, determines the level of risk associated with a particular configuration by running a set of business rules against each configuration. “There’s a continuum of changes and some of those changes are benign and happen all the time and then there’s a small percentage of those changes that we need to look at more closely and have a human get involved,” said Mr. Chan.
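The division of labor among the three components can be illustrated with a toy pipeline. This is not Security Monkey’s code: the snapshot format, the rules, their scores and the alert threshold below are all assumptions, chosen to show how scoring lets most benign changes pass quietly while a small share is routed to a person.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, object]


@dataclass
class Change:
    item_id: str
    old: Optional[Config]
    new: Optional[Config]
    score: int = 0


def watch(previous: Dict[str, Config], current: Dict[str, Config]) -> List[Change]:
    """Watcher: diff two snapshots of one account and resource type."""
    changes = []
    for item_id in sorted(previous.keys() | current.keys()):
        old, new = previous.get(item_id), current.get(item_id)
        if old != new:
            changes.append(Change(item_id, old, new))
    return changes


# Hypothetical business rules: (description, risk score, predicate on the new config).
Rule = Tuple[str, int, Callable[[Config], bool]]
RULES: List[Rule] = [
    ("ingress open to the internet", 10, lambda c: c.get("cidr") == "0.0.0.0/0"),
    ("administrative port exposed", 5, lambda c: c.get("port") in (22, 3389)),
]


def audit(change: Change) -> Change:
    """Auditor: sum the scores of every rule the new config trips (deletions score 0)."""
    if change.new is not None:
        change.score = sum(score for _, score, check in RULES if check(change.new))
    return change


def notify(changes: List[Change], threshold: int) -> List[Change]:
    """Notifier: surface only changes risky enough to need a human."""
    return [c for c in changes if c.score >= threshold]


def run_pipeline(previous: Dict[str, Config], current: Dict[str, Config], threshold: int = 10) -> List[Change]:
    changes = [audit(c) for c in watch(previous, current)]
    return notify(changes, threshold)
```

If one security group merely narrows an internal CIDR while another opens port 22 to 0.0.0.0/0, both count as changes, but only the second scores high enough (15 against a threshold of 10) to reach a human.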

Another tool, Scumblr, automates intelligence gathering. It can search websites such as Pastebin for leaked names and passwords, or look for compromised Netflix accounts that criminals try to sell on eBay, and then reports its findings so that Netflix can help customers regain control of their accounts. Scumblr is essentially a Web application that lets Netflix set up searches of sites such as Google, Facebook, Twitter and eBay that run automatically, similar to Google Alerts. It works with a tool called Sketchy, which collects screenshots and text content from potentially malicious sites. Sketchy lets engineers gather this data while keeping their own systems isolated from any malicious software those sites may host.
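Scumblr’s own implementation is not reproduced here. As a rough, hypothetical illustration of what one automated search might do with content already collected from a site such as Pastebin, the Python sketch below flags email:password pairs in documents that mention a watched keyword. The SavedSearch and Finding types, the keyword and the regular expression are assumptions, and fetching the pages is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Matches "user@example.com:password" pairs, a common format in credential dumps.
CREDENTIAL_PAIR = re.compile(r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}):(\S+)")


@dataclass
class SavedSearch:
    name: str
    keywords: Tuple[str, ...]  # hypothetical terms that make a document relevant


@dataclass
class Finding:
    search: str
    source: str
    email: str


def run_search(search: SavedSearch, documents: Iterable[Tuple[str, str]]) -> List[Finding]:
    """Scan (source, text) pairs and report credential pairs in relevant documents."""
    findings = []
    for source, text in documents:
        lowered = text.lower()
        if not any(k.lower() in lowered for k in search.keywords):
            continue  # document never mentions a watched term
        for email, _password in CREDENTIAL_PAIR.findall(text):
            # Record only the account identifier; the leaked password is discarded.
            findings.append(Finding(search.name, source, email.lower()))
    return findings
```

Discarding the password and keeping only the email address reflects the goal described above: identifying which customers need help regaining control of their accounts.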

Netflix has also begun to automate incident response with a system called Fully Integrated Defense Operation (FIDO), which it open sourced May 4. The system can automatically analyze security events and prioritize them by severity. In some cases, it can even contain a problem on its own, for example by disabling an employee account that has been compromised by malicious software.

When FIDO receives a security alert from a firewall or intrusion detection system, it tries to gather more context about what’s happening. It checks internal systems to see whether the activity is targeting an executive, a domain administrator or the PCI zone, the most tightly secured part of the network, which handles payment card transactions. It also consults outside threat intelligence to determine whether the alert is a false positive or part of a more pervasive problem. FIDO then correlates that information, scores the threat and takes further action, whether that means emailing an engineer or disabling an employee account.
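FIDO’s actual scoring logic is not spelled out here, so the following is only a sketch of the enrich, score and act flow the paragraph describes. The context lists, the threat feed, the weights and the thresholds are all hypothetical, and the actions are returned as strings rather than carried out.

```python
from dataclasses import dataclass

# Hypothetical internal context; a real system would query directories and asset inventories.
EXECUTIVES = {"ceo", "cfo"}
DOMAIN_ADMINS = {"da-admin1"}
PCI_HOSTS = {"pay-db-01"}
# Hypothetical external threat-intelligence feed of known-bad source addresses.
KNOWN_BAD_IPS = {"198.51.100.23"}


@dataclass
class Alert:
    source: str     # detector that raised it, e.g. "firewall" or "ids"
    user: str       # account on the targeted machine
    host: str       # targeted host
    remote_ip: str  # address the activity came from


def score_alert(alert: Alert) -> int:
    """Correlate internal context and threat intel into one score (weights are illustrative)."""
    score = 10  # baseline for any alert
    if alert.user in EXECUTIVES:
        score += 30
    if alert.user in DOMAIN_ADMINS:
        score += 40
    if alert.host in PCI_HOSTS:
        score += 40
    if alert.remote_ip in KNOWN_BAD_IPS:
        score += 30
    return score


def respond(alert: Alert) -> str:
    """Choose an action by score: log likely noise, email an engineer, or disable the account."""
    score = score_alert(alert)
    if score >= 70:
        return f"disable account {alert.user}"
    if score >= 40:
        return f"email on-call engineer about {alert.host}"
    return "log only"
```

With these weights, an intrusion alert against an executive from a known-bad address scores 70 and triggers account disablement, activity against a PCI host scores 50 and pages an engineer, and an alert with no supporting context stays at the baseline and is only logged.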

Automation is important for flagging potential security problems that developers might miss, said Tom Pageler, chief risk officer of DocuSign Inc. But DevOps is primarily an organizational approach, not a technological one, and for that reason any security model must include what he calls ‘security champions’: development team members assigned to check each other’s work, said Mr. Pageler, who was formerly a deputy chief information security officer at JPMorgan Chase & Co. The security and DevOps teams are closely connected, with security people assigned to the development team and vice versa, he said.

Wal-Mart Stores Inc., which has been steadily moving to a more agile approach to software development, follows a similar model. It created a program called “security mavens” to increase security awareness among development teams. “In order for infosec and agile to be effective in an organization, you can’t have it locked up with a few people or a few departments that are narrowly looking at their portfolio of work,” said Julie Tsai, director of engineering in information security at Walmart Global eCommerce, speaking at the DevOps Enterprise Summit in October 2014. Instead, it needs to be embodied in people’s everyday practices, she said.

By offering to help subsidize a security certification, the company persuaded employees ranging from developers to product managers to quality assurance staff to become security mavens. Security mavens now make up about 10% of its engineering organization, said Kamal Manglani, agile lead at Walmart Global eCommerce, speaking at the same conference.

Over the past couple of years, as the retailer has promoted application security, security defects have fallen by 92%, said Ms. Tsai. “There is no way we would have achieved this without being able to get in at the root level, where developers were creating the code and threat modeling and internalizing these things,” she said.

“We have to make humans more effective via automated decision making, automated data gathering and analysis,” said Mr. Chan at Netflix. Security is a field in which people must make many judgments and decisions based on what a malicious actor is doing, and attackers may change their tactics in real time, he said. “You really need to help get what’s most important in front of people as quickly and easily as possible, so you’re using your human resources as effectively as possible,” he added.