OpenKilda: A Scalable Open Source SDN Controller

Software-Defined Networking (SDN) has been one of the most discussed topics in the industry for as long as 20 years, and the benefits of its core promise, separating the control plane from the data plane, are now widely understood, if not always implemented. Industry titans such as Google, Facebook, and Microsoft, among others, have long shown the advantages of implementing SDN in local area and data center networks. But successfully using SDN across Communication Service Provider (CSP) networks and wider area networks requires a control plane that performs at an extremely high level with negligible failures, scales well, and is highly available.

The nagging challenge is that the three facets (performance, availability, and scalability) are closely intertwined. Addressing any one of them in isolation, without making concessions to the others, has not been a realistic option. Scalability seems to be the dimension that has suffered the most, sacrificed for the sake of the other two. But a new open source SDN controller called OpenKilda offers a solution to this conundrum. OpenKilda’s primary objective is to give CSPs around the world total Software-Defined Networking command and control.

Why OpenKilda was created

With the great strides made by major open source projects like ONOS, OpenDaylight, and others, why pursue another open source SDN controller option? Scalability is the chief reason. With attention focused on centralizing control of the network, the developers of current SDN controllers, open source and proprietary alike, seem to have lost sight of the fact that distributing functionality and the associated data processing is the best way to handle large network footprints. Many developers of highly scalable applications have learned this over time.

As a result of this oversight, and as SDN has been applied to geographically larger and more complex networks, the scalability of SDN controllers has grown into a serious problem that demands attention. Controllers are now used in true SD-WANs, which are distributed over broader geographic areas, use more switches, and span more providers. This expansion, the lack of uniformity among providers, and the latency that comes with wider geographic coverage have naturally resulted in less reliable control networks, and problems have inevitably begun to arise.

Another issue is that in a network of hundreds or even thousands of switches (rather than tens of switches or lab setups), the increased volume of telemetry flows and the additional processing they require can overwhelm clusters when a single, centralized instance must respond to many changes in topology and status. We were also beginning to see unacceptable, cascading network outages: flows were dropped and then needed reprogramming, causing a vicious cycle of catch-up between the controller and switches that needed full reprogramming.

The OpenKilda project was a reaction to these and other lessons learned.  As noted by the contributors on the OpenKilda website:

“OpenKilda solves the problem of latency while providing a scalable SDN control & data-plane and end-to-end flow telemetry. OpenKilda is a scalable SDN Controller, architected from the ground up from web-scale technologies. OpenKilda solves the scalability challenge other SDN controllers face and was built to manage the unreliable control plane which can traverse across multiple carriers over long distances.”

Past technologies have made fantastic headway and were undoubtedly instrumental in ushering in the advancements of SDN, but small-cluster control and telemetry management simply will not meet current demands, which continue to grow exponentially. The scalability of a controller is defined by how its throughput and latency change as more switches and hosts are added to the network, or by the ability to add more CPU to the servers where the controller runs. More switches, more flows, and the latency that comes with spanning a greater geography all drive the need for greater scalability in SDN controllers.
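
To make that definition concrete, here is a minimal sketch of how such a comparison might be computed. It is not part of OpenKilda; the names Measurement and scaling_report, and every number in the example, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One hypothetical benchmark run of a controller (illustrative only)."""
    switches: int        # number of switches under control
    throughput: float    # e.g. flow setups handled per second
    latency_ms: float    # e.g. mean response time in milliseconds


def scaling_report(samples):
    """Compare consecutive runs, ordered by network size.

    Each entry gives the growth factor of the network and the matching
    growth factors for throughput and latency. A controller scales well
    when throughput grows roughly in step with the network while the
    latency factor stays close to 1.
    """
    ordered = sorted(samples, key=lambda m: m.switches)
    report = []
    for small, large in zip(ordered, ordered[1:]):
        report.append({
            "switch_growth": large.switches / small.switches,
            "throughput_growth": large.throughput / small.throughput,
            "latency_growth": large.latency_ms / small.latency_ms,
        })
    return report


if __name__ == "__main__":
    # Made-up figures, not measurements of any real controller.
    runs = [
        Measurement(switches=100, throughput=1000.0, latency_ms=5.0),
        Measurement(switches=200, throughput=1800.0, latency_ms=6.0),
    ]
    for row in scaling_report(runs):
        print(row)
```

In this reading, a throughput factor that lags well behind the switch factor, or a latency factor that climbs quickly, is the signal that a controller is running out of headroom.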

From the outset, OpenKilda has been built to use the tools of big data and distributed processing to address scaling issues. Its architecture is capable of controlling a network of 100,000 switches with 16 million flows, an extraordinary leap forward by any standard. Additionally, its operations and products will benefit in terms of telemetry, latency, and self-healing from the big data and distributed processing techniques we have brought to bear in the development of OpenKilda.

Why CloudSmartz is involved with OpenKilda

CloudSmartz is proud to be a strategic and prime implementation partner with OpenKilda in bringing scalability to the control plane, as well as in delivering pronounced gains in telemetry, network state, self-healing, and GUI. In addition to these advancements, OpenKilda also includes a Path Computation Engine to deliver dynamic customer provisioning. Our next post in the series will delve deeper into each of these features and how they promise to help CSPs make the digital transformation more quickly, more cost-effectively, and more seamlessly while protecting legacy investments, which has been the guiding principle of CloudSmartz since our inception.