Networking

Management Network

The management network provides connectivity between Service Processors (SPs) and the Management Gateway Service (MGS) via a dedicated network and switch. SPs are present on each major system component (the "Gimlet" server sled, "Sidecar" network switch, and power shelf controller), and communicate on a physically separate network in the cabled backplane.

Each Sidecar includes a dedicated management network switch, which is connected to the main system network via a 10G link to the Tofino ASIC. The Tofino, in turn, exposes data from this link over PCIe to the adjacent Gimlet. MGS runs on this Gimlet, translating higher-level commands into bidirectional communication with the SPs. MGS is then used by higher-level software — including the control plane software (Nexus) and low-level technician CLI (wicket) — to control the rack.
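To make the shape of that translation concrete, here is a minimal sketch, assuming hypothetical types (SpIdentifier, ManagementCommand, SpRequest) that do not correspond to the real MGS interfaces: a higher-level command is mapped to a wire-level request addressed to a specific SP on the management network.

    // Hypothetical, simplified model of MGS translating a higher-level
    // request into a message for a specific Service Processor (SP).
    // None of these types correspond to the real MGS implementation.

    /// Which SP a command is addressed to: a sled, a switch, or a power shelf.
    #[derive(Debug, Clone, Copy)]
    enum SpIdentifier {
        Sled(u8),
        Switch(u8),
        PowerShelf(u8),
    }

    /// Examples of higher-level operations a caller might ask MGS to perform.
    #[derive(Debug)]
    enum ManagementCommand {
        PowerOn,
        PowerOff,
        ReadSensors,
    }

    /// A wire-level request as it would travel over the management network.
    #[derive(Debug)]
    struct SpRequest {
        target: SpIdentifier,
        opcode: u8,
    }

    /// Translate a higher-level command into a request for one SP.
    fn to_sp_request(target: SpIdentifier, cmd: ManagementCommand) -> SpRequest {
        let opcode = match cmd {
            ManagementCommand::PowerOn => 0x01,
            ManagementCommand::PowerOff => 0x02,
            ManagementCommand::ReadSensors => 0x10,
        };
        SpRequest { target, opcode }
    }

    fn main() {
        let req = to_sp_request(SpIdentifier::Sled(7), ManagementCommand::PowerOn);
        println!("sending {:?} over the management network", req);
    }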

Each SP is connected to both management switches (one in each Sidecar) for high availability. The management switches are configured so that each SP's traffic remains isolated from that of the other SPs. Traffic on the management network never transits the data plane or leaves the rack.

In addition to the SPs, the management network switch has connectivity to two RJ-45 ports on the front of each Sidecar, the "technician ports". These ports are used mainly by Oxide support personnel, and occasionally by operators under close Oxide supervision, during rack initialization and subsequent troubleshooting. Access to the technician ports is protected with customer-managed security devices.

Management Gateway Service

Data Network

Data networks (distinct from management networks) in the Oxide Rack are focused on providing a set of customer virtual private cloud (VPC) networks as well as a boundary services VPC. Virtual machines provisioned to customers are attached to these networks through the Oxide Packet Transformation Engine (OPTE). Transit between hypervisors is across in-rack switches. These switches are built on the Tofino 2 ASIC and are controlled by a software stack called "Dendrite" running on an adjacent compute unit over external PCIe.

Data Network

The routing protocols in play can be broken down into three tiers:

  1. Overlay: routing in and out of tunnels (RIOT) within a VPC; this mostly takes place within OPTE.

  2. Underlay: delay-driven multipath in fat trees (DDM); this takes place on hypervisors and switches.

  3. Boundary: BGP, static routes, or OSPF, depending on customer needs; this takes place on switches.

How these protocols come together is depicted in the diagram below.

Routing Tiers

Packets go directly from VMs into OPTE, where they are encapsulated, potentially replicated, and then sent to their destination over the Oxide underlay network. The underlay network is built on the delay-driven multipath (DDM) routing protocol. The Oxide routing daemon (Maghemite) contains an implementation of the DDM protocol upper half together with lower halves for Oxide Dendrite and the illumos network stack. Generally speaking, a protocol upper half implements the control plane: it is in charge of exchanging messages with peer routers to understand what network paths are available, and it sometimes engages in distributed computations to determine which routes are "best". A lower half consumes information from the upper half and is responsible for configuring the underlying data plane substrate, be it an ASIC or a kernel networking stack, to push packets according to the routing tables determined by the upper half. In addition to DDM, Maghemite also has BGP, static routing, and OSPF implementations to support interconnection with customer networks.
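The split between a protocol's upper and lower halves can be sketched in a few lines of Rust. In this illustration, a hypothetical LowerHalf trait stands in for whatever data plane sits underneath (a switch ASIC driven by Dendrite, or the illumos kernel network stack), and the upper half pushes the routes it has selected through that trait; the names are illustrative only and are not Maghemite's actual interfaces.

    // Illustrative sketch of the upper-half/lower-half split described above.
    // The types and names here are hypothetical, not Maghemite's real ones.

    use std::net::Ipv6Addr;

    /// A route selected by the upper half: a destination prefix and a next hop.
    #[derive(Debug, Clone)]
    struct Route {
        prefix: (Ipv6Addr, u8), // destination prefix and prefix length
        next_hop: Ipv6Addr,
    }

    /// The lower half: programs whatever data plane sits underneath,
    /// e.g. a switch ASIC (via Dendrite) or the host kernel's routing table.
    trait LowerHalf {
        fn install_route(&mut self, route: &Route);
    }

    /// A stand-in lower half that just records what it was told to program.
    #[derive(Default)]
    struct LoggingDataPlane {
        installed: Vec<Route>,
    }

    impl LowerHalf for LoggingDataPlane {
        fn install_route(&mut self, route: &Route) {
            println!("programming data plane with {:?}", route);
            self.installed.push(route.clone());
        }
    }

    /// The upper half: exchanges messages with peers, decides which paths are
    /// usable, and hands the results to the lower half. Peer exchange is
    /// elided here; the "best" routes are simply passed in.
    fn upper_half_update(data_plane: &mut impl LowerHalf, best_routes: &[Route]) {
        for route in best_routes {
            data_plane.install_route(route);
        }
    }

    fn main() {
        let mut dp = LoggingDataPlane::default();
        let routes = vec![Route {
            prefix: ("fd00:1122:3344::".parse().unwrap(), 48),
            next_hop: "fe80::1".parse().unwrap(),
        }];
        upper_half_update(&mut dp, &routes);
    }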

Integration with On-premises Network

The initial version of the Oxide Rack supports the basic integration scenario in which the default route to the internet is backed by a single gateway router. Both of the Oxide rack's switches are connected to the L2 network on which the customer gateway resides. Static routes are set up on the Oxide routers to provide a path to the internet, and on customer routers to provide a path to the Oxide rack. An IP pool is created from which the control plane allocates IP addresses.
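As a rough illustration, the configuration described above can be modeled as plain data: a static default route on the Oxide switches pointing at the customer gateway, routes on the customer router toward the rack's external addresses (one per switch), and an IP pool from which the control plane allocates those addresses. The types and addresses below are examples only, not the control plane's actual configuration schema.

    // Illustrative model of the basic integration configuration described
    // above. Types and addresses are examples, not the actual schema.

    use std::net::Ipv4Addr;

    /// A static route: destination prefix, prefix length, and next hop.
    #[derive(Debug)]
    struct StaticRoute {
        destination: Ipv4Addr,
        prefix_len: u8,
        next_hop: Ipv4Addr,
    }

    /// A contiguous range of addresses the control plane may allocate.
    #[derive(Debug)]
    struct IpPool {
        first: Ipv4Addr,
        last: Ipv4Addr,
    }

    fn main() {
        // On the Oxide switches: a default route toward the customer gateway.
        let oxide_default = StaticRoute {
            destination: Ipv4Addr::UNSPECIFIED, // 0.0.0.0/0
            prefix_len: 0,
            next_hop: "192.168.1.1".parse().unwrap(),
        };

        // On the customer router: routes toward the rack's external address
        // range, one next hop per rack switch.
        let customer_routes = [
            StaticRoute {
                destination: "203.0.113.0".parse().unwrap(),
                prefix_len: 24,
                next_hop: "192.168.1.2".parse().unwrap(), // switch 0
            },
            StaticRoute {
                destination: "203.0.113.0".parse().unwrap(),
                prefix_len: 24,
                next_hop: "192.168.1.3".parse().unwrap(), // switch 1
            },
        ];

        // The pool from which the control plane allocates external addresses.
        let pool = IpPool {
            first: "203.0.113.10".parse().unwrap(),
            last: "203.0.113.50".parse().unwrap(),
        };

        println!("oxide default route: {oxide_default:?}");
        println!("customer routes: {customer_routes:?}");
        println!("ip pool: {pool:?}");
    }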

In this topology the customer network has two next hops into the Oxide rack. Bidirectional Forwarding Detection (BFD) is used to achieve highly available connectivity with the rack. If one of the rack switches goes down due to failure or maintenance, traffic on the L2 network can be redirected automatically to and from the other switch.

Basic Network Integration
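A minimal sketch of that failover behavior, assuming hypothetical types rather than the actual implementation: each next hop into the rack carries a BFD session state, and forwarding simply prefers a next hop whose session is still Up.

    // Illustrative sketch of BFD-driven next-hop selection; not the actual
    // implementation on either the customer router or the Oxide switches.

    use std::net::Ipv4Addr;

    /// The BFD session states relevant to forwarding decisions.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum BfdState {
        Up,
        Down,
    }

    /// A candidate next hop into the rack with its BFD session state.
    #[derive(Debug, Clone, Copy)]
    struct NextHop {
        address: Ipv4Addr,
        bfd: BfdState,
    }

    /// Pick a usable next hop: any whose BFD session is still Up.
    fn usable_next_hop(candidates: &[NextHop]) -> Option<Ipv4Addr> {
        candidates
            .iter()
            .find(|nh| nh.bfd == BfdState::Up)
            .map(|nh| nh.address)
    }

    fn main() {
        // Two next hops, one per rack switch (example addresses). The first
        // switch is down for maintenance, so traffic shifts to the second.
        let next_hops = [
            NextHop { address: "192.168.1.2".parse().unwrap(), bfd: BfdState::Down },
            NextHop { address: "192.168.1.3".parse().unwrap(), bfd: BfdState::Up },
        ];
        println!("forwarding via {:?}", usable_next_hop(&next_hops));
    }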

Future product releases will add broader support for dynamic routing protocols such as Border Gateway Protocol (BGP) and for more complex topologies.

Integration with On-premises Domain Name Service

The Oxide rack provides a DNS service that manages a subdomain delegated to it by an upstream, customer-managed DNS server. The diagram below shows a basic delegation setup. In this setup, there are two Oxide DNS servers; this is a choice made for the example and not a constraint.

Domain Name Service
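The delegation itself amounts to a handful of records in the customer-managed zone: NS records pointing the delegated subdomain at the rack's DNS servers, plus address records for those servers. The sketch below models this with hypothetical types and example names; it is not the rack's DNS implementation.

    // Illustrative model of delegating a subdomain to the rack's DNS servers.
    // Names, addresses, and types are examples, not the actual implementation.

    #[derive(Debug)]
    enum Record {
        /// Delegate a zone to a name server.
        Ns { zone: String, name_server: String },
        /// The address at which a name server can be reached.
        A { name: String, address: String },
    }

    fn main() {
        // In the customer-managed zone "example.com", delegate the example
        // subdomain "oxide.example.com" to two DNS servers in the rack.
        let delegation = vec![
            Record::Ns {
                zone: "oxide.example.com".into(),
                name_server: "ns1.oxide.example.com".into(),
            },
            Record::Ns {
                zone: "oxide.example.com".into(),
                name_server: "ns2.oxide.example.com".into(),
            },
            Record::A {
                name: "ns1.oxide.example.com".into(),
                address: "203.0.113.5".into(),
            },
            Record::A {
                name: "ns2.oxide.example.com".into(),
                address: "203.0.113.6".into(),
            },
        ];
        for r in &delegation {
            println!("{r:?}");
        }
    }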

In the initial release of the Oxide rack, only the API and Console endpoints make use of the name service. In future releases, DNS support will be extended to user-created instances, allowing VM instances to be reachable by DNS names auto-generated from their silo, project, and host names.

Physical/Virtual Packet Transformation

The Oxide Packet Transformation Engine (OPTE) sits between virtual machines and physical network interfaces and runs on each of the server sleds. OPTE performs a wide variety of networking functions such as firewalling, routing, NAT, and packet encapsulation and decapsulation. It works from a connection-oriented flow table, rewriting packet source and destination addresses and generating appropriate headers according to the network protocol and boundary context.

With OPTE, traffic flows such as instance-to-instance, outbound NAT, and inbound ephemeral IP are handled through a series of transformation layers on each guest's virtual NIC. Each layer looks up existing state (e.g. routing rules) or allocates new state (e.g. firewall rules), then pushes on a set of transformations (e.g. NAT or Geneve encapsulation) to apply once all of the layers have been processed.
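A minimal sketch of that layered model, assuming hypothetical names rather than OPTE's actual API: each layer consults or allocates flow state (elided here) and pushes transformations, such as a NAT rewrite or Geneve encapsulation, that are applied only after every layer has run.

    // Simplified sketch of per-layer processing with deferred transformations.
    // Types and names are hypothetical; this is not OPTE's actual API.

    /// A minimal stand-in for a packet: just the fields the layers care about.
    #[derive(Debug, Clone)]
    struct Packet {
        src: String,
        dst: String,
        encapsulated: bool,
    }

    /// A transformation to apply after all layers have been consulted.
    enum Transform {
        RewriteSource(String),    // e.g. NAT to an external address
        GeneveEncap { vni: u32 }, // encapsulate onto the underlay
    }

    /// Each layer may look up or allocate state for the flow (elided here)
    /// and push transformations to apply later.
    trait Layer {
        fn process(&self, packet: &Packet, transforms: &mut Vec<Transform>);
    }

    struct NatLayer {
        external_ip: String,
    }
    impl Layer for NatLayer {
        fn process(&self, _packet: &Packet, transforms: &mut Vec<Transform>) {
            transforms.push(Transform::RewriteSource(self.external_ip.clone()));
        }
    }

    struct OverlayLayer {
        vni: u32,
    }
    impl Layer for OverlayLayer {
        fn process(&self, _packet: &Packet, transforms: &mut Vec<Transform>) {
            transforms.push(Transform::GeneveEncap { vni: self.vni });
        }
    }

    /// Run every layer, then apply the accumulated transformations in order.
    fn process_packet(layers: &[&dyn Layer], mut packet: Packet) -> Packet {
        let mut transforms = Vec::new();
        for layer in layers {
            layer.process(&packet, &mut transforms);
        }
        for t in transforms {
            match t {
                Transform::RewriteSource(src) => packet.src = src,
                Transform::GeneveEncap { vni } => {
                    println!("encapsulating with Geneve VNI {vni}");
                    packet.encapsulated = true;
                }
            }
        }
        packet
    }

    fn main() {
        let nat = NatLayer { external_ip: "203.0.113.10".to_string() };
        let overlay = OverlayLayer { vni: 1234 };
        let pkt = Packet {
            src: "172.30.0.5".into(),
            dst: "1.1.1.1".into(),
            encapsulated: false,
        };
        println!("{:?}", process_packet(&[&nat, &overlay], pkt));
    }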
