This document describes how an Oxide rack is connected to a broader network. The following is a high-level checklist of things to consider when planning for the Oxide rack.
Connect between 2 and 4 RJ45 Ethernet cables to the rack for management, with at least one connection for each switch.
The management network requires IPv6.
Decide what type of transceiver to use for the data network from the supported transceivers list.
Set up management and data network firewall rules.
Plan out a set of names and addresses to assign to the rack.
Plan a routing strategy between the rack and the broader network.
This guide will go over the basics of physical network setup along with guidance on how to set up the management and data networks to help inform these considerations.
Physical Setup
The Oxide rack has two middle-of-rack switches.
Each switch has 32 QSFP-compatible ports that can operate at 40 Gbit/s (QSFP), 100 Gbit/s (QSFP28), or 200 Gbit/s (QSFP56). Each switch also has two RJ45 ports, referred to as technician ports, which are 10/100/1000BASE-T capable.
The following table shows supported QSFP transceiver types.
Type | Fiber | Strands |
---|---|---|
40GBASE-LR4 | Single-Mode | 2 (LC) |
100GBASE-CWDM4 | Single-Mode | 2 (LC) |
100GBASE-LR4 | Single-Mode | 2 (LC) |
100GBASE-SR4 | Multi-Mode | 8 (MPO/MTP) |
100GBASE-SR-BiDi | Multi-Mode | 2 (LC) |
100GBASE-FR1 | Single-Mode | 2 (LC) |
200GBASE-FR4 | Single-Mode | 2 (LC) |
Management Network
The management network is how the rack is first set up and is accessible through the technician ports. The management network provides access to:
A terminal UI program called wicket that allows administrators to:
Provide an initial configuration for the rack and bring the system online.
Perform out-of-band system updates.
A support shell that allows Oxide support engineers to troubleshoot low-level system issues.
The details of using these programs are covered in the Initial Rack Setup Guide. This guide will focus on the network aspects of technician ports.
Addressing
Technician ports send out periodic IPv6 SLAAC advertisements at an interval of 30 seconds. Any device plugged into a technician port will receive these advertisements.
When the device plugged into the technician port receives a SLAAC advertisement, it will auto-assign an address on the IPv6 network advertised by the rack’s technician port. For example, if the technician port advertises the prefix fdb1:a840:2504:195::/64, the connected device will assign itself an address in that space, typically based on the MAC address of the interface it’s connected on, following EUI-64 conventions. If the connected interface has a MAC address of 2:8:20:36:5c:8d, the resulting self-assigned IPv6 address on the advertised technician port prefix would be the following.
fdb1:a840:2504:195:8:20ff:fe36:5c8d/64
The technician port assigns the first address in this range to itself. So for the example prefix above, you can reach services provided by the rack over the technician port at the following address.
fdb1:a840:2504:195::1
Each technician port advertises a distinct IPv6 /64 prefix.
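The address derivation described above can be reproduced with Python's standard ipaddress module. This is a minimal sketch that assumes the connected host really does use EUI-64 (many operating systems default to randomized interface identifiers instead); the function names eui64_address and technician_port_address are hypothetical helpers, not part of any Oxide tooling.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive the address a host would self-assign from a /64 prefix and its MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 interface identifiers require a /64 prefix")
    # Accept MACs written without leading zeros, such as 2:8:20:36:5c:8d.
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to form the 64-bit interface ID.
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    return network[int.from_bytes(interface_id, "big")]


def technician_port_address(prefix: str) -> ipaddress.IPv6Address:
    """First address in the advertised prefix, which the technician port assigns to itself."""
    return ipaddress.IPv6Network(prefix)[1]


prefix = "fdb1:a840:2504:195::/64"
print(eui64_address(prefix, "2:8:20:36:5c:8d"))  # fdb1:a840:2504:195:8:20ff:fe36:5c8d
print(technician_port_address(prefix))           # fdb1:a840:2504:195::1
```

If the address a laptop actually picks differs from the computed one, the host is most likely using privacy or stable-random addressing rather than EUI-64; the technician port address at ::1 is unaffected by that choice.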
Firewall Considerations
In order to access the services provided over the management network, SSH port 22 must be accessible. Both the wicket program and the support shell are accessed through TCP port 22.
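A quick way to confirm that no firewall sits between a workstation and the technician port is to attempt a TCP connection to port 22. The sketch below uses only the Python standard library; tcp_port_open is a hypothetical helper name, and the address in the comment is the example technician port address from this guide, not a value to copy.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and handles both IPv4 and IPv6.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call, substituting the address advertised by your technician port:
# tcp_port_open("fdb1:a840:2504:195::1", 22)
```

A True result only shows that the port is reachable at the TCP level; it says nothing about whether an SSH login will succeed.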
Data Network
The data network is accessible through the rack switch QSFP ports. The data network provides connectivity between services and instances running inside the rack and the broader network the rack is a part of. Services running inside the rack include the Oxide API and rack-hosted DNS servers.
In order for the rack to function correctly, it needs access to a few services on the broader network, including the following.
NTP servers
Upstream DNS servers
Thinking at Layer 3
An important concept to highlight is that while we refer to the components in the center of the rack as switches, they are really more like routers. There is no common broadcast domain shared between any of the ports. When thinking about how an Oxide rack will integrate with a broader network, think about the switches as L3 edge routers. Any populated port on the switch will need an egress route assigned to it to forward packets into the broader network. Similarly for ingress traffic, the rack will not respond to ARP or NDP requests for any IP pool addresses it has been assigned. The rack switches must be assigned a gateway address that the broader network can use to direct off-subnet traffic into the rack. The switches will of course respond to ARP and NDP requests for gateway addresses assigned to them.
Initial Setup
Initial communication paths between the broader network and the rack are set up through an initial configuration file. This configuration file is handled by the wicket setup program as described in the Initial Rack Setup Guide. This guide will focus on the networking details in that initial configuration. We’ll go through the configuration section-by-section and then provide a complete overview at the end.
Broader Network Services
The first part of the initial setup config tells the rack how to access the services it needs on the broader network. Here we are telling the rack that it can use 1.1.1.1 and 9.9.9.9 as upstream DNS servers and it can use "ntp.acme.com" as a time source.
In the examples that follow, IPv4 is used. However, IPv6 is also supported. The value provided to ntp_servers may be a DNS name or IP address. For the time being, there is a limit of 3 DNS and NTP servers. The upstream DNS servers must be recursive resolvers and must be specified as IP addresses. In addition to rack infrastructure, end user instances are given these DNS servers via DHCP options.
dns_servers = [
    "1.1.1.1",
    "9.9.9.9",
]
ntp_servers = [
    "ntp.acme.com",
]
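The constraints described for these two settings can be checked before the file is handed to the setup program. The following is a rough sketch, not the validation the rack itself performs; check_upstream_services is a hypothetical helper, and it assumes the limit of 3 applies to each list separately.

```python
import ipaddress

MAX_SERVERS = 3  # assumption: the limit of 3 applies to each list on its own


def check_upstream_services(dns_servers, ntp_servers):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for name, servers in (("dns_servers", dns_servers), ("ntp_servers", ntp_servers)):
        if len(servers) > MAX_SERVERS:
            problems.append(f"{name}: {len(servers)} entries exceeds the limit of {MAX_SERVERS}")
    # DNS servers must be IP addresses; NTP servers may be names or addresses.
    for server in dns_servers:
        try:
            ipaddress.ip_address(server)
        except ValueError:
            problems.append(f"dns_servers: {server!r} is not an IP address")
    return problems


print(check_upstream_services(["1.1.1.1", "9.9.9.9"], ["ntp.acme.com"]))  # []
print(check_upstream_services(["dns.acme.com"], ["ntp.acme.com"]))
```

The check cannot tell whether a DNS server is a recursive resolver; that still has to be confirmed with whoever operates it.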
Assignment of Names and Numbers to the Rack
The DNS names and IP addresses assigned to the rack from the broader network include the following.
A DNS domain with a subdomain for each Silo.
A set of IP addresses for routing between rack switches and the broader network.
A set of IP addresses for rack-hosted DNS servers.
A set of IP addresses for the Oxide API server.
A set of IP addresses for end-user instances.
In this example, the DNS name cloud.acme.com is assigned to the rack. The DNS servers for the broader network infrastructure will need to delegate cloud.acme.com to the IP addresses described below and use glue records to forward DNS requests to the rack-hosted DNS servers.
The IP addresses that will be used by the rack to host DNS servers are set as 172.20.26.1 and 172.20.26.2. These are just example addresses, and we’ll generally use addresses from this subnet for the rest of this example. The only limitation on these addresses is that there must be at least two provided. Once the rack control plane is up, these addresses will respond to DNS queries. Critically, you will be able to resolve the rack recovery address via recovery.sys.cloud.acme.com.
The internal-services IP pool provides the rack with a set of addresses to assign to rack-hosted services such as the Oxide API, DNS, etc. Generally speaking, IP pools are a resource that the rack control plane can dynamically allocate IPs from. In this case, we are defining an IP pool for internal services. IP pools are also used for end-user instances and can be defined using the Oxide API once the rack is initialized. These are addresses from the broader network that are assigned to the rack. It’s recommended to assign at least 16 addresses to the rack for high-availability (HA) setups. A minimal HA setup uses the following addresses:
5 addresses for DNS.
3 addresses for the Oxide API.
2 addresses for boundary NTP daemons.
DNS addresses are specified explicitly in configuration. Other address types are allocated dynamically from the provided IP pool. Oxide API addresses are discoverable via the external DNS servers by querying for records of the form:
<silo_name>.sys.cloud.acme.com
On initialization the rack automatically sets up the recovery silo.
external_dns_zone_name = "cloud.acme.com"
external_dns_ips = [
    "172.20.26.1",
    "172.20.26.2",
]
internal_services_ip_pool_ranges = [
    { first = "172.20.26.1", last = "172.20.26.16" }
]
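The sizing guidance above is simple arithmetic, and a short script makes it easy to re-check when the ranges change. This sketch assumes the first and last values of each pool range are inclusive, as the 16-address example suggests; pool_size, in_pool, and MINIMAL_HA are hypothetical names introduced for illustration.

```python
import ipaddress

# Address counts for a minimal HA setup, taken from the list above.
MINIMAL_HA = {"dns": 5, "oxide_api": 3, "boundary_ntp": 2}


def pool_size(ranges):
    """Count the addresses in a list of {first, last} ranges (assumed inclusive)."""
    total = 0
    for r in ranges:
        first = ipaddress.ip_address(r["first"])
        last = ipaddress.ip_address(r["last"])
        if last < first:
            raise ValueError(f"range ends before it starts: {r}")
        total += int(last) - int(first) + 1
    return total


def in_pool(address, ranges):
    """Return True if the address falls inside any of the ranges."""
    addr = ipaddress.ip_address(address)
    return any(
        ipaddress.ip_address(r["first"]) <= addr <= ipaddress.ip_address(r["last"])
        for r in ranges
    )


ranges = [{"first": "172.20.26.1", "last": "172.20.26.16"}]
external_dns_ips = ["172.20.26.1", "172.20.26.2"]

print(pool_size(ranges))                                    # 16
print(pool_size(ranges) >= sum(MINIMAL_HA.values()))        # True, since 16 >= 10
print(len(external_dns_ips) >= 2)                           # True
print(all(in_pool(ip, ranges) for ip in external_dns_ips))  # True
```

The last check only reports whether the DNS addresses fall inside the internal services pool, as they do in this example.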
Rack Switch Configuration with Static Routes
The final bit of initial rack configuration relevant to networking is switch configuration. This configuration sets up the routes and addresses on the rack switches that are needed for services and instances within the rack to communicate with the broader network.
The addresses infra_ip_first and infra_ip_last at the beginning of the configuration define a range of addresses that may be assigned to rack switches. The range is inclusive, meaning the first and last addresses are part of the range. Each address may be assigned only once. An attempt to assign the same address to multiple switches or to multiple ports on the same switch will result in an error. This constraint may be relaxed in a later release when anycast addresses become supported.
Next, an uplink port is configured for each rack switch. In this example one uplink is configured per switch. However, there is no limit to the number of uplinks that may be configured here.
Each uplink configuration includes the following.
gateway_ip: the address of the upstream router that will provide off-subnet communications for the rack on this uplink.
port: specifies which switch port this configuration applies to. The ports on the switch are physically labeled with a number. In this configuration that number is prefixed with "qsfp".
uplink_port_speed: the speed of the transceiver module plugged into the QSFP port.
uplink_port_fec: the forward error correction mode to be used for the port. This can currently be rs for Reed-Solomon or none.
uplink_cidr: the IP and subnet mask in CIDR format to assign to this port. This address must be pulled from the infra_ip address range.
switch: which rack switch this configuration applies to, may be either switch0 or switch1.
[rack_network_config]
infra_ip_first = "172.20.15.21"
infra_ip_last = "172.20.15.22"
[[rack_network_config.ports]]
routes = [{nexthop = "172.20.15.17", destination = "0.0.0.0/0"}]
addresses = ["172.20.15.21/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "rs"
bgp_peers = []
switch = "switch0"
[[rack_network_config.ports]]
routes = [{nexthop = "172.20.15.17", destination = "0.0.0.0/0"}]
addresses = ["172.20.15.22/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "none"
bgp_peers = []
switch = "switch1"
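The addressing rules for switch ports lend themselves to a quick consistency check. The sketch below verifies that every port address lies inside the inclusive infra_ip range and is used only once. It also checks that each static next hop is on a subnet directly connected to the port, which is an assumption based on general routing practice and not a rule stated in this guide. check_ports is a hypothetical helper, and the dictionaries mirror the configuration above.

```python
import ipaddress


def check_ports(infra_ip_first, infra_ip_last, ports):
    """Return a list of problems found in the switch port configuration."""
    first = ipaddress.ip_address(infra_ip_first)
    last = ipaddress.ip_address(infra_ip_last)
    problems = []
    seen = set()
    for p in ports:
        label = f'{p["switch"]}/{p["port"]}'
        networks = []
        for cidr in p["addresses"]:
            iface = ipaddress.ip_interface(cidr)
            networks.append(iface.network)
            if not first <= iface.ip <= last:
                problems.append(f"{label}: {iface.ip} is outside the infra_ip range")
            if iface.ip in seen:
                problems.append(f"{label}: {iface.ip} is assigned more than once")
            seen.add(iface.ip)
        for route in p["routes"]:
            nexthop = ipaddress.ip_address(route["nexthop"])
            # Assumption: a usable next hop sits on a subnet configured on this port.
            if not any(nexthop in net for net in networks):
                problems.append(f"{label}: next hop {nexthop} is not on a connected subnet")
    return problems


default_route = [{"nexthop": "172.20.15.17", "destination": "0.0.0.0/0"}]
ports = [
    {"switch": "switch0", "port": "qsfp0", "addresses": ["172.20.15.21/29"], "routes": default_route},
    {"switch": "switch1", "port": "qsfp0", "addresses": ["172.20.15.22/29"], "routes": default_route},
]
print(check_ports("172.20.15.21", "172.20.15.22", ports))  # []
```

In this example both port addresses fall in 172.20.15.16/29, which also contains the next hop 172.20.15.17, so no problems are reported.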
Complete Configuration
The following is all of the above configuration in one place.
#
# Broader network services
#
dns_servers = [
    "1.1.1.1",
    "9.9.9.9",
]
ntp_servers = [
    "ntp.acme.com",
]
#
# Assign names and numbers to the rack
#
external_dns_zone_name = "cloud.acme.com"
external_dns_ips = [
    "172.20.26.1",
    "172.20.26.2",
]
internal_services_ip_pool_ranges = [
    { first = "172.20.26.1", last = "172.20.26.16" }
]
#
# Configure rack switches
#
[rack_network_config]
infra_ip_first = "172.20.15.21"
infra_ip_last = "172.20.15.22"
bgp = []
[[rack_network_config.ports]]
routes = [{nexthop = "172.20.15.17", destination = "0.0.0.0/0"}]
addresses = ["172.20.15.21/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "rs"
bgp_peers = []
switch = "switch0"
[[rack_network_config.ports]]
routes = [{nexthop = "172.20.15.17", destination = "0.0.0.0/0"}]
addresses = ["172.20.15.22/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "none"
bgp_peers = []
switch = "switch1"
Rack Switch Configuration with BGP
Setting up BGP as a part of rack setup requires supplying two types of information.
A set of BGP router configurations must be specified as a part of the rack_network_config.
Each port that peering will take place over must have a BGP peer config for each neighbor.
The BGP router config below configures a router with an autonomous system number of 47. This router will announce the prefix 172.20.26.0/24 to any peers it establishes BGP sessions with.
[[rack_network_config.bgp]]
asn = 47
originate = [ "172.20.26.0/24" ]
The port configurations that follow are a direct translation from the previous static routing configurations to BGP. Here the routes field is empty and the bgp_peers field is filled in. Because each rack switch can have multiple BGP routers running on different ASNs, peers must specify which ASN they are in. Each peer configuration also specifies the address of the neighbor it is expecting to peer with.
The port field of BGP peers is redundant with the port field of rack_network_config.ports and will be removed in a future release.
[[rack_network_config.ports]]
routes = []
addresses = ["172.20.15.21/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "rs"
bgp_peers = [{asn = 47, addr = "172.20.15.17", port = "qsfp0"}]
switch = "switch0"
[[rack_network_config.ports]]
routes = []
addresses = ["172.20.15.22/29"]
port = "qsfp0"
uplink_port_speed = "100G"
uplink_port_fec = "none"
bgp_peers = [{asn = 47, addr = "172.20.15.17", port = "qsfp0"}]
switch = "switch1"
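The translation from a static port configuration to a BGP one is mechanical enough to sketch in a few lines. The example below assumes, as in this guide's example, that the BGP neighbor is the same router that served as the static next hop; static_to_bgp is a hypothetical helper and the dictionary mirrors a rack_network_config.ports entry.

```python
import copy


def static_to_bgp(port_config, asn):
    """Return a copy of a port config with static next hops replaced by BGP peers."""
    bgp_config = copy.deepcopy(port_config)
    # Collect each distinct next hop once, preserving order.
    nexthops = []
    for route in port_config["routes"]:
        if route["nexthop"] not in nexthops:
            nexthops.append(route["nexthop"])
    bgp_config["routes"] = []
    bgp_config["bgp_peers"] = [
        {"asn": asn, "addr": nexthop, "port": port_config["port"]}
        for nexthop in nexthops
    ]
    return bgp_config


static_port = {
    "routes": [{"nexthop": "172.20.15.17", "destination": "0.0.0.0/0"}],
    "addresses": ["172.20.15.21/29"],
    "port": "qsfp0",
    "uplink_port_speed": "100G",
    "uplink_port_fec": "rs",
    "bgp_peers": [],
    "switch": "switch0",
}
print(static_to_bgp(static_port, asn=47)["bgp_peers"])
```

The asn argument is the ASN of the rack-side BGP router configuration the peer belongs to, 47 in this example. If the upstream BGP speaker is a different device from the static gateway, the peer address has to be supplied by hand instead.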
Beyond Initial Setup
This guide has primarily focused on network considerations for getting the rack up and running. Once the rack is set up, there are additional considerations for transiting traffic to and from VM instances. The Oxide API provides a set of endpoints for managing IP pools. These IP pools are the same basic abstraction as the internal services IP pool covered above. The only difference is the IP pools that are managed through the Oxide API are used to hand out IP addresses to VM instances. The addresses in these pools need to be routed to the rack, and the rack needs to have egress routes set up pointing at appropriate gateways for the address space covered by the IP pool.
There are no restrictions on what IP ranges can be used in IP pools.
The configuration provided during initial rack setup may be changed later through the Oxide API once the rack is up and running.
Firewall Considerations
The following ports are used by the rack and should be made available on the broader network segment the rack is a part of. The direction "in" identifies traffic to the rack from the broader network, "out" identifies traffic from the rack to the broader network, and "both" indicates bidirectional traffic.
Port | Protocol | Direction | Usage |
---|---|---|---|
443 | TCP / HTTPS | in | Oxide rack API |
53 | UDP / DNS | both | Name resolution for rack services (out). Rack provided name resolution (in). |
123 | UDP / NTP | both | Network time protocol (NTP) message exchange. |
179 | TCP / BGP | both | Border gateway protocol (BGP) peering and prefix exchange between the rack and broader network routers. |
4784 | UDP / BFD | both | Bidirectional forwarding detection (BFD) messaging. The Oxide platform uses BFD for Multihop Paths as described in RFC 5883. |
22 | TCP / SSH | in | SSH access to instances. Not strictly required for rack functionality but likely needed by end users. |