v20

Important notes

The Oxide CLI, Go SDK, and Terraform Provider have been updated for API enhancements described under New features. Please be sure to upgrade.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.

Upgrade from version 19, 19.1, 19.2, 19.3, or 19.4 is supported. We recommend shutting down all running instances on the rack before the software update begins. Any instances that are not stopped before the update are transitioned to the failed state when the control plane comes back up. These instances start automatically if they are configured with an auto-restart policy; otherwise, the user can start them manually.

All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

New features

Jumbo frame support for external networking

Instance external networking now supports the use of jumbo frames, which raise the instance’s primary interface MTU from 1500 to 8500 bytes. The higher MTU can substantially improve throughput for flows that send large messages, provided that the network upstream of the rack has end-to-end jumbo frame support.

Please refer to the Jumbo Frames guide for details about jumbo frame settings and workload configuration recommendations.
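
As a rough illustration of why the larger MTU helps, the sketch below counts how many packets are needed to carry the same payload at each MTU. It assumes IPv4 and TCP headers without options (40 bytes per packet) and ignores Ethernet framing; the function names are illustrative and not part of any Oxide tool.

```python
import math

# Assumption: IPv4 (20 bytes) + TCP (20 bytes) headers, no options.
IP_TCP_HEADERS = 40


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry payload_bytes at the given MTU."""
    mss = mtu - IP_TCP_HEADERS  # payload bytes carried per packet
    return math.ceil(payload_bytes / mss)


def payload_fraction(mtu: int) -> float:
    """Fraction of each full-size packet that is application payload."""
    return (mtu - IP_TCP_HEADERS) / mtu


if __name__ == "__main__":
    one_gib = 1 << 30
    for mtu in (1500, 8500):
        print(f"MTU {mtu}: {packets_needed(one_gib, mtu)} packets, "
              f"{payload_fraction(mtu):.2%} payload per packet")
```

Fewer packets for the same payload means less per-packet processing in the guest and along the network path, which is one reason larger frames can improve throughput. Actual results depend on the workload and on the upstream network.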

Network performance improvements

Besides jumbo frame support, this release includes other improvements that enable higher intra- and inter-VPC network throughput between VM instances:

viona IPv6 TCP segmentation offload support (illumos#18086)
Exclude viona worker LWPs from core-pinning in large vCPU instances (propolis#1146)

System update validation

The update system now checks for abnormal conditions before and after an update and prompts the operator to contact Oxide Support if needed (omicron#10271). See the System Update guide for more information.

Tunable for telemetry retention period

System telemetry stored in the ClickHouse database is retained for 30 days. The retention period is now adjustable through Oxide Support and will be configurable by the fleet admin in a future release. (omicron#10366)

Storage lifecycle management improvements

Local disk cleanup on expunged disks/sleds: Local disks backed by expunged physical disks or sleds could sometimes become undeletable if the sled or disk in question no longer responds to volume backend termination. Cleanup now accounts for expungement, so these disks can be deleted. (omicron#10222, omicron#10257)
Expunged disk re-adoption: Prior to this release, expunged SSDs were assumed to be unrecoverable and could only be replaced with new disks. This limitation has been removed: physical disks can now be re-adopted if they have been repaired in software (e.g., by a firmware update). See the updated Hardware Maintenance guide and omicron#10221 for more information.

Web console

The console changes in this release are all about UX polish. A few examples: new users didn’t know where to edit firewall rules for a given instance; now the instance networking page links to the firewall rules page under the relevant VPC. We also added support for ICMPv6 filters in firewall rules. Comboboxes are now much faster when there are many items. See the full list of changes below.

Full console changelog

Support ICMPv6 filters in firewall rules (console#3212)
View IP pool details in a sidebar (console#3158)
Show server errors inline in modals instead of toasts, including image upload errors (console#3192, console#3227)
Combobox improvements: fix value editing, virtualize long lists, remove dropdown dead space (console#3217, console#3221, console#3186)
Clearer copy in all confirmation modals (console#3205)
Show "contact support" message on update status page when the API says to (console#3226, console#3238)
Link directly to firewall rules from the instance networking tab (console#3216)
Improve subnet pool member and IP range validation (console#3188)
Many small UI fixes (console#3210, console#3204, console#3200, console#3187, console#3181, console#3215)

Other bug fixes and enhancements

Compress external API responses by up to 90% with gzip (omicron#10341, dropshot#1448)
Support bundles include reason_for_creation (omicron#10240)
Add OxQL metrics for ZFS pool and dataset usage (omicron#10453)
Work around slow thread renaming in ClickHouse to reduce CPU usage (omicron#10431)
Remove oximeter schema cache to support database endpoint mobility (omicron#10292)
Prevent concurrent instance-start operations from making multiple sled reservations (omicron#10479)
Mark running instances failed before starting a planned sled update to hasten instance restart on other available sleds (omicron#10334)
Allow instances to transition to stopped state when one switch is down (omicron#10389)
Nexus startup is independent of IP allowlist plumbing (omicron#10305)
Improve NTP time sync checks to avoid starting CockroachDB prematurely (omicron#7668)
Ensure firewall rule update validation happens before DB write (omicron#10563)
BFD status no longer returns 404 when only one switch is available (omicron#9979)
Ensure stable DUID values are used in DHCPv6 exchanges (dendrite#267)
Fix Intel CPUID leaf 4 cache topology for SMT (propolis#1002)
Add amd_turin_v2 CPU platform for glibc 2.34-2.36 compatibility (omicron#10560)
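
To get a feel for the gzip response compression listed above, the following sketch compresses a made-up JSON list response and reports the space saved. The payload shape and field names are hypothetical, and the ratio for real API responses will vary with their content.

```python
import gzip
import json


def gzip_savings(body: bytes) -> float:
    """Return the fraction of bytes saved by gzip-compressing body."""
    if not body:
        raise ValueError("body must not be empty")
    compressed = gzip.compress(body)
    return 1 - len(compressed) / len(body)


def sample_body(count: int = 500) -> bytes:
    """Build a hypothetical list response; field names are made up."""
    items = [
        {"id": i, "name": f"instance-{i}", "state": "running", "project": "demo"}
        for i in range(count)
    ]
    return json.dumps({"items": items}).encode()


if __name__ == "__main__":
    body = sample_body()
    print(f"{len(body)} bytes uncompressed, {gzip_savings(body):.0%} saved by gzip")
```

In HTTP, clients typically signal gzip support with an `Accept-Encoding: gzip` request header, and many HTTP libraries send it and decompress responses automatically.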

Firmware update

AMD Turin microcode update to B002162/B101059
AMD Milan microcode update to 1.0.0.F with GN-PMFW version 45.104.100
Storage firmware updates for Western Digital SN861 series NVMe drives (not applicable to all hardware configurations)
Helios update from v2.0 to v3.0 (helios#244)

Patches

20.1

Improve RoT-SP attestation timeout handling (hubris#2561)
Wait for DDR training to complete before starting temperature polling (hubris#2560)
Fix router solicitation parsing issue (maghemite#797)

20.2

Refine thermal control loop to allow operating at the edge of threshold (hubris#2585)
Update AMD Milan firmware to 1.0.0.F with CPU performance regression fix (helios#262)

Known behavior and limitations

End-user features

Feature Area	Known Issue/Limitation	Issue Number
Disk/image management	Disks in `importing_from_bulk_writes` state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround.	omicron#2987
Disk/image management	Disk rejected by guest OS due to duplicate NVMe device names. The issue is caused by a 20-character limit in applying the disk name to the device serial number. See the Troubleshooting guide for more information.	-
Disk/image management	The ability to modify image metadata is not available at this time.	omicron#2800
Instance orchestration	Unable to start an instance that has a disk replica on a sled being updated.	crucible#1690
Instance orchestration	Instance start API frequently times out when attached to local disks.	omicron#9953
Instance orchestration	New instances cannot be created when the total number of NAT entries (private-to-external IP mappings) in the system exceeds 1024.	omicron#6939
Instance performance	The `tsc` clocksource is treated as unreliable by the guest, which then falls back to substantially slower timestamp syscalls. A workaround for this issue can be found in the Troubleshooting guide.	omicron#8001
Instance performance	Linux guests are unable to capture hardware events using `perf record`. A workaround for this issue can be found in the Troubleshooting guide.	propolis#894
VPC internet gateway	Changing a silo’s default IP pool causes some instances to lose their outbound internet access. This is due to a mismatch between the pool containing the instances' external IPs (which are allocated from the new default pool) and the pool attached to the system-created internet gateways (which are linked to the old pool at creation time). Please see the Troubleshooting guide for options for restoring instance outbound connectivity.	omicron#7297
VPC routing	Subnet update clears custom router ID when the field is left out of the request body.	omicron#6406
VPC routing	Network interface update clears transit IPs when the field is left out of the request body.	-
Telemetry	VM instance memory utilization and VPC network/firewall metrics are unavailable at this time.	-
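
The duplicate NVMe device name issue in the table above can be checked for ahead of time. The sketch below groups disk names that share their first 20 characters, assuming the serial number is derived by truncating the disk name at that limit. The truncation rule and the helper name are assumptions, so consult the Troubleshooting guide for the authoritative behavior.

```python
from collections import defaultdict

# 20-character limit cited in the known issue above.
SERIAL_LIMIT = 20


def colliding_disk_names(names, limit=SERIAL_LIMIT):
    """Group disk names that share the same first `limit` characters.

    Assumption: the device serial number is the disk name truncated to
    `limit` characters, so names in the same group would look identical
    to the guest.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    # Keep only prefixes shared by more than one disk name.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    disks = ["postgres-primary-data-01", "postgres-primary-data-02", "boot"]
    for prefix, group in colliding_disk_names(disks).items():
        print(f"{prefix!r} is shared by: {', '.join(group)}")
```

Keeping the distinguishing part of a disk name within the first 20 characters (for example, `data-01-postgres-primary`) avoids collisions under this assumption.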

Operator features

Feature Area	Known Issue/Limitation	Issue Number
Silo management	The ability to modify silo and IDP metadata is not available at this time.	omicron#3400, omicron#3125
System management	Real-time availability status for sleds and physical storage is not yet shown in the inventory API or UI.	omicron#2035
System management	Operator-driven instance migration across sleds is currently unavailable.	-
System management	Some running instances transitioned to the "stopped" state after update.	omicron#9177
