Important notes

  1. The Oxide CLI, Go SDK, and Terraform Provider have been updated for API enhancements described under New features. Please be sure to upgrade.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.

Upgrade compatibility

Upgrades from versions 19, 19.1, 19.2, 19.3, and 19.4 are supported. We recommend shutting down all running instances on the rack before the software update commences. Any instances that aren’t stopped before the update transition to the failed state when the control plane comes up. These instances can be configured to start automatically through an auto-restart policy, or users can start them manually.

All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

New features

Jumbo frame support for external networking

Instance external networking now supports the use of jumbo frames, which raise the instance’s primary interface MTU from 1500 to 8500 bytes. The higher MTU can substantially improve throughput for flows that send large messages, provided that the network upstream of the rack has end-to-end jumbo frame support.

Please refer to the Jumbo Frames guide for details about jumbo frame settings and workload configuration recommendations.
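
As a rough illustration of why the larger MTU helps with large messages, the sketch below counts how many TCP segments a 1 MiB message needs at each MTU. It assumes IPv4 and TCP headers of 20 bytes each with no options, and it ignores Ethernet framing and any encapsulation overhead; the function names are illustrative only.

```python
import math

# Assumed fixed header sizes (no IP or TCP options).
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def tcp_payload_per_packet(mtu: int) -> int:
    """Largest TCP payload that fits in one packet at the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def packets_needed(message_bytes: int, mtu: int) -> int:
    """Number of full-size packets required to carry the message."""
    return math.ceil(message_bytes / tcp_payload_per_packet(mtu))


message = 1024 * 1024  # 1 MiB
for mtu in (1500, 8500):
    print(mtu, tcp_payload_per_packet(mtu), packets_needed(message, mtu))
# 1500 1460 719
# 8500 8460 124
```

Fewer packets per message means less per-packet processing on both ends, which is where most of the throughput gain is expected to come from.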

Network performance improvements

Besides jumbo frame support, this release includes other improvements that enable higher intra- and inter-VPC network throughput between VM instances:

  • viona IPv6 TCP segmentation offload support (illumos#18086)

  • Exclude viona worker LWPs from core-pinning in large vCPU instances (propolis#1146)

System update validation

The update system now checks for abnormal conditions before and after an update and prompts the operator to contact Oxide Support if needed (omicron#10271). See the System Update guide for more information.

Tunable for telemetry retention period

System telemetry stored in the ClickHouse database is retained for 30 days. The retention period is now adjustable through Oxide Support and will be configurable by the fleet admin in a future release. (omicron#10366)

Storage lifecycle management improvements

  • Local disk cleanup on expunged disks/sleds: Local disks backed by expunged physical disks or sleds could sometimes become undeletable if the sled or disk in question no longer responds to volume backend termination. Cleanup now accounts for expungement, so these disks can be deleted. (omicron#10222, omicron#10257)

  • Expunged disk re-adoption: Prior to this release, expunged SSDs were assumed to be unrecoverable and could only be replaced with new disks. This limitation has been removed so that physical disks can be re-adopted if they have been repaired in software (e.g., through a firmware update). More information can be found in the updated Hardware Maintenance guide and omicron#10221.

Web console

The console changes in this release are all about UX polish. A few examples: new users didn’t know where to edit firewall rules for a given instance; now the instance networking page links to the firewall rules page under the relevant VPC. We also added support for ICMPv6 filters in firewall rules. Comboboxes are now much faster when there are many items. See the full list of changes below.

Full console changelog

Other bug fixes and enhancements

  • Compress external API responses by up to 90% with gzip (omicron#10341, dropshot#1448)

  • Support bundles include reason_for_creation (omicron#10240)

  • Add OxQL metrics for ZFS pool and dataset usage (omicron#10453)

  • Work around slow thread renaming in ClickHouse to reduce CPU usage (omicron#10431)

  • Remove oximeter schema cache to support database endpoint mobility (omicron#10292)

  • Prevent concurrent instance-start operations from making multiple sled reservations (omicron#10479)

  • Mark running instances failed before starting a planned sled update to hasten instance restart on other available sleds (omicron#10334)

  • Allow instances to transition to stopped state when one switch is down (omicron#10389)

  • Nexus startup is independent of IP allowlist plumbing (omicron#10305)

  • Improve NTP time sync checks to avoid starting CockroachDB prematurely (omicron#7668)

  • Ensure firewall rule update validation happens before DB write (omicron#10563)

  • BFD status no longer returns 404 when only one switch is available (omicron#9979)

  • Ensure stable DUID values are used in DHCPv6 exchanges (dendrite#267)

  • Fix Intel CPUID leaf 4 cache topology for SMT (propolis#1002)

  • Add amd_turin_v2 CPU platform for glibc 2.34-2.36 compatibility (omicron#10560)
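
For the gzip item in the list above, the following sketch shows how the size reduction for a JSON response body could be estimated locally. The payload shape and field names are hypothetical and are not taken from the Oxide API schema; the actual reduction depends on the data being returned.

```python
import gzip
import json


def gzip_savings(payload: bytes) -> float:
    """Fraction of bytes saved by gzip-compressing the payload."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Hypothetical list response: many objects that repeat the same keys.
items = [
    {"id": i, "name": f"instance-{i}", "run_state": "running", "project": "demo"}
    for i in range(500)
]
body = json.dumps({"items": items}).encode()

print(f"original: {len(body)} bytes, saved: {gzip_savings(body):.0%}")
```

List responses with many similarly shaped objects tend to compress well because the keys and common values repeat; small or already-compressed payloads benefit much less.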

Firmware update

  • AMD Turin microcode update to B002162/B101059

  • Storage firmware updates for Western Digital SN861 series NVMe drives (not applicable to all hardware configurations)

  • Helios update from v2.0 to v3.0 (helios#244)

Known behavior and limitations

End-user features

| Feature Area | Known Issue/Limitation | Issue Number |
| Disk/image management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround. | |
| Disk/image management | Disks are rejected by the guest OS due to duplicate NVMe device names. The issue is caused by a 20-character limit when the disk name is applied to the device serial number. See the Troubleshooting guide for more information. | - |
| Disk/image management | The ability to modify image metadata is not available at this time. | |
| Instance orchestration | Unable to start an instance that has a disk replica on a sled being updated. | |
| Instance orchestration | Instance start API frequently times out when the instance has local disks attached. | |
| Instance orchestration | New instances cannot be created when the total number of NAT entries (private-to-external IP mappings) in the system exceeds 1024. | |
| Instance performance | The tsc clocksource is treated as unreliable by the guest, causing it to fall back to substantially slower timestamp syscalls. A workaround for this issue can be found in the Troubleshooting guide. | |
| Instance performance | Linux guests are unable to capture hardware events using perf record. A workaround for this issue can be found in the Troubleshooting guide. | |
| VPC internet gateway | Changing a silo’s default IP pool causes some instances to lose their outbound internet access. This is due to a mismatch between the pool containing the instances' external IPs (which are allocated from the new default pool) and the pool attached to the system-created internet gateways (which are linked to the old pool at creation time). Please see the Troubleshooting guide for possible options for restoring instance outbound connectivity. | |
| VPC routing | Subnet update clears custom router ID when the field is left out of the request body. | |
| VPC routing | Network interface update clears transit IPs when the field is left out of the request body. | - |
| Telemetry | VM instance memory utilization and VPC network/firewall metrics are unavailable at this time. | - |
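
For the duplicate NVMe device name issue listed above, a quick check along the following lines could flag disk names that are likely to collide. It assumes the serial number is derived from the first 20 characters of the disk name; the source states only that a 20-character limit applies, so treat the truncation rule and the helper name as assumptions.

```python
from collections import defaultdict

# Character limit stated in the known issue; truncation to a prefix is assumed.
SERIAL_LIMIT = 20


def find_serial_collisions(disk_names):
    """Group disk names that share the same first SERIAL_LIMIT characters."""
    groups = defaultdict(list)
    for name in disk_names:
        groups[name[:SERIAL_LIMIT]].append(name)
    # Keep only prefixes shared by more than one disk.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


names = ["database-volume-primary-01", "database-volume-primary-02", "boot"]
print(find_serial_collisions(names))
```

Placing the distinguishing part of a disk name within the first 20 characters avoids the collision under this assumption.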

Operator features

| Feature Area | Known Issue/Limitation | Issue Number |
| Silo management | The ability to modify silo and IDP metadata is not available at this time. | omicron#3400, omicron#3125 |
| System management | Real-time availability status for sleds and physical storage is not yet shown in the inventory API or UI. | omicron#2035 |
| System management | Operator-driven instance migration across sleds is currently unavailable. | - |
| System management | Some running instances transitioned to the "stopped" state after update. | omicron#9177 |