Important Notes

  1. The Oxide CLI and Rust SDK have been updated to support all the new features such as boot disk designation, instance update, and VPC internet gateway. Please be sure to upgrade to the latest versions.

  2. The Go SDK and Oxide Terraform Provider have also been updated to support boot disk designation. They will be enhanced to manage internet gateway objects in a future release.

  3. In this release, the control plane is enhanced to attempt restarting instances that have gone into the failed state. This new feature allows previously running instances to come back up after software updates. See New Features for more information.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 10 is supported. We recommend shutting down all running instances on the rack before the software update commences. Any instances that are not stopped before the update are transitioned to the failed state when the control plane comes up. They will then be automatically restarted in a controlled manner. Failed instances may also be started or stopped manually while they are waiting to be restarted.

All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

New Features

Improved Failed Instance Handling

The Oxide control plane has been enhanced to allow failed instances to be auto-restarted or started/stopped manually. Prior to this release, instances marked failed because of planned or unexpected sled reboots or hypervisor process failures could only be revived by deleting and recreating them. The handling of instance state transitions has also been improved to prevent timeout-related failures (e.g., omicron#5235).

Auto-restart policy is configurable on a per-instance basis using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). If no policy is set, instances will automatically restart by default.
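The default described above can be modeled as a small predicate. This is a minimal sketch, not the control plane's implementation: the function name and the policy labels "never" and "best_effort" are assumptions made for illustration and should be checked against the API reference.

```python
from typing import Optional

# Hypothetical policy labels used only for this illustration.
NEVER = "never"
BEST_EFFORT = "best_effort"


def will_auto_restart(state: str, policy: Optional[str]) -> bool:
    """Return True if a restart would be attempted under the rules in these notes."""
    # Only instances in the failed state are candidates for auto-restart.
    if state != "failed":
        return False
    # With no policy set, the notes say instances restart by default.
    return policy is None or policy == BEST_EFFORT
```

For example, will_auto_restart("failed", None) is true under these assumptions, while an instance whose policy is "never" stays failed until it is started manually.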

Boot Disk

Users can now designate one of an instance’s disks as its boot disk. The boot disk can be specified at instance create time or changed later using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). In a future release, this endpoint will also be used to change instance CPU and memory size. Instance updates are allowed only when the instance is stopped.
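As a rough illustration of the update call, the sketch below builds (but does not send) a PUT request for /v1/instances/{instance}. The body field names boot_disk and auto_restart_policy, the project query parameter, and the host name are assumptions not stated in these notes; authentication headers are omitted. The sketch always sends both fields rather than relying on how omitted fields are treated.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_instance_update(base_url: str, instance: str, project: str,
                          boot_disk: Optional[str],
                          auto_restart_policy: Optional[str],
                          instance_state: str) -> Request:
    """Build a PUT request for the instance update endpoint without sending it."""
    # The notes state that updates are allowed only when the instance is stopped.
    if instance_state != "stopped":
        raise ValueError("instance must be stopped before it can be updated")
    # Assumed field names; both are sent explicitly on every call.
    body = {"boot_disk": boot_disk, "auto_restart_policy": auto_restart_policy}
    url = f"{base_url}/v1/instances/{quote(instance)}?project={quote(project)}"
    return Request(url, data=json.dumps(body).encode("utf-8"), method="PUT",
                   headers={"Content-Type": "application/json"})
```

The returned object can be passed to urllib.request.urlopen once an authorization header has been added; checking the instance state locally is only a convenience, since the API enforces the same rule.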

Prior to this release, when an instance was provisioned, the first disk in the request body was implicitly treated as the boot disk. The firmware also attempted to boot from other disks in an unpredictable fashion if it could not find the bootloader on the first disk, sometimes resulting in unusable instances as described in this terraform provider issue.

[Figure: Boot disk and other disks]

Telemetry

Prior to v10, the rack telemetry data was limited to rack-level network metrics, resource utilization, and a subset of disk and instance data. In v10 and v11, we have further expanded the metrics coverage to include:

The metrics data can be consumed using Oxide’s Oximeter Query Language, "OxQL", which is described in RFD 463. Please note that the query language is experimental and its syntax may change in future releases. Details about available timeseries will be added to this site soon.

Internet Gateways

Internet gateways support the routing of VPC traffic to networks outside of the Oxide rack. They can be used to ensure that instances only use certain external IP addresses when sending traffic to a given network. Prior to this release, only the system-defined internet gateway was available for routing external traffic using the default IP pool of each silo. Starting from v11, project users can optionally set up internet gateways in their VPCs against other IP pools. These gateways can be applied as routing targets to customize instance outbound traffic based on the packet destinations. You can find an example of such granular routing setup and more details about internet gateways in the Networking guide.

Please note that the management of internet gateways is available through API only at this time. The web console, Go SDK, and Terraform provider will provide the same support in the next release.
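The destination-based selection described above can be pictured with a small model. This is a sketch under stated assumptions, not the rack's routing code: the gateway names and prefixes are hypothetical, and it assumes the most specific matching route wins.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical route table: destination prefix -> internet gateway name.
ROUTES = {
    "0.0.0.0/0": "default",          # system-defined gateway, default IP pool
    "203.0.113.0/24": "partner-gw",  # hypothetical gateway tied to another IP pool
}


def select_gateway(destination: str, routes: Dict[str, str]) -> Optional[str]:
    """Pick the gateway whose prefix most specifically matches the destination."""
    addr = ipaddress.ip_address(destination)
    best_net = None
    best_gateway = None
    for prefix, gateway in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_gateway = net, gateway
    return best_gateway
```

With this table, traffic to 203.0.113.7 would leave through the hypothetical partner-gw gateway (and so use an address from its pool), while all other destinations fall through to the default gateway.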

Rack Reconfigurator

The reconfigurator module provides the foundation for Oxide control plane service configuration changes during hardware and software maintenance. It is accessible by Oxide technicians only at this time. It will be made available to rack operators in the form of component update/replacement capabilities in a future release.

  • Enable replacement of external DNS instances

  • Support horizontal scaling and replacement of oximeter (metrics collector) instances

Web console changes

Bug fixes and minor enhancements

  • Fix 500s due to DB memory limit when creating a large number of instances concurrently (omicron#5904)

  • Instances were stuck in running state after the backend propolis servers panicked (omicron#5705)

  • Administratively deleting a BGP peer did not result in the routes learned from the peer being deleted (maghemite#349)

  • List hardware switch API returned an empty result set (omicron#6597)

  • BGP announce-set and config delete APIs returned HTTP 500 errors (omicron#6471, omicron#6619)

  • BGP announce-set list had a redundant name_or_id query parameter (omicron#6467)

  • Better tx_eq defaults for different transceiver types (dendrite#1020)

  • Improve handling of RIB priorities for Static and BGP protocols (maghemite#359)

  • SAML authentication signed requests should respond with an appropriate name id format (omicron#5604)

  • Missing or modified RelayState should be handled during SAML authentication (omicron#5607)

  • Storage performance improvements

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Image/snapshot management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround. |
Image/snapshot management | Image upload sometimes stalls with HTTP/2 on Firefox. |
Image/snapshot management | The ability to modify image metadata is not available at this time. |
Instance orchestration | Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances and re-attached to a new instance. The invalid instance may be deleted at that time. |
Instance performance | The tsc clocksource is treated as unreliable by the guest, causing it to fall back to substantially slower timestamp syscalls. A workaround for this issue can be found in the Troubleshooting Guide. |
VPC routing | Subnet update clears the custom router ID when the field is left out of the request body. |
VPC routing | Network interface update clears transit IPs when the field is left out of the request body. | -
Telemetry | VM instance memory utilization and VPC network/firewall metrics are unavailable at this time. | -
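For the two VPC routing limitations above, one way to avoid clearing a field unintentionally is to copy its current value into the update body before sending it. The helper below is a hypothetical sketch: the field names custom_router and transit_ips are assumptions, and it presumes the update accepts the current value back unchanged.

```python
from typing import Any, Dict, Iterable


def preserve_omitted_fields(current: Dict[str, Any], changes: Dict[str, Any],
                            fields: Iterable[str]) -> Dict[str, Any]:
    """Return an update body that carries over listed fields the caller left out."""
    body = dict(changes)  # copy so the caller's dict is not modified
    for name in fields:
        # Only fill in fields that exist on the current resource and were omitted.
        if name not in body and name in current:
            body[name] = current[name]
    return body
```

A caller would fetch the current subnet or network interface, pass it as current together with the intended changes, and send the returned body in the update request.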

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | omicron#2035
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587