Important Notes
The Oxide CLI and Rust SDK have been updated to support all the new features such as boot disk designation, instance update, and VPC internet gateway. Please be sure to upgrade to the latest versions.
The Go SDK and Oxide Terraform Provider have also been updated to support boot disk designation. They will be enhanced to manage internet gateway objects in a future release.
In this release, the control plane is enhanced to attempt restarting instances that have gone into the failed state. This new feature allows previously running instances to come back up after software updates. See New Features for more information.
System Requirements
Please refer to the v1.0.0 release notes.
Installation
Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.
Upgrade Compatibility
Upgrade from version 10 is supported. We recommend shutting down all running instances on the rack before software update commences. Any instances that aren’t stopped for software update are transitioned to the failed state when the control plane comes up. They will be automatically restarted afterwards in a controlled manner. Failed instances may also be started or stopped manually while they are waiting to be restarted.
All existing setup and data (e.g., projects, users, instances) remain intact after the software update.
New Features
Improved Failed Instance Handling
The Oxide control plane has been enhanced to allow failed instances to be auto-restarted or started/stopped manually. Prior to this release, instances marked failed because of planned or unexpected sled reboots or hypervisor process failures could only be revived by deleting and recreating them. The handling of instance state transitions has also been improved to prevent timeout-related failures (e.g., omicron#5235).
Auto-restart policy is configurable on a per-instance basis using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). If no policy is set, instances will automatically restart by default.
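The default behavior described above can be modeled as a small decision function. This is a minimal sketch, not the control plane's actual logic: the policy names `never` and `best_effort` and the function name `should_auto_restart` are assumptions introduced for illustration, since these notes do not list the accepted policy values.

```python
from typing import Optional

# Hypothetical policy names; check the API reference for the accepted values.
NEVER = "never"
BEST_EFFORT = "best_effort"


def should_auto_restart(state: str, policy: Optional[str]) -> bool:
    """Return True if a failed instance would be a candidate for auto-restart."""
    if state != "failed":
        # Only instances in the failed state are considered for auto-restart.
        return False
    # With no policy set, instances restart automatically by default.
    return policy is None or policy == BEST_EFFORT
```

Under these assumptions, a failed instance with no policy set is restarted, while one with the policy set to `never` is left for the user to start manually.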
Boot Disk
Users can now designate one of an instance’s disks as its boot disk. The boot disk can be specified at instance create time or changed later using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). In a future release, this endpoint will also be used to change instance CPU and memory size. Instance updates are allowed only when the instance is stopped.
Prior to this release, when an instance was provisioned, the first disk in the request body was implicitly treated as the boot disk. The firmware also attempted to boot from other disks in an unpredictable fashion if it could not find the bootloader on the first disk, sometimes resulting in unusable instances as described in this Terraform provider issue.
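As an illustration of the instance update call, the sketch below builds (but does not send) a PUT request to /v1/instances/{instance}. Only the method and path come from these notes. The body field names `boot_disk` and `auto_restart_policy`, the `project` query parameter, the bearer-token header, and the host name are assumptions that should be checked against the API reference.

```python
import json
import urllib.parse
import urllib.request


def build_instance_update(host, token, project, instance, boot_disk,
                          auto_restart_policy=None):
    """Build a PUT /v1/instances/{instance} request without sending it."""
    # Assumed: the project is selected with a `project` query parameter.
    query = urllib.parse.urlencode({"project": project})
    path = urllib.parse.quote(instance, safe="")
    url = f"{host}/v1/instances/{path}?{query}"
    # Assumed field names. Both fields are always sent so that neither
    # depends on how the server treats an omitted field.
    body = {"boot_disk": boot_disk, "auto_restart_policy": auto_restart_policy}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Hypothetical host, project, instance, and disk names.
req = build_instance_update("https://oxide.example", "TOKEN", "demo", "web-1", "web-1-boot")
```

Because instance updates are allowed only when the instance is stopped, a caller would stop the instance before passing `req` to `urllib.request.urlopen`.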
Telemetry
Prior to v10, the rack telemetry data was limited to rack-level network metrics, resource utilization, and a subset of disk and instance data. In v10 and v11, we have further expanded the metrics coverage to include:
Instance network interface (e.g., I/O, packet drops, errors)
The metrics data can be consumed using Oxide’s Oximeter Query Language, "OxQL", which is described in RFD 463. Please note that the query language is experimental and its syntax may change in future releases. Details about available timeseries will be added to this site soon.
Internet Gateways
Internet gateways support the routing of VPC traffic to networks outside of the Oxide rack. They can be used to ensure that instances only use certain external IP addresses when sending traffic to a given network. Prior to this release, only the system-defined internet gateway was available for routing external traffic using the default IP pool of each silo. Starting from v11, project users can optionally set up internet gateways in their VPCs against other IP pools. These gateways can be applied as routing targets to customize instance outbound traffic based on the packet destinations. You can find an example of such granular routing setup and more details about internet gateways in the Networking guide.
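The idea of choosing a gateway by packet destination can be sketched as a route lookup. This is an illustrative model only: the route table, the gateway names, and the use of longest-prefix matching to pick the most specific route are assumptions, not a description of how the VPC router is implemented.

```python
import ipaddress

# Hypothetical routes: destination prefix -> internet gateway name.
ROUTES = {
    "0.0.0.0/0": "default",           # system-defined gateway, default IP pool
    "203.0.113.0/24": "partner-gw",   # custom gateway tied to another IP pool
}


def pick_gateway(dest_ip, routes=ROUTES):
    """Return the gateway of the most specific route containing dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = []
    for prefix, gateway in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, gateway))
    if not matches:
        return None  # no route covers this destination
    # Assumed tie-break: the longest prefix (most specific route) wins.
    return max(matches)[1]
```

With this table, traffic to 203.0.113.7 would leave through `partner-gw` and use an address from that gateway's pool, while all other external traffic would use the default gateway.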
Please note that the management of internet gateways is available through the API only at this time. The web console, Go SDK, and Terraform provider will provide the same support in the next release.
Rack Reconfigurator
The reconfigurator module provides the foundation for Oxide control plane service configuration changes during hardware and software maintenance. It is accessible by Oxide technicians only at this time. It will be made available to rack operators in the form of component update/replacement capabilities in a future release.
Enable replacement of external DNS instances
Support horizontal scaling and replacement of oximeter (metrics collector) instances
Web console changes
Image uploads are ~3x faster, as fast as CLI (omicron#6690)
Searchable pickers (comboboxes) allow arbitrary values and are used in more places: firewall rule hosts and targets (console#2296), instance disks (console#2467), router routes (console#2448)
Boot disk UI (console#2464)
Validate IPs and IP networks in forms (console#2461)
View and edit transit IPs on instance network interfaces (console#2437)
Allow failed instances to be started and stopped (console#2439, console#2482)
Fix row actions menus closing suddenly when polling (console#2453, console#2460)
Link to silo pages from system utilization, add copy silo ID buttons (console#2408, console#2414)
Bug fixes and minor enhancements
Fix 500s due to DB memory limit when creating a large number of instances concurrently (omicron#5904)
Instances were stuck in the running state after the backend propolis servers panicked (omicron#5705)
Administratively deleting a BGP peer did not result in the routes learned from that peer being deleted (maghemite#349)
List hardware switch API returned an empty result set (omicron#6597)
BGP announce-set and config delete APIs returned HTTP 500 errors (omicron#6471, omicron#6619)
BGP announce-set list had a redundant name_or_id query parameter (omicron#6467)
Better tx_eq defaults for different transceiver types (dendrite#1020)
Improve handling of RIB priorities for Static and BGP protocols (maghemite#359)
SAML authentication signed requests should respond with an appropriate name id format (omicron#5604)
Missing or modified RelayState should be handled during SAML authentication (omicron#5607)
Storage performance improvements
Use new syncfs syscall (crucible#1427)
Fine-tuning backpressure clamping, and API cleanups (crucible#1442)
Increment write backpressure before deferred encryption (crucible#1444)
Do fast-ack immediately after a write is submitted (crucible#1445)
Do IO immediately, instead of storing it in new_work (crucible#1460)
Tune ZFS parameters for improved IO performance (helios#170)
Firmware update
None
Known Behavior and Limitations
End-user features
Feature Area | Known Issue/Limitation | Issue Number |
---|---|---|
Image/snapshot management | Disks in | |
Image/snapshot management | Image upload sometimes stalls with HTTP/2 on Firefox. | |
Image/snapshot management | The ability to modify image metadata is not available at this time. | |
Instance orchestration | Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time. | |
Instance performance | The | |
VPC routing | Subnet update clears custom router ID when the field is left out of request body. | |
VPC routing | Network interface update clears transit IPs when the field is left out of request body. | - |
Telemetry | VM instance memory utilization and VPC network/firewall metrics are unavailable at this time. | - |
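
One way to work around the two VPC routing limitations in the table above is to read the current object first and carry the affected field into the update body. The helper below is a minimal sketch of that read-modify-write pattern. The function name and the `transit_ips` field name are assumptions for illustration, and fetching and sending the object are left to the caller.

```python
def full_update_body(current: dict, changes: dict, preserve: tuple) -> dict:
    """Merge changes with preserved fields so an update does not clear them."""
    body = dict(changes)
    for field in preserve:
        # Copy the existing value only when the caller did not set the field.
        if field not in body and field in current:
            body[field] = current[field]
    return body


# Hypothetical network interface as returned by a prior GET.
current_nic = {"name": "nic0", "transit_ips": ["172.30.0.0/24"]}
body = full_update_body(current_nic, {"description": "primary NIC"}, ("transit_ips",))
```

Here `body` keeps the existing transit IPs alongside the new description. A caller who does want to clear the field can still pass it explicitly in `changes`.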
Operator features
Feature Area | Known Issue/Limitation | Issue Number |
---|---|---|
Access control | Device tokens do not expire. | |
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | |
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | - |
Control plane | Operator-driven instance migration across sleds is currently unavailable. | - |
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | |