Release Notes

v11

Important Notes

  1. The Oxide CLI and Rust SDK have been updated to support all the new features such as boot disk designation, instance update, and VPC internet gateway. Please be sure to upgrade to the latest versions.

  2. The Go SDK and Oxide Terraform Provider have also been updated to support boot disk designation. They will be enhanced to manage internet gateway objects in a future release.

  3. In this release, the control plane is enhanced to attempt restarting instances that have gone into the failed state. This new feature allows previously running instances to come back up after software updates. See New Features for more information.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 10 is supported. We recommend shutting down all running instances on the rack before the software update commences. Any instances that are not stopped before the software update are transitioned to the failed state when the control plane comes up, and are then restarted automatically in a controlled manner. Failed instances may also be started or stopped manually while they are waiting to be restarted.

All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

New Features

Improved Failed Instance Handling

The Oxide control plane has been enhanced to allow failed instances to be auto-restarted or started/stopped manually. Prior to this release, instances marked failed because of planned or unexpected sled reboots or hypervisor process failures could only be revived by deleting and recreating them. The handling of instance state transitions has also been improved to prevent timeout-related failures (e.g., omicron#5235).

Auto-restart policy is configurable on a per-instance basis using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). If no policy is set, instances will automatically restart by default.
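As a rough illustration, the request for setting a per-instance auto-restart policy could be assembled as shown below. This is a sketch, not the official client: the host name, the token, the `project` query parameter, and the body field `auto_restart_policy` with the value `best_effort` are all assumptions that should be checked against the API reference. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_instance_update(host, token, instance, project, body):
    """Build (but do not send) a PUT /v1/instances/{instance} request."""
    # The `project` query parameter is an assumption for name-based lookups.
    url = f"https://{host}/v1/instances/{instance}?project={project}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# The field name and value below are assumptions, not confirmed by these notes.
req = build_instance_update(
    host="oxide.example.com",
    token="EXAMPLE_TOKEN",
    instance="web-1",
    project="demo",
    body={"auto_restart_policy": "best_effort"},
)
print(req.get_method(), req.full_url)
```

In practice the Oxide CLI or Rust SDK mentioned under Important Notes is likely the more convenient route; the raw request is shown only to make the shape of the call concrete.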

Boot Disk

Users can now designate one of an instance’s disks as its boot disk. The boot disk can be specified at instance create time or changed later using the new PUT /v1/instances/{instance} endpoint (or the corresponding instance update CLI command). In a future release, this endpoint will also be used to change instance CPU and memory size. Instance updates are allowed only when the instance is stopped.

Prior to this release, when an instance was provisioned, the first disk in the request body was implicitly treated as the boot disk. The firmware also attempted to boot from other disks in an unpredictable fashion if it could not find the bootloader on the first disk, sometimes resulting in unusable instances as described in this terraform provider issue.
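The difference between the implicit and the explicit rule can be modeled with a few lines of Python. This is a hypothetical model for illustration only, not control plane code: the function name and the list-of-names representation are assumptions. The fallback branch mirrors the pre-v11 rule described above; these notes do not state whether v11 still applies it when no boot disk is designated.

```python
def resolve_boot_disk(disks, boot_disk=None):
    """Return the name of the disk an instance should boot from.

    disks: ordered list of disk names from the create request.
    boot_disk: explicitly designated disk name, or None.
    """
    if boot_disk is not None:
        # Explicit designation: the disk order no longer matters.
        if boot_disk not in disks:
            raise ValueError(f"boot disk {boot_disk!r} is not attached")
        return boot_disk
    # Pre-v11 rule: the first disk in the request is implicitly the boot disk.
    if not disks:
        raise ValueError("instance has no disks")
    return disks[0]


print(resolve_boot_disk(["data", "os"]))                  # data (implicit rule)
print(resolve_boot_disk(["data", "os"], boot_disk="os"))  # os (explicit)
```

With the implicit rule, listing a data disk first would select a disk without a bootloader, which is the kind of failure the explicit designation avoids.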

[Figure: Boot disk and other disks]

Telemetry

Prior to v10, the rack telemetry data was limited to rack-level network metrics, resource utilization, and a subset of disk and instance data. In v10 and v11, we have further expanded the metrics coverage to include:

The metrics data can be consumed using Oxide’s Oximeter Query Language, "OxQL", which is described in RFD 463. Please note that the query language is experimental and its syntax may change in future releases. Details about available timeseries will be added to this site soon.

Internet Gateways

Internet gateways support the routing of VPC traffic to networks outside of the Oxide rack. They can be used to ensure that instances only use certain external IP addresses when sending traffic to a given network. Prior to this release, only the system-defined internet gateway was available for routing external traffic using the default IP pool of each silo. Starting from v11, project users can optionally set up internet gateways in their VPCs against other IP pools. These gateways can be applied as routing targets to customize instance outbound traffic based on the packet destinations. You can find an example of such granular routing setup and more details about internet gateways in the Networking guide.
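To make destination-based gateway selection concrete, here is a minimal sketch that picks a gateway for a packet by longest-prefix match. It assumes longest-prefix matching as the selection rule and uses made-up gateway names and documentation address ranges; it is not a description of how the rack implements routing.

```python
import ipaddress


def pick_gateway(routes, destination):
    """Return the gateway for `destination` using longest-prefix match.

    routes: list of (cidr, gateway_name) pairs; all names are hypothetical.
    Returns None when no route covers the destination.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            # Prefer the most specific (longest) matching prefix.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, gateway)
    return best[1] if best else None


routes = [
    ("0.0.0.0/0", "default-gateway"),
    ("203.0.113.0/24", "partner-gateway"),
]
print(pick_gateway(routes, "203.0.113.7"))   # partner-gateway
print(pick_gateway(routes, "198.51.100.9"))  # default-gateway
```

Traffic to the partner network leaves through the gateway tied to a specific IP pool, while everything else keeps using the default one.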

Please note that the management of internet gateways is available through API only at this time. The web console, Go SDK, and Terraform provider will provide the same support in the next release.

Rack Reconfigurator

The reconfigurator module provides the foundation for Oxide control plane service configuration changes during hardware and software maintenance. It is accessible by Oxide technicians only at this time. It will be made available to rack operators in the form of component update/replacement capabilities in a future release.

  • Enable replacement of external DNS instances

  • Support horizontal scaling and replacement of oximeter (metrics collector) instances

Web console changes

Bug fixes and minor enhancements

  • Fix 500s due to DB memory limit when creating a large number of instances concurrently (omicron#5904)

  • Instances were stuck in running state after the backend propolis servers panicked. (omicron#5705)

  • Administratively deleting a BGP peer did not result in the routes learned from the peer being deleted (maghemite#349)

  • List hardware switch API returned an empty result set (omicron#6597)

  • BGP announce-set and config delete APIs returned HTTP 500 errors (omicron#6471, omicron#6619)

  • BGP announce-set list had a redundant name_or_id query parameter (omicron#6467)

  • Better tx_eq defaults for different transceiver types (dendrite#1020)

  • Improve handling of RIB priorities for Static and BGP protocols (maghemite#359)

  • SAML authentication signed requests should respond with an appropriate name id format (omicron#5604)

  • Missing or modified RelayState should be handled during SAML authentication (omicron#5607)

  • Storage performance improvements

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround.

Image/snapshot management

Image upload sometimes stalls with HTTP/2 on Firefox.

Image/snapshot management

The ability to modify image metadata is not available at this time.

Instance orchestration

Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time.

Instance performance

The guest treats the tsc clocksource as unreliable and falls back to substantially slower timestamp syscalls. A workaround for this issue can be found in the Troubleshooting Guide.

VPC routing

Subnet update clears custom router ID when the field is left out of request body.

VPC routing

Network interface update clears transit IPs when the field is left out of the request body.

-

Telemetry

VM instance memory utilization and VPC network/firewall metrics are unavailable at this time.

-

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | omicron#2035
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587

     
v10

Important Notes

  1. The Oxide CLI, Go SDK, and Terraform Provider have been updated for API enhancements such as the BGP API changes described under New Features. Please be sure to upgrade.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 9 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) will remain intact after the software update.

New Features

Rack networking

A number of new switch configurations and API endpoints for querying BGP settings are available in v10:

Live instance state in web console

When an instance is starting or stopping, the console now automatically refreshes as the instance changes state (console#2360, console#2391). Users no longer have to manually refresh to know when an instance is ready to interact with. When trying to connect to the serial console of a starting instance, the console will wait and automatically connect when the instance is ready (console#2374).

[Figure: Instance state refreshing]

VPC routers and routes in web console

v9 added endpoints for managing VPC routers and routes to the API. In v10, users can manage them in the web console (console#2359, console#2371). Subnets can be linked to custom routers (console#2393).

[Figure: VPC routes]

Rack Reconfigurator

The reconfigurator module provides the foundation for Oxide control plane service configuration changes during hardware and software maintenance. It is accessible by Oxide technicians only at this time. It will be made available to rack operators in the form of component update/replacement capabilities in a future release.

  • Enable boundary NTP zone replacement

  • Enable disk downstairs auto-replacement when a sled or disk is marked expunged

  • Migrate in-progress jobs from expunged nexus zone to other available peers

Bug fixes and minor enhancements

Firmware update

  • AMD Microcode: Version update from 20240116 to 20240710

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround.

Image/snapshot management

Image upload sometimes stalls with HTTP/2 on Firefox.

Image/snapshot management

The ability to modify image metadata is not available at this time.

Instance orchestration

Possible 500 errors when creating a large number of instances concurrently. Users can retry the requests to work around the failures.

Instance orchestration

Instances are stuck in running state when the backend propolis servers are gone or disassociated from the control plane.

Instance orchestration

Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time.

Instance orchestration

Instance disk boot order problem causes instance to drop to UEFI shell.

VPC routing

Subnet update clears custom router ID when the field is left out of request body.

VPC routing

Network interface update clears transit IPs when the field is left out of the request body.

-

Telemetry

VM instance memory utilization and network throughput metrics are unavailable at this time.

-

Operator features

Feature Area | Known Issue/Limitation | Issue Number

Access control

Device tokens do not expire.

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Rack Networking

Administratively deleting a BGP peer (e.g., oxide system networking bgp peer del) does not result in the routes learned from this peer being deleted. This occurs because the handler for this API endpoint is missing the logic to clean up routes imported from this peer.

To avoid this issue, ensure the peer transitions out of the Established state (e.g., by administratively shutting down the session or causing a connectivity loss between the two peers) or set the peer’s import policy to deny all routes before deleting the peer.

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

     
v9

Important Notes

  1. The session timeout in the web console is now 8 hours idle and 24 hours absolute for a better user experience (omicron-PR#5920). These values will be made configurable in a future release (omicron#5477).

  2. The external IP allowlist is now applied to the API only; the allowlist no longer affects DNS server access (omicron#5892).

  3. The Oxide CLI, Go SDK, and Terraform Provider have been updated for API enhancements such as VPC subnet routing described under New Features. Please be sure to upgrade.
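The two session limits in the first note combine as a logical AND: a session must be within both the idle window and the absolute window. A minimal sketch of that check follows; the function and constant names are hypothetical, and the treatment of the exact boundary (strictly less than) is an assumption.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(hours=8)       # max time since last activity
ABSOLUTE_TIMEOUT = timedelta(hours=24)  # max time since login


def session_is_valid(created_at, last_used_at, now):
    """Valid only if neither the idle nor the absolute limit has passed."""
    idle_ok = now - last_used_at < IDLE_TIMEOUT
    absolute_ok = now - created_at < ABSOLUTE_TIMEOUT
    return idle_ok and absolute_ok


login = datetime(2024, 7, 1, 9, 0)
# Active 2 hours ago, logged in 10 hours ago: still valid.
print(session_is_valid(login, login + timedelta(hours=8), login + timedelta(hours=10)))  # True
# Continuously active, but logged in 25 hours ago: expired.
print(session_is_valid(login, login + timedelta(hours=24), login + timedelta(hours=25)))  # False
```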

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 8 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) will remain intact after the software update.

New Features

VPC Subnet Routing

  • Project users can now configure custom routes in VPCs to allow instances in different subnets within the same VPC to talk with one another.

  • Custom routers may be attached to or detached from a VPC subnet using the custom_router field in subnet POST and PUT requests. See the latest Networking guide for more information.

  • A common use case enabled by subnet routing is hosting a VPN tunnel on a VM instance, as illustrated by this example in the networking guide.

  • Web console support for subnet routing will be added in a future release.

Uplink VLAN Tagging

  • Operators may now optionally include a VLAN ID in the switch port settings.

  • The Oxide rack switches will make use of the VLAN ID to produce and consume 802.1Q Ethernet tags, enabling the Oxide rack to operate with shared physical network interfaces.
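For readers unfamiliar with 802.1Q, the tag inserted into an Ethernet frame is four bytes: a 16-bit TPID of 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible indicator, and the 12-bit VLAN ID. The sketch below builds and parses such a tag; the function names are hypothetical and it is unrelated to the switch implementation.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag: 16-bit TPID followed by 16-bit TCI."""
    # Only the field widths are enforced; VLAN IDs 0 and 4095 are reserved
    # by the standard but are not rejected here.
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_id(tag):
    """Extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


tag = build_vlan_tag(100)
print(tag.hex())  # 81000064
```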

Console usability improvements

Bug fixes and minor enhancements

  • PEM encoded certificate is now included in external API responses (omicron-PR#5078)

  • Compute resource usage was decremented incorrectly when stopping a running instance (omicron#5525)

  • Attempt to add firewall rule with duplicate name now returns a 400 (omicron#5725)

  • IP pool linked silos pagination did not work (omicron#5837)

  • Marking a sled non-provisionable caused existing instances to lose their private IP connectivity (omicron#5872)

  • Fixed 404 on project IP pool view for users without fleet viewer role (omicron#5883)

  • Enable support for updating RoT bootloader in future releases (omicron-PR#5882)

  • Inflight orchestration jobs weren’t recovered automatically when the control plane was restarted (omicron#5948)

  • Added BGP announce set modification API endpoint (omicron#6022)

  • Database error was thrown when reading BGP peer configs in background sync job (omicron#6023)

  • BGP filters were not persisted in the bootstore early networking configurations (omicron#6067)

Firmware update

  • NVMe: Micron 7300 version 95420280 (release notes)

  • NVMe: Western Digital SN840 version R2210010

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Image/snapshot management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedures to unstick a canceled disk import can be applied to work around the issue. | omicron#2987
Image/snapshot management | Image upload sometimes stalls with HTTP/2 on Firefox. | omicron#3559
Image/snapshot management | The ability to modify image metadata is not available at this time. | omicron#2800
Instance orchestration | Possible 500 errors when creating a large number of instances concurrently. Users can retry the requests to work around the failures. | omicron#5904
Instance orchestration | Instances are stuck in running state when the backend propolis servers are gone or disassociated from the control plane. | omicron#5798
Instance orchestration | Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time. | omicron-PR#4938
Telemetry | VM instance memory utilization and network throughput metrics are unavailable at this time. | -

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | omicron#2035
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | -
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587

     
v8

Important Notes

  1. These recent versions of API clients (CLI v0.4.0, Go SDK v0.1.0-beta4, and Terraform v0.3.0) remain compatible with v8, except for rack networking configurations.

  2. If you want to leverage the new rack networking configurations (see New Features), please review the Networking section of the API documentation for the new configurable options and get the newer CLI binaries (v0.5.0).

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 7 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) will remain intact after the software update.

New Features

Rack networking configurations

  • Improved BGP support (maghemite#199)

    • Use of the enforce-first-as option. (maghemite#208)

    • Specifying ASN of BGP peer to prevent unauthorized/unintended remote peering. (maghemite#151)

    • Additional BGP configurations such as keepalive time, multi-exit discriminator, import/export policies, local preferences, and operator-defined communities.

  • Operator can now use the new networking_allow_list_update API to restrict the Oxide API/UI endpoint access by source IP address. (omicron-PR#5686)
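The effect of a source-IP allowlist can be pictured as a simple membership test. The sketch below is a conceptual model only: the function name, the use of CIDR strings, and the convention that `None` means "allow any source" are assumptions, not the format accepted by networking_allow_list_update.

```python
import ipaddress


def is_allowed(source_ip, allow_list):
    """Return True if source_ip falls inside any allowed network.

    allow_list: list of CIDR strings, or None to allow any source.
    """
    if allow_list is None:
        return True
    addr = ipaddress.ip_address(source_ip)
    for cidr in allow_list:
        # strict=False tolerates entries with host bits set, e.g. 10.0.0.5/8.
        net = ipaddress.ip_network(cidr, strict=False)
        if addr.version == net.version and addr in net:
            return True
    return False


office = ["192.0.2.0/24", "2001:db8::/32"]
print(is_allowed("192.0.2.44", office))    # True
print(is_allowed("198.51.100.1", office))  # False
```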

Rack reconfigurator

  • The new feature provides the foundation for rack component replacement and configuration changes. In v8, the reconfigurator module supports the programmatic configuration of new sleds for instance and disk mirror placement.

  • Additional capabilities are being developed to support other rack reconfiguration use cases. More details are forthcoming in release v9.

  • The reconfigurator module is currently only accessible by Oxide technicians. It will be made available to rack operators in a future release.

Console usability improvements

In this release we focused on making the web experience friendlier for new users.

[Figure: Docs popover - Snapshots]

Bug fixes and minor enhancements

  • Web Console

  • Users can now create snapshots from disks attached to stopped instances. (omicron#3289)

  • Floating IP create returned a 404 error when users without fleet admin privileges specified the IP pool to use. (omicron#5508)

  • An initial set of SMBIOS tables are now exposed to the guest via fw_cfg and the OVMF ROM. (propolis#628)

  • Instance delete was stuck in stopping state due to deadlock during VM halt and destroy. (propolis#675)

  • Storage job queue management has been improved to avoid kicking out a disk mirror during heavy writes. (crucible-PR#1252, crucible-PR#1256, crucible-PR#1260)

  • Metrics producer registration logic was refactored to use a lease-based renewal process to support sled replacement. (omicron#5284)

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Image/snapshot management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedures to unstick a canceled disk import are not obvious to CLI users. | omicron#2987
Image/snapshot management | Image upload sometimes stalls with HTTP/2 on Firefox. | omicron#3559
Image/snapshot management | The ability to modify image metadata is not available at this time. | omicron#2800
Instance orchestration | Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures. | omicron#4259, omicron#4331
Instance orchestration | Disk volume backend repair may fail to complete under heavy large write workload, preventing instances from starting or stopping. | crucible#837
Instance orchestration | Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time. | omicron-PR#4938
Telemetry | VM instance memory utilization and network throughput metrics are unavailable at this time. | -
VPC and routing | Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases. | omicron#2232

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | omicron#2035
Control plane | When a sled is rebooted outside of the maintenance settings, new instances on the sled may be unable to reach existing instances on other sleds until those instances have been restarted. | omicron#5214
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | -
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587

     
v7

Important Notes

  1. The instance_create API endpoint now returns a success response as soon as orchestration is finished. The hypervisor-level setup and booting has been made asynchronous to avoid timeouts (e.g., when there are many concurrent large instance requests). Care must be taken when provisioning instances to poll for instance state and connect to an instance only when it has transitioned to the running state.

  2. The Oxide CLI, Go SDK, and Terraform Provider have been updated for various API enhancements such as IP pool utilization described under New Features. Please ensure you obtain the latest version of the clients.
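Because instance_create now returns before the guest has booted, provisioning scripts need a polling step, as the first note describes. A minimal sketch follows. The `get_state` callable is a hypothetical stand-in for whatever client call returns the instance's current state; the state strings, timeout, and interval are assumptions to adjust for a real deployment.

```python
import time


def wait_until_running(get_state, timeout_s=300.0, interval_s=2.0, sleep=time.sleep):
    """Poll get_state() until it returns "running" or the timeout elapses.

    get_state: caller-supplied callable (hypothetical) returning the
    instance's current state as a string, e.g. by calling the API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "running":
            return state
        if state == "failed":
            # No point in waiting further; surface the failure to the caller.
            raise RuntimeError("instance entered the failed state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"instance still {state!r} after {timeout_s}s")
        sleep(interval_s)


# Example with a canned sequence of states instead of real API calls.
states = iter(["starting", "starting", "running"])
print(wait_until_running(lambda: next(states), sleep=lambda s: None))  # running
```

Only after this call returns should a script attempt to connect to the instance, for example over SSH or the serial console.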

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 6 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) will remain intact after the software update.

New Features

Floating IPs in web console

The web console now supports creating and deleting floating IP addresses, as well as attaching to and detaching from instances.

IP pool utilization in API and web console

The IP pool management page in the web console now includes real-time utilization data, giving fleet administrators visibility into the number of external IP addresses currently allocated in each pool. See also the ip_pool_utilization_view API endpoint.
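Utilization here is simply allocated addresses divided by pool capacity. The sketch below shows that arithmetic. It assumes the utilization data boils down to an allocated count and a capacity count; the actual response shape of ip_pool_utilization_view should be taken from the API reference.

```python
def pool_utilization(allocated, capacity):
    """Return the fraction of a pool's addresses that are allocated (0.0 to 1.0)."""
    if capacity <= 0:
        # An empty pool has nothing to allocate; report zero rather than divide by zero.
        return 0.0
    return allocated / capacity


print(f"{pool_utilization(12, 48):.0%}")  # 25%
```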

[Figure: IP Pool Utilization]

Bug fixes and minor enhancements

  • New API endpoint floating_ip_update for updating floating IP name and description (omicron#5016)

  • New API endpoint networking_bgp_message_history for retrieving BGP message history (maghemite-PR#179)

  • Fix capacity and utilization showing "undefined" in some cases (console#1954)

  • Crucible worker thread tunable is reduced to avoid hitting the kernel limit, which resulted in instances stuck in stopping state under heavy disk I/O. (crucible#1184)

  • Fixes around crucible disk repair reliability (crucible#1146, crucible#1155)

  • Additional validations to prevent cross-project floating IP attach (omicron-PR#5177)

  • Graceful transition to/from BFD-based networks (maghemite-PR#174)

  • ClickHouse upgrade from v22.8.9.24 to v23.8.7.24 (omicron-PR#5127)

  • "Probes" experimental API. Probes are instance-like objects used for emulating instance and network interface lifecycle events. They will consume IP addresses but do not take up any compute and storage resources. Probes may be used by Oxide technicians from time to time for instrumentation purposes with the rack operator’s permissions. (omicron-PR#4585)

Firmware update

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Image/snapshot management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedures to unstick a canceled disk import are not obvious to CLI users. | omicron#2987
Image/snapshot management | Image upload sometimes stalls with HTTP/2 on Firefox. | omicron#3559
Image/snapshot management | Unable to create snapshots for disks attached to stopped instances. As a workaround, users can detach a disk temporarily for snapshotting and re-attach it to the instance afterwards. | omicron#3289
Image/snapshot management | The ability to modify image metadata is not available at this time. | omicron#2800
Instance orchestration | Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures. | omicron#4259, omicron#4331
Instance orchestration | Disk volume backend repair may fail to complete under heavy large write workload, preventing instances from starting or stopping. | crucible#837
Instance orchestration | Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time. | omicron-PR#4938
Telemetry | VM instance memory utilization and network throughput metrics are unavailable at this time. | -
VPC and routing | Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases. | omicron#2232

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | omicron#2035
Control plane | When a sled is rebooted outside of the maintenance settings, new instances on the sled may be unable to reach existing instances on other sleds until those instances have been restarted. | omicron#5214
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | -
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587

     
v6

Important Notes

  1. IP pools have been reworked for greater flexibility and can now be managed through the web console UI.

    1. In v5, an IP pool was either fleet-scoped (i.e., available to users in all silos) or silo-scoped (i.e., available in one silo). In v6, an IP pool can be linked to any number of silos. This enables configurations that were not possible in v5, such as an IP pool shared by silos A and B but not C.

    2. There is no longer a concept of fleet-scoped pool: users can only allocate IP addresses from pools explicitly linked to their silo. The behavior of a fleet-scoped pool can be recreated in the new model by linking a pool to every silo individually.

    3. The software update process will set up links between existing pools and silos in a way that preserves v5 behavior:

      1. Formerly fleet-scoped pools will be linked to every silo.

      2. If a formerly fleet-scoped pool was marked “default” for the fleet, it will continue to be the default for each silo unless that silo had its own default pool overriding the fleet-level default.

    4. After setting up a new silo, you will need to link an IP pool to it before users can allocate external IPs.

    5. Please review the updated IP Pool Management guide and API docs and update any API client that manipulates IP pools.

  2. The Oxide CLI, Go SDK, and Terraform Provider have been updated for floating IP attach/detach support, the IP pool changes mentioned above, and other API enhancements. Please ensure you obtain the latest versions.

  3. The NewClient function in the Go SDK no longer requires a user agent and now takes a Config struct instead. You can find more information about this and other changes in the Go SDK changelog.

  4. The latest release of the Oxide CLI produces JSON-formatted output.
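The IP pool migration rules in the first note above can be sketched as a small function. This is a hypothetical model of the described behavior, not the actual update code: the data shapes and names are assumptions, and silo-scoped pools that were not a silo's default are left out for brevity.

```python
def migrate_pools(silos, fleet_pools, fleet_default, silo_defaults):
    """Sketch of the v5 -> v6 link setup described in the notes above.

    silos: list of silo names.
    fleet_pools: names of formerly fleet-scoped pools.
    fleet_default: name of the fleet-level default pool, or None.
    silo_defaults: {silo: pool} for silos that had their own default.
    Returns (links, defaults): links is a set of (pool, silo) pairs and
    defaults maps each silo to its default pool.
    """
    links = set()
    defaults = {}
    for silo in silos:
        # Formerly fleet-scoped pools are linked to every silo.
        for pool in fleet_pools:
            links.add((pool, silo))
        if silo in silo_defaults:
            # A silo-level default overrides the fleet-level default.
            links.add((silo_defaults[silo], silo))
            defaults[silo] = silo_defaults[silo]
        elif fleet_default is not None:
            defaults[silo] = fleet_default
    return links, defaults


links, defaults = migrate_pools(
    silos=["a", "b"],
    fleet_pools=["shared"],
    fleet_default="shared",
    silo_defaults={"b": "b-pool"},
)
print(defaults)  # {'a': 'shared', 'b': 'b-pool'}
```

A silo created after the update would not appear in `links` at all, which matches the note that a pool must be linked to a new silo before its users can allocate external IPs.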

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 5 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) should remain intact after the software update.

New Features

This release comes with two new features for rack switch failover support, security fixes, and other minor enhancements.

Fault-Tolerant Multipath Connectivity

Each of the two Oxide rack switches (aka “sidecars”) can now be connected to two or more uplinks, allowing rack connectivity to remain available should a single physical switch or uplink fail. For static multipath routing configurations, this is made possible through Bidirectional Forwarding Detection (BFD), provided in this release. New tunnel routing capabilities within the rack’s internal network ensure packets leaving the rack are always routed to a rack switch with sufficient connectivity to forward packets to their final destination. For BGP routing configurations, tunnel routes adapt to changes in BGP routing tables. These features work together to eliminate single points of failure by automatically detecting connectivity issues and redirecting network traffic to functional network paths. (Note: BFD verifies IP connectivity between the source and destination by actively sending control packets and/or passively responding to control packets from neighboring devices.)
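The failover idea can be illustrated with a simplified model of BFD liveness. This is only a sketch under stated assumptions: the `Uplink` type, the function names, and the timer values are hypothetical, and the detection time is reduced to the detect multiplier times a single fixed interval, whereas real BFD sessions (RFC 5880) negotiate their intervals per peer.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    # Hypothetical record for one monitored next hop.
    switch: str      # rack switch the uplink is attached to
    peer: str        # next-hop address monitored by BFD
    last_rx: float   # time (seconds) the last BFD control packet arrived


def detection_time(interval_s: float, detect_mult: int) -> float:
    """Time without control packets after which a session is declared down."""
    return interval_s * detect_mult


def usable_uplinks(uplinks, now, interval_s=1.0, detect_mult=3):
    """Keep only uplinks whose peer has been heard from within the detection time."""
    deadline = detection_time(interval_s, detect_mult)
    return [u for u in uplinks if now - u.last_rx <= deadline]
```

Traffic would then be steered only toward the uplinks returned by `usable_uplinks`, which is the effect the release describes as redirecting traffic to functional network paths.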

Floating IP address attach/detach

The floating IP feature introduced in v5 allows a consistent IP address to be allocated to a new instance and de-allocated upon instance termination. The feature has been further enhanced in v6 to allow floating IPs to be allocated to, or de-allocated from, running instances. In other words, you can move a floating IP from one instance to another on the fly. See the latest Guest Networking Guide for more information.
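Moving a floating IP amounts to a detach followed by an attach. The toy model below sketches that sequence; the `FloatingIp` class and `move` helper are hypothetical and do not call the Oxide API, and the rule that an already-attached address must be detached first is an assumption made for the example.

```python
from typing import Optional


class FloatingIp:
    """Toy model of a floating IP that can move between running instances."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.instance: Optional[str] = None  # name of the attached instance

    def attach(self, instance: str) -> None:
        # Assumption: an address can be attached to one instance at a time.
        if self.instance is not None:
            raise ValueError(f"{self.address} is already attached to {self.instance}")
        self.instance = instance

    def detach(self) -> None:
        self.instance = None


def move(fip: FloatingIp, target: str) -> None:
    """Detach from the current instance (if any), then attach to the target."""
    fip.detach()
    fip.attach(target)
```

The address itself never changes during the move, which is what lets a service keep a consistent external address while the instance behind it is swapped.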

Web console improvements

  • Manage IP pools: add/remove IP ranges, link/unlink silos, set default pool (console#1910)

  • Select from list of SSH keys on instance create form (console#1867)

  • Firewall rules table includes priority and direction, is sorted by priority (console#1887)

  • Add user data (e.g., cloud-init config YAML) field under Advanced on instance create form (console#800)

  • Show external IPs at top of instance page (console#1882)

Bug fixes and minor enhancements

  • TCP state machine race condition could leave Nexus API or guest instance TCP connections in the wrong state (opte#442)

  • Outbound TCP flow occasionally hung as old TCP flows were in FLOW_WAIT, blocking port reuse (opte#436)

  • Instances could not transition to failed state when their propolis zones crashed or were purged (omicron#4709)

  • API

    • Select the SSH keys to inject into instances at create time (omicron#3056)

    • Users can list IP pools available to them (omicron#2148)

    • Project deletion did not enforce the removal of the project’s floating IP addresses (omicron#4854)

    • Instance create requests with hostnames not conforming to RFC 1035 are now prohibited (omicron#4938)

  • Web console

    • Enhance number field and use it more consistently (console#1926)

    • Fix y-axis units for large numbers on instance disk metrics charts (console#1916)

    • Clickable styling (underline and hover) on links in tables (console#1899)

    • Handle empty IP address in network interface create form (console#1854)

    • Instance networking config moved under Advanced/Networking on instance create (console#800)

    • Don’t allow editing on the instance create form while submit is in progress (console#1893)

    • Relative times (e.g., “7d ago”) have a tooltip showing absolute time (console#1879)

    • Increase page size on tables from 10 to 25 (console#1878)

    • Pressing enter while adding target, host, or port to firewall rule should not submit form (console#1919)

  • Reliability improvements

    • Improved network device driver error handling (propolis#583)

    • Stop checking the disk parent image/snapshot if the scrub is done (crucible#1093)

  • Storage performance improvements (crucible-PR#1058, crucible-PR#1066, crucible-PR#1089, crucible-PR#1094, crucible-PR#1107)

  • Control plane datastores now use ZFS datasets in encrypted mode (omicron-PR#4853)
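Because instance create requests with non-conforming hostnames are now rejected (omicron#4938), it can help to check names on the client side first. The sketch below applies the RFC 1035 preferred name syntax; the function name is hypothetical, and the exact rules the API enforces may differ, so treat this as an approximation rather than the server's validator.

```python
import re

# RFC 1035 section 2.3.1 label: starts with a letter, ends with a letter or
# digit, and contains only letters, digits, and hyphens in between.
_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the RFC 1035 syntax."""
    # 253 characters is the usual limit for a name written in text form.
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    # Each label is limited to 63 characters.
    return all(len(label) <= 63 and _LABEL.fullmatch(label) for label in labels)
```

Under these rules `web-01` and `db.internal` pass, while `1web`, `web_01`, and `web-` do not.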

Known Behavior and Limitations

End-user features

Feature Area

Known Issue/Limitation

Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The procedures to unstick a canceled disk import are not obvious to CLI users.

omicron#2987

Image/snapshot management

Image upload sometimes stalls with HTTP/2 on Firefox.

omicron#3559

Image/snapshot management

Unable to create snapshots for disks attached to stopped instances. As a workaround, users can detach a disk temporarily for snapshotting and re-attach it to the instance afterwards.

omicron#3289

Image/snapshot management

The ability to modify image metadata is not available at this time.

omicron#2800

Instance orchestration

Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures.

omicron#4259, omicron#4331

Instance orchestration

Disk volume backend repair may fail to complete under heavy large write workload, preventing instances from starting or stopping.

crucible#837

Instance orchestration

Instance hostname validation has been strengthened. Instances with a now-invalid hostname will fail to start, though they can still be listed and viewed. If the disks attached to them are valuable, they may be detached from the invalid instances, and re-attached to a new instance. The invalid instance may be deleted at that time.

omicron-PR#4938

Telemetry

Guest VM vcpu and memory metrics are unavailable at this time.

-

VPC and routing

Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases.

omicron#2232

Operator features

Feature Area

Known Issue/Limitation

Issue Number

Access control

Device tokens do not expire.

omicron#2302

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

omicron#2035

Control plane

When sleds attached to the switches are restarted outside of rack cold-start, a full rack power cycle may be required to re-propagate sled NAT configurations.

omicron#3631

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

omicron#2587

     
v5

Important Notes

  1. This release comes with new features for multi-tenant management. As part of this change, silo creation now requires a set of resource quotas (vcpu, memory, storage) to be specified.

    1. As part of the software update, all existing silos will have resource quotas configured to use all available fleet capacity initially. Fleet administrators can modify the quotas afterwards via the new /v1/system/silos/{silo}/quotas API.

    2. See the New Features section for more information about silo resource allocation.

  2. A previous issue with disk deletion (omicron#3866) resulted in incomplete removal of backend data volumes and over-reported disk usage. The issue may manifest as 500 errors when a user attempts to delete a project that has no resources in it. Such partially removed disks will be un-deleted during the software update process and marked as faulted so that they can be cleanly deleted again. Please review and remove any faulted disks once the system has been updated.

  3. The disk_import_blocks_from_url API endpoint for importing disk images from a remote URL is no longer supported. Please download image files to your local workstation and use the disk_bulk_write APIs to import them instead. (Note: The oxide disk import CLI command is not affected by this change.)

  4. The Oxide CLI binaries (v0.2.0) have been updated for the new floating IP and resource quota endpoints.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrades from versions 3 and 4 are both supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) should remain intact after the software update.

New Features

This release includes several new features for rack management and VM instance networking:

Silo resource allocation

Fleet administrators can now set limits on virtual resources (vcpu, memory, storage) usable by individual silos. Quotas are set during silo creation and can be modified afterwards to levels at or above the current utilization. Silo administrators can query the capacity and current utilization via API or CLI to independently manage the rack resources allocated to them.

Quotas are enforced when a new disk is provisioned and when an instance is started. If the action causes the silo’s aggregate resource usage to exceed its quotas, users will receive an InsufficientCapacity error. Please refer to the new Silo Management guide for details on usable capacity and utilization calculations as well as some possible exception scenarios.
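The two quota rules above (requests that would exceed the quota are rejected, and a quota can only be changed to a level at or above current utilization) can be sketched as follows. The `Resources` type, the function names, and the GiB units are assumptions for illustration; the exception class is a stand-in named after the InsufficientCapacity error and is not the API's actual error type.

```python
from dataclasses import dataclass


class InsufficientCapacity(Exception):
    """Stand-in for the error returned when a request would exceed a quota."""


@dataclass
class Resources:
    # Hypothetical aggregate of a silo's virtual resources.
    cpus: int = 0
    memory_gib: int = 0
    storage_gib: int = 0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(
            self.cpus + other.cpus,
            self.memory_gib + other.memory_gib,
            self.storage_gib + other.storage_gib,
        )

    def fits_within(self, limit: "Resources") -> bool:
        return (
            self.cpus <= limit.cpus
            and self.memory_gib <= limit.memory_gib
            and self.storage_gib <= limit.storage_gib
        )


def check_quota(quota: Resources, used: Resources, requested: Resources) -> Resources:
    """Return the new usage, or raise if the request would exceed the quota."""
    total = used + requested
    if not total.fits_within(quota):
        raise InsufficientCapacity("request would exceed the silo quota")
    return total


def update_quota(used: Resources, new_quota: Resources) -> Resources:
    """Accept a new quota only if it is at or above current utilization."""
    if not used.fits_within(new_quota):
        raise ValueError("quota cannot be set below current utilization")
    return new_quota
```

In this model `check_quota` would run when a disk is provisioned or an instance is started, matching the two enforcement points named above.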

Marking sled non-provisionable

This new API allows operators to temporarily exclude a sled from new workload placement. The action may be required when diagnosing and mitigating the impact of unexpected sled issues (e.g., unresponsive sleds, unscheduled reboots). This operator API is a precursor for the sled maintenance and replacement feature set.
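Conceptually, the flag acts as a filter on placement candidates and leaves workloads already running on the sled alone. The sketch below is a simplified model; the `Sled` type, its fields, and the `placement_candidates` function are hypothetical and do not reflect how the control plane actually selects sleds.

```python
from dataclasses import dataclass


@dataclass
class Sled:
    # Hypothetical view of a sled as seen by a placement routine.
    serial: str
    provisionable: bool = True
    free_cpus: int = 0


def placement_candidates(sleds, cpus_needed):
    """Sleds eligible for a new workload: provisionable and with enough free CPUs."""
    return [s for s in sleds if s.provisionable and s.free_cpus >= cpus_needed]
```

A sled marked non-provisionable simply drops out of the candidate list until the operator marks it provisionable again.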

Floating IP address

Floating IPs are permanent, project-scoped resources which bind an individual IP address from a given IP Pool. They allow for well-known addresses to be allocated (explicitly or automatically) and assigned to target instances, making it easier to host services from a consistent address. Floating IPs are allocated or de-allocated only when instances are created or destroyed at this time. They can also be used along with an ephemeral IP so that an instance can be accessed on more than one external IP address. Please refer to the user guides (Configuring Guest Networking and Managing Floating IPs) for more information.

Bug fixes:

  • Security fix: CVE-2023-50913 SSRF in Oxide software that could allow an attacker to access the ClickHouse metrics datastore.

  • Spurious errors were returned for snapshot or disk deletions after multiple delete requests on the same snapshot (omicron#3866, omicron-PR#4547)

  • Disk create or instance start requests under high concurrency failed to complete (omicron#3304)

  • Instances sometimes failed to boot up when they were created under very high concurrency (propolis#535)

  • IP address could not be left blank when adding NIC to instance (console#1438)

  • Image sizes were not available on image picker (console#1824)

  • Project picker showed only the first 20 projects (console#1817)

  • Disk snapshot action did not provide any UI feedback (console#1815)

  • BGP configuration was not applied to switches after upgrade (omicron#4474)

  • BGP failed to handle ConnectRetryTimerExpires in Active state (maghemite#93)

  • Link parameters from rack setup were not persisted in the control plane datastore (omicron#4470)

  • Link config API did not allow for setting link autonegotiation (omicron#4458)

  • RoT on production Rev E gimlet incorrectly prohibited software update (omicron#4420)

  • Reliability improvements:

    • Support for non-power-of-2 multipath route selection (dendrite-PR#685)

    • Background tasks to populate NAT entries of sleds and instances during normal and unexpected restarts (omicron#3631)

    • Better handling of racing VM suspend conditions (propolis#559, propolis#561)

    • Better handling of project resource usage mismatch conditions (omicron#4426)

  • Storage backend reliability and performance improvements (crucible-PR#1014, crucible#1038, crucible#1021, crucible-PR#991, crucible-PR#1019, crucible-PR#1047)

Firmware update:

  • None in this release

Known Behavior and Limitations

End-user features

Feature Area

Known Issue/Limitation

Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The current procedures to unstick a canceled disk import are not obvious to CLI users.

omicron#2987

Image/snapshot management

Image upload sometimes stalls with HTTP/2 on Firefox.

omicron#3559

Image/snapshot management

Unable to create snapshots for disks attached to stopped instances. As a workaround, users can detach a disk temporarily for snapshotting and re-attach it to the instance afterwards.

omicron#3289

Image/snapshot management

The ability to modify image metadata is not available at this time.

omicron#2800

Instance orchestration

The ability to select which SSH keys to be passed to a new instance is not available at this time.

omicron#3056

Instance orchestration

Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures.

omicron#3480, omicron#2483

Instance orchestration

Disk volume backend repair may fail to complete under heavy large write workload, preventing instances from starting or stopping.

crucible#837

Instance orchestration

Instances no longer transition to the failed state when the propolis zone has crashed or is gone.

omicron#4709

Telemetry

Guest VM vcpu and memory metrics are unavailable at this time.

-

VPC and routing

Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases.

omicron#2232

Operator features

Feature Area

Known Issue/Limitation

Issue Number

Access control

Device tokens do not expire.

omicron#2302

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

omicron#2035

Control plane

When sleds attached to the switches are restarted outside of rack cold-start, a full rack power cycle may be required to re-propagate sled NAT configurations.

omicron#3631

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Network management

End users cannot query the names of non-default IP pools. The information needs to be provided by the administrators manually at this time.

omicron#2148

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

omicron#2587

     
v4

Important Notes

This release includes bug fixes that are essential for configuring BGP. It also includes an OpenSSL update that has no impact on the product features. If you do not plan to use BGP for rack networking, you may consider skipping this release.

There is also additional operator documentation on how to configure BGP and a new version of the CLI binaries that support the BGP configuration API.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 3 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) should remain intact after the software update.

New Features

Changes in this release:

  • OpenSSL version upgrade from 3.0.11 to 3.0.12 for CVE-2023-5363 (see also security advisory).

  • Rack networking configuration fixes and improvements (omicron#4406).

    • A port settings update resulted in the ASIC and switch-zone updates going to different sidecars.

    • Determine nexthop dynamically based on peer connection.

    • Improve link configurability (technician tool to set/clear PRBS mode, better identification scheme, allowing manual lane selection)

Firmware update:

  • None in this release

Known Behavior and Limitations

End-user features

Feature Area

Known Issue/Limitation

Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The current procedures to unstick a canceled disk import are not obvious to CLI users.

omicron#2987

Image/snapshot management

Image upload sometimes stalls with HTTP/2.

omicron#3559

Image/snapshot management

Unable to create snapshots for disks attached to stopped instances.

omicron#3289

Image/snapshot management

The ability to modify image metadata is not available at this time.

omicron#2800

Instance orchestration

The ability to select which SSH keys to be passed to a new instance is not available at this time.

omicron#3056

Instance orchestration

Disk create or instance start requests under high concurrency may fail to complete. Users can reduce the concurrency level to avoid the error or retry failed requests.

omicron#3304

Instance orchestration

Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures.

omicron#3480, omicron#2483

Instance orchestration

Instances sometimes fail to boot up when they are created under very high concurrency. Rebooting the instances should allow the guest OS to come up.

propolis#535

Instance orchestration

Disk volume backend is occasionally stuck in repair state, preventing instances from starting or stopping.

crucible#837

Telemetry

Guest VM cpu and memory metrics are unavailable at this time.

-

VPC and routing

Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases.

omicron#2232

Operator features

Feature Area

Known Issue/Limitation

Issue Number

Access control

Device tokens do not expire.

omicron#2302

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

omicron#2035

Control plane

When sleds attached to the switches are restarted outside of rack cold-start, a full rack power cycle may be required to re-propagate sled NAT configurations.

omicron#3631

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Network management

End users cannot query the names of non-default IP pools. The ability to set up and query different IP pools (e.g., per-project IP pools) will be available in a future release.

omicron#2148

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

omicron#2587

     
v3

Important Notes

  1. Starting with this release, version numbers follow an integer numbering scheme. Upgrading from v1.0.2 to this release, version 3, does not require a system reset or other special handling.

  2. Please note that there are two functional changes around the instance lifecycle:

    • The vCPU and memory resources of stopped instances no longer count toward current utilization in the web UI and system metrics API.

    • The transient instance state rebooting may not be reflected in the UI or API if the reboot operation completes almost instantly. The instance may be considered running throughout the process.

      These changes allow more accurate reporting of resource utilization and reduced concurrent instance provisioning errors.

  3. There is a change in the ClickHouse timeseries key generation method. Historical metrics are dropped as part of the software update. Utilization metrics will be re-populated once a new instance/disk lifecycle event (e.g., create, start, delete) has taken place.
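The first lifecycle change in note 2 means utilization is computed from instances that are not stopped. A minimal sketch of that calculation follows; the `Instance` type, its field names, and the GiB unit are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical summary of an instance for utilization accounting.
    name: str
    state: str       # e.g., "running" or "stopped"
    ncpus: int
    memory_gib: int


def current_utilization(instances):
    """Sum vCPU and memory for every instance that is not stopped."""
    active = [i for i in instances if i.state != "stopped"]
    return {
        "cpus": sum(i.ncpus for i in active),
        "memory_gib": sum(i.memory_gib for i in active),
    }
```

With one running 4-vCPU instance and one stopped 8-vCPU instance, only the 4 vCPUs of the running instance are counted.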

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 1.0.2 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g., projects, users, instances) should remain intact after the software update.

New Features

This release includes a number of performance and VM workflow reliability improvements.

Bug fixes:

  • Instances provisioned in bulk from the same image were stuck in stopping state due to back pressure on read-only parent (crucible#9696, crucible-PR#984)

  • Spurious errors were returned for snapshot or disk deletions after multiple delete requests on the same snapshot (omicron#3866, omicron-PR#920)

  • 500 errors were returned when attempting to delete images (omicron#3033)

  • For firewall rules using VPC as target, traffic filtering was based on an instance’s private IP only and not its public IP (opte#380)

  • Instance DNS was hardcoded to 8.8.8.8 (opte#390)

  • Disk replica placement did not maximize physical disk spread over separate sleds (omicron#3702)

  • Guest OS panicked upon executing lshw -C disk (propolis#520)

  • Disk run-time metrics were not captured after a timeseries format change (crucible#942)

  • User with silo collaborator role was unable to start an instance (omicron#4272)

  • Unsuccessful deletion of running instances could still remove public IP (omicron#2842)

  • IP Pool API get responses now include the is_default value (omicron#4005)

  • Silo TLS certificate can now be specified in Silo Create UI (console#1736)

  • Additional stratum checks for more reliable NTP synchronization (omicron-PR#4119)

  • Improved physical disk out-of-space handling (crucible#861)

  • Improved back pressure handling under heavy disk write workload (crucible#902)

  • Added TLS certificate CN/SAN validation against rack external domain name (omicron#4045)

  • curl upgrade (from 8.3.0 to 8.4.0) for CVEs (helios#119)

Firmware update:

  • None in this release

Known Behavior and Limitations

End-user features

Feature Area

Known Issue/Limitation

Issue Number

Image/snapshot management

Disks in importing_from_bulk_writes state cannot be deleted directly. The current procedures to unstick a canceled disk import are not obvious to CLI users.

omicron#2987

Image/snapshot management

Image upload sometimes stalls with HTTP/2.

omicron#3559

Image/snapshot management

Unable to create snapshots for disks attached to stopped instances.

omicron#3289

Image/snapshot management

The ability to modify image metadata is not available at this time.

omicron#2800

Instance orchestration

The ability to select which SSH keys to be passed to a new instance is not available at this time.

omicron#3056

Instance orchestration

Disk create or instance start requests under high concurrency may fail to complete. Users can reduce the concurrency level to avoid the error or retry failed requests.

omicron#3304

Instance orchestration

Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures.

omicron#3480, omicron#2483

Instance orchestration

Instances sometimes fail to boot up when they are created under very high concurrency. Rebooting the instances should allow the guest OS to come up.

propolis#535

Instance orchestration

Disk volume backend is occasionally stuck in repair state, preventing instances from starting or stopping.

crucible#837

Telemetry

Guest VM cpu and memory metrics are unavailable at this time.

-

VPC and routing

Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases.

omicron#2232

Operator features

Feature Area

Known Issue/Limitation

Issue Number

Access control

Device tokens do not expire.

omicron#2302

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

omicron#2035

Control plane

When sleds attached to the switches are restarted outside of rack cold-start, a full rack power cycle may be required to re-propagate sled NAT configurations.

omicron#3631

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Network management

End users cannot query the names of non-default IP pools. The ability to set up and query different IP pools (e.g., per-project IP pools) will be available in a future release.

omicron#2148

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

omicron#2587

     
v1.0.2

Important Notes

This patch release does not require a system reset. All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 1.0.1 is supported. We recommend shutting down all running instances on the rack before the software update commences.

New Features

This release includes a number of performance improvements and a new capability for multi-tenant IP pool management.

  • Reduced virtual disk read/write latencies

  • Improved instance provisioning performance

  • Maximum VM instance size limit raised to 64 vcpus and 256 GiB memory

  • Ability for operator to define silo-specific external IP pools (see ip_pool_create)

  • Instance external IP address automatically allocated from silo IP pool if one is configured

Bug fixes:

  • Booting up an instance after rack power-cycle required an extra stop-start cycle to regain network connectivity (omicron#3813)

  • Spurious errors returned after successful snapshot or disk deletions (omicron#3866)

  • VM start operation was prohibited when metrics subsystem was unable to serve requests (propolis#497)

  • System clock sync with NTP was declared prematurely in some cases (omicron#3831)

Firmware update:

  • None in this release

Known Behavior and Limitations

End-user features

Feature Area

Known Issue/Limitation

Issue Number

Firewall rules

Firewall rules using VPC as target should allow/deny traffic based on an instance’s private IP only and not apply the rules against the instance’s public IP. As a workaround, use subnet as target to permit only intra-subnet traffic without allowing inbound traffic from other IP addresses on the same public network as the instance.

opte#380

Image/snapshot management

Image upload sometimes stalls with HTTP/2.

omicron#3559

Image/snapshot management

Unable to create snapshots for disks attached to stopped instances.

omicron#3289

Image/snapshot management

The ability to delete images is not available at this time.

omicron#3033

Image/snapshot management

The ability to modify image metadata is not available at this time.

omicron#2800

Instance orchestration

The ability to select which SSH keys to be passed to a new instance is not available at this time.

omicron#3734

Instance orchestration

Concurrent instance provisioning requests (e.g., as typically happens with programmatic orchestration such as Terraform) may return 500 errors. Users can reduce the concurrency level to avoid the error or retry the failed requests.

omicron#3304

Instance orchestration

Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures.

omicron#3480, omicron#2483

Telemetry

Guest VM cpu and memory metrics are unavailable at this time.

-

VPC and routing

Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases.

omicron#2232

Operator features

Feature Area

Known Issue/Limitation

Issue Number

Access control

Device tokens do not expire.

omicron#2302

Control plane

Sled and physical storage availability status are not available in the inventory UI and API yet.

omicron#2035

Control plane

When switch zones are bounced outside of rack cold-start, a full rack power cycle is required to re-propagate sled NAT configurations.

omicron#3631

Control plane

Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

-

Control plane

Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

-

Network management

Public IP addresses used for VM instances are currently assigned from a single pool named “default”. End users do not have the ability to see the names of other IP pools. The ability to set up and query per-project IP pools will be available in a future release.

omicron#2148

Network management

Routing between the rack and on-premises L2 networks is currently restricted to static routes only. The use of Border Gateway Protocol (BGP) for dynamic route configuration will be supported in upcoming releases.

maghemite#27

Telemetry

Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

-

User management

User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects.

omicron#2587

     
v1.0.1

Important Notes

This is a patch release aimed at improving fault tolerance. A factory reset will be required to make use of the new features.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 1.0.0 is not supported.

New Features

This release covers a number of fault tolerance related improvements. There are no new user features.

  • New bootstore for persistent rack initialization data

  • Improved handling of network and service configurations during rack, sled, and service restart

  • Encryption key generation hardening

  • Improved storage space management for system log and dump files

Bug fixes:

  • I2C temperature error handling for U.2s could be improved (hubris-pr#1465)

  • IPv6 RIP router was enabled by default in general deployments. (omicron-pr#3736)

  • Identity provider descriptor endpoint was not resolvable in Nexus. (omicron#3724)

  • Sled-agent leaked contracts when executing commands from non-global zones. (omicron#3753)

  • Pantry service was not deployed as a cluster. (omicron#3609)

  • Backplane ports were not defaulted to RS FEC. (omicron-pr#3714)

  • ipadm did not allow creation of point to point links. (illumos#15806)

  • TSC sync produced unreliable results with caches disabled. (illumos#15810)

  • bhyve could take more care around VM_MAXCPU. (illumos#15812)

  • fp_lwp_init allocated under p_lock, leading to deadlock under memory pressure. (stlouis#463)

Firmware update:

  • Chelsio cxgbe firmware is updated from version 1.27.1.0 to 1.27.4.0. (illumos#15804)

  • AMD microcode is updated from version 20230414 to 20230719. (illumos#15811)

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Firewall rules | Firewall rules using VPC as target should allow/deny traffic based on an instance’s private IP only and should not be applied against the instance’s public IP. As a workaround, use a subnet as the target to permit only intra-subnet traffic without allowing inbound traffic from other IP addresses on the same public network as the instance. | opte#380
Image/snapshot management | Image upload sometimes stalls with HTTP/2. | omicron#3559
Image/snapshot management | Unable to create snapshots for disks attached to stopped instances. | omicron#3289
Image/snapshot management | Spurious errors after snapshot or disk deletion has been completed successfully. | crucible#824
Image/snapshot management | The ability to delete images is not available at this time. | omicron#3033
Image/snapshot management | The ability to modify image metadata is not available at this time. | omicron#2800
Instance orchestration | The maximum instance size is currently limited to 32 vCPUs and 64 GiB of memory, and up to seven 1023 GiB disks. | propolis#474, omicron#3129
Instance orchestration | The ability to select which SSH keys are passed to a new instance is not available at this time. | omicron#3734
Instance orchestration | Concurrent instance provisioning requests (e.g., as typically happens with programmatic orchestration such as Terraform) may return 500 errors. Users can reduce the concurrency level to avoid the error or retry the failed requests. | omicron#3304
Instance orchestration | Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures. | omicron#3480, omicron#2483
Instance orchestration | Booting up an instance after rack power-cycle currently requires an extra stop-start cycle to regain network connectivity. | omicron#3813
Telemetry | Guest VM CPU and memory metrics are unavailable at this time. | -
VPC and routing | Inter-subnet traffic routing is not available by default. Routers and routing rules will be supported in future releases. | omicron#2232
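
The workaround noted above for concurrent provisioning requests (lower the concurrency level, retry failed requests) can be sketched in Python as follows. This is a minimal illustration, not part of any Oxide SDK: `ServerError`, `with_retries`, and `provision_all` are hypothetical names, and each request is assumed to be a callable that raises `ServerError` when the API returns a 500 response.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 500 response from the API."""


def with_retries(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds, backing off between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ServerError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so retries do not arrive together.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


def provision_all(requests, max_workers=2, **retry_opts):
    """Run provisioning callables with a small, fixed concurrency level."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(with_retries, r, **retry_opts) for r in requests]
        # Results are returned in the same order as the input requests.
        return [f.result() for f in futures]
```

The `max_workers` value and the backoff parameters are arbitrary starting points; a suitable concurrency level depends on the deployment.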

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status is not yet available in the inventory UI and API. | omicron#2035
Control plane | When switch zones are bounced outside of rack cold-start, a full rack power cycle is required to re-propagate sled NAT configurations. | omicron#3631
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | -
Network management | Public IP addresses used for VM instances are currently assigned from a single pool named “default”. End-users cannot see the names of other IP pools. The ability to set up and query per-project IP pools will be available in a future release. | omicron#2148
Network management | Routing between the rack and on-premises L2 networks is currently restricted to static routes only. The use of Border Gateway Protocol (BGP) for dynamic route configuration will be supported in upcoming releases. | maghemite#27
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587

     
v1.0.0

Important Notes

This is the first release of Oxide Computer Model 0, also known as the “0x1” rack. In future release notes, we will include detailed changelogs for each software component to highlight changes such as new features, bug fixes, and upgrade impact.

System Requirements

Oxide Computer is an appliance built with tightly integrated hardware and software components. It requires the following supporting services external to the rack for it to be operational:

  • L2 network that supports bidirectional port forwarding

  • NTP service for system time synchronization

  • Domain name service and a delegated subdomain for use by the rack

  • Identity provider that supports SAML authentication

Supported systems for the official clients:

  • Browsers for web console: Chrome, Edge, Firefox, Safari

  • OSes for CLI: Linux, macOS, Windows

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. The requirement may change in future releases.

Features

The Oxide rack covers the complete hardware and software stack for deploying and managing virtual machine instances and related virtualized resources such as:

  • persistent block storage in the form of detachable virtual disks

  • snapshots and disk images

  • private network interface for intra-subnet access

  • public network interface for inbound access

  • built-in NAT service for outbound public network access

  • virtual private cloud (VPC) and firewall capabilities

Many popular Linux distributions, such as Ubuntu, Debian, and CentOS, are supported as guest operating systems for VM instances. Microsoft Windows server and desktop operating systems will be fully supported in upcoming releases.

The key features for operators and administrators include:

  • organizing and isolating virtual resources by tenants and by projects

  • integrating with SAML-based identity provider for identity and access management

  • managing IP address allocation

  • basic metrics and visualizations for capacity utilization

The key management features for end-users include:

  • self-service orchestration via the web console and API

  • command-line tools including Oxide CLI and Terraform

  • capabilities to build custom integration tools using Oxide SDK

  • project-level resource utilization and disk I/O metrics

The rack also comes with built-in security features to ensure all hardware and software are genuine Oxide products:

  • purpose-built hardware root of trust (RoT) – present on every Oxide server and switch – cryptographically validates that its own firmware is genuine and unmodified

  • encryption of data at rest via internal key management system built on the RoT

  • trust quorum establishment at boot time to ensure the cryptographically-derived rack secret is verified before unlocking storage

More details about Oxide Computer can be found in the product documentation.

Known Behavior and Limitations

End-user features

Feature Area | Known Issue/Limitation | Issue Number
Firewall rules | Firewall rules using VPC as target should allow/deny traffic based on an instance’s private IP only and should not be applied against the instance’s public IP. As a workaround, use a subnet as the target to permit only intra-subnet traffic without allowing inbound traffic from other IP addresses on the same public network as the instance. | opte#380
Image/snapshot management | Image upload sometimes stalls with HTTP/2. | omicron#3559
Image/snapshot management | Unable to create snapshots for disks attached to stopped instances. | omicron#3289
Image/snapshot management | Spurious errors after snapshot or disk deletion has been completed successfully. | crucible#824
Image/snapshot management | The ability to delete images is not available at this time. | omicron#3033
Image/snapshot management | The ability to modify image metadata is not available at this time. | omicron#2800
Instance orchestration | The maximum instance size is currently limited to 32 vCPUs and 64 GiB of memory, and up to seven 1023 GiB disks. | propolis#474
Instance orchestration | The ability to select which SSH keys are passed to a new instance is not available at this time. | omicron#3734
Instance orchestration | Concurrent instance provisioning requests (e.g., as typically happens with programmatic orchestration such as Terraform) may return 500 errors. Users can reduce the concurrency level to avoid the error or retry the failed requests. | omicron#3304
Instance orchestration | Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures. | omicron#3480, omicron#2483
Telemetry | Guest VM CPU and memory metrics are unavailable at this time. | -
VPC and routing | Inter-subnet traffic routing is not available by default. Routers and routing rules will be supported in future releases. | omicron#2232
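
The instance size limits listed above (32 vCPUs, 64 GiB of memory, up to seven disks of at most 1023 GiB each) lend themselves to a client-side check before a provisioning request is sent. The sketch below is only an illustration: `InstanceSpec` and `limit_violations` are hypothetical names, not API types, and the constants simply restate the limits from this release, which may change later.

```python
from dataclasses import dataclass, field

# Limits quoted in the known limitations above; they may change in later releases.
MAX_VCPUS = 32
MAX_MEMORY_GIB = 64
MAX_DISKS = 7
MAX_DISK_GIB = 1023


@dataclass
class InstanceSpec:
    """Hypothetical description of a requested instance (not an API type)."""
    vcpus: int
    memory_gib: int
    disk_sizes_gib: list = field(default_factory=list)


def limit_violations(spec):
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    if spec.vcpus > MAX_VCPUS:
        problems.append(f"{spec.vcpus} vCPUs exceeds the limit of {MAX_VCPUS}")
    if spec.memory_gib > MAX_MEMORY_GIB:
        problems.append(f"{spec.memory_gib} GiB of memory exceeds the limit of {MAX_MEMORY_GIB}")
    if len(spec.disk_sizes_gib) > MAX_DISKS:
        problems.append(f"{len(spec.disk_sizes_gib)} disks exceeds the limit of {MAX_DISKS}")
    oversized = [size for size in spec.disk_sizes_gib if size > MAX_DISK_GIB]
    if oversized:
        problems.append(f"{len(oversized)} disk(s) exceed {MAX_DISK_GIB} GiB")
    return problems
```

Calling `limit_violations(InstanceSpec(vcpus=48, memory_gib=32))` would report only the vCPU count, so a request can be corrected before it reaches the API.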

Operator features

Feature Area | Known Issue/Limitation | Issue Number
Access control | Device tokens do not expire. | omicron#2302
Control plane | Sled and physical storage availability status is not yet available in the inventory UI and API. | omicron#2035
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | -
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | -
Network management | Public IP addresses used for VM instances are currently assigned from a single pool named “default”. End-users cannot see the names of other IP pools. The ability to set up and query per-project IP pools will be available in a future release. | omicron#2148
Network management | Routing between the rack and on-premises L2 networks is currently restricted to static routes only. The use of Border Gateway Protocol (BGP) for dynamic route configuration will be supported in upcoming releases. | maghemite#27
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | -
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587
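
The manual offboarding step described above (removing IAM roles granted directly to a departed user) amounts to filtering that user's entries out of each silo or project policy. The sketch below assumes a policy is a dictionary with a `role_assignments` list whose entries carry `identity_id`, `identity_type`, and `role_name` fields. That shape and the `remove_user_roles` helper are assumptions for illustration and should be checked against the API reference before use.

```python
def remove_user_roles(policy, user_id):
    """Return a copy of policy without roles granted directly to user_id.

    Assumed (not confirmed) policy shape:
    {"role_assignments": [{"identity_id": ..., "identity_type": ..., "role_name": ...}]}
    """
    kept = [
        assignment
        for assignment in policy.get("role_assignments", [])
        # Group-based grants are left alone; those are handled in the identity provider.
        if not (
            assignment.get("identity_type") == "silo_user"
            and assignment.get("identity_id") == user_id
        )
    ]
    return {**policy, "role_assignments": kept}
```

The function returns a new dictionary and leaves the input untouched, so the original policy can be kept for comparison before the updated one is written back.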