Important Notes

  1. Starting from this release, the release numbering follows an integer versioning scheme. Upgrading from v1.0.2 to this release (version 3) does not require a system reset or other special handling.

  2. Please note that there are two functional changes around the instance lifecycle:

    • The vCPU and memory resources of stopped instances no longer count toward current utilization in the web UI and system metrics API.

    • The transient rebooting instance state may not be reflected in the UI or API if the reboot operation completes almost instantly; in that case, the instance may appear as running throughout the process.

      These changes allow more accurate reporting of resource utilization and reduce concurrent instance provisioning errors.

  3. There is a change in the ClickHouse timeseries key generation method. Historical metrics are dropped as part of the software update. Utilization metrics will be re-populated once a new instance/disk lifecycle event (e.g. create, start, delete) has taken place.
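The reboot-visibility change in note 2 has a client-side implication: code that polls instance state after requesting a reboot should not wait to observe the transient rebooting state, since it may never appear. A minimal sketch in Python, assuming a hypothetical get_instance_state callable standing in for a real API client call (it is not part of the Oxide API):

```python
import time

def wait_for_reboot(get_instance_state, timeout_s=120, poll_interval_s=2.0):
    """Wait for a rebooted instance to settle back into 'running'.

    `get_instance_state` is a hypothetical zero-argument callable returning
    the current instance state string (e.g. "running", "rebooting").
    Because the transient "rebooting" state may never be observable when the
    reboot completes almost instantly, "running" is treated as terminal
    whether or not "rebooting" was ever seen.  Returns True if the transient
    state was observed, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    saw_rebooting = False
    while time.monotonic() < deadline:
        state = get_instance_state()
        if state == "rebooting":
            saw_rebooting = True
        elif state == "running":
            # Success regardless of whether the transient state was visible.
            return saw_rebooting
        time.sleep(poll_interval_s)
    raise TimeoutError("instance did not return to 'running' in time")
```

Either return value indicates a completed reboot; only a timeout should be treated as a failure.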

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 1.0.2 is supported. We recommend shutting down all running instances on the rack before the software update commences.

All existing setup and data (e.g. projects, users, instances) should remain intact after the software update.

New Features

This release includes a number of performance and VM workflow reliability improvements.

Bug fixes:

  • Instances provisioned in bulk from the same image could become stuck in the stopping state due to back pressure on the read-only parent (crucible#9696, crucible-PR#984)

  • Spurious errors returned for snapshot or disk deletions after multiple delete requests on the same snapshot (omicron#3866, omicron-PR#920)

  • 500 errors were returned when attempting to delete images (omicron#3033)

  • Firewall rules with a VPC as the target filtered traffic based on an instance’s private IP only, not its public IP (opte#380)

  • Instance DNS was hardcoded to 8.8.8.8 (opte#390)

  • Disk replica placement did not maximize physical disk spread over separate sleds (omicron#3702)

  • Guest OS panicked upon executing lshw -C disk (propolis#520)

  • Disk run-time metrics were not captured after a timeseries format change (crucible#942)

  • Users with the silo collaborator role were unable to start an instance (omicron#4272)

  • Unsuccessful deletion of a running instance could still remove its public IP (omicron#2842)

  • IP Pool API get responses now include the is_default value (omicron#4005)

  • Silo TLS certificate can now be specified in Silo Create UI (console#1736)

  • Additional stratum checks for more reliable NTP synchronization (omicron-PR#4119)

  • Improved physical disk out-of-space handling (crucible#861)

  • Improved back pressure handling under heavy disk write workload (crucible#902)

  • Added TLS certificate CN/SAN validation against rack external domain name (omicron#4045)

  • Upgraded curl from 8.3.0 to 8.4.0 to address CVEs (helios#119)

Firmware update:

  • None in this release

Known Behavior and Limitations

End-user features

  • Image/snapshot management: A disk in the importing_from_bulk_writes state cannot be deleted directly. The current procedures to unstick a canceled disk import are not obvious to CLI users. (omicron#2987)

  • Image/snapshot management: Image upload sometimes stalls with HTTP/2. (omicron#3559)

  • Image/snapshot management: Snapshots cannot be created for disks attached to stopped instances. (omicron#3289)

  • Image/snapshot management: The ability to modify image metadata is not available at this time. (omicron#2800)

  • Instance orchestration: The ability to select which SSH keys are passed to a new instance is not available at this time. (omicron#3056)

  • Instance orchestration: Disk create or instance start requests under high concurrency may fail to complete. Users can reduce the concurrency level to avoid the error, or retry failed requests. (omicron#3304)

  • Instance orchestration: On rare occasions, instance or disk provisioning requests may fail due to unhandled sled or storage failures. Users can retry the requests to work around the failures. (omicron#3480, omicron#2483)

  • Instance orchestration: Instances sometimes fail to boot when created under very high concurrency. Rebooting the instances should allow the guest OS to come up. (propolis#535)

  • Instance orchestration: The disk volume backend is occasionally stuck in the repair state, preventing instances from starting or stopping. (crucible#837)

  • Telemetry: Guest VM CPU and memory metrics are unavailable at this time.

  • VPC and routing: Inter-subnet traffic routing is not available by default. Routers and routing rules will be supported in future releases. (omicron#2232)
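Several of the limitations above (omicron#3304, omicron#3480, omicron#2483) can be worked around by retrying the failed request. A minimal retry sketch with exponential backoff, assuming a hypothetical provision callable standing in for the actual disk create or instance start API call (it is not part of the Oxide API):

```python
import random
import time

def retry_provision(provision, attempts=4, base_delay_s=1.0):
    """Retry a provisioning call that may fail transiently.

    `provision` is a hypothetical zero-argument callable that raises an
    exception on failure and returns the provisioned resource on success;
    it stands in for an API client call such as a disk create or instance
    start.  Waits with exponential backoff plus jitter between attempts
    so that concurrent retries do not re-create the original contention.
    """
    for attempt in range(attempts):
        try:
            return provision()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            # Backoff doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay_s * (2 ** attempt)
            time.sleep(delay + random.uniform(0, base_delay_s))
```

Lowering the request concurrency, as suggested for omicron#3304, remains the first-choice mitigation; retries are a fallback for the residual failures.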

Operator features

  • Access control: Device tokens do not expire. (omicron#2302)

  • Control plane: Sled and physical storage availability status is not yet available in the inventory UI and API. (omicron#2035)

  • Control plane: When sleds attached to the switches are restarted outside of a rack cold start, a full rack power cycle may be required to re-propagate sled NAT configurations. (omicron#3631)

  • Control plane: Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians.

  • Control plane: Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians.

  • Network management: End users cannot query the names of non-default IP pools. The ability to set up and query different IP pools (e.g. per-project IP pools) will be available in future releases. (omicron#2148)

  • Telemetry: Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time.

  • User management: User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. (omicron#2587)