Important Notes
Starting from this release, the numbering convention follows an integer versioning scheme. Upgrading from v1.0.2 to this release (version 3) does not require a system reset or other special handling.
Please note that there are two functional changes around the instance lifecycle:
- The vCPU and memory resources of `stopped` instances no longer count toward current utilization in the web UI and system metrics API.
- The transient instance state `rebooting` may not be reflected in the UI or API if the reboot operation completes almost instantly. The instance may be considered `running` throughout the process (see the sketch below).

These changes allow more accurate reporting of resource utilization and reduce concurrent instance provisioning errors.
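Because the `rebooting` state can be skipped entirely, tooling should not wait for it to appear. Below is a minimal sketch of sampling an instance's state around a reboot, assuming the v1 instance view endpoint and a `run_state` field in the response; the base URL, token, project, and instance names are placeholders.

```python
# Poll an instance's run_state around a reboot; with a fast reboot the
# transient "rebooting" state may never be observed and the instance can
# report "running" for the whole window. Assumes the v1 instance view
# endpoint and a run_state field; URL, token, and names are placeholders.
import time
import requests

BASE = "https://oxide.sys.example.com"     # rack external API (placeholder)
HEADERS = {"Authorization": "Bearer ..."}  # API token (placeholder)

def run_state(project: str, instance: str) -> str:
    resp = requests.get(
        f"{BASE}/v1/instances/{instance}",
        params={"project": project},
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["run_state"]

# Trigger a reboot out of band, then sample the state for ~10 seconds.
seen = set()
for _ in range(20):
    seen.add(run_state("demo-project", "demo-instance"))
    time.sleep(0.5)
print(sorted(seen))  # "rebooting" may be absent even though a reboot ran
```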
There is a change in the ClickHouse timeseries key generation method. Historical metrics are dropped as part of the software update. Utilization metrics will be re-populated once a new instance/disk lifecycle event (e.g., create, start, delete) has taken place.
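To confirm metrics are flowing again after the update, you can query a utilization timeseries and expect an empty result until a new lifecycle event lands. A minimal sketch, assuming a `/v1/system/metrics/{metric}` endpoint that accepts an RFC 3339 time range and an assumed `cpus_provisioned` metric name; the URL and token are placeholders.

```python
# Query a utilization timeseries after the update; expect an empty list
# until a new instance/disk lifecycle event repopulates the metrics.
# Assumes a /v1/system/metrics/{metric} endpoint taking an RFC 3339 time
# range and a cpus_provisioned metric name; URL and token are placeholders.
from datetime import datetime, timedelta, timezone
import requests

BASE = "https://oxide.sys.example.com"     # placeholder
HEADERS = {"Authorization": "Bearer ..."}  # placeholder token

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

resp = requests.get(
    f"{BASE}/v1/system/metrics/cpus_provisioned",  # assumed metric name
    params={"start_time": start.isoformat(), "end_time": end.isoformat()},
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()
print(resp.json().get("items", []))  # [] until a new lifecycle event lands
```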
System Requirements
Please refer to the v1.0.0 release notes.
Installation
Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.
Upgrade Compatibility
Upgrade from version 1.0.2 is supported. We recommend shutting down all running instances on the rack before the software update commences.
All existing setup and data (e.g., projects, users, instances) should remain intact after the software update.
New Features
This release includes a number of performance and VM workflow reliability improvements.
- Support for Border Gateway Protocol (BGP); see the updated API documentation for more details about the setup
- Improved virtual disk read/write performance (crucible#872, crucible#981, crucible#586)
- Ping API endpoint for control plane connectivity healthcheck (omicron#3923); a usage sketch follows this list
- Silo utilization summary table (console#1783)
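The new ping endpoint gives a quick connectivity check with no other setup. A minimal sketch, assuming `GET /v1/ping` returns a JSON body with `"status": "ok"`; the base URL is a placeholder for your rack's external API address.

```python
# Minimal control plane connectivity healthcheck using the new ping
# endpoint. Assumes GET /v1/ping returns {"status": "ok"}; the base URL
# is a placeholder for the rack's external API address.
import requests

BASE = "https://oxide.sys.example.com"  # placeholder

resp = requests.get(f"{BASE}/v1/ping", timeout=5)
resp.raise_for_status()
assert resp.json().get("status") == "ok"
print("control plane reachable")
```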
Bug fixes:
- Instances provisioned in bulk from the same image were stuck in `stopping` state due to back pressure on read-only parent (crucible#9696, crucible-PR#984)
- Spurious errors returned for snapshot or disk deletions after multiple delete requests on the same snapshot (omicron#3866, omicron-PR#920)
- 500 errors were returned when attempting to delete images (omicron#3033)
- When firewall rules used a VPC as the target, traffic filtering was based on an instance's private IP only, not its public IP (opte#380)
- Instance DNS was hardcoded to 8.8.8.8 (opte#390)
- Disk replica placement did not maximize physical disk spread over separate sleds (omicron#3702)
- Guest OS panicked upon executing `lshw -C disk` (propolis#520)
- Disk run-time metrics were not captured after a timeseries format change (crucible#942)
- User with silo collaborator role was unable to start an instance (omicron#4272)
- Unsuccessful deletion of running instances could still remove public IP (omicron#2842)
- IP Pool API get responses now include the `is_default` value (omicron#4005)
- Silo TLS certificate can now be specified in the Silo Create UI (console#1736)
- Additional stratum checks for more reliable NTP synchronization (omicron-PR#4119)
- Improved physical disk out-of-space handling (crucible#861)
- Improved back pressure handling under heavy disk write workload (crucible#902)
- Added TLS certificate CN/SAN validation against rack external domain name (omicron#4045); a verification sketch follows this list
- `curl` upgrade (from 8.3.0 to 8.4.0) for CVEs (helios#119)
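To see the kind of thing the new CN/SAN validation checks, you can inspect the SAN entries presented by the rack's external endpoint. A minimal stdlib-only sketch; the host name is a placeholder, and it assumes the certificate passes default verification.

```python
# Inspect the SAN entries on the rack's external TLS certificate,
# mirroring the kind of CN/SAN-vs-external-domain check now performed at
# certificate upload. Host name is a placeholder; stdlib only.
import socket
import ssl

HOST = "oxide.sys.example.com"  # rack external domain (placeholder)

ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
print("SAN DNS names:", sans)
# A certificate rejected by the new validation would not list the rack's
# external domain (or a matching wildcard) here.
```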
Firmware update:
None in this release
Known Behavior and Limitations
End-user features
Feature Area | Known Issue/Limitation | Issue Number |
---|---|---|
Image/snapshot management | Disk in | |
Image/snapshot management | Image upload sometimes stalls with HTTP/2. | |
Image/snapshot management | Unable to create snapshots for disks attached to stopped instances. | |
Image/snapshot management | The ability to modify image metadata is not available at this time. | |
Instance orchestration | The ability to select which SSH keys to be passed to a new instance is not available at this time. | |
Instance orchestration | Disk create or instance start requests under high concurrency may fail to complete. Users can reduce the concurrency level to avoid the error or retry failed requests (see the retry sketch after this table). | |
Instance orchestration | Instance or disk provisioning requests may fail due to unhandled sled or storage failure on rare occasions. Users can retry the requests to work around the failures. | |
Instance orchestration | Instances sometimes fail to boot up when they are created under very high concurrency. Rebooting the instances should allow the guest OS to come up. | |
Instance orchestration | Disk volume backend is occasionally stuck in repair state, preventing instances from starting or stopping. | |
Telemetry | Guest VM CPU and memory metrics are unavailable at this time. | - |
VPC and routing | Inter-subnet traffic routing is not available by default. Router and routing rules will be supported in future releases. | |
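For the concurrency-related failures noted above, a simple client-side retry with exponential backoff is usually sufficient. A generic sketch; `request_fn` stands in for any Oxide API call and is a placeholder.

```python
# Generic retry-with-exponential-backoff wrapper for requests that fail
# under high concurrency (e.g., bulk disk create or instance start).
# request_fn is a placeholder for any API call that raises on failure.
import random
import time

def with_retries(request_fn, attempts=5, base_delay=1.0):
    for attempt in range(attempts):
        try:
            return request_fn()
        except Exception:  # narrow this to retryable errors in real use
            if attempt == attempts - 1:
                raise
            # Exponential backoff with a little jitter to avoid herding.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))

# Usage (hypothetical helper): disk = with_retries(lambda: create_disk(...))
```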
Operator features
Feature Area | Known Issue/Limitation | Issue Number |
---|---|---|
Access control | Device tokens do not expire. | |
Control plane | Sled and physical storage availability status are not available in the inventory UI and API yet. | |
Control plane | When sleds attached to the switches are restarted outside of rack cold-start, a full rack power cycle may be required to re-propagate sled NAT configurations. | |
Control plane | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | - |
Control plane | Operator-driven instance migration across sleds is currently unavailable. Instance migrations need to be performed by Oxide technicians. | - |
Network management | End users cannot query the names of non-default IP pools. The ability to set up and query different IP pools (e.g., per-project IP pools) will be available soon in future releases. | |
Telemetry | Hardware metrics such as temperatures, fan speeds, and power consumption are not exposed to the control plane at this time. | - |
User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects (see the sketch after this table). | |
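One way to script the role cleanup is to rewrite each policy without the departed user's assignments. A minimal sketch, assuming `GET`/`PUT /v1/projects/{project}/policy` endpoints whose body carries a `role_assignments` list keyed by `identity_id`; the URL, token, project, and user ID are placeholders, and the same pattern would apply at the silo level.

```python
# Strip a departed user's directly granted roles from a project policy.
# Assumes GET/PUT /v1/projects/{project}/policy with a role_assignments
# list keyed by identity_id; URL, token, project, and user ID are
# placeholders. The same pattern applies to silo-level policies.
import requests

BASE = "https://oxide.sys.example.com"            # placeholder
HEADERS = {"Authorization": "Bearer ..."}          # placeholder token
PROJECT = "demo-project"                           # placeholder
USER_ID = "11111111-2222-3333-4444-555555555555"   # offboarded user (placeholder)

url = f"{BASE}/v1/projects/{PROJECT}/policy"
resp = requests.get(url, headers=HEADERS, timeout=10)
resp.raise_for_status()
policy = resp.json()

# Keep every assignment that does not belong to the offboarded user.
policy["role_assignments"] = [
    ra for ra in policy["role_assignments"] if ra["identity_id"] != USER_ID
]

requests.put(url, json=policy, headers=HEADERS, timeout=10).raise_for_status()
```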