Event and Fault Management

Power Fault Detection and Management

The rectifiers of the Oxide rack can detect power faults and power supply changes in different parts of the system, proactively charge and discharge power supply rails, and transition between power states for safe rack shutdown and cold start. No custom configuration is required, and no data loss is expected as a result of a power outage.

The remote monitoring unit (RMU) in the Power Shelf Controller (PSC) is designed to monitor rack-level power consumption and PSU state of health, and to expose that information to the control plane. These capabilities will enable more advanced power management and monitoring features in future releases of the Oxide rack.

Note
The rear panel of the power shelf has a connector for an external battery shelf, which the Oxide rack does not use at this time.

Important
If power rectifiers need to be disconnected for any reason, they should all be removed at once so that the PSC can monitor power conditions accurately.

Understanding Status and Fault LEDs

All LEDs in the system are monochrome LEDs with three possible modes:

| Mode | Description |
|---|---|
| Solid On | Device or component is functioning properly |
| Solid Off | Device is not present, incorrectly inserted, or so mechanically broken that it cannot function |
| Blinking | Device needs attention or is being worked on |

A common cause of the “Solid Off” state is that a hot-serviceable component, such as a server sled, optical transceiver, or U.2 NVMe device, is not properly inserted and therefore cannot function.

Blinking serves as a combined service and fault indicator that identifies the device to be operated on. Here is an example of how the LED signal works during an SSD replacement:

  • If the LED is “Solid On” afterwards, the device is operational and the service has been completed.

  • If the LED remains “Solid Off”, the SSD is not fully inserted, or it is so broken that it cannot be powered on or detected at all.

  • If a new fault arises, the LED will blink again.
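The three LED modes and their meanings can be captured in a small lookup, which may be handy in a service checklist or runbook script. The following is only an illustrative sketch: the names `LedMode`, `MEANING`, and `interpret` are hypothetical and are not part of any Oxide software or API, and the description strings paraphrase the table above.

```python
from enum import Enum


class LedMode(Enum):
    """The three modes a monochrome status LED can show (hypothetical names)."""
    SOLID_ON = "Solid On"
    SOLID_OFF = "Solid Off"
    BLINKING = "Blinking"


# Meanings paraphrased from the mode table; not an official data source.
MEANING = {
    LedMode.SOLID_ON: "functioning properly",
    LedMode.SOLID_OFF: "not present, incorrectly inserted, or unable to function",
    LedMode.BLINKING: "needs attention or is being worked on",
}


def interpret(mode: LedMode) -> str:
    """Return the documented meaning of an observed LED mode."""
    return MEANING[mode]


if __name__ == "__main__":
    # Print every mode with its meaning, e.g. for a printed checklist.
    for mode in LedMode:
        print(f"{mode.value}: {interpret(mode)}")
```

In the SSD replacement example, an operator who observes the LED after the swap would look up the observed mode: “Solid On” means the service is complete, while “Solid Off” or “Blinking” means further attention is needed.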

Hardware Maintenance

In the initial version of the Oxide rack, all field-replaceable units (FRUs) are serviced through an Oxide technical support engagement. Commonly serviced units are designed to be hot-pluggable. In future releases, certain FRUs will become customer-replaceable units (CRUs).

Here is a list of FRUs serviceable by Oxide:

| Encompassing FRU | Component FRU | Hot-pluggable? |
|---|---|---|
| Gimlet | - | Yes |
| Gimlet | Gimlet U.2 Drive/Carrier | Yes |
| Gimlet | Gimlet Front Panel/Drive Cage | No |
| Gimlet | Air Shroud | No |
| Gimlet | DIMMs | No |
| Gimlet | M.2 Device | No |
| Gimlet | M.2 Heatsink | No |
| Gimlet | CPU | No |
| Gimlet | CPU Heatsink | No |
| Gimlet | Gimlet Individual Fans | No |
| Gimlet | Gimlet Storage Midplane (“Sharkfin”) | No |
| Sidecar | - | Yes |
| Sidecar | Sidecar Optical Transceiver (Single) | Yes |
| Sidecar | Sidecar Rear Fan | Yes |
| Sidecar | 4:1 Squid Cables | Yes |
| Sidecar | Sidecar to Gimlet PCIe Cable | Yes |
| Sidecar | PSC to Sidecar Cables | Yes |
| Sidecar | Sidecar to Sidecar Aux Cable | Yes |
| Sidecar | Sidecar Internal Cables | No |
| Power Shelf | - | No |
| Power Shelf | Power Shelf Rectifier | Yes |
| Power Shelf | Power Shelf Controller | Yes |
| Power Shelf | Fiber Optic Cables | Yes |
| Power Shelf | Power Shelf Adapter Kit | No |
| Power Shelf | Power Shelf Bus Bar Connector | No |
| Power Shelf | Power Shelf Whip Adapters | No |
| Power Shelf | Backplane 4:1 Squid Cables | No |
| Power Shelf | Sidecar to Gimlet PCIe Cable | No |
| Power Shelf | Core Rack | No |
| Power Shelf | Bus Bar | No |
| Power Shelf | Front and Rear Doors | Yes |
| Power Shelf | Side Panels | Yes |
| Power Shelf | Gimlet Cubby | No |
| Power Shelf | Gimlet Blank | Yes |
| Seismic Kit | - | Yes |
