Symptom / What You See
- The controller initiated a rebuild on a RAID array.
- Progress stalls at 0–5% or jumps erratically, then the Virtual Disk (VD) disappears or goes Offline.
- One member was auto-marked Failed; another shows Rebuilding or Online.
- After reboot, volumes appear RAW or the array cannot be mounted.
This scenario often follows a power event, drive swap, or an attempted import of a foreign configuration.
What It Means (Technical)
The controller began writing reconstructed data and parity to the set, but encountered conditions that invalidate the original layout or leave it out of date, such as:
- UREs (Unrecoverable Read Errors) on surviving members during rebuild.
- Stale parity after the wrong member was selected as “failed.”
- Member order drift or foreign epoch mismatch following a controller/firmware change.
- Cache/battery anomalies that committed partial writes.
Once the controller starts a rebuild on incorrect assumptions, it can overwrite the only remaining consistent parity, and the array appears to vanish mid-operation.
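The first cause in the list, UREs on surviving members, can be put into rough numbers. The sketch below assumes read errors are independent and that the drive datasheet quotes one URE per 10^14 bits read (a figure often seen on consumer-class drives; it is an assumption here, not a measurement). The function name and the example sizes are hypothetical.

```python
import math

def ure_probability(bytes_read: float, ure_rate_bits: float = 1e14) -> float:
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes independent errors at one URE per ure_rate_bits bits read.
    """
    bits = bytes_read * 8
    # 1 - (1 - 1/rate)^bits, computed stably for a tiny per-bit probability
    return -math.expm1(bits * math.log1p(-1.0 / ure_rate_bits))

# Hypothetical example: a rebuild that must read three surviving 4 TB members in full
surviving_members = 3
member_size_bytes = 4e12
p = ure_probability(surviving_members * member_size_bytes)
print(f"Chance of at least one URE during the rebuild: {p:.1%}")
```

Under these assumptions the example comes out at roughly 62%. Datasheet rates are upper bounds, so real-world odds are often lower, but the calculation shows why a full-surface read during a rebuild is a high-risk moment for large arrays.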
What NOT To Do
- Do not restart the rebuild hoping it will complete.
- Do not force different members Online/Offline to “try another combination.”
- Do not initialize or recreate the virtual disk definition.
- Do not swap controllers or shuffle drives between slots.
- Do not keep power-cycling; each boot can commit additional metadata writes.
These actions increase the chance of permanent parity contamination.
Safe Actions (Triage You Can Do)
- Stop writes immediately: power down cleanly to freeze the current state.
- Document everything: photos of controller screens (VD config, physical list, any error logs).
- Record slot order and serials for all members.
- Clone each drive to stable media, accessing the source read-only, before any further analysis.
- Capture controller metadata and any foreign/virtual drive tables without importing.
- Do not clear or import the foreign configuration until the last good layout has been identified offline.
If the array was healthy prior to the event, preserving this state maximizes the odds of full recovery.
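The cloning step can be sketched as follows. This is a minimal illustration, not a substitute for a dedicated imaging tool or a hardware write blocker: it opens the source in read-only mode, refuses to overwrite an existing image, and returns a SHA-256 digest so the image can be verified later. The function name `image_member` and the paths in the usage note are hypothetical, and the sketch does not handle bad sectors (a failed read raises an error and stops the copy).

```python
import hashlib

CHUNK = 1024 * 1024  # read 1 MiB at a time

def image_member(source_path: str, image_path: str) -> str:
    """Copy source_path to image_path and return the SHA-256 of the bytes copied.

    The source is opened read-only ("rb"); the image is created with "xb",
    which fails if the file already exists, so an earlier image is never clobbered.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "xb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

A call such as `image_member("/dev/sdX", "/mnt/images/bay0.img")` (placeholder paths) would be run once per member, with the returned digest recorded next to the bay number and serial.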
How Professionals Recover This Case
- Imaging First: read-only clone of all members to protect evidence of the pre-rebuild layout.
- Header & Epoch Analysis: compare on-disk headers and controller tables to determine the last consistent configuration and which member was incorrectly selected as “failed.”
- Parity Verification: emulate the array with multiple candidate orders; verify parity and stripe alignment across the full LBA range.
- Rollback to Pre-Rebuild State: reconstruct the set using the last valid parity and exclude contaminated segments written during the failed rebuild.
- Read-Only Mount & Extraction: export data to safe storage; only after validation consider building a replacement array.
This avoids compounding damage and restores access without destructive controller operations.
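The parity verification step can be illustrated with a short sketch that works on the images, never the original drives. It assumes a single-parity XOR layout (RAID 5 style), in which the blocks at the same offset on every member XOR to zero when a stripe row is consistent. The function name and parameters are hypothetical. Because XOR is order-independent, this check alone does not reveal member order; it shows which rows are inconsistent, which can help delimit the region touched by a partial rebuild.

```python
from typing import BinaryIO, List, Sequence

def inconsistent_rows(members: Sequence[BinaryIO],
                      stripe_unit: int,
                      start_offset: int = 0) -> List[int]:
    """Return the indices of stripe rows whose members do not XOR to zero."""
    for f in members:
        f.seek(start_offset)
    bad: List[int] = []
    row = 0
    while True:
        blocks = [f.read(stripe_unit) for f in members]
        if any(len(b) < stripe_unit for b in blocks):
            break  # reached the end of the shortest image
        acc = 0
        for b in blocks:
            # XOR the whole block at once by treating it as one large integer
            acc ^= int.from_bytes(b, "little")
        if acc != 0:
            bad.append(row)
        row += 1
    return bad
```

A long contiguous run of bad rows starting at row 0 is what one would expect if a rebuild wrote to the front of a member and then stopped, though that interpretation still needs to be confirmed against the controller metadata.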
Real-World Insight
Mid-rebuild drops are frequently a consequence of an earlier error: the wrong drive was flagged as failed, so the rebuild wrote new parity against stale data. When UREs on the surviving members then interrupt that rebuild, the controller abandons the effort and the VD disappears. By reconstructing the pre-event layout and ignoring the partial rebuild writes, data is typically recoverable.
What Not To Miss (Checklist)
- Exact bay order and serials recorded.
- Screenshots of rebuild status, error codes, and physical disk states.
- No further writes since the failure.
- Read-only images created, or ready to be created.
- Ability to state stripe size, parity rotation, and start offsets (or captured metadata to determine them).
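The last item matters because stripe size, parity rotation, and start offset together determine where every logical block physically lives. As a sketch, the mapping below assumes a RAID 5 set with left-symmetric parity rotation (the default in Linux md; hardware controllers may use a different rotation, so treat it as one candidate among several). The function name `locate` and its parameters are hypothetical.

```python
from typing import Tuple

def locate(logical_byte: int, n_members: int, stripe_unit: int,
           start_offset: int = 0) -> Tuple[int, int]:
    """Map a logical array byte to (member index, byte offset on that member).

    Assumes left-symmetric RAID 5: parity starts on the last member and moves
    one member to the left per row; data follows the parity block and wraps.
    """
    unit_index, within = divmod(logical_byte, stripe_unit)
    row, k = divmod(unit_index, n_members - 1)   # k = data unit within the row
    parity_member = (n_members - 1) - (row % n_members)
    member = (parity_member + 1 + k) % n_members
    return member, start_offset + row * stripe_unit + within
```

The member index refers to a position in the candidate order being tested, so trying another order means permuting the list of images rather than changing this function. A wrong stripe size or offset typically shows up as a filesystem structure that parses for the first few blocks and then breaks.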
When To Escalate
- Rebuild repeatedly restarts or stalls at the same percentage.
- Multiple members toggle between Degraded/Foreign/Online.
- Import prompts or layouts differ from the last known configuration.
- Filesystem remains RAW or missing after any attempted import.
- Any uncertainty about which member was actually stale vs. failed.
Get Help
Before any more controller actions, speak to an engineer. Provide screenshots, bay order, serials, and a short timeline of events (power loss, firmware change, swaps). We’ll validate the correct layout and recover from the pre-rebuild state without destructive writes.
