“Failed” ≠ Unreadable — How to Verify Members and Rebuild Parity Safely
First Impression vs. Reality
Seeing “Two Drives Failed” in a RAID-5 array feels fatal.
But “failed” in controller language rarely means “physically unreadable.”
It means the controller cannot confirm parity consistency or timing alignment, not necessarily that the data is gone.
In many cases, at least one of the “failed” drives is only logically offline — a member flagged bad after timeout, power fluctuation, or stale metadata.
That’s the difference between data loss and data salvage.
What Actually Happened
A RAID-5 array tolerates one failed member. When a second failure appears, several non-catastrophic possibilities exist:
- False second failure: one drive timed out during rebuild or parity check.
- Dropped member: controller saw a transient read error and marked it “offline.”
- Foreign metadata mix: after a reboot or import attempt, the controller found mismatched sequence numbers across members (see the sketch after this list).
- Power event or BBU fault: cache flushed incompletely, marking drives inconsistent.
- Firmware update or slot remap: changed drive identifiers.
The controller flags both as “failed” to protect data — not because it tested every sector unreadable.
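To make the stale-metadata case concrete, here is a minimal Python sketch of the comparison idea. The offset and field are hypothetical placeholders; real formats (DDF, mdadm superblocks, vendor-specific layouts) each define their own update counter, so treat this as the shape of the check, not a parser for any actual controller.

```python
import struct

# HYPOTHETICAL layout: an 8-byte little-endian update counter at byte
# offset 72 of each member's metadata block. Real formats (DDF, mdadm,
# vendor-specific) each define their own offsets and field names.
SEQ_OFFSET = 72

def read_update_counter(image_path: str) -> int:
    """Read the per-member update counter from a cloned image."""
    with open(image_path, "rb") as f:
        f.seek(SEQ_OFFSET)
        return struct.unpack("<Q", f.read(8))[0]

def find_stale_members(images: list[str]) -> list[str]:
    """Return the clones whose counter lags the newest value seen."""
    counters = {p: read_update_counter(p) for p in images}
    newest = max(counters.values())
    return [p for p, c in counters.items() if c < newest]
```

The member whose counter lags the rest was dropped first; a member matching the newest counter is a candidate for the false second failure.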
First Rule — Don’t Force It Online
Never “force online” both members, and do not start a rebuild yet.
Every controller rebuild writes parity, which means it overwrites the very history you need to recover.
The goal is to preserve the pre-failure state until imaging and verification.
Steps:
- Power down immediately.
- Label each disk with slot number and serial (see the inventory sketch after this list).
- Clone every drive (write-blocked).
- Record controller config: stripe size, order, cache policy.
- Work only from the images — never from the live drives.
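The labeling and recording steps lend themselves to scripting. Below is a minimal sketch, assuming smartmontools is installed and that you map chassis slots to device nodes by hand during the labeling pass; the slot names and device paths are placeholders.

```python
import json
import subprocess

def drive_identity(device: str) -> dict:
    """Capture smartctl's identity block (model, serial, firmware).
    check=False because smartctl uses nonzero exit codes as status bits."""
    out = subprocess.run(
        ["smartctl", "-i", device],
        capture_output=True, text=True, check=False,
    ).stdout
    info = {"device": device}
    for line in out.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

if __name__ == "__main__":
    # Placeholder slot-to-device map; fill it in during the labeling pass.
    slots = {"slot0": "/dev/sda", "slot1": "/dev/sdb",
             "slot2": "/dev/sdc", "slot3": "/dev/sdd"}
    inventory = {slot: drive_identity(dev) for slot, dev in slots.items()}
    with open("array_inventory.json", "w") as f:
        json.dump(inventory, f, indent=2)
```

Store the resulting JSON with the case notes before anything else touches the drives.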
Testing Members Individually
Finish complete clone images first, then run these checks:
| Check | Tool | What You Learn |
|---|---|---|
| SMART & sector scan | HDDSuperClone, ddrescue, DeepSpar | Physical condition; weak heads vs. timeout flags |
| Header comparison | ADR RAID Inspector™, UFS Explorer | Metadata consistency (sequence, parity, offsets) |
| Byte-level parity check | ADR Virtual Builder | Whether parity reconstructs cleanly with each drive omitted in turn |
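The byte-level parity check in the last row rests on one RAID-5 identity: within any full stripe, the XOR of all member blocks (data plus parity) is zero, regardless of which member holds parity in that row, so rotation does not affect the test. A minimal sketch, assuming the data area starts at the same offset on every clone and a stripe-unit size taken from the recorded controller config:

```python
from functools import reduce

STRIPE = 64 * 1024    # stripe-unit size; take the real value from the config
DATA_OFFSET = 0       # bytes reserved for member metadata, if any

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def scan_parity(images: list[str], max_stripes: int = 1024) -> list[int]:
    """Return row indices whose member blocks do not XOR to zero."""
    bad_rows = []
    handles = [open(p, "rb") for p in images]
    try:
        for h in handles:
            h.seek(DATA_OFFSET)
        for row in range(max_stripes):
            blocks = [h.read(STRIPE) for h in handles]
            if any(len(b) < STRIPE for b in blocks):
                break  # reached the end of the shortest image
            if any(xor_blocks(blocks)):
                bad_rows.append(row)
    finally:
        for h in handles:
            h.close()
    return bad_rows
```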
Parity Reconstruction Strategy
- Mount virtual array excluding each member in rotation.
- Verify consistency across stripes — the configuration with valid parity throughout is the pre-failure state.
- Export user data read-only from that build.
- If one image has intermittent bad sectors, map them and rebuild parity only in those ranges.
ADR’s method leverages block-level parity testing across clones to rebuild missing segments while isolating weak reads — no destructive controller rebuilds.
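The same identity drives reconstruction: any single missing block in a stripe equals the XOR of the blocks that remain, whether the missing block held data or parity. A hedged sketch of rebuilding an excluded member image from the surviving clones (it assumes equal-sized images with no reserved metadata region, which real members usually do have):

```python
from functools import reduce

CHUNK = 1024 * 1024  # process 1 MiB at a time

def xor_bytes(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild_member(survivors: list[str], out_path: str) -> None:
    """Reconstruct an excluded RAID-5 member as the running XOR of
    the surviving member clones, chunk by chunk."""
    handles = [open(p, "rb") for p in survivors]
    try:
        with open(out_path, "wb") as out:
            while True:
                chunks = [h.read(CHUNK) for h in handles]
                if not chunks[0]:
                    break
                # Clip to the shortest read so the XOR lengths match.
                n = min(len(c) for c in chunks)
                out.write(xor_bytes([c[:n] for c in chunks]))
                if n < CHUNK:
                    break
    finally:
        for h in handles:
            h.close()
```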
When Both Really Are Bad
Even when both drives contain hard defects, it’s rarely a total loss:
- Interleaved sector reads: partial imaging from both drives may complete missing regions.
- Parity stitching: reconstructing parity math from healthy portions fills gaps.
- Hybrid rebuild: combine clean sectors from both degraded disks into a composite member.
This requires controlled imaging tools that log unreadable regions and maintain positional integrity — never raw dd.
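As a toy illustration of parity stitching: where one member's clone has unreadable ranges, and every other member read cleanly at those offsets, the gap bytes equal the XOR of the other members at the same offsets. The bad-range list below stands in for whatever map your imager exports; skip any range that overlaps another member's bad map, and run this on a spare copy of the image, never your only clone.

```python
from functools import reduce

def xor_bytes(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def stitch_gaps(target_copy: str, other_members: list[str],
                bad_ranges: list[tuple[int, int]]) -> None:
    """Fill target_copy's unreadable ranges in place, reconstructing
    each range as the XOR of the same byte span on every other member.
    Only valid where all other members read cleanly in that span."""
    handles = [open(p, "rb") for p in other_members]
    try:
        with open(target_copy, "r+b") as out:
            for offset, length in bad_ranges:
                spans = []
                for h in handles:
                    h.seek(offset)
                    spans.append(h.read(length))
                out.seek(offset)
                out.write(xor_bytes(spans))
    finally:
        for h in handles:
            h.close()

# bad_ranges stands in for your imager's unreadable-region log,
# e.g. [(0x1F400000, 4096), (0x2A000000, 65536)]
```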
Indicators of Logical vs. Physical Failure
| Symptom | Likely Type |
|---|---|
| Controller sees serial numbers but flags “Offline (Bad)” | Logical/metadata |
| Drive spins up and identifies, but SMART reports OK | Logical |
| Drive shows 0 GB capacity or clicks | Physical |
| Sector reads succeed in clone tool but parity fails | Stale parity |
| Two drives failed after power loss | Cache/Battery event, not mechanical |
ADR Data Recovery Method
- Clone all drives in-house using write-blocked imagers.
- Compare metadata epochs and controller config dumps.
- Simulate the array with every permutation to detect the only parity-consistent layout (see the sketch below).
- Export data read-only to verified media.
- Deliver a configuration map documenting which drive truly failed and why the controller mis-flagged the second.
This process reverses controller logic errors without risking destructive writes.
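One subtlety in the permutation step: candidate layouts that differ only in drive order all pass the raw XOR test, because XOR is order-independent, so virtual builders also score layouts by whether the assembled volume decodes into valid structures. The sketch below is a deliberately small heuristic that only checks the boot-sector signature under each candidate first drive; it assumes a left-symmetric layout, a hypothetical stripe size, and no reserved metadata offset, and a real tool would walk partition tables and filesystem metadata deep into the volume.

```python
from itertools import permutations

STRIPE = 64 * 1024  # stripe-unit size, taken from the controller config

def first_stripe_unit(image: str) -> bytes:
    with open(image, "rb") as f:
        return f.read(STRIPE)

def looks_like_volume_start(block: bytes) -> bool:
    """Toy heuristic: volumes typically begin with an MBR or GPT
    protective boot sector ending in the 0x55AA signature."""
    return len(block) >= 512 and block[510:512] == b"\x55\xaa"

def plausible_orders(images: list[str]) -> list[tuple[str, ...]]:
    """In row 0 of a left-symmetric RAID-5, parity sits on the last
    member, so volume byte 0 is stripe unit 0 of the first data drive.
    Keep only orders whose first drive starts like a volume."""
    return [order for order in permutations(images)
            if looks_like_volume_start(first_stripe_unit(order[0]))]
```

This narrows only the first position; ranking full orders means checking partition tables and filesystem superblocks assembled from many stripes.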
Key Takeaways
- “Two failed” doesn’t always mean “two dead.”
- The controller protects integrity by over-flagging errors.
- Clone first. Verify metadata. Rebuild only virtually.
- Always determine the true bad member before any parity writes.
- Never let a rebuild start until data is safely extracted.
