RAID 5, Two Drives Failed — Is It Game Over?

“Failed” ≠ Unreadable — How to Verify Members and Rebuild Parity Safely


First Impression vs. Reality

Seeing “Two Drives Failed” in a RAID-5 array feels fatal.
But “failed” in controller language rarely means “physically unreadable.”
It means the controller cannot confirm parity consistency or timing alignment, not necessarily that the data is gone.

In many cases, at least one of the “failed” drives is only logically offline — a member flagged bad after a timeout, a power fluctuation, or stale metadata.
That’s the difference between data loss and data salvage.


What Actually Happened

A RAID-5 array tolerates one failed member. When a second failure appears, several non-catastrophic possibilities exist:

  • False second failure: one drive timed out during rebuild or parity check.
  • Dropped member: controller saw a transient read error and marked it “offline.”
  • Foreign metadata mix: after reboot or import attempt, controller compared mismatched sequence numbers.
  • Power event or battery backup unit (BBU) fault: cache flushed incompletely, marking drives inconsistent.
  • Firmware update or slot remap: changed drive identifiers.

The controller flags both as “failed” to protect data — not because it verified that every sector is unreadable.


First Rule — Don’t Force It Online

Never “force online” both members, and never start a rebuild yet.
Every controller rebuild writes parity — meaning it overwrites history.

The goal is to preserve the pre-failure state until imaging and verification.

Steps:

  1. Power down immediately.
  2. Label each disk with slot number and serial.
  3. Clone every drive (write-blocked).
  4. Record controller config: stripe size, order, cache policy.
  5. Work only from the images — never from the live drives.

Testing Members Individually

Complete the clones first, then run these checks:

  • SMART & sector scan (HDDSuperClone, ddrescue, DeepSpar): physical condition; weak heads vs. timeout flags.
  • Header comparison (ADR RAID Inspector™, UFS Explorer): metadata consistency (sequence numbers, parity, offsets).
  • Byte-level parity check (ADR Virtual Builder): whether parity reconstructs cleanly with each drive omitted in turn.

Parity Reconstruction Strategy

  1. Mount virtual array excluding each member in rotation.
  2. Verify consistency across stripes — the configuration with valid parity throughout is the pre-failure state.
  3. Export user data read-only from that build.
  4. If one image has intermittent bad sectors, map them and rebuild parity only in those ranges.
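The parity test in step 2 can be sketched directly, because RAID-5 has a convenient invariant: the XOR of all members' blocks at the same offset is zero regardless of parity rotation, so consistency can be checked without knowing the layout. A minimal sketch, assuming flat, equal-size raw member images and a placeholder stripe-unit size (the function names are illustrative, not a production tool):

```python
from functools import reduce

# Assumed stripe-unit size; always confirm against the controller config.
STRIPE = 64 * 1024

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def parity_consistent(image_paths, stripe=STRIPE):
    """Scan flat member images in lockstep and return the byte offsets of
    stripe units whose XOR across all members is non-zero, i.e. stripes
    where parity does not hold."""
    bad = []
    files = [open(p, "rb") for p in image_paths]
    try:
        n = 0
        while True:
            blocks = [f.read(stripe) for f in files]
            if not blocks[0]:          # end of the images
                break
            if any(xor_blocks(blocks)):
                bad.append(n * stripe)
            n += 1
    finally:
        for f in files:
            f.close()
    return bad
```

An empty result means parity holds everywhere; a short list of offsets points at exactly the stripes that went stale.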

ADR’s method leverages block-level parity testing across clones to rebuild missing segments while isolating weak reads — no destructive controller rebuilds.


When Both Really Are Bad

Even when both drives contain hard defects, it’s rarely a total loss:

  • Interleaved sector reads: partial imaging from both drives may complete missing regions.
  • Parity stitching: reconstructing parity math from healthy portions fills gaps.
  • Hybrid rebuild: combine clean sectors from both degraded disks into a composite member.

This requires controlled imaging tools that log unreadable regions and maintain positional integrity — never raw dd.
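The same XOR invariant drives parity stitching: a block that is unreadable on one member equals the XOR of the blocks at the same offset on every other member. A minimal sketch under that assumption; `rebuild_missing_block` and `fill_gaps` are hypothetical helpers working on in-memory images, whereas real tooling consumes the imager's logged bad-sector map:

```python
from functools import reduce

def rebuild_missing_block(healthy_blocks):
    """RAID-5 invariant: the XOR of all members' blocks at one offset is
    zero, so a missing member's block is the XOR of all the others."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), healthy_blocks)

def fill_gaps(degraded, healthy_images, bad_ranges, block=4096):
    """Fill logged unreadable ranges in `degraded` (a bytearray image)
    from the same offsets on the healthy members, preserving position."""
    for start, end in bad_ranges:
        for off in range(start, end, block):
            hi = min(off + block, end)
            pieces = [img[off:hi] for img in healthy_images]
            degraded[off:hi] = rebuild_missing_block(pieces)
    return degraded
```

Positional integrity is the whole point: the filled blocks land at exactly the offsets that were unreadable, so the composite image stays byte-aligned with the original member.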


Indicators of Logical vs. Physical Failure

  • Controller sees serial numbers but flags “Offline (Bad)” → logical/metadata failure.
  • Drive spins and identifies, but SMART is OK → logical failure.
  • Drive shows 0 GB capacity or clicks → physical failure.
  • Sector reads succeed in the clone tool, but parity fails → stale parity.
  • Two drives failed right after a power loss → cache/battery event, not mechanical.

ADR Data Recovery Method

  • Clone all drives in-house using write-blocked imagers.
  • Compare metadata epochs and controller config dumps.
  • Simulate the array with every permutation to detect the only parity-consistent layout.
  • Export data read-only to verified media.
  • Deliver a configuration map documenting which drive truly failed and why the controller mis-flagged the second.

This process reverses controller logic errors without risking destructive writes.
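The permutation step can be sketched as a brute-force search: simulate a virtual array for each candidate member order and keep the order that shows a plausible filesystem signature at a known offset. This sketch assumes a left-asymmetric RAID-5 rotation and a known stripe size, both per-controller details that must come from the recorded config; `logical_read` and `best_order` are illustrative names:

```python
from itertools import permutations

def logical_read(images, stripe, offset, length):
    """Read a logical byte range from a simulated RAID-5 array, given a
    candidate member order (left-asymmetric parity rotation assumed)."""
    n = len(images)
    out = bytearray()
    while length > 0:
        lblock, within = divmod(offset, stripe)   # logical stripe unit
        row, idx = divmod(lblock, n - 1)
        pdisk = (n - 1) - (row % n)               # rotating parity member
        disk = idx if idx < pdisk else idx + 1    # data members skip parity
        take = min(stripe - within, length)
        start = row * stripe + within
        out += images[disk][start:start + take]
        offset += take
        length -= take
    return bytes(out)

def best_order(images, stripe, magic, at):
    """Try every member permutation; return the first whose simulated
    array shows `magic` at logical byte offset `at`."""
    for order in permutations(images):
        if logical_read(list(order), stripe, at, len(magic)) == magic:
            return order
    return None
```

In practice the signature heuristic (boot-sector magic, partition table, filesystem superblock) is combined with metadata epochs, since more than one order can pass a single short check.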


Key Takeaways

  • “Two failed” doesn’t always mean “two dead.”
  • The controller protects integrity by over-flagging errors.
  • Clone first. Verify metadata. Rebuild only virtually.
  • Always determine the true bad member before any parity writes.
  • Never let a rebuild start until data is safely extracted.