Replace a failed RAID hard drive

RAID stands for redundant array of independent disks. A RAID array appears to the operating system as a single logical hard disk. Most RAID levels employ disk striping, which partitions each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes; the stripes of all the disks are interleaved and addressed in order.

I recently changed a failed RAID 1 hard drive on our server. This situation arises when one disk in your RAID configuration has an error. The kernel will inform you when one of your disk drives is failing; you can find this in your log files. See /var/log/messages or /var/log/syslog.

I will not explain each of these types in depth; I will give only a short definition of each, since this post focuses on replacing a failed RAID-1 hard drive.

Types of RAID:

RAID-0

This technique has striping but no redundancy of data.

RAID-1

This type is also known as disk mirroring and consists of at least two drives that duplicate the storage of data.

RAID-2

This type uses striping across disks with some disks storing error checking and correcting (ECC) information.

RAID-3

This type uses striping and dedicates one drive to storing parity information.

RAID-4

This type uses large stripes, which means you can read records from any single drive.

RAID-5

This type includes a rotating parity array, thus addressing the write limitation in RAID-4.

RAID-6

This type is similar to RAID-5 but includes a second parity scheme that is distributed across different drives and thus offers extremely high fault- and drive-failure tolerance.

RAID-7

This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer.

RAID-10

Combining RAID-0 and RAID-1 is often referred to as RAID-10, which offers higher performance than RAID-1 but at much higher cost.

RAID-50

This type consists of a series of RAID-5 groups striped in RAID-0 fashion to improve RAID-5 performance without reducing data protection.

RAID-53

This type uses striping (in RAID-0 style) for RAID-3's virtual disk blocks.

RAID-S

This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment.


For my setup, we use RAID 1 so that if one disk ever fails we always have a backup. RAID 1 is like a mirror: it writes the same data to both disks, and the pair appears to your computer as a single disk.

Requirements for RAID 1:

1. Same hard drive model

2. Same hard drive size

3. Same partition size


This will be the partitioning of the hard drive:

Disk 1 (sda) has two (2) partitions: /dev/sda1 and /dev/sda2

The second disk (sdb) must have the same partition sizes and numbering: /dev/sdb1 and /dev/sdb2

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.

/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.
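If you want to double-check this layout on a live system, lsblk (available on newer distributions) gives a quick overview of which partitions belong to which md array; available columns vary by version:

# List block devices with size, type, and mount point;
# md0 and md1 should appear nested under their member partitions
raid@ubuntucommand:~$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT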


How to tell that your hard drive is failing:

See the following files:

/var/log/messages or /var/log/syslog.
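The exact messages vary by kernel version, but grepping the log for the md driver or for I/O errors usually surfaces the failure; the pattern below is only a starting point:

# Search the system log for RAID and disk error messages
raid@ubuntucommand:~$ sudo grep -iE 'raid|md[0-9]|i/o error' /var/log/syslog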


Also, please take note of the output of this command:

cat /proc/mdstat

If you see [U_] here instead of [UU], one disk in that array has already failed.

See the output below.

raid@ubuntucommand:~$ sudo cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[2](F) sda3[0]
466802552 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sdb1[2](F) sda1[0]
19626936 blocks super 1.2 [2/1] [U_]

unused devices: <none>
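To confirm which member device the kernel considers faulty, you can also query the array directly with mdadm; the device table at the end of the output marks failed members as faulty:

# Show detailed status for the array, including per-device state
raid@ubuntucommand:~$ sudo mdadm --detail /dev/md0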


To identify which disk drive is failing, install the program hdparm:

raid@ubuntucommand:~$ sudo apt-get install hdparm

Note:

Every hard drive has its own unique serial number. This is how we can tell the failed hard drive apart from the running one.


raid@ubuntucommand:~$ sudo hdparm -i /dev/sda

/dev/sda:

Model=ST3320620NS, FwRev=3.AEG, SerialNo=9QFAP54A
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

* signifies the current active mode
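To pull just the serial numbers for a quick side-by-side comparison, a small shell loop helps (this assumes the SerialNo= field shown in the output above):

# Print each drive's identity line so the serial numbers can be compared
raid@ubuntucommand:~$ for d in /dev/sda /dev/sdb; do echo "== $d =="; sudo hdparm -i "$d" | grep -i serial; done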


Mark the failing drive's partition as failed:

Use the command:

sudo mdadm --manage /dev/md0 --fail /dev/sdb1

Remove the drive partition from the RAID configuration:

Use the command:

sudo mdadm --manage /dev/md0 --remove /dev/sdb1

raid@ubuntucommand:~$ sudo mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1


After removing it, check the RAID:

raid@ubuntucommand:~$ sudo cat /proc/mdstat

Personalities : [raid1]

md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]

unused devices: <none>


Repeat the fail and remove steps for the second partition in the RAID configuration, as shown below.
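These are the same two commands used for /dev/md0, pointed at the second array (assuming the layout described earlier, where /dev/sdb2 belongs to /dev/md1):

raid@ubuntucommand:~$ sudo mdadm --manage /dev/md1 --fail /dev/sdb2
raid@ubuntucommand:~$ sudo mdadm --manage /dev/md1 --remove /dev/sdb2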

Shut down the server:

shutdown -h now


Remove the failed hard drive; double-check the serial number so you pull the right one.

Install the new hard drive and make sure it is properly detected.

Create the exact same partitions on the new, unpartitioned disk drive (sdb).

Use the command below to copy the partition table from sda to sdb:

raid@ubuntucommand:~$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb

Note:

If it fails with an error like this:

sfdisk: I don't like these partitions - nothing changed.

Solution: use -f or --force ("Do what I say, even if it is stupid", as the man page puts it).

I spent two hours on this one. The --force option makes sfdisk write the partitions even when it finds an error:

raid@ubuntucommand:~$ sudo sfdisk -d /dev/sda | sudo sfdisk --force /dev/sdb


Check the partitions:

raid@ubuntucommand:~$ sudo fdisk -l
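To be certain the two partition tables now match, you can diff the sfdisk dumps after normalizing the device names (a quick sanity check; header lines may still differ slightly):

# Rewrite sda -> sdb in the first dump so device names line up;
# empty diff output means the partition layouts are identical
raid@ubuntucommand:~$ diff <(sudo sfdisk -d /dev/sda | sed 's/sda/sdb/g') <(sudo sfdisk -d /dev/sdb)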


Add the first partition to the RAID configuration:

raid@ubuntucommand:~$ sudo mdadm --manage /dev/md0 --add /dev/sdb1

Output:

raid@ubuntucommand:~$ sudo mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
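Then add the second partition back to the other array in the same way; the recovery output below shows /dev/sdb2 rejoining /dev/md1:

raid@ubuntucommand:~$ sudo mdadm --manage /dev/md1 --add /dev/sdb2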


Check whether the disks are syncing:

raid@ubuntucommand:~$ sudo cat /proc/mdstat
Personalities : [raid1]

md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/1] [U_]
[=>...................]  recovery =  9.9% (2423168/24418688) finish=2.8min speed=127535K/sec

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/1] [U_]
[=>...................]  recovery =  6.4% (1572096/24418688) finish=1.9min speed=196512K/sec
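If you would rather watch the rebuild progress update on its own, watch re-runs the command at a fixed interval (press Ctrl-C to exit):

# Refresh the RAID status every five seconds
raid@ubuntucommand:~$ watch -n 5 cat /proc/mdstat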


After a few hours, check the final output.

raid@ubuntucommand:~$ sudo cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[2] sda3[0]
466802552 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[2] sda1[0]
19626936 blocks super 1.2 [2/2] [UU]

unused devices: <none>


That's it! We have just successfully replaced the failed hard drive in a RAID 1 configuration.
