Case Study: Recovering Data After a Second Disk Went Offline During a RAID Disk Replacement (V7000 Storage Data Recovery)

【Fault description】

The customer's equipment is an IBM V7000 storage system, used in an AIX + Oracle + V7000 disk enclosure architecture. The data to be recovered is stored mainly on the disk enclosure, which holds a total of 8 SAS mechanical hard disks of 600 GB each (one of which is a hot spare).

A disk in the IBM V7000 failed. When data synchronization to the replacement disk had reached about 20%, another disk also developed a problem. As a result, the logical disks could no longer be attached to the minicomputer and business was temporarily interrupted. The storage management interface shows two hard disks as failed and offline; the failed disk in slot 5 is the hot spare.

The customer had created 2 Mdisks in the disk enclosure and added them to one pool. The customer's main data pool can no longer be loaded, and a total of 5 general volumes cannot be mounted.

【Disk imaging】

To prevent secondary damage to the original disks through misoperation during recovery, 7 of the disks were imaged with a data recovery tool, and the failed disk in slot 5, which may contain many bad sectors, was imaged with PC3000. All subsequent recovery operations are performed on the images, so the original disks are not affected.
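The idea of imaging a disk that may contain bad sectors can be illustrated with a minimal Python sketch. It is not how PC3000 or the recovery tool works internally; it only shows the general principle of copying block by block, zero-filling any block that cannot be read, and recording its offset. The function name `image_disk` and the block size are assumptions made for this illustration.

```python
import io
from typing import BinaryIO, List


def image_disk(src: BinaryIO, dst: BinaryIO, total_size: int,
               block_size: int = 64 * 1024) -> List[int]:
    """Copy `total_size` bytes from src to dst block by block.

    Blocks that raise OSError or come back short are padded with zeros
    in the image, and their byte offsets are returned for later review.
    """
    bad_offsets: List[int] = []
    offset = 0
    while offset < total_size:
        size = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(size)
        except OSError:
            data = b""  # treat the whole block as unreadable
        if len(data) != size:
            bad_offsets.append(offset)
            data = data.ljust(size, b"\x00")  # keep the image aligned
        dst.write(data)
        offset += size
    return bad_offsets


# Illustration with an in-memory "disk" instead of a real device.
source = io.BytesIO(b"AAAABBBBCCCC")
image = io.BytesIO()
bad = image_disk(source, image, total_size=12, block_size=4)
```

Keeping the image the same length as the source, even where blocks are unreadable, matters because every later step (RAID analysis, pool analysis, LUN extraction) relies on byte offsets within each member disk.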

【Recovery process】

Recovery plan 1: Force the storage to go online

Determine the order in which the failed hard disks went offline.

Repair the failed hard disk that went offline later.

Insert the repaired hard disk back into the storage and force it online.

Recovery plan 2: Analyze the storage structure and restore the server data

1. Mdisk analysis and reorganization

A. Using the configuration information provided by the customer, group the hard disks by Mdisk.

B. Analyze all hard disks in each Mdisk to obtain the relevant RAID information.

C. Use professional data recovery software to virtually reassemble each Mdisk.

2. Pool analysis

A. Analyze all Mdisks to obtain the relevant information about the pool.

B. Analyze how the pool is distributed across the Mdisks.

3. LUN structure analysis

A. Analyze the stripe size in the pool.

B. Analyze the LUN bitmap to determine how each LUN is distributed in the pool.

C. Write a program to extract LUNs.
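The LUN extraction step above can be sketched in Python as follows. This is a simplified illustration, not the program used in this case: it assumes the Mdisks have already been reassembled into flat images, that the pool is divided into fixed-size units, and that the bitmap analysis has produced, for each LUN, an ordered list of (Mdisk number, unit index) pairs. The function name `extract_lun` and the shape of that list are hypothetical; the real V7000 on-disk metadata is not described in this article.

```python
import io
from typing import BinaryIO, Dict, List, Tuple


def extract_lun(mdisks: Dict[int, BinaryIO],
                extents: List[Tuple[int, int]],
                extent_size: int,
                out: BinaryIO) -> int:
    """Concatenate a LUN's units, in logical order, into `out`.

    mdisks      : reassembled Mdisk images, keyed by a hypothetical Mdisk number
    extents     : ordered (mdisk_number, unit_index) pairs for one LUN
    extent_size : size of one allocation unit in bytes (assumed fixed)
    Returns the number of bytes written.
    """
    written = 0
    for mdisk_number, unit_index in extents:
        src = mdisks[mdisk_number]
        src.seek(unit_index * extent_size)
        chunk = src.read(extent_size)
        if len(chunk) != extent_size:
            raise ValueError(
                f"short read on Mdisk {mdisk_number}, unit {unit_index}")
        out.write(chunk)
        written += len(chunk)
    return written


# Illustration with two tiny in-memory "Mdisks" and 4-byte units.
demo_mdisks = {0: io.BytesIO(b"AAAABBBBCCCC"), 1: io.BytesIO(b"DDDDEEEEFFFF")}
demo_lun = io.BytesIO()
extract_lun(demo_mdisks, [(0, 0), (1, 1), (0, 2)], 4, demo_lun)
# demo_lun now holds b"AAAAEEEECCCC"
```

In practice the `BinaryIO` objects would be image files opened in binary mode, and the output would be written to an image file per LUN.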

RAID 5 tolerates at most one offline member disk; that is, the array continues to work normally when a single member disk fails. The customer's storage has crashed, and only one drive in each Mdisk group is offline.
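The reason RAID 5 survives the loss of one member is that the parity block of each stripe is the XOR of its data blocks, so any single missing block equals the XOR of the remaining ones. The Python sketch below illustrates this property on one stripe. The function names are chosen for this example, and the sketch does not model the parity rotation or stripe size of the customer's actual array.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_stripe(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Fill in the single missing block (None) of one RAID 5 stripe.

    The stripe holds the data blocks and the parity block of one row.
    Two or more missing blocks cannot be recovered from parity alone.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 can rebuild only one missing member per stripe")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Two data blocks and their parity; the second data block is "offline".
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor_blocks([d0, d1])
restored = rebuild_stripe([d0, None, parity])
```

The same property explains why the offline order matters: a member that dropped out earlier holds stale data, so it is the member that should be treated as missing and recomputed.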

Extract the logs stored on the V7000 and analyze them to determine the order in which the failed hard disks went offline.

【Migrating the recovered data】

Random sampling of the extracted data showed no problems. LUNs of the same sizes as in the original environment were created on the new storage device, and the image file of each extracted LUN was copied with dd to the corresponding LUN on the new storage. The data was normal, and the data recovery work was completed successfully.
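The copy itself was done with dd. As a complement, the following Python sketch shows one way such a copy could be checked: hash the image while copying it, then hash the same number of bytes read back from the target and compare. The function names and the choice of SHA-256 are assumptions made for this illustration; the article does not say how the result was verified beyond sampling.

```python
import hashlib
import io
from typing import BinaryIO


def copy_with_digest(src: BinaryIO, dst: BinaryIO,
                     block_size: int = 1024 * 1024) -> str:
    """Copy src to dst in blocks and return the SHA-256 of the copied bytes."""
    digest = hashlib.sha256()
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        digest.update(block)
    return digest.hexdigest()


def digest_of(stream: BinaryIO, length: int,
              block_size: int = 1024 * 1024) -> str:
    """SHA-256 of the first `length` bytes of a stream.

    Limiting the length matters when the target LUN is larger than the image.
    """
    digest = hashlib.sha256()
    remaining = length
    while remaining > 0:
        block = stream.read(min(block_size, remaining))
        if not block:
            break
        digest.update(block)
        remaining -= len(block)
    return digest.hexdigest()


# Illustration with in-memory streams standing in for the image and the LUN.
lun_image = io.BytesIO(b"abc" * 1000)
target_lun = io.BytesIO()
copied = copy_with_digest(lun_image, target_lun, block_size=512)
target_lun.seek(0)
verified = digest_of(target_lun, 3000, block_size=512) == copied
```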
