Recently, we received a RAID 5 case: three member drives removed from a Dell PowerEdge R720 server. The IT technician reported that two of the three members had failed and that the server could not detect enough members to build the logical volume. The most important data to retrieve were the QuickBooks database and the shared file server data.
Dell servers and storage systems are among the most popular storage devices. The PowerEdge and EqualLogic series are highly rated, partly because of the flexible choice of form factors: tower, rack, or modular systems. In addition, many companies configure Dell servers with RAID 5 or RAID 6 for performance and redundancy.
Learning the Case Details from Customer
Like a medical doctor, we always ask our customers detailed questions about the device failure or other data loss event. This information helps our engineers inspect and solve the issue much faster.
The Story of Failed PowerEdge R720 Server
The customer told us that one of the members had died a few months earlier. They planned to replace the failed hard drive, but delivery of the replacement was delayed, and the server was then forgotten for a while.
Recently, the second hard drive failed, and the RAID volume stopped working. A RAID 5 volume stays available when only one drive fails. If a second drive fails, the array can no longer function, and you lose access to the data.
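The reason a three-member RAID 5 survives one failure but not two can be shown with a short sketch. It assumes the standard XOR parity scheme; the block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-member array: two data blocks plus their parity.
d0 = b"QUICKBOOKS"   # illustrative data block on member 0
d1 = b"FILESERVER"   # illustrative data block on member 1
parity = xor_blocks(d0, d1)  # stored on member 2 for this stripe

# One member lost: XOR of the two survivors restores the missing block.
recovered_d0 = xor_blocks(d1, parity)
recovered_d1 = xor_blocks(d0, parity)
```

With two members lost, only one block per stripe remains, and there is nothing left to XOR it against, so the missing data cannot be computed.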
Most Common Causes of RAID 5 Array Failures:
1. Simultaneous Hard Drive Failures
2. RAID Controller Battery Failures
3. RAID Controller Firmware Updates
4. Water and Fire Damage
5. Human errors
Seeing that the data was not accessible and business operations were affected, the technician decided to restore server functionality himself.
First, he tried to rebuild the RAID volume using the RAID controller utilities. He then ejected the drives and swapped their slots in the server, which worsened the situation.
Next, he removed the hard drives, connected them externally to a desktop computer, and heard them making strange mechanical noises. He realized he could not do anything more and escalated the case for further inspection and data recovery.
The technical evaluation is conducted according to a defined standard operating procedure, and every engineer follows the evaluation checklist during the process. You can check our ISO 9001 – Quality Management certification from this link.
Technical Inspection of Hard Drives
After learning what happened and what kind of actions were taken after the RAID failure, we needed to inspect each member and understand the exact cause.
Initially, the printed circuit boards were tested and found to be fully functional. Then engineers opened the sealed covers of both failed hard drives in a cleanroom environment to check for mechanical failures.
The hard drives were more than six years old and had been running non-stop. Next, the read/write head assemblies were removed and examined under a microscope.
Head Replacement in the Cleanroom
The magnetic sliders were bent and carried particles from the platters. There was slight physical damage on all magnetic surfaces, but this should not prevent data recovery. Based on the information provided by the customer, we installed a new head on the hard drive that failed last, since it held the most recent data, and started the byte-to-byte cloning process.
The cloning completed successfully despite a few bad sectors. These bad sectors might affect specific files or the file system, but we would only know after building the RAID volume.
Recovering Data from RAID 5 Volume
We now had full images of the last failed drive and the working drive. Next, we needed to find the RAID parameters, such as block size, delay count, and the sequence of the hard drives, to build the RAID 5 volume and retrieve the data correctly.
We analyzed the initial bytes of both images using a hexadecimal viewer and found all necessary parameters. Then, by adding a dummy drive in place of the third member, we successfully built the volume image.
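As a rough illustration of what building a volume with a dummy member involves, the sketch below reassembles a RAID 5 volume from member images when one member is absent. It is not the tool used in this case. It assumes a left-symmetric parity rotation with no parity delay, which is only one of several layouts a controller may use, and the names `xor_pair` and `assemble_raid5` are hypothetical. For brevity, members are held in memory as `bytes`; real images would be read in chunks.

```python
from typing import List, Optional, Sequence


def xor_pair(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def assemble_raid5(members: Sequence[Optional[bytes]], block_size: int) -> bytes:
    """Rebuild the logical volume from member images.

    `members` is ordered by drive position; at most one entry may be None
    (the dummy member). Assumes a left-symmetric layout: parity starts on
    the last drive and moves one drive to the left on every stripe row.
    """
    n = len(members)
    missing = [i for i, m in enumerate(members) if m is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot be rebuilt with more than one member missing")
    sizes = {len(m) for m in members if m is not None}
    if len(sizes) != 1:
        raise ValueError("all member images must have the same size")
    rows = sizes.pop() // block_size

    volume = bytearray()
    for row in range(rows):
        start = row * block_size
        blocks: List[Optional[bytes]] = [
            None if m is None else m[start:start + block_size] for m in members
        ]
        if missing:
            # The absent block is the XOR of every surviving block in the row.
            survivors = [b for b in blocks if b is not None]
            rebuilt = survivors[0]
            for b in survivors[1:]:
                rebuilt = xor_pair(rebuilt, b)
            blocks[missing[0]] = rebuilt
        parity_disk = n - 1 - (row % n)
        # Data blocks follow the parity block and wrap around the drive list.
        for k in range(n - 1):
            volume += blocks[(parity_disk + 1 + k) % n]
    return bytes(volume)
```

If the block size or drive order is wrong, the function still returns bytes, but file system structures in the result will not line up, which is why the parameters have to be identified first.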
Verification of Data Integrity
Building the correct volume from images does not mean that all files are 100% intact. So we needed to check them before confirming this case as successful.
We have proprietary software that checks standard file headers. Using this software, we confirmed that all files were fully functional.
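The idea behind a header check can be sketched in a few lines. This is not our in-house tool, only a minimal example under stated assumptions: the signature table covers a handful of well-known formats, and the function name `header_matches` is hypothetical.

```python
import os
from typing import Optional

# Well-known leading bytes ("magic numbers") for a few common formats.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".zip": b"PK\x03\x04",
    ".docx": b"PK\x03\x04",  # Office Open XML files are ZIP containers
    ".xlsx": b"PK\x03\x04",
}


def header_matches(name: str, head: bytes) -> Optional[bool]:
    """Compare the first bytes of a file with the signature for its extension.

    Returns True or False for known extensions and None when the extension
    is not in the table, so unknown formats are not reported as damaged.
    """
    ext = os.path.splitext(name)[1].lower()
    signature = SIGNATURES.get(ext)
    if signature is None:
        return None
    return head.startswith(signature)
```

A matching header only shows that the start of a file landed in the right place. A mismatch flags the file for closer review, and critical files still need to be opened in their own application.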
Customer File Verification
After the successful RAID data recovery process, we arranged a remote session over TeamViewer so that the customer could verify critical files, including the QuickBooks database. We also installed QuickBooks software on the verification computer to open the database and check the latest transactions.
Recovering Operating System on the Server
The customer wanted us to restore the server’s normal functionality so that they would not need to set up the operating system, applications, and configurations from scratch.
We had a full image of the RAID array, and engineers cloned it to a new hard drive. The server booted successfully from this single hard drive. Running a server on a single hard drive is risky, which is why we added a second new hard drive and configured a RAID 1 volume for redundancy.
Importance of Working Backup Process
RAID 5 data loss can cost a business a lot of money and reputation, and can even lead to lawsuits. To avoid this critical situation, we advise you to make backups and check the hardware periodically. This way, you can prevent most data loss situations.
Backup & Recovery Consulting
If you need effective backup and disaster recovery planning, our team can help you achieve this. PITS Global Data Recovery Services provides businesses and individuals with a comprehensive set of data retrieval solutions, consulting, and IT services.
We are backed by an advanced data recovery lab, experienced professionals, and a high customer satisfaction rate. Contact us via the request help form on our website or by calling (888) 611-0737 to get immediate help.