Dell RAID Data Recovery
In this blog post, we will describe a RAID array failure that happened in the Dell server. We will explain how our data recovery specialists rescued all needed data and restored the server within the next 24 hours after receiving the server and associated hard drives to our data recovery lab.
The customer was a pharmacy business. They had a pharmacy application installed on the Dell PowerEdge R430 server a long time ago. Unfortunately, the failure of the server stopped all the business operations.
HIPAA Compliant Data Recovery Company
PITS Global Data Recovery Services is a HIPAA compliant company, and we work with health organizations to help them with data recovery cases. The problem occurred after working hours on Friday, and we had time till Monday to recover the data and restore the server’s functionality. The company did not have active support contact with the IT company that set up this server.
As a result, no one from the customer side had any technical information about the server, RAID configuration, and application setup. The only thing they knew was that the server had an SQL server running and wanted to restore the server. They did not even have a single copy of a backup elsewhere and relied on a single server.
This case was urgent and should have been taken care of immediately. Therefore, the PITS emergency response team dispatched and waited for the server to arrive at our lab on Saturday morning. They brought the server and the hard drives. We started the initial inspection process for the server and associated hard drives.
There were four (4) hard drives on the server. Each hard drive was removed from the server, labeled, and then assigned to the engineers for further technical inspection. 2 expert engineers were responsible for handling this case.
We did not have technical information about the server RAID system configuration, and therefore we must follow standard procedures and inspect every device to find the root cause of failure.
Inspecting the Failed Dell Server
While inspecting the Dell server, we observed that it was not booting due to BIOS corruption. Finally, after resetting BIOS, the server started to boot but again got stuck on RAID Array Configuration Utility which manages hard drives connected to Dell Perc S130 RAID Controller.
The server was old, fans making abnormal noise and also lots of dust were inside of it. Our engineers cleaned the server using a high-pressure air blowgun. Then we updated the firmware of the RAID Controller to make sure the server was running the latest BIOS and controller firmware. Unfortunately, this server still did not want to boot, and we decided to focus on data recovery and restore the system to another server.
Hard Drive Diagnostics
There were four hard drives on the server. We inspected each of them. Unfortunately, one of the hard drives had a head failure, which means that 1 out of 6 heads could not read the data from the magnetic surfaces. Therefore, this drive needed a cleanroom recovery.
But before proceeding with the changing of reading writing assembly of the hard drive, we need to identify how these hard drives were configured. What type of RAID level was used? Or is it just a bunch of drives (JBOD)? Is this hard drive necessary for a successful recovery?
The remaining hard drives had bad sectors and plodding reading speed. But we could fully clone them and read the bad sectors using hardware read retries.
Studying the Dell RAID Controller
Our engineers needed to eliminate certain RAID types by doing simple research. For this purpose, we opened the datasheet of the Dell PERC S130 Controller installed on this server to get detailed information about this controller’s supported RAID server types.
The RAID levels supported are RAID 0, RAID 1, RAID 5, and RAID 10.
We analyzed the content stored on these hard drives and found data on them, which proved that the RAID array was using all four hard drives. Unfortunately, we could not get information from the RAID configuration utility because the RAID Controller reset on this server.
Dell RAID Recovery – Supported RAID Levels
Next, we know that only RAID 5 uses parity to build redundancy, but other RAID types do not. We did not have positive results by running entropy analysis on 4 RAID members, so we could confirm that it is not RAID 5 either.
We have left only RAID 0 (stripe) and RAID 10 (mirror+stripe). We identified that pair of hard drives contain the same data by comparing data sectors to each hard drive. This analysis confirmed that this is RAID 10 logical volume.
RAID 0 Array with 128 Sector Block Size
We only need one hard drive from each pair to recover data and combine them by finding the correct block size. The striping uses block size, and in this logical volume, it was 128 sectors. At this time, we still do not need to repair the hard drive with head failure. Engineers built the RAID 0 logical volume with 128 sector block size using the byte-to-byte images of hard drives; we got access to the file system, files, and folders.
File Integrity Check
While doing file verification to check the integrity, we saw that recent files did not work, but we could open the old ones. Thus, it was clear that one of the RAID members was outdated, and the RAID controller removed this from the logical volume long ago. It was necessary to repair the hard drive with head failure by replacing the reading writing heads and making a complete hard drive clone. Otherwise, the recovery of files and folders from the RAID array will be impossible.
One of the reading-writing heads of the hard drive lost its functionality. The only solution to this problem was to replace the magnetic head assembly with a new one. Once the engineer found a fully compatible hard drive to use as a donor for this patient drive, we did a precision head swap process in our certified cleanroom.
We had already read the data from the patient drive before we opened the hard drive, except for the data written with the drive with a failed head. After successfully changing the head assembly on a failed hard drive, we could read the remaining data and make a complete image of this drive.
Building Complete RAID Logical Image
As we had four complete images from each hard drive, the chances of successful data recovery were significant. Therefore, we used data recovery software to combine two drive images as stripe members using 128 sector block size. For each RAID 0 combination, we checked the integrity of the files with the latest modification date.
Once we found a few complete working files a few hours before the crash, we found the correct drive images to build a volume. By combining these two drive images, we made a single image of the RAID 10 logical volume. At this step, it was straightforward to open this image file and export the necessary files.
But the job was not over yet, because, on Monday morning, the business should have a fully functional pharmacy application on a new server. The customer did not have any spare server that we could use as the previous server was unreliable. Purchasing a new server in a few hours was not feasible. For this purpose, we always keep a few new and used servers in our stock to help the customers in these difficult situations.
After getting approval from the customer, we decided to use a refurbished HP Proliant ML350 Gen9. The engineers cloned the RAID Array image to a 1TB SSD SAS hard drive and configured the HP Smart Array controller to boot from this drive.
The server successfully booted, and all the services were running fine, including Microsoft SQL Server and business applications. Therefore, we successfully restored the functionality of the server.
Data Integrity Verification
The customer needed to verify the server functionality, MS SQL Server, and the business application. Then, we connected the server to our secure network and granted remote access to the responsible representatives from the customer side.
They spent some time remotely checking the SQL database, running SQL queries to check the transactions and application. The customer confirmed the successful data recovery and was happy to proceed with payment and other administrative processes. The server was running and serving the business on Monday morning.