How Data Avoid Loss
Table of Contents

Data loss can be a significant problem for businesses, especially when it comes to critical data stored on RAID arrays. RAID 10 is a popular choice for businesses because of its fault tolerance and performance benefits. However, even with RAID 10, data loss can occur, and when it does, it’s important to have a plan in place for recovery.

In this blog post, we will describe a RAID array failure that happened in the Dell server. We will explain how our data recovery specialists rescued all needed data and restored the server within the next 24 hours after receiving the server and associated hard drives to our data recovery lab.

The customer was a pharmacy business. They had a pharmacy application installed on the Dell PowerEdge R430 server a long time ago. Unfortunately, the failure of the server stopped all business operations.

Dell PowerEdge R430 Data Recovery

One of the most commonly used servers for RAID arrays is the Dell PowerEdge R430. This server is known for its reliability and performance, but like all hardware, it has a lifespan. The Dell PowerEdge R430 end of life date was in March 2021, which means that Dell will no longer provide technical support or security updates for this server model. However, that doesn’t mean businesses have to stop using it, as it can still be a reliable option for their needs.

For those interested in using the Dell PowerEdge R430, it’s important to understand its specifications. This is a 1U rack server with up to two Intel Xeon processors and up to 384GB of DDR4 memory. It supports multiple RAID configurations, including RAID 0, 1, 5, 6, 10, 50, and 60, and can support up to 16 2.5-inch drives or eight 3.5-inch drives.

PITS Global Data Recovery Services is a HIPAA-compliant company, and we work with health organizations to help them with data recovery cases. The problem occurred after working hours on Friday, and we had time till Monday to recover the data and restore the server’s functionality. The company did not have active support contact with the IT company that set up this server.

As a result, no one from the customer side had any technical information about the server, RAID configuration, and application setup. The only thing they knew was that the server had an SQL server running, and they wanted to restore the server. They did not even have a single copy of a backup elsewhere and relied on a single server.

Emergency Data Recovery Service

This case was urgent and should have been taken care of immediately. Therefore, the PITS emergency response team dispatched and waited for the server to arrive at our lab on Saturday morning. They brought the server and the hard drives. We started the initial inspection process for the server and associated hard drives.

Dell Drives Server Recovery

There were four (4) hard drives on the server. Each hard drive was removed from the server, labeled, and then assigned to the engineers for further technical inspection. 2 expert engineers were responsible for handling this case.

We did not have technical information about the server RAID system configuration, and therefore we must follow standard procedures and inspect every device to find the root cause of failure.

Inspecting the Failed Dell Server

While inspecting the Dell server, we observed that it was not booting due to BIOS corruption. Finally, after resetting BIOS, the server started to boot but again got stuck on RAID Array Configuration Utility which manages hard drives connected to Dell Perc S130 RAID Controller.

The server was old, fans making abnormal noise and also lots of dust were inside of it. Our engineers cleaned the server using a high-pressure air blowgun. Then we updated the firmware of the RAID Controller to make sure the server was running the latest BIOS and controller firmware. Unfortunately, this server still did not want to boot, and we decided to focus on data recovery and restore the system to another server.

In the event of data loss on a Dell PowerEdge R430 RAID array, the first step is to identify the cause of the problem. This could be anything from hardware failure to human error. Once the cause is identified, it’s important to have a professional data recovery service handle the recovery process.

When it comes to RAID recovery on Dell PowerEdge R430 servers, the Dell RAID controller plays a critical role. The RAID controller is responsible for managing the array and can often be the source of problems if it’s not functioning correctly. A professional data recovery service will have the tools and expertise to diagnose and repair any issues with the RAID controller and recover the lost data.

Hard Drive Diagnostics

There were four hard drives on the server. We inspected each of them. Unfortunately, one of the hard drives had a head failure, which means that 1 out of 6 heads could not read the data from the magnetic surfaces. Therefore, this drive needed a cleanroom recovery.

But before proceeding with changing the reading writing assembly of the hard drive, we need to identify how these hard drives were configured. What type of RAID level was used? Or is it just a bunch of drives (JBOD)? Is this hard drive necessary for a successful recovery?

The remaining hard drives had bad sectors and plodding reading speed. But we could fully clone them and read the bad sectors using hardware read retries.

Failed Hard Drive Data Repair

Studying the Dell RAID Controller

Our engineers needed to eliminate certain RAID types by doing simple research. For this purpose, we opened the datasheet of the Dell PERC S130 Controller installed on this server to get detailed information about this controller’s supported RAID server types.

The RAID levels supported are RAID 0, RAID 1, RAID 5, and RAID 10. 

We analyzed the content stored on these hard drives and found data on them, which proved that the RAID array was using all four hard drives. Unfortunately, we could not get information from the RAID configuration utility because the RAID Controller reset on this server.

Dell RAID Recovery - Supported RAID Levels

You can create RAID 1, RAID 5, or RAID 10 logical volumes using four hard drives. RAID 1 only needs two hard drives but not more. So we could eliminate RAID 1 from the list.

Next, we know that only RAID 5 uses parity to build redundancy, but other RAID types do not. We did not have positive results by running entropy analysis on 4 RAID members, so we could confirm that it is not RAID 5 either.

We have left only RAID 0 (stripe) and RAID 10 (mirror+stripe). We identified that pair of hard drives contain the same data by comparing data sectors to each hard drive. This analysis confirmed that this is RAID 10 logical volume.

RAID 0 Array with 128 Sector Block Size

We only need one hard drive from each pair to recover data and combine them by finding the correct block size. The striping uses block size, and in this logical volume, it was 128 sectors. At this time, we still do not need to repair the hard drive with head failure. Engineers built the RAID 0 logical volume with 128 sector block size using the byte-to-byte images of hard drives; we got access to the file system, files, and folders.

Hidden
Request CallBack

File Integrity Check

While doing file verification to check the integrity, we saw that recent files did not work, but we could open the old ones. Thus, it was clear that one of the RAID members was outdated, and the RAID controller removed this from the logical volume long ago. It was necessary to repair the hard drive with head failure by replacing the reading writing heads and making a complete hard drive clone. Otherwise, the recovery of files and folders from the RAID array will be impossible.

One of the reading-writing heads of the hard drive lost its functionality. The only solution to this problem was to replace the magnetic head assembly with a new one. 

Once the engineer found a fully compatible hard drive to use as a donor for this patient drive, we did a precision head swap process in our certified cleanroom.

Server on RAID Data Recovery

We had already read the data from the patient drive before we opened the hard drive, except for the data written with the drive with a failed head. After successfully changing the head assembly on a failed hard drive, we could read the remaining data and make a complete image of this drive.

Building Complete RAID Logical Image

As we had four complete images from each hard drive, the chances of successful data recovery were significant. Therefore, we used data recovery software to combine two drive images as stripe members using 128 sector block size. For each RAID 0 combination, we checked the integrity of the files with the latest modification date.

Once we found a few complete working files a few hours before the crash, we found the correct drive images to build a volume. By combining these two drive images, we made a single image of the RAID 10 logical volume. At this step, it was straightforward to open this image file and export the necessary files.

But the job was not over yet, because, on Monday morning, the business should have a fully functional pharmacy application on a new server. The customer did not have any spare server that we could use as the previous server was unreliable. Purchasing a new server in a few hours was not feasible. For this purpose, we always keep a few new and used servers in our stock to help the customers in these difficult situations.

Recovery of Failed Hard Drives

After getting approval from the customer, we decided to use a refurbished HP Proliant ML350 Gen9. 

The engineers cloned the RAID Array image to a 1TB SSD SAS hard drive and configured the HP Smart Array controller to boot from this drive. 

The server successfully booted, and all the services were running fine, including Microsoft SQL Server and business applications. Therefore, we successfully restored the functionality of the server.

Data Integrity Verification

The customer needed to verify the server functionality, MS SQL Server, and the business application. Then, we connected the server to our secure network and granted remote access to the responsible representatives from the customer side.

They spent some time remotely checking the SQL database, and running SQL queries to check the transactions and applications. The customer confirmed the successful data recovery and was happy to proceed with payment and other administrative processes. The server was running and serving the business on Monday morning. 

In summary, while the Dell PowerEdge R430 may be reaching the end of its life cycle, it can still be a reliable option for businesses needing a RAID array. It’s important to understand the server’s specifications and have a plan in place for data recovery in the event of data loss. With the help of a professional data recovery service, businesses can recover their critical data and get back to business as usual.

Frequently Asked Questions

  • Complexity: RAID 10 involves striping data across mirrored pairs of drives, which requires a meticulous approach to data recovery.

  • Multiple Drive Failure: RAID 10 can tolerate a single drive failure in each mirrored pair. However, if multiple drives in the same pair fail or if additional issues arise, data recovery becomes more complex.

  • Hardware Specifics: The Dell PowerEdge R430 server introduces specific hardware and configuration considerations that need to be addressed during data recovery.

  • Data Striping and Mirroring: Data is both striped and mirrored across multiple drives, necessitating the reassembly of data fragments while ensuring data integrity.

Attempting DIY recovery on a RAID 10 array, especially on a Dell PowerEdge R430 server, is strongly discouraged. RAID arrays require specialized knowledge, software, and hardware tools for data recovery. Incorrect actions can lead to further data loss or even permanent damage.

  • Regular Backups: Maintain up-to-date backups of critical data to safeguard against data loss.

  • Routine Maintenance: Conduct periodic hardware and software checks to identify and address issues proactively.

  • Redundancy: Consider additional redundancy or failover solutions to enhance data resilience.

Success rates depend on factors such as the extent of damage, the number of failed drives, and the speed of intervention. While we aim for high success rates, severe damage or multiple simultaneous drive failures may limit data recovery possibilities.

Related Case Studies

HP-Proliant-ML380-G6-recovery

RAID 5 Data Recovery from HP ProLiant ML380 G6 Server

This blog is about another successful data recovery case from a damaged RAID array. Click here to learn more about the recovery process.

Evaluation of Damaged RAID Hard Drives

Data Recovery for Multi-Hospital Healthcare Provider

Our skilled engineers were able to restore crucial data for a medical company in need. Discover the techniques and tools we used.