Recently a colleague called and asked whether we could help him. His Gentoo system had a boot problem: it would not start and only showed "BOOT DRIVE FAILURE". Since he had not set up the system himself (it was his IMAP server, fax server, phone-management server and, on top of that, the webserver handling the frontend for his customers), he was pretty much clueless about what to do. When I arrived at his house he had already taken the computer apart and pulled out the drive that seemed to have taken a crap. Connecting this drive to my laptop via an external USB adapter showed exactly nothing. The drive did not show up, although I could hear it physically spinning up. I decided not to waste any time on it and asked him about backups. He didn’t know the details but gave me the phone number of the guy who had set up the system. After a call it was clear that there were indeed backups, but only of the application data and none of the system.
Gasping at the thought of setting up his fax server (I hate doing this in Linux) and his mail server within 3 hours (he needed the system running a.s.a.p.), I decided to have a quick look at the other drive built into the machine, hoping to find at least a few config files. So I took out the other drive, connected it to my laptop and was happy to see a working filesystem with /bin, /boot etc., which gave me a good feeling about getting the system running quickly. However, I found out that the original maintainer of the system had decided to take backups … onto the same drive. Doh!
After I put the drive back into the server, hoping the system would start up, that second drive was reported as failing in the POST messages during boot. Oh boy! Disconnecting it and reconnecting it to my laptop showed that this drive was indeed crapping out as well and that I could no longer access the data. What a great day. On top of that, the drive was making very loud and weird noises, indicating that it was about to say goodbye completely. Gasping even more… I started the download of an Ubuntu 8.04 server CD and went to lunch, since I was behind a 2 Mbps DSL line and downloading a CD would take about 45 minutes.
When I came back from lunch and started burning the CD, I decided to give the first drive another chance. I connected it and (damn!) it started up and I could access the data on it. To make sure not to waste any time, I immediately started imaging that drive (who knew how many minutes it would keep working?).
I had Knoppix (a live distro) running in a VM and had connected a share from my Windows laptop to it:
mount -t cifs -o username=administrator //192.168.222.1/laptopShare /mnt/smb/
Then I started dd_rescue, writing into an image file:
dd_rescue -b 4M /dev/sdb /mnt/smb/backup/sdb_image
A quick analysis showed that the disk had a capacity of 80GB, split into two partitions. Partition 1 had 50GB, partition 2 30GB. While the drive was being imaged I started setting up the bare system from scratch, just in case I could not restore any of the system or the drive crapped out again.
After two hours the imaging process slowed down severely and the drive was making funny noises, but I had been able to read about 57GB, so chances were good that at least the first partition was rescued. The next step was to check what kind of data I had just rescued by mounting the image file (as mentioned, I didn’t want to take any chances, so I did not interrupt or delay the imaging process). Since I had not copied each partition separately, I had to find out where the partitions began, using fdisk:
debian:~# fdisk -u -l /mnt/smb/backup/sdb
You must set cylinders.
You can do this from the extra functions menu.
Disk /mnt/smb/backup/sdb: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/mnt/smb/backup/sdb1 63 100020689 50010313+ 83 Linux
Partition 1 has different physical/logical endings:
phys=(1023, 254, 63) logical=(6225, 254, 63)
/mnt/smb/backup/sdb2 100020690 160826714 30403012+ 83 Linux
Partition 2 has different physical/logical beginnings (non-Linux?):
phys=(1023, 254, 63) logical=(6226, 0, 1)
Partition 2 has different physical/logical endings:
phys=(1023, 254, 63) logical=(10010, 254, 63)
In this case I wanted to know the offset of partition 1, which starts at sector 63. Multiply this by the sector size (512 bytes/sector) and you get an offset of 32256 bytes.
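The same arithmetic can be left to the shell. This is only a minimal sketch, assuming 512-byte sectors as reported by fdisk above; the function name sector_to_offset is made up for illustration.

```shell
# Convert a partition's start sector (as printed by `fdisk -u -l`) into
# the byte offset that `mount -o loop,offset=...` expects.
# Assumes 512-byte sectors unless a second argument says otherwise.
sector_to_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

sector_to_offset 63          # partition 1
sector_to_offset 100020690   # partition 2
```

For the two start sectors from the fdisk listing this prints 32256 and 51210593280.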
debian:~# mount -o loop,offset=32256 /mnt/smb/backup/sdb /mnt/loop
debian:~# cd /mnt/loop
debian:/mnt/loop# ls
bin dev home lost+found opt root sys usr
boot etc lib mnt proc sbin tmp var
Excellent. How about partition 2? It starts at sector 100020690, so its offset is 100020690 × 512 = 51210593280 bytes. Let’s have a look:
debian:/mnt/loop# mount -o loop,offset=51210593280 /mnt/smb/backup/sdb /mnt/loop2
debian:/mnt/loop# ls -al /mnt/loop2
total 637964
drwxr-xr-x 6 root root 4096 2009-06-13 14:35 .
drwxr-xr-x 16 root root 4096 2009-06-18 15:51 ..
-rw-r--r-- 1 root root 2937770 2009-06-13 14:06 bin.tar.bz2
-rw-r--r-- 1 root root 525861 2009-06-13 14:06 etc.tar.bz2
-rw-r--r-- 1 root root 6510707 2009-06-13 14:07 home.tar.bz2
-rw-r--r-- 1 root root 8406356 2009-06-13 14:07 lib.tar.bz2
drwx------ 2 root root 16384 2009-05-01 20:59 lost+found
?--------- ? ? ? ? ? /mnt/loop2/mailing
?--------- ? ? ? ? ? /mnt/loop2/mails
?--------- ? ? ? ? ? /mnt/loop2/test
-rw-r--r-- 1 root root 182 2009-06-13 14:07 opt.tar.bz2
-rw-r--r-- 1 root root 13524123 2009-06-13 14:08 root.tar.bz2
-rw-r--r-- 1 root root 1883672 2009-06-13 14:08 sbin.tar.bz2
-rw-r--r-- 1 root root 447615119 2009-06-13 14:35 usr.tar.bz2
-rw-r--r-- 1 root root 171154246 2009-06-13 14:53 var.tar.bz2
Ah! There are the backups… on the same drive.
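The unreadable entries on partition 2 fit the numbers: about 57GB had been read, while the fdisk listing puts the end of partition 1 at sector 100020689 and the end of partition 2 at sector 160826714. A rough sketch of that check follows. It assumes 512-byte sectors, takes "57GB" as decimal gigabytes, assumes the image was read front to back and ignores any bad sectors skipped along the way. The function name is made up.

```shell
# Report whether a partition lies entirely within the part of the disk
# that has been read so far.
# Arguments: end sector of the partition (inclusive), bytes read.
partition_rescued() {
    end_sector=$1
    bytes_read=$2
    partition_end=$(( (end_sector + 1) * 512 ))
    if [ "$partition_end" -le "$bytes_read" ]; then
        echo "complete"
    else
        echo "incomplete"
    fi
}

read_so_far=$(( 57 * 1000 * 1000 * 1000 ))   # "about 57GB", decimal
partition_rescued 100020689 "$read_so_far"   # partition 1
partition_rescued 160826714 "$read_so_far"   # partition 2
```

Under these assumptions partition 1 comes out as complete and partition 2 as incomplete.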
Since I had the image on my laptop and all I had was a rather slow USB adapter, I decided to partly install Ubuntu on the server, add a new drive to that machine and copy the image back over the network (my laptop and the server both had gigabit NICs) to save some time. There are nice ways to do this with netcat (described here, if you prefer ssh), but I decided to use the simpler approach of mapping the share from my laptop on the server.
The next issue I ran into was that the dd’ed drive did not work properly. The filesystem showed all kinds of errors, and while mounting the device I got some “attempt to access beyond end of device” errors (or similar). cfdisk showed the proper sizes, but it seemed like something else was screwed up. Even when I tried to create the filesystem again on that partition, it showed a far too small partition size. I decided to create a larger partition manually ( >50GB ) and then copy back only the first partition (the 2nd partition had no value to me anyway since it was incomplete). I calculated the offset on the target drive (see above) and started another dd, supplying the source and target offsets. That worked fine, and the data was consistent and accessible afterwards.
Now let’s get this drive booted. grub! It’s been ages since I last used grub, so I had to get it done by reading and by trial and error. Basically these were my steps:
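A dd call with source and target offsets can look roughly like the sketch below. The exact command line was not recorded, so the function name, /dev/sdX and the target start sector are placeholders. With bs=512, skip= and seek= count in sectors, which matches the fdisk numbers directly (at the cost of speed).

```shell
# Copy `count` 512-byte sectors from `src`, starting at sector
# `src_start`, into `dst`, starting at sector `dst_start`.
# conv=notrunc keeps dd from truncating dst when it is a regular file.
copy_sectors() {
    src=$1; src_start=$2
    dst=$3; dst_start=$4
    count=$5
    dd if="$src" of="$dst" bs=512 \
       skip="$src_start" seek="$dst_start" count="$count" conv=notrunc
}

# Hypothetical call: partition 1 spans sectors 63..100020689 in the
# image, i.e. 100020627 sectors. /dev/sdX and the target start sector
# (63 here) stand for whatever the new partition table says.
# copy_sectors /mnt/smb/backup/sdb 63 /dev/sdX 63 100020627
```

The call is left commented out on purpose, since pointing it at the wrong device overwrites data.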
- Boot the server from a Knoppix live system (so that drive names do not get mixed up; for some reason the Ubuntu distro showed sda’s instead of hda’s)
- Copy over the grub bootloader files from /usr/lib/grub/i386pc/stage1 to /boot/grub/
- If you are using Knoppix you have to get around the "/dev/null: Permission Denied" error.
- chroot into the path where you have mounted the partition you want to boot from (more details here)
- invoke a proper grub-install command
- edit the grub menu; it may look something like the entry shown after this list
- reboot the system
- you should be done.
boot
title Ubuntu, kernel 2.6.15-25-k7 (recovery mode)
root (hd0,0)
kernel /boot/vmlinuz-2.6.15-25-k7 root=/dev/sda1 ro single
initrd /boot/initrd.img-2.6.15-25-k7
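The chroot and grub-install steps from the list might be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used: /mnt/target and /dev/hda are placeholders, and the bind mounts of /dev and /proc are a common chroot preparation rather than something the steps above spell out. It does not copy the stage files, and it only prints the commands unless DRY_RUN=0 is set.

```shell
# Print (DRY_RUN=1, the default) or execute a command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# target: where the restored root partition is mounted (placeholder)
# disk:   the boot drive as the live system names it (placeholder)
reinstall_grub() {
    target=$1
    disk=$2
    # Give the chroot a working /dev and /proc.
    run mount --bind /dev "$target/dev"
    run mount --bind /proc "$target/proc"
    # Run grub-install inside the chroot so it writes to the boot drive.
    run chroot "$target" grub-install "$disk"
    run umount "$target/proc"
    run umount "$target/dev"
}

reinstall_grub /mnt/target /dev/hda
```

Printing the commands first makes it easy to double-check the device name before anything is written to the disk.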
After that I could boot into the system… and ran into a freeze, which happened because I had forgotten to edit the fstab. Correcting it made the system boot up properly, and all of the services were accessible afterwards.
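What exactly was wrong in the fstab is not recorded here. One plausible cause, given the renamed drives and the dropped second partition, is an entry pointing at a device that no longer exists. The sketch below (function name made up) lists such entries. It only looks at plain /dev/... names and ignores comments and LABEL=/UUID= entries.

```shell
# List block devices named in an fstab file that do not exist on the
# running system. Defaults to /etc/fstab when no file is given.
missing_fstab_devices() {
    fstab=${1:-/etc/fstab}
    awk '$1 ~ /^\/dev\// { print $1 }' "$fstab" |
    while read -r dev; do
        [ -e "$dev" ] || echo "$dev"
    done
}
```

From a live system or chroot it could be called as `missing_fstab_devices /mnt/loop/etc/fstab`. Empty output means every listed device node is present.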