Why the RAID5, God?! WHY!?!
My investment in RAID5 was well intentioned: a mammoth-sized volume, plus redundancy if one of the disks fails. I thought I was in heaven and nobody or nothing could touch my unsinkable ship, the HMS Titanic.
Well, it turns out there was something that could breach the hull of my baby. The same thing that I’ve trusted to keep me in business throughout my life: electricity. Some may think this a foolish oversight, especially given my past experiences with these things, but nevertheless I thought that bad things of the same nature were far less likely to happen to me again.
Before I reel off into the land of stories and dreams I’ll give anyone reading a quick overview of the relevant parts of what I’ve been running on and let you bathe in the glory of what should have been a dream:
- An APC UPS with built in surge protection holding up my fort.
- 480W Antec NeoPower PSU
- 3*300gig Maxtor DiamondMax 10 HDs in RAID5
- 2*36gig WD Raptors
- 1*120gig plain ol’ SATA drive
So with that out the way, on with the story… Gather round, children…
It began on a slow-paced Wednesday evening many miles from here in a land called Norwich. John and I were taking it in turns to do various bulks of washing and everything was peaceful and calm.
Suddenly I hear a “beep beep beep” coming from my UPS that only means one of two things: somebody has brought the toasted-sandwich maker down and plugged it in, forgetting that it doesn’t work too well after I put too much cheese in during one endeavour and ended up with most of it inside the circuitry of the damned device; or someone has otherwise caused a short in the socket power ring.
Being on a UPS, I get the pleasure of finding out the power has gone off without experiencing the pain and suffering of losing 3 hours of writing up technical specifications or, debatably worse, losing the same amount of gameplay since the last save.
So yes. Power goes out and I head into the cave of surprises that is the cupboard under the stairs. The two local ring trips are still on but the main trip-switch covering them both has pinged up. Not the end of the world, so I push it back up and it trips straight away and comes flying back down.
There is obviously something plugged in that is causing it to surge and therefore trip. So with my battery still beeping away, John and I scurry around the house unplugging (and ordering a coked-up Joe to unplug) what we can and try the trips again. I decide it’s time to turn my room off, so I order John to turn my PC off via the UPS (Windows hasn’t shut down for some reason). Off it goes and we’re still having the same problem. So we unplug the fridge-freezer, grumble-dryer and washing machine and try once more… At this point nothing is plugged in.
Success! The main trip is staying up, but not with the secondary sub-trip on, so no sockets have power around the house. Just to explain what each of the sub-trips controls: one does my room alone and the other covers the rest of the house. Fair? Probably not but oh well.
With this in tow, I now have power just to my room and so I’ve turned my UPS back on and restarted Windows while we figure out what’s causing the rest of the house not to get power. In hindsight, I recognise this as my first mistake.
We need the fridge to have power, obviously, so that gets plugged back in. As soon as it’s turned on everything else turns off and we’re without power again, but I notice something… My UPS, while on, should be making a racket and it’s not. I go through to see what’s going on and my PC is off. This didn’t bother me too much at the time, but it was the first sign that something was seriously out of place…
After a little more digging around with the trips we decide to leave the fridge off overnight and get an electrician in the morning to come and see what the problem most likely is, so I go back to my room and consider what to do with my spangly computer.
After hitting the on button numerous times and still not getting any breed of onnage, my heart sinks and realisation sets in. Something is uberfooked. So off with the case and time to have a look. There’s a lovely little green light on the motherboard showing it has power so I try the power button again. Nothing. Not even the mildest stirring of a fan.
To cut a long story erm… well, slightly shorter than it would have been, I tried everything my experience with bastard computers has taught me. I tried it with nothing on the board, with reduced RAM, while standing on one foot and hopping, but nothing seemed to work. I even did the CMOS reset jumper thang.
This is when I decide it’s above me and I send out the one plea anyone can muster when you’ve got no access to email or an instant messenger and it’s 4am: a bulk text message.
“Firstly I’m sorry about the timing of this bulk message but i dont know what else to do. the house’s power kept tripping and now my pc wont even think about turning on. there’s a light on the mobo but no fans or anything else… because of my cabling i need to find an identical replacement psu in norwich. i need an antec neopower today regardless of the price. i just need to know where. [yada yada] lets all pray my RAID is ok…”
Why did I say that last part? Nothing came back that night, which didn’t surprise me too much because of the time, but by morning there was still nothing and I was starting to panic a little bit…
So I went out this morning, got an electrician and, just as I was about to pack it in and come home to my fatality of electrical bodges, I got a message from Paul. My crusader of morningtime was here to save me. After rapid textual-banter I decide I need to see a working computer in order to find somewhere with a PSU for me, so it’s back in the car and over to Paul and Steve’s house, where I was greeted by a fully internet-enabled Paul who helped me ring up every computer shop in Norwich. As luck would have it, none of the 3 billion shops in Norwich had heard of Antec, let alone the model I wanted, so I gave Paul my thanks (and a tip for his sister over the last few weeks) and went to where-in-the-world PC World, where I was calmly robbed of 75 and given a different PSU.
I got back home, recabled everything perfectly fine and booted her up. Worked like a charm… So that’s the second Antec PSU I’ve had that’s blown up on me; the first being a thrilling tale of sparks and disappearing eyebrows.
Windows boots reassuringly fast and I don’t notice that the desktop is different until MSN loads and I see it’s MSN7. I hadn’t booted to my Raptors at all… I was working off my non-RAIDed SATA drive. The string of curse-words forming in my head was almost tangible. I felt like tying a noose with them and hanging myself, before realising that none of the ceilings or walls would take an Oli of my stature. So I closed everything down and decided it was time to look at the BIOS.
Having actively reset the BIOS, I knew that this was likely to be a minefield of default settings, but after a few minutes I saw the problem and it was only the boot order. I turned off the splash screen my BIOS gives to see what was going on at boot and everything looked great. My Silicon Image RAID5 array looked to be intact and the NVIDIA RAID POST screen was just as optimistic, reporting a healthy striped array.
So it was back to the proper install of Windows. I thought I was the luckiest guy in the world, having come so close to breakdown and escaping at the last moment. Then I noticed something… Looking under My Computer, I seemed to be missing 2 volumes (namely the 2 partitions I have on my RAID5), so I restarted again and went into the Silicon Image RAID setup screen, where it promptly told me that there were no RAID arrays in effect but I was more than welcome to create a new one.
And that’s where I am now. Somewhere between a brick wall and a hard place.
Historically I recognise that it’s very hard to recover from an array that splits your data over multiple disks, like RAID0 and RAID5, but I just don’t want to believe that I’ve lost 600 gigs of stuff. That just isn’t fair. Not after I spent so much time and money making sure that this very thing would not happen if a disk died.
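For anyone wondering why a single dead disk is supposed to be survivable on RAID5, here is a rough sketch of the idea in Python. It assumes the textbook scheme, where each stripe holds data blocks plus one XOR parity block; how the Silicon Image controller actually lays things out on disk is not something I know, and the function names here are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_capacity_gb(disk_count: int, disk_size_gb: int) -> int:
    """Textbook RAID5: one disk's worth of space goes to parity."""
    return (disk_count - 1) * disk_size_gb


# One stripe across three disks: two data blocks and one parity block.
data_a = b"six years"
data_b = b"of work!!"
parity = xor_blocks(data_a, data_b)

# Pretend the disk holding data_b has died: rebuild it from the survivors.
rebuilt_b = xor_blocks(data_a, parity)

print(rebuilt_b == data_b)
print(usable_capacity_gb(3, 300))
```

The catch is that parity only covers a missing disk. It does nothing if the controller itself decides the array no longer exists, which is what the setup screen appeared to be telling me.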
What I’ve lost:
- 6 years of work
- All my films
- All my music (that I don’t have on my iPod) so about 70% lost
- All the ISOs
- My share of the TV
I just feel so physically sick that it was there this time yesterday and now it’s gone, and unless I find some way of bringing it back, it’s gone forever.
So here’s my plea: you’re all awesome people who I know might get annoyed with me asking for stuff, but if the worst comes to the worst, I’m going to need slightly more help building it back up. There’s going to be no efficient way to do this other than by network, so if it’s terminal, I’m going to see about setting up some sort of LAN gathering.
Until then, if anyone has any ideas on what to do to recover from an onboard RAID failure, PLEASE contact me ASAP. Cheers and thanks for reading what’s probably the worst written rant ever.
Update: Just after I submitted this post, XP decided that everything was ok… I’m just so confused… I was on the brink of manually recreating the array (thus wiping everything properly)…