Website Outage on July 14th
ziggy @ Tue Jul 15, 2008 1:23 am
DrCaleb:
Canadaka:
The diesel generators died from overheating after 20 minutes, and the backup UPS failed after another 15 minutes. The data centre staff worked to fix the problem, but it took them almost all day.
Lesson to take away - if you have a disaster plan in place - TEST IT!
Man, someone dug up another fiber bundle near Edmonton today. I had no email, no Sametime - and CKA was down. I was so bored I was playing with a shiny thing on a bit of string.
Fibre bundles, you hit one of them and you grab your lunch bucket and head for the highway cuz you're done.

Next hole you dig will be with a hand shovel.
http://www.abc.net.au/news/stories/2008 ... 303837.htm
This happened at 6 PM EST and it's still down. No cell, no internet, no landlines; anything on Optus is screwed. All from one fibre link.
Regina @ Tue Jul 15, 2008 6:38 am
We should get a CT thing going and blame it on Bush. 
ziggy:
DrCaleb:
Canadaka:
The diesel generators died from overheating after 20 minutes, and the backup UPS failed after another 15 minutes. The data centre staff worked to fix the problem, but it took them almost all day.
Lesson to take away - if you have a disaster plan in place - TEST IT!
Man, someone dug up another fiber bundle near Edmonton today. I had no email, no Sametime - and CKA was down. I was so bored I was playing with a shiny thing on a bit of string.
Fibre bundles, you hit one of them and you grab your lunch bucket and head for the highway cuz you're done.

Next hole you dig will be with a hand shovel.

"Call before you dig" is not just a suggestion.
Quote:
Dear Valued Customer,
I would like to apologize for Monday’s disruption to your business. We understand fully that you depend on Canada Web Hosting as your trusted managed hosting provider to ensure your online business applications are available to your customers all the time.
As you are already aware, our data centre was without power for approximately 6 hours on Monday, July 14, 2008.
Our Vancouver data centre is in the Harbour Centre building. Due to an underground fire and the resulting large-scale power outage in downtown Vancouver, Harbour Centre's main generator, Gen No.7, started and maintained power as designed. However, 20 minutes after the loss of commercial power, Harbour Centre's No.7 generator stopped working due to a failure in the generator's cooling system. Gen No.7's cooling system depends upon water provided by the City of Vancouver and requires a certain level of water pressure to function normally. As firefighters worked to put out the fire, water pressure in the downtown core declined to the point where it caused the generator to overheat and malfunction.
Once the generator failed, the UPS systems' batteries drained and eventually failed. After three failed attempts to bring Gen No.7 online, the generator successfully came online at 3:40 PM PDT and restored power to the data centre, UPS systems, and our client servers.
As a result of this event, we have been assured by Harbour Centre that changes will be made to prevent this failure in the future. During the event, we also discovered a number of weaknesses in our own business continuity plans. Despite our geographically redundant telephony systems, many clients could not reach us via our Vancouver number or our toll-free number, which is routed through Vancouver. Changes will be made in the near future to fix this and other problems, so that we can maintain support services through normal channels regardless of a disaster in Toronto or Vancouver.
A timeline of the outage is given below.
I thank you again for your patience and understanding.
Brian Shepard
President and CEO
Canada Web Hosting
TIMELINE
• 9:00 AM - Fire started in a manhole located in the 500 block of Richards Street, Vancouver, BC.
• 10:00 AM - Utility power to most of downtown was lost due to a subsequent transformer explosion. Harbour Centre's main building generator, Gen No.7, started and provided power to the Data Centre. No clients were affected.
• 10:20 AM – Gen No.7 failed due to overheating as the result of a failure in the generator's cooling system. Power to the Data Centre was lost, with UPS systems now running on battery.
• 10:40 AM - First alarm of batteries failing in UPS systems.
• 10:45 AM - 11:00 AM - Clients lost power due to UPS batteries running out.
• 12:30 PM – Gen No.7 is restored and providing power to the Data Centre.
• 12:30 PM - 12:45 PM – Clients’ servers are back up and running.
• 1:00 PM – Gen No.7 fails a second time due to overheating as the result of a failure in the generator's cooling system. Clients lost power within minutes due to UPS system batteries being low.
• 3:00 PM - Gen No.7 is restored and providing power to the Data Centre. PEER 1 did not turn client servers on immediately, waiting for power to stabilize first.
• 3:30 PM - Gen No.7 fails a third time due to overheating as the result of a failure in the generator’s cooling system.
• 3:40 PM – Gen No.7 is restored and providing power to the Data Centre.
• 4:00 PM - Power restored to UPS systems, but not to client servers, waiting for power to stabilize first.
• 4:20 PM - ALL UPS systems now online and power to client servers restored.
• 5:55 PM - Utility power restored to the Data Centre and Gen No.7 shut down.