What Caused Bluehost’s MASSIVE FAIL

Update: Bluehost's CEO sent customers an email on Friday, April 18. Too bad he couldn't be bothered to say anything earlier.
Update: Bluehost Support sent the following, which, ironically, arrived 24 hours after yesterday's 1pm outage began:
I would like to offer my sincerest apologies for this lack of communication and to provide you with some details as to what happened. We experienced a degradation of network service in one of our data centers due to a firmware bug in one of our vendor’s hardware solutions. This was an undocumented bug and we worked with our partner to diagnose the issue and deployed a firmware update to the systems to remediate the problem. Only websites that were being served by this hardware were affected. This is unrelated to any previous outages and we have reviewed our entire network to make sure this problem will not occur elsewhere. Please let us know if there is anything else we can provide, whether information or otherwise, but I would like to assure you I personally understand your frustrations and can appreciate your stance on the situation. Best regards – Ryan, Supervisor, BlueHost.com

An unknown number of Bluehost servers went down yesterday, April 16th, at 1pm Central time. The outage may have been limited to their Dedicated servers (I own one) and virtual private servers (VPS), but that’s unknown too. Also unknown: what caused it, even approximately when it will be fixed, and other pretty basic things a paying customer wants to know when a service is failing.

In this post I will tell you about two fails: how Bluehost communicated (or didn’t) with customers about the outage, and what caused the outage in the first place.

BLUEHOST COMMUNICATION FAIL

Outages do occur at webhosts…they just do. But why so many unknowns and a clear reluctance to be transparent? Because Bluehost has failed dramatically at THE MOST BASIC customer relations item: communicating with customers about why something isn’t working as promised. Rather than have a status page at Bluehost.com that either has status updates on it or embeds their Twitter and Facebook feeds, they ask people to follow them “and check our Twitter feed and Facebook page for updates.” How incredibly bush-league.

A few cut-n-paste tweets from Bluehost Support

For hours and hours and hours they have been telling people, essentially, “I dunno,” which is unacceptable. Not only is this impacting an untold number of people (the tweets are numerous), it is a PR disaster, and customers will undoubtedly flee. That’s especially true of those who put their clients on Bluehost on their own recommendation, a recommendation that now makes them look like a bunch of clueless imbeciles.

I’ve also been evangelizing Bluehost’s new Dedicated server offering, since it has been very fast and their Level III tech support has been the best I’ve ever had with any host. Several of my clients have purchased Dedicated servers (and yes, ALL of them pinged me about where they should go next because they are absolutely getting off Bluehost!).

From 1pm Wednesday April 16th through today, Bluehost Support can only tell customers “I dunno”

Will I continue to evangelize? Nope. I might have cut Bluehost some slack IF they had been communicative. I may continue to evangelize IF Bluehost provides recompense for my server downtime and IF they provide a plan on how NOT to repeat a fiasco like this in the future. If they say or do nothing I’ll take my business and that of my clients elsewhere.

But here is what caused the outage.

BLUEHOST PARTNER FAIL: THE CAUSE

Some servers are back up and fortunately all of my sites on our dedicated server are up, except the most important one. This site is on its own dedicated IP address (since it uses an SSL certificate) and runs our key ecommerce site. Based upon our transaction history—and because we launched a new product and are simultaneously holding a sale—I estimate we’ve lost between $3,000 – $4,000 in sales since the site went down yesterday. Damn.

After waiting over two and a half hours this morning to talk with Level III technical support, I learned what caused the outage. While on hold I thought I’d poke around and see what data I could uncover, so I could ask intelligent questions if I ever connected with someone!

I ran a traceroute on my site’s dedicated IP address and learned that it stopped dead at ve15.ar04.prov.acedc.net. Acedc.net is run by Ace Data Centers, a colocation and IP transit company. In my poking around I also discovered that all of Bluehost’s dedicated servers (which are rack-mounted blades and might include their VPS servers too) are colocated at this data center in Orem, Utah. The headquarters of Bluehost and Ace are just over two miles apart.
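Reading a traceroute this way comes down to finding the last hop that answered before the replies turn into rows of asterisks. Here is a minimal Python sketch, assuming the common Linux/macOS `traceroute` text format (one numbered line per hop, `* * *` for a hop that timed out). The function name `last_responding_hop`, the first hop, and all IP addresses in the sample are hypothetical; only the acedc.net hostname comes from my trace.

```python
import re
from typing import Optional

# A hop line starts with the hop number, e.g.
#   " 2  ve15.ar04.prov.acedc.net (198.51.100.77)  12.3 ms  12.1 ms  12.4 ms"
# A hop that never answered looks like " 3  * * *".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(traceroute_output: str) -> Optional[str]:
    """Return the host name (or IP) of the last hop that replied, or None."""
    last = None
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the "traceroute to ..." header and blank lines
        # Drop the "*" markers left by probes that timed out.
        tokens = [t for t in match.group(2).split() if t != "*"]
        if not tokens:
            continue  # every probe to this hop timed out
        last = tokens[0]
    return last


if __name__ == "__main__":
    # Hypothetical trace: hop 1 and the IP addresses are made up for illustration.
    sample = """traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  router1.example.net (198.51.100.1)  1.1 ms  1.0 ms  0.9 ms
 2  ve15.ar04.prov.acedc.net (198.51.100.77)  12.3 ms  12.1 ms  12.4 ms
 3  * * *
 4  * * *"""
    print(last_responding_hop(sample))  # prints ve15.ar04.prov.acedc.net
```

A trace that dies at a transit provider’s router points to trouble at or just beyond that hop, though some routers simply ignore traceroute probes, so it’s a clue rather than proof.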

When I connected with support, I got into a conversation with the tech rep in order to ferret out the reason for the outage.

Turns out that the fail was caused by a small team which did a backend FIRMWARE UPDATE ON ROUTERS AND SWITCHES yesterday morning and was performed at Ace Data Center, the company that provides the IP transit service for Bluehost. Apparently the Ace team is small, obviously hosed up the firmware update so domain names were no longer resolving, and couldn’t fix it themselves. A netops team from Bluehost’s parent company, Endurance International Group (EIG), apparently scrambled to get over and help to fix the problem.

EIG is the company that owns Bluehost and numerous other hosting companies and businesses. EIG holds a colocation Master Service Agreement, an IP Transit Service (Carrier Services) Agreement, and a Data Center Rack Cabinet and Power Services Agreement with Ace Data Centers, Inc. EIG and Bluehost obviously knew the gravity of Ace’s screwup, and so have clearly pulled out all the stops to get servers back up and running.

The fix was done about 9am this morning and all but one of my sites are up (the one with its own dedicated IP, presumably because those are being updated last). But now the netops team is apparently working on “the flow” from the router/switch to the Bluehost servers so the DNS works (i.e., so the domain name will properly “point” to the IP address of a server…or my site!). In router-speak they’re presumably fixing how routing and DNS traffic reaches those servers, but I have no idea exactly what they’re doing or how long it will take.

When I asked for an ETA on when my site’s dedicated IP might be working again, that was also unknown: it could be anywhere from “10 minutes to 6-7 hours” from now.

Holy shit. What a massive screwup.

Yes, it pisses me off that my site is down and my server was down for hours and hours yesterday. BUT I CANNOT EXCUSE BLUEHOST FOR NOT TELLING US WHAT I JUST DISCOVERED TODAY! Do they think we’re stupid, or that the screwup won’t be found out? That maybe we little customers can’t handle the truth? Or perhaps they don’t want to show publicly how badly their partner Ace screwed up? Whatever the reason, there is NO excuse for not being honest, transparent, and forthright.

What Bluehost should do now:
  • Create a status update page that EXPLAINS what happened (and, God forbid there are future events, do actual updates on that page every 30 MINUTES!).
  • Provide redundancy/failover services. I would pay for a redundant data center so my dedicated server was never offline. Yes, I could move our site to Amazon Web Services or another multi-data-center facility and pay A LOT more for the service, but if I had those technical chops (or could afford to hire them) I wouldn’t need Bluehost or the managed Dedicated server!
  • Create a mechanism so that, if a server is down or another DNS outage occurs, a DNS request is automatically captured and a page appears which states something like this so all of us running domains don’t look like a bunch of idiots that can’t find our ass with both hands:

“We apologize that the site, domainname.tld, is temporarily offline due to a server or network malfunction. This may be impacting both the website and email. Please use other means to contact the person or organization.”

  • By the way, PLEASE PROVIDE AN OPTION TO TURN OFF THAT LOOPING SALES PITCH WHILE ON HOLD FOR TECH SUPPORT! I heard it at least 50 times while waiting and had to listen since I didn’t want to miss the tech support person when they finally answered.
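The third suggestion, a friendly outage notice instead of a dead domain, could live on an edge server or reverse proxy that sits in front of each customer’s server and keeps answering when that server doesn’t. The Python sketch below only shows the decision such a layer would make; `serve_with_fallback` and `fetch_origin` are hypothetical names, not anything Bluehost offers, and it assumes DNS still points visitors at that edge layer (if DNS itself is broken, no page can be shown at all).

```python
from typing import Callable, Tuple

# Notice wording suggested above; {domain} is filled in per site.
OUTAGE_NOTICE = (
    "We apologize that the site, {domain}, is temporarily offline due to a "
    "server or network malfunction. This may be impacting both the website "
    "and email. Please use other means to contact the person or organization."
)


def serve_with_fallback(domain: str, fetch_origin: Callable[[str], str]) -> Tuple[int, str]:
    """Return (HTTP status, body): the origin's page if reachable, else the notice.

    fetch_origin is a hypothetical stand-in for however the edge server contacts
    the customer's real server; it is expected to raise OSError (which includes
    ConnectionError and TimeoutError) when that server is unreachable.
    """
    try:
        return 200, fetch_origin(domain)
    except OSError:
        # 503 Service Unavailable signals that the outage is temporary.
        return 503, OUTAGE_NOTICE.format(domain=domain)


if __name__ == "__main__":
    def origin_down(domain: str) -> str:
        raise ConnectionError(f"could not reach the server for {domain}")

    status, body = serve_with_fallback("domainname.tld", origin_down)
    print(status)  # prints 503
```

Answering with a 503 rather than a 200 is deliberate: it tells browsers and search engines the condition is temporary, so the apology is less likely to be treated as the site’s real content.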

In any event, please step it up, Bluehost. If you’re the CEO or in leadership and reading this: EIG owns a lot of other hosting companies, I’ll bet the shit will roll downhill pretty fast (or maybe already has), and you need to get your act together.

6 Comments

  1. MR on April 20, 2014 at 4:16 pm

    I read your blog. It seems the explanation for the outage you give on the blog is contrary to the statement given by the CEO.



  2. Steve Borsch on April 20, 2014 at 4:59 pm

    Specifically, how is it contrary, MR? It was a firmware bug. Router. He didn’t go into all the detail, obviously.



  3. Jeffrey C on April 21, 2014 at 5:56 pm

    It’s contrary because you claim that the outage was caused by the Ace team who made a mistake on installing a firmware update. Bluehost’s CEO claims that it was a bug. One is human error. The other is not.

    Do you still say that it was human error or will you be amending your blog post to say that it was a bug?



  4. Steve Borsch on April 21, 2014 at 6:11 pm

    Jeffrey,

    *I* don’t claim. As you can read that is how it was explained to me. Paraphrasing that explanation was the paragraph, “Turns out that the fail was caused by a small team which did a backend FIRMWARE UPDATE ON ROUTERS AND SWITCHES yesterday morning and was performed at Ace Data Center, the company that provides the IP transit service for Bluehost. Apparently the Ace team is small, obviously hosed up the firmware update so domain names were no longer resolving, and couldn’t fix it themselves. A netops team from Bluehost’s parent company, Endurance International Group (EIG), apparently scrambled to get over and help to fix the problem.”

    Yes it was a firmware bug on the routers at Ace. The Bluehost CEO did not explain in his email with specifics that it was person “A” on small team “B” employed by Ace or a subcontractor.

    I won’t be amending my post. ‘Nuff said.



  5. Dan M on April 24, 2014 at 8:01 am

    Is it true that WordPress officially recommends BlueHost? Do you have any other recommendations?



  6. Steve Borsch on April 24, 2014 at 10:31 am

    Hey Dan-o,

    Yes. WP does recommend Bluehost.

    As you might have seen in the post above, I have a dedicated server and have had incredibly great support from their Level III support crew. Because the switching costs are so high I’m reluctant to go elsewhere since it took so long to settle on their relatively new Dedicated offering.

    Like the old adage about the builder constructing your new home, “You’ll start off loving your builder but you will hate them by the end” is true with webhosts too. As long as your host never has an outage—and believe me every host I’ve ever used has had major and minor outages—you’ll be in love. The moment they do they’re dirt and you hunt for a new one!




About Steve Borsch

Strategist. Learner. Idea Guy. Salesman. Connector of Dots. Friend. Husband & Dad. CEO. Janitor.
