The other day one of our websites went offline. It was down for more than an hour, and most of that time was spent waiting on our host provider to address the situation. After about an hour we took it upon ourselves to do a hard reboot through cPanel, the server administration software installed on our Linux box that lets us manage our resources. About 10-15 minutes later everything was back to normal.
I don’t know if it was the hard reboot that restored things or if the host provider worked some mojo at the same time, but the experience led to some disheartening realizations about our host provider that have me wondering about the web hosting business altogether.
Just about every host provider has a 99.9% uptime guarantee. With 24 hours in a day and an average of 30 days per month, this means that your site should be online for at least 719.28 hours out of 720, which leaves roughly 43 minutes of allowable downtime. With more than one hour offline, this guarantee was not met, especially when you consider it was the third time in a 30-day period that the site went offline.
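For anyone who wants to check the math, here is a minimal sketch of the arithmetic in Python. The function names are my own, and the 30-day month is the same simplifying assumption used above; real guarantees may define the measurement period differently.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumed average month, as in the paragraph above


def allowed_downtime_minutes(uptime_pct, days=DAYS_PER_MONTH):
    """Minutes of downtime an uptime guarantee permits over the period."""
    total_hours = days * HOURS_PER_DAY
    return total_hours * (1 - uptime_pct / 100) * 60


def guarantee_met(outage_minutes, uptime_pct=99.9, days=DAYS_PER_MONTH):
    """True if the combined outages stay within the permitted downtime."""
    return sum(outage_minutes) <= allowed_downtime_minutes(uptime_pct, days)


# A 99.9% guarantee over 720 hours allows about 43.2 minutes offline.
print(round(allowed_downtime_minutes(99.9), 1))

# A single outage of just over an hour already breaks it.
print(guarantee_met([61]))
```

By this reckoning, one outage of a little over an hour uses up the whole month's allowance and then some.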
To be fair, things happen. Computer technology is not perfect and servers go offline. In our case, it was most likely due to us exceeding our resource allowance. If you buy web hosting, you know that there is a limit to the resources allocated to your account. When that limit is reached, the server is at risk of going offline. The provider could argue that this is our responsibility and not theirs, and I am fine with that.
What I am not fine with is how the matter was handled. When we called about the issue, Tier 1 Tech Support could not pinpoint the problem and so escalated it to Tier 2. Fine, but after waiting an hour I called back, only to find out that our ticket was sitting in a queue and that it could sit there for 12-24 hours. This seemed odd to me.
If the provider has a 99.9% uptime guarantee, then how can it knowingly allow a server to remain down for such a long period of time? Seems to me that a downed server would receive immediate attention. I am confident that if it were their website that went down, it would not sit in a queue for that long before getting addressed. So why should a customer’s website?
Now that I know how this host provider handles situations like this, I have much less confidence in running my websites through them. Rubbing salt in the wound is the fact that the Tier 1 tech support team should have, and probably could have, resolved the issue within the first 10 minutes or so by attempting the hard reboot. But considering that this host provider is one of the largest in the world, I suspect this situation would have been handled about the same no matter which provider we used.
And this has me thinking. In banking, you can typically have a line of credit or overdraft attached to your checking account so that if a transaction comes in when funds are not available, the overdraft kicks in and lets the transaction go through, usually for a small fee. Same with credit cards. You can exceed your credit limit, but you will be charged a fee.
So why can’t web hosting be this way, at least for shared server environments, which make up the majority of hosting? Resource allocation limits are necessary, but in a shared hosting environment, more resources technically exist on the box. Seems to me, the admin software enforcing these limits should be able to permit additional resources when necessary, and then charge me a fee when it occurs. I would be happier because my site wasn’t down. The host provider would make additional revenue and also not exhaust resources dealing with the issue.
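To make the idea concrete, here is a rough sketch of what I have in mind, written in Python. Everything in it is hypothetical: the `Account` class, the resource "units", and the fee are made up for illustration and do not reflect how cPanel or any real host works.

```python
from dataclasses import dataclass


@dataclass
class Account:
    plan_limit: float            # resource units included in the plan (hypothetical)
    overage_fee_per_unit: float  # fee per unit granted beyond the plan (hypothetical)
    fees_owed: float = 0.0


def grant_resources(account, requested, spare_on_box):
    """Grant the request, lending spare capacity beyond the plan for a fee.

    Returns the number of units actually granted.
    """
    within_plan = min(requested, account.plan_limit)
    extra_needed = max(0.0, requested - account.plan_limit)
    # Only lend what the shared box can actually spare.
    extra_granted = min(extra_needed, spare_on_box)
    account.fees_owed += extra_granted * account.overage_fee_per_unit
    return within_plan + extra_granted


acct = Account(plan_limit=100, overage_fee_per_unit=0.5)
granted = grant_resources(acct, requested=120, spare_on_box=50)
print(granted, acct.fees_owed)
```

In this sketch the account asks for 20 units more than its plan allows, the box has room to spare, and the request is granted in full with a fee for the extra 20 units, much like an overdraft.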
I could be oversimplifying this, or maybe what I suggest is not technologically possible. What I can say, however, is that the current experience is a lose-lose for everybody, and preventing it doesn’t seem like a long shot.



