E-commerce fails: Why big-time websites can be completely screwed up

(Images by Rob Pegoraro/Yahoo Tech)

Maybe this isn’t the sort of thing I should admit in public, but I’ve been having trouble paying my gas bill.

I assure you that I have the money, even for the high heating costs of a 1920-vintage home. I just haven’t been able to log into my utility’s site to pay up.

It all started when I visited Washington Gas’s site in late December and was greeted with a request to set up an account at its new payment portal.

My first reaction was amusement at the inept security questions; almost anybody could have guessed the answers with just a little online sleuthing. But then I realized that the threat of strangers logging in as me might not be that great when even I couldn’t log in as me.

First the site rejected my old password for being insufficiently complex. Then it balked at my attempt to create a new one, offering only a uselessly vague “There was an error updating your profile” message.

Clicking the password-reset link got me to a screen saying, “Washington Gas cannot process your request at this time” — with an invitation to answer two security questions that were not displayed.

This Kafka-esque confusion continued through a series of apologies posted on Washington Gas’s site, each touting the utility’s diligent efforts to fix the problems and promising their quick resolution.

Sadly, it’s not the first time we’ve seen this kind of ineptitude on what should be some big-league e-commerce sites.

IT irritations

The disastrous launch of the government’s HealthCare.gov insurance-registration site might be the most famous meltdown at a customer-facing site. But such high-profile Web-based systems break down more often than anybody would like to admit. For instance:

• In 2013, Maryland’s statewide health-insurance portal crashed. The state never could get it to work, so it switched to Connecticut’s setup. That move recouped $45 million of the $73 million sunk into the project.

• In 2011, a coding error at the investment company AXA Rosenberg cost its investors $217 million. The Securities and Exchange Commission made the firm pay that back and also cough up a $25 million fine.

• In 2008, malfunctions in the automated baggage-handling system at London Heathrow Airport’s new Terminal 5 led to planes taking off without checked luggage and checked luggage being delivered to cancelled flights. It was eerily similar to what happened at the then-new Denver International Airport more than a decade earlier.

(For more examples of information technology gone wrong, see the timeline IEEE Spectrum put together last year. While I’m at it, I must note that this site’s parent firm Yahoo has had its own large-scale malfunctions, such as the late 2013 outage that locked out many Yahoo Mail users for days.)

Blamestorming

Computing at large scale, it seems, is hard. I asked one of the techies who helped fix HealthCare.gov why these things seem to keep happening. Short answer: It’s often the old right-hand/left-hand problem.

“You can’t think of your online and offline business as two separate things,” e-mailed Greg Gershman, co-founder of the software firm Ad Hoc. “They have to be thought of as one integrated business process, and the team or teams that create them have to be integrated and work together.”

At HealthCare.gov, for example, “the teams responsible for the development and operations of the site were from lots of different companies, and no one person or team took overall responsibility.” Gershman said that he and the colleagues the government brought in to fix things “forced all the teams to meet twice a day and share what they were working on.”

It sounds like part of Washington Gas’s problem was hiring a contractor who didn’t grasp how the business really worked. But a spokesman for the utility did not answer my question about which third-party company (or companies) might have been responsible.

“The site has not yet lived up to the enhanced online experience it was intended to deliver,” said Jim Monroe. “The test program for these site enhancements did not adequately cover the full spectrum of point-of-failure scenarios.”

Monroe added that the company was optimistic that a fix would resolve these issues “within the next week” — but he sent that e-mail in early February.

Two weeks later, the utility’s customer-service site still carried the latest in a long line of “Customer Updates”: “We remain focused on resolving technical issues causing log on and transaction delays on our eService portal. Again, we apologize for the inconvenience these technical issues have caused.”

Still, I had to try again, and I got the same sequence of errors as before. But when I clicked the Pay Bill link shown on the last error page, I saw my account listed as if I’d had zero issues signing in. So now that I can finally log into my account and pay my bill, I trust this company will understand if I wait until the last possible day to do so.

Email Rob at rob@robpegoraro.com; follow him on Twitter at @robpegoraro.