Delta has a meltdown

Started by spuwho, August 08, 2016, 09:33:37 PM

spuwho

A power outage in 1 location should not cause this type of system outage in any Fortune 500 company in the world.

Either Delta has some pretty crappy contingency management or this is a cover story for something else. I am still struggling to believe a power outage in Atlanta would cause this much chaos for a company. The list of safeguards is endless: generators, backup datacenters, database replication, disaster testing... and yet a power outage took the whole flight management system down?

The FAA should be all over this. Someone, or some group of people, is going to lose their jobs over this.

Per the NY Times:

http://www.nytimes.com/aponline/2016/08/08/world/europe/ap-eu-delta-outage-.html

Delta Recovering After Global Outage Delays, Cancels Flights

DALLAS — At least half of all Delta Air Lines flights Monday were delayed or canceled after a power outage knocked out the airline's computer systems worldwide.

About 17 hours after the outage at one of its facilities, Delta was struggling to resume normal operations and clear a backlog of stranded passengers. It sought to appease frustrated customers by offering refunds and $200 travel vouchers.

By 7 p.m., Delta said it had canceled more than 740 flights, although its computer systems were fully functioning again.

Tracking service FlightStats Inc. counted more than 2,400 delayed flights.

Delta representatives said the airline was investigating the cause of the meltdown. They declined to describe whether the airline's information-technology system had enough built-in redundancies to recover quickly from a hiccup like a power outage.

For passengers, hardship from the early morning meltdown was compounded by the fact that Delta's flight-status updates weren't working either. Instead of being able to stay home, many passengers only learned about the flight problems when they arrived at the airport.

"By the time I showed up at the gate the employees were already disgruntled, and it was really difficult to get anybody to speak to me or get any information," said Ashley Roache, whose flight from Lexington, Kentucky, to New York's LaGuardia Airport was delayed. "The company could have done a better job of explaining ... what was happening."

Delta said that about 3,300 of its nearly 6,000 scheduled flights had operated by 7 p.m. Eastern time. The airline posted a video apology by CEO Ed Bastian.

A power outage at an Atlanta facility at around 2:30 a.m. local time initiated a cascading meltdown, according to the airline, which is also based in Atlanta.

A spokesman for Georgia Power said that the company believes a failure of Delta equipment caused the airline's power outage. He said no other customers lost power.

Delta spokesman Eric O'Brien said he had no information on the report and that the airline was still investigating.

Flights that were already in the air when the outage occurred continued to their destinations, but flights on the ground remained there.

Airlines depend on huge, overlapping and complicated systems to operate flights, schedule crews and run ticketing, boarding, airport kiosks, websites and mobile phone apps. Even brief outages can snarl traffic and cause long delays.

That has afflicted airlines in the U.S. and abroad.

Last month, Southwest Airlines canceled more than 2,000 flights over four days after an outage that it blamed on a faulty network router.

United Airlines suffered a series of massive IT meltdowns after combining its technology systems with those of merger partner Continental Airlines.

Lines for British Airways at some airports have grown longer as the carrier updates its systems.

On Monday in Richmond, Virginia, Delta gate agents were writing out boarding passes by hand. In Tokyo, a dot-matrix printer was resurrected to keep track of passengers on a flight to Shanghai.

"Not only are their flights delayed, but in the case of Delta the website and other places are all saying that the flights are on time because the airline has been so crippled from a technical standpoint," said Daniel Baker, CEO of tracking service FlightAware.com.

Many passengers, like Bryan Kopsick, 20, from Richmond, were shocked that computer glitches could cause such turmoil.

"It does feel like the old days," Kopsick said. "Maybe they will let us smoke on the plane, and give us five-star meals in-flight too!"

In Las Vegas, stranded passengers were sleeping on the floor, covered in red blankets. When boarding finally began for a Minneapolis flight — the first to take off — a Delta worker urged people to find other travelers who had wandered away from the gate area, or who might be sleeping off the delays.

Tanzie Bodeen, 22, a software company intern from Beaverton, Oregon, left home at 4 a.m. to catch a flight from Minneapolis and learned about the delays only when she reached the airport and saw media trucks and news crews.

Bodeen said that passengers were taking the matter in stride. "It doesn't seem really hostile yet," she said.

The company said customers whose flights were canceled or delayed more than three hours could get a refund and $200 in travel vouchers. Travelers on some routes can also make a one-time change to the ticket without paying Delta's usual change fee of $200 for domestic flights and up to $500 for international flights.


RattlerGator

There is no way in hell this resulted from a power outage. I was caught in the mess, trying to get from Tallahassee to Detroit. But I'm not mad at 'em for covering up whatever in the hell it was. It was frustrating as hell but I only suffered a 5.5 hour delay in my arrival time.

mbwright

There has to be much more to this story.  A power outage only at Delta's facility?  This does not make any sense. 

Maybe they should upgrade their systems, with the high profits they are making these days.

Steve

The story that I've heard is that when doing a routine generator test, flipping from one generator to another, the switching component caught fire, interrupting power from both generators. Given that they were testing generators, they were likely running just on generator power (note, this is a VERY common test that IT departments do). There is usually a battery backup, but those are usually designed to supply power for literally minutes (bridging the gap between utility power and the generators). If they lost power completely long enough that the batteries ran out (which could have been 5 minutes) or the batteries didn't pick up at all because of the outage, they likely had systems perform a hard shutdown, which for a company like Delta with tons of old technology (as any company like them has) is REALLY bad. Systems generally need to go down and come up in a specific order.
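To make the "specific order" point concrete, here's a rough sketch. It is not Delta's actual setup: the system names and the dependency map are made up for illustration. The idea is that each system can only start once the things it depends on are up, which is a topological sort, and a clean shutdown runs the same order in reverse.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (assumption, not Delta's real systems):
# each system lists what must already be running before it can start.
DEPENDS_ON = {
    "storage": [],
    "database": ["storage"],
    "crew_scheduling": ["database"],
    "flight_status": ["database"],
    "kiosks": ["flight_status"],
}


def startup_order(depends_on):
    """Return an order in which every system starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """A clean shutdown stops dependents first, so reverse the startup order."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print("start:", startup_order(DEPENDS_ON))
    print("stop: ", shutdown_order(DEPENDS_ON))
```

A hard power loss skips the shutdown half entirely, which is why bringing everything back afterwards can take hours instead of minutes.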

The rumor I heard was that technology was mostly restored by about 10AM, however the results of a worldwide airline having six hours or so of NO aircraft movement is catastrophic. Is it acceptable that they missed something in redundancy? No, but it happens. Obviously they need to fix this, but being in IT, I get it.

CityLife

I happened to be flying Southwest last month the day the same thing happened to them. It was utter chaos and Southwest was completely unprepared to handle it.

I had a layover in Vegas on the way to San Diego. My flight was canceled and all other flights were grounded and they couldn't assure me I would get on any future flights. I had to be in SD in a hurry to meet up with friends that were heading down to Baja, so I ended up renting a car and driving. Thankfully I did, or I would have been stuck for a day or two in Vegas.

The fact that these two incidents are happening so close to each other points to possible cyber terrorism. I'm not saying it is, just that it makes you wonder.

BridgeTroll

Quote from: CityLife on August 09, 2016, 10:52:00 AM
I happened to be flying Southwest last month the day the same thing happened to them. It was utter chaos and Southwest was completely unprepared to handle it.

I had a layover in Vegas on the way to San Diego. My flight was canceled and all other flights were grounded and they couldn't assure me I would get on any future flights. I had to be in SD in a hurry to meet up with friends that were heading down to Baja, so I ended up renting a car and driving. Thankfully I did, or I would have been stuck for a day or two in Vegas.

The fact that these two incidents are happening so close to each other points to possible cyber terrorism. I'm not saying it is, just that it makes you wonder.

Bingo... and they would never admit it...
In a boat at sea one of the men began to bore a hole in the bottom of the boat. On being remonstrated with, he answered, "I am only boring under my own seat." "Yes," said his companions, "but when the sea rushes in we shall all be drowned with you."

spuwho

United also had a tech meltdown earlier this year. Hate to sound paranoid, but it seems US carriers are having a lot of them?

I get it that IT had systems up by 10AM, but usually companies that rely on real time systems like this have an active-active configuration that can take over instantly in the event of a contingency.

Steve

Quote from: CityLife on August 09, 2016, 10:52:00 AM
The fact that these two incidents are happening so close to each other points to possible cyber terrorism. I'm not saying it is, just that it makes you wonder.

This seems easily disproven. Delta's incident resulted in a fire which required Atlanta Fire Rescue to put out. If this was cyberterrorism, I doubt they would go through the trouble to get the Fire Department involved.

Steve

Quote from: spuwho on August 09, 2016, 11:00:50 AM
I get it that IT had systems up by 10AM, but usually companies that rely on real time systems like this have an active-active configuration that can take over instantly in the event of a contingency.

As an outsider, I have a mixed view on how they handled this. My guess is they have some sort of active-active (or active-passive, which in many cases can be just as good), but in the same data center. I bet they assumed that since they had two generators (plus utility) they were covered on power and didn't foresee a meltdown in the switching mechanism causing a full power interruption.

Had they had something like a real-time transactional backup of all systems at a second location, yes, they could have switched to another data center. Clearly they didn't, and my guess is they weighed the cost (time) of switching to another center against the time to get everything back up in place, and found it faster to bring everything back up in place. A lot of legacy companies are in this position, and switching entirely to another center can take a VERY long time, so it's only done as a last resort.
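Here's a small sketch of why active-active or active-passive inside one data center doesn't help in this scenario. It is purely illustrative: the replica names, site names and power-feed labels are made up, and this is only my guess at the failure mode, not Delta's real topology. A redundant pair only survives a failure if the two copies don't share the thing that failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str        # hypothetical label for one copy of the system
    site: str        # hypothetical data center it lives in
    power_feed: str  # hypothetical switchgear / power path it depends on


def survives(replicas, failed_power_feed):
    """True if at least one replica is NOT on the power feed that failed."""
    return any(r.power_feed != failed_power_feed for r in replicas)


# Both copies in the same building, behind the same switching gear.
same_site = [
    Replica("primary", "site-1", "switchgear-A"),
    Replica("standby", "site-1", "switchgear-A"),
]

# Second copy at another location with its own power path.
two_sites = [
    Replica("primary", "site-1", "switchgear-A"),
    Replica("standby", "site-2", "switchgear-B"),
]

if __name__ == "__main__":
    print(survives(same_site, "switchgear-A"))
    print(survives(two_sites, "switchgear-A"))
```

Two generators plus utility look like three independent sources on paper, but if they all pass through one switch, that switch is the single failure domain.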

Don't get me wrong - I'm not apologizing for them, though as Platinum with them I'm pretty loyal as well. I'm in tech, and while I feel for the folks who, my guess is, slept very little last night - either from continual work or from the stress of being on the hot seat right now - this just can't happen.

I'm also flying tomorrow so I'm hoping this is worked out. There have been some decent delays between ATL-JAX today as well.

finehoe

Quote from: Steve on August 09, 2016, 11:17:44 AM
I feel for the folks who, my guess is, slept very little last night - either from continual work or from the stress of being on the hot seat right now - this just can't happen.

Not to mention the poor front-line people who bore the brunt of the angry passengers.  Nonetheless, who wants to wager Edward Bastian gets a bonus this year?

MusicMan

I got caught up in the mess, returning from Venice, Italy yesterday. Our Atlanta-bound flight was four hours late leaving Venice, and that of course was the first domino in a long and tiresome day. Traveling with my family made it tougher. I give a lot of credit to Delta's front line people, at least the ones I dealt with. I finally caught a flight from Atlanta at 3:45 am, arriving at 4:45 am. I'm glad I didn't have to spend any more time at ATL than I did.

The railway tunnels connecting the concourses had a "Zombie apocalypse" feel to them, dark with people sleeping everywhere.

CityLife

Quote from: Steve on August 09, 2016, 11:08:42 AM
Quote from: CityLife on August 09, 2016, 10:52:00 AM
The fact that these two incidents are happening so close to each other points to possible cyber terrorism. I'm not saying it is, just that it makes you wonder.

This seems easily disproven. Delta's incident resulted in a fire which required Atlanta Fire Rescue to put out. If this was cyberterrorism, I doubt they would go through the trouble to get the Fire Department involved.

Do you even watch Stranger Things? I mean Netflix tells me the government can cover up whatever they want.  ;)

But seriously though, I doubt the government would want it getting out that Russia, China, ISIS or whoever has the capability to disrupt US commerce like this. Imagine the war cry from every person whose travel plans were massively affected by whichever player did it. Not to mention the impact it would have on the national psyche. Again not saying this is cyber terrorism, just that it is a possibility...and like BridgeTroll said, we will never know.

BridgeTroll

Cyber crime happens wayyyy more often than reported... especially that emanating from foreign origins.  There is no incentive to reporting your weaknesses.  State sponsored cyber crime is not publicly reported because there is little that can be done.

Adam White

Yes - and the USA is probably doing its fair share of cyber crime. We used to do it to the Soviets and Iranians, no reason to assume we don't still do it.
"If you're going to play it out of tune, then play it out of tune properly."

David

Same thing happened to my company, but on a much smaller scale. Remember when Riverside Ave lost power completely a few years ago? It took 10 offices offline nationwide for our company, and it took an hour or two to bring the systems back up once power was restored. In our case, we just didn't have any backup generators at the time. The data center ran off UPS for about 45 minutes before we lost power completely.

Thanks to the Riverside Ave blackout of 2014, we now have backup generators :D

My niece was caught up in this Delta mess, but it was only a 3 hour delay in her case.