How REA Group Weathered The AWS Cloud Outage

Real estate giant REA Group came through the recent Amazon Web Services Sydney availability zone outage relatively unscathed, thanks to a multi-region and multi-availability zone cloud architecture.

Earlier this month, one of AWS' Sydney availability zones went down after bad weather caused a failure in the company's uninterruptible power supply (UPS) configuration.

The outage sent some of Australia's largest web properties scrambling when EC2 and EBS instances in the AZ became inaccessible and other services, including the elastic API and internal DNS lookups, experienced problems.

REA Group, a big user of AWS services, was one of those affected, but managed to get away with only a slightly slower ad rotation server, one offline web application, a wobbly Android app, and slower response times for some services.

"... If we are not totally insensitive well, overall it was a good result," said Jeremy Burton greater technical chief.

Being prepared and lucky

While the outage has led many to reconsider their cloud architecture, REA Group said its design for failure - along with "luck" - helped it weather the storm.

REA's production systems are deployed across multiple availability zones by default. Its most critical systems - as well as those, like Redshift, that do not offer multi-AZ options - are designed to run multi-region, specifically in Frankfurt and Sydney.

The IT team runs independent copies of its systems in each region, which interact with REA's master data store with eventual consistency, Burton said.

"The only thing that is common is the source of the data," he wrote.

"That way, if a region has problems, the other is affected at all."

API clients can talk cross-region if local copies are not available, Burton said, using a combination of AWS Route53 latency-based routing and Route53 health checks.

This approach kicked in during the recent Sydney AZ outage - "one of our services automatically switched to our Europe region when some of its instances had problems," Burton said.
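The effect of combining latency routing with health checks can be sketched in a few lines of Python. This is a simplified simulation of the selection logic only, not the Route53 API. The `Endpoint` class, the `pick_endpoint` function and the latency figures are hypothetical. The region codes are the standard AWS identifiers for Sydney and Frankfurt.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    latency_ms: float  # hypothetical latency as seen from the client
    healthy: bool      # hypothetical result of the latest health check


def pick_endpoint(endpoints):
    """Return the lowest-latency healthy endpoint, or None if all are failing."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.latency_ms)


sydney = Endpoint("ap-southeast-2", latency_ms=20.0, healthy=True)
frankfurt = Endpoint("eu-central-1", latency_ms=300.0, healthy=True)

# Normal operation: an Australian client is sent to the closest region.
normal = pick_endpoint([sydney, frankfurt]).region

# Sydney fails its health check, so traffic moves to Frankfurt.
sydney.healthy = False
during_outage = pick_endpoint([sydney, frankfurt]).region
```

In this sketch the client pays a latency penalty during the outage but keeps getting answers, which matches the trade-off Burton describes.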

Moreover, continuing to host some of its core systems in an on-premise data centre and deploying static assets directly to S3 helped REA avoid severe downtime.

"S3 by its nature is more durable than an EC2 instance, and more likely to survive a failure AZ" Burton said.

"It's multi-AZ default, and as events have shown weekend just be mutli-AZ is not necessarily enough to be resistant to failure AZ, the S3 service has."

Deep pockets needed

However, be prepared to see infrastructure costs double when adopting a multi-region approach, Burton said.

"You need good architecture systems running on eventual consistency, and to disengage in a way that provides redundancy in the relevant parts of the infrastructure," he said.

"Making your unchanging infrastructure has a cost of automation.

"And in some cases, simply not worth it. That SLA does not mean a need for multi region, or the system is not enough to justify the costs of critical engineering or infrastructure."

