Monday 5 March 2012

Cloud Architecture training at QCon (3)

Notes from the last part of the Cloud Architecture training at QCon.

This part was mostly about Cassandra: why Netflix uses it and how they use it. There were also bits about monitoring and about the scalability of their system. Here are the most interesting points:

1. CAP Theorem - Choose Consistency or Availability when Partitioned. Master-slave vs Quorum models.

2. An overview of the current Netflix persistence stack. It included info on Memcache, Cassandra, MongoDB and MySQL.

3. Introduction of Netflix open source tools for Cassandra users: Priam (Cassandra automation) and Astyanax (a Cassandra Java client). Both can be found on GitHub. We also heard about Netflix contributions to Cassandra.

4. Interesting background on why Netflix uses Cassandra. Makes me really wonder about our use of Riak... We also got very detailed Cassandra data flows for single- and multi-region apps, including the replication, backup, archive and logging mechanisms. It's all backed up by S3.

5. Cloud bring-up strategy - they used Shadow Traffic Redirection and worked in iterations, one page at a time, starting with the simplest. They managed to "sell" the cloud to all developers early, at a development boot camp. Most of the issues they faced were around key management and an early lack of experience with AWS.

6. The monitoring is based on logs (they log everything almost all the time). The logs are processed in Hadoop and are used to generate reports. They use AppDynamics as a portal that gives them a deep look into what's running in production.
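
To make the quorum model from point 1 a bit more concrete, here is a minimal sketch of the arithmetic behind it. It assumes the usual N/R/W notation (N replicas of each piece of data, R replicas that must answer a read, W replicas that must acknowledge a write). This is not from the training slides and the function names are made up for illustration.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of n replicas."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, required: int) -> int:
    """How many replicas can be down while 'required' replicas still respond."""
    if not (1 <= required <= n):
        raise ValueError("required must be between 1 and n")
    return n - required


if __name__ == "__main__":
    n = 3
    q = quorum_size(n)  # 2 of 3 replicas
    print(q, reads_see_latest_write(n, q, q), tolerated_failures(n, q))
```

With three replicas and quorum reads and writes, every read overlaps the latest write and one replica can be down. Reading and writing a single replica (R = 1, W = 1) keeps the system available with two replicas down, but a read may miss the latest write, which is the consistency versus availability trade-off in a nutshell.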

The last slides covered the structure of the team: notably small sys-ops and DBA teams, and very strong Java dev teams.

Overall the training was very useful and packed with interesting info. I can see at least two more posts coming out of it. It also made me note some questions for Shopzilla's cloud migration in the EU.
