Wednesday, 7 March 2012

QCon 2012 Keynote

This week I'm attending QCon in London. The morning started with the keynote "The Data Panorama" by Martin Fowler and Rebecca Parsons. Here are my notes from the talk.

The talk opened with some funny lines and Martin going crazy about how big data is these days - bonus points in my book. Also bonus points for having a female speaker in the first keynote of the conference.

1. The data is big, like really big. And it's not only a problem for companies like Google and Amazon, but for everybody, because we can't correctly estimate how big our data will get.

2. Besides being big, the data is distributed (spread across different data sources, but also created by distributed contributors using a range of different tools that introduce new behaviours), valuable, urgent (it needs to be analysed almost as soon as it arrives) and connected.

3. Martin tried to define NoSQL, or rather its characteristics, and came up with: non-relational, open source, cluster-friendly, ready for the 21st century web, and schema-less. He proceeded to divide the more commonly used data sources into Document, Column, Graph and Key-Value. He noted that what the Document, Column and Key-Value ones have in common is their approach to aggregates. He concluded by saying that when you add relational DBs to the mix you get Polyglot Persistence - you have various tools that can be picked depending on what your needs are - an improvement over years of trying to use the same tool (relational DBs) for everything.

4. Another shift noted in the way we use data sources is at the application level. Traditionally we would use an Enterprise Data Model, where one data source is accessed by many different applications. The shift is towards each application owning its own data source; these are distributed and can be synchronised when needed. The responsibility for the data shifts towards the teams that own the applications.

5. There was a slide or two on event sourcing, and how it makes rebuilding a DB from nothing much easier (it was also noted that this is a trick "stolen" from version control tools).

6. The impact of cloud storage on data was summarised briefly - the separation between the location of the data owner and the location of the data, and very cheap, almost unlimited processing power were mentioned.


8. The last part of the talk covered how all of this will change the work of the Data Scientist and the function of the Data Warehouse in companies. They put a lot of emphasis on how the process of analysing data will change in the future (a more agile, more intelligent approach, searching for answers to questions we don't know yet). They also talked about privacy and how it will become an even bigger concern in the future.
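
As a side note on point 5, here is a minimal Python sketch of what event sourcing might look like. It is only an illustration and was not shown in the talk: the bank account example, the `Event` class and the `apply_event` and `replay` functions are all hypothetical names made up for this post. The idea is that an append-only log of events is the source of truth, and the current state is rebuilt from nothing by replaying that log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable entry in the log (hypothetical example type)."""
    kind: str    # "deposited" or "withdrawn" (made-up event names)
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Return the new balance after applying a single event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None):
    """Rebuild the balance from scratch by replaying the first `upto` events.

    With upto=None the whole log is replayed.
    """
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance


# The log is only ever appended to; state is never updated in place.
log = [
    Event("deposited", 100),
    Event("withdrawn", 30),
    Event("deposited", 5),
]

current_balance = replay(log)           # state after all events
earlier_balance = replay(log, upto=2)   # state as of the second event
```

Replaying only a prefix of the log gives the state at an earlier point in time, which is roughly where the comparison with version control tools comes from.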

Overall the talk was a nice mix of "techy" and less "techy" material; all of it made a lot of sense and was quite useful.
