Introduction to NoSQL by Martin Fowler

Big Data Jun 27, 2013

Excellent talk and presentation by Martin Fowler, author of my favorite book on the subject. He introduces NoSQL from a historical and technical perspective. He refers to actual use cases to describe the various types of NoSQL databases.

He first groups key-value, document-oriented, and column-family databases together as “aggregate-oriented” databases and contrasts these with conventional relational databases. He then describes how graph databases differ from the other NoSQL databases: they are not aggregate-oriented, and they let you jump across relationships between data elements, something that relational databases, ironically, cannot do very easily. So the key is how you “use” your data, and that determines which type of database you should use.
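To make the distinction concrete, here is a minimal sketch in plain Python (no real database involved). The order data, the `aggregate_store` and `edges` names, and both helper functions are made up for illustration and do not come from the talk. The first half treats a whole order as one aggregate fetched by key; the second half follows relationships between small nodes, which is the access pattern graph databases are built for.

```python
# Aggregate-oriented view: the whole order (customer plus line items) is
# stored and fetched as a single unit under one key.
aggregate_store = {
    "order:1001": {
        "customer": "Ann",
        "lines": [
            {"product": "book", "qty": 2, "price": 10.0},
            {"product": "pen", "qty": 5, "price": 1.5},
        ],
    }
}


def order_total(store, key):
    """One lookup returns everything needed to compute the total."""
    order = store[key]
    return sum(line["qty"] * line["price"] for line in order["lines"])


# Graph view: many small nodes connected by edges; the interesting
# questions are about following relationships, not fetching one big unit.
edges = {"Ann": ["Bob"], "Bob": ["Cara"], "Cara": []}


def reachable(graph, start):
    """Return every node reachable from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, []):
            if neighbour not in seen and neighbour != start:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```

Asking for `order_total(aggregate_store, "order:1001")` touches a single aggregate, while `reachable(edges, "Ann")` has to hop from node to node. Which of the two shapes matches your queries is a good hint about which kind of database fits.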

The second aspect that distinguishes databases is the consistency model. He explains why aggregate-oriented databases have less need for transactions, since the aggregate itself acts as the unit of update, while graph databases and relational databases spread data across many small records and hence provide ACID transactions. He explains how we need to group a set of operations on a database into a transaction to avoid conflicts between writes by concurrent users. However, keeping transactions open introduces performance overhead, so a common solution is to use version stamps on objects. This is something we have to deal with in both relational and NoSQL databases. He also explains the CAP theorem in an easy-to-understand way.
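The version stamp idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `VersionedStore`, `StaleVersionError`, and the in-memory dictionary are hypothetical names, not the API of any particular database. Each write must name the version the writer last read; if someone else has updated the object in the meantime, the write is rejected instead of silently overwriting their change.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on an outdated version of an object."""


class VersionedStore:
    """Toy in-memory store that keeps a version stamp next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); the caller keeps the version for later."""
        return self._data[key]

    def write(self, key, value, expected_version):
        """Store value only if nobody has written since expected_version.

        A key that does not exist yet is treated as being at version 0.
        """
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise StaleVersionError(
                f"{key}: expected version {expected_version}, "
                f"found {current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

If two users both read version 1 of the same object, the first write with `expected_version=1` succeeds and bumps the stamp to 2. The second user's write, still carrying version 1, raises `StaleVersionError`, and that user has to re-read and retry. No transaction is held open while either user is thinking.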

Then the final question: why do we use NoSQL databases? One reason is too much data, the conventional “big” data. The second reason is to get rid of the object-relational impedance mismatch, and he explains why NoSQL databases are attempting this differently from the failed object-oriented databases. However, he emphasizes that NoSQL databases are not going to replace relational databases. He then introduces “Polyglot Persistence”, where we combine multiple databases to handle different types of data or different ways of working with the data.