Rules of Engagement: Document Databases

That’s so crazy, it just might work

How many times have you implemented solutions that somehow manage to fit a square peg into a round hole? It’s the strangest thing too: you have a certain sense of pride that you were able to get SQL to do that thing that it really can’t do. At the same time, you feel dirty for having done it (and hope nobody ever finds that skeleton in your closet!). The simple message we are trying to send with Rules of Engagement is this: don’t use SQL Server to build Rube Goldberg Machines (unless you really need to). I’m not here to try to convince you to use NoSQL. However, I do think that you will make yourself more valuable by understanding more about NoSQL.

Document Databases: What

Simply put, document databases store data as objects or documents with potentially a variable number of fields in collections, instead of the tabular format that we are used to with tables in relational databases. Think key-value store where the value is a complex data structure (ie, document). Note that an update to that document typically means replacing the entire document. A document can, and often does have a pointer to other documents. The ‘schema’ is not enforced by the database itself, and this is often cited as a key reason to use NoSQL. In practice, the application accessing the data needs to deal with these schema differences, so one could argue that going ‘schemaless’ just shifts the burden from the database to the application. Document databases can partition the data across many servers, and replicate data for recovery purposes. However, document databases themselves cannot ‘join’ data. The application must handle any joins.

Who thought to themselves XML as I was talking about what documents might look like earlier? It turns out there is another data interchange protocol that has become quite popular and is used as the internal format for document databases: JSON (JavaScript Object Notation). And, as the name might suggest, JSON is JavaScript friendly and happens to be very lightweight. Here is a brief comparison of some sample data:

Relational Data

RoERelationalData

This shows how we would typically normalize data in a relational database. The Trims table has a foreign key back to the Models table to be able to look up the model name.

Document Data

RoENormalizedDocument

Storing data in a normalized document database doesn’t look all that different from a relational model to me. Document databases also have to make decisions about when it makes sense to denormalize. A common rule of thumb is to denormalize reference data that does not change frequently, to speed up reads.

Denormalized Document Data

RoEDenormalizedDocument

Just like with SQL Server, the trick is learning when to denormalize with a document database. And, like SQL Server, denormalizing will make reads faster at the expense of writes. One major difference with a document database is that denormalizing will help you scale horizontally. If all of the data you need is in a single document, you don’t need to check many servers for those related documents.

Think about how the JSON that the UI needs looks compared to how a relational database wants to store it. Right now developers are: writing code that reads data in a tabular format, tranlates it to JSON for the UI, and then translating the users’ input from JSON back into tabular format. So, you can understand why they might be enamored with a NoSQL store that actually stores the data in the same format that they want to use. And, that is why their argument of quicker development has some merit.

Document Databases: Who

Interestingly enough, one of the original document databases was Lotus Notes. However, for some reason, you won’t see many developers talking to you about wanting to develop with Lotus Notes. The most popular document databases are CouchDB (2007), mongoDB (2009), RavenDB (2010). I’m not going to go into the differences between them because our goal here is to look at document databases from a high level.

Document Databases: Where

mongoDB*
Craigslist, Foursquare, Shutterfly, Intuit
Windows, Linux, and OS X

CouchDB*
HootSuite, Vimeo
Windows, Linux, and OS X

RavenDB
MSNBC.COM, JetBrains
Windows and Linux

*Free levels on Heroku

Document Databases: When and Why

Unfortunately, this is not a simple question to answer. There are so many data store options available today, that it’s hard to find a general consensus. So, I will do the next best thing, Jeff Foxworthy style.

You Might Need a Document Database If…

…your application is document based, or the data isn’t tabular or structured

…you want to archive data, and do not want to deal with schema changes in the archived table

…you are looking for an easier upgrade path or have constant schema churn

…you are building an application for Content Management, Blogging, or User Data Management

…you have many small continuous reads and writes

programmer friendliness is a big concern

…you find yourself using SQL antipatterns, like EAV (Entity-attribute-value)

Use Case: Entity-attribute-value

Wikipedia’s definition of Entity-attribute-value model:

Entity–attribute–value model (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest

Using EAV can have a significant performance impact, and SQL DBA’s will rightly try to avoid implementing it; EAV is a SQL antipattern, after all. I have typically seen the need for EAV in metadata driven applications where users or customers need to be able to define their own attributes. To make matters worse, they also need to be able to define the valid ranges of values for these attributes. In my hometown of SQLWorld, we want to predefine these attributes as columns; we certainly aren’t going to grant permissions allowing the application to change the schema and add a new column. The compromise we end up making – implementing a Rube Goldberg Machine EAV, where these user defined columns become rows, and we have to twist and turn the data every time we touch it.

Or, we can dump the user defined columns in a single XML column in the table and index the XML column. Now, our twisting and turning is made ‘easier’ by XPath. How many of you have a ‘I love XPath’ bumper sticker? If tables that need this flexibility are in a document database from the beginning, and the programmers are responsible for the data access, it keeps SQL developers out of the XML business.

Document Databases: How

Rules of Engagement is intended to help those familiar with relational databases understand more about NoSQL, so they know when it may be appropriate. Your typical SQL Server DBA is not going to be implementing a document database, so I’m not going to get into implementation specifics. However, one of our overarching messages in Rules of Engagement is to embrace polyglot persistence. And from that perspective, you will need to work with your developers on the best way to keep the applications’ document database in synch with your source of truth (SQL Sever).

Maybe it’s just me, but I always wonder what other database’s ‘Management Studio’ looks like, so I have included screenshots for the databases we’ve been talking about.

RavenDB Management Studio

RavenManagementStudio

CouchDB (Futon)

futon

mongoDB

MongoDB does not include a GUI-style administrative interface. Instead most administration is done from command line tools such as the mongo shell. However some UI’s are available as separate community projects… Some are focused on administration, while some focus on data viewing. – docs.mongoDB.org

Summary

There are many options available for storing data today. Although 90% or more of the time you should stick with your relational database, the NoSQL offerings today are very compelling. You owe it to yourself and your business partners to familiarize yourself with their strengths and weaknesses.

Add a Comment

Your email address will not be published. Required fields are marked *