Friday, May 13, 2011

Why We're Using MongoDB

ERPEL uses MongoDB as its (main*) database. Rather than adding another article to the ongoing debate on whether to use SQL or NoSQL, we'll describe why we chose MongoDB for our scenario.

First off, it's not about staying buzzword-compliant and always trying out the latest and greatest. While we definitely welcome new approaches, novelty alone can hardly be the main argument for using them.

Additionally, our decision isn't mainly based on performance considerations. Obviously, everyone wants to build a responsive and scalable product, but this is possible with either SQL or NoSQL (and you can fail with both of them). For example, Stack Overflow uses Microsoft SQL Server (surprising, isn't it?) for its core data, and it's working very well for them [1]. And as Facebook shows [2], you can build incredible stuff with MySQL. Nevertheless, they are also using HBase for Messaging due to performance issues with MySQL [3].
So our conclusion is that as long as you're not Facebook, Google, or the like, both SQL and NoSQL can take you a long way.

The main argument for us (and others like guardian.co.uk [4]) is the schemaless nature of NoSQL in general and the approach of document stores in particular. We simply don't have a fixed schema - data varies a lot. Let's consider a simple example - the phone numbers of a user:
  • Most users will have a single mobile phone and one in the office.
  • However, some might have a fax or even a pager as well.
  • And as soon as someone has two phones in the office, it gets really complicated.
Either you end up with lots and lots of null values or you need to create join tables. Neither approach is too appealing (from both a performance and an ease-of-use perspective). Why do we even have to care about this so much? Can't we simply create a list for each phone type (mobile, office,...), add values as required, and ignore empty collections - more or less like in Java? Well, we can, but not with SQL. Using a document store this is easy: You simply have a JSON document containing lists (arrays, to be specific), and empty values simply don't exist in the database. Now that's a great approach for us. Additionally, schema changes are a thing of the past (remember: there is no schema), allowing an extremely agile development process. And finally, SQL queries can basically be translated to MongoDB queries, so there are hardly any trade-offs here.
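To make the phone number example a bit more concrete, here is a minimal sketch in plain Java, with no MongoDB driver or Morphia involved. The class name, method name, field names, and phone numbers are all made up for illustration and are not taken from ERPEL's code. It only shows the idea that each phone type becomes a list inside a document-like map and that empty lists are left out rather than stored as nulls:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from ERPEL's code base.
class PhoneDocumentSketch {

    // Builds a document-like map for a user.
    // Phone types without any numbers are skipped, so no null or empty
    // placeholder ends up in the resulting document.
    static Map<String, Object> toDocument(String name, Map<String, List<String>> phonesByType) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("name", name);
        for (Map.Entry<String, List<String>> entry : phonesByType.entrySet()) {
            List<String> numbers = entry.getValue();
            if (numbers != null && !numbers.isEmpty()) {
                // Copy the list so later changes to the input don't leak into the document.
                document.put(entry.getKey(), new ArrayList<>(numbers));
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, List<String>> phones = new LinkedHashMap<>();
        phones.put("mobile", List.of("+43 660 0000001"));
        phones.put("office", List.of("+43 1 0000002", "+43 1 0000003")); // two office phones
        phones.put("fax", List.of());                                    // no fax at all

        // The printed document has "name", "mobile" and "office", but no "fax" key.
        System.out.println(toDocument("Jane Doe", phones));
    }
}
```

A second office phone is just another element in the list, and a user without a fax simply has no fax field, which is roughly how the data would look as a JSON document in a document store.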

Once we had settled on MongoDB, we decided to use it in combination with Morphia [5], which is basically an ORM for MongoDB. For a nice introduction to Morphia, take a look at the presentation and example project we did for MongoUK2011 [6].

In case you're wondering why the current development of Morphia seems to be a bit slow, don't worry. At the moment, the official MongoDB Java driver and Morphia are being combined [7] into an even more performant and feature-rich product, so we're definitely heading in the right direction.


* Our "core" data is stored in MongoDB, but we're also using or planning to use other solutions for specific scenarios - searching, for example. But this will be covered in another article...



[1] http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
[2] http://www.facebook.com/MySQLatFacebook
[3] http://www.quora.com/Why-did-Facebook-pick-HBase-instead-of-Cassandra-for-the-new-messaging-platform
[4] http://www.slideshare.net/tackers/why-we-chose-mongodb-for-guardiancouk
[5] https://code.google.com/p/morphia/
[6] https://github.com/xeraa/mongouk2011
[7] http://blog.mongodb.org/post/5217011262/improving-scalable-java-application-development-with-mon
