Saturday, June 18, 2011

Installing Graylog2 on Ubuntu with Apache2

Graylog2 is an extremely interesting project for logging with MongoDB, so we wanted to give it a try. While there are some installation instructions out there, we had to do some digging to get the setup just the way we wanted. So here are our own step-by-step installation instructions:

Prerequisites: We're assuming you already have MongoDB, Apache2, and Java up and running on your Linux box (Ubuntu in our specific case, but at least Debian should be very similar).

First you'll need to set up the server (we're using the latest version in this example, but this will obviously change in the future) - this is very straightforward:
  • wget https://github.com/downloads/Graylog2/graylog2-server/graylog2-server-0.9.5p1.tar.gz
  • tar xvfz graylog2-server-0.9.5p1.tar.gz
  • cd graylog2-server-0.9.5p1/
  • sudo cp graylog2.conf.example /etc/graylog2.conf
  • Edit the settings according to your MongoDB installation (a sample of the relevant settings is sketched after this list): sudo pico /etc/graylog2.conf
  • cd bin/
  • sudo ./graylog2ctl start
  • ps aux | grep gray
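The MongoDB-related settings in /etc/graylog2.conf look roughly like the following (the key names may vary between Graylog2 versions, so double-check against graylog2.conf.example, and the values obviously have to match your own MongoDB setup):
    # MongoDB connection used by the Graylog2 server
    mongodb_useauth = false
    mongodb_user = grayloguser
    mongodb_password = secret
    mongodb_host = localhost
    mongodb_database = graylog2
    mongodb_port = 27017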
Once this is working as expected, you might want to start the Graylog2 server automatically on boot. Assuming you moved it to /usr/local/graylog2-server/:
  • sudo touch /etc/init.d/graylog2-server
  • sudo pico /etc/init.d/graylog2-server
    #! /bin/sh
    ### BEGIN INIT INFO
    # Provides: graylog2-server
    # Required-Start:
    # Required-Stop:
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6
    # Short-Description: Graylog2 Server
    # Description: Server aggregating the logs
    ### END INIT INFO
    # Author: Philipp Krenn

    # Actions
    case "$1" in
    start)
    cd /usr/local/graylog2-server/bin/
    ./graylog2ctl start
    ;;
    stop)
    cd /usr/local/graylog2-server/bin/
    ./graylog2ctl stop
    ;;
    restart)
    cd /usr/local/graylog2-server/bin/
    ./graylog2ctl restart
    ;;
    esac

    exit 0
  • sudo update-rc.d graylog2-server defaults
Once this is done we can install the Ruby on Rails-based frontend. We wanted to install it into a subdirectory of Apache2, which made things a little trickier:
  • Get the required dependencies: sudo apt-get update && sudo apt-get install ruby1.8 rubygems rake make libopenssl-ruby ruby-dev build-essential git-core
  • cd ~
  • Get the Graylog2 frontend: wget https://github.com/downloads/Graylog2/graylog2-web-interface/graylog2-web-interface-0.9.5p2.tar.gz
  • tar xvfz graylog2-web-interface-0.9.5p2.tar.gz
  • Get the latest RubyGems to bundle the installation: wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.5.tgz
  • tar xvfz rubygems-1.8.5.tgz
  • cd rubygems-1.8.5/
  • sudo ruby setup.rb
  • sudo gem update
  • sudo gem install bundler
  • cd ~/graylog2-web-interface-0.9.5p2/
  • bundle install
  • bundle show
  • Apache2's base directory is /var/www/, so let's create /var/rails/ and put the Rails files there. We'll then link the Apache2 directory to the Rails application.
  • sudo mkdir /var/rails/
  • sudo mkdir /var/rails/graylog2
  • sudo cp -R ./* /var/rails/graylog2/
  • Install Apache2's Passenger module, which integrates Rails applications.
  • sudo apt-get install libapache2-mod-passenger
  • sudo gem install passenger
  • sudo passenger-install-apache2-module
  • The previous step ran the module's installer, which does quite a bit of work for you. Additionally, it will tell you which dependencies are not yet met; in our case we had to add the following packages: sudo apt-get install libcurl4-openssl-dev libssl-dev zlib1g-dev apache2-prefork-dev libapr1-dev libaprutil1-dev
  • Run the tool again and execute the steps it tells you to, except for the last one, which asks you to change some configuration files: sudo passenger-install-apache2-module
  • Instead open up the following file and make sure only the given content is available there: sudo pico /etc/apache2/mods-available/passenger.conf
    <IfModule mod_passenger.c>
    PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-3.0.7
    PassengerRuby /usr/bin/ruby1.8
    </IfModule>
  • Do the same for another configuration file: sudo pico /etc/apache2/mods-available/passenger.load
    LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-3.0.7/ext/apache2/mod_passenger.so
  • Now add your subdirectory to your Apache2 configuration (inside the VirtualHost section): sudo pico /etc/apache2/sites-available/default
    RailsBaseURI /graylog2
  • Link the public/ directory of your Rails application to the one you set up in Apache2's configuration: sudo ln -s /var/rails/graylog2/public/ /var/www/graylog2
  • Adapt the MongoDB configuration; this must be the same as the server's configuration (a minimal sketch follows below): sudo pico /var/rails/graylog2/config/mongoid.yml
  • Finally restart Apache2: sudo /etc/init.d/apache2 restart
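The exact layout of config/mongoid.yml depends on the Mongoid version shipped with the frontend, but as a rough sketch, the important part is that host, port, and database in the production section match what you configured in /etc/graylog2.conf:
    production:
      host: localhost
      port: 27017
      username:
      password:
      database: graylog2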
That's it, the Graylog2 frontend should now be available in the /graylog2 subdirectory.


UPDATE: As noted by Lennart (https://twitter.com/#!/_lennart/status/88330653246038016), you need to create your indexes manually: in /var/rails/graylog2 run sudo rake db:mongoid:create_indexes RAILS_ENV=production and you're good to go.

Logging MongoDB Queries

Why do you need to log queries?
  • In case you want to know how long it takes to execute a specific query.
  • To see which is the slowest query.
  • If you want to see which queries are actually being run (in case you're using a layer of abstraction like Morphia, this can be a pretty interesting question).
  • ...
Using MongoDB's built-in profiling tool it's very simple to achieve this:
  1. Profiling can be enabled and disabled per database (it is disabled by default).
  2. Connect to the database on the shell / CMD (assuming you're using port 8082): mongo localhost:8082/erpel_test
  3. Activate profiling for all queries, not just slow ones: db.setProfilingLevel(2);
  4. Exit the shell / CMD and run some queries.
  5. Your database should now have a new collection called system.profile, showing the raw queries and how long they took to execute (see the example session below).
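Putting these steps together, a session could look like the following (the users collection and the query are just made-up examples):
    $ mongo localhost:8082/erpel_test
    > db.setProfilingLevel(2);         // profile all operations, not just slow ones
    > db.users.find({ name: "test" }); // run whatever queries you're interested in
    > db.setProfilingLevel(0);         // switch profiling off again
    > db.system.profile.find().sort({ $natural: -1 }).limit(5);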
For additional profiling options see http://www.mongodb.org/display/DOCS/Database+Profiler.

Friday, May 13, 2011

Why We're Using MongoDB

ERPEL is using MongoDB as its (main*) database. We don't want to add yet another article to the ongoing debate about whether to use SQL or NoSQL; instead, we'll describe why we've chosen MongoDB in our scenario.

First off, it's not about staying buzzword-compliant and always trying out the latest and greatest. While we definitely welcome new approaches, that can hardly be the main argument for using them.

Additionally, our decision isn't mainly based on performance considerations. Obviously, everyone wants to build a responsive and scalable product, but this is possible with either SQL or NoSQL (or you can fail with both of them). For example, Stack Overflow uses Microsoft SQL Server (surprising, isn't it?) for its core data and it's working very well for them [1]. And as Facebook shows [2], you can build incredible stuff with MySQL. Nevertheless, they are also using HBase for their messaging platform due to performance issues with MySQL [3].
So our conclusion is that as long as you're not Facebook, Google,... both SQL and NoSQL can take you a long way.

The main argument for us (and others like guardian.co.uk [4]) is the schemaless nature of NoSQL in general and the approach of document stores in particular. We simply don't have a fixed schema - data varies a lot. Let's consider a simple example - the phone numbers of a user:
  • Most users will have a single mobile phone and one in the office.
  • However, some might have a fax or even a pager as well.
  • And as soon as someone has two phones in the office, it's getting really complicated.
In a relational model you either end up with lots and lots of null values or you need to create join tables. Neither approach is too appealing (both from a performance and an ease-of-use perspective). Why do we even have to care about this so much? Can't we simply create a list for each phone type (mobile, office,...), add values as required, and have empty collections be ignored - more or less like in Java? Well, we can, but not with SQL. With a document store this is easy: you simply have a JSON document containing lists (arrays, to be specific), and empty values simply don't exist in the database. That's a great approach for us. Additionally, schema changes are a thing of the past (remember: there is no schema), allowing an extremely agile development process. And finally, SQL queries can basically be translated to MongoDB queries, so there are hardly any trade-offs here.
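To illustrate with a hypothetical user document (the collection and field names are made up for this example), the phone numbers simply become arrays inside the document, and absent types just don't appear:
    > db.users.insert({
        name: "John Doe",
        phones: {
            mobile: ["+43 660 1234567"],
            office: ["+43 1 1234567", "+43 1 7654321"]
            // no fax, no pager: those keys simply don't exist
        }
    });
    > db.users.find({ "phones.office": "+43 1 1234567" });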

Once we had settled on MongoDB we decided to use it in combination with Morphia [5], which is basically an ORM for MongoDB. For a nice introduction to Morphia, take a look at the presentation and example project we did for MongoUK2011 [6].

In case you're wondering why the current development of Morphia seems to be a bit slow, don't worry. At the moment the official MongoDB Java driver and Morphia are being combined [7] into an even more performant and feature-rich product, so things are definitely moving in the right direction.


* Our "core" data is stored in MongoDB, but we're also using / planning to use other solutions for specific scenarios - searching, for example. But that will be covered in another article...



[1] http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
[2] http://www.facebook.com/MySQLatFacebook
[3] http://www.quora.com/Why-did-Facebook-pick-HBase-instead-of-Cassandra-for-the-new-messaging-platform
[4] http://www.slideshare.net/tackers/why-we-chose-mongodb-for-guardiancouk
[5] https://code.google.com/p/morphia/
[6] https://github.com/xeraa/mongouk2011
[7] http://blog.mongodb.org/post/5217011262/improving-scalable-java-application-development-with-mon

Friday, April 15, 2011

Fun with C++ templates and generic insertion/extraction operator overloading

Recently, I ended up with a class hierarchy of business objects which was generated by a tool. One objective for me was to support easy serialization and deserialization of these classes. The tool generating these classes provided some C library functions for that matter, but what I really wanted was proper C++ insertion/extraction operator support. Users of the code should be able to write something like this:

xsd__base64Binary binary; // a generated business object
binary.id = "a value";

// printing an (xml-based)
// version of the object to stdout:
std::cout << binary;


Likewise, the deserialization of objects from a given input stream should work accordingly.

For this to happen we simply have to define overloaded operators implementing the custom streaming into an output stream and from an input stream. As the only commonality of all the classes was a public member variable called "soap", I decided to implement the operators as template functions and created a new header file for them:

template<typename T> std::ostream &
operator<<(std::ostream &o, const T &p)
{
    // C serializer function provided
    // by the toolkit
    soap_serialize(p.soap);
    return o;
}

template<class T> std::istream &
operator>>(std::istream &i, T &t)
{
    soap_begin_recv(t.soap);
    soap_end_recv(t.soap);

    return i;
}


I wrapped the given C serializer functions, included my new header file and tried the code. At first it looked great, but when I added more code...

std::cout << "Success!";


... my compiler greeted me with the following error:

1>test.cpp(123): error C2593: 'operator <<' is ambiguous
1> customoperators.h(109): could be 'std::ostream &operator <<<const char[9]>(std::ostream &,T (&))'
1> with
1> [
1> T=const char [9]
1> ]
1> c:\programme\microsoft visual studio 10.0\vc\include\ostream(851): or 'std::basic_ostream<_Elem,_Traits> &std::operator <<<char,std::char_traits<char>>(std::basic_ostream<_Elem,_Traits> &,const _Elem *)' [found using argument-dependent lookup]
1> with
1> [
1> _Elem=char,
1> _Traits=std::char_traits<char>
1> ]

What just happened is that we inadvertently introduced an ambiguity with certain template-based streaming operators that are already predefined for us. The compiler cannot decide which version of the operator to use and thus gives up with an error. (Non-template operators, on the other hand, should still be fine as the language prefers those overloads over template versions.)

Amazingly, it is still possible to relax the situation by using the full power of template metaprogramming. Even though function templates do not support partial specializations, it's still possible to come up with constructs where the compiler will ignore our newly introduced operators when feasible. The following section will highlight how it's possible to let the compiler consider the operator if and only if objects of the given type parameter T contain a member named "soap". As our business objects lack any common interface besides this very member, it's clear that there may still be scenarios where this could lead to undesired results anyway. But as we're in full control of the source code, it's much easier to work around further problems.

The first construct which helped me on my way was the enable_if family of templates introduced with the great Boost C++ libraries. These template expressions use the so-called SFINAE principle (substitution failure is not an error) to their advantage. SFINAE comes into play when the compiler tries to substitute template parameters and ends up with an invalid type or expression. Instead of bailing out with a compiler error, the affected template construct is simply not considered a legal match for the actually used type parameters. By exploiting this fact, enable_if allows the specific inclusion or exclusion of templates into the set considered by the compiler. The basic construct given in Boost is deceptively short and elegant:

template <bool B, class T = void>
struct enable_if_c {
typedef T type;
};

template <class T>
struct enable_if_c<false, T> {};


The template struct enable_if_c has a boolean parameter B and a type parameter T. T is typedefed inside the struct and exposed under the name type. A partial specialization of enable_if_c for the concrete boolean value false omits this typedef. When other templates refer to type inside enable_if_c, they are only valid if the compile-time expression for B evaluates to true. If this is not the case, enable_if_c::type is an invalid construct, SFINAE kicks in, and the template relying on enable_if_c is kicked out of the set of templates considered by the compiler.
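As a quick illustration (a made-up is_even helper, assuming the enable_if_c definition above is in scope), a function template can be restricted to integral types like this:

#include <iostream>
#include <limits>

// Only participates in overload resolution when T is an integer type;
// otherwise enable_if_c<...>::type does not exist and SFINAE removes it.
template<typename T>
typename enable_if_c<std::numeric_limits<T>::is_integer, bool>::type
is_even(T value)
{
    return value % 2 == 0;
}

int main()
{
    std::cout << is_even(42) << std::endl; // fine, int is an integral type
    // is_even(3.14); // would not compile: no matching overload
    return 0;
}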

SFINAE can be employed to gather a variety of useful information about a type. The paper "Once, Weakly: SFINAE Sono Buoni" describes a variety of use cases. I've taken one example relevant to my basic problem: when is a given type T actually a class type?


template<typename T>
struct IsClass
{
typedef char True ; //sizeof(True)==1
typedef struct{char a[2];} False; //sizeof(False)>1

template<class C>static True isClass(int C::*);
template<typename C>static False isClass(...);

enum
{
value = sizeof(isClass<T>(0))==sizeof(True)
};
};


This struct template uses the different sizes of the two overloads' return types together with the sizeof() operator to determine whether T is a class or not. If T is a class, the size of the return value of isClass(0) will be sizeof(True), as the first overload can be called (int C::* is only a valid type if C is a class). Otherwise the generic (lowest-priority) overload for variable arguments is taken, resulting in a different return type and thus a different evaluation of value.
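A quick sanity check (a hypothetical test program, just to show the outcome, assuming the IsClass template above is in scope):

#include <iostream>
#include <string>

int main()
{
    // int is not a class type, std::string is
    std::cout << IsClass<int>::value << std::endl;         // prints 0
    std::cout << IsClass<std::string>::value << std::endl; // prints 1
    return 0;
}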

This basic idea of using the sizeof operator on the return types of overloaded template functions can also be extended to determine whether a given type has a certain member. The following construct is adapted from here and here and does just that. Instead of going over the construct myself, I'm referring the interested reader to Cplusplus.co.il, which already has a great walkthrough of the basic ideas.


template<typename T, typename Enable = void>
struct HasSoap {
struct Fallback {
int soap;
}; // introduce member name "soap",
// possibly creating an ambiguity

struct Derived : T, Fallback { };

template<typename C, C>
struct ChT;

template<typename C>
static char (&f(ChT<int Fallback::*, &C::soap>*));

template<typename C>
static int (&f(...));

static bool const value =
sizeof(f<Derived>(0)) == sizeof(int);
};


I had to add an additional partial class template specialization to make sure non-class types can be used with HasSoap too (otherwise the compiler would complain that it can't derive from a non-class type). Enable_If_C is a variant of the enable_if_c construct shown above:


// partial specialization: non-class types have no soap member and can't be derived from.
template<typename T>
struct HasSoap<T, typename Enable_If_C<!IsClass<T>::value>::type>
{
static bool const value = false;
};
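Again a quick check of the trait (hypothetical types, assuming all of the templates above live in one translation unit):

#include <iostream>

struct WithSoap    { int soap; };  // has a member named "soap"
struct WithoutSoap { int other; };

int main()
{
    std::cout << HasSoap<WithSoap>::value << std::endl;    // prints 1
    std::cout << HasSoap<WithoutSoap>::value << std::endl; // prints 0
    std::cout << HasSoap<int>::value << std::endl;         // prints 0 (non-class type)
    return 0;
}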


With all these additional templates in place there was still one thing to do: adapt my operators so that they are only considered when the given type has a "soap" member.
Changing the operator's signature from


template<typename T> std::ostream &
operator<<(std::ostream &o, const T &p)


to


template<typename T>
typename Enable_If_C<HasSoap<T>::value, std::ostream>::type &
operator<<(std::ostream &o, const T &p)


took care of this. What happens now is that HasSoap<T>::value evaluates to true only for class types with a soap member. If this is not the case, the Enable_If_C specialization without a type typedef is chosen by the compiler, yielding an invalid reference to Enable_If_C::type. SFINAE kicks in again and the compiler is spared an unnecessary ambiguity.

Considering this little template adventure, my conclusion for this post can only be: C++ templates may be hard to understand sometimes, but then again, they can provide us with really powerful ways to craft our source code.

Thursday, April 14, 2011

m2eclipse plugin: NoClassDefFoundError when running JUnit tests

We are using the m2eclipse plugin (including its WTP extension) for Maven integration in Eclipse. Recently, we ran into NoClassDefFoundError problems when trying to run JUnit tests from Eclipse (Run/Debug as JUnit test). More information about this problem can be found here, here or here.
A suggested workaround is to change the build path order in Eclipse, i.e., moving the Maven Dependencies before the JRE System Library in the build order. However, this did not solve our problem.

We discovered that the culprit in our setup was the auto-clean execution of the maven-clean-plugin, which we had configured in our pom as follows:


<plugin>    
  <artifactId>maven-clean-plugin</artifactId>    
  <version>2.4</version>    
  <executions>      
    <execution>           
      <id>auto-clean</id>        
      <phase>initialize</phase>        
      <goals>          
        <goal>clean</goal>        
      </goals>      
    </execution>    
  </executions>  
</plugin>

After removing the maven-clean-plugin from our pom.xml, running JUnit tests from an m2eclipse project worked like a charm.