Monday, March 11, 2013

Starting HBase Server Programmatically

I'm implementing a database application these days, and for that I wanted to programmatically start and stop a standalone HBase server. More specifically, I wanted to make the HBase server part of my application, so that whenever my application starts, the HBase server starts with it. This turned out to be more difficult than I expected. To start an HBase server you actually need to start three things:
1. HBase master server
2. HBase region server
3. ZooKeeper
The default startup script shipped with the HBase binary distribution does all this for you, but I wanted a more tightly integrated, fully programmatic solution. Unfortunately, the HBase public API doesn't seem to expose the functionality required to start and stop these components programmatically (at least not in a straightforward manner). So after going through the HBase source and trying out various things, I managed to come up with some code that does exactly what I want. At a high level, this is what my code does:
1. Create an instance of HQuorumPeer and execute it on a separate thread.
2. Create and initialize an HBaseConfiguration instance.
3. Create an instance of HMaster and execute it on a separate thread.
4. Create an instance of HRegionServer and execute it on a separate thread.
Both HMaster and HRegionServer implement the Runnable interface, so it's easy to run them on separate threads. I created a simple Java ExecutorService instance and submitted HMaster and HRegionServer to it for execution. HQuorumPeer was a bit trickier. This class only contains a main method and offers no public API to speak of. One solution is to create your own thread class that simply invokes that main method. The other option is to write your own HQuorumPeer class that implements the Runnable interface. The original HQuorumPeer class from the HBase project is fairly small, so I took the second approach: I copied the code from the original HQuorumPeer and created my own HQuorumPeer implementing Runnable. Overall, this is what my final code looks like:
        
        // Start the embedded ZooKeeper peer first. submit() returns immediately,
        // so the peer is still starting up in the background at this point.
        exec.submit(new HQuorumPeer(properties));
        log.info("HBase ZooKeeper server started");

        // Build the HBase configuration and set the data directory under hbasePath.
        Configuration config = HBaseConfiguration.create();
        File hbaseDir = new File(hbasePath, "data");
        config.set(HConstants.HBASE_DIR, hbaseDir.getAbsolutePath());
        for (String key : properties.stringPropertyNames()) {
            if (key.startsWith("hbase.")) {
                // HBase settings are passed through unchanged.
                config.set(key, properties.getProperty(key));
            } else {
                // Everything else is treated as a ZooKeeper setting.
                String name = HConstants.ZK_CFG_PROPERTY_PREFIX + key;
                config.set(name, properties.getProperty(key));
            }
        }

        try {
            // Run the master and the region server on their own threads.
            master = new HMaster(config);
            regionServer = new HRegionServer(config);
            masterFuture = exec.submit(master);
            regionServerFuture = exec.submit(regionServer);
            log.info("HBase server is up and running...");
        } catch (Exception e) {
            handleException("Error while initializing HBase server", e);
        }
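For comparison, the first option mentioned above (a thread class that simply calls the existing main method) can be sketched roughly like this. It is only an illustration, not the approach taken here: `MainMethodThread` is a made-up name, and `FakePeer` is a stand-in for a class that, like HQuorumPeer, only offers a static main method.

```java
// Hypothetical stand-in for a class that only exposes main(), like HQuorumPeer.
class FakePeer {
    static volatile boolean started = false;

    public static void main(String[] args) {
        // A real peer would start ZooKeeper here and block until shut down.
        started = true;
    }
}

// Hypothetical thread class that does nothing except invoke the main method.
class MainMethodThread extends Thread {
    private final String[] args;

    MainMethodThread(String... args) {
        super("embedded-peer");
        this.args = args;
        // Daemon thread, so it does not keep the JVM alive on its own.
        setDaemon(true);
    }

    @Override
    public void run() {
        FakePeer.main(args);
    }
}
```

The drawback of this option is that the caller gets no handle on the running server: whatever main creates stays hidden inside it, which is one reason a Runnable copy of the class can be more convenient.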
Then I wrapped all this logic into a single reusable utility class called HBaseServer. So whenever I want to start or stop HBase in my application, this is all I have to do:
HBaseServer hbaseServer = new HBaseServer();
hbaseServer.start();
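The HBaseServer class itself isn't shown here, so the following is only a minimal sketch of what such a start/stop wrapper could look like, not the actual implementation. `EmbeddedServer` is a hypothetical name, and the plain Runnables it takes stand in for the ZooKeeper peer, HMaster and HRegionServer instances. It stops components by interrupting their threads, which is an assumption made to keep the sketch self-contained; with the real HBase classes you would presumably ask each component to shut down through its own API instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical lifecycle wrapper around a set of long-running components.
class EmbeddedServer {
    private final List<Runnable> components;
    private final List<Future<?>> futures = new ArrayList<>();
    private ExecutorService exec;

    EmbeddedServer(List<Runnable> components) {
        this.components = components;
    }

    // Submits every component to its own pooled thread.
    synchronized void start() {
        if (exec != null) {
            throw new IllegalStateException("Server is already running");
        }
        exec = Executors.newCachedThreadPool();
        for (Runnable component : components) {
            futures.add(exec.submit(component));
        }
    }

    // Interrupts all components and waits for their threads to exit.
    // Returns true if everything terminated within the timeout.
    synchronized boolean stop(long timeoutMillis) {
        if (exec == null) {
            return true;
        }
        for (Future<?> future : futures) {
            future.cancel(true);
        }
        futures.clear();
        exec.shutdownNow();
        boolean terminated;
        try {
            terminated = exec.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            terminated = false;
        }
        exec = null;
        return terminated;
    }

    synchronized boolean isRunning() {
        return exec != null;
    }
}
```

Keeping the Future objects around mirrors the masterFuture and regionServerFuture fields in the code above, and gives the stop path something to cancel.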
Hope somebody finds this useful :)