
Saturday, June 20, 2015

Expose Any Shell Command or Script as a Web API

I implemented a tool that can expose any shell command or script as a simple web API. All you have to specify is the binary (command/script) that needs to be exposed and, optionally, a port number for the HTTP server. The source code of the tool, in its entirety, is shown below. In addition to exposing simple web APIs, this code also shows how to use Golang's built-in logging package, slice-to-varargs conversion and a couple of other neat tricks.
// This tool exposes any binary (shell command/script) as an HTTP service.
// A remote client can trigger the execution of the command by sending
// a simple HTTP request. The output of the command execution is sent
// back to the client in plain text format.
package main

import (
 "flag"
 "fmt"
 "io/ioutil"
 "log"
 "net/http"
 "os"
 "os/exec"
 "strings"
)

func main() {
 binary := flag.String("b", "", "Path to the executable binary")
 port := flag.Int("p", 8080, "HTTP port to listen on")
 flag.Parse()

 if *binary == "" {
  fmt.Println("Path to binary not specified.")
  return
 }

 l := log.New(os.Stdout, "", log.Ldate|log.Ltime)
 http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
  var argString string
  if r.Body != nil {
   data, err := ioutil.ReadAll(r.Body)
   if err != nil {
    l.Print(err)
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
   }
   argString = string(data)
  }

  fields := strings.Fields(*binary)
  args := append(fields[1:], strings.Fields(argString)...)
  l.Printf("Command: [%s %s]", fields[0], strings.Join(args, " "))

  output, err := exec.Command(fields[0], args...).Output()
  if err != nil {
   http.Error(w, err.Error(), http.StatusInternalServerError)
   return
  }
  w.Header().Set("Content-Type", "text/plain")
  w.Write(output)
 })

 l.Printf("Listening on port %d...", *port)
 l.Printf("Exposed binary: %s", *binary)
 http.ListenAndServe(fmt.Sprintf("127.0.0.1:%d", *port), nil)
}
Clients invoke the web API by sending HTTP GET and POST requests. Clients can also send additional flags and arguments to be passed to the command/script wrapped by the web API. The result of the command/script execution is sent back to the client as a plain text payload.
As an example, assume you need to expose the "date" command as a web API. You can simply run the tool as follows:
./bash2http -b date
Now, the clients can invoke the API by sending an HTTP request to http://host:8080. The tool will run the "date" command on the server, and send the resulting text back to the client. Similarly, to expose the "ls" command with the "-l" flag (i.e. long output format), we can execute the tool as follows:
./bash2http -b "ls -l"
Users sending an HTTP request to http://host:8080 will now get a file listing (in the long output format, of course) of the server's current directory. Alternatively, users can POST additional flags and a file path to the web API to get more specific output. For instance:
curl -v -X POST -d "-h /usr/local" http://host:8080
This will return a file listing of /usr/local directory of the server with human-readable file size information.
You can also use this tool to expose custom shell scripts and other command-line programs. For example, if you have a Python script foo.py which you wish to expose as a web API, all you have to do is:
./bash2http -b "python foo.py"

Monday, June 8, 2015

Exposing a Local Directory Through a Web Server

Did you ever encounter a situation where you had to serve the contents of a directory in the local file system through a web server? Usually this scenario occurs when you want to quickly try out some HTML+JS+CSS combo, or when you want to temporarily share the directory with a remote user. How would you go about doing this? Setting up Apache HTTP server or something similar could take time. And you definitely don't want to be writing any new code to achieve such a simple goal. Ideally, what you want is a simple command that, when executed, starts serving the current directory through a web server.
The good news is, if you have Python installed on your machine, you already have access to such a command:
python -m SimpleHTTPServer 8000
The last argument (8000) is the port number for the HTTP server. This will spawn a lightweight HTTP server, using the current directory as the document root. Hit ctrl+c to kill the server process when you're done with it.
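If you're on Python 3, the same module is available under the name http.server:
python3 -m http.server 8000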
Alternatively you can write your own solution, and install it permanently into the system so you can reuse it in the future. Here's a working solution written in Go:
package main

import (
 "log"
 "net/http"
)

func main() {
 log.Fatal(http.ListenAndServe(":8080", http.FileServer(http.Dir("."))))
}
The port number (8080) is hardcoded into the solution, but it's not that hard to change it.
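For instance, here's a variant of the same program (a small sketch) that reads the port number from a command-line flag instead:
package main

import (
 "flag"
 "fmt"
 "log"
 "net/http"
)

func main() {
 // The -p flag replaces the hardcoded port number.
 port := flag.Int("p", 8080, "HTTP port to listen on")
 flag.Parse()
 log.Fatal(http.ListenAndServe(fmt.Sprintf(":%d", *port), http.FileServer(http.Dir("."))))
}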

Wednesday, May 13, 2015

Using Java Thread Pools

Here's a quick (and somewhat dirty) solution in Java to process a set of tasks in parallel. It does not require any third party libraries. Users can specify the tasks to be executed by implementing the Task interface. Then, a collection of Task instances can be passed to the TaskFarm.processInParallel method. This method will farm out the tasks to a thread pool and wait for them to finish. When all tasks have finished, it will gather their outputs, put them in another collection, and return it as the final outcome of the method invocation.
This solution also provides some control over the number of threads that will be employed to process the tasks. If a positive value is provided as the max argument, it will use a fixed thread pool with an unbounded queue to ensure that no more than 'max' tasks will be executed in parallel at any time. By specifying a non-positive value for the max argument, the caller can request the TaskFarm to use as many threads as needed.
If any of the Task instances throws an exception, the processInParallel method will also throw an exception (surfaced through Future.get when the results are collected).
package edu.ucsb.cs.eager;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.*;

public class TaskFarm<T> {

    /**
     * Process a collection of tasks in parallel. Wait for all tasks to finish, and then
     * return all the results as a collection.
     *
     * @param tasks The collection of tasks to be processed
     * @param max Maximum number of parallel threads to employ (non-positive values
     *            indicate no upper limit on the thread count)
     * @return A collection of results
     * @throws Exception If at least one of the tasks fail to complete normally
     */
    public Collection<T> processInParallel(Collection<Task<T>> tasks, int max) throws Exception {
        ExecutorService exec;
        if (max <= 0) {
            exec = Executors.newCachedThreadPool();
        } else {
            exec = Executors.newFixedThreadPool(max);
        }

        try {
            List<Future<T>> futures = new ArrayList<>();

            // farm it out...
            for (Task<T> t : tasks) {
                final Task<T> task = t;
                Future<T> f = exec.submit(new Callable<T>() {
                    @Override
                    public T call() throws Exception {
                        return task.process();
                    }
                });
                futures.add(f);
            }

            List<T> results = new ArrayList<>();

            // wait for the results
            for (Future<T> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            exec.shutdownNow();
        }
    }

}
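The Task interface itself is not shown in the listing above. Judging by how it is used (task.process() returning T), it is presumably as simple as:
public interface Task<T> {
    T process() throws Exception;
}
A typical invocation would then look something like this (SleepTask is a hypothetical Task implementation):
TaskFarm<String> farm = new TaskFarm<>();
List<Task<String>> tasks = new ArrayList<>();
tasks.add(new SleepTask("task-1"));
tasks.add(new SleepTask("task-2"));
// Process at most 4 tasks in parallel, and collect the outputs
Collection<String> results = farm.processInParallel(tasks, 4);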

Tuesday, May 5, 2015

Parsing Line-Oriented Text Files Using Go

The following example demonstrates several features of Golang, such as reading a file line-by-line (with error handling), deferred statements and higher order functions.
package main

import (
 "bufio"
 "fmt"
 "os"
)

func ParseLines(filePath string, parse func(string) (string,bool)) ([]string, error) {
  inputFile, err := os.Open(filePath)
  if err != nil {
    return nil, err
  }
  defer inputFile.Close()

  scanner := bufio.NewScanner(inputFile)
  var results []string
  for scanner.Scan() {
    if output, add := parse(scanner.Text()); add {
      results = append(results, output)
    }
  }
  if err := scanner.Err(); err != nil {
    return nil, err
  }
  return results, nil
}

func main() {
  if len(os.Args) != 2 {
    fmt.Println("Usage: line_parser ")
    return
  }

  lines, err := ParseLines(os.Args[1], func(s string)(string,bool){ 
    return s, true
  })
  if err != nil {
    fmt.Println("Error while parsing file", err)
    return
  }

  for _, l := range lines {
    fmt.Println(l)
  }
}
The ParseLines function takes a path (filePath) to an input file, and a function (parse) that will be applied to each line read from the input file. The parse function should return a (string, bool) pair, where the boolean value indicates whether the string should be added to the final result of ParseLines. The example shows how to simply read and print all the lines of the input file.
The caller can inject more sophisticated transformation and filtering logic into ParseLines via the parse function. The following example invocation filters out all the strings that do not begin with the prefix "[valid]", and extracts the 3rd field from each line (assuming a simple whitespace-separated line format).
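// Note: this snippet assumes the "strings" package has been added to the import list.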
lines, err := ParseLines(os.Args[1], func(s string)(string,bool){
   if strings.HasPrefix(s, "[valid] ") {
     return strings.Fields(s)[2], true
   }
   return s, false
})
A function like ParseLines is suitable for parsing small to moderately large files. However, if the input file is very large, ParseLines may cause some issues, since it accumulates the results in memory.
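One possible remedy (a sketch, not part of the original example) is to hand each line to a callback as soon as it is read, so nothing accumulates in memory:
func EachLine(filePath string, handle func(string) error) error {
  inputFile, err := os.Open(filePath)
  if err != nil {
    return err
  }
  defer inputFile.Close()

  scanner := bufio.NewScanner(inputFile)
  for scanner.Scan() {
    // Each line is processed immediately instead of being collected
    if err := handle(scanner.Text()); err != nil {
      return err
    }
  }
  return scanner.Err()
}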

Friday, January 2, 2015

Developing Web Services with Go

Golang facilitates implementing powerful web applications and services with a very small amount of code. It can be used to implement both HTML-rendering webapps and XML/JSON-rendering web APIs. In this post, I'm going to demonstrate how easy it is to implement a simple JSON-based web service using Go. We are going to implement a simple addition service that takes two integers as input, and returns their sum as the output.
package main

import (
        "encoding/json"
        "net/http"
)

type addReq struct {
        Arg1,Arg2 int
}

type addResp struct {
        Sum int
}

func addHandler(w http.ResponseWriter, r *http.Request) {
        decoder := json.NewDecoder(r.Body)
        var req addReq
        if err := decoder.Decode(&req); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
        }
 
        jsonString, err := json.Marshal(addResp{Sum: req.Arg1 + req.Arg2})
        if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
        }
        w.Header().Set("Content-Type", "application/json")
        w.Write(jsonString)
}

func main() {
        http.HandleFunc("/add", addHandler)
        http.ListenAndServe(":8080", nil)
}
Let's review the code from top to bottom. First we need to import the JSON and HTTP packages into our code. The JSON package provides the functions for parsing and marshaling JSON messages. The HTTP package enables processing HTTP requests. Then we define two data types (addReq and addResp) to represent the incoming JSON request and the outgoing JSON response. Note how addReq contains two integers (Arg1, Arg2) for the two input values, while addResp contains only one integer (Sum) for holding the total.
Next we define what is called an HTTP handler function, which implements the logic of our web service. This function simply parses the incoming request and populates an instance of the addReq struct. Then it creates an instance of the addResp struct and serializes it into JSON. The resulting JSON string is then written out using the http.ResponseWriter object.
Finally, we have a main function that ties everything together and starts the web service. This main function simply registers our HTTP handler with the "/add" URL context, and starts an HTTP server on port 8080. This means any requests sent to the "/add" URL will be dispatched to the addHandler function for processing.
That's all there is to it. You may compile and run the program to try it out. Use curl as follows to send a test request:
curl -v -X POST -d '{"Arg1":5, "Arg2":4}' http://localhost:8080/add
You will get a JSON response back with the total.
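For the request above, the response would be:
{"Sum":9}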

Wednesday, December 3, 2014

Controlled Concurrency with Golang

Lately I have been doing a lot of programming in Golang. It is one of those languages that is somewhat difficult to fully grasp at the beginning. But a few hundred lines of code later, you feel like you cannot get enough of it -- very simple syntax, brilliant performance and very clean and precise API semantics. This language has got it all.
Concurrent programming is one area where Golang really excels. The goroutines that make it trivial to start concurrent threads of execution, channels as a first-class programming construct and a plethora of built-in utilities and packages (e.g. sync) make the developer's life a lot easier. In this post I'm going to give a brief overview of how to instantiate new threads of execution in Golang. Let's start with a piece of sequential code:
for i := 0; i < len(array); i++ {
  doSomething(array[i])
}
The above code iterates over an array, and calls the function doSomething for each element in the array. But this code is sequential, which means doSomething(n) won't be called until doSomething(n-1) returns. Suppose you want to speed things up a little by running multiple invocations of doSomething in parallel (assuming it is safe to do so -- both control and data wise). In Golang this is all you have to do:
for i := 0; i < len(array); i++ {
  go doSomething(array[i])
}
The go keyword will start the doSomething function as a separate concurrent goroutine. But this code change causes an uncontrolled concurrency situation. In other words, the only thing that's limiting the number of parallel goroutines spawned by the program is the length of the array, which is not a good idea if the array has thousands of entries. Ideally, we need to put some kind of a fixed cap on how many goroutines are spawned by the loop. This can be easily achieved by using a channel with a fixed capacity.
c := make(chan bool, 8)
for i := 0; i < len(array); i++ {
  c <- true
  go func(index int){
    doSomething(index)
    <- c
  }(i)
}
We start by creating a channel that can hold at most 8 boolean values. Then inside the loop, whenever we spawn a goroutine, we first send a boolean value (true) into the channel. This operation will block if the channel is already full (i.e. it has 8 elements). Then in the goroutine, we remove an element from the channel before we return. This little trick makes sure that at most 8 parallel goroutines will be active in the program at any given time. If you need to change this limit, you simply have to change the capacity of the channel. You can set it to a fixed number, or write some code to figure out the optimal value based on the number of CPU cores available in the system.
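One detail worth noting (not covered in the original snippet): when the loop exits, up to 8 goroutines may still be running. If the program needs to wait for them to finish, the channel trick can be combined with a sync.WaitGroup. Here's a sketch (the cap is hardcoded to 8 as before; runtime.NumCPU() is one way to derive it from the core count):
c := make(chan bool, 8)
var wg sync.WaitGroup
for i := 0; i < len(array); i++ {
  c <- true
  wg.Add(1)
  go func(index int) {
    defer wg.Done() // mark this goroutine as finished, even on panic
    doSomething(index)
    <-c
  }(i)
}
wg.Wait() // block until every goroutine has returned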

Saturday, November 22, 2014

Running Python from Python

It has been pointed out to me that I don't blog as often as I used to. So here's a first step towards rectifying that.
In this post, I'm going to briefly describe the support that Python provides for processing, well, "Python". If you're using Python for simple scripting and automation tasks, you might often have to load, parse and execute other Python files from your code. While you can always "import" some Python code as a module and execute it, in many situations it is impossible to determine precisely at development time which Python files your code needs to import. Also, some Python scripts are written as simple executable files, which are not ideal for inclusion via import. To deal with cases such as these, Python provides several built-in features that allow referring to and executing other Python files.
One of the easiest ways to execute an external Python file is by using the built-in execfile function (available in Python 2; it was removed in Python 3). This function takes the path to another Python file as the only mandatory argument. Optionally, we can also provide a global and a local namespace. If provided, the external code will be executed within those namespace contexts. This is a great way to exert some control over how certain names mentioned in the external code will be resolved (more on this later).
execfile('/path/to/code.py')
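To supply custom namespaces as well (custom_globals and custom_locals being hypothetical dictionaries prepared by the caller):
execfile('/path/to/code.py', custom_globals, custom_locals)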
Another way to include some external code in your script is by using the built-in __import__ function. This is the same function that gets called when we use the usual "import" keyword to include some module. But unlike the keyword, the __import__ function gives you a lot more control over certain matters like namespaces.
Another way to run some external Python code from your Python script is to first read the external file contents into memory (as a string), and then use the exec keyword on it. The exec keyword can be used as a function call or as a statement.
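# load_file_content is not a built-in; it stands for any helper that reads
# the file into a string, e.g. open('/path/to/code.py').read()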
code_string = load_file_content('/path/to/code.py')
exec(code_string)
Similar to the execfile function, you have the option of passing custom global and local namespaces. Here's some code I've written for a project that uses the exec keyword:
globals_map = globals().copy()
globals_map['app'] = app
globals_map['assert_app_dependency'] = assert_app_dependency
globals_map['assert_not_app_dependency'] = assert_not_app_dependency
globals_map['assert_app_dependency_in_range'] = assert_app_dependency_in_range
globals_map['assert_true'] = assert_true
globals_map['assert_false'] = assert_false
globals_map['compare_versions'] = compare_versions
try:
    exec(self.source_code, globals_map, {})
except Exception as ex:
    utils.log('[{0}] Unexpected policy exception: {1}'.format(self.name, ex))
Here I first create a clone of the current global namespace, and pass it as an argument to the exec function. The clone is discarded at the end of the execution. This makes sure that the code in the external file does not pollute my existing global namespace. I also add some of my own variables and functions (e.g. assert_true, assert_false etc.) into the global namespace clone, which allows the external code to refer to them as if they were built-in constructs. In other words, the external script can be written in a slightly extended version of Python.
There are other neat little tricks you can do using constructs like exec and execfile. Go through the official documentation for more details.

Wednesday, May 14, 2014

Java Code Analysis and Optimization with Soot

This is a quick shout-out about the project Soot. If you're doing anything even remotely related to static analysis in Java, Soot is the way to go. It's simple, open source, well documented and extremely powerful. Soot can analyze any Java program (source or bytecode), and provide you with the control flow graph (CFG). Here's an example that shows how to construct the CFG for the main method of a class named MyClass.
SootClass c = Scene.v().loadClassAndSupport("MyClass");
c.setApplicationClass();
SootMethod m = c.getMethodByName("main");
Body b = m.retrieveActiveBody();
UnitGraph g = new BriefUnitGraph(b);
Once you get your hands on the CFG, you can walk it, search it and do anything else you would normally do with a graph data structure. 
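For instance, Soot's graph types are iterable, so walking the CFG and printing each unit with its successors takes only a few lines (a sketch; Unit comes from the soot package and List from java.util):
// Visit every unit (roughly, every statement) in the control flow graph
for (Unit u : g) {
    List<Unit> successors = g.getSuccsOf(u); // units that may execute right after u
    System.out.println(u + " -> " + successors);
}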
Soot converts Java code into one of four intermediate representations (Jimple, Baf, Shimple and Grimp). These representations are designed to make it easier to analyze programs written in Java. For example, Jimple maps Java code from its typical stack-based model to a three-address model. You can also make modifications/optimizations to the code and try out new ideas for compiler and runtime optimizations. Alternatively you can "tag" instructions with metadata, which can be helpful in building new development tools with powerful code visualization capabilities.
Soot also provides a set of APIs for performing data flow analysis. These APIs can help you to code anything from live variable analysis to very busy expression analysis and more. And finally, Soot can also be invoked from the command-line without having to write any extension code.
So if you have any cool new ideas related to program analysis or optimization, grab the latest version of Soot. Whatever it is that you're trying to do, I'm sure Soot can help you implement it.

Friday, June 14, 2013

More Reasons to Love Python - A Lesson on KISS

Recently I've been doing some work in the area of programming language design. At one point I wanted to define a Python subset which allows only the simplest Python statements, without loops, conditionals, functions, classes and a bunch of other high-level constructs. So I looked into the grammar specification of the Python language, and I was astonished by its simplicity and succinctness. Take a look at it for yourself. It's no longer than 125 lines of text, and the whole thing can be printed on one side of an A4 sheet. This is definitely one of those instances where the best design is also the simplest design. No wonder everybody loves Python.
However that's not the whole point. Having selected a suitable Python subset, I was looking into ways of implementing a simple parser for those grammar rules. I've done some work with JavaCC in the past, so I straightaway jumped into implementing a Java-based parser for the selected Python subset using JavaCC. After a few hours of coding I managed to get it working too. The next step of my project required me to do some analysis on the abstract syntax tree (AST) produced by the parser. I was looking around for some existing work that fit my requirements, and I came across Python's native ast module. I immediately realized that all those hours I spent implementing the JavaCC-based parser were a complete waste. The ast module provides excellent support for parsing Python code and constructing ASTs. This is all you have to do to parse some Python code using the ast module and obtain an AST representation of the code.
import ast

# The variable 'source' contains the Python statement to be parsed
source = 'x = y + z'
tree = ast.parse(source)
The ast module supports several modes. The default mode is exec, which supports parsing a sequence of Python statements. The module also supports a special eval mode which can be used to parse simple one-liner Python expressions. It turned out the eval mode supports more or less the same exact Python subset I wanted to use. So I threw away my JavaCC-based parser and wrote the following snippet of Python code to get my job done.
import ast

# The variable 'source' contains the Python expression to be parsed
# (eval mode accepts a single expression; a statement like 'x = y + z' won't parse)
source = 'y + z'
tree = ast.parse(source, mode='eval')
When it came to analyzing the AST produced by the parser, the ast module again turned out to be useful. The module provides two helper classes, namely NodeVisitor and NodeTransformer, which can be used to either traverse or transform a given Python AST. To use these helper classes, we just need to extend them and implement the appropriate visit methods. There's a single top-level visit method, and one visit_* method per AST node type (e.g. visit_Str, visit_Num, visit_BoolOp etc.). Here's an example NodeVisitor implementation that flattens a given Python AST into a list.
class NodeEnumerator(ast.NodeVisitor):
  def get_node_list(self, tree):
    self.nodes = []
    self.visit(tree)
    return self.nodes

  def visit(self, node):
    self.generic_visit(node)
    self.nodes.append(node)
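For example, applied to the tree parsed earlier:
# Flatten the AST into a list of nodes
nodes = NodeEnumerator().get_node_list(tree)
print(len(nodes))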
These helper classes can be used to do virtually anything with a given AST. If you want, you can even implement a Python interpreter in Python using this approach. In my case I'm running some search and isomorphism detection algorithms on the Python ASTs.
So once again I've been pleasantly surprised and deeply impressed by the simplicity and richness of Python. It looks like the designers of Python have thought of everything. Kudos to Python aside, this whole experience taught me to always look for existing, simple solutions before doing things my own complicated way. It actually reminds me of the good old KISS principle - "Keep It Simple, Stupid".

Friday, April 5, 2013

MDCC - Strong Consistency with Performance

A few weeks back, a couple of my colleagues and I finished developing a complete implementation of the MDCC (Multi-Data Center Consistency) protocol. MDCC is a fast commit protocol proposed by UC Berkeley for large-scale geo-replicated databases. The main advantage of MDCC is that it supports strong consistency for data while providing transaction performance similar to that of eventually consistent systems.
With traditional distributed commit protocols, supporting strong consistency usually requires executing complex distributed consensus algorithms (e.g. Paxos). Such algorithms generally require multiple rounds of communication. Therefore when deployed in a multi-data center setting, where the inter-data center latency is close to 100ms, the performance of the transactions being executed degrades to almost unacceptable levels. For this reason most replicated database systems and cloud data stores have opted to support a weaker notion of consistency. This greatly speeds up the transactions, but you always run the risk of data becoming inconsistent or even lost.
MDCC employs a special variant of Paxos called Fast Paxos. Fast Paxos takes a rather optimistic approach, by which it is able to commit most transactions within a single network round trip. This way a data object update can be replicated to any number of data centers within a single request-response window. The protocol is also effectively masterless, which means that if the application is executing in a data center in Europe, it does not have to contact a special master server which could potentially reside in a data center in the USA. The only time this protocol doesn't finish within a single request-response window is when two or more transactions attempt to update the same data object (a transaction conflict). In that case, a per-object master is elected and the classic Paxos protocol is invoked to resolve the conflict. If the possibility of a conflict is small, MDCC will commit most transactions within a single network round trip, thus greatly improving transaction throughput and latency.
Unlike most replicated database systems, MDCC doesn't require explicit sharding of data into multiple segments, although sharding can be supported on top of MDCC if needed. Also, unlike most cloud data stores, MDCC has excellent support for atomic multi-row (multi-object) transactions. That is, multiple data objects can be updated atomically within a single read-write transaction. All these interesting properties make MDCC an excellent choice for implementing powerful database engines for modern distributed and cloud computing environments.
Our implementation of MDCC is based on Java. We use Apache Thrift as the communication framework between different components. ZooKeeper is used for leader election purposes (we need to elect a per-object leader whenever there is a conflict). HBase server is used as the storage engine. All the application data and metadata are stored in HBase. In order to reduce the number of storage accesses, we also have a layer of in-memory caching. All the critical information and updates are written through to the underlying HBase server to maintain strong consistency. The cache still helps to avoid a large fraction of storage references. Our experiments show that most read operations are able to complete without ever going to the HBase layer.
We provide a simple and intuitive API in our MDCC implementation so that users can write their own applications using our MDCC engine. A simple transaction implemented using this API would look like this:
        TransactionFactory factory = new TransactionFactory();
        Transaction txn = factory.create();
        try {
            txn.begin();
            byte[] foo = txn.read("foo");
            txn.write("bar", "bar".getBytes());
            txn.commit();
        } catch (TransactionException e){
            reportError(e);
        } finally {
            factory.close();
        }
We also did some basic performance tests on our MDCC implementation using the YCSB benchmark. We used 5 EC2 micro instances distributed across 3 data centers (regions) and deployed a simple 2-shard MDCC cluster. Each shard consisted of 5 MDCC storage nodes (amounting to a total of 10 MDCC storage nodes). We ran several different types of workloads on this cluster and in general succeeded in achieving < 1ms latency for read operations and < 100ms latency for write operations. Our implementation performs best with mostly-read workloads, but even with a fairly large number of conflicts, the system delivers reasonable performance. 
Our system ensures correct and consistent transaction semantics. We have excellent support for atomic multi-row transactions, concurrent transactions, and even some rudimentary support for crash recovery. If you are interested in giving this implementation a try, grab the source code from https://github.com/hiranya911/mdcc. Use Maven 3 to build the distribution, then extract and run.

Monday, March 11, 2013

Starting HBase Server Programmatically

I'm implementing a database application these days, and for that I wanted to programmatically start and stop a standalone HBase server. More specifically, I wanted to make the HBase server a part of my application, so that whenever my application starts, the HBase server also starts up. This turned out to be more difficult than I thought it would be. To start an HBase server you actually need to start three things:
1. HBase master server
2. HBase region server
3. ZooKeeper
The default startup script shipped with the HBase binary distribution does all this for you. But I wanted a more tightly integrated and fully programmatic solution. Unfortunately the HBase public API doesn't seem to expose the functionality required for programmatically starting and stopping the above components (at least not in a straightforward manner). So after going through the HBase source and trying out various things, I managed to come up with some code that does exactly what I want. At a high level, this is what my code does:
1. Create an instance of HQuorumPeer and execute it on a separate thread.
2. Create and initialize an HBaseConfiguration instance.
3. Create an instance of HMaster and execute it on a separate thread.
4. Create an instance of HRegionServer and execute it on a separate thread.
Both HMaster and HRegionServer implement the Runnable interface, so it's easy to run them on separate threads. I created a simple Java Executor instance and scheduled HMaster and HRegionServer for execution on it. But HQuorumPeer was a bit tricky. This class only contains a main method, and has no public API to speak of. So one solution is to create your own thread class which simply invokes the above mentioned main method. The other option is to write your own HQuorumPeer class implementing the Runnable interface. The original HQuorumPeer class from the HBase project is fairly small and contains only a small amount of code, so I took the second approach: I simply copied the code from the original HQuorumPeer and created my own HQuorumPeer implementing the Runnable interface. Overall this is what my finalized code looks like:
        
        exec.submit(new HQuorumPeer(properties));
        log.info("HBase ZooKeeper server started");
        
        Configuration config = HBaseConfiguration.create();
        File hbaseDir = new File(hbasePath, "data");
        config.set(HConstants.HBASE_DIR, hbaseDir.getAbsolutePath());
        for (String key : properties.stringPropertyNames()) {
            if (key.startsWith("hbase.")) {
                config.set(key, properties.getProperty(key));
            } else {
                String name = HConstants.ZK_CFG_PROPERTY_PREFIX + key;
                config.set(name, properties.getProperty(key));
            }
        }

        try {
            master = new HMaster(config);
            regionServer = new HRegionServer(config);
            masterFuture = exec.submit(master);
            regionServerFuture = exec.submit(regionServer);
            log.info("HBase server is up and running...");
        } catch (Exception e) {
            handleException("Error while initializing HBase server", e);
        }
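As an aside, the first approach mentioned above (a thread class that simply invokes the existing main method) would be a sketch along these lines (package name as in the HBase releases of that era):
private static class QuorumPeerRunner implements Runnable {
    @Override
    public void run() {
        // Delegate to the original entry point of HQuorumPeer
        org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(new String[0]);
    }
}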
Then I nicely wrapped up all this logic into a single reusable util class called HBaseServer. So whenever I want to start/stop HBase in my application, this is all I have to do.
HBaseServer hbaseServer = new HBaseServer();
hbaseServer.start();
Hope somebody finds this useful :)

Wednesday, January 9, 2013

How to Get Your Third Party APIs to Shut Up?

When programming with third-party libraries, sometimes we need to suppress or redirect the standard output generated by those libraries. A very common scenario is that a third-party library used in an application generates very verbose output, which clutters up the output of our program. With most programming languages we can write a simple suppress/redirect procedure to fix this problem. Such functions are sometimes colloquially known as STFU functions. Here I'm describing a couple of STFU functions I implemented in some of my recent work.

1. AppsCake (Web interface for AppScale-Tools)
This is a Ruby based dynamic web component which uses some of the core AppScale-Tools libraries. For this project I wanted to capture the standard output of the AppScale-Tools libraries and display it on a web page. As the first step I wanted to redirect the standard output of AppScale-Tools to a separate text file. Here's what I did.
def redirect_standard_io(timestamp)
  begin
    orig_stderr = $stderr.clone
    orig_stdout = $stdout.clone
    log_path = File.join(File.expand_path(File.dirname(__FILE__)), "..", "logs")
    $stderr.reopen File.new(File.join(log_path, "deploy-#{timestamp}.log"), "w")
    $stderr.sync = true
    $stdout.reopen File.new(File.join(log_path, "deploy-#{timestamp}.log"), "w")
    $stdout.sync = true
    retval = yield
  rescue Exception => e
    puts "[__ERROR__] Runtime error in deployment process: #{e.message}"
    $stdout.reopen orig_stdout
    $stderr.reopen orig_stderr
    raise e
  ensure
    $stdout.reopen orig_stdout
    $stderr.reopen orig_stderr
  end
  retval
end
Now whenever I want to redirect the standard output and invoke the AppScale-Tools API I can do this.
redirect_standard_io(timestamp) do
   # Call AppScale-Tools API
end
2. Hawkeye (API fidelity test suite for AppScale)
This is a Python based framework which makes a lot of RESTful invocations using the standard Python httplib API. I wanted to trace the HTTP requests and responses that are being exchanged during the execution of the framework, and log them to a separate log file. Python httplib has a verbose mode, which can be enabled by passing a special flag to the HTTPConnection class, and it turns out this mode logs almost all the information I need. But unfortunately it logs all this information to the standard output of the program, thus messing up the output I wanted to present to users. Therefore I needed a way to redirect the standard output for all httplib API calls. Here's how that problem was solved.
http_log = open('logs/http.log', 'a')
original = sys.stdout
sys.stdout = http_log
try:
  # Invoke httplib
finally:
  sys.stdout = original
  http_log.close()

Sunday, July 22, 2012

Handling I/O in Java

Handling input/output or I/O is one of the most common situations that programmers have to deal with. If you are writing a real world application, then you have to write a considerable amount of code to handle I/O regardless of the programming language or platform you’re going to use. Interestingly I/O plays a major role even in some of the simplest programs we can write. Even to write a standard ‘Hello World’ program in a language like Java or C, you need to know how to output characters to a console.
However, most developers often tend to ignore the importance and significance of I/O when writing code. Developers generally have an API-level understanding of how to perform I/O operations using their mainstream programming language. But they do not possess an in-depth understanding of how I/O works in the underlying system, or how it can affect the performance and stability of the programs they write. Java developers in particular believe that knowing how to use the standard I/O API of Java is sufficient to write applications of good quality. Their lack of understanding of the limitations and performance bottlenecks of the standard I/O API is astonishing. Most Java developers don't have a firm grip on I/O coding best practices, the third-party I/O libraries available, or relatively new concepts like NIO.
Last week I gave a talk at JAVA Colombo, the Sri Lankan JUG, trying to address some of the above issues. I started by giving a brief overview of I/O and I/O APIs in Java. I also introduced the NIO API, and gave a short live demonstration comparing its performance to the standard I/O API of Java. Finally, I discussed some of the best practices, patterns and anti-patterns related to writing I/O code in Java. The full slide deck is now available online. Feel free to go through it and send in your feedback.

Thursday, November 4, 2010

Taming the Java Garbage Collector

Tuning garbage collection in the JVM is one of those things that developers often tend to ignore and overlook. But if done properly, it can save you hundreds of megabytes' worth of precious memory, without making a significant impact on application performance. GC tuning also becomes an absolute necessity in certain applications due to various QoS requirements such as real-time processing, low response time and high throughput. So today I'm going to discuss a little bit about GC in the JVM, and how to properly tune GC for the best application performance. My discussion is entirely focused on Java 5 and 6, so if you are using an older JVM you are in the wrong place.
Before we jump into the discussion on GC tuning we need to have a basic understanding on the various concepts associated with GC. So here goes...
What is garbage collection?
Garbage collection is the process of finding objects which are no longer reachable from references in the executing code, and reclaiming the memory occupied by such objects. An object which is reachable by at least one reference in the executing code is considered 'live'. Objects which cannot be reached by any references are called 'dead' objects. So the process of garbage collection can also be defined as the process of finding dead objects in memory and reclaiming the memory used by them. In general, a garbage collector is responsible for 3 tasks:
  1. Allocating memory for new objects
  2. Ensuring that any referenced objects (live objects) remain in memory
  3. Recovering memory used by objects that are no longer reachable (dead objects)
The following characteristics govern the notion of a 'good' garbage collector:
  • Safe (never reclaims a live object)
  • Comprehensive (a dead object does not remain unreclaimed for more than a small number of collection cycles)
  • Efficient (does not introduce overly long pauses)
  • Defragmenting (does not leave memory fragments all over the place)
  • Scalable (allocation and deallocation scale well in multithreaded systems)
It is neither possible nor necessary for a garbage collector to exhibit all these desirable characteristics at once. Usually each garbage collector has its own strengths and weaknesses.
GC Classification
Garbage collectors can be classified based on a number of factors.
Serial vs Parallel:
In serial collection, a single thread performs the collection work (even when multiple CPU cores are available). In parallel collection, the work is divided into subtasks that execute in parallel, possibly on multiple CPUs. This speeds up collection, at the cost of added complexity and potential fragmentation.
Stop-the-world vs Concurrent:
Stop-the-world collectors suspend the entire application during collections. Concurrent collectors run concurrently with the application (there could still be occasional stop-the-world pauses). With concurrent collection, freeze times are shorter, but the collector has to operate on objects that are being used by the running application. This adds more overhead on the collector, and requires more CPU power and a larger heap.
Compacting vs Non-compacting vs Copying:
Compacting collectors arrange all the live objects together in contiguous memory blocks, so the remaining space can be considered free. Collections are slower this way, but allocations are faster. Non-compacting collectors free dead objects in-place. This leads to faster collections, but is also a recipe for fragmentation. Copying collectors copy (in contrast to move) all the live objects to a different area in memory, after which the source area can be considered free. This leads to slower and more expensive collections, but provides better allocation performance.
Generational Collection
Modern garbage collectors follow a scheme known as 'generational collection'. With this approach, the heap memory is divided into two or more regions known as generations. A generation is a block of memory which contains objects of a certain age. In many implementations there are two generations, one for young (new and reasonably new) objects and one for old objects. The young generation is usually much smaller than the old generation. Generational collection allows employing different GC algorithms in different generations, which enables selecting a suitable algorithm based on the maturity of the objects.
Generational collection makes use of an interesting characteristic of applications, often referred to as the "generational hypothesis". This hypothesis states:
  • Most allocated objects are not referenced for long (they die young - sometimes also stated as infant mortality)
  • Few references from older objects to younger objects exist
Based on this hypothesis, modern garbage collectors run collections on the young generation frequently. This is fast and efficient, because the young generation is small and likely to contain many dead objects. Objects that survive some number of young generation collections are moved (tenured) to the old generation. Because the old generation is much larger, its occupancy grows very slowly, so old generation collections are infrequent. But when they do occur, they tend to take much longer to complete.
In new HotSpot JVMs, the garbage collector divides the heap into 3 generations:
  • Young generation - contains newly allocated objects
  • Old generation - objects that have survived some number of young gen collections, plus some very large objects that were directly allocated in the old gen
  • Permanent generation (perm gen) - contains classes, methods and associated descriptors which are managed by the JVM
The young generation is further divided into 3 regions. The largest division is known as Eden. This is where almost all new object allocations take place (under special circumstances, large objects may get allocated in the old generation). The other, smaller spaces are known as survivor spaces. One of the survivor spaces is always kept empty until the next young generation collection.
When the old generation fills up, a full collection (major collection) is performed. All generations are collected during a full collection. First the young generation is collected using the young generation collection algorithm. Then the old generation collection algorithm is run on the old generation and the permanent generation. If compaction occurs, each generation is compacted separately. During a full collection, if the old generation is too full to accept tenured objects from the young generation, the old generation collection algorithm is run on the entire heap (except with the CMS collector).
Available Garbage Collectors
HotSpot JVM contains 3 garbage collectors:
  1. Serial collector
  2. Parallel collector
  3. Concurrent mark-sweep collector
In addition to that there is a special enhanced version of the parallel collector known as the parallel compacting collector. Let's see how each of these collectors work.
Serial Collector (Mark-Sweep-Compact collector)
This is the collector used by default in the Java HotSpot client JVM. It is a serial, stop-the-world, copying collector. Because it is serial and operates in stop-the-world mode, it is not a very efficient collector.
Young generation collection:
  • Live objects in Eden are copied to the empty survivor space (objects that won't fit are directly copied to the old generation)
  • Live objects in the occupied survivor space are copied to the empty survivor space (relatively old objects are copied to old generation)
  • If the free survivor space becomes full during the process, the remaining live objects in Eden and the occupied survivor space are tenured
  • At this point Eden and the occupied survivor space contain only dead objects and so can be considered empty - the previously free survivor space now contains the live objects
  • The two survivor spaces switch roles
Old/Permanent generation collection:
  • A two phase Mark-and-sweep algorithm is used to clean up dead objects
  • After that, a sliding compaction (live objects are slid toward the beginning of the generation) is performed to compact the generations
Parallel Collector (Throughput Collector)
This is very similar to the serial collector in many ways. In fact, the only notable difference is that the parallel collector uses multiple threads to perform the young generation collection. Other than that, it uses the same algorithms as the serial collector. The number of threads used for collection is equal to the number of CPUs available. Because of the parallel collection feature, this collector usually results in much shorter pauses and higher application throughput. However, note that the old generation collection is still carried out using a single thread, in serial fashion. This is the default collector in the Java HotSpot server JVM.
Parallel Compacting Collector
This is an enhanced version of the parallel collector. It uses multiple threads to perform the old generation collection as well. This collection divides each generation into regions and operates on individual regions in parallel. The algorithm used for old generation collection is also slightly different from what's used in the serial and parallel collectors.
Concurrent Mark-Sweep Collector (CMS Collector)
While the parallel collectors give prominence to application throughput, this collector gives prominence to low response time. It uses the same young generation collection algorithm as the parallel collectors, but the old generation collection is performed concurrently with the application instead of in stop-the-world mode (at least most of the time). A collection cycle starts with a short pause known as the initial mark, which identifies the initial set of live objects directly reachable from the application code. Then, during the concurrent marking phase, the collector marks all live objects transitively reachable from the initially marked set. Because this happens concurrently with the running application, not all live objects get marked. To handle this, the application stops again for a second pause, the remark phase, which is often run using multiple threads for efficiency. After this marking process, a concurrent sweep phase is initiated.
The CMS collector is not a compacting collector, so it uses a set of free lists for allocation, which makes the allocation overhead higher. The CMS collector is also best suited for large heaps: because collection happens concurrently, the old generation will continue to grow even during collection, so the heap should be large enough to accommodate that growth. Another issue with CMS is floating garbage; that is, objects considered live may become garbage toward the end of the collection cycle. These will not get cleaned up immediately, but will definitely be cleaned up during the next collection cycle. The CMS collector requires a lot of CPU power as well.
Unlike other collectors, the CMS collector does not wait until the old generation becomes full to start a collection. Instead, it starts collecting early, so it can avoid the old generation filling up to capacity. If the old generation does fill up before CMS kicks in, the collector resorts to the serial stop-the-world collection mode used by the serial and parallel collectors. To avoid this, CMS uses statistics regarding previous collection times and the time taken to fill up the old generation. CMS also kicks in if the old generation occupancy exceeds a predefined threshold known as the initiating occupancy fraction. This is a configurable parameter which defaults to 68%.
There is a special mode of operation for the CMS collector known as the incremental mode. In incremental mode, concurrent collection cycles are broken down into smaller chunks, so during a concurrent collection cycle the collector will periodically suspend itself to give the full CPU to the running application. This in turn reduces the impact of long concurrent collection phases. This mode is particularly useful when the number of CPUs is small.
Selecting the Appropriate Collector
Most of the time the JVM is smart enough to select the right collector for the situation by analyzing the system configuration (see the next section). But since the JVM has no knowledge of the application requirements, sometimes the collector it chooses will not be good enough. When it comes to manually selecting a collector for an application, there are no hard and fast rules that say a particular collector is suitable for a given scenario; there are only general guidelines and recommendations. Most of the time they will work, but there could be exceptions. Hope you find them useful:
  • Serial collector is the collector of choice for most applications that run on client-style machines and do not have low pause requirements. On modern hardware this collector can handle a wide range of applications with heaps as small as 64MB.
  • If the application has a small data set select the serial collector.
  • If the application has no low pause requirements and runs on a machine with a single CPU select the serial collector.
  • Parallel collector is best for applications that run on machines with multiple CPUs and do not have low pause requirements.
  • If application throughput is the main consideration and slightly longer pauses are acceptable select the parallel collector.
  • The parallel compacting collector can be used in almost any scenario where the parallel collector seems suitable. In fact the parallel compacting collector is almost always preferred over the parallel collector. However with this collector a single application can hog the CPU for an extended period of time. Therefore it's not very suitable in cases where multiple JVMs reside on a single large machine.
  • The CMS collector should be the collector of choice whenever there is a requirement for low pauses and low response time. However, this collector uses a lot of CPU resources, so its impact on CPU usage must be carefully evaluated. Usually applications that have large sets of long-lived data (a large old generation), and which run on machines with multiple CPUs, tend to benefit from this collector. Web servers and most interactive applications often fall into the category of CMS beneficiaries.
  • If it is needed to run an application with low pause requirements on a machine which doesn't have too many CPU resources, consider using the CMS incremental mode.
Note: Command line options needed to enable different collectors are given at the end of the post
Ergonomics
Modern JVMs have the ability to select a suitable garbage collector, heap size and JVM type by looking at the host platform and the OS. In addition, a way of dynamically tuning memory management has been introduced to the parallel collectors. This way, a user can specify the desired behavior, and the collector dynamically tunes the sizes of the heap regions in an attempt to achieve the requested behavior. The combination of platform-dependent default selections and dynamic GC tuning is referred to as ergonomics.
The ergonomics default selections are done based on the 'class' of the machine. A machine is considered to be server-class if it has:
  • 2 or more processors
  • 2 GB or more physical memory
This applies to all platforms except for 32-bit machines running Windows.
If the machine is classified as non server-class, ergonomics will choose the following settings:
  • HotSpot client JVM
  • serial collector
  • initial heap size of 4 MB
  • maximum heap size of 64 MB
For server-class machines it is slightly more complicated. However, in general it is something like:
  • HotSpot server JVM
  • parallel collector
  • initial heap size of 1/64th of physical memory
  • maximum heap size of 1/4th of physical memory, up to a limit of 1 GB
One can run the HotSpot client JVM on a server-class machine by explicitly passing the -client command line option. In that case, ergonomics will select the serial collector. Also, when the collector selected by ergonomics is not the parallel collector, the initial heap size and maximum heap size will be 4MB and 64MB respectively.
Sizing the Heap
There are lots of options available when it comes to sizing the heap, its regions and generations. Most of the time we don't have to meddle with these settings, but there could be exceptional situations where we have to tune them to obtain optimal performance and avoid out-of-memory errors. The most common settings related to heap size are the initial heap size and maximum heap size, which are set by the -Xms and -Xmx command line options respectively. If you set the initial heap size lower than the maximum heap size, the JVM will start with a heap of the initial size and then grow the heap as and when required. If you set both parameters to the same value, the JVM will start with the specified maximum heap size.
Another important parameter is the NewRatio value, which governs the ratio between the old generation size and the young generation size. On most server-class systems this defaults to 8, meaning the old generation is 8 times larger than the young generation. If the application tends to do a lot of new allocations, then reducing this ratio to a reasonable value may not be a bad idea. Reducing this ratio will generally result in fewer minor collections. The size of the young generation can be further controlled by the NewSize and MaxNewSize options.
We can get the JVM to dynamically grow or shrink the generations based on how they are utilized by setting the MinHeapFreeRatio and MaxHeapFreeRatio parameters. These parameters impose restrictions on the amount of free heap space in each generation. If the free space percentage in a given generation is about to drop below MinHeapFreeRatio, the JVM will expand the generation to meet the lower limit. Similarly, generations will be contracted if the free space percentage crosses MaxHeapFreeRatio.
Command Line Options
Collector selection
  • Serial collector: -XX:+UseSerialGC
  • Parallel collector: -XX:+UseParallelGC
  • Parallel compacting collector: -XX:+UseParallelOldGC (combine with -XX:+UseParallelGC)
  • CMS collector: -XX:+UseConcMarkSweepGC
Parallel collector settings
  • Parallel GC thread count: -XX:ParallelGCThreads=n
  • Desired maximum pause length: -XX:MaxGCPauseMillis=n
  • Throughput (percentage of CPU time spent on application - defaults to 99): -XX:GCTimeRatio=n
CMS collector settings
  • Enable incremental mode: -XX:+CMSIncrementalMode
  • Parallel GC thread count: -XX:ParallelGCThreads=n
  • Old gen occupancy threshold that triggers collections: -XX:CMSInitiatingOccupancyFraction=n
Heap sizing options
  • Initial size: -Xms
  • Maximum size: -Xmx
  • Initial size of the new generation: -XX:NewSize=n
  • Maximum size of the perm gen space: -XX:MaxPermSize=n
  • Ratio between old and new generation sizes: -XX:NewRatio=n
Debug options
  • Print basic GC info: -XX:+PrintGC
  • Print verbose GC info: -XX:+PrintGCDetails
  • Print details with time: -XX:+PrintGCTimeStamps
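As an example, putting a few of these options together, an application could be launched with the CMS collector, a fixed 512 MB heap and verbose GC logging as follows (MyApp is a placeholder for the application's main class):
java -XX:+UseConcMarkSweepGC -Xms512m -Xmx512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyApp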