Wednesday, December 3, 2014

Controlled Concurrency with Golang

Lately I have been doing a lot of programming in Golang. It is one of those languages which is somewhat difficult to fully grasp at the beginning. But a few hundred lines of code later, you feel like you cannot get enough of it -- very simple syntax, brilliant performance and very clean and precise API semantics. This language has got it all.
Concurrent programming is one area where Golang really excels at. The goroutines that make it trivial to start concurrent threads of execution, channels as a first-class programming construct and a plethora of built-in utilities and packages (e.g. sync) make the developer's life a lot easier. In this post I'm going to give a brief overview on how to instantiate new threads of execution in Golang. Lets start with a piece of sequential code:
for i := 0; i < len(array); i++ {
  doSomething(array[i])
}
Above code iterates over an array, and calls the function doSomething for each element in the array. But this code is sequential, which means doSomething(n) won't be called until doSomething(n-1) returns. Suppose you want to speed things up a little bit by running multiple invocations of doSomething in parallel (assuming it is safe to do so -- both control and data wise). In Golang this is all you have to do:
for i := 0; i < len(array); i++ {
  go doSomething(array[i])
}
The go keyword will start the doSomething function as a separate concurrent goroutine. But this code change causes an uncontrolled concurrency situation. In other words, the only thing that's limiting the number of parallel goroutines spawned by the program is the length of the array, which is not a good idea if the array has thousands of entries. Ideally, we need to put some kind of a fixed cap on how many goroutines are spawned by the loop. This can be easily achieved by using a channel with a fixed capacity.
c := make(chan bool, 8)
for i := 0; i < len(array); i++ {
  c <- true
  go func(index int){
    doSomething(index)
    <- c
  }(i)
}
We start by creating a channel that can hold at most 8 boolean values. Then inside the loop, whenever we spawn a goroutine, we first send a boolean value (true) into the channel. This operation will get blocked if the channel is already full (i.e it has 8 elements). Then in the goroutine, we remove an element from the channel before we return. This little trick makes sure that at most 8 parallel goroutines will be active in the program at any given time. If you need to change this limit, you simply have to change the max capacity of the channel. You can set this to a fixed number, or write some code to figure out the optimal value based on the number of CPU cores available in the system.

Saturday, November 22, 2014

Running Python from Python

It has been pointed out to me that I don't blog as often as I used to. So here's a first step towards rectifying that.
In this post, I'm going to briefly describe the support that Python provides for processing, well, "Python". If you're using Python for simple scripting and automation tasks, you might often have to load, parse and execute other Python files from your code. While you can always "import" some Python code as a module, and execute it, in many situations it is impossible to determine precisely at the development time, which Python files your code needs to import. Also some Python scripts are written as simple executable files, which are not ideal for inclusion via import. To deal with cases such as these, Python provides several built-in features that allow referring to and executing other Python files.
One of the easiest ways to execute an external Python file is by using the built-in execfile function. This function takes the path to another Python file as the only mandatory argument. Optionally, we can also provide a global and a local namespace. If provided, the external code will be executed within those namespace contexts. This is a great way to exert some control over how certain names mentioned in the external code will be resolved (more on this later).
execfile('/path/to/code.py')
Another way to include some external code in your script is by using the built-in __import__ function. This is the same function that gets called when we use the usual "import" keyword to include some module. But unlike the keyword, the __import__ function gives you lot more control over certain matters like namespaces.
Another way to run some external Python code from your Python script is to first read the external file contents into memory (as a string), and then use the exec keyword on it. The exec keyword can be used as a function call or as keyword statement.
code_string = load_file_content('/path/to/code.py')
exec(code_string)
Similar to the execfile function, you have the option of passing custom global and local namespaces. Here's some code I've written for a project that uses the exec keyword:
globals_map = globals().copy()
globals_map['app'] = app
globals_map['assert_app_dependency'] = assert_app_dependency
globals_map['assert_not_app_dependency'] = assert_not_app_dependency
globals_map['assert_app_dependency_in_range'] = assert_app_dependency_in_range
globals_map['assert_true'] = assert_true
globals_map['assert_false'] = assert_false
globals_map['compare_versions'] = compare_versions
try:
    exec(self.source_code, globals_map, {})
except Exception as ex:
    utils.log('[{0}] Unexpected policy exception: {1}'.format(self.name, ex))
Here I first create a clone of the current global namespace, and pass it as an argument to the exec function. The clone is discarded at the end of the execution. This makes sure that the code in the external file does not pollute my existing global namespace. I also add some of my own variables and functions (e.g assert_true, assert_false etc.) into the global namespace clone, which allows the external code to refer to them as built-in constructs. In other words, the external script can be written in a slightly extended version of Python.
There are other neat little tricks you can do using the constructs like exec and execfile. Go through the official documentation for more details.

Wednesday, May 14, 2014

Java Code Analysis and Optimization with Soot

This is a quick shout out about the project Soot. If you're doing anything even remotely related to static analysis in Java, Soot is the way to go. It's simple, open source, well documented and extremely powerful. Soot can analyze any Java program (source or bytecode), and provide you with the control flow graph (CFG). Here's an example that shows how to construct the CFG for the main method of a class named MyClass.
SootClass c = Scene.v().loadClassAndSupport("MyClass");
c.setApplicationClass();
SootMethod m = c.getMethodByName("main");
Body b = m.retrieveActiveBody();
UnitGraph g = new BriefUnitGraph(b);
Once you get your hands on the CFG, you can walk it, search it and do anything else you would normally do with a graph data structure. 
Soot converts Java code into one of four intermediate representations (Jimple, Baf, Shimple and Grimp). These representations are designed to make it easier to analyze programs written in Java. For example, Jimple maps Java code from its typical stack-based model to a three-registers-based model. You can also make modifications/optimizations to the code and try out new ideas for compiler and runtime optimizations. Alternatively you can "tag" instructions with metadata which can be helpful in building new development tools with powerful code visualization capabilities.
Soot also provides a set of APIs for performing data flow analysis. These APIs can help you to code anything from live variable analysis to very busy expression analysis and more. And finally, Soot can also be invoked from the command-line without having to write any extension code.
So if you have any cool new ideas related to program analysis or optimization, grab the latest version of Soot. Whatever it is that you're trying to do, I'm sure Soot can help you implement it.

Thursday, January 2, 2014

Calling WSO2 Admin Services in Python

I’m using some WSO2 middleware for my ongoing research, and recently I had the requirement of calling some admin services from Python 2.7. All WSO2 products expose a number of special administrative web services (admin services), using which the WSO2 server instances can be controlled, configured and monitored. In fact, all the web-based UI components that ship with WSO2 middleware make use of these admin services under the hood to manage the server runtime.
WSO2 admin services are SOAP services (based on Apache Axis2), and are secured using HTTP basic authentication. All admin services expose a WSDL document using which client applications can be written or generated to consume the admin services. In this post I’m going to summarize how to implement a simple Python client to consume the WSO2 admin services.
We will be writing our Python client using the Suds SOAP library for Python. Suds is simple, lightweight and extremely easy to use. As the first step, we should install Suds. Depending on the Python package manager you wish to use, one of the following commands should do the trick (tested on OS X and Ubuntu):
sudo easy_install suds
sudo pip install suds
Next we need to instruct the target WSO2 server product to expose the admin service WSDLs. By default these WSDLs are hidden. To unhide them, open up the repository/conf/carbon.xml file of the WSO2 product, and set the value of HideAdminServiceWSDLs parameter to false:
<HideAdminServiceWSDLs>false</HideAdminServiceWSDLs>
Now restart the WSO2 server, and you should be able to access the admin service WSDLs using a web browser. For example, to access the WSDL of the UserAdmin service, point your browser to http://localhost:9443/services/UserAdmin?wsdl
Now we can go ahead and write the Python code to consume any of the available admin services. Here’s a working sample that consumes the UserAdmin service. This simply prints out a list of roles defined in the WSO2 User Management component:
from suds.client import Client
from suds.transport.http import HttpAuthenticated
import logging

if __name__ == '__main__':
    #logging.basicConfig(level=logging.INFO)
    #logging.getLogger('suds.client').setLevel(logging.DEBUG)

    t = HttpAuthenticated(username='admin', password='admin')
    client = Client('https://localhost:9443/services/UserAdmin?wsdl', location='https://localhost:9443/services/UserAdmin', transport=t)
    print client.service.getAllRolesNames()
That’s pretty much it. I have tested this approach with several WSO2 admin services, and they all seem to work without any issues. If you need to debug something, uncomment the two commented out lines in the above example. That will print all the SOAP messages and the HTTP headers that are being exchanged.
I also tried to write a client using the popular SOAPy library, but unfortunately couldn’t get it to work due to several bugs in SOAPy. SOAPy was incapable of retrieving the admin service WSDLs over HTTPS. This can be worked around by using the HTTP URL for the WSDL, but in that case SOAPy failed to generate the correct request messages to call the admin services. Basically, the namespaces of the generated SOAP messages were messed up. But with Suds I didn’t run into any issues.