Wednesday, December 3, 2014

Controlled Concurrency with Golang

Lately I have been doing a lot of programming in Golang. It is one of those languages which is somewhat difficult to fully grasp at the beginning. But a few hundred lines of code later, you feel like you cannot get enough of it -- very simple syntax, brilliant performance and very clean and precise API semantics. This language has got it all.
Concurrent programming is one area where Golang really excels. Goroutines make it trivial to start concurrent threads of execution, and channels as a first-class programming construct, together with a plethora of built-in utilities and packages (e.g. sync), make the developer's life a lot easier. In this post I'm going to give a brief overview of how to start new threads of execution in Golang. Let's start with a piece of sequential code:
for i := 0; i < len(array); i++ {
  doSomething(array[i])
}
The code above iterates over an array, and calls the function doSomething for each element in the array. But this code is sequential, which means doSomething(n) won't be called until doSomething(n-1) returns. Suppose you want to speed things up a little bit by running multiple invocations of doSomething in parallel (assuming it is safe to do so, both control-wise and data-wise). In Golang this is all you have to do:
for i := 0; i < len(array); i++ {
  go doSomething(array[i])
}
The go keyword will start the doSomething function as a separate concurrent goroutine. But this code change causes an uncontrolled concurrency situation. In other words, the only thing that's limiting the number of parallel goroutines spawned by the program is the length of the array, which is not a good idea if the array has thousands of entries. Ideally, we need to put some kind of a fixed cap on how many goroutines are spawned by the loop. This can be easily achieved by using a channel with a fixed capacity.
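c := make(chan bool, 8)
for i := 0; i < len(array); i++ {
  c <- true // blocks while 8 goroutines are already running
  go func(index int) {
    doSomething(array[index])
    <-c // free a slot just before the goroutine returns
  }(i)
}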
We start by creating a channel that can hold at most 8 boolean values. Then inside the loop, whenever we spawn a goroutine, we first send a boolean value (true) into the channel. This operation will block if the channel is already full (i.e. it has 8 elements). Then in the goroutine, we remove an element from the channel before we return. This little trick makes sure that at most 8 parallel goroutines will be active in the program at any given time. Note that it only limits concurrency: if the program must also wait for the last goroutines to finish, it still needs something like a sync.WaitGroup. If you need to change the limit, you simply have to change the max capacity of the channel. You can set this to a fixed number, or write some code to figure out the optimal value based on the number of CPU cores available in the system.

Saturday, November 22, 2014

Running Python from Python

It has been pointed out to me that I don't blog as often as I used to. So here's a first step towards rectifying that.
In this post, I'm going to briefly describe the support that Python provides for processing, well, "Python". If you're using Python for simple scripting and automation tasks, you might often have to load, parse and execute other Python files from your code. While you can always "import" some Python code as a module and execute it, in many situations it is impossible to determine at development time precisely which Python files your code needs to import. Also, some Python scripts are written as simple executable files, which are not ideal for inclusion via import. To deal with cases such as these, Python provides several built-in features that allow referring to and executing other Python files.
One of the easiest ways to execute an external Python file is by using the built-in execfile function. This function takes the path to another Python file as the only mandatory argument. Optionally, we can also provide a global and a local namespace. If provided, the external code will be executed within those namespace contexts. This is a great way to exert some control over how certain names mentioned in the external code will be resolved (more on this later).
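As an illustration, here is a minimal sketch of the same idea written with compile and exec, which also runs on Python 3 (where the execfile built-in is no longer available). The helper name run_python_file, the throwaway script and the variable names are all made up for this example.

```python
import os
import tempfile


def run_python_file(path, globals_map=None, locals_map=None):
    """Execute the Python file at `path`, roughly like execfile (hypothetical helper)."""
    if globals_map is None:
        globals_map = {'__name__': '__main__'}
    with open(path) as source_file:
        # Passing the path to compile() keeps file names accurate in tracebacks.
        code = compile(source_file.read(), path, 'exec')
    if locals_map is None:
        exec(code, globals_map)
    else:
        exec(code, globals_map, locals_map)
    return globals_map


# Write a tiny throwaway script so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as script:
    script.write("greeting = 'hello ' + name\n")
    script_path = script.name

try:
    # The external code resolves `name` from the namespace we hand in.
    namespace = run_python_file(script_path, {'name': 'world'})
finally:
    os.remove(script_path)

print(namespace['greeting'])  # prints: hello world
```

The external script sees only the names placed in the namespace it is given, and whatever it defines ends up in that same dictionary rather than in the caller's globals.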
Another way to include some external code in your script is by using the built-in __import__ function. This is the same function that gets called when we use the usual "import" keyword to include some module. But unlike the keyword, the __import__ function gives you a lot more control over certain matters like namespaces.
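A small sketch of __import__ in action, using standard library modules as stand-ins for modules whose names are only known at runtime (the choice of json and logging.handlers is just for illustration):

```python
# The module name could come from a config file or the command line.
module_name = 'json'
json_module = __import__(module_name)
print(json_module.dumps({'loaded': module_name}))  # prints: {"loaded": "json"}

# For a dotted name, __import__ returns the top-level package by default...
top_level = __import__('logging.handlers')
print(top_level.__name__)  # prints: logging

# ...and the named submodule itself when fromlist is non-empty.
submodule = __import__('logging.handlers', fromlist=['RotatingFileHandler'])
print(submodule.__name__)  # prints: logging.handlers
```

The fromlist behaviour is the part that tends to surprise people, so it is worth checking which object you got back before using it.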
A third way to run some external Python code from your Python script is to first read the external file contents into memory (as a string), and then use the exec keyword on it. The exec keyword can be used as a function call or as a keyword statement.
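code_string = load_file_content('/path/to/')
exec(code_string)   # used as a function call
exec code_string    # used as a keyword statement (Python 2)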
Similar to the execfile function, you have the option of passing custom global and local namespaces. Here's some code I've written for a project that uses the exec keyword:
globals_map = globals().copy()
globals_map['app'] = app
globals_map['assert_app_dependency'] = assert_app_dependency
globals_map['assert_not_app_dependency'] = assert_not_app_dependency
globals_map['assert_app_dependency_in_range'] = assert_app_dependency_in_range
globals_map['assert_true'] = assert_true
globals_map['assert_false'] = assert_false
globals_map['compare_versions'] = compare_versions
try:
    exec(self.source_code, globals_map, {})
except Exception as ex:
    utils.log('[{0}] Unexpected policy exception: {1}'.format(, ex))
Here I first create a clone of the current global namespace, and pass it as an argument to the exec function. The clone is discarded at the end of the execution. This makes sure that the code in the external file does not pollute my existing global namespace. I also add some of my own variables and functions (e.g. assert_true, assert_false etc.) into the global namespace clone, which allows the external code to refer to them as if they were built-in constructs. In other words, the external script can be written in a slightly extended version of Python.
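To make the mechanics concrete, here is a stripped-down, self-contained sketch of the same pattern. It is not the project code: the policy text, the app dictionary and the check_true helper are invented for this example.

```python
def check_true(condition, message='assertion failed'):
    """Hypothetical helper exposed to the external script as assert_true."""
    if not condition:
        raise AssertionError(message)


# Stand-in for the contents of an external policy file.
policy_source = """
assert_true(app['version'] >= 2, 'app is too old')
verdict = 'policy ok for ' + app['name']
"""

# Clone the global namespace and extend the clone with extra names.
globals_map = globals().copy()
globals_map['app'] = {'name': 'demo', 'version': 3}
globals_map['assert_true'] = check_true

# Names assigned by the external code land in the separate locals dictionary.
locals_map = {}
exec(policy_source, globals_map, locals_map)

print(locals_map['verdict'])  # prints: policy ok for demo
```

Because a separate dictionary is passed as the local namespace, anything the external code assigns at its top level is collected there and never reaches the clone, let alone the real global namespace.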
There are other neat little tricks you can do using the constructs like exec and execfile. Go through the official documentation for more details.

Wednesday, May 14, 2014

Java Code Analysis and Optimization with Soot

This is a quick shout out about the project Soot. If you're doing anything even remotely related to static analysis in Java, Soot is the way to go. It's simple, open source, well documented and extremely powerful. Soot can analyze any Java program (source or bytecode), and provide you with the control flow graph (CFG). Here's an example that shows how to construct the CFG for the main method of a class named MyClass.
SootClass c = Scene.v().loadClassAndSupport("MyClass");
SootMethod m = c.getMethodByName("main");
Body b = m.retrieveActiveBody();
UnitGraph g = new BriefUnitGraph(b);
Once you get your hands on the CFG, you can walk it, search it and do anything else you would normally do with a graph data structure. 
Soot converts Java code into one of four intermediate representations (Jimple, Baf, Shimple and Grimp). These representations are designed to make it easier to analyze programs written in Java. For example, Jimple maps Java bytecode from its typical stack-based model to a typed three-address representation. You can also make modifications/optimizations to the code and try out new ideas for compiler and runtime optimizations. Alternatively you can "tag" instructions with metadata, which can be helpful in building new development tools with powerful code visualization capabilities.
Soot also provides a set of APIs for performing data flow analysis. These APIs can help you to code anything from live variable analysis to very busy expression analysis and more. And finally, Soot can also be invoked from the command-line without having to write any extension code.
So if you have any cool new ideas related to program analysis or optimization, grab the latest version of Soot. Whatever it is that you're trying to do, I'm sure Soot can help you implement it.

Thursday, January 2, 2014

Calling WSO2 Admin Services in Python

I’m using some WSO2 middleware for my ongoing research, and recently I had the requirement of calling some admin services from Python 2.7. All WSO2 products expose a number of special administrative web services (admin services), using which the WSO2 server instances can be controlled, configured and monitored. In fact, all the web-based UI components that ship with WSO2 middleware make use of these admin services under the hood to manage the server runtime.
WSO2 admin services are SOAP services (based on Apache Axis2), and are secured using HTTP basic authentication. All admin services expose a WSDL document using which client applications can be written or generated to consume the admin services. In this post I’m going to summarize how to implement a simple Python client to consume the WSO2 admin services.
We will be writing our Python client using the Suds SOAP library for Python. Suds is simple, lightweight and extremely easy to use. As the first step, we should install Suds. Depending on the Python package manager you wish to use, one of the following commands should do the trick (tested on OS X and Ubuntu):
sudo easy_install suds
sudo pip install suds
Next we need to instruct the target WSO2 server product to expose the admin service WSDLs. By default these WSDLs are hidden. To unhide them, open up the repository/conf/carbon.xml file of the WSO2 product, and set the value of the HideAdminServiceWSDLs parameter to false:
<HideAdminServiceWSDLs>false</HideAdminServiceWSDLs>
Now restart the WSO2 server, and you should be able to access the admin service WSDLs using a web browser. For example, to access the WSDL of the UserAdmin service, point your browser to https://localhost:9443/services/UserAdmin?wsdl
Now we can go ahead and write the Python code to consume any of the available admin services. Here’s a working sample that consumes the UserAdmin service. This simply prints out a list of roles defined in the WSO2 User Management component:
from suds.client import Client
from suds.transport.http import HttpAuthenticated
import logging

if __name__ == '__main__':

    #logging.basicConfig(level=logging.INFO)
    #logging.getLogger('suds.client').setLevel(logging.DEBUG)
    t = HttpAuthenticated(username='admin', password='admin')
    client = Client('https://localhost:9443/services/UserAdmin?wsdl', location='https://localhost:9443/services/UserAdmin', transport=t)
    print client.service.getAllRolesNames()
That’s pretty much it. I have tested this approach with several WSO2 admin services, and they all seem to work without any issues. If you need to debug something, uncomment the two commented out lines in the above example. That will print all the SOAP messages and the HTTP headers that are being exchanged.
I also tried to write a client using the popular SOAPy library, but unfortunately couldn’t get it to work due to several bugs in SOAPy. SOAPy was incapable of retrieving the admin service WSDLs over HTTPS. This can be worked around by using the HTTP URL for the WSDL, but in that case SOAPy failed to generate the correct request messages to call the admin services. Basically, the namespaces of the generated SOAP messages were messed up. But with Suds I didn’t run into any issues.

Friday, July 26, 2013

Avoiding the Risks of Cloud

It's no secret that cloud computing has transformed the way enterprises do business. It has changed the way developers write software and users interact with applications. By now, almost every business organization has a strategy on how to adopt the cloud. Those who don’t will soon be extinct. The influence of the cloud has been so phenomenal, that it truly has turned into a "take it or die" kind of a deal over the last few years.
It is also no secret that today the cloud movement is steered by a handful of giants in the IT industry. Companies like Amazon, Google, Microsoft and Salesforce are clearly among this elite group. These companies, their products and vision have been instrumental in the introduction, evolution and the popularization of the cloud technology. 
With that being the case, we must think about the implications of cloud computing for the current IT landscape of the world. Are all small and medium-sized organizations around the world going to get rid of their server racks and transfer their IT infrastructure to Amazon EC2? Are all web applications and mobile applications going to be based on Google App Engine APIs? Is all enterprise data going to end up in Amazon S3 and Google Megastore? What sort of defenses are in place to prevent a few IT giants from monopolizing the entire IT infrastructure and services market? How easy would it be for us to migrate from one cloud vendor to another? All these are very real and very important problems that all organizations should take under careful consideration.
Fortunately there are several practical solutions to the above issues. One is openness and standardization. Cloud platforms that are based on open standards and protocols should be preferred over those that use proprietary standards and protocols. Open standards and protocols are likely to be supported by more than just one cloud vendor, thus enabling users to migrate between different vendors easily. Also, in many cases open standards make it easier to port existing standalone applications to the cloud. Take a Java web application as an example. Most Java web applications are based on the J2EE suite of standards (JSP, Servlets, JDBC etc.). If the target cloud platform also supports these open standards, the user can easily migrate the J2EE app to the cloud without having to make too many changes. Similarly, the app can easily be migrated from one cloud platform to another as long as both platforms support the same J2EE standards.
Speaking of openness, cloud platforms that are open source and distributed under liberal licenses should get extra credit over closed source ones. Open source cloud platforms allow the user to modify and shape the platform according to the user requirements, rather than forcing the user to change their apps according to the changes made by the cloud platform vendor. Also, with an open source cloud framework, users will be in a position to maintain and support the platform on their own, in a situation where the original vendor decides to discontinue support for the platform.
Another possible solution is to use a hybrid cloud approach instead of relying solely on a remote public cloud maintained by a third party vendor. A hybrid cloud approach typically involves a private cloud maintained by the user, which selectively bursts into the public cloud to handle high availability and high scalability scenarios. This method does involve some additional expenses and legwork on the user's part, but the user ultimately remains in control of the data and applications, and no third party vendor can take that away. Also, as far as most small and medium-sized organizations are concerned, what they expect from the cloud are features like multi-tenancy, self-provisioning, optimal resource utilization and auto-scaling. Spending a few bucks on running a server rack or two to make that happen is usually not a big deal. Most companies do that today anyway.
However, from a technical standpoint, we need easy-to-deploy, easy-to-maintain and reliable private cloud frameworks that are compatible with popular public cloud platforms to really take advantage of this hybrid cloud model. Fortunately, thanks to some excellent work by a few start-ups like Eucalyptus and AppScale, this is no longer an issue. These vendors provide highly competitive private cloud and hybrid cloud solutions that are compatible with widely used public cloud platforms such as AWS and Google App Engine. If the user is capable of procuring the necessary hardware resources and manpower, these cloud platforms can even be used to set up fully-fledged private clouds that have all the bells and whistles of popular public clouds. That’s a great way to bask in the glory of the cloud, while maintaining full ownership and control over your enterprise IT assets.
Software frameworks like Apache JClouds provide another approach for dealing with the potential risks of the cloud. These frameworks allow user code to interact with multiple heterogeneous cloud platforms by abstracting out the differences between various clouds. If we consider JClouds, as of now it supports close to 30 different cloud platforms including AWS, OpenStack and Rackspace. This implies that an application written against the JClouds APIs can, in principle, be moved across around 30 different cloud platforms with little or no code change. As the influence of the cloud continues to grow, developers should seriously consider writing their code using high-level APIs like JClouds, rather than getting tied to a single specific cloud platform.
Cloud has certainly changed the way we all think about IT and computing. While its benefits are quite attractive, it also comes with a few potential risks. Users and developers should think carefully, plan ahead and take preventive action soon to avoid these pitfalls.

Friday, June 21, 2013

White House API Standards, DX and UX

The White House recently published some standards for developing web APIs. While going through the documentation, I came across a new term - DX. DX stands for developer experience. As anybody would understand, providing a good developer experience is the key to the success of a web API. Developers love to program with clean, intuitive APIs. On the other hand clunky, non-intuitive APIs are difficult to program with and usually are full of nasty surprises that make the developer's life hard. Therefore DX is perhaps the single most important factor when it comes to differentiating a good API from a not-so-good API.
The term DX reminds me of another similar term - UX. As you would guess, UX stands for user experience. A few years ago UX was one of the most exciting topics in the IT industry. For a moment there everybody was talking and writing about UX and how websites and applications should be developed with UX best practices in mind. It seems that with the rise of web APIs, cloud and mobile apps, DX is starting to generate a similar buzz. In fact I think that for a wide range of application development, PaaS, web and middleware products, DX will be far more important than UX. Stephen O'Grady was so right. Developers are the new kingmakers.