Wednesday, May 14, 2014

Java Code Analysis and Optimization with Soot

This is a quick shout-out about the Soot project. If you're doing anything even remotely related to static analysis in Java, Soot is the way to go. It's simple, open source, well documented and extremely powerful. Soot can analyze any Java program (source or bytecode) and provide you with its control flow graph (CFG). Here's an example that shows how to construct the CFG for the main method of a class named MyClass.
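import soot.*;
import soot.toolkits.graph.*;
// Load MyClass and the classes it depends on from the Soot classpath
SootClass c = Scene.v().loadClassAndSupport("MyClass");
SootMethod m = c.getMethodByName("main");
// Obtain the method body (in Jimple) and build its control flow graph
Body b = m.retrieveActiveBody();
UnitGraph g = new BriefUnitGraph(b);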
Once you get your hands on the CFG, you can walk it, search it and do anything else you would normally do with a graph data structure. 
Soot converts Java code into one of four intermediate representations (Jimple, Baf, Shimple and Grimp). These representations are designed to make Java programs easier to analyze. For example, Jimple translates the stack-based model of Java bytecode into a typed three-address code. You can also modify and optimize the code, and try out new ideas for compiler and runtime optimizations. Alternatively, you can "tag" instructions with metadata, which can be helpful in building new development tools with powerful code visualization capabilities.
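To get a feel for what a three-address representation looks like, here is a small sketch that lowers a simple assignment into three-address statements, introducing a fresh temporary for every intermediate result. It is written in Python (using Python's own ast module as a convenient front end) and is not Soot code; the function name and the t1, t2, ... temporaries are assumptions made for illustration, and real Jimple output looks different.

```python
import ast
import itertools

# Binary operators this toy lowering understands
OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}


def to_three_address(statement):
    """Lower a single assignment like 'x = a + b * c' into three-address code.

    Every emitted statement has at most one operator, so it mentions at most
    three names: a destination and up to two operands.
    """
    tree = ast.parse(statement)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError('expected a single assignment')
    assign = tree.body[0]
    if len(assign.targets) != 1 or not isinstance(assign.targets[0], ast.Name):
        raise ValueError('expected a single variable on the left-hand side')

    temps = itertools.count(1)
    code = []

    def lower(node):
        # Returns the name (or literal) that holds the value of `node`
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = lower(node.left), lower(node.right)
            temp = 't%d' % next(temps)
            code.append('%s = %s %s %s' % (temp, left, OPS[type(node.op)], right))
            return temp
        raise ValueError('unsupported expression: %s' % type(node).__name__)

    result = lower(assign.value)
    code.append('%s = %s' % (assign.targets[0].id, result))
    return code


print(to_three_address('x = a + b * c'))
# ['t1 = b * c', 't2 = a + t1', 'x = t2']
```

The nested expression on the right-hand side disappears: each intermediate value gets its own name, which is the property that makes three-address code convenient for data flow analysis.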
Soot also provides a set of APIs for performing data flow analysis. These APIs can help you code anything from live variable analysis to very busy expression analysis and more. Finally, Soot can also be invoked from the command line without having to write any extension code.
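As a rough illustration of what such an analysis computes, here is a plain-Python sketch of live variable analysis using the classic iterative worklist algorithm. It is not Soot's API (inside Soot you would typically extend a framework class such as soot.toolkits.scalar.BackwardFlowAnalysis instead); the four-node CFG, its node numbers and the USE/DEF sets are a made-up example for a loop that increments i until it reaches n.

```python
def live_variables(cfg, uses, defs):
    """Backward may-analysis over a CFG given as {node: [successors]}.

    LIVE_out(n) = union of LIVE_in(s) for every successor s of n
    LIVE_in(n)  = USE(n) | (LIVE_out(n) - DEF(n))
    """
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    live_in = {n: set() for n in cfg}
    live_out = {n: set() for n in cfg}
    worklist = list(cfg)
    while worklist:
        n = worklist.pop()
        out = set()
        for s in cfg[n]:
            out |= live_in[s]
        live_out[n] = out
        new_in = uses[n] | (out - defs[n])
        if new_in != live_in[n]:
            # LIVE_in(n) grew, so every predecessor must be recomputed
            live_in[n] = new_in
            worklist.extend(preds[n])
    return live_in, live_out


# Hypothetical CFG:  1: i = 0   2: if i < n   3: i = i + 1 (back to 2)   4: return i
cfg = {1: [2], 2: [3, 4], 3: [2], 4: []}
uses = {1: set(), 2: {'i', 'n'}, 3: {'i'}, 4: {'i'}}
defs = {1: {'i'}, 2: set(), 3: {'i'}, 4: set()}
live_in, live_out = live_variables(cfg, uses, defs)
```

Here live_in[1] comes out as {'n'}: only n is live on entry to the loop, because i is assigned before it is ever read.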
So if you have any cool new ideas related to program analysis or optimization, grab the latest version of Soot. Whatever it is that you're trying to do, I'm sure Soot can help you implement it.

Thursday, January 2, 2014

Calling WSO2 Admin Services in Python

I’m using some WSO2 middleware for my ongoing research, and recently I needed to call some admin services from Python 2.7. All WSO2 products expose a number of special administrative web services (admin services) through which WSO2 server instances can be controlled, configured and monitored. In fact, all the web-based UI components that ship with WSO2 middleware use these admin services under the hood to manage the server runtime.
WSO2 admin services are SOAP services (based on Apache Axis2) and are secured using HTTP basic authentication. Every admin service exposes a WSDL document from which client applications can be written or generated. In this post I’m going to summarize how to implement a simple Python client that consumes the WSO2 admin services.
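For reference, HTTP basic authentication boils down to an Authorization header carrying the Base64-encoded username:password pair. The Suds HttpAuthenticated transport used in the client below takes care of this for you; the helper here (a hypothetical name, works under Python 2.7 and 3) only shows what the header value looks like for the admin/admin credentials used in this post.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of the HTTP Authorization header for basic auth."""
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


print(basic_auth_header('admin', 'admin'))  # Basic YWRtaW46YWRtaW4=
```

Since Base64 is an encoding, not encryption, these credentials are only protected in transit when the service is called over HTTPS.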
We will be writing our Python client using the Suds SOAP library for Python. Suds is simple, lightweight and extremely easy to use. As the first step, we should install Suds. Depending on the Python package manager you wish to use, one of the following commands should do the trick (tested on OS X and Ubuntu):
sudo easy_install suds
sudo pip install suds
Next we need to instruct the target WSO2 server product to expose the admin service WSDLs. By default these WSDLs are hidden. To unhide them, open the repository/conf/carbon.xml file of the WSO2 product and set the value of the HideAdminServiceWSDLs parameter to false, i.e. <HideAdminServiceWSDLs>false</HideAdminServiceWSDLs>.
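If you need to flip this setting on several servers, a small script can do it. The following is a sketch, assuming the element appears in carbon.xml as a plain <HideAdminServiceWSDLs>...</HideAdminServiceWSDLs> element that is not inside an XML comment; the function names are hypothetical, and it's worth keeping a backup of the original file.

```python
import re

# Matches the element whatever its current value is (typically true or false)
HIDE_WSDLS = re.compile(r'<HideAdminServiceWSDLs>\s*\w+\s*</HideAdminServiceWSDLs>')


def unhide_admin_wsdls(xml_text):
    """Return xml_text with HideAdminServiceWSDLs set to false."""
    return HIDE_WSDLS.sub('<HideAdminServiceWSDLs>false</HideAdminServiceWSDLs>', xml_text)


def patch_carbon_xml(path):
    """Rewrite the carbon.xml file at `path` in place."""
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(unhide_admin_wsdls(text))


# e.g. patch_carbon_xml('repository/conf/carbon.xml') from the product's home directory
```

A plain text substitution is used instead of an XML parser so that the rest of the file, including its comments and namespace declarations, is left byte-for-byte unchanged.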
Now restart the WSO2 server, and you should be able to access the admin service WSDLs using a web browser. For example, to access the WSDL of the UserAdmin service, point your browser to https://localhost:9443/services/UserAdmin?wsdl
Now we can go ahead and write the Python code to consume any of the available admin services. Here’s a working sample that consumes the UserAdmin service. This simply prints out a list of roles defined in the WSO2 User Management component:
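from suds.client import Client
from suds.transport.http import HttpAuthenticated
import logging

if __name__ == '__main__':
    # To debug, uncomment the next two lines; Suds will then log the SOAP
    # messages and HTTP headers exchanged with the server
    # logging.basicConfig(level=logging.INFO)
    # logging.getLogger('suds.transport').setLevel(logging.DEBUG)

    # Admin services are protected by HTTP basic authentication
    t = HttpAuthenticated(username='admin', password='admin')
    # Fetch the WSDL over HTTPS and send requests to the service endpoint
    client = Client('https://localhost:9443/services/UserAdmin?wsdl',
                    location='https://localhost:9443/services/UserAdmin',
                    transport=t)
    print client.service.getAllRolesNames()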
That’s pretty much it. I have tested this approach with several WSO2 admin services, and they all seem to work without any issues. If you need to debug something, uncomment the two commented out lines in the above example. That will print all the SOAP messages and the HTTP headers that are being exchanged.
I also tried to write a client using the popular SOAPy library, but unfortunately couldn’t get it to work due to several bugs in SOAPy. SOAPy was incapable of retrieving the admin service WSDLs over HTTPS. This can be worked around by using the HTTP URL for the WSDL, but in that case SOAPy failed to generate the correct request messages to call the admin services. Basically, the namespaces of the generated SOAP messages were messed up. But with Suds I didn’t run into any issues.

Friday, July 26, 2013

Avoiding the Risks of Cloud

It's no secret that cloud computing has transformed the way enterprises do business. It has changed the way developers write software and the way users interact with applications. By now, almost every business organization has a strategy for adopting the cloud. Those who don’t will soon be extinct. The influence of the cloud has been so phenomenal that over the last few years it has truly turned into a "take it or die" kind of deal.
It is also no secret that today the cloud movement is steered by a handful of giants in the IT industry. Companies like Amazon, Google, Microsoft and Salesforce are clearly among this elite group. These companies, their products and their vision have been instrumental in the introduction, evolution and popularization of cloud technology.
With that being the case, we must think about the implications of cloud computing for the current IT landscape. Are all small and medium-sized organizations around the world going to get rid of their server racks and move their IT infrastructure to Amazon EC2? Are all web and mobile applications going to be based on Google App Engine APIs? Is all enterprise data going to end up in Amazon S3 and Google Megastore? What sort of defenses are in place to prevent a few IT giants from monopolizing the entire IT infrastructure and services market? How easy would it be for us to migrate from one cloud vendor to another? These are very real and very important problems that all organizations should consider carefully.
Fortunately there are several practical solutions to all of the above issues. One is openness and standardization. Cloud platforms based on open standards and protocols should be preferred over those that use proprietary ones. Open standards and protocols are likely to be supported by more than one cloud vendor, which enables users to migrate between vendors easily. Also, in many cases open standards make it easier to port existing standalone applications to the cloud. Take a Java web application as an example. Most Java web applications are based on the J2EE suite of standards (JSP, Servlets, JDBC etc.). If the target cloud platform also supports these open standards, the user can migrate the J2EE app to the cloud without having to make many changes. Similarly, the app can easily be moved from one cloud platform to another as long as both platforms support the same J2EE standards.
Speaking of openness, cloud platforms that are open source and distributed under liberal licenses should get extra credit over closed source ones. Open source cloud platforms allow users to modify and shape the platform according to their own requirements, rather than forcing them to change their apps whenever the vendor changes the platform. Also, with an open source cloud framework, users will be in a position to maintain and support the platform on their own if the original vendor decides to discontinue support for it.
Another possible solution is to use a hybrid cloud approach instead of relying solely on a remote public cloud maintained by a third-party vendor. A hybrid cloud approach typically involves a private cloud maintained by the user, which selectively bursts into the public cloud to handle high-availability and high-scalability scenarios. This method does involve some additional expense and legwork on the user's part, but the user ultimately remains in control of their data and applications, and no third-party vendor can take that away. Also, as far as most small and medium-sized organizations are concerned, what they expect from the cloud are features like multi-tenancy, self-provisioning, optimal resource utilization and auto-scaling. Spending a few bucks on running a server rack or two to make that happen is usually not a big deal; most companies do that today anyway. However, to really take advantage of this hybrid cloud model, we need reliable, easy-to-deploy and easy-to-maintain private cloud frameworks that are compatible with popular public cloud platforms. Fortunately, thanks to some excellent work by start-ups like Eucalyptus and AppScale, this is no longer an issue. These vendors provide highly competitive private and hybrid cloud solutions that are fully compatible with widely used public cloud platforms such as AWS and Google App Engine. If the user can procure the necessary hardware and manpower, these platforms can even be used to set up fully fledged private clouds with all the bells and whistles of popular public clouds. That’s a great way to bask in the glory of the cloud while maintaining full ownership and control over your enterprise IT assets.
Software frameworks like Apache JClouds provide another approach for dealing with the potential risks of the cloud. These frameworks allow users' code to interact with multiple heterogeneous cloud platforms by abstracting out the differences between them. JClouds, for example, currently supports close to 30 different cloud platforms, including AWS, OpenStack and Rackspace. This implies that an application written using JClouds can run on around 30 different cloud platforms without having to make any code changes. As the influence of the cloud continues to grow, developers should seriously consider writing their code against high-level APIs like JClouds instead of tying it to a single cloud platform.
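The underlying idea is the classic provider-abstraction pattern: application code is written against one neutral interface, and a provider-specific implementation is plugged in at runtime. The sketch below illustrates only the pattern, in Python; it is not the JClouds API (which is Java), and the ObjectStore interface, the two in-memory providers and the provider names are hypothetical stand-ins for real cloud backends.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Provider-neutral storage interface that application code depends on."""

    @abstractmethod
    def put(self, container, name, data):
        """Store data under container/name."""

    @abstractmethod
    def get(self, container, name):
        """Return the data stored under container/name."""


class ProviderAStore(ObjectStore):
    """Hypothetical backend; a real one would call vendor A's storage API."""

    def __init__(self):
        self._buckets = {}

    def put(self, container, name, data):
        self._buckets.setdefault(container, {})[name] = data

    def get(self, container, name):
        return self._buckets[container][name]


class ProviderBStore(ObjectStore):
    """Second hypothetical backend with a different internal layout."""

    def __init__(self):
        self._objects = {}

    def put(self, container, name, data):
        self._objects[container + '/' + name] = data

    def get(self, container, name):
        return self._objects[container + '/' + name]


# Switching providers becomes a configuration change, not a code change
PROVIDERS = {'provider-a': ProviderAStore, 'provider-b': ProviderBStore}


def open_store(provider):
    return PROVIDERS[provider]()


def archive_report(store, text):
    # Written once against ObjectStore, so it runs unchanged on any backend
    store.put('reports', 'latest.txt', text)
    return store.get('reports', 'latest.txt')


for provider in PROVIDERS:
    print(provider, archive_report(open_store(provider), 'quarterly numbers'))
```

The cost of this design is that the shared interface can only expose what all providers have in common, so provider-specific features either stay out of reach or need an explicit escape hatch.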
Cloud has certainly changed the way we all think about IT and computing. While its benefits are quite attractive, it also comes with a few potential risks. Users and developers should think carefully, plan ahead and take preventive action soon to avoid these pitfalls.

Friday, June 21, 2013

White House API Standards, DX and UX

The White House recently published some standards for developing web APIs. While going through the documentation, I came across a new term: DX. DX stands for developer experience. Providing a good developer experience is clearly key to the success of a web API. Developers love to program with clean, intuitive APIs. On the other hand, clunky, non-intuitive APIs are difficult to program with and are usually full of nasty surprises that make the developer's life hard. Therefore DX is perhaps the single most important factor when it comes to differentiating a good API from a not-so-good one.
The term DX reminds me of another similar term: UX. As you would guess, UX stands for user experience. A few years ago UX was one of the most exciting topics in the IT industry. For a while everybody was talking and writing about UX and how websites and applications should be developed with UX best practices in mind. It seems that with the rise of web APIs, cloud and mobile apps, DX is starting to generate a similar buzz. In fact, I think that for a wide range of application development platforms, PaaS, web and middleware products, DX will be far more important than UX. Stephen O'Grady was so right: developers are the new kingmakers.

Wednesday, June 19, 2013

Is Subversion Going to Make a Comeback?

The Apache Software Foundation (ASF) announced the release of Subversion 1.8 yesterday. As I read the release notes, I started wondering how Subversion is still alive. The ASF uses Subversion heavily for pretty much everything; in fact, the source code of Subversion itself is managed in a Subversion repository. But outside the ASF I've seen a strong push towards switching from Subversion to Git. Most startups and research groups that I know of have been using Git from day one. WSO2, the company I used to work for, is in the process of moving its code to Git. Being an Apache committer, I obviously have to use Subversion regularly. But about a year ago I started using Git (GitHub to be exact) for my other development activities, and I absolutely adore it. It scales well for large code bases and large development teams, and it makes common tasks such as branching, merging, reverting and reviewing other people's work much easier and more intuitive.
But as it turns out, Subversion is still the world's most widely used version control system. According to the official blog post rolled out by the ASF yesterday, a number of tech giants including WordPress use Subversion heavily. According to Ohloh, around 53% of open source projects use Subversion, compared to 29% that use Git. Subversion has clearly captured a large share of the market, which makes it a very hard-to-kill technology. It will be interesting to see how the competition between Subversion and Git unfolds. The new release comes with a bunch of new features, which indicates that the project is very much alive and kicking and that the Subversion community is not even close to giving up on it.

Friday, June 14, 2013

More Reasons to Love Python - A Lesson on KISS

Recently I've been doing some work in the area of programming language design. At one point I wanted to define a Python subset that allows only the simplest Python statements, without loops, conditionals, functions, classes and a bunch of other high-level constructs. So I looked into the grammar specification of the Python language, and I was astonished by its simplicity and succinctness. The full grammar is published in the official Python documentation, so take a look for yourself. It's no longer than 125 lines of text, and the whole thing can be printed on one side of an A4 sheet. This is definitely one of those instances where the best design is also the simplest design. No wonder everybody loves Python.
However, that's not the whole point. Having selected a suitable Python subset, I was looking into ways of implementing a simple parser for those grammar rules. I've done some work with JavaCC in the past, so I jumped straight into implementing a Java-based parser for the selected Python subset using JavaCC. After a few hours of coding I managed to get it working too. The next step of my project required me to do some analysis on the abstract syntax tree (AST) produced by the parser. While looking around for existing work that fit my requirements, I came across Python's native ast module, and I immediately realized that all the hours I had spent implementing the JavaCC-based parser were a complete waste. The ast module provides excellent support for parsing Python code and constructing ASTs. This is all you have to do to parse some Python code using the ast module and obtain an AST representation of the code.
import ast

# The variable 'source' contains the Python statement to be parsed
source = 'x = y + z'
tree = ast.parse(source)
The ast module supports several modes. The default mode is exec, which parses a sequence of Python statements. The module also supports a special eval mode, which parses a single Python expression. It turned out that the eval mode supports more or less the exact Python subset I wanted to use. So I threw away my JavaCC-based parser and wrote the following snippet of Python code to get my job done.
import ast

# The variable 'source' contains the Python expression to be parsed.
# eval mode accepts a single expression only; an assignment such as
# 'x = y + z' is a statement and raises a SyntaxError in this mode.
source = 'y + z'
tree = ast.parse(source, mode='eval')
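Because eval mode only accepts a single expression, it also doubles as a quick check of whether a piece of code fits within that subset. A minimal sketch (the helper name is made up for this example):

```python
import ast


def parse_expression(source):
    """Parse source in eval mode; return None if it is not a single expression."""
    try:
        return ast.parse(source, mode='eval')
    except SyntaxError:
        return None


print(parse_expression('y + z') is not None)      # True: an expression
print(parse_expression('x = y + z') is not None)  # False: an assignment statement
```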
Now, when it came to analyzing the AST produced by the parser, the ast module again turned out to be useful. The module provides two helper classes, NodeVisitor and NodeTransformer, which can be used to traverse or transform a given Python AST respectively. To use these helper classes, we just need to extend them and implement the appropriate visit methods. There's a generic top-level visit method, which dispatches to one visit_ method per AST node type (e.g. visit_Name, visit_BinOp, visit_BoolOp etc.). Here's an example NodeVisitor implementation that flattens a given Python AST into a list.
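import ast

class NodeEnumerator(ast.NodeVisitor):
    def get_node_list(self, tree):
        self.nodes = []
        self.visit(tree)  # walk the tree, collecting every node
        return self.nodes

    def visit(self, node):
        # Record the node, then let generic_visit() recurse into its children
        self.nodes.append(node)
        self.generic_visit(node)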
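NodeTransformer works the same way, except that the return value of each visit_ method takes the original node's place in the tree (returning None removes the node). As a small sketch, the hypothetical RenameVariables class below renames variables according to a mapping and then compiles and evaluates the transformed AST:

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename variables according to a {old_name: new_name} mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # The returned node replaces the original one in the tree
        renamed = ast.Name(id=self.mapping.get(node.id, node.id), ctx=node.ctx)
        return ast.copy_location(renamed, node)


tree = RenameVariables({'y': 'a'}).visit(ast.parse('y + z * y', mode='eval'))
tree = ast.fix_missing_locations(tree)
# The transformed AST can be compiled and evaluated like any other
print(eval(compile(tree, '<ast>', 'eval'), {'a': 2, 'z': 3}))  # 8
```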
These helper classes can be used to do virtually anything with a given AST. If you want, you can even implement a Python interpreter in Python using this approach. In my case I'm running some search and isomorphism detection algorithms on the Python ASTs.
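To make the interpreter remark a little more concrete, here is a toy evaluator for arithmetic expressions built on NodeVisitor. It is a sketch only (a real interpreter would need to handle far more node types); the class and function names are hypothetical, and it relies on ast.Constant, so it assumes Python 3.8 or later.

```python
import ast
import operator


class Evaluator(ast.NodeVisitor):
    """Toy interpreter for arithmetic expressions parsed in eval mode."""

    # Maps AST operator classes to the functions that implement them
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def __init__(self, env):
        self.env = env  # variable name -> value

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        if type(node.op) not in self.OPS:
            raise ValueError('unsupported operator: %s' % type(node.op).__name__)
        return self.OPS[type(node.op)](self.visit(node.left), self.visit(node.right))

    def visit_Name(self, node):
        return self.env[node.id]

    def visit_Constant(self, node):
        return node.value

    def generic_visit(self, node):
        # Any node type without a visit_ method is outside this tiny language
        raise ValueError('unsupported syntax: %s' % type(node).__name__)


def evaluate(source, env):
    return Evaluator(env).visit(ast.parse(source, mode='eval'))


print(evaluate('x * (y + 2)', {'x': 3, 'y': 4}))  # 18
```

Each visit_ method returns a value, and NodeVisitor.visit passes that value back to the caller, so the recursion over the tree computes the result directly.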
So once again I've been pleasantly surprised and deeply impressed by the simplicity and richness of Python. It looks like the designers of Python have thought of everything. Kudos to Python aside, this whole experience taught me to always look for existing, simple solutions before doing things my own complicated way. It reminds me of the good old KISS principle: "Keep It Simple, Stupid".