Monday, January 28, 2013

Introducing AppsCake: Makes Deploying AppScale a Piece of Cake

One of my very first contributions to AppScale was a component named AppsCake. AppsCake is a dynamic web component that provides a web frontend for the command-line AppScale Tools. It enables users to deploy and start AppScale over several different types of infrastructure. This greatly reduces the overhead of starting and managing a PaaS, as most of the heavy lifting can be performed with the click of a button. Users do not need to learn the AppScale Tools commands, nor do they have to be familiar with any command-line interface. With AppsCake, a regular web browser is all you need to initialize AppScale and start deploying applications in the cloud.
As of now AppsCake supports deploying AppScale over virtualized clusters (e.g. Xen), Amazon EC2 and Eucalyptus. Users can select the environment in which AppScale should be deployed and provide the required credentials and other metadata for the target environment through the web interface. AppsCake takes care of invoking the proper command sequences with the appropriate arguments to initialize AppScale. The web frontend also allows users to view deployment logs and monitor the deployment progress in near real-time.
This component can be further extended and offered as a service of its own if needed. That way, users can access AppsCake through a well-known URL and set up an AppScale deployment remotely for the purpose of executing a specific task or an application. As an example, consider a group of scientists who want to run various scientific computations in the cloud (say as MPI or MapReduce jobs). The group can use a private Eucalyptus cluster or a shared EC2 account as their computing infrastructure. The group can be provided with a single well-known AppsCake instance as the entry point for AppScale. Then whenever a member of the team wants to run a computation on the shared target environment, he or she can use the AppsCake service to initiate his or her own AppScale instance and run the required computation in the cloud. This scheme maximizes resource sharing while providing sufficient isolation between applications/jobs initiated by individual users.
AppsCake is implemented using Ruby and Sinatra. To try this out, simply check out the source from GitHub, and execute the bin/ script (the build script only supports Debian/Ubuntu environments as of now). Then execute bin/appscake to start the AppsCake web service. Now you can point your browser to https://localhost:28443 and start interacting with the service.
Chris has posted a neat little screencast that explains how to use AppsCake to deploy AppScale on VirtualBox. Don’t forget to check that out too.

Sunday, January 20, 2013

On Premise API Management for Services in the Cloud

In some of my recent posts I explained how to install and start AppScale. I showed how to use AppScale command-line tools to manage an AppScale PaaS on virtualized environments such as Xen and IaaS environments such as EC2 and Eucalyptus. Then we also looked at how to deploy Google App Engine (GAE) apps over AppScale. In this post we are going to try something different.
Here I’m going to describe a possible hybrid architecture for deploying RESTful services in the cloud and exposing those services through an on-premise API management platform. This type of architecture is most suitable for B2B integration scenarios where one organization provides a range of services and several other organizations consume them with their own custom use cases and SLAs. Both service providers and service consumers can greatly benefit from the proposed hybrid architecture. It enables the API providers to reap the benefits of the cloud with reduced deployment cost, reduced long-term maintenance overhead and reduced time-to-market. API consumers can use their own on-premise API management platform as a local proxy, which provides powerful access control, rate control, analytics and community features on top of the services already deployed in the cloud.
To try this out, first spin up an AppScale PaaS in a desired cloud environment. You can refer to my previous posts or go through the AppScale wiki to learn how to do this. Then we can deploy a simple RESTful web service in our AppScale cloud. Here I’m posting the source code for a simple web service called “starbucks” written in Python using the GAE APIs. The “starbucks” service can be used to submit and manage simple drink orders. It uses the GAE datastore API to store all the application data and exposes all the fundamental CRUD operations as REST calls (Create = POST, Update = PUT, Read = GET, Delete = DELETE).
try:
  import json
except ImportError:
  import simplejson as json

import random
import uuid
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

# Price cache keyed by drink name (assumed to start out empty).
PRICE_CHART = {}

class Order(db.Model):
  order_id = db.StringProperty(required=True)
  drink = db.StringProperty(required=True)
  additions = db.StringListProperty()
  cost = db.FloatProperty()

def get_price(order):
  # Reuse the known price of a drink, or make one up and remember it.
  if order.drink in PRICE_CHART:
    price = PRICE_CHART[order.drink]
  else:
    price = random.randint(2, 6) - 0.01
    PRICE_CHART[order.drink] = price
  # Each addition costs 50 cents extra.
  if order.additions is not None:
    price += 0.50 * len(order.additions)
  return price

def send_json_response(response, payload, status=200):
  response.headers['Content-Type'] = 'application/json'
  if isinstance(payload, Order):
    payload = {
      'id' : payload.order_id,
      'drink' : payload.drink,
      'cost' : payload.cost,
      'additions' : payload.additions
    }
  response.set_status(status)
  response.out.write(json.dumps(payload))

class OrderSubmissionHandler(webapp.RequestHandler):
  def post(self):
    order_info = json.loads(self.request.body)
    order_id = str(uuid.uuid1())
    drink = order_info['drink']
    order = Order(order_id=order_id, drink=drink, key_name=order_id)
    if 'additions' in order_info:
      additions = order_info['additions']
      if isinstance(additions, list):
        order.additions = additions
      else:
        order.additions = [ additions ]
    else:
      order.additions = []
    order.cost = get_price(order)
    # Persist the new order in the datastore.
    order.put()
    self.response.headers['Location'] = self.request.url + '/' + order_id
    send_json_response(self.response, order, 201)

class OrderManagementHandler(webapp.RequestHandler):
    def get(self, order_id):
      order = Order.get_by_key_name(order_id)
      if order is not None:
        send_json_response(self.response, order)
      else:
        self.send_order_not_found(order_id)

    def put(self, order_id):
      order = Order.get_by_key_name(order_id)
      if order is not None:
        order_info = json.loads(self.request.body)
        drink = order_info['drink']
        order.drink = drink
        if 'additions' in order_info:
          additions = order_info['additions']
          if isinstance(additions, list):
            order.additions = additions
          else:
            order.additions = [ additions ]
        else:
          order.additions = []
        order.cost = get_price(order)
        # Save the updated order.
        order.put()
        send_json_response(self.response, order)
      else:
        self.send_order_not_found(order_id)

    def delete(self, order_id):
      order = Order.get_by_key_name(order_id)
      if order is not None:
        # Remove the order, then echo it back to the caller.
        order.delete()
        send_json_response(self.response, order)
      else:
        self.send_order_not_found(order_id)

    def send_order_not_found(self, order_id):
      info = {
        'error' : 'Not Found',
        'message' : 'No order exists by the ID: %s' % order_id,
      }
      send_json_response(self.response, info, 404)

app = webapp.WSGIApplication([
    ('/order', OrderSubmissionHandler),
    ('/order/(.*)', OrderManagementHandler)
], debug=True)

if __name__ == '__main__':
  run_wsgi_app(app)
Before we go any further let’s take a few seconds and appreciate how simple and concise this piece of code is. With just about 100 lines of Python code we have developed a comprehensive webapp, which uses JSON as the data exchange format, does database access and provides decent error handling. Imagine doing the same thing in a language like Java in a traditional servlet container environment. We would have to write a lot more code and also bundle a ridiculous number of additional dependencies to parse and construct JSON and perform database queries. But as seen here, the GAE APIs make it absolutely trivial to develop powerful web APIs for the cloud with a minimum amount of code.
You can download the complete “starbucks” application from here. Simply extract the downloaded tarball and you’re good to go. The webapp consists of just 2 files. The Python source file contains all the source code of the app, and app.yaml is the GAE webapp descriptor. No additional libraries or files are needed to make this work. Use AppScale-Tools to deploy the app in your AppScale cloud.
appscale-upload-app --file /path/to/starbucks --keyname my_key_name
To try out the app, put the following JSON string into a file named order.json:
{
  "drink" : "Caramel Frapaccino",
  "additions" : [ "Whip Cream" ]
}
Now execute the following Curl request on your App:
curl -v -d @order.json -H "Content-type: application/json" http://host:port/order
Replace 'host' and 'port' with the appropriate values for your AppScale PaaS. This request should return an HTTP 201 Created response with a Location header.
And now for the API management part. For this I’m going to use the open source API management solution from WSO2, a project that I was a part of a while ago. Download the latest WSO2 API Manager and install it on your local computer by extracting the zip archive. Go into the bin directory and execute the server startup script (or wso2server.bat for Windows) to start the API Manager. You need to have JDK 1.6 or higher installed to be able to do this.
Once the server is up and running, navigate to http://localhost:9763/publisher and sign in to the console using “admin” as both the username and the password. Go ahead and create an API for our “starbucks” service in the cloud. You can use http://host:port as the service URL where 'host' and 'port' should point to the AppScale PaaS. The API creation process should be pretty straightforward. If you need any help, you can refer to my past blog posts on WSO2 API Manager or go through the WSO2 documentation. Once the API is created and published, head over to the API Store at http://localhost:9763/store.
Now you can sign up at the API Store as an API consumer, generate an API key for the Starbucks API and start using it.
Submit Order:
curl -v -d @order.json -H "Content-type: application/json" -H "Authorization: Bearer api_key" http://localhost:8280/starbucks/1.0.0/order
Review Order:
curl -v -H "Authorization: Bearer api_key" http://localhost:8280/starbucks/1.0.0/order/order_id
Delete Order:
curl -v -X DELETE -H "Authorization: Bearer api_key" http://localhost:8280/starbucks/1.0.0/order/order_id
Replace 'api_key' with the API key generated by the API Store. Replace the 'order_id' with the unique identifier sent in the response for the submit order request.
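The same three calls can also be made from a program instead of curl. The following is only a rough sketch, assuming Python 3 and its standard urllib module; the helper names (build_request, submit_order, review_order, delete_order) are made up for illustration, and the gateway URL is simply the one used in the curl examples. The helpers only build the requests, so nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

# Gateway URL copied from the curl examples; adjust it for your own setup.
BASE_URL = "http://localhost:8280/starbucks/1.0.0/order"


def build_request(method, url, api_key, payload=None):
    """Build (but do not send) a request that carries the API key as a Bearer token."""
    headers = {"Authorization": "Bearer " + api_key}
    data = None
    if payload is not None:
        # JSON bodies need an explicit content type, just like curl's -H option.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def submit_order(api_key, drink, additions):
    order = {"drink": drink, "additions": additions}
    return build_request("POST", BASE_URL, api_key, order)


def review_order(api_key, order_id):
    return build_request("GET", BASE_URL + "/" + order_id, api_key)


def delete_order(api_key, order_id):
    return build_request("DELETE", BASE_URL + "/" + order_id, api_key)
```

To actually submit an order, something like urllib.request.urlopen(submit_order("api_key", "Caramel Frapaccino", ["Whip Cream"])) should do, with 'api_key' replaced as described above.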
There you have it. On-premise API management for services in the cloud. This looks pretty simple at first glance, but it is actually quite a powerful architecture. Note that all the critical components (service runtime, registry and consumer) are very well separated from each other, which allows maximum flexibility. The portions in the cloud can benefit from cloud-specific features such as autoscaling to deliver the maximum throughput with optimal resource utilization. Since the API management platform is controlled by the individual consumer organizations, they can easily enforce their own custom policies and SLAs, and optimize for their common access patterns.

Wednesday, January 9, 2013

How to Get Your Third Party APIs to Shutup?

When programming with 3rd party libraries, sometimes we need to suppress or redirect the standard output generated by the 3rd party libraries. A very common scenario is that a third party library we use in an application generates a very verbose output which clutters up the output of our program. With most programming languages we can write a simple suppress/redirect procedure to fix this problem. Such functions are sometimes colloquially known as STFU functions. Here I'm describing a couple of STFU functions I implemented in some of my recent work.

1. AppsCake (Web interface for AppScale-Tools)
This is a Ruby based dynamic web component which uses some of the core AppScale-Tools libraries. For this project I wanted to capture the standard output of the AppScale-Tools libraries and display it on a web page. As the first step I wanted to redirect the standard output of AppScale-Tools to a separate text file. Here's what I did.
def redirect_standard_io(timestamp)
  begin
    # Keep copies of the original streams so they can be restored later.
    orig_stderr = $stderr.clone
    orig_stdout = $stdout.clone
    log_path = File.join(File.expand_path(File.dirname(__FILE__)), "..", "logs")
    $stderr.reopen File.new(File.join(log_path, "deploy-#{timestamp}.log"), "w")
    $stderr.sync = true
    $stdout.reopen File.new(File.join(log_path, "deploy-#{timestamp}.log"), "w")
    $stdout.sync = true
    retval = yield
  rescue Exception => e
    puts "[__ERROR__] Runtime error in deployment process: #{e.message}"
    $stdout.reopen orig_stdout
    $stderr.reopen orig_stderr
    raise e
  ensure
    # Always put the original streams back, whether or not the block failed.
    $stdout.reopen orig_stdout
    $stderr.reopen orig_stderr
  end
  retval
end
Now whenever I want to redirect the standard output and invoke the AppScale-Tools API I can do this.
redirect_standard_io(timestamp) do
   # Call AppScale-Tools API
end
2. Hawkeye (API fidelity test suite for AppScale)
This is a Python based framework which makes a lot of RESTful invocations using the standard Python httplib API. I wanted to trace the HTTP requests and responses that are being exchanged during the execution of the framework and log them to a separate log file. Python httplib has a verbose mode which can be enabled by passing a special flag to the HTTPConnection class and it turns out this mode logs almost all the information I need. But unfortunately it logs all this information to the standard output of the program thus messing up the output I wanted to present to users. Therefore I needed a way to redirect the standard output for all httplib API calls. Here's how that problem was solved.
import sys
http_log = open('logs/http.log', 'a')
original = sys.stdout
sys.stdout = http_log
try:
  # Invoke httplib
  pass
finally:
  # Restore the real stdout even if the call above fails.
  sys.stdout = original
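The Ruby helper above takes a block, and Python can offer the same convenience with a context manager. Here is a minimal sketch of that idea, assuming Python 3; stdout_to and noisy_library_call are names I made up for illustration, and the latter is just a stand-in for a verbose third party call.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def stdout_to(stream):
    """Temporarily send everything written to sys.stdout into `stream`."""
    original = sys.stdout
    sys.stdout = stream
    try:
        yield stream
    finally:
        # Restore the real stdout even if the wrapped code raises.
        sys.stdout = original


def noisy_library_call():
    # Stand-in for a chatty third party call (e.g. an HTTP client in debug mode).
    print("send: 'GET /order HTTP/1.1'")
    return 200


if __name__ == "__main__":
    captured = io.StringIO()
    with stdout_to(captured):
        status = noisy_library_call()
    # Back on the real stdout: show what was swallowed.
    print("status:", status)
    print("captured:", captured.getvalue().strip())
```

To log to a file instead, pass an open file object as the stream. Python 3.4 and later also ship contextlib.redirect_stdout in the standard library, which does essentially the same thing.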

Tuesday, January 8, 2013

Evolution of Networked Computing

  • 1969: As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of 'computer utilities', which like present electric and telephone utilities, will service individual homes and offices across the country. - Leonard Kleinrock, UCLA
  • 1984: The network is the computer. - John Gage, Sun Microsystems
  • 2008: The data center is the computer. - David Patterson, UC Berkeley
  • 2008: Cloud is the computer. - Rajkumar Buyya, Melbourne University
From the book "Distributed and Cloud Computing" by Kai Hwang et al.

Thursday, January 3, 2013

The Era of Webapps is Over - Say Hello to Web APIs

I remember a time when developing a web application (webapp) was considered one of the coolest and most exciting feats a software developer could perform. It was not so long ago when a software product with no web application component was considered uncool, unpresentable and unmarketable. A wide range of dynamic web programming technologies and standards were born in this era and they continued to thrive thanks to the unprecedented growth and evolution of the Internet. However today if we take a closer look at how the Internet is being used in our day-to-day lives and in the business world, one thing becomes hauntingly obvious. The era of the webapps has passed. Now is the era of the web APIs.
First of all let's take a closer look at webapps. Webapps are designed to be directly consumed by human users. The user knows the URL through which the application can be accessed and enters it into a web browser. The web browser then interacts with the remote URL by making HTTP GET requests to pull content and HTTP POST requests to submit content. HTML is used as the primary data exchange format, and how the content appears on the user’s web browser (style and formatting) is actually a huge deal. All the important information that should be communicated to the user must be embedded in the HTML payload as that’s the only thing that’s going to get rendered on the screen by the web browser. Things like HTTP status codes and headers do not play a big role in webapps, except perhaps when reporting an error (404 Not Found, 500 Internal Server Error etc) and performing a redirect (302 Found + Location header).
But webapps are increasingly becoming less interesting to both developers and users. Today it’s all about web APIs. Interestingly most of the technologies that are used to develop webapps can also be used to create and expose web APIs. In fact it’s not wrong to say that web APIs are the next generation webapps and webapp development frameworks mutated into web API development frameworks as a consequence of the natural evolution of the web.
So how do web APIs differ from webapps? And what makes them cooler? Unlike webapps, web APIs are not designed to be directly consumed by human users. They are APIs, meaning they are designed to be consumed by other applications. Developers can use web APIs to construct other high level APIs and end-user applications. The end-user may or may not know the exact location or the URL with which the application is interacting. Web APIs may use any content exchange format but JSON and XML are the most popular choices. A properly designed web API would use most of the available HTTP methods (at the very least GET, POST, PUT, DELETE and OPTIONS) combined with the proper use of HTTP status codes and headers to pass critical control information. Things like layout and formatting don’t mean anything in the web API world, but effective use of URL patterns, intuitiveness and simplicity of the APIs mean everything.
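To make the point about status codes and headers a bit more concrete, here is a small sketch of a web API written as a plain WSGI callable, assuming Python 3 and nothing outside the standard library. Everything in it (the ORDERS data, the api function and the call helper) is hypothetical and only meant for illustration. Notice that the client learns what happened from the status line and the Allow header, not from any markup in the body.

```python
import json

# Hypothetical in-memory data standing in for a real datastore.
ORDERS = {"42": {"drink": "Latte"}}


def api(environ, start_response):
    """Tiny read-only API: the status line and headers carry the control information."""
    method = environ["REQUEST_METHOD"]
    order_id = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    if method != "GET":
        # Tell the client which methods it may use instead of rendering an error page.
        status, extra, body = "405 Method Not Allowed", [("Allow", "GET")], {"error": "Method Not Allowed"}
    elif order_id in ORDERS:
        status, extra, body = "200 OK", [], ORDERS[order_id]
    else:
        status, extra, body = "404 Not Found", [], {"error": "Not Found"}
    payload = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(payload)))] + extra
    start_response(status, headers)
    return [payload]


def call(app, method, path):
    """Invoke a WSGI app in-process and return (status, headers, parsed JSON body)."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return seen["status"], seen["headers"], json.loads(body.decode("utf-8"))
```

For example, call(api, "POST", "/order/42") hands back a 405 status together with an Allow header, which a client program can act on without parsing any HTML.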
The end-user applications that are developed using web APIs can include desktop applications and, more interestingly, mobile applications. In my opinion this is the last nail in the coffin of the webapps. Simply put, people are no longer browsing the web. Rather they use mobile apps. Latest reports and Internet usage surveys show that the amount of mobile Internet traffic is starting to surpass the amount of desktop Internet traffic (see the references section). This means the webapps are increasingly becoming obsolete. To point out a real world example let's take GitHub. GitHub is a pretty cool webapp, one of the best in my personal opinion. We can consume this app in a desktop environment using a traditional web browser such as Firefox. We can do the same with a smart phone, using the mini web browser the device is equipped with (e.g. Safari for iPhones). But that’s not good enough for us. We need a native GitHub app for our smart phones. And as a result we now have the GitHub app for iPhone and Android platforms. Soon these mobile apps will become the dominant traffic sources for GitHub. This has already happened to many of the popular social networking service providers such as Twitter and Facebook. The web API is a more critical component in these systems compared to their webapp counterparts.
This transition from webapps to web APIs is a crucial one for technology driven companies. They have to carefully assess this trend and adjust their game plans accordingly. Many organizations have already realized the shift towards web APIs and started to expose their business functionality as web APIs rather than as webapps. Web APIs open up businesses to a larger and more diverse clientele with ample future proofing and more room for change. This massive push towards APIs has also given birth to the field of API management, which has now become a business of its own and is starting to produce many lucrative business opportunities around the world. The growing number of on-premise and cloud API management solutions (Layer7, Apigee, Mashery etc) available is a testament to this fact.
Perhaps the most impacted by this change are the developers. They need to take a whole different stance in the way they think about software solutions that run over the web. They should learn to implement systems for other developers rather than for end users. They should learn to optimize systems for machine-to-machine interaction rather than for human-computer interaction. They need to pay a lot of attention to little things like using proper HTTP codes, using proper HTTP headers, and adhering to open standards. Problems they are used to wrestling with, such as improving the browser compatibility of webapps and session management, are becoming less important by the day. They will have to learn to pay less attention to things like form authentication and adopt more HTTP-friendly security mechanisms such as BasicAuth and OAuth. API management is going to become a mainstream technology and developers will have to start treating it as a primary development tool just like version control and issue tracking. Technology platform providers will also have to take this shift seriously. Developers no longer need webapp development frameworks. They need web API development frameworks. This is why platforms like Google App Engine have become so successful. Their staying power resides in their ability to facilitate the development of powerful web APIs with very little code.
So does all this mean traditional webapps are dead (as in dead for good)? Well, not really. The need for webapps will always be there, at least for the foreseeable future. But we will see more dominant deployment and adoption of web APIs compared to webapps. Webapps will become the COBOL of the 21st century: thousands of lines of code are written every year, but nobody gives a damn. Newly implemented webapps will sit on a layer of web APIs, making the webapp just one of the many frontends of a larger system.
All in all, if you’re a developer, some very exciting times are ahead of you. So buckle up!