Sunday, October 4, 2009

A Quick Spin with Google App Engine

Google App Engine (GAE) lets you build a Web app on your local machine, then deploy it to Google’s infrastructure. I played with GAE this weekend and was impressed.

The set-up and “hello world” experience was fast and painless. The tutorial and example code were excellent. Less than 30 minutes from downloading the software development kit, I felt like I understood the basics and could start doing my own stuff. It’s a great feeling to learn that quickly. Credit goes to the GAE developers and documenters, who went the extra mile to make things easy for new users like me. Thank you.

For the sake of doing something simple but practical, I wrote an app that automatically keeps my Twitter tweets backed up in the Google cloud. It uses the Twitter API to get my tweets, then stores them in GAE’s datastore so they can be queried and displayed by various fields. As examples of the possible outputs, I created a page that displays just the tweets’ content, and another that dumps every field of every tweet as tab-delimited text.
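To give a flavor of the tab-delimited dump, here is a minimal sketch rather than the app's actual code. The tweet records and the field names (id, created_at, text) are assumptions for illustration, and it uses plain Python dictionaries instead of the datastore so it runs anywhere.

```python
import csv
import io

# Hypothetical tweet records; the field names are assumptions for illustration.
TWEETS = [
    {"id": 1, "created_at": "2009-10-03 14:05:00", "text": "Trying out App Engine"},
    {"id": 2, "created_at": "2009-10-04 09:30:00", "text": "Tabs\tand\nnewlines get flattened"},
]

FIELDS = ["id", "created_at", "text"]


def clean(value):
    """Convert a value to text and collapse whitespace runs (tabs, newlines) to single spaces."""
    return " ".join(str(value).split())


def dump_tab_delimited(tweets, fields=FIELDS):
    """Return a header row plus one tab-delimited row per tweet."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for tweet in tweets:
        writer.writerow([clean(tweet.get(name, "")) for name in fields])
    return out.getvalue()


if __name__ == "__main__":
    print(dump_tab_delimited(TWEETS), end="")
```

Collapsing whitespace before writing keeps each tweet on a single row, which matters because tweet text can itself contain tabs or line breaks.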

I got all the above done and deployed in three hours. The app is now running on Google’s infrastructure, checking for new tweets every 30 minutes and storing what it finds.

So, from my brief experience, a small Web app can be developed and deployed rapidly on GAE. Compared to other cloud platforms, which give you more flexibility at the cost of more configuration and administration, GAE seems particularly well suited to quick, small solutions. I suspect that the bigger your ambitions become, the more GAE’s simplifying aspects will become obstacles. However, depending on what you’re trying to do, there may be a lot of headroom until bigger becomes a problem. (In this context, bigger means more functionality, code, and dependencies, not more traffic. For the latter, if you do things the GAE way, your app will benefit from Google’s infrastructure and should handle as much traffic as you can attract.)

For the technically inclined, here are some additional notes:

I used GAE’s sandboxed version of Python. GAE has an equivalent for Java and other languages that run on the Java Virtual Machine.

Instead of having access to a file system or relational database, you use GAE’s datastore for the equivalent of local storage. At first, it feels like an object-relational mapping, where you define a Python class for each kind of entity you want to store. For example, you might define a class Person, with instance variables name, birthdate, and so on. If you create a Web form that allows someone to submit his or her name, birthdate, and other information, your app would take the input and instantiate a Person object, p. Storing it would be as simple as p.put().

However, GAE’s datastore is not relational, so if you go beyond retrieving all objects of type Person, you’ll need to learn some new ways of doing SQLish things. If I had gone deeper here, I’m sure I would have encountered a steeper part of the GAE learning curve. Considering this aspect of GAE is furthest from what most programmers know, it’s an area where the documentation and examples could benefit from being more extensive.

About the development process: You develop on your local machine, using the SDK’s app server and a simulated, local version of the datastore. When you make code changes, you just reload the relevant URL, and the new version of your code runs. When something breaks, you get prolific debugging info back.

Once you’ve deployed an app, the Web-based management dashboard is surprisingly good, especially the logging UI.

I saw only one inconsistency between the development version of my app running locally on my computer and the deployed version on Google’s servers: The deployed version’s Twitter API requests were often denied by the Twitter server. This was not caused by the GAE technology. Rather, it was due to other GAE apps on the same IP(s) as my app, pounding Twitter hard enough to cause Twitter’s servers to rate-limit said IP(s). It was guilt by association, cloud-computing-style.

In theory, I could have authenticated my requests to Twitter, thus avoiding the IP limit. In practice, authenticating my requests would have required either including my Twitter password with requests (distastefully insecure) or implementing OAuth (distastefully complex for this little project). So, in the name of “good enough for now,” I decided to let some requests be denied. I found that if the script checked Twitter every 30 minutes, it succeeded often enough to stay reasonably current with any changes throughout the day.
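The “let some requests be denied” approach boils down to something like the sketch below. It is an illustration, not my app's code: RequestDenied, poll_once, and fake_fetch are hypothetical names, a plain dictionary stands in for the datastore, and it assumes each successful fetch returns a window of recent tweets, so a later run can pick up whatever a denied run missed.

```python
class RequestDenied(Exception):
    """Hypothetical error raised when the remote server rejects a request."""


def poll_once(fetch, saved):
    """Try one fetch; on denial, skip quietly and rely on the next scheduled run.

    fetch: callable returning a list of tweet dicts, or raising RequestDenied.
    saved: dict mapping tweet id -> tweet, standing in for persistent storage.
    Returns the number of newly stored tweets.
    """
    try:
        tweets = fetch()
    except RequestDenied:
        # Assumption: a later fetch returns the same recent tweets again.
        return 0
    new = [t for t in tweets if t["id"] not in saved]
    for tweet in new:
        saved[tweet["id"]] = tweet
    return len(new)


if __name__ == "__main__":
    # Simulate three scheduled runs where the middle one is denied.
    responses = iter([
        [{"id": 1, "text": "first"}],
        RequestDenied(),
        [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}],
    ])

    def fake_fetch():
        item = next(responses)
        if isinstance(item, Exception):
            raise item
        return item

    saved = {}
    print([poll_once(fake_fetch, saved) for _ in range(3)])
```

Skipping tweets that are already stored is what makes an occasional denial harmless: repeated fetches of the same tweets never create duplicates, so there is no need for retry logic inside a single run.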
