Wednesday 2 March 2011

Introducing “Sydney Trains & Ferries”

So my first app submission is in, and it’s a great feeling.

There’s a little bit more to ST&F than a phone app with a bit of data, so I thought it’d be a good idea to break it all down. ST&F is probably more Azure than it is Windows Phone. Not expecting to make much on it also presents some interesting dilemmas, and it really challenged me to be razor sharp about what’s really important when it comes to cloud architecture. In other words – cost.

The WP7 app

The app is relatively straightforward. While uniqueness can be good, sticking to conventions allows users to focus on achieving their goals rather than wowing at chrome.

I need to give a big shout out to the developer community, for tools and guidance:

The source data

ST&F gets its data from http://www.131500.com.au. Once unzipped, it’s a 1.5-2GB XML dataset comprising about 70 different transport services in and around Sydney. The schema itself is quite challenging to deal with. My first attempt at transforming a complete dataset took around 3 hours to run on my desktop PC, which is far from ideal. So as they say … to the cloud!

The data upload routine has now been farmed out to Windows Azure worker roles. The transformation now takes around 15 minutes to run using 4 worker roles; spinning up the instances often takes more time than the run itself (hopefully an issue that will be addressed by the Azure compute team one day). So for about a dollar I save myself a whole bunch of frustration. It still has its issues – some documents are co-dependent, so there’s some workflow involved that’s invoked manually. In the near future this will be mitigated, but that’s a subject for another post.
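To make the co-dependency problem a bit more concrete, here is a minimal sketch in Python. It is not the code ST&F runs: the thread pool merely stands in for the four worker roles, and the function name, file names, and the idea of grouping documents into ordered "stages" are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_stages(stages, transform, workers=4):
    """Transform documents in parallel, one stage at a time.

    `stages` is a list of lists of document names (hypothetical). Documents
    within a stage are independent of each other; a stage starts only after
    every document in the previous stage has finished.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            # Consuming pool.map blocks until the whole stage is done,
            # which is what enforces the ordering between stages.
            for name, output in zip(stage, pool.map(transform, stage)):
                results[name] = output
    return results
```

Calling `run_in_stages([["stops.xml"], ["route_a.xml", "route_b.xml"]], transform)` would process the stops document first, then both route documents side by side. Encoding the order in data like this is one way the manual workflow step could eventually go away.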

Oh, did I mention I despise XML namespaces? They are everything that’s wrong with the world …. well almost.
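For anyone facing a similarly large, namespace-heavy file, one common way to cope is to stream the document and strip the namespace from each tag as it goes by. The sketch below uses Python's standard library and is only an illustration: the element names and the helper names are made up and are not the real 131500 schema or my actual transformation code.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, wanted):
    """Stream `source` and yield one dict per element whose local name is `wanted`.

    Each dict maps the local names of the element's direct children to their
    text. `wanted` (e.g. "Stop") is a hypothetical element name.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == wanted:
            yield {local_name(child.tag): child.text for child in elem}
            # Free the element's contents so a multi-GB file never has to
            # sit in memory all at once.
            elem.clear()
```

`source` can be a file path or an open binary file, so the same function works on the unzipped dataset and on a small in-memory sample for testing.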

The eternal problem of (current) cloud services

I’ll give you 2 guesses …. oh you got it on the first – cost.

While cloud compute excels at handling the cost of burst loads like data transformations, what it sucks at is low volume. The minimum cost of an SLA-compliant web solution is US$180. Eeep. Sure, you can use a standalone extra small instance, but you’re still looking at $40+, and for less than that I can use old-fashioned web hosting, but where’s the fun in that?

Time to rethink what you thought about ‘web services’.

App Data

What I realised while building the app is that while on the surface one thinks of “querying” service information, most lookups are in fact unique key-based requests like “Get Service Times For Stop Location”. As my previous posts spoil – I started from a typical “Table/Entity” mindset and tried this in Azure Table Store with frustrating results. It’s a bit challenging querying Table Store from WP7, which meant I’d most likely have to run a web role, which was a cost I was keen to avoid.

Taking a step back, I realised that each request can actually be identified uniquely – things like “Chatswood Station” or “Bankstown Line” – which means each request can be represented as a REST resource. So, blobs to the rescue! The data processing done in Azure now transforms hierarchical, semi-relational data into thousands of static blobs. Sure, there’s a ton of duplicated data, but blob store is cheap as chips per GB and I don’t need to run a web role. I have to watch out for the number of requests, but that’s mitigated by caching in the Sterling database.
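The idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than the real pipeline: the blob paths, the JSON layout, the row shape, and the function names are all invented for the example, and a plain dict stands in for the on-phone cache.

```python
import json
import re
from collections import defaultdict


def slug(name):
    """Turn a display name like 'Chatswood Station' into a URL-safe key."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_blobs(rows):
    """Group flat (stop, line, time) rows into one JSON document per stop.

    Returns a dict of blob path -> blob body. The 'stops/<slug>.json' layout
    is a made-up convention, not the real ST&F one.
    """
    by_stop = defaultdict(list)
    for stop, line, time in rows:
        by_stop[stop].append({"line": line, "time": time})
    return {
        f"stops/{slug(stop)}.json": json.dumps({"stop": stop, "services": services})
        for stop, services in by_stop.items()
    }


def get_cached(path, cache, fetch):
    """Return the blob at `path`, calling `fetch` only on a cache miss."""
    if path not in cache:
        cache[path] = fetch(path)
    return cache[path]
```

Every lookup the app needs becomes a plain GET for a predictable path, so no web role has to sit in front of the data, and repeat lookups never leave the phone. The price is duplication: a service that calls at ten stops ends up written into ten blobs.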

In Summary

I had a lot of fun, but this is just the beginning. First the phone app needs to pass cert, then there are still a number of perspectives and views I need to present. The Azure processing pipeline is also undergoing some re-engineering, so stay posted.
