Ramble On: Grid

I was catching up on some wreading (web-reading - geddit?) the other day and stumbled across Julian Browne's blog. Julian was Head of Architecture at Virgin Mobile, which commissioned my company to build a space-based reliable transaction layer into the web platform they were putting together at the time. I did a write-up, which can be found in the 'whitepapers' section of the PSJ site. There's also a more Giga-centric write-up here.

Anyway, towards the end of Julian's time at VM he was experiencing the joys of being consulted at by one of the big four consultancies. Julian's blog has a very droll take on the experience of being involved in externally-driven "change programmes" - that's 'change' as in you won't have any left after they've picked your pocket.

Julian's blog uses a child-friendly Scooby-Doo analogy to make his point. Having been a spectator at a few of these events I'd characterize it more as gang rape. My heart has sunk on more than one occasion when a client I've been working for, one with a few teething problems but generally stumbling along in the right direction (and that's all we can really hope for, kids), has gone down the death-by-consultancy track.

Reading Julian's posting reminded me of one project I worked on for a now-defunct investment bank, which was a classic case in point. The company I worked for at the time was helping them implement a system that, the previous time around at another bank, had taken 7 of our people and 5 of theirs and been delivered in a respectable timeframe. Within minutes of engaging said consulting firm the sky became black with parachuting consultant types. Hmmm, the metaphors are getting mixed as the anger builds ;-) They even tried to get away with parachuting in an architect team to tell my own people how to use my company's own product!

Result: a project that, with a bit of proper defect analysis, would probably have come in on budget and a couple of months late ended up (you probably guessed this bit) cancelled six months later, having overspent its budget by 100%. Our consulting friends had 50 billing heads at the peak and by my reckoning walked away with around 70% of the project spend, having contributed at best zip and at worst (IMHO) been the prime cause of the project's failure.

Moral: go and work for a big consulting firm, clearly ;-)

Anyway, take a look at Julian's article which says it all much better than I can...

Following my post on JMX and the Grid, which got picked up by TheServerSide and by Nati Shalom's blog, I thought I'd add some more brief thoughts on another complementary JMX pattern we've used in conjunction with grid applications.

The original post talks about collating client-side access to a distributed population of JMX MBeans that together make up the application. In essence, the technique described in that post is to use a JavaSpace (or other rendezvous technology) as a point of registration and lookup. This gives the client side access to (say) a list of MBeans for each instance of a given component type, wherever it's running in the grid, and lets the agent communicate with any of those MBeans to get/set attributes, invoke management operations or listen for notification events.
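To make the registration-and-lookup idea concrete, here is a minimal sketch in Java. It is not the code from the earlier post: the class MBeanRendezvous, its register/lookup methods, the example ObjectNames and the JMX service URLs are all hypothetical, and a ConcurrentHashMap stands in for the JavaSpace. In a real deployment each registration would be a Jini Entry written with JavaSpace.write and found with a template read. The record needs Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical in-memory stand-in for the space used as a rendezvous point.
// A real deployment would write each Registration to a JavaSpace as an Entry.
class MBeanRendezvous {

    // One registration per MBean: the component type it belongs to, the JMX
    // connector address of the JVM hosting it, and the MBean's ObjectName.
    record Registration(String componentType, String serviceUrl, ObjectName name) {}

    private final Map<String, List<Registration>> byType = new ConcurrentHashMap<>();

    // Called by each component instance when it starts up somewhere in the grid.
    void register(String componentType, String serviceUrl, String objectName) {
        ObjectName name;
        try {
            name = new ObjectName(objectName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad ObjectName: " + objectName, e);
        }
        byType.computeIfAbsent(componentType, t -> new CopyOnWriteArrayList<>())
              .add(new Registration(componentType, serviceUrl, name));
    }

    // Called by the agent: every registered MBean of one component type,
    // wherever the instances happen to be running.
    List<Registration> lookup(String componentType) {
        return List.copyOf(byType.getOrDefault(componentType, List.of()));
    }

    public static void main(String[] args) {
        MBeanRendezvous space = new MBeanRendezvous();
        space.register("pricer", "service:jmx:rmi:///jndi/rmi://node1:9999/jmxrmi",
                "app:type=Pricer,instance=1");
        space.register("pricer", "service:jmx:rmi:///jndi/rmi://node2:9999/jmxrmi",
                "app:type=Pricer,instance=2");
        for (Registration r : space.lookup("pricer")) {
            System.out.println(r.name() + " @ " + r.serviceUrl());
        }
    }
}
```

An agent would then open a connection to each service URL returned by lookup (for example with JMXConnectorFactory.connect) and talk to the named MBean there.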

The client side (or "agent") of JMX is by nature pretty dumb. Generally the agent uses the MBean's metadata to generate a UI on the fly. Although it's possible to write custom JMX agents for your application (and we do that), to make sure your management MBeans will work with any JMX agent you really have to design for the lowest-common-denominator agent.

So let's consider the use-case where our MBeans are collecting stats about (say) our application's performance: average task execution time, latency etc. Stats can be produced for each individual component and made available via its MBean, but we also want to be able to see an aggregated view of the statistics for the application as a whole.
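As an illustration of a per-component stats MBean that even a dumb agent can display, here is a minimal sketch using the standard JMX API. The names TaskStatsMBean, TaskStats, recordTask and the ObjectName are hypothetical. The getters on the public management interface become read-only attributes, which is exactly what a generic agent discovers through the MBean's metadata.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class TaskStatsSketch {

    // The management interface a generic JMX agent sees. It must be public;
    // the getters become the read-only attributes TaskCount and AverageExecutionMillis.
    public interface TaskStatsMBean {
        long getTaskCount();
        double getAverageExecutionMillis();
    }

    // Per-component statistics. recordTask is not on the interface, so it is
    // called by the component itself and is not exposed to the agent.
    public static class TaskStats implements TaskStatsMBean {
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong totalMillis = new AtomicLong();

        public void recordTask(long elapsedMillis) {
            totalMillis.addAndGet(elapsedMillis);
            count.incrementAndGet();
        }

        @Override public long getTaskCount() { return count.get(); }

        // The two counters are read separately, so a read racing with recordTask
        // can be marginally off; acceptable for monitoring purposes.
        @Override public double getAverageExecutionMillis() {
            long n = count.get();
            return n == 0 ? 0.0 : (double) totalMillis.get() / n;
        }
    }

    public static void main(String[] args) throws Exception {
        TaskStats stats = new TaskStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("app:type=TaskStats,component=Pricer,instance=1");
        server.registerMBean(new StandardMBean(stats, TaskStatsMBean.class), name);

        stats.recordTask(40);
        stats.recordTask(60);

        // What a generic agent does: read an attribute by name via the MBean server.
        System.out.println(server.getAttribute(name, "AverageExecutionMillis")); // 50.0
    }
}
```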

Aggregation

To deal with the dumb JMX agent we really need to collate and aggregate the data server-side. I'm not going to dwell too much on the approach to this, other than to say aggregation might be done in one of three ways:

Writing a server-side component that collects stats from the individual MBeans and aggregates them (the approach outlined in my previous JMX piece might be handy here)
Tapping into the underlying components through some application-specific API and aggregating from there
Having the components publish their stats into a JavaSpace, with an aggregating component attached to the space to perform the aggregation

Focussing on the last of these approaches for a moment, using the space as a rendezvous point for collation and aggregation has some merits: publication of stats as POJOs to the space is easy and listening to those publications to trigger aggregation is also simple to implement.
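A minimal sketch of the space-as-rendezvous approach follows. A BlockingQueue stands in for the JavaSpace, and the names StatsEntry, Aggregator, onPublish and drain are hypothetical; in a real system the aggregator would be triggered by a space notification rather than by draining a queue. It also assumes, as a design choice of this sketch rather than something from the post, that each component publishes a cumulative snapshot of raw counts and totals instead of a ready-made average, so that the application-wide average can be weighted correctly.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class SpaceAggregationSketch {

    // The stats POJO each component publishes: a cumulative snapshot of
    // raw count and total time, so averages can be combined correctly.
    record StatsEntry(String componentId, long taskCount, long totalMillis) {}

    static class Aggregator {
        // Latest snapshot per component; a newer publication replaces the older one.
        private final Map<String, StatsEntry> latest = new ConcurrentHashMap<>();

        void onPublish(StatsEntry e) {
            latest.put(e.componentId(), e);
        }

        // Take everything published so far (stands in for a space notification).
        void drain(BlockingQueue<StatsEntry> space) {
            StatsEntry e;
            while ((e = space.poll()) != null) {
                onPublish(e);
            }
        }

        // Application-wide average weighted by task count. Averaging the
        // per-component averages would over-weight lightly loaded components.
        double averageMillis() {
            long n = 0;
            long total = 0;
            for (StatsEntry e : latest.values()) {
                n += e.taskCount();
                total += e.totalMillis();
            }
            return n == 0 ? 0.0 : (double) total / n;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<StatsEntry> space = new LinkedBlockingQueue<>();
        space.add(new StatsEntry("pricer-1", 10, 1_000)); // 100 ms per task
        space.add(new StatsEntry("pricer-2", 90, 900));   // 10 ms per task
        Aggregator agg = new Aggregator();
        agg.drain(space);
        System.out.println(agg.averageMillis()); // 19.0
    }
}
```

The example shows why the raw totals matter: the naive mean of the two per-component averages would be 55 ms, while the task-weighted figure is 19 ms.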

Publication to JMX

Regardless of the approach to aggregation, we also need a technique for making the aggregated stats available to the dumb JMX agent. The aggregating component needs to expose an MBean that provides access to the aggregated data values. In a simple application these can be held as in-memory values within the aggregating component. However, to deal with large data volumes and to provide fault-tolerance, we prefer the following approach:

Aggregating components write the results back to the JavaSpace
A stateless component provides an MBean that acts as a facade to the aggregated data, which is actually fetched on demand from the space

Using the GigaSpaces product we can rely on the space itself to manage a live, reliable backup of our aggregated data, and on the Service Grid to host and maintain our stateless MBean facade for the aggregated data.
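The two-step approach above (aggregator writes results back to the space, a stateless MBean facade reads them on demand) can be sketched as follows. ResultStore is a hypothetical in-memory stand-in for the space, and the metric keys, interface and class names are illustrative only. The point the sketch tries to show is that the facade holds no data of its own, so it can be restarted or relocated by the grid without losing anything.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class AggregateFacadeSketch {

    // Hypothetical stand-in for the space holding aggregated results, keyed by
    // metric name. With GigaSpaces the space itself would keep a live backup.
    public static class ResultStore {
        private final Map<String, Double> results = new ConcurrentHashMap<>();
        public void write(String metric, double value) { results.put(metric, value); }
        public double read(String metric) { return results.getOrDefault(metric, 0.0); }
    }

    // What the dumb JMX agent sees: two read-only attributes.
    public interface AggregatedStatsMBean {
        double getAverageExecutionMillis();
        double getMaxLatencyMillis();
    }

    // Stateless facade: only a reference to the store, no cached values.
    // Every attribute read is fetched from the store at the moment it is asked for.
    public static class AggregatedStats implements AggregatedStatsMBean {
        private final ResultStore store;
        public AggregatedStats(ResultStore store) { this.store = store; }
        @Override public double getAverageExecutionMillis() { return store.read("avgExecutionMillis"); }
        @Override public double getMaxLatencyMillis() { return store.read("maxLatencyMillis"); }
    }

    public static void main(String[] args) throws Exception {
        ResultStore space = new ResultStore();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("app:type=AggregatedStats");
        server.registerMBean(new StandardMBean(new AggregatedStats(space), AggregatedStatsMBean.class), name);

        // The aggregating component writes its results back to the space...
        space.write("avgExecutionMillis", 19.0);
        space.write("maxLatencyMillis", 250.0);

        // ...and a generic agent reading the facade sees the current values.
        System.out.println(server.getAttribute(name, "AverageExecutionMillis")); // 19.0
    }
}
```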

Summary

Although in our simple stats-aggregation use-case we might not care about dropping data or about fault-tolerance, there are many real-world examples where we would care far more about these issues. For those, the same bare-bones architecture applies well: the space serves as both a rendezvous point and a safe holding repository, with access provided through stateless service components.

One of the reasons I'm a fan of GigaSpaces and space-based architectures is that a number of architectural choices that are traditionally hard-wired (transactional vs. non-transactional, synchronous vs. asynchronous replication) can be changed through configuration alone. This lets common design patterns (and therefore components) be applied to a wide range of application problems, because the data integrity/performance trade-off can be tweaked at a late stage of application assembly.

I know this last paragraph is a bit of a leap from the initial topic, but I'll return to this theme in later postings which discuss other use-cases where data integrity and fault-tolerance are a significant issue, in an attempt to make it stand up.

Monday, 1 December 2008

Julian's Box of Puppies

Friday, 14 November 2008

Data Aggregation via JMX and the Grid