Ramble On: 2008

Monday, 1 December 2008

Julian's Box of Puppies

I was catching up on some wreading (web-reading - geddit?) the other day and stumbled across Julian Browne's blogsite. Julian was the Head of Architecture at Virgin Mobile when they commissioned my company to build a space-based reliable transaction layer into the web platform they were putting together at the time. I did a write-up which can be found on the PSJ site in the 'whitepapers' section. There's also a more Giga-centric write-up here.

Anyway, towards the end of Julian's time at VM he was experiencing the joys of being consulted at by one of the big four consultancies. Julian's blog has a very droll take on the experience of being involved in externally-driven "change programmes" - that's 'change' as in you won't have any left after they've picked your pocket.

Julian's blog uses a child-friendly Scooby-Doo analogy to make his point. Having been a spectator at a few of these events I'd characterize it more as gang rape. My heart has sunk on more than one occasion when a client I've been working for, one with a few teething problems but generally stumbling along in the right direction (and that's all we can really hope for, kids), goes down the death-by-consultancy track.

Reading Julian's posting made me recall one project I worked on for a now-defunct investment bank that was a classic case in point. The company I worked for at the time was working with them to implement a system that, the previous time around with another bank, had taken 7 of our people and 5 of theirs and been delivered in a respectable timeframe. Within minutes of engaging said consulting firm the sky became black with parachuting consultant types. Hmmm, the metaphors are getting mixed as the anger builds ;-) They even tried to get away with parachuting in an architect team to tell my own people how to use my company's own product!

Result: a project that, with a bit of proper defect analysis, would probably have come in on budget, if a couple of months late, ended up (you probably guessed this bit) cancelled six months later, having overspent the budget by 100%. Our consulting friends had 50 billing heads at the peak and by my reckoning walked away with around 70% of the project spend, having contributed at best zip and at worst (IMHO) being the prime cause of project failure.

Moral: go and work for a big consulting firm, clearly ;-)

Anyway, take a look at Julian's article which says it all much better than I can...

Friday, 14 November 2008

Data Aggregation via JMX and the Grid

Following my post on JMX and the Grid, which got picked up by The Server Side and by Nati Shalom's blog here, I thought I'd add some more brief thoughts on another complementary JMX pattern we've used in conjunction with grid applications.

The original post talks about collating client-side access to a distributed population of JMX MBeans that comprise the application. In essence, the technique described in that post uses a JavaSpace (or other rendezvous technology) as a point of registration and lookup. This gives the client side access to (say) a list of MBeans for each instance of a given component type, wherever it's running in the grid, and lets the agent communicate with any MBean to get/set attributes, invoke management operations or hear notification events.

The client-side (or "agent") of JMX is by nature pretty dumb. Generally the agent uses metadata info about the MBean to generate a UI on the fly. Although it's possible to write custom JMX agents for your application (and we do that), to make sure your management MBeans will work with any JMX agent you really have to design to the lowest common denominator agent.

So let's consider the use-case where our MBeans are collecting stats about (say) our application's performance: average task execution time, latency etc. Stats can be produced for each individual component and made available via the MBean, but we also want to be able to see an aggregated view of the statistics for the application as a whole.

Aggregation

To deal with the dumb JMX agent we really need to collate and aggregate the data server-side. I'm not going to dwell too much on the approach to this, other than to say aggregation might be done in one of three ways:

Writing a server-side component that collects stats from individual MBeans and aggregates them. In this case, using the approach outlined in my previous JMX piece might be handy
Tapping into the underlying components using some application-specific API and aggregating from there
Having the components publish their stats into a JavaSpace and attaching an aggregating component to the space to perform the aggregation.

Focussing on the last of these approaches for a moment, using the space as a rendezvous point for collation and aggregation has some merits: publication of stats as POJOs to the space is easy and listening to those publications to trigger aggregation is also simple to implement.
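As a rough sketch of the aggregation step itself, assume each component periodically writes a hypothetical StatsEntry POJO carrying a task count and a total execution time (the space write and the listener that triggers aggregation are left out). One detail worth getting right is that the application-wide average has to be weighted by volume, not computed as an average of the per-component averages. StatsEntry and StatsAggregator are illustrative names, not part of any product API.

```java
import java.util.List;

// Hypothetical stats POJO a component might write to the space each reporting period.
// Public no-arg constructor and public fields keep it simple to store as a space entry.
class StatsEntry {
    public String componentId;
    public long taskCount;        // tasks executed in the period
    public long totalExecMillis;  // summed execution time of those tasks

    public StatsEntry() { }

    public StatsEntry(String componentId, long taskCount, long totalExecMillis) {
        this.componentId = componentId;
        this.taskCount = taskCount;
        this.totalExecMillis = totalExecMillis;
    }
}

// Hypothetical aggregator, e.g. triggered by a space notification when new entries arrive.
final class StatsAggregator {
    private StatsAggregator() { }

    // Application-wide average: total time over total tasks, weighting each component by
    // its volume. Averaging per-component averages would over-weight lightly loaded components.
    static double averageExecMillis(List<StatsEntry> entries) {
        long tasks = 0;
        long millis = 0;
        for (StatsEntry e : entries) {
            tasks += e.taskCount;
            millis += e.totalExecMillis;
        }
        return tasks == 0 ? 0.0 : (double) millis / tasks;
    }
}
```

For example, with one component reporting 100 tasks in 1,000 ms (10 ms average) and another 300 tasks in 6,000 ms (20 ms average), the weighted figure is 7,000 / 400 = 17.5 ms, whereas a naive mean of the two averages would report 15 ms.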

Publication to JMX

Regardless of the approach to aggregation, we also need a technique for making the aggregated stats available to the dumb JMX agent. The aggregating component needs to expose an MBean to provide access to the aggregated data values. In a simple application these can be held as in-memory values within the aggregating component. However, to deal with large data volumes and to provide fault-tolerance we prefer the following approach:

Aggregating components write the results back to the JavaSpace
A stateless component provides an MBean that acts as a facade to the aggregated data, which is actually fetched on demand from the space

Using the GigaSpaces product we can rely on the space itself to manage live reliable backup of our aggregated data and the Service Grid to host and maintain our stateless aggregated MBean facade.
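A minimal sketch of the stateless facade, using only the standard JMX API. The AggregateStore interface is a hypothetical stand-in for reading the aggregated entry back out of the space; because the MBean holds no data of its own, any instance hosted by the Service Grid can serve the attributes and a restarted one loses nothing. The ObjectName psj.example:type=AggregatedStats is also made up for illustration. Standard MBean interfaces must be public and named after the implementing class plus "MBean", so both are nested in a holder class to keep the sketch in one file.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class StatsFacade {
    // Hypothetical stand-in for reading the aggregated result entry back out of the space.
    public interface AggregateStore {
        double averageExecMillis();
        long totalTaskCount();
    }

    // Standard MBean interface: JMX derives read-only attributes from the getters.
    public interface AggregatedStatsMBean {
        double getAverageExecMillis();
        long getTotalTaskCount();
    }

    // Stateless facade: keeps no copy of the data, it reads through to the store on every call.
    public static class AggregatedStats implements AggregatedStatsMBean {
        private final AggregateStore store;

        public AggregatedStats(AggregateStore store) {
            this.store = store;
        }

        @Override public double getAverageExecMillis() { return store.averageExecMillis(); }
        @Override public long getTotalTaskCount() { return store.totalTaskCount(); }
    }

    // Registers the facade with the JVM's platform MBeanServer under a hypothetical name.
    static ObjectName register(AggregateStore store) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("psj.example:type=AggregatedStats");
        mbs.registerMBean(new AggregatedStats(store), name);
        return name;
    }
}
```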

Summary

Although in our simple aggregating stats use-case we might not care about dropping data or fault-tolerance, there are many real-world examples where we would care far more about these issues. The bare-bones architecture of using the space as both a rendezvous point and a safe holding repository, with access via stateless service components, applies well to those cases too.

One of the reasons I'm a fan of GigaSpaces and space-based architectures is that a number of architectural choices that are traditionally hard-wired (transactional or non-transactional, synchronous or asynchronous replication) can be changed through configuration only. This enables common design patterns (and therefore components) to be applied to a wide range of application problems, by allowing the data integrity/performance equation to be tweaked at a late stage of application assembly.

I know this last paragraph is a bit of a leap from the initial topic, but I'll return to this theme in later postings which discuss other use-cases where data integrity and fault-tolerance are a significant issue, in an attempt to make it stand up.

Monday, 3 November 2008

JMX For Grid-Based Applications

This post describes a (fairly) simple technique that solves the problem of how to get a unified view of JMX management beans in a distributed application. This is proving useful in a number of application projects we are doing at PSJ where the application MBeans, for example worker beans, are distributed around the network when deployed in the GigaSpaces Service Grid. The technique describes how a new protocol can be added to enable JMX agents and JMX servers to find each other in a network deployment. The protocol uses the GigaSpace as the rendezvous point, but the approach can easily be adapted to work with other network registration and rendezvous technologies.

A Brief History of JMX

JMX has been around for a long time as one of the core APIs in J2EE, and recent versions of Java have incorporated it into the JVM to provide memory and other basic stats. The basics of JMX are pretty simple: you instrument your application by providing one or more management beans (MBeans). MBeans expose read-only and read-write attributes, and operations, which provide information and enable the application's management characteristics to be controlled. MBeans are published to the world by registering them with an MBeanServer, and are interacted with by a separate agent, which finds MBeans in the MBeanServer and provides a UI to interact with them. This UI can be a generic interface generated from the metadata the MBean provides, or as specific to the application as required.
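To make the moving parts concrete, here's a small sketch of a standard MBean showing a read-only attribute, a read-write attribute and an operation, plus the kind of metadata-driven inspection a generic agent performs. JMX derives attributes from getter/setter pairs on the MBean interface and treats its other methods as operations. JmxBasics, Worker and describe are hypothetical names used only for illustration.

```java
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class JmxBasics {
    // Standard MBean interface (public, named after the class plus "MBean").
    // Getter only = read-only attribute; getter + setter = read-write; anything else = operation.
    public interface WorkerMBean {
        int getTasksCompleted();   // read-only attribute "TasksCompleted"
        int getPoolSize();         // read-write attribute "PoolSize"
        void setPoolSize(int size);
        void resetCounters();      // management operation
    }

    // Hypothetical worker component being instrumented.
    public static class Worker implements WorkerMBean {
        private volatile int tasksCompleted = 42; // pretend some work has been done
        private volatile int poolSize = 4;

        @Override public int getTasksCompleted() { return tasksCompleted; }
        @Override public int getPoolSize() { return poolSize; }
        @Override public void setPoolSize(int size) { poolSize = size; }
        @Override public void resetCounters() { tasksCompleted = 0; }
    }

    // What a generic agent does: read the MBean's metadata and build a view from it.
    static String describe(MBeanServerConnection server, ObjectName name) throws Exception {
        MBeanInfo info = server.getMBeanInfo(name);
        StringBuilder view = new StringBuilder();
        for (MBeanAttributeInfo attr : info.getAttributes()) {
            view.append(attr.getName())
                .append(attr.isWritable() ? " (rw) = " : " (r) = ")
                .append(server.getAttribute(name, attr.getName()))
                .append('\n');
        }
        for (MBeanOperationInfo op : info.getOperations()) {
            view.append(op.getName()).append("()\n");
        }
        return view.toString();
    }
}
```

Registering a Worker with an MBeanServer under a name such as psj.example:type=Worker,name=worker1 makes it visible to an agent like jconsole, which drives its UI through the same getMBeanInfo, getAttribute, setAttribute and invoke calls.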

The original JMX specs were charmingly vague about where you might find an MBeanServer to register with or how you'd connect to one from the agent side. Since Java 5, the JVM provides a default (platform) MBeanServer. As well as providing management capability for the JVM, this MBeanServer can act as a repository for application MBeans. The jconsole tool provided with Java 1.6 offers a decent, if primitive, GUI for you to see the MBeans registered with the JVM's MBeanServer and interact with them.

JMX JSR-160 Connectors

JMX also now provides connectors that facilitate remote connection between agent and server, and come in two flavours: server-side and client-side. The server-side connector enables you to set up a connection channel to talk to the MBeanServer remotely, for example by exporting an RMI stub. The client-side connector provides the means of binding to the server-side connector and establishing a connection, for example via JNDI lookup of the server-side connector's RMI stub.

If you use Spring, you can simply declare connectors in Spring configuration, as follows:

<bean id="serverConnector"
class="org.springframework.jmx.support.ConnectorServerFactoryBean">
<property name="objectName" value="system:name=spaceconnector"/>
<property name="serviceUrl"
value="service:jmx:rmi://localhost/jndi/rmi://localhost:1099/test"/>
<property name="environment">
<map>
 <entry key="jmx.remote.jndi.rebind" value="true" />
</map>
</property>
</bean>

The client-side connector, which must look up the same JNDI name that the server connector bound, is declared as:

<bean id="clientConnector"
class="org.springframework.jmx.support.MBeanServerConnectionFactoryBean">
<property name="serviceUrl"
value="service:jmx:rmi://localhost/jndi/rmi://localhost:1099/test"/>
</bean>
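For comparison, the same server/client pairing can be set up without Spring using the JSR-160 API directly. This sketch deliberately differs from the Spring config in one respect: it uses a service URL with no /jndi/ path, so the RMI stub is encoded into the address returned by getAddress() and no RMI registry is needed. The JNDI form used in the config requires a registry listening on port 1099 (for example one created with LocateRegistry.createRegistry(1099)). The class name ConnectorDemo is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

final class ConnectorDemo {
    private ConnectorDemo() { }

    // Starts an RMI connector server over the platform MBeanServer, connects a client
    // to it and returns the number of MBeans the client can see remotely.
    static int remoteMBeanCount() throws Exception {
        // Keep the exported RMI stub on loopback for this single-machine demo.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

        // Server side: "service:jmx:rmi://localhost" with no /jndi/ path, so no registry
        // is needed; the server encodes its RMI stub into the address once started.
        JMXServiceURL url = new JMXServiceURL("rmi", "localhost", 0);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
        try {
            // Client side: connect using the address the server reports,
            // which looks like service:jmx:rmi://localhost/stub/...
            JMXServiceURL address = server.getAddress();
            try (JMXConnector connector = JMXConnectorFactory.connect(address)) {
                MBeanServerConnection connection = connector.getMBeanServerConnection();
                return connection.getMBeanCount();
            }
        } finally {
            server.stop();
        }
    }
}
```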

JMX Issues Specific to the Grid

For monolithic applications that run in a single JVM this architecture works fine, but when applied to distributed applications running in the grid we have two additional problems:

How does the client-side find all the MBeans that comprise the application?
How can it interact with the distributed parts of the application from a single client?

This has become a real-world problem for PSJ in implementing grid-deployed applications within the GigaSpaces Service Grid and other grid fabrics. Here's the nub of the solution we came up with to solve these problems. In essence the solution has two parts:

Provide a means of binding the MBeanServers from individual JVMs into a community that represents the application on the grid.
Collate the MBeanServers together to provide a single client-side federated connector.

Communities of MBeanServers

Fortunately JMX provides a very open means of extending the protocols that are supported in bringing JMX server and client connections together. By providing an additional protocol we can control the server-side connector's registration process and the client-side connector's binding mechanism. There are various options to achieve our ends here, including naming hierarchies in JNDI and Jini lookup groups, but the one we settled on uses a JavaSpace to act as a rendezvous point for servers and clients. The first rationale is simple expedience: the applications to which we've applied this pattern already use a GigaSpace to share state, so in some sense the community is already bound by the fact that all components comprising the application point at the same GigaSpace instance. The second is that adding space-based registration and client-side lookup using the GigaSpace is very quick and easy to write, using simple POJOs to represent the registration.

Adding a Space-Based Server-Side JMX Connector

The standard RMI-based JMX URL can be changed by replacing the service:jmx:rmi part with service:jmx:space to indicate that we want to use a space protocol. The JMX spec lets us add handler classes to deal with the "space" protocol referenced in the URL on the server and client sides. On the server side we need to provide a specific ServerProvider class implementing JMXConnectorServerProvider. The ServerProvider actually just piggy-backs on the existing RMI one and, in addition to RMI stub registration, places an entry in the JavaSpace with the RMI connection URL needed by the client. To finish off, using the Spring approach we simply need to declare the server connector to use the new protocol:

<bean id="serverConnector"
class="org.springframework.jmx.support.ConnectorServerFactoryBean">
<property name="objectName" value="system:name=spaceconnector"/>
<property name="serviceUrl"
value="service:jmx:space://localhost/jndi/rmi://localhost:1099/test"/>
<property name="environment">
<map>
  <entry key="jmx.remote.jndi.rebind" value="true" />
  <entry key="jmx.remote.protocol.provider.pkgs"
    value="com.psjsolutions.sflib.jmx.protocols"/>
  <entry key="space"><ref bean="space"/></entry>
</map>
</property>
</bean>

Notice that we've had to reference our protocol support package in the environment map for the connector factory. We can also place protocol-specific properties in the environment - in this case the space we want to use to hold the entries.
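The provider itself can stay small. Below is a sketch, under stated assumptions, of what a space ServerProvider might look like. JMXConnectorServerFactory locates a provider by class name, <package from jmx.remote.protocol.provider.pkgs>.<protocol>.ServerProvider, so with the config above the real class would be com.psjsolutions.sflib.jmx.protocols.space.ServerProvider, and it must be public with a public no-arg constructor. ConnectorRegistry is a hypothetical stand-in for the space (in the real thing the "space" environment entry holds the GigaSpace). The sketch piggy-backs on the standard RMIConnectorServer, publishes the client address once the connector has started and withdraws it on stop.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerProvider;
import javax.management.remote.JMXServiceURL;
import javax.management.remote.rmi.RMIConnectorServer;

// Hypothetical stand-in for the space: somewhere to publish and withdraw connector addresses.
interface ConnectorRegistry {
    void publish(String connectorUrl);
    void withdraw(String connectorUrl);
}

// Sketch of a "space" protocol provider (package-private here only to fit in one file).
class ServerProvider implements JMXConnectorServerProvider {
    // Environment key, matching the "space" entry in the Spring config.
    static final String REGISTRY_KEY = "space";

    @Override
    public JMXConnectorServer newJMXConnectorServer(JMXServiceURL serviceURL,
            Map<String, ?> environment, MBeanServer mbeanServer) throws IOException {
        if (!"space".equals(serviceURL.getProtocol())) {
            throw new MalformedURLException("Not a space URL: " + serviceURL);
        }
        Map<String, Object> env = new HashMap<>();
        if (environment != null) {
            env.putAll(environment);
        }
        Object registry = env.remove(REGISTRY_KEY); // only this provider needs it
        if (!(registry instanceof ConnectorRegistry)) {
            throw new IllegalArgumentException("No ConnectorRegistry under key '" + REGISTRY_KEY + "'");
        }
        final ConnectorRegistry target = (ConnectorRegistry) registry;

        // Piggy-back on RMI: same host, port and URL path, with the protocol switched to rmi.
        JMXServiceURL rmiUrl = new JMXServiceURL("rmi", serviceURL.getHost(),
                serviceURL.getPort(), serviceURL.getURLPath());

        return new RMIConnectorServer(rmiUrl, env, mbeanServer) {
            @Override
            public synchronized void start() throws IOException {
                super.start();
                // Publish the real client address once the RMI side is up.
                target.publish(getAddress().toString());
            }

            @Override
            public void stop() throws IOException {
                JMXServiceURL address = getAddress();
                super.stop();
                if (address != null) {
                    target.withdraw(address.toString());
                }
            }
        };
    }
}
```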

The Client-Side Connector

I said earlier that there are two problems to overcome in applying JMX to the grid: one being rendezvous/binding and the other being obtaining a collated view of all the MBeans out there. Both these issues are addressed by the client-side connector. First off we need a client-side connector that can look in the space to find all the MBeanServer connection details for the networked community. This is pretty easy. In symmetry with the server-side provider all we need to do is to write a ClientProvider that understands the space protocol and provides the agent with a client connector. In Spring this looks like:

<bean id="clientConnector"
class="org.springframework.jmx.support.MBeanServerConnectionFactoryBean">
<property name="serviceUrl"
value="service:jmx:space:///jini://*/*/${javaspace.name}"/>
</bean>

All we've done here is to replace the RMI-based URL with one that specifies the space protocol and provides the URL of the space. The ClientProvider parses the service URL, extracts the space URL and uses it to bind to the space and retrieve all the server connector details. This brings us to the final part, which is to provide a collated "virtual" server connection that sits between the client-side user code and the set of MBeanServers.
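Parsing is the easy bit. A minimal sketch, assuming the URL shape from the client-side Spring config: with an empty host (the three slashes after "space:"), JMXServiceURL treats everything after the host as the URL path, so the space URL is simply that path minus its leading slash. SpaceUrls and spaceUrlOf are hypothetical names, and the space lookup that would follow is omitted.

```java
import java.net.MalformedURLException;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper a space ClientProvider could use to pull the space URL out of the JMX URL.
final class SpaceUrls {
    private SpaceUrls() { }

    // service:jmx:space:///jini://*/*/mySpace  ->  jini://*/*/mySpace
    static String spaceUrlOf(JMXServiceURL serviceURL) throws MalformedURLException {
        if (!"space".equals(serviceURL.getProtocol())) {
            throw new MalformedURLException("Not a space URL: " + serviceURL);
        }
        String path = serviceURL.getURLPath(); // e.g. "/jini://*/*/mySpace"
        if (path.length() < 2 || path.charAt(0) != '/') {
            throw new MalformedURLException("No space URL in: " + serviceURL);
        }
        return path.substring(1);
    }
}
```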

Federating the Client-Side Server Connection

As far as the client-side code is concerned, the connector it gets is an object that implements javax.management.MBeanServerConnection. This API is actually pretty straightforward, enabling MBeans to be found by name and query patterns, and using the ObjectName handles found to get/set attributes and invoke operations. In Java 7 there may well be a formally supported means of cascading or handing on these requests, but at the time of writing there's no such capability out of the box. We therefore implemented a FederatedMBeanServerConnection class that picks up a number of MBeanServer connections from the space, connects to them and then delegates operations to the set of servers, effectively acting as a multiplexer.
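The full MBeanServerConnection interface has a couple of dozen methods, so the sketch below shows only the shape of the multiplexing: queryNames fans out to every server and tags each returned ObjectName with a key identifying its owner, and getAttribute uses that tag to route the call back, falling back to the first server that has the name when no tag is present. FederatedSketch and the federatedServer key property are hypothetical; the remaining methods (setAttribute, invoke, notification listeners and so on) would follow the same route-by-tag pattern.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.QueryExp;

// Partial sketch of a federated connection: only queryNames and getAttribute are shown.
final class FederatedSketch {
    // Hypothetical key property that records which server owns a returned MBean.
    static final String SERVER_KEY = "federatedServer";

    private final Map<String, MBeanServerConnection> servers = new LinkedHashMap<>();

    void addServer(String serverId, MBeanServerConnection connection) {
        servers.put(serverId, connection);
    }

    // Fan the query out to every server and tag each result with its owner's id.
    Set<ObjectName> queryNames(ObjectName pattern, QueryExp query) throws Exception {
        Set<ObjectName> result = new HashSet<>();
        for (Map.Entry<String, MBeanServerConnection> entry : servers.entrySet()) {
            for (ObjectName name : entry.getValue().queryNames(pattern, query)) {
                Hashtable<String, String> props = new Hashtable<>(name.getKeyPropertyList());
                props.put(SERVER_KEY, ObjectName.quote(entry.getKey()));
                result.add(ObjectName.getInstance(name.getDomain(), props));
            }
        }
        return result;
    }

    // Route a call back to the owning server, stripping the tag first.
    Object getAttribute(ObjectName name, String attribute) throws Exception {
        String quotedId = name.getKeyProperty(SERVER_KEY);
        if (quotedId == null) {
            // Untagged name: ambiguous, so fall back to the first server that has it.
            for (MBeanServerConnection server : servers.values()) {
                if (server.isRegistered(name)) {
                    return server.getAttribute(name, attribute);
                }
            }
            throw new InstanceNotFoundException(name.toString());
        }
        MBeanServerConnection owner = servers.get(ObjectName.unquote(quotedId));
        if (owner == null) {
            throw new InstanceNotFoundException("Unknown server in " + name);
        }
        Hashtable<String, String> props = new Hashtable<>(name.getKeyPropertyList());
        props.remove(SERVER_KEY);
        return owner.getAttribute(ObjectName.getInstance(name.getDomain(), props), attribute);
    }
}
```

Every MBeanServer contains a delegate MBean registered as JMImplementation:type=MBeanServerDelegate, so federating two servers and querying for that name is an easy way to see duplicate names in action: the federated query returns two distinct tagged names, each routing to its own server's MBeanServerId attribute.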

In Summary

As we like to carry these common solutions to common problems from job to job, we've added the capabilities described here to the foundation libraries we often use to implement client engagements. By separating the federation/multiplexing capability from the space protocol we can use this approach in a number of different distributed architectures, and will probably add protocols as the need arises. The beauty of the approach is that neither the client- nor the server-side code that uses the connectors knows what's being done under the covers. It's all abstracted into protocol URLs and therefore simple configuration changes.

Are there any gotchas? The one we've hit so far is possible ambiguity in naming MBeans. In many senses the set of MBeanServers in the grid can be thought of as a single virtual networked MBeanServer. However, whilst you can't register the same named MBean twice with a single MBeanServer, there's nothing to stop you registering the same named MBean with different MBeanServers. In fact, in a grid-style environment where a given application unit is deployed as many replicated instances, it's quite likely that you will hit this issue.

Why is this important? Well, the inability to enforce unique names can lead to ambiguity when we use the virtual MBeanServer, as multiple MBeans with the same name can be found in it. We can work around this to a large extent by collating MBean queries and tagging the owning server in the ObjectName handed back through the client connector. This disambiguates things in most of the use cases, including attribute get/set and operation invocation. However, if the client code asks for a specific MBean by name then all we can do is return the first one we encounter. This is not ideal, but in practice the way to solve it is to use naming strategies when registering MBeans server-side. Spring has the idea of a pluggable NamingStrategy for MBeans that are auto-created by Spring. Using an instance-based NamingStrategy resolves this problem for Spring MBeans and, if it is a real issue, the same approach can be adopted by application code that explicitly creates and registers MBeans.
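For application code that registers MBeans explicitly, one possible naming approach is sketched below: add an instance key built from the JVM's runtime name and a per-JVM counter, so that replicated deployments of the same component don't collide. The InstanceNaming helper and the instance key are hypothetical. Note also that the runtime name's format (commonly pid@hostname) is JVM-specific, so the sketch treats it as an opaque label rather than parsing it.

```java
import java.lang.management.ManagementFactory;
import java.util.Hashtable;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for explicitly registered MBeans: adds an "instance" key made of
// the JVM's runtime name (usually distinct per JVM) and a counter (unique within the JVM).
final class InstanceNaming {
    private static final AtomicLong COUNTER = new AtomicLong();

    private InstanceNaming() { }

    static ObjectName uniqueName(String domain, String type) throws MalformedObjectNameException {
        // RuntimeMXBean.getName() is often "pid@hostname", but its format is JVM-specific,
        // so it is used as an opaque label and quoted to keep the ObjectName valid.
        String jvm = ManagementFactory.getRuntimeMXBean().getName();
        Hashtable<String, String> props = new Hashtable<>();
        props.put("type", type);
        props.put("instance", ObjectName.quote(jvm + "#" + COUNTER.incrementAndGet()));
        return ObjectName.getInstance(domain, props);
    }
}
```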

Thursday, 30 October 2008

OSGi, GigaSpaces and Buddy Classloading

I've been doing a reasonable amount of GUI development using Eclipse Rich Client Platform recently. This is a great framework and getting better. In the Eclipse 3.1 dark days I got a bit burned trying to build applications, largely because the development support for RCP within Eclipse was itself a bit flakey. There's nothing more galling than spending half a day trying to diagnose a problem only to find that restarting Eclipse clean sorts it all out. But I digress...

Eclipse RCP and Plugin Architecture

I won't wax too lyrical about the benefits of the Eclipse plugin model. If you are a Java developer you probably already understand the great value of being able to drop in plugins from around the web and rely on Eclipse to run with simultaneous multiple versions of the same jar. At my company, PSJ, we've been developing a number of operations console GUIs that interact with application services and the GigaSpaces implementation of JavaSpaces. The plugin model provides a great basis for us to build small UI plugins that interact with different service components we've developed over the years. Using Spring we're then able to wire together our re-usable plugins and UI pieces with custom components written in the context of an engagement into a customer-specific GUI.

Eclipse's ability to manage multiple versions of classes concurrently is ultimately down to the multiple-classloader OSGi mechanism that underpins the plugin architecture. So far so triffic; however, if you've ever tried to use plugins with technologies like Spring, JavaSpaces or even Log4j, you'll have encountered ClassNotFoundException problems galore. Reading around the web when I first hit this a couple of years ago, the recommendation was to hack in and manipulate the thread context classloader in application code, along these lines:

ClassLoader loader = Thread.currentThread().getContextClassLoader();
ClassLoader thisClassesLoader = this.getClass().getClassLoader();
Thread.currentThread().setContextClassLoader(thisClassesLoader);
try {
    // do your stuff here
} finally {
    // restore the original loader even if "your stuff" throws
    Thread.currentThread().setContextClassLoader(loader);
}

Not only was this pretty horrible, it was also very difficult to make bombproof: you've got to track down every point where the issue can occur, and it's problematic with technologies like GigaSpaces that use internal thread pools even for synchronous calls and deliver events to you on their own threads.

In revisiting this topic recently in some work for a client, I'm grateful to my colleague John Nichol, who has shown me the one true way: buddy classloading. Like many things in Eclipse RCP, the documentation for this is vanishingly thin and largely what I call "non-doc" - you know the kind of thing:

"To press button B, click the B button"

Arggh! Anyway, after trawling around the web and single-step debugging through the OSGi class resolution code, I have mined the following nuggets of true knowledge.

Why ClassNotFoundException in the First Place?

If you are using technologies like Spring or GigaSpaces, they both need access to the classes in your application code: in one case to instantiate instances and wire them together, and in the other to store instances in a sharable in-memory location on the network. If you put the Spring and GigaSpaces jar files inside your plugin you can't easily share instances of objects across plugins. The solution to this is to create independent plugins for Spring, GigaSpaces, Log4j etc. and then build dependencies between your higher-level plugins and these lower-level ones. So far so good, but this is where you can hit ClassNotFoundException. Let's say Spring needs to instantiate an instance of your application class Foo. Well, Foo isn't visible from the Spring plugin's classloader and you're screwed. You could also wodge all the application code and the Spring and GigaSpaces jars together into one big plugin, but then you've really lost the advantage of component separation you were trying to achieve in the first place. You'd also be back to manually manipulating the thread context classloaders. So, you're screwed, right...

Buddy Classloading

Fortunately to get you out of this jam Eclipse has a mechanism called buddy classloading. This lets you add directives to the plugin manifests to selectively delegate classloading to other friendly plugins. Add the following line:

Eclipse-BuddyPolicy: registered

to the lower-level Spring and GigaSpaces plugin manifests. This tells those plugins that they can delegate class loading to any plugin that registers with them. You also need to add a directive to each application-level plugin's manifest to perform the registration, with a line like:

Eclipse-RegisterBuddy: org.springframework, com.gigaspaces, org.apache.log4j, org.apache.commons

Going back to our Foo class example earlier, when the Spring plugin tries to instantiate a Foo instance it will fail to resolve the class from its own plugin and will then attempt to resolve from any buddies that it knows about. Foo therefore gets resolved from the application plugin that is registered with the Spring plugin. This happens no matter what thread is attempting to resolve the class and doesn't suffer from the holes that context classloader manipulation suffered from.
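To make the lookup order concrete, here's a toy model in plain Java. It is not how Equinox implements buddy policies, just an illustration of the idea: a loader whose parent is the bootstrap loader (standing in for the Spring plugin) cannot see the application's Foo class until the application's loader registers with it as a buddy, after which the lookup falls through to the buddy once normal parent delegation fails. BuddyDemo, BuddyAwareLoader and registerBuddy are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class BuddyDemo {
    // Stands in for the Spring plugin's loader. Its parent is the bootstrap loader (null),
    // so on its own it can see JDK classes but none of the application's classes.
    static final class BuddyAwareLoader extends ClassLoader {
        private final List<ClassLoader> buddies = new CopyOnWriteArrayList<>();

        BuddyAwareLoader() {
            super(null);
        }

        // Rough analogue of an application plugin listing this one in Eclipse-RegisterBuddy.
        void registerBuddy(ClassLoader buddy) {
            buddies.add(buddy);
        }

        // ClassLoader.loadClass calls findClass only after parent delegation has failed,
        // so buddies are consulted as a fallback, whichever thread is doing the loading.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            for (ClassLoader buddy : buddies) {
                try {
                    return buddy.loadClass(name);
                } catch (ClassNotFoundException notHere) {
                    // not visible from this buddy; try the next one
                }
            }
            throw new ClassNotFoundException(name);
        }
    }

    // Stands in for the application class Foo that Spring is asked to instantiate.
    public static class Foo {
        public Foo() { }
    }
}
```

Once the buddy is registered, the Class object handed back is the application's own Foo rather than a copy defined by another loader, which is what allows instances to be shared between plugins.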

Direction of Buddy Registration

One point to make clear here (coz it tripped me up when I was trying to get my feeble brain around it all) is the direction of registration. I had originally thought from reading the non-doc that registration meant the Spring and GigaSpaces plugins registering with the application plugin. Actually it's the reverse. Application plugins register with Spring and GigaSpaces because they want their classes to be accessible to those generic technologies for instantiation purposes. The confusion arises because only the application plugin can specify which generic plugins it wants to register with, and the manifest entry name Eclipse-RegisterBuddy implies (to me at least) that the plugins listed after it are registering with the current plugin.

Welcome

Hello World! (as we geeks have it).

I eventually decided to add my voice to the growing clamour of the web-literate chattering classes. This blog is probably going to end up with an unholy blend of:

Technical musings on my professional life (Java, grid, enterprise architecture, trading systems)
Authoritative statements that sound like they could be facts but probably aren't
Random personal prejudices

Enjoy...
