Wednesday, March 07, 2007

Atom Publishing Protocol, Where is the Batch Semantics?

Batch support was proposed (PaceBatch) and heavily discussed in the Atom Publishing Protocol Working Group, but it did not make it into the specification.

Google needed it for its Google Data API and went ahead with the PaceBatch idea. There is no need to explain further why batch support is needed: PaceBatch gives a good rationale for it, and Google has a real-world use case for it, Google Base.

But, in my opinion, the PaceBatch/Google Data solution has some critical issues:
  • It mixes up the transport layer (the POST/PUT/DELETE operation and HTTP headers) with the data layer (the atom:entry element).
  • An Atom entry has to be different (it has to include transport information) just because it is submitted in a batch.
  • To support batch processing, code has to be rewritten at both the protocol level and the data handling level.
  • Large batch submissions cannot be handled with XML DOM parsers.
So, how about the following alternative (which, by the way, is suggested in the Limitations section of PaceBatch)?

[I have not considered HTTP 1.1 pipelining: section 8.1.2.2, "Pipelining", of the HTTP 1.1 specification (RFC 2616) clearly discourages pipelining non-idempotent methods, and the recommended behavior is to serialize requests and responses when such methods are used, which takes us back to square one.]
  • Use a MIME Multi-Part (RFC 2046) document to post a batch.
  • Each part has a headers section, which would be used to mimic the HTTP header section of a single operation, with an 'Atom-Operation' header indicating the operation for the entry in the part.
  • The data section of each part would be the Atom entry to insert/update/delete (for a delete, the data section is empty).
  • The response would be a MIME Multi-Part document with the same number of parts as the request, each part containing the operation status of the corresponding request part, plus other headers and the echoed Atom entry if necessary.
This alternative addresses the four issues I've described above:
  • The transport and data layers remain separate from each other.
  • An Atom entry is the same whether it is sent as a single operation or as part of a batch operation.
  • Only the code handling the protocol level has to be rewritten; the data handling code remains the same.
  • An XML DOM parser can operate on each entry without having to process the full batch.
While this adds MIME Multi-Part to the mix, it is something that will be buried inside the batch implementation, with no exposure to the application developer.
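
To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity) that uses the standard email package to build and read such a batch. Only the 'Atom-Operation' header and the insert/update/delete values come from the proposal above; the function names are made up for illustration, and for simplicity the multipart Content-Type is kept at the top of the body, where a real request would carry it as an HTTP header.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart


def build_batch(operations):
    """Build a multipart document from (operation, entry_xml) pairs.

    entry_xml is None for a delete, which carries an empty data section.
    """
    batch = MIMEMultipart("mixed")
    for operation, entry_xml in operations:
        part = Message()
        # Per-part header that mimics the HTTP operation of a single request.
        part["Atom-Operation"] = operation
        if entry_xml is not None:
            part["Content-Type"] = "application/atom+xml;type=entry"
        # The Atom entry itself is untouched: no transport data inside it.
        part.set_payload(entry_xml or "")
        batch.attach(part)
    return batch.as_string()


def read_batch(body):
    """Yield (operation, entry_xml) for each part of a batch document."""
    for part in message_from_string(body).get_payload():
        yield part["Atom-Operation"], part.get_payload()
```

Each entry_xml that comes out of read_batch() is a complete, ordinary Atom entry, so it can be handed to whatever DOM-based code already handles single entries.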

Further thoughts:
  • An HTTP header on the request could indicate whether the batch submission has full-failure or partial-failure semantics.
  • An HTTP header on the request could indicate that the response may be a simple HTTP OK if all entries are processed successfully (instead of an HTTP status for each one).
  • A correlation header in the MIME part could be used to correlate an entry with a status response.
  • Members with another content type (such as images) would be just another MIME part.
  • A correlation header in the MIME part could be used to correlate entries with members of another content type.
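
As a rough sketch of the correlation idea (again in Python, using the standard email package), a client could match status parts back to the entries it sent. The 'Atom-Correlation-Id' and 'Status' header names are hypothetical, invented here for illustration; nothing in PaceBatch or in this post defines them.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart


def build_response(results):
    """Build a multipart response from (correlation_id, status) pairs."""
    response = MIMEMultipart("mixed")
    for correlation_id, status in results:
        part = Message()
        part["Atom-Correlation-Id"] = correlation_id  # hypothetical header
        part["Status"] = status                       # hypothetical header
        part.set_payload("")
        response.attach(part)
    return response.as_string()


def failed_operations(response_body):
    """Return {correlation_id: status} for every part that is not a 2xx."""
    failures = {}
    for part in message_from_string(response_body).get_payload():
        status = part["Status"]
        if not status.startswith("2"):
            failures[part["Atom-Correlation-Id"]] = status
    return failures
```

With partial-failure semantics the client would only need to look at the entries returned by failed_operations() and could retry or report just those.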

Thursday, August 10, 2006

JSR286 (Portlets 2.0), First Impressions

As one of the spec-leads of the Portlet Specification 1.0 I'm glad to see a new version is in the works taking care of a much needed revamping.

I did a quick read of the draft while coming to work (I take the school bus) and following are my first impressions of it.

Before getting to the point, I'm fully aware that this is an early draft and that some of my comments will be addressed once things are ironed out. And please keep the drafts coming, with track changes on between drafts.

AA-1) PLT.5.2.3 End of Service (P28/L8..29): I'm not sure this is a good idea. It breaks the previous assumption that the init() method is called once, sometime before the first request is sent to the portlet, and that destroy() is called when the application is going down. It will most likely create some funny situations where things were coded with this assumption.

AA-2) PLT.5.4 Request Handling (P32/L12..13): "In addition to these portlet initiated events the portal/portlet container may issue portal/portlet container specific events.". I wonder what kind of events the portal/portlet container would generate that are useful for the portlet application. I was under the impression that eventing was for 'inter-portlet communication'. If a portal/portlet-container wants to provide some info to the portlet this can be done via request attributes. It sounds like there would be another programmable API somewhere in the portal that would receive/generate events from/for portlets, if that is the case please spell it out.

AA-3) PLT.5.4 Request Handling (P32/L15..25,P33/L1..4): a portlet serving a resource? That looks like the Servlet.service() method to me. A portlet is a portlet, not a mutant portlet-servlet. I'd rather define, in the portlet configuration (deployment descriptor), a path for a resources servlet; when invoked by the container, that servlet should receive the context of the portlet as request attributes (note that this can easily be implemented as a servlet filter).

AA-4) PLT.5.4.5 GenericPortlet (P36/L25...33): I'm not sure that using annotations to avoid a switch or an if/then/else in a single processEvent() method is a good idea. From the container/portlet API contract perspective, I'd think a well-defined set of methods is better than several methods with arbitrary names. I'd rather implement a dispatching mechanism in GenericPortlet, as render() does with doView(), doEdit() and doHelp().

AA-5) PLT.11.1.1.2 Render Request Parameters (P56/L19): talking about 'non-shared parameters', they have not been defined yet, things should be re-arranged or it should be mentioned that they are defined in the next section.

AA-6) PLT.11.1.1.3 Shared Render Parameters (P58/L3..34): I don't understand the reason for depriving portlets of shared parameters only because they were not the target of a render URL. It seems at odds with the idea of shared parameters: shared, but only sometimes? I guess this is because of pre-caching of generated content and portlet URLs, but it is definitely strange and not intuitive to the developer. Shared parameters should either always be there or never be there; having them only sometimes, under certain conditions, will be highly error prone. Another thing about shared parameters: they are independent of the portlet and the portlet URL that set them, and they are sticky. What happens when there are race conditions, such as two portlets setting different values for the same shared parameter, or one portlet setting it while another deletes it?

AA-7) PLT.11.2 ClientHttpRequest (P62/L1..7): Based on what is described earlier in the spec, serving resources can be done only via GET calls, so why does this interface, which is meant for resources, have methods for uploading data?

AA-8) PLT.11.5 EventRequest Interface (P63/L32..38): The event payload is defined as a serializable object or as a JAXB mapping. Wouldn't it be more consistent to use the same paradigm as for request/response IO, that is, streams, with the proper reader/writer used on top of them depending on the application's needs? I guess it would still make sense to include an alternate signature that sets and gets a Java object, so that collocated portlets could share objects by reference when eventing (this could still cause problems if the object in question is mutable and is distributed to many portlets simultaneously, not to mention classloader issues across portlet-apps).

AA-9) PLT.12. StateAwareResponse Interface (P66/L14..19): It is not clear what this is about, but it seems to me that the request/response interfaces are being broken into many specialized interface layers, thus cluttering the API and making it more difficult for a developer to read in the javadocs (yes, stupid as it may sound, think how often developers look at the javadocs).

AA-10) PLT.13 Resource Service (P72..74): same as #AA-3. Again, a servlet should be used for this; there is no need to define a new (and handicapped) way of doing it.

AA-11) PLT.17.5 Shared session attributes (P92/L7..P93/L19): P92/L16 defeats the purpose of this, so what is the point then? Not that I like it anyway. Until now, portlet container implementations could get a free ride on high availability, failover and scalability by leveraging the underlying servlet container. By adding shared session data across web-apps (portlet-apps), the portlet container implementation must implement its own mechanisms for such functionality, and in doing so it will have to do it for the regular session as well, including the servlet session, in order to ensure data consistency. Otherwise you may end up with an inconsistent data failover strategy.

AA-12) PLT.17.6.1 Process action and process event phase (P95/L16..18): atomic transactions? There is no database here. I guess the intention is to say that session operations must be thread-safe.

AA-13) PLT.18.1 Obtaining a PortletRequestDispatcher (P97/L20..21): again, this is reinventing a handicapped servlet container within the portlet container, let a servlet do the resource serving.

AA-14) PLT.19.2.1 Filter Lifecycle (P1004, no more line numbers in the spec): having a single doFilter() method handling all request types (action, event, render, resource) will make filters unnecessarily error prone. I'd rather have one filter method per type of request.

Wednesday, May 03, 2006

ROME.Mano, a few more feed handlers for the toolbox (echo-uploader, null and sort)

Echo handler echoes (returns back) a feed POSTed to the Mano servlet. An interesting use of the Echo handler is that together with a handler like the File handler it can be used to upload and store feeds for further retrieval (a *Feed Management System* anyone?).

Null handler discards the feed from the response and returns no content.

Sort handler sorts entries by title or published date, in ascending (default) or descending order.

The ROME.Mano toolbox now has 11 handlers. Some of them are ready for real use; others are proofs of concept to show possibilities (a proper implementation is needed for serious usage).

The handlers in the toolbox follow the same philosophy as Unix commands: they do one thing, usually something simple, and the real power comes when you combine them. It is no accident that the way of invoking them looks like Unix command piping; it was a deliberate design decision.
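
To illustrate the piping idea only, here is a small sketch in Python. It is not the ROME.Mano API (which is Java and works on ROME feed objects); the function names and the dictionary-based feed are assumptions made up for this example.

```python
from functools import reduce


def echo_handler(feed):
    """Return the posted feed unchanged."""
    return feed


def null_handler(feed):
    """Discard the feed; nothing is returned to the client."""
    return None


def sort_handler(field="title", descending=False):
    """Build a handler that sorts entries by the given field."""
    def handle(feed):
        entries = sorted(feed["entries"], key=lambda e: e[field],
                         reverse=descending)
        return {**feed, "entries": entries}
    return handle


def chain(*handlers):
    """Compose handlers left to right, like commands joined by Unix pipes."""
    def run(feed):
        return reduce(lambda current, handler: handler(current),
                      handlers, feed)
    return run


# Usage: echo the posted feed, then sort it by published date, newest first.
newest_first = chain(echo_handler, sort_handler("published", descending=True))
```

Because every handler takes a feed and returns a feed, any handler can sit at any position in the chain, which is what makes the combinations useful.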

Tuesday, May 02, 2006

ROME.Mano, tracking feeds views and clicks

The ROME.Mano toolbox has a new handler (and pairing servlet), TrackingFeedHandler.

When the tracking handler is in the handler chain it rewrites the feed site URL and the entries link URLs to point to a tracking servlet. Then, when a user clicks on the links from within his/her reader application the tracking servlet logs the click action (plus some user information and the original URL) and redirects the user to the real URL. Pretty much what java.blog does when displaying the feeds, the difference is that this is done in the feed itself when served by Mano.
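
The rewrite-and-redirect round trip can be sketched roughly as follows, in Python for brevity. The tracker URL, the 'url' query parameter and the function names are all assumptions for illustration; the real TrackingFeedHandler and its servlet are Java and may encode things differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical location of the tracking servlet.
TRACKER_URL = "http://example.org/mano/track"


def to_tracking_url(original_url):
    """Wrap a link so that it points at the tracking servlet."""
    return TRACKER_URL + "?" + urlencode({"url": original_url})


def rewrite_feed(feed):
    """Return a copy of the feed whose site and entry links are tracked."""
    return {
        **feed,
        "link": to_tracking_url(feed["link"]),
        "entries": [{**entry, "link": to_tracking_url(entry["link"])}
                    for entry in feed["entries"]],
    }


def resolve_click(tracking_url, click_log):
    """What the servlet side does: log the click, return the real URL.

    A real servlet would answer with an HTTP redirect to the returned URL.
    """
    original_url = parse_qs(urlsplit(tracking_url).query)["url"][0]
    click_log.append(original_url)
    return original_url
```

The original URL travels inside the rewritten link, so the servlet needs no lookup table to find where to send the user.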

For serious use, some of the methods in the Tracking handler and servlet should be rewritten.

I've added a small section to the documentation explaining this, and one of the examples at the bottom of the documentation uses this new handler.

More handlers coming soon.