Wednesday, March 07, 2007

Atom Publishing Protocol, Where Are the Batch Semantics?

Batch support was proposed (PaceBatch) and heavily discussed in the Atom Publishing Protocol Working Group, but it did not make it into the protocol.

Google needed it for its Google Data API and went ahead with its PaceBatch idea. There is no need to explain further why batch support is needed: PaceBatch gives a good rationale for it, and Google has a real-world use case for it in Google Base.

But, in my opinion, the PaceBatch/Google-Data solution has some critical issues.
  • It mixes up the transport layer (the POST/PUT/DELETE operation and HTTP headers) with the data layer (the atom:entry element).
  • An Atom entry has to be different (it has to include transport information) just because it is submitted in a batch.
  • To support batch processing, code has to be rewritten at both the protocol level and the data handling level.
  • Large batch submissions cannot be handled with XML DOM parsers.
So, how about the following alternative (which, by the way, is suggested in the Limitations section of PaceBatch)?

[I have not considered HTTP 1.1 pipelining, because the "Pipelining" section of the HTTP 1.1 specification clearly discourages pipelining non-idempotent methods. The recommended behavior is to serialize requests and responses when non-idempotent methods are used, which takes us back to square one.]
  • Use a MIME multipart (RFC 2046) document to post a batch.
  • Each part has a headers section, which would mimic the HTTP header section of a single operation, with an 'Atom-Operation' header indicating the operation for the entry in that part.
  • The data section of each part would be the Atom entry to insert, update, or delete (for a delete, the data section is empty).
  • The response would be a MIME multipart document with the same number of parts as the request, each part containing the operation status of the corresponding request part, plus other headers and the echoed Atom entry if necessary.
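As a rough illustration of the request side of this idea, here is a minimal Python sketch that packs a few operations into one multipart document using the standard `email` package. Only the 'Atom-Operation' header comes from the proposal above; the function name `build_batch`, the operation values (`insert`, `update`, `delete`), the boundary string, and the sample entry are assumptions made for the example. How a delete names its target is not covered above, so the sketch leaves that out. It also assumes ASCII-only entries to keep serialization simple.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart

# Sample Atom entry used only for illustration.
ENTRY = '<entry xmlns="http://www.w3.org/2005/Atom"><title>First</title></entry>'


def build_batch(operations, boundary="batch-boundary"):
    """Serialize (operation, entry_xml) pairs into one multipart document.

    Each pair becomes one MIME part: the part headers carry the operation,
    and the part body is the unmodified Atom entry (empty for a delete).
    """
    batch = MIMEMultipart("mixed", boundary=boundary)
    for operation, entry_xml in operations:
        part = Message()
        part["Content-Type"] = "application/atom+xml;type=entry"
        part["Atom-Operation"] = operation  # header name taken from the proposal
        part.set_payload(entry_xml)         # the entry itself is left untouched
        batch.attach(part)
    return batch.as_string()


if __name__ == "__main__":
    print(build_batch([("insert", ENTRY), ("update", ENTRY), ("delete", "")]))
```

The entry text handed to `build_batch` is exactly what would be sent in a single, non-batch operation; only the wrapper around it changes.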
This alternative addresses the 4 issues I've described above:
  • The transport and data layers remain separate from each other.
  • An Atom entry is the same whether it is sent as a single operation or as part of a batch operation.
  • Only the code handling the protocol level has to be rewritten; the data handling code remains the same.
  • An XML DOM parser can operate on each entry without having to process the full batch.
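To make the last point concrete, the following Python sketch walks a multipart batch and builds a separate DOM for each entry, so no single XML document ever spans the whole batch. The sample batch text, the boundary `b1`, and the function name `iter_entries` are assumptions for the example. Note that `message_from_string` still holds the raw batch in memory; a truly large batch would need an incremental MIME reader, which is not shown here.

```python
from email import message_from_string
from xml.dom import minidom

# Hypothetical batch with one insert and one delete (empty body).
SAMPLE_BATCH = (
    'Content-Type: multipart/mixed; boundary="b1"\n'
    '\n'
    '--b1\n'
    'Atom-Operation: insert\n'
    '\n'
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>First</title></entry>\n'
    '--b1\n'
    'Atom-Operation: delete\n'
    '\n'
    '--b1--\n'
)


def iter_entries(batch_text):
    """Yield (operation, dom) for each part; dom is None when the body is empty."""
    batch = message_from_string(batch_text)
    for part in batch.get_payload():
        body = part.get_payload()
        # Each entry is parsed on its own, independently of the other parts.
        dom = minidom.parseString(body) if body.strip() else None
        yield part["Atom-Operation"], dom


if __name__ == "__main__":
    for operation, dom in iter_entries(SAMPLE_BATCH):
        title = dom.getElementsByTagName("title")[0].firstChild.data if dom else None
        print(operation, title)
```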
While this adds MIME multipart to the mix, it would be buried inside the batch implementation with no exposure to the application developer.

Further thoughts:
  • An HTTP header on the request could indicate whether the batch submission has full-failure or partial-failure semantics.
  • An HTTP header on the request could indicate that the response may be a simple HTTP OK if all entries are processed successfully (instead of responding with an HTTP status for each one).
  • A correlation header in the MIME part could be used to correlate an entry with a status response.
  • Members with other content types (such as images) would be just another MIME part.
  • A correlation header in the MIME part could be used to correlate entries with members of other content types.
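The correlation idea could look roughly like the Python sketch below, which pairs each request part with its status in the response even when the response parts arrive in a different order. The header names `Atom-Correlation-Id` and `Atom-Status`, the status strings, and both sample documents are hypothetical; nothing above fixes their names or formats.

```python
from email import message_from_string

CORRELATION = "Atom-Correlation-Id"  # hypothetical header name
STATUS = "Atom-Status"               # hypothetical header name

REQUEST = (
    'Content-Type: multipart/mixed; boundary="b1"\n\n'
    '--b1\nAtom-Operation: insert\nAtom-Correlation-Id: 1\n\n'
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>First</title></entry>\n'
    '--b1\nAtom-Operation: delete\nAtom-Correlation-Id: 2\n\n'
    '--b1--\n'
)

# Response parts deliberately listed out of order.
RESPONSE = (
    'Content-Type: multipart/mixed; boundary="r1"\n\n'
    '--r1\nAtom-Correlation-Id: 2\nAtom-Status: 200 OK\n\n'
    '--r1\nAtom-Correlation-Id: 1\nAtom-Status: 201 Created\n\n'
    '--r1--\n'
)


def parts_by_id(multipart_text):
    """Index the parts of a multipart document by their correlation header."""
    message = message_from_string(multipart_text)
    return {part[CORRELATION]: part for part in message.get_payload()}


def match_statuses(request_text, response_text):
    """Map correlation id -> (operation, status); status is None if unanswered."""
    responses = parts_by_id(response_text)
    matched = {}
    for cid, request_part in parts_by_id(request_text).items():
        response_part = responses.get(cid)
        status = response_part[STATUS] if response_part is not None else None
        matched[cid] = (request_part["Atom-Operation"], status)
    return matched


if __name__ == "__main__":
    print(match_statuses(REQUEST, RESPONSE))
```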