What’s the problem is often the problem…

Preface

Recently I’ve been involved in a number of ‘chats’ which have led me to revisit my understanding of what errors are versus exceptions, and when each could and should be used. The wider context is my own belief that recoverable or application-specific abnormal conditions should be returned to clients in the form of exceptions. Typically I’ve heard errors and exceptions referred to interchangeably, with some difficulty arising when the conversation covers ‘system failures’, at which point these are assumed to be ‘exceptions’ (with everything else implicitly deemed to be errors). My own understanding is almost diametrically opposed to this naming, and I think the clarification is becoming especially important as my own development moves from work steeped in web frameworks (and their own terminology) to having more of a ‘Service’ or ‘Remote Component’ focus, with web-based front ends communicating asynchronously via XML or JSON.
Broadly speaking, ‘typical/traditional’ n-tier web architectures tend towards the following design:
Web client [browser] <-> web framework [which attempts to abstract various things such as request/response handling, session handling, 'error' handling, and implement an MVC pattern: e.g. Struts; WebWork; SpringMVC; Velocity], <-> Application modules [which attempt to centralise the application logic handling to serve disparate clients and often are responsible for caching, data connection pooling, authentication and authorisation etc: e.g. Spring beans, EJBs] <-> Persistence store [e.g. SQL database or NoSQL system of record].
This design also tends to err on the side of synchronous request-response handling that is abstracted to some extent by the web framework in use, with many web containers implementing thread-per-request or some flavour of continuation-based thread pooling and request processing. Personal experience also suggests that such applications carry out ‘all-or-nothing’ request processing, whereby the constituent parts of a request are neither decomposed into discrete units and handled concurrently (when possible) nor scheduled for later processing. In this scenario requests mark an artificial transaction boundary, whereby any impediment to complete processing returns a list of errors to the client. Common patterns I have encountered here try to decompose the monolithic processing into smaller units (for a combination of reduced processing latency, better user experience and resilience), either by submitting data via multi-page wizards [which can be ‘chatty’ end-to-end, sequentially processing each stage as it is submitted, or less so, gathering session attributes before performing the processing in bulk], or by paging results back to clients on read requests. Naturally, there are still outstanding questions about how much resilience this approach can add (depending on the implementation) and whether it takes advantage of any parallel execution capabilities.
Increased network speeds, more powerful web client capabilities and a proliferation of client libraries (which have lowered the barrier to entry for building asynchronous browser-based clients) have had a profound effect on the nature of web-based application development. The use of XML and JSON as de facto serialisation formats between client and server has also lessened the need to abstract the HTTP request-response model (to a certain extent at least). Hence, it appears we are slowly moving from a world where the prevalent paradigm is multi-phase, synchronous, request-bound processing within a transaction context to an emergent one that is loosely coupled, asynchronous and multi-staged.

NOTE: Other designs I’ve seen used in ‘Internet-scale’ web sites include client side widget based frameworks (e.g. iGoogle, Netvibes), which are effectively an implementation of the architecture I have outlined and ‘render farm’ based websites, where separate processes may look to snapshot rendering pages server side as part of a scheduled task or are user initiated and generate, render and pre-cache a set of pages server side. Neither of these architectures directly map to the request -> generate -> render -> response model I have initially described.


General paradigm shift: from using web frameworks that abstract the request-response model and error handling from clients, to using richer web clients that can call independent web services directly and asynchronously.

So, given this new direction in web development, how does this impact error handling between services and clients?
First let’s try to map out what the current state of error handling is by most web frameworks, before a reminder of what exceptions are and how they should be applied. Finally, let’s consider appropriate usage according to varying contexts.
Typically, web frameworks sanitise non-system failures (i.e. failures that are outside of the direct application scope) into a collection of error messages, (sometimes with an associated id for the error). Numbering and format of these messages is usually on an ad hoc basis, unless this is dictated by an internal policy or standard.
As an aide memoire to the different types of exceptions, here’s a quick summary:
– There are typically two types of exception: checked and unchecked
– Within the subset of unchecked exceptions there is a further division between Errors and Runtime Exceptions
– Checked exceptions signal abnormal conditions that client programmers should deal with.
– Checked exceptions should adhere to the catch or specify rule
– Most unchecked throwables, (i.e. subclasses of Error and RuntimeException) are problems that would be detected by the JVM
– Errors usually signal abnormal conditions that you wouldn’t want a program to handle and from which the application cannot reasonably be expected to recover (e.g. resource exhaustion such as OutOfMemoryError, or a hardware failure). As such, it is usually reasonable that the thread of execution should terminate.
– Runtime exceptions are usually exceptional conditions that are internal to an application and could reasonably be thrown
Exceptions are either checked or unchecked. Checked exceptions should implement ‘catch or specify’ semantics and should return the information to the client. Unchecked exceptions are either errors (exceptional, typically system, circumstances from which an application cannot recover and which shouldn’t return detail to the client) or runtime exceptions (exceptional conditions internal to the app, which may return detail to the client).
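The taxonomy above can be illustrated with a minimal Java sketch. The names `InsufficientFundsException`, `ThrowableKinds`, `classify` and `withdraw` are hypothetical and exist only for this example; the classification simply follows the rule that anything that is neither an Error nor a RuntimeException is checked.

```java
// Hypothetical application-specific checked exception (illustrative name).
class InsufficientFundsException extends Exception {
    InsufficientFundsException(String message) {
        super(message);
    }
}

class ThrowableKinds {
    // Classifies a Throwable according to the checked/unchecked split.
    static String classify(Throwable t) {
        if (t instanceof Error) {
            return "unchecked: Error";
        }
        if (t instanceof RuntimeException) {
            return "unchecked: RuntimeException";
        }
        return "checked";
    }

    // 'Catch or specify': the checked exception must be declared here...
    static void withdraw(int balance, int amount) throws InsufficientFundsException {
        if (amount > balance) {
            throw new InsufficientFundsException(
                    "requested " + amount + ", available " + balance);
        }
    }

    public static void main(String[] args) {
        // ...or caught by the caller; the compiler enforces one or the other.
        try {
            withdraw(10, 25);
        } catch (InsufficientFundsException e) {
            System.out.println(classify(e) + " -> " + e.getMessage());
        }
        // Unchecked throwables need neither a throws clause nor a catch block.
        System.out.println(classify(new IllegalStateException("programming bug")));
        System.out.println(classify(new StackOverflowError()));
    }
}
```

Removing the `throws` clause from `withdraw` makes the compiler reject the class, which is the practical meaning of the catch or specify rule.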

So where are we now? To my reckoning, problems that have traditionally been deemed ‘Errors’ in web frameworks are actually checked exceptions. Practically speaking, these errors are assumed to be numerous (as indicated by their usual envelopment within a Collection type) and typically have ad hoc definitions and formats. Furthermore, it’s difficult to see what sense Collections of ‘errors’ make in an asynchronous processing capacity (i.e. if processing is deferred for later execution, what sense do error return values make?).
So, what are the usual contexts in which errors and exceptions are used in a web scenario, and can their usage dictate the design decision?
  • In-process – such as when calling DI’d spring beans. Of all others, this scenario probably benefits the most from the ‘proper’ use of exceptions to propagate errors up the call stack and the separation of error handling code from normal code. Obviously, there is the case for performance here (it’s often mooted that unwinding the call stack has performance implications), but I’d argue that in 80% of cases ‘correctness’ would outweigh this concern.
  • Remote – specifically I’m thinking of web service calls here, but RPC and web requests would also fall into this category. Within the field of web services, there are clearly defined strategies for handling exceptions as WebFaults which are then returned to the client. For a more RESTful style of HTTP method invocation, there are also well defined HTTP status codes that can report success or failure to clients. From a web framework perspective, wrapping exceptions into Collections of errors makes sense when using a framework that abstracts the request-response cycle from the developer and provides utilities for thus wrapping exceptions. The drawback here is henceforth being tied to the pattern employed by the specific library/framework in use. For all of these, I’d still assume that the application code would throw exceptions and that the channel handling library could then be configured to map back to the appropriate format and serialisation.
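As a rough sketch of the last point, the application code below throws ordinary exceptions and a separate mapper, standing in for the channel-handling layer, translates them into HTTP status codes. `ValidationException` and `HttpStatusMapper` are hypothetical names, and the particular mapping is an assumption for illustration rather than anything a specific framework prescribes.

```java
import java.util.NoSuchElementException;

// Hypothetical checked exception for a recoverable, application-specific condition.
class ValidationException extends Exception {
    ValidationException(String message) {
        super(message);
    }
}

class HttpStatusMapper {
    // Maps a throwable raised by application code to an HTTP status code.
    // The chosen mapping is an assumption made for this example.
    static int toStatus(Throwable t) {
        if (t instanceof ValidationException) {
            return 400; // Bad Request: the client can correct and retry
        }
        if (t instanceof NoSuchElementException) {
            return 404; // Not Found
        }
        if (t instanceof UnsupportedOperationException) {
            return 501; // Not Implemented
        }
        return 500; // Internal Server Error: no internal detail is exposed
    }

    // Application code stays free of any transport concerns.
    static String findUser(String id) throws ValidationException {
        if (id == null || id.isEmpty()) {
            throw new ValidationException("id must not be empty");
        }
        throw new NoSuchElementException("no user with id " + id);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"", "42"}) {
            try {
                findUser(id);
            } catch (Exception e) {
                System.out.println(toStatus(e) + " " + e.getMessage());
            }
        }
    }
}
```

Swapping the mapper for one that produces SOAP faults or a framework's error collection would leave `findUser` untouched.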
A further distinction that is relevant here is between synchronous and asynchronous request execution. Realistically, throwing exceptions back to the client doesn’t make sense (from a client perspective) once execution has been deferred in an asynchronous environment. Even so, I’d propose that declaring exceptions from remote code that performs asynchronous processing is still appropriate: it exposes fail-fast semantics and allows the remote application code to be designed according to best practices. Specifically, the implementation code would internally separate error handling code from normal code, be able to group and differentiate error conditions, and handle exceptions in a coherent and centralised way. In such circumstances, generic actions such as publishing events, logging exceptions, ceasing execution (in the case of Errors and some Runtime Exceptions) and reporting incidents to a supervisor node [such as in Erlang/OTP or Scala/Akka environments] could be wired in appropriately.
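One way to picture this is the sketch below, which uses `CompletableFuture` from the JDK. `OrderService`, `InvalidOrderException` and the `reported` list are hypothetical; the list merely stands in for whatever centralised handling (event publishing, logging, supervisor notification) a real system would wire in.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical checked exception raised when a request cannot even be accepted.
class InvalidOrderException extends Exception {
    InvalidOrderException(String message) {
        super(message);
    }
}

class OrderService {
    // Stand-in for centralised handling of failures in deferred work.
    final List<Throwable> reported = new CopyOnWriteArrayList<>();

    // Fail fast: validation runs on the caller's thread and is declared in the
    // signature, so the client learns immediately that the request was rejected.
    CompletableFuture<Void> submit(int quantity) throws InvalidOrderException {
        if (quantity <= 0) {
            throw new InvalidOrderException("quantity must be positive: " + quantity);
        }
        return CompletableFuture
                .runAsync(() -> process(quantity))
                .exceptionally(t -> {
                    // Failures of the async stage arrive wrapped in CompletionException.
                    reported.add(t instanceof CompletionException && t.getCause() != null
                            ? t.getCause() : t);
                    return null;
                });
    }

    // Deferred processing; an internal failure surfaces as an unchecked exception.
    private void process(int quantity) {
        if (quantity > 100) {
            throw new IllegalStateException("insufficient stock for " + quantity);
        }
    }

    public static void main(String[] args) throws InvalidOrderException {
        OrderService service = new OrderService();
        try {
            service.submit(0);
        } catch (InvalidOrderException e) {
            System.out.println("rejected synchronously: " + e.getMessage());
        }
        service.submit(500).join(); // accepted, but fails later
        System.out.println("reported asynchronously: " + service.reported);
    }
}
```

The client only ever sees the synchronous rejection; anything that goes wrong after acceptance is routed to the service's own handling rather than returned as a collection of errors.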
Scenario | Appropriate handling
In-process | Appropriate/typical exception handling
Remote, web framework | Whatever is supported by the framework, with the caveat of being tied to that framework
Remote, SOAP | Exceptions thrown by applications and exposed as WebFaults over the SOAP interface
Remote, REST | Exceptions thrown by applications and exposed as HTTP error codes and messages over the RESTful interface
As some further reference, here’s how some APIs currently handle error conditions. Typically they reuse the appropriate HTTP status code and attach a short error message and a description. This is terse enough neither to bloat logs nor to clog up the network, but detailed enough to be useful:
Amazon AWS – (really like the UUID/request Id here too for later tracing) – http://docs.amazonwebservices.com/AWSImportExport/2010-06-03/API/index.html?Errors.html
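To make the shape of such a response concrete, here is a small sketch of an error payload carrying a status, a short code, a message and a request id for later tracing. The class `ErrorResponse` and its field names are assumptions made for this example; they are not taken from the AWS documentation or any other specific API.

```java
import java.util.UUID;

// Hypothetical error payload; field names are illustrative only.
final class ErrorResponse {
    final int status;       // HTTP status code reused as-is
    final String code;      // short, stable identifier for the condition
    final String message;   // brief human-readable description
    final String requestId; // correlates the response with server-side logs

    ErrorResponse(int status, String code, String message, String requestId) {
        this.status = status;
        this.code = code;
        this.message = message;
        this.requestId = requestId;
    }

    // Minimal escaping for backslashes and double quotes; a real service
    // would use a JSON library instead of building strings by hand.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    String toJson() {
        return "{\"status\":" + status
                + ",\"code\":\"" + escape(code) + "\""
                + ",\"message\":\"" + escape(message) + "\""
                + ",\"requestId\":\"" + escape(requestId) + "\"}";
    }

    public static void main(String[] args) {
        ErrorResponse error = new ErrorResponse(
                404, "NoSuchJob", "The specified job does not exist",
                UUID.randomUUID().toString());
        System.out.println(error.toJson());
    }
}
```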
