Some seafaring ships divide their body into multiple watertight compartments, so that if one compartment becomes flooded the rest of the ship will remain floodfree and intact. I got the idea of using a similar pattern of “bulkheads” in software to limit the damage caused by an unhandled exception.
Normally an unhandled exception will cause its thread to print diagnostic information to the console and then stop. This is an acceptable error-handling strategy for many command-line programs, but it is not a good error-handling strategy for a graphical program where the console is invisible.
I’ve been working on a graphical program where I want all exceptions and errors to be reported to the user in some way, rather than failing silently and presenting an unresponsive user interface when something goes wrong.
Additionally, there are certain critical sections of the program where an unhandled exception could leave a nearby data structure in an invalid state, causing subsequent operations on that structure to fail and/or propagate the corruption to other parts of the program state. It would be better if an unhandled exception occurring in a critical section marked any affected data structures as corrupt, such that further operations attempting to access those structures would refuse to run.
The Bulkheads Pattern can be used to meet both of those goals:
I’ve implemented a family of @capture_crashes_to...
function decorators in Python which mark a critical section of code and define which Bulkhead
the critical section is related to.
Any exception raised in the critical section which bubbles up to the decorator will be caught at the last moment and set as the crash_reason
for the associated bulkhead.
If a function calls into a critical section marked with @capture_crashes_to...
whose bulkhead has been marked as crashed, the critical section will refuse to run and will instead return to the caller immediately, possibly returning a default value.
Below, the child_task_did_complete
listener method is marked with @capture_crashes_to_self
, so any unexpected exception raised inside the listener won’t escape and crash the thread which called the listener. Instead the exception will be stored in the Task’s crash_reason
.
class Task(Bulkhead):
crash_reason: Optional[BaseException] = None
class DownloadResourceTask(Task):
@capture_crashes_to_self
def child_task_did_complete(self, task: Task) -> None:
if task is self._download_body_task:
...
elif task is self._parse_links_task:
links = self._parse_links_task.future.result()
embedded_resources = [
Resource(
self._resource.project,
urljoin(self._resource.url, link.relative_url))
for link in links if not link.embedded
]
for resource in embedded_resources:
self.append_child(resource.create_download_task(...))
else:
...
And indeed an unhandled exception did end up getting raised in the above code: The urljoin
operation was observed to raise an unexpected exception for certain unusual inputs even though it generally never raises.
The infrastructure for using bulkheads currently lives as a utility module in the Crystal project. You can install it to your local Python virtual environment using:
python -m pip install crystal-web
Here is a small program you can run in the Python REPL that demonstrates the use of bulkheads:
>>> from crystal.util.bulkheads import BulkheadCell, capture_crashes_to
>>>
>>> divide_bulkhead = BulkheadCell()
>>>
>>> @capture_crashes_to(divide_bulkhead)
... def divide(x, y):
... return x / y
...
>>> print(divide(10, 2))
5.0
>>> print(divide(10, 0))
Exception in bulkhead:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/me/Projects/crystal-web/crystal/util/bulkheads.py", line 292, in bulkhead_call
return func(*args, **kwargs) # cr-traceback: ignore
File "<stdin>", line 3, in divide
ZeroDivisionError: division by zero
None
>>> divide_bulkhead.crash_reason
ZeroDivisionError('division by zero')
>>> print(divide(10, 5))
None
See the API for more detailed information about how to use bulkheads.
For critical sections, I’ve found it effective to protect:
For bulkheads, I’ve found it effective to protect:
It can be useful for a caller to enforce that a callee has an explicitly defined error-handling strategy (via a @capture_crashes_to*
decorator). For example, a function that is dispatching events to listeners may want to enforce that each listener is marked with @capture_crashes_to*
before calling it. Such enforcement can be implemented with the following utilities:
run_bulkhead_call(func)
– Calls a function after verifying it is marked with @capture_crashes_to*
. Raises if the function is not marked.ensure_is_bulkhead_call(func)
– Raises if the specified function is not marked with @capture_crashes_to*
.is_bulkhead_call(func)
– Returns whether the specified function is marked with @capture_crashes_to*
.Example of an event dispatcher which enforces its listeners are marked with @capture_crashes_to*
:
def _resource_did_instantiate(self, resource: Resource) -> None:
for lis in self.listeners:
run_bulkhead_call(lis.resource_did_instantiate, resource)
See API > Error-handling enforcement utilities for more information.
Everything in this section is imported from crystal.util.bulkheads
. For example:
from crystal.util.bulkheads import Bulkhead, capture_crashes_to_stderr
First, define locations where unhandled exceptions can be captured to by implementing the Bulkhead
protocol:
class Bulkhead(Protocol): # abstract
"""
A sink for unhandled exceptions (i.e. crashes).
"""
crash_reason: Optional[CrashReason]
CrashReason = BaseException
Or if you don’t want to implement your own bulkhead type, you can use the included BulkheadCell
:
class BulkheadCell(Bulkhead):
"""
A concrete Bulkhead which stores any crash that occurs,
but takes no special action to report such crashes.
"""
crash_reason: Optional[CrashReason]
def __init__(self, value: Optional[CrashReason]=None) -> None:
self.crash_reason = value
Then, mark critical sections in your code with an appropriate @capture_crashes_to*
decorator. There are several to choose from:
@capture_crashes_to_self
is a good choice:def capture_crashes_to_self(
bulkhead_method: Optional[Callable[Concatenate[_B, _P], _RT]]=None,
*, return_if_crashed=None # _RF
):
"""
A Bulkhead method that captures any raised exceptions to itself,
as the "crash reason" of the bulkhead.
If the bulkhead was already crashed (with a non-None "crash reason") then
this method will immediately abort, returning `return_if_crashed`.
Examples:
class MyBulkhead(Bulkhead):
@capture_crashes_to_self
def foo_did_bar(self) -> None:
...
@capture_crashes_to_self(return_if_crashed=Ellipsis)
def calculate_foo(self) -> Result:
...
"""
@capture_crashes_to_bulkhead_arg
is a good choice:def capture_crashes_to_bulkhead_arg(
method: Optional[Callable[Concatenate[_S, _B, _P], _RT]]=None,
*, return_if_crashed=None # _RF
):
"""
A method that captures any raised exceptions to its first Bulkhead argument,
as the "crash reason" of the bulkhead.
If the bulkhead was already crashed (with a non-None "crash reason") then
this method will immediately abort, returning `return_if_crashed`.
Examples:
class MyClass:
@capture_crashes_to_bulkhead_arg
def other_foo_did_bar(self, other: Bulkhead) -> None:
...
@capture_crashes_to_bulkhead_arg(return_if_crashed=Ellipsis)
def calculate_baz(self, other: Bulkhead) -> Result:
...
"""
@capture_crashes_to(bulkhead)
, specifying the bulkhead to use at the time of decoration:def capture_crashes_to(
bulkhead: Bulkhead,
return_if_crashed=None # _RF
) -> Callable[[Callable[_P, _RT]], Callable[_P, Union[_RT, _RF]]]:
"""
A method that captures any raised exceptions to the specified Bulkhead,
as the "crash reason" of the bulkhead.
If the bulkhead was already crashed (with a non-None "crash reason") then
this method will immediately abort, returning `return_if_crashed`.
Examples:
@capture_crashes_to(bulkhead)
def foo_did_bar() -> None:
...
@capture_crashes_to(bulkhead, return_if_crashed=Ellipsis)
def calculate_foo() -> Result:
...
"""
display_error
function) fails, the only reasonable choice may be to fallback to printing the exception to the standard error stream (stderr
):def capture_crashes_to_stderr(
func: Optional[Callable[_P, _RT]]=None,
*, return_if_crashed=None # _RF
):
"""
A method that captures any raised exceptions, and prints them to stderr.
Examples:
@capture_crashes_to_stderr
def foo(self) -> None:
...
@capture_crashes_to_stderr(return_if_crashed=Ellipsis)
def calculate_foo(self) -> Result:
...
"""
The current implementation of the @capture_crashes_to*
decorators in Crystal always prints unhandled exceptions to stderr
as a side effect, to indicate that a crash happened.
@capture_crashes_to_stderr
decorator prints in red error text to signal that the only handling of the crash was to print it to stderr
.In the future, if/when the bulkheads.py
module is extracted to its own package on PyPI, independent of Crystal, I expect I’ll alter the @capture_crashes_to*
decorators to NOT print anything to stderr
by default since I don’t think that side-effect is appropriate for all programs that may want to use these decorators.
def run_bulkhead_call(
bulkhead_call: Callable[_P, _R],
/, *args: _P.args,
**kwargs: _P.kwargs
) -> '_R':
"""
Calls a function marked as @capture_crashes_to*,
which does not reraise exceptions from its interior.
Raises AssertionError if the specified function is not actually
marked with @capture_crashes_to*.
"""
def ensure_is_bulkhead_call(callable: Callable) -> None:
"""
Raises AssertionError if the specified function is not actually
marked with @capture_crashes_to*.
"""
def is_bulkhead_call(callable: Callable) -> bool:
"""
Returns whether the specified function is marked with @capture_crashes_to*.
"""
Component bulkhead structures in my website downloader Crystal include (1) Task
(which compose to form the Task Tree) and (2) entitytree.Node
(which compose to form the Entity Tree)↩
Top-level bulkhead structures in my website downloader Crystal include (1) the Task Scheduler itself (by using the root Task
as the associated bulkhead), (2) the Task Tree UI (by again using the root Task
as the associated bulkhead), and (3) the Entity Tree UI (which is its own bulkhead that reports errors to the Task Tree UI).↩