Most of the time programs operate happily inside their main scenario. Occasionally they need to cope with unusual circumstances, such as being no longer able to read data from the network because the user has turned off their WiFi. The process by which a program responds to an error is called error handling.
At the time an error occurs in a function there are a few things it can do:
The particular error handling strategy of a function reporting an error to its caller is so common that many programming languages provide special mechanisms to support it, particularly in the form of exceptions.
For example a web browser that cannot access a particular webpage will display an error page explaining why the page could not be accessed.
PHP programs commonly call the die() function upon the failure of many operations:
$connection = mysql_connect('localhost', 'dbuser', 'password') or die('Could not connect to MySQL server.');
The Linux kernel makes great use of the panic() function when it gets into a bad state, stopping the entire operation of the computer.
Naturally there needs to be somebody who goes through the log occasionally to look for problems if this error handling strategy is used.
In C, this just means ignoring the result of a function that returns an error code.
r = new Resource();
try {
    ...
} finally {
    try {
        r.close();
    } catch (IOException e) {
        // ignore
    }
}
Less common handling strategies include:
When a function wants to report an error, it has to pass information about the error to its caller. There are a few ways that error information can be encapsulated:
Error codes. The function’s return value is an error code (of a type such as OSErr). Upon success the function returns 0 (or noErr). Upon failure the function returns some other value that describes the type of error encountered. Should the function need to return other values upon success it must declare out parameters to pass them back to the caller.
For example, the classic Mac OS error code fnfErr indicates that the specified file was not found. Note however that this error code alone doesn’t include information about which file couldn’t be found, which is necessary context to display a meaningful error message.
Error codes are easy for callers to ignore. For example, C programs rarely check the result of printf(), even though it can return an error code upon I/O error. This behavior is reasonable in this case because there isn’t much a program can do to alert the user if it can’t print to the screen.
Less reasonably, many C programs don’t check the result of malloc() for NULL when allocating objects. Such programs will crash when malloc returns NULL during an out-of-memory condition and the caller tries to manipulate it.
These days error codes are only used commonly in programming languages like C that don’t support the use of exceptions.
Error sentinels. The function returns a special value (such as null, -1, or 0) to signal that an error occurred.
For example, when calling read() from an InputStream in Java, usually the resultant byte value is returned. But if the end-of-file is reached, the special value of -1 is returned instead.
The use of sentinels, particularly null, remains common in many modern programs despite their disadvantages, even in cases where a Null Object could be used to better effect.
Some statically typed languages like Haskell and SML explicitly represent the potential of a sentinel being returned using a datatype such as Maybe or Option. Typically this is done so that the compiler can automatically flag callers that aren’t checking for sentinels properly, eliminating their largest disadvantage.
Some languages report errors by returning a datatype such as Either. Compilers in such languages enforce that callers explicitly unpack the Either value and handle any contained exception object explicitly.
In summary:
Often the seriousness of an error is related to how it is handled. Generally speaking, errors are either:
Very few errors fall into this category.
I/O errors related to reading from a file or from a network socket are an example.
This includes dividing by zero, dereferencing a null pointer, attempting to access an object off the end of an array, reading from a file that was closed, and other “programmer errors”.
Common errors are typically reported using error sentinels or error codes for performance reasons. Other error categories are usually reported using exceptions unless the language in use doesn’t support exceptions, in which case error codes are used instead.
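As a minimal sketch of this split, written in Python 3: the find_user function and its in-memory table are hypothetical names invented for illustration. A missing user is treated as a common error and reported with a None sentinel, while a malformed argument is treated as a programmer error and reported with an exception.

```python
from typing import Optional

# Hypothetical in-memory table, used only for illustration.
_USERS = {1: "ada", 2: "grace"}


def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None if no such user exists.

    A missing user is a common error, so it is reported with a sentinel.
    A non-integer ID is a programmer error, so it is reported with an
    exception.
    """
    if not isinstance(user_id, int):
        raise TypeError("user_id must be an int, got %r" % (user_id,))
    return _USERS.get(user_id)


name = find_user(3)
if name is None:
    # The caller handles the common case inline, without a try block.
    name = "(unknown)"
```

The caller pays for an exception handler only in the rare case, which is the trade-off the paragraph above describes.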
The standard libraries of programming languages typically distinguish between expected, unexpected, and fatal errors by using different exception base classes for each. This makes it easy to write exception handlers that can catch all thrown exceptions in a particular category. For example:
Java uses Error as the base class for fatal errors, RuntimeException for unexpected errors, and all other subclasses of Exception for expected errors.
C# uses SystemException (informally) as the base class for unexpected and fatal errors. All other subclasses of Exception are used for expected errors. [3]
Python 2 uses StandardError (informally) as the base class for unexpected and fatal errors. All other subclasses of Exception are used for expected errors. [4]
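A handler that reacts differently to each category might be sketched as follows in Python 3. The mapping used here is an assumption made for illustration, not an official classification: OSError stands in for expected errors and any other Exception for unexpected ones. The run_task name is hypothetical.

```python
def run_task(task):
    """Run a callable and classify any failure by exception category."""
    try:
        return ("ok", task())
    except OSError as e:
        # Treated here as an expected error: report it and carry on.
        return ("expected", str(e))
    except Exception as e:
        # Treated here as an unexpected (programmer) error.
        return ("unexpected", type(e).__name__)
```

Because the more specific handler is listed first, an OSError never reaches the generic Exception handler.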
If a language supports checked exceptions, the expected exceptions should generally be marked as checked to force callers to handle them appropriately. Conversely, unexpected and fatal exceptions should not be marked as checked since this burdens the caller unnecessarily. [5]
When an error occurs in the middle of a multi-step function, the function has to make a decision about what kind of state it wants to leave the program in when it returns to its caller. In general a function can leave the program:
Functions that make this guarantee are called atomic or transactional.
If a full rollback is not possible, it is still usually possible to partially rollback to a valid state.
Any complex fault-tolerant function should document which of these guarantees it makes. [6] If it makes no guarantees at all, the caller may have to assume that whatever resource the function was operating on is in a bad state if the function returns an error.
Consider the concrete example of a program that copies a comma-separated-value (CSV) file to a new file. This generally involves the steps:
Here is a naive implementation that makes no guarantees to its caller in the event of an error.
/** Copies the source CSV file to the destination file. */
// (I didn't think about error handling at all.)
public static void copyCsvFile(File sourceFile, File destFile) throws IOException {
    InputStream fileIn = new FileInputStream(sourceFile);
    OutputStream fileOut = new FileOutputStream(destFile);
    int b;
    while ((b = fileIn.read()) != -1) {
        fileOut.write(b);
    }
    fileOut.close();
    fileIn.close();
}
Now imagine what happens if an I/O error occurs in the middle of copying bytes: The write method will throw an IOException and since copyCsvFile has no matching exception handler, the copyCsvFile function itself will stop and propagate the IOException to its caller. Notably, the destination file is left with incomplete and invalid CSV contents. And neither the source nor the destination file is closed, leaking those resources from the operating system.
We can at least avoid leaking resources by adding logic that ensures that resources are always closed when the function completes:
/** Copies the source CSV file to the destination file. */
// (No open file handles will be leaked even in the event of an error.)
public static void copyCsvFile(File sourceFile, File destFile) throws IOException {
    InputStream fileIn = new FileInputStream(sourceFile);
    try {
        OutputStream fileOut = new FileOutputStream(destFile);
        try {
            int b;
            while ((b = fileIn.read()) != -1) {
                fileOut.write(b);
            }
            fileOut.flush();
        } finally {
            try {
                fileOut.close();
            } catch (IOException e) {
                // Ignore I/O errors upon close since nothing can be done
            }
        }
    } finally {
        try {
            fileIn.close();
        } catch (IOException e) {
            // Ignore I/O errors upon close since nothing can be done
        }
    }
}
This improved function will no longer leak open file handles in the event of an error but it will still leave an invalid destination CSV file.
If we document the additional guarantee that the function performs an atomic file copy, we’d want to explicitly code the function to delete the destination file in the event that it couldn’t be fully copied. Here’s an implementation:
/** Copies the source CSV file to the destination file atomically. */
public static void copyCsvFile(File sourceFile, File destFile) throws IOException {
    InputStream fileIn = new FileInputStream(sourceFile);
    try {
        OutputStream fileOut = new FileOutputStream(destFile);
        boolean finishedCopying = false;
        try {
            int b;
            while ((b = fileIn.read()) != -1) {
                fileOut.write(b);
            }
            fileOut.flush();
            finishedCopying = true;
        } finally {
            try {
                fileOut.close();
            } catch (IOException e) {
                // Ignore I/O errors upon close since nothing can be done
            }
            if (!finishedCopying) {
                boolean deleteSuccess = destFile.delete();
                // If the delete fails then the rollback failed.
                // Since there's nothing that can be done in that case,
                // we ignore deletion failures.
            }
        }
    } finally {
        try {
            fileIn.close();
        } catch (IOException e) {
            // Ignore I/O errors upon close since nothing can be done
        }
    }
}
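A different way to get the same all-or-nothing guarantee is to write to a temporary file and rename it over the destination only once the copy has finished. The following Python 3 sketch swaps in that technique; it is not the delete-on-failure approach shown above. The copy_file_atomically name is hypothetical, and the sketch assumes the destination directory is writable and that the rename is atomic on the target filesystem (as it is on POSIX systems).

```python
import os
import shutil
import tempfile


def copy_file_atomically(source_path, dest_path):
    """Copy source_path to dest_path so that dest_path is either fully
    written or left untouched."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file next to the destination so that the final
    # rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as file_out, open(source_path, "rb") as file_in:
            shutil.copyfileobj(file_in, file_out)
            file_out.flush()
            os.fsync(file_out.fileno())
        # Rename the finished temporary file over the destination.
        os.replace(temp_path, dest_path)
    except BaseException:
        # Roll back: remove the partial temporary file, then re-raise.
        try:
            os.remove(temp_path)
        except OSError:
            pass
        raise
```

One consequence of this design is that a failed copy never disturbs an existing destination file, whereas the delete-on-failure version has already overwritten it by the time the error occurs.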
If we wanted to get even fancier, we could instead document that the function copies as much of the source CSV file to the destination CSV file as possible, leaving the longest valid destination CSV file even in the case of an error. In particular if the entire file cannot be copied, the function will copy as many complete lines from the source CSV file as possible, stripping off any incompletely written lines.
This is actually relatively difficult to implement correctly in Java while still handling characters correctly and preserving end-of-line sequences, so here’s a Python 2 implementation instead:
import os

def copy_csv_file(source_filepath, dest_filepath):
    """
    Copies the source CSV file to the destination file.
    If an error occurs while copying, as many rows as possible are copied,
    leaving a valid destination CSV file.
    """
    with open(source_filepath, 'rb') as file_in:
        offset_to_last_line_written = 0
        finished_copying = False
        file_out = open(dest_filepath, 'wb')
        try:
            while True:
                cur_line_bytes = file_in.readline()
                if not cur_line_bytes:
                    # readline() returns an empty string at end-of-file
                    break
                file_out.write(cur_line_bytes)
                offset_to_last_line_written = file_out.tell()
            file_out.flush()
            finished_copying = True
        finally:
            truncated_successfully = False
            if not finished_copying:
                try:
                    file_out.truncate(offset_to_last_line_written)
                    truncated_successfully = True
                except IOError:
                    # Unable to truncate. Will try to delete the file instead...
                    pass
            try:
                file_out.close()
            finally:
                if not finished_copying and not truncated_successfully:
                    try:
                        os.remove(dest_filepath)
                    except OSError:
                        # os.remove() reports failure with OSError.
                        # Unable to truncate or remove the destination file.
                        # Nothing else can be done.
                        pass
Actually the preceding implementation isn’t correct in the presence of output stream buffering (unless it is line-buffered), since it could be the case that the offset_to_last_line_written points to the end of a line that in fact has not been written to disk but is rather in the output buffer. A correct and performant implementation that additionally handles that case is left as a (non-trivial) exercise for the reader.
Errors are easiest to handle when they are signaled at the exact point where a problem first occurred, or as close to it as possible. Thus functions should try to fail fast whenever possible.
It is a good idea for functions to check their inputs (especially their arguments) immediately upon invocation to see whether they conform to the expected format. This provides early warning of state corruption that could get introduced into the derived output of the function.
In addition if there are points where a function can make a non-trivial assertion about its current state, and this assertion is at risk of breaking due to modifications by maintainers, it should make an explicit check that the assertion is true.
assert vs. if
When checking assertions, a function can always use the humble if statement:
public class Registry {
    private Map<String, Object> items = new LinkedHashMap<String, Object>();

    public void register(String id, Object item) {
        if (id == null)
            throw new IllegalArgumentException(
                "Cannot register an item with a null ID.");
        if (item == null)
            throw new IllegalArgumentException(
                "Cannot register a null item with ID \"" + id + "\".");
        if (items.containsKey(id))
            throw new IllegalStateException(
                "Already have an item registered with the ID \"" + id + "\".");
        items.put(id, item);
    }

    // (... more methods ...)
}
However there is also an assert statement in many languages. The assert statement typically differs from if in that it can be compiled out of the program automatically if desired, for a modest performance boost at the expense of safety. Therefore assert should typically only be used in performance-critical code (that has been verified as such by a profiler).
In practice I almost never use assert, preferring to rely on if instead.
Expected exceptions [7] are part of a function’s API. Consequently:
Expected exceptions should be given the same coverage in a function’s documentation as its parameters or return type. Remembering to document expected exceptions is particularly important when writing API documentation for languages lacking checked exceptions (i.e. everything other than Java, including C#).
Callers may depend on the expected exceptions in the function’s API documentation.
A function cannot remove or change the exceptions it throws without breaking callers that have been coded to expect the old set of exceptions. And new exceptions that are added will not be expected by existing callers.
It is generally a good idea to have a separate exception type for each specific type of error that a caller might want to handle distinctly. That way a caller can easily write an exception handler that catches a specific exception of interest. For example FileNotFoundException is likely to be treated distinctly from a generic IOException, so it is given a separate exception type (that inherits from the generic IOException class).
Errors that are not likely to be handled distinctly by the caller can just reuse a generic exception class directly. For example a piece of code that detected a “bad media” error when reading from a disk could just throw a plain IOException with a message instead of creating a special subclass. However note that by doing this you provide no API for any caller that does in fact want to catch this kind of exception. [8]
An error message typically accompanies an exception, and it is this message that is typically presented verbatim to the user if high-level code doesn’t recognize the exception type itself. Therefore it is important that the message be:
A surprising number of real-world exception messages don’t even meet this low bar.
By contrast, unexpected and fatal errors are typically programmer errors and only likely to be seen by other developers when developing a program. Therefore they should be written with the developer in mind.
For example an error encountered while parsing a text file should include information about the location of the parse error, so that the user can find the problem in the original document. Any good compiler will give you at least a line number when it encounters a syntax error and may even provide a specific column number as well.
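To make the idea of location-bearing messages concrete, here is a small Python 3 sketch. The ParseError class and the parse_int_rows function are hypothetical names invented for this illustration, and the input format (rows of comma-separated integers) is an assumption chosen to keep the example short.

```python
class ParseError(Exception):
    """Parse failure that records where in the input it happened."""

    def __init__(self, message, line, column):
        super().__init__("line %d, column %d: %s" % (line, column, message))
        self.line = line
        self.column = column


def parse_int_rows(text):
    """Parse comma-separated integers, one row per line."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        row = []
        column = 1
        for field in line.split(","):
            try:
                row.append(int(field))
            except ValueError:
                raise ParseError("expected an integer but found %r" % field,
                                 line_no, column) from None
            column += len(field) + 1  # skip past the field and its comma
        rows.append(row)
    return rows
```

The location is stored in machine-readable fields as well as in the message, so a caller such as an editor could jump to the offending position without parsing the text of the message.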
It is common for a single system to have a single high-level exception type that is thrown by most of its functions. For example a parser’s functions may all throw the high-level ParseException.
However the implementation of such a system may delegate to other systems that use a different set of exceptions. In the case of a parser, it typically has to read the characters it is parsing from an I/O stream, which may throw a low-level IOException.
Now, top-level parser functions have a few options in reporting the underlying IOException as a possible failure case to their callers:
Propagate: The function declares that it throws both ParseException and IOException. Callers must then handle the low-level IOException directly. Therefore this is not generally recommended for top-level functions, although it may be used by internal functions.
Wrap: The function catches IOExceptions and translates them to a generic ParseException that wraps the original IOException. Callers only need to handle ParseException. Callers of the parser remain able to intercept and extract the underlying IOException if they wish.
Map: The function catches IOExceptions and maps them to a special subclass of ParseException (like ParseIOException), optionally wrapping the original IOException for further inspection by the caller. Callers can handle I/O failures distinctly by catching the specific subclass (ParseIOException).
In summary, most low-level exceptions should be wrapped in the generic high-level exception. Prominent low-level exceptions should be mapped to a specific high-level exception subclass.
Here’s a Java example of a parser taking the “wrapping” approach to low-level I/O exceptions.
// The top-level methods of this class all throw ParseException in their API.
public class RuleParser {
    public static Rule parse(InputStream input) throws ParseException {
        try {
            return new RuleParser(input).readRule();
        } catch (IOException e) {
            throw new ParseException("I/O error while parsing rule.", e);
        }
    }

    private Rule readRule() throws ParseException, IOException {
        // (...)
    }

    // (...)
}
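For comparison, the “map” approach might look like this in Python 3. This is only a sketch: ParseError, ParseIOError, parse_rule and BrokenStream are hypothetical names, and a rule is assumed to be one non-empty line of text. The "raise ... from" statement keeps the original low-level exception reachable through the __cause__ attribute.

```python
class ParseError(Exception):
    """Generic high-level error raised by the parser."""


class ParseIOError(ParseError):
    """High-level error for I/O failures met while parsing."""


def parse_rule(stream):
    """Read one rule (a non-empty line of text) from a text stream."""
    try:
        line = stream.readline()
    except OSError as e:
        # Map the low-level error to a specific high-level subclass,
        # keeping the original exception attached as __cause__.
        raise ParseIOError("I/O error while parsing rule.") from e
    rule = line.strip()
    if not rule:
        raise ParseError("Expected a rule but found an empty line.")
    return rule


class BrokenStream:
    """Stand-in for a stream whose underlying device fails."""

    def readline(self):
        raise OSError("bad media")


cause = None
try:
    parse_rule(BrokenStream())
except ParseError as e:
    # Callers that only know about ParseError still catch the failure,
    # and can inspect the underlying OSError if they wish.
    cause = e.__cause__
```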
Sometimes a high-level function using exceptions needs to report an error received from a low-level function that uses error codes. In this case the low-level error code needs to be communicated to the caller somehow via an exception.
Typically the low-level function comes from an overall subsystem of some kind which uses error codes in general for error reporting. In such a case it is typical to define a generic exception to wrap all error codes received from the subsystem. This generic exception should preserve the original error code for inspection by callers, along with whatever extra context may be available from the subsystem, typically just an error message.
For example Python uses the OSError exception to wrap error codes received from the underlying C library. It is populated with the error code received from the C errno global variable and the corresponding message obtained from the C function strerror().
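The following Python 3 sketch shows what a caller can recover from such a wrapping exception. The file name is made up for the example and is created inside a fresh temporary directory so that it is known not to exist.

```python
import errno
import os
import tempfile

# A path that does not exist, inside a fresh temporary directory.
missing_path = os.path.join(tempfile.mkdtemp(), "no-such-file.txt")

try:
    open(missing_path, "rb")
except OSError as e:
    # e.errno holds the C errno value, e.strerror the matching message,
    # and e.filename the path that the failing call was given.
    code = e.errno
    message = e.strerror
    filename = e.filename
    is_not_found = (code == errno.ENOENT)
```

Note that the exception carries the file name as extra context, which a bare error code cannot do on its own.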
Now the high-level function may not want to report this kind of low-level subsystem exception directly, in which case the “wrap” or “map” technique discussed above in “Wrapping Low-Level Exceptions in High-Level Exceptions” should be used in addition.
Typically error sentinels are used to report common errors that are intended to be handled immediately by the caller. However if the caller cannot handle the error itself but needs to delegate to its second-order caller, it typically needs to promote the sentinel to an exception.
As an example, the end-of-stream condition when reading from a stream is generally considered to be a common error in Java. However a function that is parsing a data structure out of a stream does not expect an end-of-stream when it is in the middle of parsing a structure. Thus the parser wishes to report the end-of-stream condition to its caller as either an expected or unexpected error (depending on context), both of which require an exception.
public class BinaryInputStream extends FilterInputStream {
    public BinaryInputStream(InputStream in) {
        super(in);
    }

    public int readUInt8() throws IOException {
        int b = this.read();
        if (b == -1) {
            throw new EOFException("Unexpected end of stream.");
        }
        return b;
    }

    public int readUInt16() throws IOException {
        return
            (readUInt8() << 8) |
            (readUInt8() << 0);
    }
}
In the preceding example the -1 error sentinel was promoted to an EOFException.
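The same promotion can be sketched in Python 3, where a binary stream signals end-of-stream by returning an empty bytes object from read(). The read_exactly and read_uint16 helpers are hypothetical names, and the sketch assumes a blocking binary stream.

```python
import io


def read_exactly(stream, size):
    """Read exactly `size` bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # An empty result is the end-of-stream sentinel: promote it.
            raise EOFError("Unexpected end of stream: wanted %d more byte(s)."
                           % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_uint16(stream):
    """Read a big-endian unsigned 16-bit integer."""
    return int.from_bytes(read_exactly(stream, 2), "big")


value = read_uint16(io.BytesIO(b"\x01\x02"))  # 0x0102
```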
The use of internal error codes within a generic exception class should generally be avoided, since this makes it difficult to handle them. (One exception to this guideline is when using an exception to wrap external error codes received from another subsystem, as described above in “Wrapping External Error Codes in Exceptions”.)
Consider the following example:
// A generic exception that wraps its own set of error codes. DON'T DO THIS.
public class FetchException extends RuntimeException {
    private int code;
    private String text;

    public static final int JOB_NOTREADY = 1;
    public static final int TIMEOUT = 2;
    public static final int AMBIGUOUS_NAME = 3;

    FetchException(int code, String text) {
        super(text);
        this.code = code;
        this.text = text;
    }

    public int getCode() {
        return code;
    }

    public String getText() {
        return text;
    }
}
If you actually wanted to detect the JOB_NOTREADY case, you’d have to write code like:
public Job fetchJob(String jobName) {
    for (int triesLeft = MAX_FETCH_ATTEMPTS; triesLeft > 0; triesLeft--) {
        try {
            return service.getJobs().get(jobName);
        } catch (FetchException e) {
            if (e.getCode() == FetchException.JOB_NOTREADY) {
                // Retry again
                continue;
            } else {
                throw e;
            }
        }
    }
    throw new FetchException(FetchException.TIMEOUT,
        "Job \"" + jobName + "\" was not ready after " +
        MAX_FETCH_ATTEMPTS + " fetch attempts.");
}
It’s not pleasant having to put that if-statement in the exception handler. And the throwing of the TIMEOUT-coded FetchException couldn’t save contextual information like the jobName and MAX_FETCH_ATTEMPTS in a machine-readable field since the generic FetchException didn’t have fields that were specific to the TIMEOUT code.
A better solution would be to use specific subclasses of FetchException instead:
public class FetchException extends RuntimeException { ... }

public class JobNotReadyException extends FetchException { ... }

public class FetchTimeoutException extends FetchException {
    private String jobName;
    private int numFetchAttempts;

    FetchTimeoutException(String jobName, int numFetchAttempts) {
        // Build the message from the constructor argument rather than
        // from a constant that is not visible in this class.
        super("Job \"" + jobName + "\" was not ready after " +
            numFetchAttempts + " fetch attempts.");
        this.jobName = jobName;
        this.numFetchAttempts = numFetchAttempts;
    }

    // (... Accessors for jobName and numFetchAttempts ...)
}

public class AmbiguousNameException extends FetchException { ... }
Then the code could be simplified to just read:
public Job fetchJob(String jobName) {
    for (int triesLeft = MAX_FETCH_ATTEMPTS; triesLeft > 0; triesLeft--) {
        try {
            return service.getJobs().get(jobName);
        } catch (JobNotReadyException e) {
            // Retry again
            continue;
        }
    }
    throw new FetchTimeoutException(jobName, MAX_FETCH_ATTEMPTS);
}
Sometimes programs report common errors in the form of an exception instead of using a more appropriate mechanism such as an error sentinel. This is inefficient since throwing exceptions is slow in the common case. And it is awkward for the caller who must have an explicit exception handler around every invocation to deal with the common case. Don’t do it.
However one case where exceptions are useful as a means of flow control is to force (or recommend) that a thread terminate. Such exceptions are classified as fatal errors so that most exception handlers ignore them.
For example Java uses the ThreadDeath exception (a subclass of the fatal Error) to terminate a thread. And Python uses the KeyboardInterrupt exception (a subclass of the fatal BaseException) to kill the main thread when the user presses Control-C.
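A short Python 3 sketch of why this classification matters: a typical catch-all handler written against Exception logs ordinary failures but lets KeyboardInterrupt pass through untouched. The run_step, failing_step and interrupted_step names are hypothetical, and the interrupt is simulated by raising the exception directly.

```python
def run_step(step):
    """Run one step, logging ordinary failures instead of propagating them."""
    try:
        step()
        return True
    except Exception as e:
        # Catches ordinary errors only. KeyboardInterrupt and SystemExit
        # derive from BaseException, not Exception, so they pass through.
        print("step failed: %s" % e)
        return False


def failing_step():
    raise ValueError("bad input")


def interrupted_step():
    # Simulates the user pressing Control-C during the step.
    raise KeyboardInterrupt()
```

Calling run_step(failing_step) returns False after logging, while run_step(interrupted_step) lets the KeyboardInterrupt propagate to whatever code is responsible for shutting the thread down.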
Error handling is hard. But you’ll provide a better experience by properly handling and communicating errors back to the user.
Don’t be that guy who provides useless generic error messages:
(I doubt even the program itself knows what went wrong.)
Summarizes the most prominent strategies for handling runtime errors.
Series
This article is part of the Programming for Perfectionists series.
[1] Flawed functions that return sentinel values that are actually valid place extra onus on the caller to perform additional checks to determine whether an error actually occurred. For example PHP’s stream_get_contents function can return '' or FALSE upon failure. But it can also return '' upon success. See my insane workaround.
[2] Most programs treat out-of-memory errors as fatal, although there are a few hardened programs like SQLite that treat out-of-memory as an expected error.
[3] The C# exception hierarchy is illustrated in “C# exception hierarchy”.
[4] The Python exception hierarchy is documented in “Exception Hierarchy”.
[5] Java has a few annoying examples where unexpected exceptions were marked as checked, burdening all subsequent callers. In particular Object.clone() throws the checked CloneNotSupportedException, making it hard to use. And Java’s reflection library throws the checked IllegalAccessException and InvocationTargetException whenever you try to invoke() a method, neither of which are expected errors. And Thread.sleep() throws the checked InterruptedException. Now IOException, thrown by all I/O functions, is legitimately a checked exception because it is an expected exception.
[6] Instead of documenting the guarantees after failure for individual functions, it is also common to document an overall failure handling strategy for the entire system. For example most databases are documented as generally operating in a transactional fashion, with failed operations leaving the database in its original state. Of course some functions may deviate from the general policy, in which case the deviation should be documented.
[7] Unexpected and fatal exceptions, on the other hand, are not typically part of a function’s API. As such, callers should not write exception handlers that depend on them.
[8] In this circumstance a caller is forced to guess the exception type by parsing the exception’s message. However this solution is brittle since the message isn’t part of the API and could change in the future or vary depending on the current locale.