Recently I took the liberty of reading one of the defining books in the domain of computer science: The Structure and Interpretation of Computer Programs, often abbreviated as SICP.
SICP is the computer science textbook given to undergraduates at MIT. It serves as an advanced text on software design and as an introductory text for the Lisp programming language.
Here are some interesting things I learned:
Units of computation can be abstracted in several forms:
For you youngin’s that have never used Pascal, just replace “Pascal” with “Java” in this section and you should get the right idea.
Two design philosophies:
Lisp-school: Create abstract data types by combining a small set of general-purpose data types (particularly collections). These complex structures can then be manipulated using operations on these general-purpose types.
“Lisp is for building organisms - imposing, breathtaking, dynamic structures built by squads fitting fluctuating myriads of simpler organisms into place.”
“Lisp programs inflate libraries with functions whose utility transcends the application that produced them.”
Pascal-school: Create many special-purpose data types (i.e. classes) and specialized operations to manipulate them.
“Pascal is for building pyramids - imposing, breathtaking, static structures built by armies pushing heavy blocks into place.”
“In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation.”
Roughly speaking, I think of the Lisp philosophy as programming with collections and the Pascal philosophy as programming with classes.4
The Pascal philosophy has won out in most statically typed languages such as C++ and Java, and in languages with poor (or nonexistent) built-in collections.
A hybrid approach (using both philosophies) is seen in languages that are dynamically typed, have built-in collections, and have built-in classes, such as Python and Ruby.
Polymorphism is where multiple abstract data types implement a common interface, which is typically defined as a series of methods that can be called on all implementing types.
This allows client code, when given an object known only to implement a particular interface, to invoke interface methods on the object and end up calling the correct implementation of that method depending on the runtime type of the object.
Polymorphism can be implemented in several different ways:
Many languages provide a default implementation strategy as a language construct. For example C++ and Java use virtual lookup tables. Smalltalk and Objective-C use message passing. C doesn’t give you anything for free, so you have to roll your own polymorphism.
These implementation strategies for polymorphism have some tradeoffs, which are worth knowing:
Introducing cross-type operations, such as
add(Integer, Complex), is a very tricky design issue.
Having explicit functions that operate on all combinations of types is possible but highly verbose. With n types and m operations, you need n*m functions to implement all combinations. Impractical.
Another strategy is to use coercion to convert a value from one type to another. So instead of defining
add(Integer, Complex), just define
convertToComplex(Integer) : Complex, and use the existing
add(Complex, Complex). To convert between all types requires at least n but no more than n2 conversion functions.
implicitor explicit one-argument constructors).
One wrinkle is that these conversion functions might introduce a loss in precision. For example not every integer can be represented as double of exactly the same value.
Introducing assignment creates lots of complications. In particular referential transparency is lost. Optimizations related to reordering and coalescing expressions need to be a lot more careful. Static reasoning of various kinds is impaired.
To reduce bugs it is best to minimize the use of mutation by using immutable objects whenever possible.6
Haskell takes a very aggressive stance against assignments, mutable state, and other side effects: they are banned by default. However Haskell does allow side effects within the context of a monad. This is a special construct unique to Haskell (so far as I know).
Normally expressions are evaluated immediately, in the order that they occur in code. Lazy evaluation changes this behavior such that expressions are only evaluated when some primitive operation (like print or add) requires the value of the expression. Until then the unevaluated expression is passed around (as a “thunk”).
Lazy evaluation enables that creation of lazy data structures, which is a useful performance optimization in some contexts.
Unrestricted mutation and lazy evaluation do not mix well in programming languages. Since unrestricted mutation is very common in mainstream programming languages, it is quite rare to be in an environment that supports lazy evaluation. Haskell is one of the few.
Logic programming languages are at the far declarative end of the imperative-declarative spectrum. They can be used to deduce answers from a set of initial set of declarative statements.
Prolog is the best-known example of a logic programming language.
Query systems for databases are a type of logic programming language.
A “query” (i.e. an expression in the language) is transformed into a “query plan” (i.e. a specific set of steps to follow) by a query planner. The implementation of these planners is quite complex.
So those are a few interesting things I learned from reading SICP. Taking notes while reading made it a lot easier for me to remember the content. You might try it when reading a technical book with a lot of new concepts.
For example, complex numbers can be expressed in rectangular (a + bi) or polar form (r*cos(𝜽)). – Some representations are better than others for different operations. Adding works better in rectangular form. Multiplying works better in polar form.↩
SICP does not mention the notion of a module or an assembly, however these are common higher-level units for abstracting computation in languages other than Lisp.↩
Versioning assemblies effectively is a challenging topic onto itself. If done without care you get so-called “dependency hell”. Just getting a consistent version numbering scheme can be tricky. One popular versioning scheme is codified as Semantic Versioning.↩
In my opinion, a language that supports neither first-class collections nor first-class classes is non-viable for large scale general purpose software development. C, Assembly, and Fortran fall into this category.↩
For example, the Linux virtual filesystem, which is implemented in message-passing style, has a common set of operations that all filesystems are expected to support (ex:
unlink). However individual filesystems may support additional operations: For example the HFS+ filesystem on Mac OS X additionally supports a
delete operation, which has slightly different semantics than the standard
A common case where eliminating mutation may not be practical is when defining and working with large data structures that need many small updates made to them over time. If such a structure were made immutable, there would be a large performance penalty for recopying the entire structure whenever a small change needed to be made.↩