Books as Software

I wrote the first version of this on April 8, 2006 in response to a question I was repeatedly asked about my programming languages textbook (PLAI): When will PLAI appear in print? I revised this on April 26, 2007 because the original question no longer applies: the book is now available in print, though with a significant caveat that keeps the original issues relevant. (Further revisions: August 28, 2007; January 7, 2008; April 1, 2008 (thanks, Vivek Pai); August 14, 2009 (thanks, Benjamin Prosnitz); Jan 24, 2019 (thanks, Peter Sibley).) I welcome comments, especially the perspective of those whom I do not have ready access to, such as professional programmers or readers in developing countries.

Short Answer #1:

It already does. Download the book and engage your document reader's print command.

Short Answer #2:

It does, but not under any publisher's imprimatur. It remains self-published, but is now offered in print at lulu.com.

Long Answer:

What people mean by this question is really, “When will a formally recognized publisher publish this?” Not for the foreseeable future.

I have been offered contracts by traditional publishers for whose wisdom and experience I have the highest respect. Unfortunately, I am not convinced that the publishers can bring enough to the table relative to what the traditional publishing model takes away. This, then, is an experiment. Below are some of the considerations I see.

Formality and Existence. Bookstores can identify and sell it, libraries can index it, authors can cite it, and the general public can refer to it using traditional methods (such as an ISBN, though now an author can obtain one directly). It exists! Its existence gives it weight and gravity (literally and metaphorically). Self-published books have a whiff of seediness.
Physical Attraction. The top academic publishing houses know a great deal about the physics of books: cover materials, paper, layout and typesetting, binding, and so forth. There is a certain delight to holding and turning the pages of a good hardcover book.
Editorial Supervision. Publishers have high-level knowledge about ideal lengths and content. They also provide professional services such as copy-editing.
International Dissemination. The best publishers will try to find publishing houses that can produce low-cost international editions. It's easy for readers in rich countries to forget, but if books seem expensive to them, they cannot begin to comprehend how much more expensive the same books are to a developing-country audience. My reading as a youth was partially fueled by low-cost editions such as Prentice-Hall's Eastern Economy Editions. Beside me as I type stands a yellowing copy of Apostol's Mathematical Analysis, published in India by a house called Narosa on behalf of Addison-Wesley.
Advertising. Publishers put out catalogs; those of the best publishers are themselves beautiful objects that are a joy to browse. They contact book-buyers. They rent stalls at academic conferences. Selling books is one of their core competences.
Ego Gratification. I confess to great joy that copies of How to Design Programs are on the shelves of world-class bookstores like Foyles (London) or of beloved ones such as Gangaram's (Bangalore). Attend an event at Dagstuhl and there's your book right on the front shelf of the library. That's a thrill that is hard to quantify. (Then again, in February 2007 the Dagstuhl library happily recognized my on-line PDF as a “book” and added a printed copy to their collection.)
Money. A percentage of the proceeds from sales are given to the author as a royalty.

These are all wonderful benefits, ranging from the very abstract to the very concrete. So what's not to like?

Let's take these on in order.

Formality and Existence. This, my book will never have. But how much does it matter? In today's world, a URL seems to have as much stature as an ISBN, especially a URL that an author commits to keeping current. As for gravity, I (have to) hope my own reputation and standing count for something; if a user places less faith in my work than in a bound volume by an author with no standing in the research community, perhaps we have nothing to communicate anyway.
Physical Attraction. Right now, obviously, the publishing houses have a major advantage over me. But will this last? Isn't it only a matter of time before some publishing house zeroes in on this world, and can offer small print runs with quick turn-around in a professional binding on smart paper? Why hasn't O'Reilly done this already? Will Amazon? (At the time of originally writing this, in 2006, I wasn't aware of printing houses like lulu.com. I found out about them shortly thereafter. I won't pretend the print quality they produce, combined with my minimal artistic skill, matches the production values of an MIT or Cambridge University Press, but it's good enough if you buy the rest of this essay.)
Editorial Supervision. Another win for the publishers. But maybe size is not an obstacle when the consumer can control content rather than having to accept it on an all-or-nothing basis, which I discuss in greater detail below. And as for copy-editing, I work hard at it myself, and my readers are kind enough to send in comments. If you have many eyes, do all bugs become shallow?
International Dissemination. In principle I cannot compete with this, and I think this is a noble goal that is extremely important to me, personally, owing to my background. But if that developing-world student has access to a computer and an Internet connection, they already have this book. I have some evidence that this phenomenon is already in play.
Advertising, Ego Gratification, Money. This is easy: the publishers win hands-down. But perhaps the publishing model of this book will earn it its own following? I have to hope it will; otherwise I have no hope of competing, and a great deal of effort will have gone to waste.

All I have offered so far is a point-by-point refutation (not especially strong, at that) of the traditional publishing model. As you have no doubt noticed, I haven't, in fact, really answered my own question: “What's not to like?” Before I answer it directly, I'd like to present a vision of what a book can be, then explain how this vision is itching to emerge in the use of my book.

For decades, we treated software like books. We boxed it, wrapped it, bundled it up and shipped it to stores, where it was displayed in shop windows. While many commercial software products still share several of these elements, the high-street software store has all but disappeared. Its place has been taken by an increasingly agile deployment method, first from obscure FTP servers but now revolutionized in scale and convenience by the Web. So why shouldn't books move in the same direction?

This question deserves some introspection, not merely an affirmative shout from the radical gallery. The question actually represents several concerns rolled into one, and it would help to unbundle them.

The most obvious problem is one of medium quality. Have you tried reading a Project Gutenberg book on-line? I have, and I've always failed. I can, at best, get through small excerpts of Shakespearean plays before the medium becomes an obstacle. But electronic content has its own obvious benefit, namely searchability (which is how I find those Shakespeare fragments in the first place). And disseminating camera-ready copy, as I effectively do, seems to strike a happy medium between these two extremes.

Next, do we even need to distribute upgrades? A literary reader might find this not only useless but, worse, horrifying. But an academic textbook is not, primarily, a literary work. A good book should demonstrate an endless series of innovations and improvements, just like a software package. I find, from experience, that most of these changes are far closer in spirit to the software notion of “versions” than the publishing notion of “editions”. This, in fact, is a realm in which software has taken a lead, making a distinction that traditional publishing could not: major-version numbers in software correspond to editions, while minor-version numbers represent smaller, more localized upgrades.
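To make the versions-versus-editions distinction concrete, here is a minimal sketch. It assumes a hypothetical numbering scheme in which a book's version is written as `major.minor`, with the major number standing in for a traditional edition; the names `parse` and `kind_of_upgrade` are illustrative only and do not describe how PLAI is actually numbered.

```python
# Hypothetical numbering: "major.minor", where the major number plays the
# role of a traditional edition and the minor number a localized revision.
def parse(version: str) -> tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def kind_of_upgrade(old: str, new: str) -> str:
    """Classify the move from `old` to `new` as an edition or a revision."""
    if parse(new) <= parse(old):
        raise ValueError(f"{new} is not newer than {old}")
    # Only a change in the major number counts as a new edition.
    return "new edition" if parse(new)[0] > parse(old)[0] else "minor revision"


print(kind_of_upgrade("1.3", "1.4"))  # minor revision
print(kind_of_upgrade("1.4", "2.0"))  # new edition
```

A reader keeping a printed copy might reasonably care only about the first kind of change; the second kind is what an on-line reader picks up for free.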

We can push the metaphor further and argue that a book is not so much a software package as a collection of software components. Many textbooks make this explicit by providing a flow-diagram of dependencies between chapters, which are a representation of expected and provided interfaces. Textbooks even indicate different routes through the material. But the physical book can never correspond to these virtual routes: you have to take all or nothing. If a professor wants to assemble a course from parts of five books, why should students suffer through that much bulk? Software linkers solve precisely this problem.
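To make the linker analogy concrete, here is a minimal sketch, assuming each chapter declares the concepts it provides and the concepts it requires, much as an object file exports and imports symbols. The `Chapter` type, the `unresolved` function, and all chapter and concept names are hypothetical; the check plays the role of a linker's undefined-reference report for a course assembled from parts of several books.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chapter:
    # Hypothetical chapter record: concepts exported and imported,
    # analogous to the symbols an object file defines and references.
    name: str
    provides: frozenset[str] = field(default_factory=frozenset)
    requires: frozenset[str] = field(default_factory=frozenset)


def unresolved(selection: list[Chapter]) -> dict[str, set[str]]:
    """Map each selected chapter to the requirements that no selected
    chapter provides, like a linker's 'undefined reference' report."""
    provided = set().union(*(ch.provides for ch in selection))
    missing = {}
    for ch in selection:
        gaps = set(ch.requires) - provided
        if gaps:
            missing[ch.name] = gaps
    return missing


# Invented excerpts from two different books.
book_a = [
    Chapter("Interpreters", provides=frozenset({"environments", "closures"})),
    Chapter("Continuations", provides=frozenset({"continuations"}),
            requires=frozenset({"closures"})),
]
book_b = [
    Chapter("Garbage Collection", requires=frozenset({"heap model"})),
]

print(unresolved(book_a + book_b))  # {'Garbage Collection': {'heap model'}}
```

An empty result means the selected excerpts "link"; anything else names the material a professor would still have to supply from yet another source.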

In a word, control. In the present publishing model, control rests in one place: with the publisher. Once the author forks over their document, they are effectively at the mercy of an organization over which, no matter how benevolent (as the academic publishing houses most assuredly are), neither the author nor the adopter has very much influence.

The author's need for control stems from two simple forces: the constant stream of innovation, and the impossibility of predicting when that stream will bear new fruit (if you will pardon my hashed metaphor). To me, this is far from a hypothetical concern:

In 2001, I entirely revised the presentation of continuations, using the Web as a driving metaphor. (This was driven by hot-off-the-press research.)
In 2003, I began to make programming-by-search an important part of the course.
In 2004, I made programming-by-search the working example of a domain-specific embedding.
At the same time, Eli Barzilay began to experiment with embedding laziness in Scheme transparently enough that students could perform non-trivial experiments with it. (Eli and John Clements have since written up this work.)
A year later, Matthew Flatt, with his characteristic ability to surprise and delight, created a dynamically-scoped version of Scheme.
In 2005, Greg Cooper completely changed the way I thought about teaching garbage collection. In one fell swoop, he both removed a huge volume of dependency in the book and entirely overhauled a student's ability to experiment. (Greg is writing this up formally.)

Is this spate of innovation characteristic of textbooks? I don't know, and it doesn't matter. Something about this book has inspired people to contribute, and I want to embrace their contributions. (This suggests some kind of collaborative authoring, and indeed I have come to think of my text as a group product. I still, however, cling to an old-fashioned notion that an author provides a vision and sets a tone for a book and, however limited that tone may be, it is often superior to the random walk of a collaborative document. I am now wrestling with how to properly attribute credit to all these parties.)

These speak for returning control to the author. Publishers can help authors by keeping print-runs small, but this ultimately becomes a matter of economics and business models. I think this makes it hard for a publisher to embrace an author's passion for disseminating a really great idea. To a publisher, a new print run is not something to undertake lightly. To me, Greg's ideas on teaching garbage collection were a sea-change, and I couldn't wait to get them out there; the Web let me do that.

There are many more reasons why authors benefit. In a world where users have grown accustomed to being able to experiment with software before buying it, they are increasingly going to be drawn to books that offer the same facility. A book you can sample is better than one you can't, but one you can use is better than one you can only sample. Of course, I can afford to take this position because I know the royalties will never do more than pay for a nice dinner, so I can forgo them entirely. But an author who wanted the revenue and felt his content deserved it might be brave enough to set up a PayPal link and hope for his readers to do right by him. (I don't know statistics on how often corresponding software requests for donations are successful, but I am not sanguine.) I have generally left money out of this critique, but if we address it, we should do so from both sides, analyzing the agency costs inherent in publishing.

But what about the consumer? How do they benefit from agility? Most obviously, they and their students enjoy the fruits of the latest pedagogic inventions; they can already do this for software, but books still lack disintermediation. More importantly, rarely is any one book a perfect fit: each professor has their own model of what a course should cover, and that model is never the same from one professor to the next. This calls for a hybrid approach (inasmuch as terminology and notation can be reconciled). Indeed, while several universities appear to have adopted my book as an auxiliary text, some have adopted portions of it as primary material. But if my book were to go physical, cost and weight would matter. At present, you cannot have five (fractional) primary texts: students cannot afford it, and their backs cannot bear the load.

These arguments have primarily been directed at academic consumers. In fact, I wish academic computer science texts would make a far greater effort to accommodate the large commercial population in our discipline: we need textbooks for hackers. The best hackers exhibit a drive for self-improvement that exceeds that of most of our students, but their sources of information are limited to Web sites often populated by those who have the most time, not those who know the most. Many of these users are of the electronic generation, and lack access to the signposts erected inside academia to lead readers to the best sources of information. I conjecture that a software-like dissemination model will appeal to them far more than traditional textbook publishing (I distinguish texts, here, from the manual-like realm that O'Reilly has mastered).

In fact, the only obstacle to publishing new versions almost daily is its effect on the user community. A Web-based application can afford to make small changes on a regular basis, and users hardly notice. But many users prefer reading books on paper, and it is impossible to align edits so that page numbers and page breaks are not disrupted by changes (think of the student who keeps the printed content in a binder). It is, even more importantly, disruptive to teachers to have to teach against a moving target. For that reason alone, book upgrades should happen infrequently. But this principle already governs software upgrades: DrScheme, for instance, is released only at well-defined points after the ends of semesters. If anything, then, the author has a responsibility to restrain the exuberance that this model affords.

This brings me to my most provocative thesis.

I cringe to say this, as a reader and book-lover, but: We live in a plastic era, but treat books as sacredly as our medieval academic brethren did. Perhaps we have gone too far in the respect we afford. To most of our students, books are already throw-away artifacts, except they are too dear to literally dispose of; how many keep their textbooks after a semester finishes? We can wring our hands, or we can try to understand better the way they function. We do not fret if they uninstall software, so why should a textbook be less ephemeral?

I have said that a book is a collection of components. I have concrete evidence that some of my users specifically excerpt sections that suit their purpose. What excited me so much about Greg's garbage collection approach is that it helped me explicitly reduce dependencies on other chapters, i.e., to shrink that component's required interface. So this modularity is very much central to the way others and I think about the book.

I forecast that one day, rich document formats like PDF will recognize this reality and permit precisely such specifications. Then, when a user selects a group of desired chapters to generate a thinner volume, the software will automatically evaluate constraints and include all dependencies. To enable this we will even need “program” analyses that help us find all the dependencies, using textual concordances as a starting point and the index as an auxiliary data structure.
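The constraint evaluation described above amounts to a transitive closure over a chapter dependency graph, followed by a topological sort so that prerequisites come first. The sketch below assumes the dependencies have already been written down by hand (the concordance- and index-based analyses that would discover them are not attempted); `DEPENDS_ON`, `thin_volume`, and the chapter names are hypothetical. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: chapter -> chapters it directly depends on.
DEPENDS_ON = {
    "Basics": set(),
    "Interpreters": {"Basics"},
    "Laziness": {"Interpreters"},
    "Continuations": {"Interpreters"},
    "Garbage Collection": {"Basics"},
}


def thin_volume(selected, depends_on):
    """Return the selected chapters plus everything they transitively
    depend on, ordered so each chapter follows its prerequisites."""
    needed = set()
    stack = list(selected)
    while stack:  # depth-first transitive closure
        chapter = stack.pop()
        if chapter not in needed:
            needed.add(chapter)
            stack.extend(depends_on[chapter])
    # Restrict the graph to the needed chapters and order them;
    # graphlib raises CycleError if the dependencies are circular.
    graph = {chapter: depends_on[chapter] for chapter in needed}
    return list(TopologicalSorter(graph).static_order())


print(thin_volume(["Continuations"], DEPENDS_ON))
# ['Basics', 'Interpreters', 'Continuations']
```

When several chapters are independent of each other (say, Laziness and Garbage Collection), any order consistent with the dependencies is acceptable, so only the prerequisite constraints, not a single fixed sequence, should be relied on.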

(As an aside, for some years now I have already wished I could do this to generate personalized, mini-travel guides. I am a carry-on traveler. I detest having to choose between the medium-sized volume that covers one city in great detail and the fat volume that covers a country slimly, when what I want is to visit three or four cities. I abhor having to lug around hotel guides to cities where I have long since made reservations. I would rather have a restaurant listing of three vegetarian restaurants than a dozen seafood ones. And I would gladly pay for the privilege. When will a travel publisher cotton on to this idea? And if this idea does not strike you as doing violence to the notion of a book, why should the analogous suggestion for textbooks upset you?)

This is not to suggest that books and publishers must go their separate ways, and never the twain shall meet. My critique is, rather, based on the present model and the evolving world in which that model is forced to function. Until the publishers evolve more flexible dissemination models, they cannot properly accommodate either producer or consumer.
