Books as Software

I wrote the first version of this on April 8, 2006 in response to a question I was repeatedly asked about my programming languages textbook (PLAI). I revised this on April 26, 2007 because the original question no longer applies: the book is now available in print, though with a significant caveat that keeps the original issues relevant. (Further revisions: August 28, 2007; January 7, 2008; April 1, 2008 (thanks, Vivek Pai); August 14, 2009 (thanks, Benjamin Prosnitz); Jan 24, 2019 (thanks, Peter Sibley).) I welcome comments, especially the perspective of those to whom I do not have ready access, such as professional programmers or readers in developing countries.

Short Answer #1:

It already does. Download the book and engage your document reader's print command.

Short Answer #2:

It does, but not under any publisher's imprimatur. It remains self-published, but is now offered in print at lulu.com.

Long Answer:

What people mean by this question is really, ``When will a formally recognized publisher publish this?'' Not for the foreseeable future.

I have been offered contracts by traditional publishers for whose wisdom and experience I have the highest respect. Unfortunately, I am not convinced that the publishers can bring enough to the table relative to what the traditional publishing model takes away. This, then, is an experiment. Below are some of the considerations I see.

These are all wonderful benefits, ranging from the very abstract to the very concrete. So what's not to like?

Let's take these on in order.

All I have offered so far is a point-by-point refutation (not especially strong, at that) of the traditional publishing model. As you have no doubt noticed, I haven't, in fact, really answered my own question: ``What's not to like?'' Before I answer it directly, I'd like to present a vision of what a book can be, then explain how this vision is itching to emerge in the use of my book.

For decades, we treated software like books. We boxed it, wrapped it, bundled it up and shipped it to stores, where it was displayed in windows. While many commercial software products still share several of these elements, the high-street software store has all but disappeared. Its place has been taken by an increasingly agile deployment method, first from obscure FTP servers but now revolutionized in scale and convenience by the Web. So why shouldn't books move in the same direction?

This question deserves some introspection, not merely an affirmative shout from the radical gallery. The question actually represents several concerns rolled into one, and it would help to unbundle them.

The most obvious problem is one of medium quality. Have you tried reading a Project Gutenberg book on-line? I have, and I've always failed. I can, at best, get through small excerpts of Shakespearean plays before the medium becomes an obstacle. But electronic content has its own obvious benefit, namely searchability (which is how I find those Shakespeare fragments in the first place). And disseminating camera-ready copy, as I effectively do, seems to strike a happy medium between these two extremes.

Next, do we even need to distribute upgrades? A literary reader might find this not only useless but, worse, horrifying. But an academic textbook is not, primarily, a literary work. A good book should demonstrate an endless series of innovations and improvements, just like a software package. I find, from experience, that most of these changes are far closer in spirit to the software notion of ``versions'' than the publishing notion of ``editions''. This, in fact, is a realm in which software has taken a lead, making a distinction that traditional publishing could not: major-version numbers in software correspond to editions, while minor-version numbers represent smaller, more localized upgrades.

We can push the metaphor further and argue that a book is not so much a software package as a collection of software components. Many textbooks make this explicit by providing a flow-diagram of dependencies between chapters, which are a representation of expected and provided interfaces. Textbooks even indicate different routes through the material. But the physical book can never correspond to these virtual routes: you have to take all or nothing. If a professor wants to assemble a course from parts of five books, why should students suffer through that much bulk? Software linkers solve precisely this problem.

In a word, control. In the present publishing model, control rests in one place: with the publisher. Once the author forks over their document, they are effectively at the mercy of an organization over which, no matter how benevolent (as the academic publishing houses most assuredly are), neither the author nor the adopter has very much influence.

The author's need for control stems from two simple forces: the constant stream of innovation, and the impossibility of predicting when that stream will bear new fruit (if you will pardon my hashed metaphor). To me, this is far from a hypothetical concern:

Is this kind of spate of innovation characteristic of textbooks? I don't know, and it doesn't matter. Something about this book has inspired people to contribute, and I want to embrace their contributions. (This suggests some kind of collaborative authoring, and indeed I have come to think of my text as a group product. I still, however, cling to an old-fashioned notion that an author provides a vision and sets a tone for a book and, however limited that tone may be, it is often superior to the random walk of a collaborative document. I am now wrestling with how to properly attribute credit to all these parties.)

These speak for returning control to the author. Publishers can help authors by keeping print-runs small, but this ultimately becomes a matter of economics and business models. I think this makes it hard for a publisher to embrace an author's passion for disseminating a really great idea. To a publisher, a new print run is not something you enter into lightly. To me, Greg's ideas on teaching garbage collection were a sea-change, and I couldn't wait to get them out there; the Web let me do that.

There are many more reasons why authors benefit. In a world where users have grown accustomed to being able to experiment with software before buying it, they are increasingly going to be drawn to books that offer the same facility. A book you can sample is better than one you can't, but one you can use is better than one you sample. Of course, I can afford to take this position because I know the royalties will never do more than pay for a nice dinner, so I can forgo them entirely. But an author who wanted the revenue and felt his content deserved it might be brave enough to set up a PayPal link and hope for his readers to do right by him. (I don't know statistics on how often corresponding software requests for donations are successful, but I am not sanguine.) I have generally left money out of this critique, but if we address it we should do so from both sides, analyzing the agency costs inherent in publishing.

But how about the consumer? How do they benefit from agility? Most obviously, they and their students enjoy the fruits of the latest pedagogic inventions; they can already do this for software, but books still lack disintermediation. More importantly, usually no single book is perfect: each professor has their own model of what a course should cover, and that model is never the same from one professor to the next. This calls for a hybrid approach (inasmuch as terminology and notation can be reconciled). Indeed, while several universities appear to have adopted my book as an auxiliary text, some have adopted portions of it as primary material. But if my book were to go physical, cost and weight would matter. At present, you cannot have five (fractional) primary texts: students cannot afford it, and their backs cannot bear the load.

These arguments have primarily been directed at academic consumers. In fact, I wish academic computer science texts would make a far greater effort to accommodate the large commercial population in our discipline: we need textbooks for hackers. The best of them exhibit a drive for self-improvement that exceeds that of most of our students, but their sources of information are limited to Web sites often populated by those who have the most time, not those who know the most. But many of these users are of the electronic generation, and lack access to the signposts erected inside academia to lead readers to the best sources of information. I conjecture that a software-like dissemination model will appeal to them far more than traditional textbook publishing (I contrast texts, here, to the manual-like realm that O'Reilly has mastered).

In fact, the only obstacle to publishing new versions almost daily is its effect on the user community. A Web-based application can afford to make small changes on a regular basis, and users hardly notice. But many users prefer reading books on paper, and it is impossible to align edits so that page numbers and page breaks are not disrupted by changes (think of the student who keeps the printed content in a binder). It is, even more importantly, disruptive to teachers to have to teach against a moving target. For that reason alone, book upgrades should happen infrequently. But this principle already governs software upgrades: DrScheme, for instance, is released only at well-defined points after the ends of semesters. If anything, then, the author has a responsibility to restrain the exuberance that this model affords.

This brings me to my most provocative thesis.

I cringe to say this, as a reader and book-lover, but: We live in a plastic era, but treat books as sacredly as our medieval academic brethren did. Perhaps we have gone too far in the respect we afford. To most of our students, books are already throw-away artifacts, except they are too dear to literally dispose of; but how many keep their textbooks after a semester finishes? We can wring our hands, or we can try to understand better the way they function. We do not fret if they uninstall software, so why should a textbook be less ephemeral?

I have said that a book is a collection of components. I have concrete evidence that some of my users specifically excerpt sections that suit their purpose. What excited me so much about Greg's garbage collection approach is that it helped me explicitly reduce dependencies on other chapters, i.e., to shrink that component's required interface. So this modularity is very much central to the way others and I think about the book.

I forecast that one day, rich document formats like PDF will recognize this reality and permit precisely such specifications. Then, when a user selects a group of desired chapters to generate a thinner volume, the software will automatically evaluate constraints and include all dependencies. To enable this we will even need ``program'' analyses that help us find all the dependencies, using textual concordances as a starting point and the index as an auxiliary data structure.
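
To make the dependency step concrete, here is a minimal sketch in Python of what such software might do once the dependencies are known. The chapter names and the dependency table are invented for illustration (they are not the book's actual structure), and the function name is likewise hypothetical; the sketch assumes the dependencies have already been identified, which is the hard part the analyses above would have to solve.

```python
from collections import deque

# Hypothetical dependency table, listed in book order:
# each chapter maps to the chapters it directly requires.
DEPENDS_ON = {
    "interpreters": [],
    "substitution": ["interpreters"],
    "functions": ["substitution"],
    "state": ["functions"],
    "garbage-collection": ["interpreters"],
    "types": ["functions"],
}


def chapters_to_include(selected, depends_on):
    """Return the selected chapters plus every chapter they require,
    directly or transitively, in the order the table lists them."""
    needed = set()
    queue = deque(selected)
    while queue:
        chapter = queue.popleft()
        if chapter in needed:
            continue  # already handled via another route
        if chapter not in depends_on:
            raise KeyError(f"unknown chapter: {chapter}")
        needed.add(chapter)
        # A chapter's prerequisites must also be pulled in.
        queue.extend(depends_on[chapter])
    # Preserve the book's own ordering in the thinner volume.
    return [c for c in depends_on if c in needed]


if __name__ == "__main__":
    print(chapters_to_include(["garbage-collection"], DEPENDS_ON))
    print(chapters_to_include(["types", "garbage-collection"], DEPENDS_ON))
```

In this made-up table the garbage collection chapter requires only one other chapter, so selecting it yields a two-chapter volume; a chapter with a longer chain of prerequisites would drag in correspondingly more of the book. That is the sense in which shrinking a component's required interface makes it easier to excerpt.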

(As an aside, for some years now I have already wished I could do this to generate personalized, mini-travel guides. I am a carry-on traveler. I detest having to choose between the medium-sized volume that covers one city in great detail and the fat volume that covers a country slimly, when what I want is to visit three or four cities. I abhor having to lug around hotel guides to cities where I have long since made reservations. I would rather have a restaurant listing of three vegetarian restaurants than a dozen seafood ones. And I would gladly pay for the privilege. When will a travel publisher cotton on to this idea? — And if this idea does not strike you as a violence to the notion of a book, why should the analogous suggestion for textbooks upset you?)

This is not to suggest that books and publishers must go their separate ways, and never the twain should meet. My critique is, rather, based on the present model and the evolving world in which that model is forced to function. Until the publishers evolve more flexible dissemination models, they cannot properly accommodate either producer or consumer.