This list is part of a larger work in progress: "Hypertext Development Strategies: Indexing as a Pattern Discovery and Retrieval Tool"

PRINCIPLES OF RICH INDEXING

Purposes of an Index

Structure

See Cross References

Users hate See cross-references. They also violate the shortest path principle for direct searching. Thus, they are to be avoided where possible and used only for the following purposes:

Deep Nesting

Deep nesting is good, so long as terms found at the second and third levels are also found at the top level. Thus, if the editing process discovers a term in a sub-entry or sub-sub-entry that also should be a main entry, it should be copied, not moved, to a separate main entry.

The advantage of deep nesting is two-fold. It:

Deep nesting, also called rich indexing, has been explicitly complimented in reviews of my indexes, and users have told me it is one of the things that helps them the most. It provides alternative structures to the primary structure of the book and helps in clarifying complex concepts whose components are sometimes widely separated.

Multiple entries for topics on the same or neighboring pages help to extract the structure and relationships of the concepts.
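The copy-not-move rule for deep nesting can be sketched in a few lines. This is only an illustration under stated assumptions: the index is modeled as a plain dictionary of main entries to sub-entries to page lists (two levels only, not three), and the function name promote_subentries is hypothetical.

```python
import copy


def promote_subentries(index, terms):
    """Copy (not move) the named sub-entry terms to their own main entries.

    `index` maps main entry -> {sub-entry -> list of pages}. This layout is
    an assumption made for the sketch, not a prescribed index format.
    """
    result = copy.deepcopy(index)
    for main, subs in index.items():
        for sub, pages in subs.items():
            if sub in terms:
                # The sub-entry stays where it was; the new main entry keeps
                # the original parent as its context.
                entry = result.setdefault(sub, {})
                existing = entry.setdefault(main, [])
                entry[main] = sorted(set(existing) | set(pages))
    return result


index = {"equality": {"equal function": [42, 57], "testing": [40]}}
promoted = promote_subentries(index, {"equal function"})
```

After the call, "equal function" appears both under "equality" and as a main entry of its own, so neither the browsing path nor the direct lookup is lost.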

Factoring

Scanning, and hence retrieval, is facilitated by factoring. Lexical factoring is in most cases a reasonable strategy because of the way the mind/eye expectations work. Duplication of the same word at top level is harder to scan because of the noise produced by having to read the initial word(s) repeatedly in order to find the differentiating word(s).

Factoring also concerns technical term placement - the decision as to whether technical terms should always be main entries or whether they can be sub-entries of a lexically similar term. For example, should the equal function be on a line by itself, or can it be a sub-entry under equality? Again, the arguments in favor of it being a sub-entry are scanning ease and the benefits of complete conceptual grouping as a browsing, retrieving, and learning tool.
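As a rough illustration of lexical factoring, the sketch below groups flat entries that share an initial word under a single main entry, so the shared word is read once. The function name factor_entries, the sample entries, and the four-space indent are assumptions made for the example.

```python
from collections import defaultdict


def factor_entries(entries):
    """Group flat entries that share an initial word under one main entry."""
    groups = defaultdict(list)
    for entry in entries:
        head, _, rest = entry.partition(" ")
        groups[head].append(rest)

    lines = []
    for head in sorted(groups):
        rests = sorted(groups[head])
        if len(rests) == 1:
            # Nothing to factor: keep the entry on a single line.
            lines.append(f"{head} {rests[0]}".strip())
        else:
            # Emit the shared word once, then only the differentiating words.
            lines.append(head)
            lines.extend(f"    {rest}" for rest in rests if rest)
    return lines


factored = factor_entries(
    ["stream input", "stream output", "stream buffering", "symbol table"]
)
```

The three "stream" entries collapse into one main entry with three sub-entries, while "symbol table" stays on its own line because there is nothing to factor.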

Main Entry Length

The issue of main entry length is related to the issues of user search and browsing facilitation already mentioned.

A long entry has the disadvantage of running over several columns or pages and the advantage of presenting in one location all of the material related to that topic - again, users do not like to have to deal with jumping around in an index and do like being able to see all the relevant material in one place.

The disadvantage of an entry spanning multiple columns or pages is countered by having continuation heads on both the main and sub-entry levels so that the context is preserved over the whole entry.
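A minimal sketch of continuation heads follows. It assumes a fixed column height in lines and handles only the main-entry level (sub-entry continuation heads would follow the same pattern); the "(continued)" wording and the name layout_columns are assumptions for the example.

```python
def layout_columns(main, subentries, column_height):
    """Split one long main entry into columns.

    The main entry is repeated as a continuation head at the top of every
    column after the first, so context is preserved across breaks.
    """
    if column_height < 2:
        raise ValueError("column_height must fit a head and at least one line")

    columns = []
    current = [main]
    for sub in subentries:
        if len(current) == column_height:
            # Column is full: start a new one with a continuation head.
            columns.append(current)
            current = [f"{main} (continued)"]
        current.append(f"    {sub}")
    columns.append(current)
    return columns


columns = layout_columns(
    "streams", ["buffering", "closing", "input", "output", "seeking"], 3
)
```

With a column height of three lines, the entry breaks into three columns, and the second and third each open with "streams (continued)".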

Main Entry Content

What about a main entry which has unrelated topics as sub-entries? The difficulty is that the entry can be conceptually misleading, and See Also references for the entry as a whole cannot be included at the top level. The argument can therefore be made that the entry should be broken out into separate main entries in which all the sub-entries are conceptually related.

Alternatively, such groupings do scan easily if the sub-entries are carefully crafted, and the See Also references can certainly be placed at the sub-entry level to further clarify the intention.

This issue really needs to be decided on a case-by-case basis, with a view to scanning, browsing, and retrieval ease.

Sub-Entry Content

Duplication of main entry terms in the sub-entries can help improve the readability of an index entry, even though it adds length to the sub-entries. Verbose sub-entries are justified because they aid in understanding the topic.

Navigation

See Also Trails

See Also trails provide the semantic net over the domain of the book. They may be combined with See cross-references to provide meta-level abstractions, much as an abstract class does for an object-oriented system. Ideally, each See Also entry should be modified with a relationship qualifier, analogous to the context qualifiers in the index entries. This would broaden the scope of meaningful See Also entries to permit contrasting or tangential relationships that help to deepen the understanding of the concepts.
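One way to picture a See Also entry with a relationship qualifier is as a small record holding a target and an optional qualifier. The sketch below is only that, a picture: the class name SeeAlso, the qualifier vocabulary ("contrast"), and the rendered punctuation are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeeAlso:
    target: str
    relationship: str = ""  # hypothetical qualifier, e.g. "contrast"


def render_see_also(refs):
    """Render See Also references alphabetically, qualifier in parentheses."""
    parts = []
    for ref in sorted(refs, key=lambda r: r.target.lower()):
        if ref.relationship:
            parts.append(f"{ref.target} ({ref.relationship})")
        else:
            parts.append(ref.target)
    return "See also " + "; ".join(parts)


line = render_see_also([SeeAlso("identity", "contrast"), SeeAlso("comparison")])
```

Here the qualifier tells the reader that "identity" is offered as a contrast rather than as a near-synonym, which is the kind of tangential relationship an unqualified reference cannot express.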

Abstract Class Entries

Entries, such as the meta-level abstractions, that would not normally be sought by direct retrieval can be found through See Also references as well as by notes at the beginning of the index. These can provide a value-added dimension.

Entry Selection

Concept Duplication

The issue of indexing the same topic in multiple ways, which manifests in multiple synonyms and duplication of entries in different contexts, is related to retrieval and browsing ease.

The purpose of this multiplicity is to accommodate different user mind sets and strategies, and different search contexts.

In addition, inclusion of related, although not identical, topics helps the user to understand the broad context for a particular concept. Sometimes this is handled through 'See Also' entries, while at other times sub-entries are more appropriate. The context of the developing patterns determines which is best.

Overview Chapter Details

I include references from the book's early overview to topics treated in more detail later on, because it is very helpful to a student to see a concept presented in the context of an overview. Thus, these are not trivial and irrelevant mentions, but are actually quite valuable.

Bibliographic References

Indexing bibliographic names, both in the bibliography and where they appear in the book, is a useful tool for the reader because it provides additional context. Most authors request that the bibliography references be included.

Examples

Examples can be of critical importance in helping a user to understand concepts. Thus, they should be included in the index, both as an "examples" main entry and as subentries of the topics that they illustrate. They should be structurally identified by including (example) or (code) as an entry suffix.
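The double listing of examples can be sketched as follows, again assuming the index is a dictionary of main entries to sub-entries to page lists. The function name index_example and the sample topic are hypothetical; the "(example)" suffix and the "examples" main entry come from the text above.

```python
def index_example(index, topic, title, page):
    """Record an example under its topic and under the "examples" main entry.

    Both listings carry the structural suffix "(example)".
    """
    label = f"{title} (example)"
    for main in (topic, "examples"):
        pages = index.setdefault(main, {}).setdefault(label, [])
        if page not in pages:
            pages.append(page)
    return index


example_index = index_example({}, "streams", "reading a file", 112)
```

A reader browsing "examples" and a reader looking up "streams" both reach page 112, and the suffix tells each of them what kind of material to expect there.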

Glossary and Term Descriptions

Including the qualifier "term description" in an entry alerts the reader that this is where the term is described or defined, not just mentioned or used. Simply having the page reference is ambiguous. Term descriptions can be grouped as a main entry to provide a glossary for those books that don't have one. This is particularly important in a tutorial or primer, but has value for reference works as well.

If a glossary exists, it should be indexed with a structural suffix of (glossary). Ideally, sub-entries for terms included in the glossary will also include references to locations in the document where the term is used. Qualifiers for the usage entries should clearly indicate the use context.

Entry Completeness at Top Level

All technical terms should appear at top level as well as under the topic to which they are related, because a user might encounter the entity in a document and be unable to find it in the index if it is located only in a sub-entry. This relates to the issues of direct immediate retrieval - minimal See references - and browsing completeness.
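A completeness check of this kind is easy to automate during editing. The sketch below assumes the same dictionary layout (main entry to sub-entries) and a separately maintained set of technical terms; the name missing_top_level is hypothetical.

```python
def missing_top_level(index, technical_terms):
    """Return technical terms that appear as sub-entries but not as main entries."""
    found_below = {sub for subs in index.values() for sub in subs}
    return sorted(
        term
        for term in technical_terms
        if term in found_below and term not in index
    )


draft = {
    "equality": {"equal function": [42], "testing": [40]},
    "streams": {"buffering": [110]},
}
missing = missing_top_level(draft, {"equal function", "streams"})
```

In this draft, "equal function" is reported because it exists only beneath "equality", while "streams" already has its own main entry.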

Humorous, Tangential, and Minor Reference Entries

Including names like Hamlet, which appear in the book, in a technical index is a matter of stylistic preference - many authors have liked the idea of lightening an index by such references; Guy Steele, for example, had great fun in including jokes and references to the quotes that start each chapter. Users have conveyed to me their delight in those jokes.

Names of examples or tangential references not directly related to the subject matter of the book can be fun to include - most readers enjoy a little lightness and unexpected surprises so long as they can also find what they are looking for. A rich index is used for browsing as well as for direct information retrieval, and in fact becomes a teaching tool.

Inclusion of minor references to a term that is discussed in depth in other locations is somewhat controversial among indexers. My view is that completeness is a virtue for most users so long as the context is made clear.

For example, indexing the elements of a table as well as the topic of the table can provide additional context for users who want to understand an element in depth. Again, it can be argued that if a user is interested in the topic of the table they will look under that topic, but I would argue that a user interested in an element of that table would be focused on the element and should be able to find all the material relating to that element directly under its entry.

Formatting Principles

Structure Elements

Structure elements should be identified by name. For example, (example), (figure), (table), (footnote), (chapter), (glossary), (bibliography).

Continuation Heads

Continuation heads should be on every column for main and sub-entries as needed.

Main Entry Case

Main entries should be lower-cased unless a proper noun (because case carries semantics), and should be in bold.

Line Wraps

Line wraps should be consistent and deeper than the indent of the sub-sub-entry.

Letter Separators

Letter separators should divide the letter groups.
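Letter separators amount to grouping the sorted main entries by initial letter. A minimal sketch, assuming non-empty entries, a case-insensitive sort, and a blank line plus the capital letter as the separator (the separator style and the name with_letter_separators are assumptions):

```python
from itertools import groupby


def with_letter_separators(main_entries):
    """Sort main entries and insert a letter heading before each letter group."""
    ordered = sorted(main_entries, key=str.lower)
    lines = []
    for letter, group in groupby(ordered, key=lambda entry: entry[0].upper()):
        if lines:
            lines.append("")  # blank line between letter groups
        lines.append(letter)
        lines.extend(group)
    return lines


separated = with_letter_separators(["streams", "equality", "Hamlet", "examples"])
```

Note that the entries keep their own case (so a proper noun such as Hamlet stays capitalized); only the separator is upper-cased.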