Also available is a superior 600 dpi version in compressed PostScript. This one is 24.6 MB uncompressed (1.7 MB compressed).
If you have comments or questions about what you see, please send me some e-mail.
The thesis can also be retrieved in smaller chunks to make printing easier. All of the links here point to gzipped PostScript files. Chapter 5 is by far the most printer-intensive chapter, since it is filled with screendumps; it is large and may take quite a while to print.
- Frontstuff (28 pages) Abstract, Acknowledgements, Contents, Preface
- Part I: The Letter Spirit Project
- Part II: Recognition of Gridletters
- Chapter 4 (73 pages) The Role Model: Letter Spirit's Examiner
- Chapter 5 (45 pages) Performance of the Role Model
- Chapter 6 (53 pages) Competing Models of Letter Recognition
- Chapter 7 (23 pages) Gridletter Perception by Humans
- Chapter 8 (36 pages) Human and Machine Performance
- Chapter 9 (12 pages) Contributions of This Research
- Endstuff (22 pages) Appendices, References
If for some reason you cannot retrieve the thesis here, please contact Helga Keller about ordering hardcopy.
The thesis is organized in two parts. Part I provides an overview of the entire Letter Spirit project --- a cognitive model of typeface design focusing on creativity. Part II presents an implemented computer model of letter recognition that will serve as Letter Spirit's letter-perception system. Though the solid core of this thesis is certainly the letter recognition work presented in Part II, Part I serves to put this work in perspective, providing a justification of our emphasis on perception in early work on Letter Spirit.
Chapter 1 provides an overview of the Letter Spirit project. After the orthogonal categories of letter (corresponding to categorical sameness) and spirit (corresponding to stylistic sameness) are introduced, the gridfont microdomain is carefully explained. Special care was taken during the development of the grid to ensure that the deep, conceptual-level issues raised by the problem of typeface design were kept intact while surface-level details were left by the wayside. The grid serves to focus attention away from aspects of letter design that tend to bog down letter-design programs in domain-specific details, such as exact curvature of lines, line thicknesses, tapering serifs, and the like. Constraining letters to a grid brings the cognitive issues to the forefront --- especially the nature of fluid concepts and the conceptual nature of style. A theory of letter concepts, based on the idea that letter concepts are made up of constituent roles, is proposed. Chapter 1 also introduces the architecture of Letter Spirit and discusses its four main ``agents'': the Imaginer, the Drafter, the Examiner, and the Abstractor. The ``central feedback loop of creativity'' is introduced, and the crucial role of perception in creative models is discussed.
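To make the grid idea concrete, here is a minimal Python sketch. It assumes, for illustration, a grid of points 3 wide by 7 tall, on which a letter is drawn by turning on short segments ("quanta") joining neighboring points horizontally, vertically, or diagonally. The names `all_quanta` and `vertical_stroke` are hypothetical and are not taken from the Letter Spirit code.

```python
from itertools import product

# Assumed grid: 3 points wide by 7 points tall (an illustrative choice).
WIDTH, HEIGHT = 3, 7

def all_quanta(width: int = WIDTH, height: int = HEIGHT) -> set:
    """Enumerate every quantum: a segment joining two neighboring grid points."""
    quanta = set()
    for x, y in product(range(width), range(height)):
        # Step right, up, up-right and up-left, so each segment is generated once.
        for dx, dy in ((1, 0), (0, 1), (1, 1), (-1, 1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                quanta.add(frozenset({(x, y), (nx, ny)}))
    return quanta

def vertical_stroke(x: int, y_start: int, y_end: int) -> frozenset:
    """Quanta making up a straight vertical stroke from y_start to y_end."""
    return frozenset(frozenset({(x, y), (x, y + 1)}) for y in range(y_start, y_end))

GRID = all_quanta()
print(len(GRID))  # 56 under the assumed 3 x 7 grid

# A gridletter is just a subset of the available quanta; here, a bare tall stroke.
tall_stroke = vertical_stroke(1, 0, 6)
assert tall_stroke <= GRID
print(len(tall_stroke))  # 6
```

Because a gridletter in this sketch is nothing more than a set of quanta, questions of exact curvature or line thickness cannot even be expressed, which is the point of the microdomain.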
The aims of Letter Spirit can be viewed both narrowly and broadly, since the program can be seen both as a typeface design system (the narrow view) and as a model of creativity (the broad view). Chapters 2 and 3 take these two perspectives on the project in turn. Chapter 2 surveys some typeface automation programs, discussing their creative and non-creative aspects. Comparisons between Letter Spirit and several other typeface design programs emphasize that what we are after is more than an engineering solution to typeface design: what we are after is a model of human creativity.
Chapter 3 delves more deeply into the idea of automating the creative process. Hard questions of autonomy (that is, of a program making its own decisions) provide a framework for understanding models of creativity in the AI literature. We propose that the sort of autonomy that we want to capture arises from the following five attributes: a rich, highly interconnected knowledge base; active, dynamic concepts; conceptual-level exploration; the ability to perceive and assess tentative output in order to adjust it; and gradual convergence on a large-scale creative product through an interleaved process of suggestion and assessment. Special emphasis is placed on the idea of high-level perception, without which a program would be unable to know what it is doing. A related thread that runs through Chapter 3 involves the tradeoff between flexibility on the one hand and control on the other. Balancing these two opposing forces in a creative program turns out to be a hard problem. The parallel terraced scan, a way of implementing the emergence of perceptual structure under top-down pressure from the conceptual level, provides an elegant solution to this problem. Chapter 3 serves to justify our emphasis on perception in early implementation work on Letter Spirit. Without a sophisticated understanding of letters at the conceptual level, Letter Spirit would end up as just another ``blind'' program, relying on its human programmer/user to assess and judge its output.
Part II introduces the centerpiece of this thesis --- a computer model of letter perception based on the idea that conceptual roles are the fundamental building blocks of letter concepts. We call this model the ``role model''. The most general aim of Part II is to provide a thorough understanding of the role model.
Chapter 4 begins with a discussion of the role hypothesis, restating and refining ideas about letter concepts introduced in Chapter 1. The role model, implemented as Letter Spirit's Examiner agent, serves as a concrete test of the role hypothesis. The role model is based on an approach to cognitive modeling in which dynamic, associative, active concepts play a major role. Our role-model program not only categorizes letterforms, but more importantly segments letterforms into natural constituent parts that correspond to the conceptual roles of a letter conceptualization. Roles provide a conceptual handle on the deepest level of style in letterforms. At the deepest level, style has to do with how roles are filled and how roles are inter-related. Thus a role-based model of letter perception that can get to this deep-style information and make it available for use in the analogical design of other letterforms is a necessary first step in the implementation of a full-blown Letter Spirit.
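The role hypothesis can be illustrated with a deliberately tiny Python sketch. Everything in it is a hypothetical stand-in: parts arrive already segmented and labeled by `kind` and `zone`, each `Role` scores a part against a simple norm, and a `LetterConcept` is just a tuple of roles. Unlike the role model, the sketch ignores relations between roles and does no segmentation of its own; it only shows how "filling roles" can yield a categorization score.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    kind: str  # hypothetical shape label, e.g. "post" or "bowl"
    zone: str  # hypothetical vertical zone, e.g. "ascender" or "x-height"

@dataclass(frozen=True)
class Role:
    name: str
    kind: str
    zone: str

    def fit(self, part: Part) -> float:
        """Crude norm-based fit: full credit for matching the norm, partial otherwise."""
        return (0.6 if part.kind == self.kind else 0.0) + (0.4 if part.zone == self.zone else 0.0)

@dataclass(frozen=True)
class LetterConcept:
    letter: str
    roles: tuple

    def fillers(self, parts: list) -> dict:
        """Greedily give each role its best-fitting part not already used."""
        unused = list(parts)
        assignment = {}
        for role in self.roles:
            if not unused:
                break
            best = max(unused, key=role.fit)
            assignment[role.name] = (best, role.fit(best))
            unused.remove(best)
        return assignment

    def score(self, parts: list) -> float:
        """Average fit, penalizing both unfilled roles and unexplained parts."""
        total_fit = sum(fit for _, fit in self.fillers(parts).values())
        return total_fit / max(len(self.roles), len(parts))

# Hypothetical concepts and a hypothetical pre-segmented letterform.
b = LetterConcept("b", (Role("post", "post", "ascender"), Role("bowl", "bowl", "x-height")))
l = LetterConcept("l", (Role("post", "post", "ascender"),))
parts = [Part("post", "ascender"), Part("bowl", "x-height")]
print(b.score(parts), l.score(parts))  # 1.0 0.5
```

The "l" concept fills its single role perfectly but leaves the bowl unexplained, so it scores lower than "b", whose roles account for every part.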
The role model is an emergent, stochastic model in which the processing of each run is done by hundreds of micro-agents (called codelets), each an instance of one of sixteen codelet types. The asynchronous, parallel, local processing done by the codelets implements a parallel terraced scan of possible structures (as in the role model's predecessor, Copycat), applied here to our categorization task. Roles provide top-down pressures on the kind of emergent perceptual structure built up by codelets. The resulting mix of top-down pressure and bottom-up structure-building makes for a surprisingly flexible model of letter perception.
The three main architectural features of the role model are: 1) a Conceptual Memory made up of a set of highly interconnected, flexible, norm-based concepts; 2) a Visual Focus serving as a subcognitive workspace in which codelet processing --- building, destroying and re-arranging perceptual structure --- occurs in parallel; and 3) a stochastic Coderack that implements (simulated) codelet parallelism by picking codelets to run non-deterministically, but in a probabilistically-biased way, according to estimates of their relevance to the current situation. The implementation of these three architectural features is the main subject of Chapter 4.
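The Coderack's biased random selection is the easiest of these three features to show in isolation. The Python sketch below assumes that each codelet carries a numeric urgency and that its chance of being picked is simply proportional to that urgency; the class and codelet names are hypothetical, and the role model's actual urgency scheme and sixteen codelet types are not reproduced.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Codelet:
    name: str
    urgency: float  # relative weight; higher urgency means more likely to be chosen
    action: Callable[[], None] = lambda: None

class Coderack:
    """Pool of waiting codelets, chosen non-deterministically but biased by urgency."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._codelets: List[Codelet] = []
        self._rng = random.Random(seed)

    def post(self, codelet: Codelet) -> None:
        self._codelets.append(codelet)

    def __len__(self) -> int:
        return len(self._codelets)

    def choose(self) -> Codelet:
        """Remove and return one codelet with probability proportional to its urgency."""
        if not self._codelets:
            raise IndexError("the Coderack is empty")
        weights = [c.urgency for c in self._codelets]
        index = self._rng.choices(range(len(self._codelets)), weights=weights, k=1)[0]
        return self._codelets.pop(index)

    def run(self, steps: int) -> List[str]:
        """Run up to `steps` codelets one at a time, simulating parallelism by interleaving."""
        ran = []
        for _ in range(steps):
            if not self._codelets:
                break
            codelet = self.choose()
            codelet.action()  # in a fuller model, actions could post new codelets
            ran.append(codelet.name)
        return ran

# Hypothetical codelet types: which one runs first is random but biased.
counts = {"bond-scout": 0, "role-filler": 0}
for trial in range(1000):
    rack = Coderack(seed=trial)
    rack.post(Codelet("bond-scout", urgency=1.0))
    rack.post(Codelet("role-filler", urgency=9.0))
    counts[rack.choose().name] += 1
print(counts)  # "role-filler" should be chosen first roughly 90% of the time
```

Because low-urgency codelets still run occasionally, many cheap explorations proceed side by side at different rates, which is the intuition behind the parallel terraced scan.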
Chapter 5 concerns itself with the performance of the role model. Performance is analyzed at multiple levels, from a ``microscopic'' view of codelet-level processing on six particular runs, to a ``galactic'' view of thousands of runs considered together at once. This chapter makes no attempt to hide the weaknesses of the role model, and in fact discusses some of them in detail. The last section of Chapter 5 describes experiments in which the main parameters of the role model are systematically varied.
Chapter 6 discusses competing models of letter recognition. First, a framework for understanding the widely varying approaches to modeling letter perception is introduced. This framework posits a continuum with two opposing approaches at either end: a ``flat'' approach based on low-level features and template matching, and a ``structured'' approach based on built-in category models having both hierarchical (part/whole) and relational (between-part) structure. The role model falls at the latter end of the spectrum. By contrast, most optical character recognition (OCR) models lie at the former end, since they tend to be based on template matching. Throughout Chapter 6, both OCR models and psychological models of letter recognition are fit into this framework. Special attention is paid to two models in particular: Sanocki's model, which takes a psychological approach to letter perception and has ``attributes'' that are in some ways related to our thinking about roles; and Williams and Hinton's statistically complex deformable-models system, which takes an engineering approach to the problem. In addition to considering models developed outside our research group, we have developed two ``in-house'' models to act as competition: Dumrec, a ``scruffy'' symbolic-AI recognizer based on weighted property-list matching, and Netrec, a three-layer backpropagation network that learns to categorize letterforms through associative training. These two rudimentary models provide contrasting views of letter perception, and we have tested both against the role model on the same gridletter datasets.
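The general idea of weighted property-list matching can be shown in a few lines of Python. This sketch is not Dumrec: the property names, weights, and normalization are invented for illustration. It does show why such a recognizer leans toward the flat end of the framework, since each property is checked independently rather than as a filled role with relations to other roles.

```python
from typing import Dict, Set

# Hypothetical property lists: each letter maps properties to evidence weights.
PROPERTY_LISTS: Dict[str, Dict[str, float]] = {
    "b": {"has-ascender": 2.0, "closed-bowl": 2.0, "bowl-on-right": 1.5},
    "d": {"has-ascender": 2.0, "closed-bowl": 2.0, "bowl-on-left": 1.5},
    "o": {"closed-bowl": 2.0, "no-ascender": 1.0},
}

def match_scores(observed: Set[str], lists=PROPERTY_LISTS) -> Dict[str, float]:
    """Score each letter by the fraction of its total weight found in `observed`."""
    scores = {}
    for letter, weights in lists.items():
        matched = sum(w for prop, w in weights.items() if prop in observed)
        scores[letter] = matched / sum(weights.values())
    return scores

def recognize(observed: Set[str]) -> str:
    """Pick the letter whose property list matches best."""
    scores = match_scores(observed)
    return max(scores, key=scores.get)

print(recognize({"has-ascender", "closed-bowl", "bowl-on-right"}))  # b
print(recognize({"closed-bowl", "no-ascender"}))  # o
```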
Chapter 7 concerns itself with human letter perception and presents the results of two psychological experiments. The first of these explores the gridletter recognition problem. Several subjects were run on a dataset of hundreds of gridletters that included letterforms of various styles (ranging from standard to eccentric). The resulting confusion matrix is richer than those usually found in letter perception work, since acute variations in style cause more interesting and less artificial errors than do the conditions usually used in such experiments (such as very short stimulus-display times or letterforms displayed far away from the subject). One interesting side issue is the extent to which recognition of standard, prototypical letters differs from recognition of eccentric, highly stylized letters. The second psychological experiment investigates letterform parsing, probing how people break letterforms into ``natural'' parts. Together, these two experiments suggest that human letter perception (of single characters, presented alone) is strongly influenced by role-level representations.
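A confusion matrix of the kind described here simply counts, for each presented letter, how often each response was given. Below is a minimal Python sketch, assuming trials are recorded as (presented, response) pairs; the function names are hypothetical and the trial data are invented, not taken from the thesis's experiments.

```python
from collections import Counter, defaultdict

def confusion_matrix(trials):
    """Tally responses per presented letter: matrix[presented][response] -> count."""
    matrix = defaultdict(Counter)
    for presented, response in trials:
        matrix[presented][response] += 1
    return matrix

def error_rate(matrix, letter):
    """Fraction of presentations of `letter` answered with some other letter."""
    row = matrix.get(letter, Counter())
    total = sum(row.values())
    return (total - row[letter]) / total if total else 0.0

# Invented trials, purely to exercise the code.
trials = [("a", "a"), ("a", "a"), ("a", "d"), ("e", "e"), ("e", "c")]
m = confusion_matrix(trials)
print(m["a"]["d"], error_rate(m, "a"), error_rate(m, "e"))  # 1 0.3333333333333333 0.5
```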
Chapter 8 compares and contrasts the behavior of the role model, Dumrec, Netrec, and humans. The role model shows strong and consistent correlation with our human data, especially when error-making is taken into account. Analysis of Dumrec and Netrec errors suggests that these simpler models effectively rely on template matching and low-level feature extraction, the two approaches taken by most OCR programs. Theoretical error matrices developed according to these two representational theories (low-level features and template matching) confirm this view and shed more light on the kind of processing done by Netrec and Dumrec. The role model is thus empirically shown to be a very strong model of letter recognition. Our results point out fundamental shortcomings in approaches to letter recognition that, unlike people, do not treat letters as realizations of roles.
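One simple way to ask whether two recognizers "make the same mistakes" is to correlate the off-diagonal (error) cells of their confusion matrices. The statistics actually used in the thesis are not spelled out in this overview, so the Python sketch below is only an illustration under that assumption; the function names are hypothetical and the matrices are invented.

```python
from math import sqrt

def error_vector(matrix, letters):
    """Flatten the off-diagonal (error) cells of a confusion matrix in a fixed order."""
    return [matrix.get(p, {}).get(r, 0) for p in letters for r in letters if p != r]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)

# Invented matrices: the "model" makes the same kinds of errors, twice as often.
letters = ["a", "c", "d", "e"]
human = {"a": {"a": 8, "d": 2}, "c": {"c": 9, "e": 1}, "e": {"e": 7, "c": 3}}
model = {"a": {"a": 6, "d": 4}, "c": {"c": 8, "e": 2}, "e": {"e": 4, "c": 6}}
r = pearson(error_vector(human, letters), error_vector(model, letters))
print(round(r, 6))  # 1.0
```

Restricting the comparison to error cells keeps the high, nearly uniform rate of correct answers from inflating the correlation between any two reasonable recognizers.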
In the final chapter of the thesis, the contributions of this research are brought together in one place. A summary of major results is presented, some minor problems with the role model are covered, and the future of the Letter Spirit project is discussed. As an intellectual project, Letter Spirit is quite old, the domain having been initially proposed and a computational architecture sketched out as far back as 1980, but as an implemented computer program it is in its infancy. (A skeptic might therefore say that it exists only ``in fancy''.) Right now, the program can recognize and understand many letters, but is still very far from being able to create any. We hope this will change over the next few years.