/prog/ - Writing SICP, since you never read it.

Name: Anonymous 2021-03-16 8:46

These 2011 nostalgia autists didn't expect that:

Structure and Interpretation
of Computer Programs

second edition 





Harold Abelson and Gerald Jay Sussman
with Julie Sussman 

foreword by Alan J. Perlis 







The MIT Press
Cambridge, Massachusetts     London, England

McGraw-Hill Book Company
New York     St. Louis     San Francisco     Montreal     Toronto 


 This book is one of a series of texts written by faculty of the Electrical Engineering and Computer Science Department at the Massachusetts Institute of Technology. It was edited and produced by The MIT Press under a joint production-distribution arrangement with the McGraw-Hill Book Company.

Ordering Information:

North America
Text orders should be addressed to the McGraw-Hill Book Company.
All other orders should be addressed to The MIT Press.

Outside North America
All orders should be addressed to The MIT Press or its local distributor.

© 1996 by The Massachusetts Institute of Technology

Second edition

Creative Commons License
Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman with Julie Sussman is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License by the MIT Press.

This book was set by the authors using the LATEX typesetting system and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Abelson, Harold
    Structure and interpretation of computer programs / Harold Abelson
and Gerald Jay Sussman, with Julie Sussman. -- 2nd ed.
    p. cm. -- (Electrical engineering and computer science
series)
    Includes bibliographical references and index.
    ISBN 0-262-01153-0 (MIT Press hardcover)
    ISBN 0-262-51087-1 (MIT Press paperback)
    ISBN 0-07-000484-6 (McGraw-Hill hardcover)
    1. Electronic digital computers -- Programming. 2. LISP (Computer
program language) I. Sussman, Gerald Jay. II. Sussman, Julie.
III. Title. IV. Series: MIT electrical engineering and computer
science series.
QA76.6.A255         1996
005.13'3 -- dc20              96-17756

Fourth printing, 1999 

 This book is dedicated, in respect and admiration, to the spirit that lives in the computer.

``I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don't become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don't feel as if the key to successful computing is only in your hands. What's in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.''

Alan J. Perlis (April 1, 1922-February 7, 1990)


Contents

    Foreword

    Preface to the Second Edition

    Preface to the First Edition

    Acknowledgments

    1  Building Abstractions with Procedures
        1.1  The Elements of Programming
            1.1.1  Expressions
            1.1.2  Naming and the Environment
            1.1.3  Evaluating Combinations
            1.1.4  Compound Procedures
            1.1.5  The Substitution Model for Procedure Application
            1.1.6  Conditional Expressions and Predicates
            1.1.7  Example: Square Roots by Newton's Method
            1.1.8  Procedures as Black-Box Abstractions
        1.2  Procedures and the Processes They Generate
            1.2.1  Linear Recursion and Iteration
            1.2.2  Tree Recursion
            1.2.3  Orders of Growth
            1.2.4  Exponentiation
            1.2.5  Greatest Common Divisors
            1.2.6  Example: Testing for Primality
        1.3  Formulating Abstractions with Higher-Order Procedures
            1.3.1  Procedures as Arguments
            1.3.2  Constructing Procedures Using Lambda
            1.3.3  Procedures as General Methods
            1.3.4  Procedures as Returned Values

    2  Building Abstractions with Data
        2.1  Introduction to Data Abstraction
            2.1.1  Example: Arithmetic Operations for Rational Numbers
            2.1.2  Abstraction Barriers
            2.1.3  What Is Meant by Data?
            2.1.4  Extended Exercise: Interval Arithmetic
        2.2  Hierarchical Data and the Closure Property
            2.2.1  Representing Sequences
            2.2.2  Hierarchical Structures
            2.2.3  Sequences as Conventional Interfaces
            2.2.4  Example: A Picture Language
        2.3  Symbolic Data
            2.3.1  Quotation
            2.3.2  Example: Symbolic Differentiation
            2.3.3  Example: Representing Sets
            2.3.4  Example: Huffman Encoding Trees
        2.4  Multiple Representations for Abstract Data
            2.4.1  Representations for Complex Numbers
            2.4.2  Tagged data
            2.4.3  Data-Directed Programming and Additivity
        2.5  Systems with Generic Operations
            2.5.1  Generic Arithmetic Operations
            2.5.2  Combining Data of Different Types
            2.5.3  Example: Symbolic Algebra

    3  Modularity, Objects, and State
        3.1  Assignment and Local State
            3.1.1  Local State Variables
            3.1.2  The Benefits of Introducing Assignment
            3.1.3  The Costs of Introducing Assignment
        3.2  The Environment Model of Evaluation
            3.2.1  The Rules for Evaluation
            3.2.2  Applying Simple Procedures
            3.2.3  Frames as the Repository of Local State
            3.2.4  Internal Definitions
        3.3  Modeling with Mutable Data
            3.3.1  Mutable List Structure
            3.3.2  Representing Queues
            3.3.3  Representing Tables
            3.3.4  A Simulator for Digital Circuits
            3.3.5  Propagation of Constraints
        3.4  Concurrency: Time Is of the Essence
            3.4.1  The Nature of Time in Concurrent Systems
            3.4.2  Mechanisms for Controlling Concurrency
        3.5  Streams
            3.5.1  Streams Are Delayed Lists
            3.5.2  Infinite Streams
            3.5.3  Exploiting the Stream Paradigm
            3.5.4  Streams and Delayed Evaluation
            3.5.5  Modularity of Functional Programs and Modularity of Objects

    4  Metalinguistic Abstraction
        4.1  The Metacircular Evaluator
            4.1.1  The Core of the Evaluator
            4.1.2  Representing Expressions
            4.1.3  Evaluator Data Structures
            4.1.4  Running the Evaluator as a Program
            4.1.5  Data as Programs
            4.1.6  Internal Definitions
            4.1.7  Separating Syntactic Analysis from Execution
        4.2  Variations on a Scheme -- Lazy Evaluation
            4.2.1  Normal Order and Applicative Order
            4.2.2  An Interpreter with Lazy Evaluation
            4.2.3  Streams as Lazy Lists
        4.3  Variations on a Scheme -- Nondeterministic Computing
            4.3.1  Amb and Search
            4.3.2  Examples of Nondeterministic Programs
            4.3.3  Implementing the Amb Evaluator
        4.4  Logic Programming
            4.4.1  Deductive Information Retrieval
            4.4.2  How the Query System Works
            4.4.3  Is Logic Programming Mathematical Logic?
            4.4.4  Implementing the Query System

    5  Computing with Register Machines
        5.1  Designing Register Machines
            5.1.1  A Language for Describing Register Machines
            5.1.2  Abstraction in Machine Design
            5.1.3  Subroutines
            5.1.4  Using a Stack to Implement Recursion
            5.1.5  Instruction Summary
        5.2  A Register-Machine Simulator
            5.2.1  The Machine Model
            5.2.2  The Assembler
            5.2.3  Generating Execution Procedures for Instructions
            5.2.4  Monitoring Machine Performance
        5.3  Storage Allocation and Garbage Collection
            5.3.1  Memory as Vectors
            5.3.2  Maintaining the Illusion of Infinite Memory
        5.4  The Explicit-Control Evaluator
            5.4.1  The Core of the Explicit-Control Evaluator
            5.4.2  Sequence Evaluation and Tail Recursion
            5.4.3  Conditionals, Assignments, and Definitions
            5.4.4  Running the Evaluator
        5.5  Compilation
            5.5.1  Structure of the Compiler
            5.5.2  Compiling Expressions
            5.5.3  Compiling Combinations
            5.5.4  Combining Instruction Sequences
            5.5.5  An Example of Compiled Code
            5.5.6  Lexical Addressing
            5.5.7  Interfacing Compiled Code to the Evaluator

    References

    List of Exercises

    Index

Name: Anonymous 2021-03-16 10:14

Exercise 5.10.  Design a new syntax for register-machine instructions and modify the simulator to use your new syntax. Can you implement your new syntax without changing any part of the simulator except the syntax procedures in this section?

Exercise 5.11.  When we introduced save and restore in section 5.1.4, we didn't specify what would happen if you tried to restore a register that was not the last one saved, as in the sequence

(save y)
(save x)
(restore y)

There are several reasonable possibilities for the meaning of restore:

a.  (restore y) puts into y the last value saved on the stack, regardless of what register that value came from. This is the way our simulator behaves. Show how to take advantage of this behavior to eliminate one instruction from the Fibonacci machine of section 5.1.4 (figure 5.12).

b.  (restore y) puts into y the last value saved on the stack, but only if that value was saved from y; otherwise, it signals an error. Modify the simulator to behave this way. You will have to change save to put the register name on the stack along with the value.

c.  (restore y) puts into y the last value saved from y regardless of what other registers were saved after y and not restored. Modify the simulator to behave this way. You will have to associate a separate stack with each register. You should make the initialize-stack operation initialize all the register stacks.

Exercise 5.12.  The simulator can be used to help determine the data paths required for implementing a machine with a given controller. Extend the assembler to store the following information in the machine model:

    a list of all instructions, with duplicates removed, sorted by instruction type (assign, goto, and so on);

    a list (without duplicates) of the registers used to hold entry points (these are the registers referenced by goto instructions);

    a list (without duplicates) of the registers that are saved or restored;

    for each register, a list (without duplicates) of the sources from which it is assigned (for example, the sources for register val in the factorial machine of figure 5.11 are (const 1) and ((op *) (reg n) (reg val))). 

Extend the message-passing interface to the machine to provide access to this new information. To test your analyzer, define the Fibonacci machine from figure 5.12 and examine the lists you constructed.

Exercise 5.13.  Modify the simulator so that it uses the controller sequence to determine what registers the machine has rather than requiring a list of registers as an argument to make-machine. Instead of pre-allocating the registers in make-machine, you can allocate them one at a time when they are first seen during assembly of the instructions.

5.2.4  Monitoring Machine Performance

Simulation is useful not only for verifying the correctness of a proposed machine design but also for measuring the machine's performance. For example, we can install in our simulation program a ``meter'' that measures the number of stack operations used in a computation. To do this, we modify our simulated stack to keep track of the number of times registers are saved on the stack and the maximum depth reached by the stack, and add a message to the stack's interface that prints the statistics, as shown below. We also add an operation to the basic machine model to print the stack statistics, by initializing the-ops in make-new-machine to

(list (list 'initialize-stack
            (lambda () (stack 'initialize)))
      (list 'print-stack-statistics
            (lambda () (stack 'print-statistics))))

Here is the new version of make-stack:

(define (make-stack)
  (let ((s '())
        (number-pushes 0)
        (max-depth 0)
        (current-depth 0))
    (define (push x)
      (set! s (cons x s))
      (set! number-pushes (+ 1 number-pushes))
      (set! current-depth (+ 1 current-depth))
      (set! max-depth (max current-depth max-depth)))
    (define (pop)
      (if (null? s)
          (error "Empty stack -- POP")
          (let ((top (car s)))
            (set! s (cdr s))
            (set! current-depth (- current-depth 1))
            top)))    
    (define (initialize)
      (set! s '())
      (set! number-pushes 0)
      (set! max-depth 0)
      (set! current-depth 0)
      'done)
    (define (print-statistics)
      (newline)
      (display (list 'total-pushes  '= number-pushes
                     'maximum-depth '= max-depth)))
    (define (dispatch message)
      (cond ((eq? message 'push) push)
            ((eq? message 'pop) (pop))
            ((eq? message 'initialize) (initialize))
            ((eq? message 'print-statistics)
             (print-statistics))
            (else
             (error "Unknown request -- STACK" message))))
    dispatch))

Exercises 5.15 through 5.19 describe other useful monitoring and debugging features that can be added to the register-machine simulator.

Exercise 5.14.  Measure the number of pushes and the maximum stack depth required to compute n! for various small values of n using the factorial machine shown in figure 5.11. From your data determine formulas in terms of n for the total number of push operations and the maximum stack depth used in computing n! for any n > 1. Note that each of these is a linear function of n and is thus determined by two constants. In order to get the statistics printed, you will have to augment the factorial machine with instructions to initialize the stack and print the statistics. You may want to also modify the machine so that it repeatedly reads a value for n, computes the factorial, and prints the result (as we did for the GCD machine in figure 5.4), so that you will not have to repeatedly invoke get-register-contents, set-register-contents!, and start.

Exercise 5.15.  Add instruction counting to the register machine simulation. That is, have the machine model keep track of the number of instructions executed. Extend the machine model's interface to accept a new message that prints the value of the instruction count and resets the count to zero.

Exercise 5.16.  Augment the simulator to provide for instruction tracing. That is, before each instruction is executed, the simulator should print the text of the instruction. Make the machine model accept trace-on and trace-off messages to turn tracing on and off.

Exercise 5.17.  Extend the instruction tracing of exercise 5.16 so that before printing an instruction, the simulator prints any labels that immediately precede that instruction in the controller sequence. Be careful to do this in a way that does not interfere with instruction counting (exercise 5.15). You will have to make the simulator retain the necessary label information.

Exercise 5.18.  Modify the make-register procedure of section 5.2.1 so that registers can be traced. Registers should accept messages that turn tracing on and off. When a register is traced, assigning a value to the register should print the name of the register, the old contents of the register, and the new contents being assigned. Extend the interface to the machine model to permit you to turn tracing on and off for designated machine registers.

Exercise 5.19.  Alyssa P. Hacker wants a breakpoint feature in the simulator to help her debug her machine designs. You have been hired to install this feature for her. She wants to be able to specify a place in the controller sequence where the simulator will stop and allow her to examine the state of the machine. You are to implement a procedure

(set-breakpoint <machine> <label> <n>)

that sets a breakpoint just before the nth instruction after the given label. For example,

(set-breakpoint gcd-machine 'test-b 4)

installs a breakpoint in gcd-machine just before the assignment to register a. When the simulator reaches the breakpoint it should print the label and the offset of the breakpoint and stop executing instructions. Alyssa can then use get-register-contents and set-register-contents! to manipulate the state of the simulated machine. She should then be able to continue execution by saying

(proceed-machine <machine>)

She should also be able to remove a specific breakpoint by means of

(cancel-breakpoint <machine> <label> <n>)

or to remove all breakpoints by means of

(cancel-all-breakpoints <machine>)

4 Using the receive procedure here is a way to get extract-labels to effectively return two values -- labels and insts -- without explicitly making a compound data structure to hold them. An alternative implementation, which returns an explicit pair of values, is

(define (extract-labels text)
  (if (null? text)
      (cons '() '())
      (let ((result (extract-labels (cdr text))))
        (let ((insts (car result)) (labels (cdr result)))
          (let ((next-inst (car text)))
            (if (symbol? next-inst)
                (cons insts
                      (cons (make-label-entry next-inst insts) labels))
                (cons (cons (make-instruction next-inst) insts)
                      labels)))))))

which would be called by assemble as follows:

(define (assemble controller-text machine)
  (let ((result (extract-labels controller-text)))
    (let ((insts (car result)) (labels (cdr result)))
      (update-insts! insts labels machine)
      insts)))

You can consider our use of receive as demonstrating an elegant way to return multiple values, or simply an excuse to show off a programming trick. An argument like receive that is the next procedure to be invoked is called a ``continuation.'' Recall that we also used continuations to implement the backtracking control structure in the amb evaluator in section 4.3.3.

Name: Anonymous 2021-03-16 10:15

5.3  Storage Allocation and Garbage Collection

In section 5.4, we will show how to implement a Scheme evaluator as a register machine. In order to simplify the discussion, we will assume that our register machines can be equipped with a list-structured memory, in which the basic operations for manipulating list-structured data are primitive. Postulating the existence of such a memory is a useful abstraction when one is focusing on the mechanisms of control in a Scheme interpreter, but this does not reflect a realistic view of the actual primitive data operations of contemporary computers. To obtain a more complete picture of how a Lisp system operates, we must investigate how list structure can be represented in a way that is compatible with conventional computer memories.

There are two considerations in implementing list structure. The first is purely an issue of representation: how to represent the ``box-and-pointer'' structure of Lisp pairs, using only the storage and addressing capabilities of typical computer memories. The second issue concerns the management of memory as a computation proceeds. The operation of a Lisp system depends crucially on the ability to continually create new data objects. These include objects that are explicitly created by the Lisp procedures being interpreted as well as structures created by the interpreter itself, such as environments and argument lists. Although the constant creation of new data objects would pose no problem on a computer with an infinite amount of rapidly addressable memory, computer memories are available only in finite sizes (more's the pity). Lisp systems thus provide an automatic storage allocation facility to support the illusion of an infinite memory. When a data object is no longer needed, the memory allocated to it is automatically recycled and used to construct new data objects. There are various techniques for providing such automatic storage allocation. The method we shall discuss in this section is called garbage collection.

5.3.1  Memory as Vectors

A conventional computer memory can be thought of as an array of cubbyholes, each of which can contain a piece of information. Each cubbyhole has a unique name, called its address or location. Typical memory systems provide two primitive operations: one that fetches the data stored in a specified location and one that assigns new data to a specified location. Memory addresses can be incremented to support sequential access to some set of the cubbyholes. More generally, many important data operations require that memory addresses be treated as data, which can be stored in memory locations and manipulated in machine registers. The representation of list structure is one application of such address arithmetic.

To model computer memory, we use a new kind of data structure called a vector. Abstractly, a vector is a compound data object whose individual elements can be accessed by means of an integer index in an amount of time that is independent of the index.5 In order to describe memory operations, we use two primitive Scheme procedures for manipulating vectors:

    (vector-ref <vector> <n>) returns the nth element of the vector.

    (vector-set! <vector> <n> <value>) sets the nth element of the vector to the designated value. 

For example, if v is a vector, then (vector-ref v 5) gets the fifth entry in the vector v and (vector-set! v 5 7) changes the value of the fifth entry of the vector v to 7.6 For computer memory, this access can be implemented through the use of address arithmetic to combine a base address that specifies the beginning location of a vector in memory with an index that specifies the offset of a particular element of the vector.

Representing Lisp data

We can use vectors to implement the basic pair structures required for a list-structured memory. Let us imagine that computer memory is divided into two vectors: the-cars and the-cdrs. We will represent list structure as follows: A pointer to a pair is an index into the two vectors. The car of the pair is the entry in the-cars with the designated index, and the cdr of the pair is the entry in the-cdrs with the designated index. We also need a representation for objects other than pairs (such as numbers and symbols) and a way to distinguish one kind of data from another. There are many methods of accomplishing this, but they all reduce to using typed pointers, that is, to extending the notion of ``pointer'' to include information on data type.7 The data type enables the system to distinguish a pointer to a pair (which consists of the ``pair'' data type and an index into the memory vectors) from pointers to other kinds of data (which consist of some other data type and whatever is being used to represent data of that type). Two data objects are considered to be the same (eq?) if their pointers are identical.8 Figure 5.14 illustrates the use of this method to represent the list ((1 2) 3 4), whose box-and-pointer diagram is also shown. We use letter prefixes to denote the data-type information. Thus, a pointer to the pair with index 5 is denoted p5, the empty list is denoted by the pointer e0, and a pointer to the number 4 is denoted n4. In the box-and-pointer diagram, we have indicated at the lower left of each pair the vector index that specifies where the car and cdr of the pair are stored. The blank locations in the-cars and the-cdrs may contain parts of other list structures (not of interest here).

Figure 5.14:  Box-and-pointer and memory-vector representations of the list ((1 2) 3 4).

A pointer to a number, such as n4, might consist of a type indicating numeric data together with the actual representation of the number 4.9 To deal with numbers that are too large to be represented in the fixed amount of space allocated for a single pointer, we could use a distinct bignum data type, for which the pointer designates a list in which the parts of the number are stored.10

A symbol might be represented as a typed pointer that designates a sequence of the characters that form the symbol's printed representation. This sequence is constructed by the Lisp reader when the character string is initially encountered in input. Since we want two instances of a symbol to be recognized as the ``same'' symbol by eq? and we want eq? to be a simple test for equality of pointers, we must ensure that if the reader sees the same character string twice, it will use the same pointer (to the same sequence of characters) to represent both occurrences. To accomplish this, the reader maintains a table, traditionally called the obarray, of all the symbols it has ever encountered. When the reader encounters a character string and is about to construct a symbol, it checks the obarray to see if it has ever before seen the same character string. If it has not, it uses the characters to construct a new symbol (a typed pointer to a new character sequence) and enters this pointer in the obarray. If the reader has seen the string before, it returns the symbol pointer stored in the obarray. This process of replacing character strings by unique pointers is called interning symbols.

Implementing the primitive list operations

Given the above representation scheme, we can replace each ``primitive'' list operation of a register machine with one or more primitive vector operations. We will use two registers, the-cars and the-cdrs, to identify the memory vectors, and will assume that vector-ref and vector-set! are available as primitive operations. We also assume that numeric operations on pointers (such as incrementing a pointer, using a pair pointer to index a vector, or adding two numbers) use only the index portion of the typed pointer.

For example, we can make a register machine support the instructions

(assign <reg1> (op car) (reg <reg2>))

(assign <reg1> (op cdr) (reg <reg2>))

if we implement these, respectively, as

(assign <reg1> (op vector-ref) (reg the-cars) (reg <reg2>))

(assign <reg1> (op vector-ref) (reg the-cdrs) (reg <reg2>))

The instructions

(perform (op set-car!) (reg <reg1>) (reg <reg2>))

(perform (op set-cdr!) (reg <reg1>) (reg <reg2>))

are implemented as

(perform
 (op vector-set!) (reg the-cars) (reg <reg1>) (reg <reg2>))

(perform
 (op vector-set!) (reg the-cdrs) (reg <reg1>) (reg <reg2>))

Cons is performed by allocating an unused index and storing the arguments to cons in the-cars and the-cdrs at that indexed vector position. We presume that there is a special register, free, that always holds a pair pointer containing the next available index, and that we can increment the index part of that pointer to find the next free location.11 For example, the instruction

(assign <reg1> (op cons) (reg <reg2>) (reg <reg3>))

is implemented as the following sequence of vector operations:12

(perform
 (op vector-set!) (reg the-cars) (reg free) (reg <reg2>))
(perform
 (op vector-set!) (reg the-cdrs) (reg free) (reg <reg3>))
(assign <reg1> (reg free))
(assign free (op +) (reg free) (const 1))

The eq? operation

(op eq?) (reg <reg1>) (reg <reg2>)

simply tests the equality of all fields in the registers, and predicates such as pair?, null?, symbol?, and number? need only check the type field.

Implementing stacks

Although our register machines use stacks, we need do nothing special here, since stacks can be modeled in terms of lists. The stack can be a list of the saved values, pointed to by a special register the-stack. Thus, (save <reg>) can be implemented as

(assign the-stack (op cons) (reg <reg>) (reg the-stack))

Similarly, (restore <reg>) can be implemented as

(assign <reg> (op car) (reg the-stack))
(assign the-stack (op cdr) (reg the-stack))

and (perform (op initialize-stack)) can be implemented as

(assign the-stack (const ()))

These operations can be further expanded in terms of the vector operations given above. In conventional computer architectures, however, it is usually advantageous to allocate the stack as a separate vector. Then pushing and popping the stack can be accomplished by incrementing or decrementing an index into that vector.

Exercise 5.20.  Draw the box-and-pointer representation and the memory-vector representation (as in figure 5.14) of the list structure produced by

(define x (cons 1 2))
(define y (list x x))

with the free pointer initially p1. What is the final value of free ? What pointers represent the values of x and y ?

Name: Anonymous 2021-03-16 10:15

Exercise 5.21.  Implement register machines for the following procedures. Assume that the list-structure memory operations are available as machine primitives.

a. Recursive count-leaves:

(define (count-leaves tree)
  (cond ((null? tree) 0)
        ((not (pair? tree)) 1)
        (else (+ (count-leaves (car tree))
                 (count-leaves (cdr tree))))))

b. Recursive count-leaves with explicit counter:

(define (count-leaves tree)
  (define (count-iter tree n)
    (cond ((null? tree) n)
          ((not (pair? tree)) (+ n 1))
          (else (count-iter (cdr tree)
                            (count-iter (car tree) n)))))
  (count-iter tree 0))

Exercise 5.22.  Exercise 3.12 of section 3.3.1 presented an append procedure that appends two lists to form a new list and an append! procedure that splices two lists together. Design a register machine to implement each of these procedures. Assume that the list-structure memory operations are available as primitive operations.

5.3.2  Maintaining the Illusion of Infinite Memory

The representation method outlined in section 5.3.1 solves the problem of implementing list structure, provided that we have an infinite amount of memory. With a real computer we will eventually run out of free space in which to construct new pairs.13 However, most of the pairs generated in a typical computation are used only to hold intermediate results. After these results are accessed, the pairs are no longer needed -- they are garbage. For instance, the computation

(accumulate + 0 (filter odd? (enumerate-interval 0 n)))

constructs two lists: the enumeration and the result of filtering the enumeration. When the accumulation is complete, these lists are no longer needed, and the allocated memory can be reclaimed. If we can arrange to collect all the garbage periodically, and if this turns out to recycle memory at about the same rate at which we construct new pairs, we will have preserved the illusion that there is an infinite amount of memory.

In order to recycle pairs, we must have a way to determine which allocated pairs are not needed (in the sense that their contents can no longer influence the future of the computation). The method we shall examine for accomplishing this is known as garbage collection. Garbage collection is based on the observation that, at any moment in a Lisp interpretation, the only objects that can affect the future of the computation are those that can be reached by some succession of car and cdr operations starting from the pointers that are currently in the machine registers.14 Any memory cell that is not so accessible may be recycled.

There are many ways to perform garbage collection. The method we shall examine here is called stop-and-copy. The basic idea is to divide memory into two halves: ``working memory'' and ``free memory.'' When cons constructs pairs, it allocates these in working memory. When working memory is full, we perform garbage collection by locating all the useful pairs in working memory and copying these into consecutive locations in free memory. (The useful pairs are located by tracing all the car and cdr pointers, starting with the machine registers.) Since we do not copy the garbage, there will presumably be additional free memory that we can use to allocate new pairs. In addition, nothing in the working memory is needed, since all the useful pairs in it have been copied. Thus, if we interchange the roles of working memory and free memory, we can continue processing; new pairs will be allocated in the new working memory (which was the old free memory). When this is full, we can copy the useful pairs into the new free memory (which was the old working memory).15

Implementation of a stop-and-copy garbage collector

We now use our register-machine language to describe the stop-and-copy algorithm in more detail. We will assume that there is a register called root that contains a pointer to a structure that eventually points at all accessible data. This can be arranged by storing the contents of all the machine registers in a pre-allocated list pointed at by root just before starting garbage collection.16 We also assume that, in addition to the current working memory, there is free memory available into which we can copy the useful data. The current working memory consists of vectors whose base addresses are in registers called the-cars and the-cdrs, and the free memory is in registers called new-cars and new-cdrs.

Garbage collection is triggered when we exhaust the free cells in the current working memory, that is, when a cons operation attempts to increment the free pointer beyond the end of the memory vector. When the garbage-collection process is complete, the root pointer will point into the new memory, all objects accessible from the root will have been moved to the new memory, and the free pointer will indicate the next place in the new memory where a new pair can be allocated. In addition, the roles of working memory and new memory will have been interchanged -- new pairs will be constructed in the new memory, beginning at the place indicated by free, and the (previous) working memory will be available as the new memory for the next garbage collection. Figure 5.15 shows the arrangement of memory just before and just after garbage collection.

Figure 5.15:  Reconfiguration of memory by the garbage-collection process.

The state of the garbage-collection process is controlled by maintaining two pointers: free and scan. These are initialized to point to the beginning of the new memory. The algorithm begins by relocating the pair pointed at by root to the beginning of the new memory. The pair is copied, the root pointer is adjusted to point to the new location, and the free pointer is incremented. In addition, the old location of the pair is marked to show that its contents have been moved. This marking is done as follows: In the car position, we place a special tag that signals that this is an already-moved object. (Such an object is traditionally called a broken heart.)17 In the cdr position we place a forwarding address that points at the location to which the object has been moved.

After relocating the root, the garbage collector enters its basic cycle. At each step in the algorithm, the scan pointer (initially pointing at the relocated root) points at a pair that has been moved to the new memory but whose car and cdr pointers still refer to objects in the old memory. These objects are each relocated, and the scan pointer is incremented. To relocate an object (for example, the object indicated by the car pointer of the pair we are scanning) we check to see if the object has already been moved (as indicated by the presence of a broken-heart tag in the car position of the object). If the object has not already been moved, we copy it to the place indicated by free, update free, set up a broken heart at the object's old location, and update the pointer to the object (in this example, the car pointer of the pair we are scanning) to point to the new location. If the object has already been moved, its forwarding address (found in the cdr position of the broken heart) is substituted for the pointer in the pair being scanned. Eventually, all accessible objects will have been moved and scanned, at which point the scan pointer will overtake the free pointer and the process will terminate.

We can specify the stop-and-copy algorithm as a sequence of instructions for a register machine. The basic step of relocating an object is accomplished by a subroutine called relocate-old-result-in-new. This subroutine gets its argument, a pointer to the object to be relocated, from a register named old. It relocates the designated object (incrementing free in the process), puts a pointer to the relocated object into a register called new, and returns by branching to the entry point stored in the register relocate-continue. To begin garbage collection, we invoke this subroutine to relocate the root pointer, after initializing free and scan. When the relocation of root has been accomplished, we install the new pointer as the new root and enter the main loop of the garbage collector.

begin-garbage-collection
  (assign free (const 0))
  (assign scan (const 0))
  (assign old (reg root))
  (assign relocate-continue (label reassign-root))
  (goto (label relocate-old-result-in-new))
reassign-root
  (assign root (reg new))
  (goto (label gc-loop))

In the main loop of the garbage collector we must determine whether there are any more objects to be scanned. We do this by testing whether the scan pointer is coincident with the free pointer. If the pointers are equal, then all accessible objects have been relocated, and we branch to gc-flip, which cleans things up so that we can continue the interrupted computation. If there are still pairs to be scanned, we call the relocate subroutine to relocate the car of the next pair (by placing the car pointer in old). The relocate-continue register is set up so that the subroutine will return to update the car pointer.

gc-loop
  (test (op =) (reg scan) (reg free))
  (branch (label gc-flip))
  (assign old (op vector-ref) (reg new-cars) (reg scan))
  (assign relocate-continue (label update-car))
  (goto (label relocate-old-result-in-new))

At update-car, we modify the car pointer of the pair being scanned, then proceed to relocate the cdr of the pair. We return to update-cdr when that relocation has been accomplished. After relocating and updating the cdr, we are finished scanning that pair, so we continue with the main loop.

update-car
  (perform
   (op vector-set!) (reg new-cars) (reg scan) (reg new))
  (assign old (op vector-ref) (reg new-cdrs) (reg scan))
  (assign relocate-continue (label update-cdr))
  (goto (label relocate-old-result-in-new))

update-cdr
  (perform
   (op vector-set!) (reg new-cdrs) (reg scan) (reg new))
  (assign scan (op +) (reg scan) (const 1))
  (goto (label gc-loop))

The subroutine relocate-old-result-in-new relocates objects as follows: If the object to be relocated (pointed at by old) is not a pair, then we return the same pointer to the object unchanged (in new). (For example, we may be scanning a pair whose car is the number 4. If we represent the car by n4, as described in section 5.3.1, then we want the ``relocated'' car pointer to still be n4.) Otherwise, we must perform the relocation. If the car position of the pair to be relocated contains a broken-heart tag, then the pair has in fact already been moved, so we retrieve the forwarding address (from the cdr position of the broken heart) and return this in new. If the pointer in old points at a yet-unmoved pair, then we move the pair to the first free cell in new memory (pointed at by free) and set up the broken heart by storing a broken-heart tag and forwarding address at the old location. Relocate-old-result-in-new uses a register oldcr to hold the car or the cdr of the object pointed at by old.18

relocate-old-result-in-new
  (test (op pointer-to-pair?) (reg old))
  (branch (label pair))
  (assign new (reg old))
  (goto (reg relocate-continue))
pair
  (assign oldcr (op vector-ref) (reg the-cars) (reg old))
  (test (op broken-heart?) (reg oldcr))
  (branch (label already-moved))
  (assign new (reg free)) ; new location for pair
  ;; Update free pointer.
  (assign free (op +) (reg free) (const 1))
  ;; Copy the car and cdr to new memory.
  (perform (op vector-set!)
           (reg new-cars) (reg new) (reg oldcr))
  (assign oldcr (op vector-ref) (reg the-cdrs) (reg old))
  (perform (op vector-set!)
           (reg new-cdrs) (reg new) (reg oldcr))
  ;; Construct the broken heart.
  (perform (op vector-set!)
           (reg the-cars) (reg old) (const broken-heart))
  (perform
   (op vector-set!) (reg the-cdrs) (reg old) (reg new))
  (goto (reg relocate-continue))
already-moved
  (assign new (op vector-ref) (reg the-cdrs) (reg old))
  (goto (reg relocate-continue))

At the very end of the garbage-collection process, we interchange the role of old and new memories by interchanging pointers: interchanging the-cars with new-cars, and the-cdrs with new-cdrs. We will then be ready to perform another garbage collection the next time memory runs out.

gc-flip
  (assign temp (reg the-cdrs))
  (assign the-cdrs (reg new-cdrs))
  (assign new-cdrs (reg temp))
  (assign temp (reg the-cars))
  (assign the-cars (reg new-cars))
  (assign new-cars (reg temp))

Name: Anonymous 2021-03-16 10:16

5 We could represent memory as lists of items. However, the access time would then not be independent of the index, since accessing the nth element of a list requires n - 1 cdr operations.

6 For completeness, we should specify a make-vector operation that constructs vectors. However, in the present application we will use vectors only to model fixed divisions of the computer memory.

7 This is precisely the same ``tagged data'' idea we introduced in chapter 2 for dealing with generic operations. Here, however, the data types are included at the primitive machine level rather than constructed through the use of lists.

8 Type information may be encoded in a variety of ways, depending on the details of the machine on which the Lisp system is to be implemented. The execution efficiency of Lisp programs will be strongly dependent on how cleverly this choice is made, but it is difficult to formulate general design rules for good choices. The most straightforward way to implement typed pointers is to allocate a fixed set of bits in each pointer to be a type field that encodes the data type. Important questions to be addressed in designing such a representation include the following: How many type bits are required? How large must the vector indices be? How efficiently can the primitive machine instructions be used to manipulate the type fields of pointers? Machines that include special hardware for the efficient handling of type fields are said to have tagged architectures.

9 This decision on the representation of numbers determines whether eq?, which tests equality of pointers, can be used to test for equality of numbers. If the pointer contains the number itself, then equal numbers will have the same pointer. But if the pointer contains the index of a location where the number is stored, equal numbers will be guaranteed to have equal pointers only if we are careful never to store the same number in more than one location.

10 This is just like writing a number as a sequence of digits, except that each ``digit'' is a number between 0 and the largest number that can be stored in a single pointer.

11 There are other ways of finding free storage. For example, we could link together all the unused pairs into a free list. Our free locations are consecutive (and hence can be accessed by incrementing a pointer) because we are using a compacting garbage collector, as we will see in section 5.3.2.

12 This is essentially the implementation of cons in terms of set-car! and set-cdr!, as described in section 3.3.1. The operation get-new-pair used in that implementation is realized here by the free pointer.

13 This may not be true eventually, because memories may get large enough so that it would be impossible to run out of free memory in the lifetime of the computer. For example, there are about 3× 1013, microseconds in a year, so if we were to cons once per microsecond we would need about 1015 cells of memory to build a machine that could operate for 30 years without running out of memory. That much memory seems absurdly large by today's standards, but it is not physically impossible. On the other hand, processors are getting faster and a future computer may have large numbers of processors operating in parallel on a single memory, so it may be possible to use up memory much faster than we have postulated.

14 We assume here that the stack is represented as a list as described in section 5.3.1, so that items on the stack are accessible via the pointer in the stack register.

15 This idea was invented and first implemented by Minsky, as part of the implementation of Lisp for the PDP-1 at the MIT Research Laboratory of Electronics. It was further developed by Fenichel and Yochelson (1969) for use in the Lisp implementation for the Multics time-sharing system. Later, Baker (1978) developed a ``real-time'' version of the method, which does not require the computation to stop during garbage collection. Baker's idea was extended by Hewitt, Lieberman, and Moon (see Lieberman and Hewitt 1983) to take advantage of the fact that some structure is more volatile and other structure is more permanent.

An alternative commonly used garbage-collection technique is the mark-sweep method. This consists of tracing all the structure accessible from the machine registers and marking each pair we reach. We then scan all of memory, and any location that is unmarked is ``swept up'' as garbage and made available for reuse. A full discussion of the mark-sweep method can be found in Allen 1978.

The Minsky-Fenichel-Yochelson algorithm is the dominant algorithm in use for large-memory systems because it examines only the useful part of memory. This is in contrast to mark-sweep, in which the sweep phase must check all of memory. A second advantage of stop-and-copy is that it is a compacting garbage collector. That is, at the end of the garbage-collection phase the useful data will have been moved to consecutive memory locations, with all garbage pairs compressed out. This can be an extremely important performance consideration in machines with virtual memory, in which accesses to widely separated memory addresses may require extra paging operations.

16 This list of registers does not include the registers used by the storage-allocation system -- root, the-cars, the-cdrs, and the other registers that will be introduced in this section.

17 The term broken heart was coined by David Cressey, who wrote a garbage collector for MDL, a dialect of Lisp developed at MIT during the early 1970s.

18 The garbage collector uses the low-level predicate pointer-to-pair? instead of the list-structure pair? operation because in a real system there might be various things that are treated as pairs for garbage-collection purposes. For example, in a Scheme system that conforms to the IEEE standard a procedure object may be implemented as a special kind of ``pair'' that doesn't satisfy the pair? predicate. For simulation purposes, pointer-to-pair? can be implemented as pair?.

Name: Anonymous 2021-03-16 10:16

5.4  The Explicit-Control Evaluator

In section 5.1 we saw how to transform simple Scheme programs into descriptions of register machines. We will now perform this transformation on a more complex program, the metacircular evaluator of sections 4.1.1-4.1.4, which shows how the behavior of a Scheme interpreter can be described in terms of the procedures eval and apply. The explicit-control evaluator that we develop in this section shows how the underlying procedure-calling and argument-passing mechanisms used in the evaluation process can be described in terms of operations on registers and stacks. In addition, the explicit-control evaluator can serve as an implementation of a Scheme interpreter, written in a language that is very similar to the native machine language of conventional computers. The evaluator can be executed by the register-machine simulator of section 5.2. Alternatively, it can be used as a starting point for building a machine-language implementation of a Scheme evaluator, or even a special-purpose machine for evaluating Scheme expressions. Figure 5.16 shows such a hardware implementation: a silicon chip that acts as an evaluator for Scheme. The chip designers started with the data-path and controller specifications for a register machine similar to the evaluator described in this section and used design automation programs to construct the integrated-circuit layout.19

Registers and operations

In designing the explicit-control evaluator, we must specify the operations to be used in our register machine. We described the metacircular evaluator in terms of abstract syntax, using procedures such as quoted? and make-procedure. In implementing the register machine, we could expand these procedures into sequences of elementary list-structure memory operations, and implement these operations on our register machine. However, this would make our evaluator very long, obscuring the basic structure with details. To clarify the presentation, we will include as primitive operations of the register machine the syntax procedures given in section 4.1.2 and the procedures for representing environments and other run-time data given in sections 4.1.3 and 4.1.4. In order to completely specify an evaluator that could be programmed in a low-level machine language or implemented in hardware, we would replace these operations by more elementary operations, using the list-structure implementation we described in section 5.3.

Figure 5.16:  A silicon-chip implementation of an evaluator for Scheme.

Our Scheme evaluator register machine includes a stack and seven registers: exp, env, val, continue, proc, argl, and unev. Exp is used to hold the expression to be evaluated, and env contains the environment in which the evaluation is to be performed. At the end of an evaluation, val contains the value obtained by evaluating the expression in the designated environment. The continue register is used to implement recursion, as explained in section 5.1.4. (The evaluator needs to call itself recursively, since evaluating an expression requires evaluating its subexpressions.) The registers proc, argl, and unev are used in evaluating combinations.

We will not provide a data-path diagram to show how the registers and operations of the evaluator are connected, nor will we give the complete list of machine operations. These are implicit in the evaluator's controller, which will be presented in detail.

5.4.1  The Core of the Explicit-Control Evaluator

The central element in the evaluator is the sequence of instructions beginning at eval-dispatch. This corresponds to the eval procedure of the metacircular evaluator described in section 4.1.1. When the controller starts at eval-dispatch, it evaluates the expression specified by exp in the environment specified by env. When evaluation is complete, the controller will go to the entry point stored in continue, and the val register will hold the value of the expression. As with the metacircular eval, the structure of eval-dispatch is a case analysis on the syntactic type of the expression to be evaluated.20

eval-dispatch
  (test (op self-evaluating?) (reg exp))
  (branch (label ev-self-eval))
  (test (op variable?) (reg exp))
  (branch (label ev-variable))
  (test (op quoted?) (reg exp))
  (branch (label ev-quoted))
  (test (op assignment?) (reg exp))
  (branch (label ev-assignment))
  (test (op definition?) (reg exp))
  (branch (label ev-definition))
  (test (op if?) (reg exp))
  (branch (label ev-if))
  (test (op lambda?) (reg exp))
  (branch (label ev-lambda))
  (test (op begin?) (reg exp))
  (branch (label ev-begin))
  (test (op application?) (reg exp))
  (branch (label ev-application))
  (goto (label unknown-expression-type))

Evaluating simple expressions

Numbers and strings (which are self-evaluating), variables, quotations, and lambda expressions have no subexpressions to be evaluated. For these, the evaluator simply places the correct value in the val register and continues execution at the entry point specified by continue. Evaluation of simple expressions is performed by the following controller code:

ev-self-eval
  (assign val (reg exp))
  (goto (reg continue))
ev-variable
  (assign val (op lookup-variable-value) (reg exp) (reg env))
  (goto (reg continue))
ev-quoted
  (assign val (op text-of-quotation) (reg exp))
  (goto (reg continue))
ev-lambda
  (assign unev (op lambda-parameters) (reg exp))
  (assign exp (op lambda-body) (reg exp))
  (assign val (op make-procedure)
              (reg unev) (reg exp) (reg env))
  (goto (reg continue))

Observe how ev-lambda uses the unev and exp registers to hold the parameters and body of the lambda expression so that they can be passed to the make-procedure operation, along with the environment in env.

Evaluating procedure applications

A procedure application is specified by a combination containing an operator and operands. The operator is a subexpression whose value is a procedure, and the operands are subexpressions whose values are the arguments to which the procedure should be applied. The metacircular eval handles applications by calling itself recursively to evaluate each element of the combination, and then passing the results to apply, which performs the actual procedure application. The explicit-control evaluator does the same thing; these recursive calls are implemented by goto instructions, together with use of the stack to save registers that will be restored after the recursive call returns. Before each call we will be careful to identify which registers must be saved (because their values will be needed later).21

We begin the evaluation of an application by evaluating the operator to produce a procedure, which will later be applied to the evaluated operands. To evaluate the operator, we move it to the exp register and go to eval-dispatch. The environment in the env register is already the correct one in which to evaluate the operator. However, we save env because we will need it later to evaluate the operands. We also extract the operands into unev and save this on the stack. We set up continue so that eval-dispatch will resume at ev-appl-did-operator after the operator has been evaluated. First, however, we save the old value of continue, which tells the controller where to continue after the application.

ev-application
  (save continue)
  (save env)
  (assign unev (op operands) (reg exp))
  (save unev)
  (assign exp (op operator) (reg exp))
  (assign continue (label ev-appl-did-operator))
  (goto (label eval-dispatch))

Upon returning from evaluating the operator subexpression, we proceed to evaluate the operands of the combination and to accumulate the resulting arguments in a list, held in argl. First we restore the unevaluated operands and the environment. We initialize argl to an empty list. Then we assign to the proc register the procedure that was produced by evaluating the operator. If there are no operands, we go directly to apply-dispatch. Otherwise we save proc on the stack and start the argument-evaluation loop:22

ev-appl-did-operator
  (restore unev)                  ; the operands
  (restore env)
  (assign argl (op empty-arglist))
  (assign proc (reg val))         ; the operator
  (test (op no-operands?) (reg unev))
  (branch (label apply-dispatch))
  (save proc)

Each cycle of the argument-evaluation loop evaluates an operand from the list in unev and accumulates the result into argl. To evaluate an operand, we place it in the exp register and go to eval-dispatch, after setting continue so that execution will resume with the argument-accumulation phase. But first we save the arguments accumulated so far (held in argl), the environment (held in env), and the remaining operands to be evaluated (held in unev). A special case is made for the evaluation of the last operand, which is handled at ev-appl-last-arg.

ev-appl-operand-loop
  (save argl)
  (assign exp (op first-operand) (reg unev))
  (test (op last-operand?) (reg unev))
  (branch (label ev-appl-last-arg))
  (save env)
  (save unev)
  (assign continue (label ev-appl-accumulate-arg))
  (goto (label eval-dispatch))

When an operand has been evaluated, the value is accumulated into the list held in argl. The operand is then removed from the list of unevaluated operands in unev, and the argument-evaluation continues.

ev-appl-accumulate-arg
  (restore unev)
  (restore env)
  (restore argl)
  (assign argl (op adjoin-arg) (reg val) (reg argl))
  (assign unev (op rest-operands) (reg unev))
  (goto (label ev-appl-operand-loop))

Evaluation of the last argument is handled differently. There is no need to save the environment or the list of unevaluated operands before going to eval-dispatch, since they will not be required after the last operand is evaluated. Thus, we return from the evaluation to a special entry point ev-appl-accum-last-arg, which restores the argument list, accumulates the new argument, restores the saved procedure, and goes off to perform the application.23

ev-appl-last-arg
  (assign continue (label ev-appl-accum-last-arg))
  (goto (label eval-dispatch))
ev-appl-accum-last-arg
  (restore argl)
  (assign argl (op adjoin-arg) (reg val) (reg argl))
  (restore proc)
  (goto (label apply-dispatch))

The details of the argument-evaluation loop determine the order in which the interpreter evaluates the operands of a combination (e.g., left to right or right to left -- see exercise 3.8). This order is not determined by the metacircular evaluator, which inherits its control structure from the underlying Scheme in which it is implemented.24 Because the first-operand selector (used in ev-appl-operand-loop to extract successive operands from unev) is implemented as car and the rest-operands selector is implemented as cdr, the explicit-control evaluator will evaluate the operands of a combination in left-to-right order.

Name: Anonymous 2021-03-16 10:17

Procedure application

The entry point apply-dispatch corresponds to the apply procedure of the metacircular evaluator. By the time we get to apply-dispatch, the proc register contains the procedure to apply and argl contains the list of evaluated arguments to which it must be applied. The saved value of continue (originally passed to eval-dispatch and saved at ev-application), which tells where to return with the result of the procedure application, is on the stack. When the application is complete, the controller transfers to the entry point specified by the saved continue, with the result of the application in val. As with the metacircular apply, there are two cases to consider. Either the procedure to be applied is a primitive or it is a compound procedure.

apply-dispatch
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-apply))
  (test (op compound-procedure?) (reg proc))  
  (branch (label compound-apply))
  (goto (label unknown-procedure-type))

We assume that each primitive is implemented so as to obtain its arguments from argl and place its result in val. To specify how the machine handles primitives, we would have to provide a sequence of controller instructions to implement each primitive and arrange for primitive-apply to dispatch to the instructions for the primitive identified by the contents of proc. Since we are interested in the structure of the evaluation process rather than the details of the primitives, we will instead just use an apply-primitive-procedure operation that applies the procedure in proc to the arguments in argl. For the purpose of simulating the evaluator with the simulator of section 5.2 we use the procedure apply-primitive-procedure, which calls on the underlying Scheme system to perform the application, just as we did for the metacircular evaluator in section 4.1.4. After computing the value of the primitive application, we restore continue and go to the designated entry point.

primitive-apply
  (assign val (op apply-primitive-procedure)
              (reg proc)
              (reg argl))
  (restore continue)
  (goto (reg continue))

To apply a compound procedure, we proceed just as with the metacircular evaluator. We construct a frame that binds the procedure's parameters to the arguments, use this frame to extend the environment carried by the procedure, and evaluate in this extended environment the sequence of expressions that forms the body of the procedure. Ev-sequence, described below in section 5.4.2, handles the evaluation of the sequence.

compound-apply
  (assign unev (op procedure-parameters) (reg proc))
  (assign env (op procedure-environment) (reg proc))
  (assign env (op extend-environment)
              (reg unev) (reg argl) (reg env))
  (assign unev (op procedure-body) (reg proc))
  (goto (label ev-sequence))

Compound-apply is the only place in the interpreter where the env register is ever assigned a new value. Just as in the metacircular evaluator, the new environment is constructed from the environment carried by the procedure, together with the argument list and the corresponding list of variables to be bound.

5.4.2  Sequence Evaluation and Tail Recursion

The portion of the explicit-control evaluator at ev-sequence is analogous to the metacircular evaluator's eval-sequence procedure. It handles sequences of expressions in procedure bodies or in explicit begin expressions.

Explicit begin expressions are evaluated by placing the sequence of expressions to be evaluated in unev, saving continue on the stack, and jumping to ev-sequence.

ev-begin
  (assign unev (op begin-actions) (reg exp))
  (save continue)
  (goto (label ev-sequence))

The implicit sequences in procedure bodies are handled by jumping to ev-sequence from compound-apply, at which point continue is already on the stack, having been saved at ev-application.

The entries at ev-sequence and ev-sequence-continue form a loop that successively evaluates each expression in a sequence. The list of unevaluated expressions is kept in unev. Before evaluating each expression, we check to see if there are additional expressions to be evaluated in the sequence. If so, we save the rest of the unevaluated expressions (held in unev) and the environment in which these must be evaluated (held in env) and call eval-dispatch to evaluate the expression. The two saved registers are restored upon the return from this evaluation, at ev-sequence-continue.

The final expression in the sequence is handled differently, at the entry point ev-sequence-last-exp. Since there are no more expressions to be evaluated after this one, we need not save unev or env before going to eval-dispatch. The value of the whole sequence is the value of the last expression, so after the evaluation of the last expression there is nothing left to do except continue at the entry point currently held on the stack (which was saved by ev-application or ev-begin.) Rather than setting up continue to arrange for eval-dispatch to return here and then restoring continue from the stack and continuing at that entry point, we restore continue from the stack before going to eval-dispatch, so that eval-dispatch will continue at that entry point after evaluating the expression.

ev-sequence
  (assign exp (op first-exp) (reg unev))
  (test (op last-exp?) (reg unev))
  (branch (label ev-sequence-last-exp))
  (save unev)
  (save env)
  (assign continue (label ev-sequence-continue))
  (goto (label eval-dispatch))
ev-sequence-continue
  (restore env)
  (restore unev)
  (assign unev (op rest-exps) (reg unev))
  (goto (label ev-sequence))
ev-sequence-last-exp
  (restore continue)
  (goto (label eval-dispatch))

Tail recursion

In chapter 1 we said that the process described by a procedure such as

(define (sqrt-iter guess x)
  (if (good-enough? guess x)
      guess
      (sqrt-iter (improve guess x)
                 x)))

is an iterative process. Even though the procedure is syntactically recursive (defined in terms of itself), it is not logically necessary for an evaluator to save information in passing from one call to sqrt-iter to the next.25 An evaluator that can execute a procedure such as sqrt-iter without requiring increasing storage as the procedure continues to call itself is called a tail-recursive evaluator. The metacircular implementation of the evaluator in chapter 4 does not specify whether the evaluator is tail-recursive, because that evaluator inherits its mechanism for saving state from the underlying Scheme. With the explicit-control evaluator, however, we can trace through the evaluation process to see when procedure calls cause a net accumulation of information on the stack.

Our evaluator is tail-recursive, because in order to evaluate the final expression of a sequence we transfer directly to eval-dispatch without saving any information on the stack. Hence, evaluating the final expression in a sequence -- even if it is a procedure call (as in sqrt-iter, where the if expression, which is the last expression in the procedure body, reduces to a call to sqrt-iter) -- will not cause any information to be accumulated on the stack.26

If we did not think to take advantage of the fact that it was unnecessary to save information in this case, we might have implemented eval-sequence by treating all the expressions in a sequence in the same way -- saving the registers, evaluating the expression, returning to restore the registers, and repeating this until all the expressions have been evaluated:27

ev-sequence
  (test (op no-more-exps?) (reg unev))
  (branch (label ev-sequence-end))
  (assign exp (op first-exp) (reg unev))
  (save unev)
  (save env)
  (assign continue (label ev-sequence-continue))
  (goto (label eval-dispatch))
ev-sequence-continue
  (restore env)
  (restore unev)
  (assign unev (op rest-exps) (reg unev))
  (goto (label ev-sequence))
ev-sequence-end
  (restore continue)
  (goto (reg continue))

This may seem like a minor change to our previous code for evaluation of a sequence: The only difference is that we go through the save-restore cycle for the last expression in a sequence as well as for the others. The interpreter will still give the same value for any expression. But this change is fatal to the tail-recursive implementation, because we must now return after evaluating the final expression in a sequence in order to undo the (useless) register saves. These extra saves will accumulate during a nest of procedure calls. Consequently, processes such as sqrt-iter will require space proportional to the number of iterations rather than requiring constant space. This difference can be significant. For example, with tail recursion, an infinite loop can be expressed using only the procedure-call mechanism:

(define (count n)
  (newline)
  (display n)
  (count (+ n 1)))

Without tail recursion, such a procedure would eventually run out of stack space, and expressing a true iteration would require some control mechanism other than procedure call.

Name: Anonymous 2021-03-16 10:17

5.4.3  Conditionals, Assignments, and Definitions

As with the metacircular evaluator, special forms are handled by selectively evaluating fragments of the expression. For an if expression, we must evaluate the predicate and decide, based on the value of predicate, whether to evaluate the consequent or the alternative.

Before evaluating the predicate, we save the if expression itself so that we can later extract the consequent or alternative. We also save the environment, which we will need later in order to evaluate the consequent or the alternative, and we save continue, which we will need later in order to return to the evaluation of the expression that is waiting for the value of the if.

ev-if
  (save exp)                    ; save expression for later
  (save env)
  (save continue)
  (assign continue (label ev-if-decide))
  (assign exp (op if-predicate) (reg exp))
  (goto (label eval-dispatch))  ; evaluate the predicate

When we return from evaluating the predicate, we test whether it was true or false and, depending on the result, place either the consequent or the alternative in exp before going to eval-dispatch. Notice that restoring env and continue here sets up eval-dispatch to have the correct environment and to continue at the right place to receive the value of the if expression.

ev-if-decide
  (restore continue)
  (restore env)
  (restore exp)
  (test (op true?) (reg val))
  (branch (label ev-if-consequent))

ev-if-alternative
  (assign exp (op if-alternative) (reg exp))
  (goto (label eval-dispatch))
ev-if-consequent
  (assign exp (op if-consequent) (reg exp))
  (goto (label eval-dispatch))

Assignments and definitions

Assignments are handled by ev-assignment, which is reached from eval-dispatch with the assignment expression in exp. The code at ev-assignment first evaluates the value part of the expression and then installs the new value in the environment. Set-variable-value! is assumed to be available as a machine operation.

ev-assignment
  (assign unev (op assignment-variable) (reg exp))
  (save unev)                   ; save variable for later
  (assign exp (op assignment-value) (reg exp))
  (save env)
  (save continue)
  (assign continue (label ev-assignment-1))
  (goto (label eval-dispatch))  ; evaluate the assignment value
ev-assignment-1
  (restore continue)
  (restore env)
  (restore unev)
  (perform
   (op set-variable-value!) (reg unev) (reg val) (reg env))
  (assign val (const ok))
  (goto (reg continue))

Definitions are handled in a similar way:

ev-definition
  (assign unev (op definition-variable) (reg exp))
  (save unev)                   ; save variable for later
  (assign exp (op definition-value) (reg exp))
  (save env)
  (save continue)
  (assign continue (label ev-definition-1))
  (goto (label eval-dispatch))  ; evaluate the definition value
ev-definition-1
  (restore continue)
  (restore env)
  (restore unev)
  (perform
   (op define-variable!) (reg unev) (reg val) (reg env))
  (assign val (const ok))
  (goto (reg continue))

Exercise 5.23.  Extend the evaluator to handle derived expressions such as cond, let, and so on (section 4.1.2). You may ``cheat'' and assume that the syntax transformers such as cond->if are available as machine operations.28

Exercise 5.24.  Implement cond as a new basic special form without reducing it to if. You will have to construct a loop that tests the predicates of successive cond clauses until you find one that is true, and then use ev-sequence to evaluate the actions of the clause.

Exercise 5.25.  Modify the evaluator so that it uses normal-order evaluation, based on the lazy evaluator of section 4.2.

5.4.4  Running the Evaluator

With the implementation of the explicit-control evaluator we come to the end of a development, begun in chapter 1, in which we have explored successively more precise models of the evaluation process. We started with the relatively informal substitution model, then extended this in chapter 3 to the environment model, which enabled us to deal with state and change. In the metacircular evaluator of chapter 4, we used Scheme itself as a language for making more explicit the environment structure constructed during evaluation of an expression. Now, with register machines, we have taken a close look at the evaluator's mechanisms for storage management, argument passing, and control. At each new level of description, we have had to raise issues and resolve ambiguities that were not apparent at the previous, less precise treatment of evaluation. To understand the behavior of the explicit-control evaluator, we can simulate it and monitor its performance.

We will install a driver loop in our evaluator machine. This plays the role of the driver-loop procedure of section 4.1.4. The evaluator will repeatedly print a prompt, read an expression, evaluate the expression by going to eval-dispatch, and print the result. The following instructions form the beginning of the explicit-control evaluator's controller sequence:29

read-eval-print-loop
  (perform (op initialize-stack))
  (perform
   (op prompt-for-input) (const ";;; EC-Eval input:"))
  (assign exp (op read))
  (assign env (op get-global-environment))
  (assign continue (label print-result))
  (goto (label eval-dispatch))
print-result
  (perform
   (op announce-output) (const ";;; EC-Eval value:"))
  (perform (op user-print) (reg val))
  (goto (label read-eval-print-loop))

When we encounter an error in a procedure (such as the ``unknown procedure type error'' indicated at apply-dispatch), we print an error message and return to the driver loop.30

unknown-expression-type
  (assign val (const unknown-expression-type-error))
  (goto (label signal-error))
unknown-procedure-type
  (restore continue)    ; clean up stack (from apply-dispatch)
  (assign val (const unknown-procedure-type-error))
  (goto (label signal-error))
signal-error
  (perform (op user-print) (reg val))
  (goto (label read-eval-print-loop))

For the purposes of the simulation, we initialize the stack each time through the driver loop, since it might not be empty after an error (such as an undefined variable) interrupts an evaluation.31

If we combine all the code fragments presented in sections 5.4.1-5.4.4, we can create an evaluator machine model that we can run using the register-machine simulator of section 5.2.

(define eceval
  (make-machine
   '(exp env val proc argl continue unev)
   eceval-operations
  '(
    read-eval-print-loop
      <entire machine controller as given above>
   )))

We must define Scheme procedures to simulate the operations used as primitives by the evaluator. These are the same procedures we used for the metacircular evaluator in section 4.1, together with the few additional ones defined in footnotes throughout section 5.4.

(define eceval-operations
  (list (list 'self-evaluating? self-evaluating)
        <complete list of operations for eceval machine>))

Finally, we can initialize the global environment and run the evaluator:

(define the-global-environment (setup-environment))

(start eceval)
;;; EC-Eval input:
(define (append x y)
  (if (null? x)
      y
      (cons (car x)
            (append (cdr x) y))))
;;; EC-Eval value:
ok
;;; EC-Eval input:
(append '(a b c) '(d e f))
;;; EC-Eval value:
(a b c d e f)

Of course, evaluating expressions in this way will take much longer than if we had directly typed them into Scheme, because of the multiple levels of simulation involved. Our expressions are evaluated by the explicit-control-evaluator machine, which is being simulated by a Scheme program, which is itself being evaluated by the Scheme interpreter.

Name: Anonymous 2021-03-16 10:17

Monitoring the performance of the evaluator

Simulation can be a powerful tool to guide the implementation of evaluators. Simulations make it easy not only to explore variations of the register-machine design but also to monitor the performance of the simulated evaluator. For example, one important factor in performance is how efficiently the evaluator uses the stack. We can observe the number of stack operations required to evaluate various expressions by defining the evaluator register machine with the version of the simulator that collects statistics on stack use (section 5.2.4), and adding an instruction at the evaluator's print-result entry point to print the statistics:

print-result
  (perform (op print-stack-statistics)); added instruction
  (perform
   (op announce-output) (const ";;; EC-Eval value:"))
  ... ; same as before

Interactions with the evaluator now look like this:

;;; EC-Eval input:
(define (factorial n)
  (if (= n 1)
      1
      (* (factorial (- n 1)) n)))
(total-pushes = 3 maximum-depth = 3)
;;; EC-Eval value:
ok
;;; EC-Eval input:
(factorial 5)
(total-pushes = 144 maximum-depth = 28)
;;; EC-Eval value:
120

Note that the driver loop of the evaluator reinitializes the stack at the start of each interaction, so that the statistics printed will refer only to stack operations used to evaluate the previous expression.

Exercise 5.26.  Use the monitored stack to explore the tail-recursive property of the evaluator (section 5.4.2). Start the evaluator and define the iterative factorial procedure from section 1.2.1:

(define (factorial n)
  (define (iter product counter)
    (if (> counter n)
        product
        (iter (* counter product)
              (+ counter 1))))
  (iter 1 1))

Run the procedure with some small values of n. Record the maximum stack depth and the number of pushes required to compute n! for each of these values.

a. You will find that the maximum depth required to evaluate n! is independent of n. What is that depth?

b. Determine from your data a formula in terms of n for the total number of push operations used in evaluating n! for any n > 1. Note that the number of operations used is a linear function of n and is thus determined by two constants.

Exercise 5.27.  For comparison with exercise 5.26, explore the behavior of the following procedure for computing factorials recursively:

(define (factorial n)
  (if (= n 1)
      1
      (* (factorial (- n 1)) n)))

By running this procedure with the monitored stack, determine, as a function of n, the maximum depth of the stack and the total number of pushes used in evaluating n! for n > 1. (Again, these functions will be linear.) Summarize your experiments by filling in the following table with the appropriate expressions in terms of n:

	Maximum depth 	Number of pushes
Recursive 		
factorial 		
Iterative 		
factorial 		

The maximum depth is a measure of the amount of space used by the evaluator in carrying out the computation, and the number of pushes correlates well with the time required.

Exercise 5.28.  Modify the definition of the evaluator by changing eval-sequence as described in section 5.4.2 so that the evaluator is no longer tail-recursive. Rerun your experiments from exercises 5.26 and 5.27 to demonstrate that both versions of the factorial procedure now require space that grows linearly with their input.

Exercise 5.29.  Monitor the stack operations in the tree-recursive Fibonacci computation:

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

a. Give a formula in terms of n for the maximum depth of the stack required to compute Fib(n) for n > 2. Hint: In section 1.2.2 we argued that the space used by this process grows linearly with n.

b. Give a formula for the total number of pushes used to compute Fib(n) for n > 2. You should find that the number of pushes (which correlates well with the time used) grows exponentially with n. Hint: Let S(n) be the number of pushes used in computing Fib(n). You should be able to argue that there is a formula that expresses S(n) in terms of S(n - 1), S(n - 2), and some fixed ``overhead'' constant k that is independent of n. Give the formula, and say what k is. Then show that S(n) can be expressed as a Fib(n + 1) + b and give the values of a and b.

Exercise 5.30.  Our evaluator currently catches and signals only two kinds of errors -- unknown expression types and unknown procedure types. Other errors will take us out of the evaluator read-eval-print loop. When we run the evaluator using the register-machine simulator, these errors are caught by the underlying Scheme system. This is analogous to the computer crashing when a user program makes an error.32 It is a large project to make a real error system work, but it is well worth the effort to understand what is involved here.

a. Errors that occur in the evaluation process, such as an attempt to access an unbound variable, could be caught by changing the lookup operation to make it return a distinguished condition code, which cannot be a possible value of any user variable. The evaluator can test for this condition code and then do what is necessary to go to signal-error. Find all of the places in the evaluator where such a change is necessary and fix them. This is lots of work.

b. Much worse is the problem of handling errors that are signaled by applying primitive procedures, such as an attempt to divide by zero or an attempt to extract the car of a symbol. In a professionally written high-quality system, each primitive application is checked for safety as part of the primitive. For example, every call to car could first check that the argument is a pair. If the argument is not a pair, the application would return a distinguished condition code to the evaluator, which would then report the failure. We could arrange for this in our register-machine simulator by making each primitive procedure check for applicability and returning an appropriate distinguished condition code on failure. Then the primitive-apply code in the evaluator can check for the condition code and go to signal-error if necessary. Build this structure and make it work. This is a major project.

19 See Batali et al. 1982 for more information on the chip and the method by which it was designed.

20 In our controller, the dispatch is written as a sequence of test and branch instructions. Alternatively, it could have been written in a data-directed style (and in a real system it probably would have been) to avoid the need to perform sequential tests and to facilitate the definition of new expression types. A machine designed to run Lisp would probably include a dispatch-on-type instruction that would efficiently execute such data-directed dispatches.

21 This is an important but subtle point in translating algorithms from a procedural language, such as Lisp, to a register-machine language. As an alternative to saving only what is needed, we could save all the registers (except val) before each recursive call. This is called a framed-stack discipline. This would work but might save more registers than necessary; this could be an important consideration in a system where stack operations are expensive. Saving registers whose contents will not be needed later may also hold onto useless data that could otherwise be garbage-collected, freeing space to be reused.

22 We add to the evaluator data-structure procedures in section 4.1.3 the following two procedures for manipulating argument lists:

(define (empty-arglist) '())

(define (adjoin-arg arg arglist)
  (append arglist (list arg)))

We also use an additional syntax procedure to test for the last operand in a combination:

(define (last-operand? ops)
  (null? (cdr ops)))

23 The optimization of treating the last operand specially is known as evlis tail recursion (see Wand 1980). We could be somewhat more efficient in the argument evaluation loop if we made evaluation of the first operand a special case too. This would permit us to postpone initializing argl until after evaluating the first operand, so as to avoid saving argl in this case. The compiler in section 5.5 performs this optimization. (Compare the construct-arglist procedure of section 5.5.3.)

24 The order of operand evaluation in the metacircular evaluator is determined by the order of evaluation of the arguments to cons in the procedure list-of-values of section 4.1.1 (see exercise 4.1).

25 We saw in section 5.1 how to implement such a process with a register machine that had no stack; the state of the process was stored in a fixed set of registers.

26 This implementation of tail recursion in ev-sequence is one variety of a well-known optimization technique used by many compilers. In compiling a procedure that ends with a procedure call, one can replace the call by a jump to the called procedure's entry point. Building this strategy into the interpreter, as we have done in this section, provides the optimization uniformly throughout the language.

27 We can define no-more-exps? as follows:

(define (no-more-exps? seq) (null? seq))

28 This isn't really cheating. In an actual implementation built from scratch, we would use our explicit-control evaluator to interpret a Scheme program that performs source-level transformations like cond->if in a syntax phase that runs before execution.

29 We assume here that read and the various printing operations are available as primitive machine operations, which is useful for our simulation, but completely unrealistic in practice. These are actually extremely complex operations. In practice, they would be implemented using low-level input-output operations such as transferring single characters to and from a device.

To support the get-global-environment operation we define

(define the-global-environment (setup-environment))

(define (get-global-environment)
  the-global-environment)

30 There are other errors that we would like the interpreter to handle, but these are not so simple. See exercise 5.30.

31 We could perform the stack initialization only after errors, but doing it in the driver loop will be convenient for monitoring the evaluator's performance, as described below.

32 Regrettably, this is the normal state of affairs in conventional compiler-based language systems such as C. In UNIX TM the system ``dumps core,'' and in DOS/Windows TM it becomes catatonic. The Macintosh TM displays a picture of an exploding bomb and offers you the opportunity to reboot the computer -- if you're lucky.

Name: Anonymous 2021-03-16 10:18

5.5  Compilation

The explicit-control evaluator of section 5.4 is a register machine whose controller interprets Scheme programs. In this section we will see how to run Scheme programs on a register machine whose controller is not a Scheme interpreter.

The explicit-control evaluator machine is universal -- it can carry out any computational process that can be described in Scheme. The evaluator's controller orchestrates the use of its data paths to perform the desired computation. Thus, the evaluator's data paths are universal: They are sufficient to perform any computation we desire, given an appropriate controller.33

Commercial general-purpose computers are register machines organized around a collection of registers and operations that constitute an efficient and convenient universal set of data paths. The controller for a general-purpose machine is an interpreter for a register-machine language like the one we have been using. This language is called the native language of the machine, or simply machine language. Programs written in machine language are sequences of instructions that use the machine's data paths. For example, the explicit-control evaluator's instruction sequence can be thought of as a machine-language program for a general-purpose computer rather than as the controller for a specialized interpreter machine.

There are two common strategies for bridging the gap between higher-level languages and register-machine languages. The explicit-control evaluator illustrates the strategy of interpretation. An interpreter written in the native language of a machine configures the machine to execute programs written in a language (called the source language) that may differ from the native language of the machine performing the evaluation. The primitive procedures of the source language are implemented as a library of subroutines written in the native language of the given machine. A program to be interpreted (called the source program) is represented as a data structure. The interpreter traverses this data structure, analyzing the source program. As it does so, it simulates the intended behavior of the source program by calling appropriate primitive subroutines from the library.

In this section, we explore the alternative strategy of compilation. A compiler for a given source language and machine translates a source program into an equivalent program (called the object program) written in the machine's native language. The compiler that we implement in this section translates programs written in Scheme into sequences of instructions to be executed using the explicit-control evaluator machine's data paths.34

Compared with interpretation, compilation can provide a great increase in the efficiency of program execution, as we will explain below in the overview of the compiler. On the other hand, an interpreter provides a more powerful environment for interactive program development and debugging, because the source program being executed is available at run time to be examined and modified. In addition, because the entire library of primitives is present, new programs can be constructed and added to the system during debugging.

In view of the complementary advantages of compilation and interpretation, modern program-development environments pursue a mixed strategy. Lisp interpreters are generally organized so that interpreted procedures and compiled procedures can call each other. This enables a programmer to compile those parts of a program that are assumed to be debugged, thus gaining the efficiency advantage of compilation, while retaining the interpretive mode of execution for those parts of the program that are in the flux of interactive development and debugging. In section 5.5.7, after we have implemented the compiler, we will show how to interface it with our interpreter to produce an integrated interpreter-compiler development system.

An overview of the compiler

Our compiler is much like our interpreter, both in its structure and in the function it performs. Accordingly, the mechanisms used by the compiler for analyzing expressions will be similar to those used by the interpreter. Moreover, to make it easy to interface compiled and interpreted code, we will design the compiler to generate code that obeys the same conventions of register usage as the interpreter: The environment will be kept in the env register, argument lists will be accumulated in argl, a procedure to be applied will be in proc, procedures will return their answers in val, and the location to which a procedure should return will be kept in continue. In general, the compiler translates a source program into an object program that performs essentially the same register operations as would the interpreter in evaluating the same source program.

This description suggests a strategy for implementing a rudimentary compiler: We traverse the expression in the same way the interpreter does. When we encounter a register instruction that the interpreter would perform in evaluating the expression, we do not execute the instruction but instead accumulate it into a sequence. The resulting sequence of instructions will be the object code. Observe the efficiency advantage of compilation over interpretation. Each time the interpreter evaluates an expression -- for example, (f 84 96) -- it performs the work of classifying the expression (discovering that this is a procedure application) and testing for the end of the operand list (discovering that there are two operands). With a compiler, the expression is analyzed only once, when the instruction sequence is generated at compile time. The object code produced by the compiler contains only the instructions that evaluate the operator and the two operands, assemble the argument list, and apply the procedure (in proc) to the arguments (in argl).

This is the same kind of optimization we implemented in the analyzing evaluator of section 4.1.7. But there are further opportunities to gain efficiency in compiled code. As the interpreter runs, it follows a process that must be applicable to any expression in the language. In contrast, a given segment of compiled code is meant to execute some particular expression. This can make a big difference, for example in the use of the stack to save registers. When the interpreter evaluates an expression, it must be prepared for any contingency. Before evaluating a subexpression, the interpreter saves all registers that will be needed later, because the subexpression might require an arbitrary evaluation. A compiler, on the other hand, can exploit the structure of the particular expression it is processing to generate code that avoids unnecessary stack operations.

As a case in point, consider the combination (f 84 96). Before the interpreter evaluates the operator of the combination, it prepares for this evaluation by saving the registers containing the operands and the environment, whose values will be needed later. The interpreter then evaluates the operator to obtain the result in val, restores the saved registers, and finally moves the result from val to proc. However, in the particular expression we are dealing with, the operator is the symbol f, whose evaluation is accomplished by the machine operation lookup-variable-value, which does not alter any registers. The compiler that we implement in this section will take advantage of this fact and generate code that evaluates the operator using the instruction

(assign proc (op lookup-variable-value) (const f) (reg env))

This code not only avoids the unnecessary saves and restores but also assigns the value of the lookup directly to proc, whereas the interpreter would obtain the result in val and then move this to proc.

A compiler can also optimize access to the environment. Having analyzed the code, the compiler can in many cases know in which frame a particular variable will be located and access that frame directly, rather than performing the lookup-variable-value search. We will discuss how to implement such variable access in section 5.5.6. Until then, however, we will focus on the kind of register and stack optimizations described above. There are many other optimizations that can be performed by a compiler, such as coding primitive operations ``in line'' instead of using a general apply mechanism (see exercise 5.38); but we will not emphasize these here. Our main goal in this section is to illustrate the compilation process in a simplified (but still interesting) context.

5.5.1  Structure of the Compiler

In section 4.1.7 we modified our original metacircular interpreter to separate analysis from execution. We analyzed each expression to produce an execution procedure that took an environment as argument and performed the required operations. In our compiler, we will do essentially the same analysis. Instead of producing execution procedures, however, we will generate sequences of instructions to be run by our register machine.

The procedure compile is the top-level dispatch in the compiler. It corresponds to the eval procedure of section 4.1.1, the analyze procedure of section 4.1.7, and the eval-dispatch entry point of the explicit-control-evaluator in section 5.4.1. The compiler, like the interpreters, uses the expression-syntax procedures defined in section 4.1.2.35 Compile performs a case analysis on the syntactic type of the expression to be compiled. For each type of expression, it dispatches to a specialized code generator:

(define (compile exp target linkage)
  (cond ((self-evaluating? exp)
         (compile-self-evaluating exp target linkage))
        ((quoted? exp) (compile-quoted exp target linkage))
        ((variable? exp)
         (compile-variable exp target linkage))
        ((assignment? exp)
         (compile-assignment exp target linkage))
        ((definition? exp)
         (compile-definition exp target linkage))
        ((if? exp) (compile-if exp target linkage))
        ((lambda? exp) (compile-lambda exp target linkage))
        ((begin? exp)
         (compile-sequence (begin-actions exp)
                           target
                           linkage))
        ((cond? exp) (compile (cond->if exp) target linkage))
        ((application? exp)
         (compile-application exp target linkage))
        (else
         (error "Unknown expression type -- COMPILE" exp))))

Name: Anonymous 2021-03-16 10:18

Targets and linkages

Compile and the code generators that it calls take two arguments in addition to the expression to compile. There is a target, which specifies the register in which the compiled code is to return the value of the expression. There is also a linkage descriptor, which describes how the code resulting from the compilation of the expression should proceed when it has finished its execution. The linkage descriptor can require that the code do one of the following three things:

    continue at the next instruction in sequence (this is specified by the linkage descriptor next),

    return from the procedure being compiled (this is specified by the linkage descriptor return), or

    jump to a named entry point (this is specified by using the designated label as the linkage descriptor). 

For example, compiling the expression 5 (which is self-evaluating) with a target of the val register and a linkage of next should produce the instruction

(assign val (const 5))

Compiling the same expression with a linkage of return should produce the instructions

(assign val (const 5))
(goto (reg continue))

In the first case, execution will continue with the next instruction in the sequence. In the second case, we will return from a procedure call. In both cases, the value of the expression will be placed into the target val register.

Instruction sequences and stack usage

Each code generator returns an instruction sequence containing the object code it has generated for the expression. Code generation for a compound expression is accomplished by combining the output from simpler code generators for component expressions, just as evaluation of a compound expression is accomplished by evaluating the component expressions.

The simplest method for combining instruction sequences is a procedure called append-instruction-sequences. It takes as arguments any number of instruction sequences that are to be executed sequentially; it appends them and returns the combined sequence. That is, if <seq1> and <seq2> are sequences of instructions, then evaluating

(append-instruction-sequences <seq1> <seq2>)

produces the sequence

<seq1>
<seq2>

Whenever registers might need to be saved, the compiler's code generators use preserving, which is a more subtle method for combining instruction sequences. Preserving takes three arguments: a set of registers and two instruction sequences that are to be executed sequentially. It appends the sequences in such a way that the contents of each register in the set is preserved over the execution of the first sequence, if this is needed for the execution of the second sequence. That is, if the first sequence modifies the register and the second sequence actually needs the register's original contents, then preserving wraps a save and a restore of the register around the first sequence before appending the sequences. Otherwise, preserving simply returns the appended instruction sequences. Thus, for example,

(preserving (list <reg1> <reg2>) <seq1> <seq2>)

produces one of the following four sequences of instructions, depending on how <seq1> and <seq2> use <reg1> and <reg2>:

By using preserving to combine instruction sequences the compiler avoids unnecessary stack operations. This also isolates the details of whether or not to generate save and restore instructions within the preserving procedure, separating them from the concerns that arise in writing each of the individual code generators. In fact no save or restore instructions are explicitly produced by the code generators.

In principle, we could represent an instruction sequence simply as a list of instructions. Append-instruction-sequences could then combine instruction sequences by performing an ordinary list append. However, preserving would then be a complex operation, because it would have to analyze each instruction sequence to determine how the sequence uses its registers. Preserving would be inefficient as well as complex, because it would have to analyze each of its instruction sequence arguments, even though these sequences might themselves have been constructed by calls to preserving, in which case their parts would have already been analyzed. To avoid such repetitious analysis we will associate with each instruction sequence some information about its register use. When we construct a basic instruction sequence we will provide this information explicitly, and the procedures that combine instruction sequences will derive register-use information for the combined sequence from the information associated with the component sequences.

An instruction sequence will contain three pieces of information:

    the set of registers that must be initialized before the instructions in the sequence are executed (these registers are said to be needed by the sequence),

    the set of registers whose values are modified by the instructions in the sequence, and

    the actual instructions (also called statements) in the sequence. 

We will represent an instruction sequence as a list of its three parts. The constructor for instruction sequences is thus

(define (make-instruction-sequence needs modifies statements)
  (list needs modifies statements))

For example, the two-instruction sequence that looks up the value of the variable x in the current environment, assigns the result to val, and then returns, requires registers env and continue to have been initialized, and modifies register val. This sequence would therefore be constructed as

(make-instruction-sequence '(env continue) '(val)
 '((assign val
           (op lookup-variable-value) (const x) (reg env))
   (goto (reg continue))))

We sometimes need to construct an instruction sequence with no statements:

(define (empty-instruction-sequence)
  (make-instruction-sequence '() '() '()))

The procedures for combining instruction sequences are shown in section 5.5.4.

Exercise 5.31.  In evaluating a procedure application, the explicit-control evaluator always saves and restores the env register around the evaluation of the operator, saves and restores env around the evaluation of each operand (except the final one), saves and restores argl around the evaluation of each operand, and saves and restores proc around the evaluation of the operand sequence. For each of the following combinations, say which of these save and restore operations are superfluous and thus could be eliminated by the compiler's preserving mechanism:

(f 'x 'y)

((f) 'x 'y)

(f (g 'x) y)

(f (g 'x) 'y)

Exercise 5.32.  Using the preserving mechanism, the compiler will avoid saving and restoring env around the evaluation of the operator of a combination in the case where the operator is a symbol. We could also build such optimizations into the evaluator. Indeed, the explicit-control evaluator of section 5.4 already performs a similar optimization, by treating combinations with no operands as a special case.

a. Extend the explicit-control evaluator to recognize as a separate class of expressions combinations whose operator is a symbol, and to take advantage of this fact in evaluating such expressions.

b. Alyssa P. Hacker suggests that by extending the evaluator to recognize more and more special cases we could incorporate all the compiler's optimizations, and that this would eliminate the advantage of compilation altogether. What do you think of this idea?

5.5.2  Compiling Expressions

In this section and the next we implement the code generators to which the compile procedure dispatches.

Compiling linkage code

In general, the output of each code generator will end with instructions -- generated by the procedure compile-linkage -- that implement the required linkage. If the linkage is return then we must generate the instruction (goto (reg continue)). This needs the continue register and does not modify any registers. If the linkage is next, then we needn't include any additional instructions. Otherwise, the linkage is a label, and we generate a goto to that label, an instruction that does not need or modify any registers.36

(define (compile-linkage linkage)
  (cond ((eq? linkage 'return)
         (make-instruction-sequence '(continue) '()
          '((goto (reg continue)))))
        ((eq? linkage 'next)
         (empty-instruction-sequence))
        (else
         (make-instruction-sequence '() '()
          `((goto (label ,linkage)))))))

The linkage code is appended to an instruction sequence by preserving the continue register, since a return linkage will require the continue register: If the given instruction sequence modifies continue and the linkage code needs it, continue will be saved and restored.

(define (end-with-linkage linkage instruction-sequence)
  (preserving '(continue)
   instruction-sequence
   (compile-linkage linkage)))

Name: Anonymous 2021-03-16 10:19

Compiling simple expressions

The code generators for self-evaluating expressions, quotations, and variables construct instruction sequences that assign the required value to the target register and then proceed as specified by the linkage descriptor.

(define (compile-self-evaluating exp target linkage)
  (end-with-linkage linkage
   (make-instruction-sequence '() (list target)
    `((assign ,target (const ,exp))))))
(define (compile-quoted exp target linkage)
  (end-with-linkage linkage
   (make-instruction-sequence '() (list target)
    `((assign ,target (const ,(text-of-quotation exp)))))))
(define (compile-variable exp target linkage)
  (end-with-linkage linkage
   (make-instruction-sequence '(env) (list target)
    `((assign ,target
              (op lookup-variable-value)
              (const ,exp)
              (reg env))))))

All these assignment instructions modify the target register, and the one that looks up a variable needs the env register.

Assignments and definitions are handled much as they are in the interpreter. We recursively generate code that computes the value to be assigned to the variable, and append to it a two-instruction sequence that actually sets or defines the variable and assigns the value of the whole expression (the symbol ok) to the target register. The recursive compilation has target val and linkage next so that the code will put its result into val and continue with the code that is appended after it. The appending is done preserving env, since the environment is needed for setting or defining the variable and the code for the variable value could be the compilation of a complex expression that might modify the registers in arbitrary ways.

(define (compile-assignment exp target linkage)
  (let ((var (assignment-variable exp))
        (get-value-code
         (compile (assignment-value exp) 'val 'next)))
    (end-with-linkage linkage
     (preserving '(env)
      get-value-code
      (make-instruction-sequence '(env val) (list target)
       `((perform (op set-variable-value!)
                  (const ,var)
                  (reg val)
                  (reg env))
         (assign ,target (const ok))))))))
(define (compile-definition exp target linkage)
  (let ((var (definition-variable exp))
        (get-value-code
         (compile (definition-value exp) 'val 'next)))
    (end-with-linkage linkage
     (preserving '(env)
      get-value-code
      (make-instruction-sequence '(env val) (list target)
       `((perform (op define-variable!)
                  (const ,var)
                  (reg val)
                  (reg env))
         (assign ,target (const ok))))))))

The appended two-instruction sequence requires env and val and modifies the target. Note that although we preserve env for this sequence, we do not preserve val, because the get-value-code is designed to explicitly place its result in val for use by this sequence. (In fact, if we did preserve val, we would have a bug, because this would cause the previous contents of val to be restored right after the get-value-code is run.)

Compiling conditional expressions

The code for an if expression compiled with a given target and linkage has the form

 <compilation of predicate, target val, linkage next>
 (test (op false?) (reg val))
 (branch (label false-branch))
true-branch
 <compilation of consequent with given target and given linkage or after-if>
false-branch
 <compilation of alternative with given target and linkage>
after-if

To generate this code, we compile the predicate, consequent, and alternative, and combine the resulting code with instructions to test the predicate result and with newly generated labels to mark the true and false branches and the end of the conditional.37 In this arrangement of code, we must branch around the true branch if the test is false. The only slight complication is in how the linkage for the true branch should be handled. If the linkage for the conditional is return or a label, then the true and false branches will both use this same linkage. If the linkage is next, the true branch ends with a jump around the code for the false branch to the label at the end of the conditional.

(define (compile-if exp target linkage)
  (let ((t-branch (make-label 'true-branch))
        (f-branch (make-label 'false-branch))                    
        (after-if (make-label 'after-if)))
    (let ((consequent-linkage
           (if (eq? linkage 'next) after-if linkage)))
      (let ((p-code (compile (if-predicate exp) 'val 'next))
            (c-code
             (compile
              (if-consequent exp) target consequent-linkage))
            (a-code
             (compile (if-alternative exp) target linkage)))
        (preserving '(env continue)
         p-code
         (append-instruction-sequences
          (make-instruction-sequence '(val) '()
           `((test (op false?) (reg val))
             (branch (label ,f-branch))))
          (parallel-instruction-sequences
           (append-instruction-sequences t-branch c-code)
           (append-instruction-sequences f-branch a-code))
          after-if))))))

Env is preserved around the predicate code because it could be needed by the true and false branches, and continue is preserved because it could be needed by the linkage code in those branches. The code for the true and false branches (which are not executed sequentially) is appended using a special combiner parallel-instruction-sequences described in section 5.5.4.

Note that cond is a derived expression, so all that the compiler needs to do handle it is to apply the cond->if transformer (from section 4.1.2) and compile the resulting if expression.

Compiling sequences

The compilation of sequences (from procedure bodies or explicit begin expressions) parallels their evaluation. Each expression of the sequence is compiled -- the last expression with the linkage specified for the sequence, and the other expressions with linkage next (to execute the rest of the sequence). The instruction sequences for the individual expressions are appended to form a single instruction sequence, such that env (needed for the rest of the sequence) and continue (possibly needed for the linkage at the end of the sequence) are preserved.

(define (compile-sequence seq target linkage)
  (if (last-exp? seq)
      (compile (first-exp seq) target linkage)
      (preserving '(env continue)
       (compile (first-exp seq) target 'next)
       (compile-sequence (rest-exps seq) target linkage))))

Compiling lambda expressions

Lambda expressions construct procedures. The object code for a lambda expression must have the form

<construct procedure object and assign it to target register>
<linkage>

When we compile the lambda expression, we also generate the code for the procedure body. Although the body won't be executed at the time of procedure construction, it is convenient to insert it into the object code right after the code for the lambda. If the linkage for the lambda expression is a label or return, this is fine. But if the linkage is next, we will need to skip around the code for the procedure body by using a linkage that jumps to a label that is inserted after the body. The object code thus has the form

 <construct procedure object and assign it to target register>
 <code for given linkage>or (goto (label after-lambda))
 <compilation of procedure body>
after-lambda

Compile-lambda generates the code for constructing the procedure object followed by the code for the procedure body. The procedure object will be constructed at run time by combining the current environment (the environment at the point of definition) with the entry point to the compiled procedure body (a newly generated label).38

(define (compile-lambda exp target linkage)
  (let ((proc-entry (make-label 'entry))
        (after-lambda (make-label 'after-lambda)))
    (let ((lambda-linkage
           (if (eq? linkage 'next) after-lambda linkage)))
      (append-instruction-sequences
       (tack-on-instruction-sequence
        (end-with-linkage lambda-linkage
         (make-instruction-sequence '(env) (list target)
          `((assign ,target
                    (op make-compiled-procedure)
                    (label ,proc-entry)
                    (reg env)))))
        (compile-lambda-body exp proc-entry))
       after-lambda))))

Compile-lambda uses the special combiner tack-on-instruction-sequence (section 5.5.4) rather than append-instruction-sequences to append the procedure body to the lambda expression code, because the body is not part of the sequence of instructions that will be executed when the combined sequence is entered; rather, it is in the sequence only because that was a convenient place to put it.

Compile-lambda-body constructs the code for the body of the procedure. This code begins with a label for the entry point. Next come instructions that will cause the run-time evaluation environment to switch to the correct environment for evaluating the procedure body -- namely, the definition environment of the procedure, extended to include the bindings of the formal parameters to the arguments with which the procedure is called. After this comes the code for the sequence of expressions that makes up the procedure body. The sequence is compiled with linkage return and target val so that it will end by returning from the procedure with the procedure result in val.

(define (compile-lambda-body exp proc-entry)
  (let ((formals (lambda-parameters exp)))
    (append-instruction-sequences
     (make-instruction-sequence '(env proc argl) '(env)
      `(,proc-entry
        (assign env (op compiled-procedure-env) (reg proc))
        (assign env
                (op extend-environment)
                (const ,formals)
                (reg argl)
                (reg env))))
     (compile-sequence (lambda-body exp) 'val 'return))))

Name: Anonymous 2021-03-16 10:20

5.5.3  Compiling Combinations

The essence of the compilation process is the compilation of procedure applications. The code for a combination compiled with a given target and linkage has the form

<compilation of operator, target proc, linkage next>
<evaluate operands and construct argument list in argl>
<compilation of procedure call with given target and linkage>

The registers env, proc, and argl may have to be saved and restored during evaluation of the operator and operands. Note that this is the only place in the compiler where a target other than val is specified.

The required code is generated by compile-application. This recursively compiles the operator, to produce code that puts the procedure to be applied into proc, and compiles the operands, to produce code that evaluates the individual operands of the application. The instruction sequences for the operands are combined (by construct-arglist) with code that constructs the list of arguments in argl, and the resulting argument-list code is combined with the procedure code and the code that performs the procedure call (produced by compile-procedure-call). In appending the code sequences, the env register must be preserved around the evaluation of the operator (since evaluating the operator might modify env, which will be needed to evaluate the operands), and the proc register must be preserved around the construction of the argument list (since evaluating the operands might modify proc, which will be needed for the actual procedure application). Continue must also be preserved throughout, since it is needed for the linkage in the procedure call.

(define (compile-application exp target linkage)
  (let ((proc-code (compile (operator exp) 'proc 'next))
        (operand-codes
         (map (lambda (operand) (compile operand 'val 'next))
              (operands exp))))
    (preserving '(env continue)
     proc-code
     (preserving '(proc continue)
      (construct-arglist operand-codes)
      (compile-procedure-call target linkage)))))

The code to construct the argument list will evaluate each operand into val and then cons that value onto the argument list being accumulated in argl. Since we cons the arguments onto argl in sequence, we must start with the last argument and end with the first, so that the arguments will appear in order from first to last in the resulting list. Rather than waste an instruction by initializing argl to the empty list to set up for this sequence of evaluations, we make the first code sequence construct the initial argl. The general form of the argument-list construction is thus as follows:

<compilation of last operand, targeted to val>
(assign argl (op list) (reg val))
<compilation of next operand, targeted to val>
(assign argl (op cons) (reg val) (reg argl))
...<compilation of first operand, targeted to val>
(assign argl (op cons) (reg val) (reg argl))

Argl must be preserved around each operand evaluation except the first (so that arguments accumulated so far won't be lost), and env must be preserved around each operand evaluation except the last (for use by subsequent operand evaluations).

Compiling this argument code is a bit tricky, because of the special treatment of the first operand to be evaluated and the need to preserve argl and env in different places. The construct-arglist procedure takes as arguments the code that evaluates the individual operands. If there are no operands at all, it simply emits the instruction

(assign argl (const ()))

Otherwise, construct-arglist creates code that initializes argl with the last argument, and appends code that evaluates the rest of the arguments and adjoins them to argl in succession. In order to process the arguments from last to first, we must reverse the list of operand code sequences from the order supplied by compile-application.

(define (construct-arglist operand-codes)
  (let ((operand-codes (reverse operand-codes)))
    (if (null? operand-codes)
        (make-instruction-sequence '() '(argl)
         '((assign argl (const ()))))
        (let ((code-to-get-last-arg
               (append-instruction-sequences
                (car operand-codes)
                (make-instruction-sequence '(val) '(argl)
                 '((assign argl (op list) (reg val)))))))
          (if (null? (cdr operand-codes))
              code-to-get-last-arg
              (preserving '(env)
               code-to-get-last-arg
               (code-to-get-rest-args
                (cdr operand-codes))))))))
(define (code-to-get-rest-args operand-codes)
  (let ((code-for-next-arg
         (preserving '(argl)
          (car operand-codes)
          (make-instruction-sequence '(val argl) '(argl)
           '((assign argl
              (op cons) (reg val) (reg argl)))))))
    (if (null? (cdr operand-codes))
        code-for-next-arg
        (preserving '(env)
         code-for-next-arg
         (code-to-get-rest-args (cdr operand-codes))))))

Applying procedures

After evaluating the elements of a combination, the compiled code must apply the procedure in proc to the arguments in argl. The code performs essentially the same dispatch as the apply procedure in the metacircular evaluator of section 4.1.1 or the apply-dispatch entry point in the explicit-control evaluator of section 5.4.1. It checks whether the procedure to be applied is a primitive procedure or a compiled procedure. For a primitive procedure, it uses apply-primitive-procedure; we will see shortly how it handles compiled procedures. The procedure-application code has the following form:

 (test (op primitive-procedure?) (reg proc))
 (branch (label primitive-branch))
compiled-branch
 <code to apply compiled procedure with given target and appropriate linkage>
primitive-branch
 (assign <target>
         (op apply-primitive-procedure)
         (reg proc)
         (reg argl))
 <linkage>
after-call

Observe that the compiled branch must skip around the primitive branch. Therefore, if the linkage for the original procedure call was next, the compound branch must use a linkage that jumps to a label that is inserted after the primitive branch. (This is similar to the linkage used for the true branch in compile-if.)

(define (compile-procedure-call target linkage)
  (let ((primitive-branch (make-label 'primitive-branch))
        (compiled-branch (make-label 'compiled-branch))
        (after-call (make-label 'after-call)))
    (let ((compiled-linkage
           (if (eq? linkage 'next) after-call linkage)))
      (append-instruction-sequences
       (make-instruction-sequence '(proc) '()
        `((test (op primitive-procedure?) (reg proc))
          (branch (label ,primitive-branch))))
       (parallel-instruction-sequences
        (append-instruction-sequences
         compiled-branch
         (compile-proc-appl target compiled-linkage))
        (append-instruction-sequences
         primitive-branch
         (end-with-linkage linkage
          (make-instruction-sequence '(proc argl)
                                     (list target)
           `((assign ,target
                     (op apply-primitive-procedure)
                     (reg proc)
                     (reg argl)))))))
       after-call))))

The primitive and compound branches, like the true and false branches in compile-if, are appended using parallel-instruction-sequences rather than the ordinary append-instruction-sequences, because they will not be executed sequentially.

Applying compiled procedures

The code that handles procedure application is the most subtle part of the compiler, even though the instruction sequences it generates are very short. A compiled procedure (as constructed by compile-lambda) has an entry point, which is a label that designates where the code for the procedure starts. The code at this entry point computes a result in val and returns by executing the instruction (goto (reg continue)). Thus, we might expect the code for a compiled-procedure application (to be generated by compile-proc-appl) with a given target and linkage to look like this if the linkage is a label

 (assign continue (label proc-return))
 (assign val (op compiled-procedure-entry) (reg proc))
 (goto (reg val))
proc-return
 (assign <target> (reg val))   ; included if target is not val
 (goto (label <linkage>))   ; linkage code

or like this if the linkage is return.

 (save continue)
 (assign continue (label proc-return))
 (assign val (op compiled-procedure-entry) (reg proc))
 (goto (reg val))
proc-return
 (assign <target> (reg val))   ; included if target is not val
 (restore continue)
 (goto (reg continue))   ; linkage code

This code sets up continue so that the procedure will return to a label proc-return and jumps to the procedure's entry point. The code at proc-return transfers the procedure's result from val to the target register (if necessary) and then jumps to the location specified by the linkage. (The linkage is always return or a label, because compile-procedure-call replaces a next linkage for the compound-procedure branch by an after-call label.)

In fact, if the target is not val, that is exactly the code our compiler will generate.39 Usually, however, the target is val (the only time the compiler specifies a different register is when targeting the evaluation of an operator to proc), so the procedure result is put directly into the target register and there is no need to return to a special location that copies it. Instead, we simplify the code by setting up continue so that the procedure will ``return'' directly to the place specified by the caller's linkage:

<set up continue for linkage>
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))

If the linkage is a label, we set up continue so that the procedure will return to that label. (That is, the (goto (reg continue)) the procedure ends with becomes equivalent to the (goto (label <linkage>)) at proc-return above.)

(assign continue (label <linkage>))
(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))

If the linkage is return, we don't need to set up continue at all: It already holds the desired location. (That is, the (goto (reg continue)) the procedure ends with goes directly to the place where the (goto (reg continue)) at proc-return would have gone.)

(assign val (op compiled-procedure-entry) (reg proc))
(goto (reg val))

With this implementation of the return linkage, the compiler generates tail-recursive code. Calling a procedure as the final step in a procedure body does a direct transfer, without saving any information on the stack.

Suppose instead that we had handled the case of a procedure call with a linkage of return and a target of val as shown above for a non-val target. This would destroy tail recursion. Our system would still give the same value for any expression. But each time we called a procedure, we would save continue and return after the call to undo the (useless) save. These extra saves would accumulate during a nest of procedure calls.40

Compile-proc-appl generates the above procedure-application code by considering four cases, depending on whether the target for the call is val and whether the linkage is return. Observe that the instruction sequences are declared to modify all the registers, since executing the procedure body can change the registers in arbitrary ways.41 Also note that the code sequence for the case with target val and linkage return is declared to need continue: Even though continue is not explicitly used in the two-instruction sequence, we must be sure that continue will have the correct value when we enter the compiled procedure.

(define (compile-proc-appl target linkage)
  (cond ((and (eq? target 'val) (not (eq? linkage 'return)))
         (make-instruction-sequence '(proc) all-regs
           `((assign continue (label ,linkage))
             (assign val (op compiled-procedure-entry)
                         (reg proc))
             (goto (reg val)))))
        ((and (not (eq? target 'val))
              (not (eq? linkage 'return)))
         (let ((proc-return (make-label 'proc-return)))
           (make-instruction-sequence '(proc) all-regs
            `((assign continue (label ,proc-return))
              (assign val (op compiled-procedure-entry)
                          (reg proc))
              (goto (reg val))
              ,proc-return
              (assign ,target (reg val))
              (goto (label ,linkage))))))
        ((and (eq? target 'val) (eq? linkage 'return))
         (make-instruction-sequence '(proc continue) all-regs
          '((assign val (op compiled-procedure-entry)
                        (reg proc))
            (goto (reg val)))))
        ((and (not (eq? target 'val)) (eq? linkage 'return))
         (error "return linkage, target not val -- COMPILE"
                target))))

Name: Anonymous 2021-03-16 10:21

5.5.4  Combining Instruction Sequences

This section describes the details on how instruction sequences are represented and combined. Recall from section 5.5.1 that an instruction sequence is represented as a list of the registers needed, the registers modified, and the actual instructions. We will also consider a label (symbol) to be a degenerate case of an instruction sequence, which doesn't need or modify any registers. So to determine the registers needed and modified by instruction sequences we use the selectors

(define (registers-needed s)
  (if (symbol? s) '() (car s)))
(define (registers-modified s)
  (if (symbol? s) '() (cadr s)))
(define (statements s)
  (if (symbol? s) (list s) (caddr s)))

and to determine whether a given sequence needs or modifies a given register we use the predicates

(define (needs-register? seq reg)
  (memq reg (registers-needed seq)))
(define (modifies-register? seq reg)
  (memq reg (registers-modified seq)))

In terms of these predicates and selectors, we can implement the various instruction sequence combiners used throughout the compiler.

The basic combiner is append-instruction-sequences. This takes as arguments an arbitrary number of instruction sequences that are to be executed sequentially and returns an instruction sequence whose statements are the statements of all the sequences appended together. The subtle point is to determine the registers that are needed and modified by the resulting sequence. It modifies those registers that are modified by any of the sequences; it needs those registers that must be initialized before the first sequence can be run (the registers needed by the first sequence), together with those registers needed by any of the other sequences that are not initialized (modified) by sequences preceding it.

The sequences are appended two at a time by append-2-sequences. This takes two instruction sequences seq1 and seq2 and returns the instruction sequence whose statements are the statements of seq1 followed by the statements of seq2, whose modified registers are those registers that are modified by either seq1 or seq2, and whose needed registers are the registers needed by seq1 together with those registers needed by seq2 that are not modified by seq1. (In terms of set operations, the new set of needed registers is the union of the set of registers needed by seq1 with the set difference of the registers needed by seq2 and the registers modified by seq1.) Thus, append-instruction-sequences is implemented as follows:

(define (append-instruction-sequences . seqs)
  (define (append-2-sequences seq1 seq2)
    (make-instruction-sequence
     (list-union (registers-needed seq1)
                 (list-difference (registers-needed seq2)
                                  (registers-modified seq1)))
     (list-union (registers-modified seq1)
                 (registers-modified seq2))
     (append (statements seq1) (statements seq2))))
  (define (append-seq-list seqs)
    (if (null? seqs)
        (empty-instruction-sequence)
        (append-2-sequences (car seqs)
                            (append-seq-list (cdr seqs)))))
  (append-seq-list seqs))

This procedure uses some simple operations for manipulating sets represented as lists, similar to the (unordered) set representation described in section 2.3.3:

(define (list-union s1 s2)
  (cond ((null? s1) s2)
        ((memq (car s1) s2) (list-union (cdr s1) s2))
        (else (cons (car s1) (list-union (cdr s1) s2)))))
(define (list-difference s1 s2)
  (cond ((null? s1) '())
        ((memq (car s1) s2) (list-difference (cdr s1) s2))
        (else (cons (car s1)
                    (list-difference (cdr s1) s2)))))

Preserving, the second major instruction sequence combiner, takes a list of registers regs and two instruction sequences seq1 and seq2 that are to be executed sequentially. It returns an instruction sequence whose statements are the statements of seq1 followed by the statements of seq2, with appropriate save and restore instructions around seq1 to protect the registers in regs that are modified by seq1 but needed by seq2. To accomplish this, preserving first creates a sequence that has the required saves followed by the statements of seq1 followed by the required restores. This sequence needs the registers being saved and restored in addition to the registers needed by seq1, and modifies the registers modified by seq1 except for the ones being saved and restored. This augmented sequence and seq2 are then appended in the usual way. The following procedure implements this strategy recursively, walking down the list of registers to be preserved:42

(define (preserving regs seq1 seq2)
  (if (null? regs)
      (append-instruction-sequences seq1 seq2)
      (let ((first-reg (car regs)))
        (if (and (needs-register? seq2 first-reg)
                 (modifies-register? seq1 first-reg))
            (preserving (cdr regs)
             (make-instruction-sequence
              (list-union (list first-reg)
                          (registers-needed seq1))
              (list-difference (registers-modified seq1)
                               (list first-reg))
              (append `((save ,first-reg))
                      (statements seq1)
                      `((restore ,first-reg))))
             seq2)
            (preserving (cdr regs) seq1 seq2)))))

Another sequence combiner, tack-on-instruction-sequence, is used by compile-lambda to append a procedure body to another sequence. Because the procedure body is not ``in line'' to be executed as part of the combined sequence, its register use has no impact on the register use of the sequence in which it is embedded. We thus ignore the procedure body's sets of needed and modified registers when we tack it onto the other sequence.

(define (tack-on-instruction-sequence seq body-seq)
  (make-instruction-sequence
   (registers-needed seq)
   (registers-modified seq)
   (append (statements seq) (statements body-seq))))

Compile-if and compile-procedure-call use a special combiner called parallel-instruction-sequences to append the two alternative branches that follow a test. The two branches will never be executed sequentially; for any particular evaluation of the test, one branch or the other will be entered. Because of this, the registers needed by the second branch are still needed by the combined sequence, even if these are modified by the first branch.

(define (parallel-instruction-sequences seq1 seq2)
  (make-instruction-sequence
   (list-union (registers-needed seq1)
               (registers-needed seq2))
   (list-union (registers-modified seq1)
               (registers-modified seq2))
   (append (statements seq1) (statements seq2))))

5.5.5  An Example of Compiled Code

Now that we have seen all the elements of the compiler, let us examine an example of compiled code to see how things fit together. We will compile the definition of a recursive factorial procedure by calling compile:

(compile
 '(define (factorial n)
    (if (= n 1)
        1
        (* (factorial (- n 1)) n)))
 'val
 'next)

We have specified that the value of the define expression should be placed in the val register. We don't care what the compiled code does after executing the define, so our choice of next as the linkage descriptor is arbitrary.

Compile determines that the expression is a definition, so it calls compile-definition to compile code to compute the value to be assigned (targeted to val), followed by code to install the definition, followed by code to put the value of the define (which is the symbol ok) into the target register, followed finally by the linkage code. Env is preserved around the computation of the value, because it is needed in order to install the definition. Because the linkage is next, there is no linkage code in this case. The skeleton of the compiled code is thus

  <save env if modified by code to compute value>
  <compilation of definition value, target val, linkage next>
  <restore env if saved above>
  (perform (op define-variable!)
           (const factorial)
           (reg val)
           (reg env))
  (assign val (const ok))

The expression that is to be compiled to produce the value for the variable factorial is a lambda expression whose value is the procedure that computes factorials. Compile handles this by calling compile-lambda, which compiles the procedure body, labels it as a new entry point, and generates the instruction that will combine the procedure body at the new entry point with the run-time environment and assign the result to val. The sequence then skips around the compiled procedure code, which is inserted at this point. The procedure code itself begins by extending the procedure's definition environment by a frame that binds the formal parameter n to the procedure argument. Then comes the actual procedure body. Since this code for the value of the variable doesn't modify the env register, the optional save and restore shown above aren't generated. (The procedure code at entry2 isn't executed at this point, so its use of env is irrelevant.) Therefore, the skeleton for the compiled code becomes

  (assign val (op make-compiled-procedure)
              (label entry2)
              (reg env))
  (goto (label after-lambda1))
entry2
  (assign env (op compiled-procedure-env) (reg proc))
  (assign env (op extend-environment)
              (const (n))
              (reg argl)
              (reg env))
  <compilation of procedure body>
after-lambda1
  (perform (op define-variable!)
           (const factorial)
           (reg val)
           (reg env))
  (assign val (const ok))

A procedure body is always compiled (by compile-lambda-body) as a sequence with target val and linkage return. The sequence in this case consists of a single if expression:

(if (= n 1)
    1
    (* (factorial (- n 1)) n))

Compile-if generates code that first computes the predicate (targeted to val), then checks the result and branches around the true branch if the predicate is false. Env and continue are preserved around the predicate code, since they may be needed for the rest of the if expression. Since the if expression is the final expression (and only expression) in the sequence making up the procedure body, its target is val and its linkage is return, so the true and false branches are both compiled with target val and linkage return. (That is, the value of the conditional, which is the value computed by either of its branches, is the value of the procedure.)

  <save continue, env if modified by predicate and needed by branches>
  <compilation of predicate, target val, linkage next>
  <restore continue, env if saved above>
  (test (op false?) (reg val))
  (branch (label false-branch4))
true-branch5
  <compilation of true branch, target val, linkage return>
false-branch4
  <compilation of false branch, target val, linkage return>
after-if3

The predicate (= n 1) is a procedure call. This looks up the operator (the symbol =) and places this value in proc. It then assembles the arguments 1 and the value of n into argl. Then it tests whether proc contains a primitive or a compound procedure, and dispatches to a primitive branch or a compound branch accordingly. Both branches resume at the after-call label. The requirements to preserve registers around the evaluation of the operator and operands don't result in any saving of registers, because in this case those evaluations don't modify the registers in question.

  (assign proc
          (op lookup-variable-value) (const =) (reg env))
  (assign val (const 1))
  (assign argl (op list) (reg val))
  (assign val (op lookup-variable-value) (const n) (reg env))
  (assign argl (op cons) (reg val) (reg argl))
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch17))
compiled-branch16
  (assign continue (label after-call15))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch17
  (assign val (op apply-primitive-procedure)
              (reg proc)
              (reg argl))
after-call15

The true branch, which is the constant 1, compiles (with target val and linkage return) to

  (assign val (const 1))
  (goto (reg continue))

The code for the false branch is another a procedure call, where the procedure is the value of the symbol *, and the arguments are n and the result of another procedure call (a call to factorial). Each of these calls sets up proc and argl and its own primitive and compound branches. Figure 5.17 shows the complete compilation of the definition of the factorial procedure. Notice that the possible save and restore of continue and env around the predicate, shown above, are in fact generated, because these registers are modified by the procedure call in the predicate and needed for the procedure call and the return linkage in the branches.

Name: Anonymous 2021-03-16 10:21

Exercise 5.33.  Consider the following definition of a factorial procedure, which is slightly different from the one given above:

(define (factorial-alt n)
  (if (= n 1)
      1
      (* n (factorial-alt (- n 1)))))

Compile this procedure and compare the resulting code with that produced for factorial. Explain any differences you find. Does either program execute more efficiently than the other?

Exercise 5.34.  Compile the iterative factorial procedure

(define (factorial n)
  (define (iter product counter)
    (if (> counter n)
        product
        (iter (* counter product)
              (+ counter 1))))
  (iter 1 1))

Annotate the resulting code, showing the essential difference between the code for iterative and recursive versions of factorial that makes one process build up stack space and the other run in constant stack space.

;; construct the procedure and skip over code for the procedure body
  (assign val
          (op make-compiled-procedure) (label entry2) (reg env))
  (goto (label after-lambda1))

entry2     ; calls to factorial will enter here
  (assign env (op compiled-procedure-env) (reg proc))
  (assign env
          (op extend-environment) (const (n)) (reg argl) (reg env))
;; begin actual procedure body
  (save continue)
  (save env)

;; compute (= n 1)
  (assign proc (op lookup-variable-value) (const =) (reg env))
  (assign val (const 1))
  (assign argl (op list) (reg val))
  (assign val (op lookup-variable-value) (const n) (reg env))
  (assign argl (op cons) (reg val) (reg argl))
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch17))
compiled-branch16
  (assign continue (label after-call15))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch17
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))

after-call15   ; val now contains result of (= n 1)
  (restore env)
  (restore continue)
  (test (op false?) (reg val))
  (branch (label false-branch4))
true-branch5  ; return 1
  (assign val (const 1))
  (goto (reg continue))

false-branch4
;; compute and return (* (factorial (- n 1)) n)
  (assign proc (op lookup-variable-value) (const *) (reg env))
  (save continue)
  (save proc)   ; save * procedure
  (assign val (op lookup-variable-value) (const n) (reg env))
  (assign argl (op list) (reg val))
  (save argl)   ; save partial argument list for *

;; compute (factorial (- n 1)), which is the other argument for *
  (assign proc
          (op lookup-variable-value) (const factorial) (reg env))
  (save proc)  ; save factorial procedure

Figure 5.17:  Compilation of the definition of the factorial procedure (continued on next page).

;; compute (- n 1), which is the argument for factorial
  (assign proc (op lookup-variable-value) (const -) (reg env))
  (assign val (const 1))
  (assign argl (op list) (reg val))
  (assign val (op lookup-variable-value) (const n) (reg env))
  (assign argl (op cons) (reg val) (reg argl))
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch8))
compiled-branch7
  (assign continue (label after-call6))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch8
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))

after-call6   ; val now contains result of (- n 1)
  (assign argl (op list) (reg val))
  (restore proc) ; restore factorial
;; apply factorial
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch11))
compiled-branch10
  (assign continue (label after-call9))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch11
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))

after-call9      ; val now contains result of (factorial (- n 1))
  (restore argl) ; restore partial argument list for *
  (assign argl (op cons) (reg val) (reg argl))
  (restore proc) ; restore *
  (restore continue)
;; apply * and return its value
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch14))
compiled-branch13
;; note that a compound procedure here is called tail-recursively
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch14
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))
  (goto (reg continue))
after-call12
after-if3
after-lambda1
;; assign the procedure to the variable factorial
  (perform
   (op define-variable!) (const factorial) (reg val) (reg env))
  (assign val (const ok))

Figure 5.17:  (continued)

Exercise 5.35.  What expression was compiled to produce the code shown in figure 5.18?

  (assign val (op make-compiled-procedure) (label entry16)
                                           (reg env))
  (goto (label after-lambda15))
entry16
  (assign env (op compiled-procedure-env) (reg proc))
  (assign env
          (op extend-environment) (const (x)) (reg argl) (reg env))
  (assign proc (op lookup-variable-value) (const +) (reg env))
  (save continue)
  (save proc)
  (save env)
  (assign proc (op lookup-variable-value) (const g) (reg env))
  (save proc)
  (assign proc (op lookup-variable-value) (const +) (reg env))
  (assign val (const 2))
  (assign argl (op list) (reg val))
  (assign val (op lookup-variable-value) (const x) (reg env))
  (assign argl (op cons) (reg val) (reg argl))
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch19))
compiled-branch18
  (assign continue (label after-call17))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch19
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))
after-call17
  (assign argl (op list) (reg val))
  (restore proc)
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch22))
compiled-branch21
  (assign continue (label after-call20))
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch22
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))

Figure 5.18:  An example of compiler output (continued on next page). See exercise 5.35.

after-call20
  (assign argl (op list) (reg val))
  (restore env)
  (assign val (op lookup-variable-value) (const x) (reg env))
  (assign argl (op cons) (reg val) (reg argl))
  (restore proc)
  (restore continue)
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-branch25))
compiled-branch24
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))
primitive-branch25
  (assign val (op apply-primitive-procedure) (reg proc) (reg argl))
  (goto (reg continue))
after-call23
after-lambda15
  (perform (op define-variable!) (const f) (reg val) (reg env))
  (assign val (const ok))

Figure 5.18:  (continued)

Exercise 5.36.  What order of evaluation does our compiler produce for operands of a combination? Is it left-to-right, right-to-left, or some other order? Where in the compiler is this order determined? Modify the compiler so that it produces some other order of evaluation. (See the discussion of order of evaluation for the explicit-control evaluator in section 5.4.1.) How does changing the order of operand evaluation affect the efficiency of the code that constructs the argument list?

Exercise 5.37.  One way to understand the compiler's preserving mechanism for optimizing stack usage is to see what extra operations would be generated if we did not use this idea. Modify preserving so that it always generates the save and restore operations. Compile some simple expressions and identify the unnecessary stack operations that are generated. Compare the code to that generated with the preserving mechanism intact.

Exercise 5.38.  Our compiler is clever about avoiding unnecessary stack operations, but it is not clever at all when it comes to compiling calls to the primitive procedures of the language in terms of the primitive operations supplied by the machine. For example, consider how much code is compiled to compute (+ a 1): The code sets up an argument list in argl, puts the primitive addition procedure (which it finds by looking up the symbol + in the environment) into proc, and tests whether the procedure is primitive or compound. The compiler always generates code to perform the test, as well as code for primitive and compound branches (only one of which will be executed). We have not shown the part of the controller that implements primitives, but we presume that these instructions make use of primitive arithmetic operations in the machine's data paths. Consider how much less code would be generated if the compiler could open-code primitives -- that is, if it could generate code to directly use these primitive machine operations. The expression (+ a 1) might be compiled into something as simple as 43

(assign val (op lookup-variable-value) (const a) (reg env))
(assign val (op +) (reg val) (const 1))

In this exercise we will extend our compiler to support open coding of selected primitives. Special-purpose code will be generated for calls to these primitive procedures instead of the general procedure-application code. In order to support this, we will augment our machine with special argument registers arg1 and arg2. The primitive arithmetic operations of the machine will take their inputs from arg1 and arg2. The results may be put into val, arg1, or arg2.

The compiler must be able to recognize the application of an open-coded primitive in the source program. We will augment the dispatch in the compile procedure to recognize the names of these primitives in addition to the reserved words (the special forms) it currently recognizes.44 For each special form our compiler has a code generator. In this exercise we will construct a family of code generators for the open-coded primitives.

a.  The open-coded primitives, unlike the special forms, all need their operands evaluated. Write a code generator spread-arguments for use by all the open-coding code generators. Spread-arguments should take an operand list and compile the given operands targeted to successive argument registers. Note that an operand may contain a call to an open-coded primitive, so argument registers will have to be preserved during operand evaluation.

b.  For each of the primitive procedures =, *, -, and +, write a code generator that takes a combination with that operator, together with a target and a linkage descriptor, and produces code to spread the arguments into the registers and then perform the operation targeted to the given target with the given linkage. You need only handle expressions with two operands. Make compile dispatch to these code generators.

c.  Try your new compiler on the factorial example. Compare the resulting code with the result produced without open coding.

d.  Extend your code generators for + and * so that they can handle expressions with arbitrary numbers of operands. An expression with more than two operands will have to be compiled into a sequence of operations, each with only two inputs.

Name: Anonymous 2021-03-16 10:22

5.5.6  Lexical Addressing

One of the most common optimizations performed by compilers is the optimization of variable lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup-variable-value operation of the evaluator machine. This searches for a variable by comparing it with each variable that is currently bound, working frame by frame outward through the run-time environment. This search can be expensive if the frames are deeply nested or if there are many variables. For example, consider the problem of looking up the value of x while evaluating the expression (* x y z) in an application of the procedure that is returned by

(let ((x 3) (y 4))
  (lambda (a b c d e)
    (let ((y (* a b x))
          (z (+ c d x)))
      (* x y z))))

Since a let expression is just syntactic sugar for a lambda combination, this expression is equivalent to

((lambda (x y)
   (lambda (a b c d e)
     ((lambda (y z) (* x y z))
      (* a b x)
      (+ c d x))))
 3
 4)

Each time lookup-variable-value searches for x, it must determine that the symbol x is not eq? to y or z (in the first frame), nor to a, b, c, d, or e (in the second frame). We will assume, for the moment, that our programs do not use define -- that variables are bound only with lambda. Because our language is lexically scoped, the run-time environment for any expression will have a structure that parallels the lexical structure of the program in which the expression appears.45 Thus, the compiler can know, when it analyzes the above expression, that each time the procedure is applied the variable x in (* x y z) will be found two frames out from the current frame and will be the first variable in that frame.

We can exploit this fact by inventing a new kind of variable-lookup operation, lexical-address-lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables to pass over in that frame. Lexical-address-lookup will produce the value of the variable stored at that lexical address relative to the current environment. If we add the lexical-address-lookup operation to our machine, we can make the compiler generate code that references variables using this operation, rather than lookup-variable-value. Similarly, our compiled code can use a new lexical-address-set! operation instead of set-variable-value!.

In order to generate such code, the compiler must be able to determine the lexical address of a variable it is about to compile a reference to. The lexical address of a variable in a program depends on where one is in the code. For example, in the following program, the address of x in expression <e1> is (2,0) -- two frames back and the first variable in the frame. At that point y is at address (0,0) and c is at address (1,2). In expression <e2>, x is at (1,0), y is at (1,1), and c is at (0,2).

((lambda (x y)
   (lambda (a b c d e)
     ((lambda (y z) <e1>)
      <e2>
      (+ c d x))))
 3
 4)

One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which variables will be at which positions in which frames in the run-time environment when a particular variable-access operation is executed. The compile-time environment is a list of frames, each containing a list of variables. (There will of course be no values bound to the variables, since values are not computed at compile time.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses an empty compile-time environment. When a lambda body is compiled, compile-lambda-body extends the compile-time environment by a frame containing the procedure's parameters, so that the sequence making up the body is compiled with that extended environment. At each point in the compilation, compile-variable and compile-assignment use the compile-time environment in order to generate the appropriate lexical addresses.

Exercises 5.39 through 5.43 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercise 5.44 describes another use for the compile-time environment.

Exercise 5.39.  Write a procedure lexical-address-lookup that implements the new lookup operation. It should take two arguments -- a lexical address and a run-time environment -- and return the value of the variable stored at the specified lexical address. Lexical-address-lookup should signal an error if the value of the variable is the symbol *unassigned*.46 Also write a procedure lexical-address-set! that implements the operation that changes the value of the variable at a specified lexical address.

Exercise 5.40.  Modify the compiler to maintain the compile-time environment as described above. That is, add a compile-time-environment argument to compile and the various code generators, and extend it in compile-lambda-body.

Exercise 5.41.  Write a procedure find-variable that takes as arguments a variable and a compile-time environment and returns the lexical address of the variable with respect to that environment. For example, in the program fragment that is shown above, the compile-time environment during the compilation of expression <e1> is ((y z) (a b c d e) (x y)). Find-variable should produce

(find-variable 'c '((y z) (a b c d e) (x y)))
(1 2)

(find-variable 'x '((y z) (a b c d e) (x y)))
(2 0)

(find-variable 'w '((y z) (a b c d e) (x y)))
not-found

Exercise 5.42.  Using find-variable from exercise 5.41, rewrite compile-variable and compile-assignment to output lexical-address instructions. In cases where find-variable returns not-found (that is, where the variable is not in the compile-time environment), you should have the code generators use the evaluator operations, as before, to search for the binding. (The only place a variable that is not found at compile time can be is in the global environment, which is part of the run-time environment but is not part of the compile-time environment.47 Thus, if you wish, you may have the evaluator operations look directly in the global environment, which can be obtained with the operation (op get-global-environment), instead of having them search the whole run-time environment found in env.) Test the modified compiler on a few simple cases, such as the nested lambda combination at the beginning of this section.

Exercise 5.43.  We argued in section 4.1.6 that internal definitions for block structure should not be considered ``real'' defines. Rather, a procedure body should be interpreted as if the internal variables being defined were installed as ordinary lambda variables initialized to their correct values using set!. Section 4.1.6 and exercise 4.16 showed how to modify the metacircular interpreter to accomplish this by scanning out internal definitions. Modify the compiler to perform the same transformation before it compiles a procedure body.

Exercise 5.44.  In this section we have focused on the use of the compile-time environment to produce lexical addresses. But there are other uses for compile-time environments. For instance, in exercise 5.38 we increased the efficiency of compiled code by open-coding primitive procedures. Our implementation treated the names of open-coded procedures as reserved words. If a program were to rebind such a name, the mechanism described in exercise 5.38 would still open-code it as a primitive, ignoring the new binding. For example, consider the procedure

(lambda (+ * a b x y)
  (+ (* a x) (* b y)))

which computes a linear combination of x and y. We might call it with arguments +matrix, *matrix, and four matrices, but the open-coding compiler would still open-code the + and the * in (+ (* a x) (* b y)) as primitive + and *. Modify the open-coding compiler to consult the compile-time environment in order to compile the correct code for expressions involving the names of primitive procedures. (The code will work correctly as long as the program does not define or set! these names.)

Name: Anonymous 2021-03-16 10:22

5.5.7  Interfacing Compiled Code to the Evaluator

We have not yet explained how to load compiled code into the evaluator machine or how to run it. We will assume that the explicit-control-evaluator machine has been defined as in section 5.4.4, with the additional operations specified in footnote 38. We will implement a procedure compile-and-go that compiles a Scheme expression, loads the resulting object code into the evaluator machine, and causes the machine to run the code in the evaluator global environment, print the result, and enter the evaluator's driver loop. We will also modify the evaluator so that interpreted expressions can call compiled procedures as well as interpreted ones. We can then put a compiled procedure into the machine and use the evaluator to call it:

(compile-and-go
 '(define (factorial n)
    (if (= n 1)
        1
        (* (factorial (- n 1)) n))))
 ;;; EC-Eval value:
ok
 ;;; EC-Eval input:
(factorial 5)
;;; EC-Eval value:
120

To allow the evaluator to handle compiled procedures (for example, to evaluate the call to factorial above), we need to change the code at apply-dispatch (section 5.4.1) so that it recognizes compiled procedures (as distinct from compound or primitive procedures) and transfers control directly to the entry point of the compiled code:48

apply-dispatch
  (test (op primitive-procedure?) (reg proc))
  (branch (label primitive-apply))
  (test (op compound-procedure?) (reg proc))  
  (branch (label compound-apply))
  (test (op compiled-procedure?) (reg proc))  
  (branch (label compiled-apply))
  (goto (label unknown-procedure-type))
compiled-apply
  (restore continue)
  (assign val (op compiled-procedure-entry) (reg proc))
  (goto (reg val))

Note the restore of continue at compiled-apply. Recall that the evaluator was arranged so that at apply-dispatch, the continuation would be at the top of the stack. The compiled code entry point, on the other hand, expects the continuation to be in continue, so continue must be restored before the compiled code is executed.

To enable us to run some compiled code when we start the evaluator machine, we add a branch instruction at the beginning of the evaluator machine, which causes the machine to go to a new entry point if the flag register is set.49

  (branch (label external-entry))      ; branches if flag is set
read-eval-print-loop
  (perform (op initialize-stack))
  ...

External-entry assumes that the machine is started with val containing the location of an instruction sequence that puts a result into val and ends with (goto (reg continue)). Starting at this entry point jumps to the location designated by val, but first assigns continue so that execution will return to print-result, which prints the value in val and then goes to the beginning of the evaluator's read-eval-print loop.50

external-entry
  (perform (op initialize-stack))
  (assign env (op get-global-environment))
  (assign continue (label print-result))
  (goto (reg val))

Now we can use the following procedure to compile a procedure definition, execute the compiled code, and run the read-eval-print loop so we can try the procedure. Because we want the compiled code to return to the location in continue with its result in val, we compile the expression with a target of val and a linkage of return. In order to transform the object code produced by the compiler into executable instructions for the evaluator register machine, we use the procedure assemble from the register-machine simulator (section 5.2.2). We then initialize the val register to point to the list of instructions, set the flag so that the evaluator will go to external-entry, and start the evaluator.

(define (compile-and-go expression)
  (let ((instructions
         (assemble (statements
                    (compile expression 'val 'return))
                   eceval)))
    (set! the-global-environment (setup-environment))
    (set-register-contents! eceval 'val instructions)
    (set-register-contents! eceval 'flag true)
    (start eceval)))

If we have set up stack monitoring, as at the end of section 5.4.4, we can examine the stack usage of compiled code:

(compile-and-go
 '(define (factorial n)
    (if (= n 1)
        1
        (* (factorial (- n 1)) n))))

(total-pushes = 0 maximum-depth = 0)
 ;;; EC-Eval value:
ok
 ;;; EC-Eval input:
(factorial 5)
(total-pushes = 31 maximum-depth = 14)
;;; EC-Eval value:
120

Compare this example with the evaluation of (factorial 5) using the interpreted version of the same procedure, shown at the end of section 5.4.4. The interpreted version required 144 pushes and a maximum stack depth of 28. This illustrates the optimization that results from our compilation strategy.

Interpretation and compilation

With the programs in this section, we can now experiment with the alternative execution strategies of interpretation and compilation.51 An interpreter raises the machine to the level of the user program; a compiler lowers the user program to the level of the machine language. We can regard the Scheme language (or any programming language) as a coherent family of abstractions erected on the machine language. Interpreters are good for interactive program development and debugging because the steps of program execution are organized in terms of these abstractions, and are therefore more intelligible to the programmer. Compiled code can execute faster, because the steps of program execution are organized in terms of the machine language, and the compiler is free to make optimizations that cut across the higher-level abstractions.52

The alternatives of interpretation and compilation also lead to different strategies for porting languages to new computers. Suppose that we wish to implement Lisp for a new machine. One strategy is to begin with the explicit-control evaluator of section 5.4 and translate its instructions to instructions for the new machine. A different strategy is to begin with the compiler and change the code generators so that they generate code for the new machine. The second strategy allows us to run any Lisp program on the new machine by first compiling it with the compiler running on our original Lisp system, and linking it with a compiled version of the run-time library.53 Better yet, we can compile the compiler itself, and run this on the new machine to compile other Lisp programs.54 Or we can compile one of the interpreters of section 4.1 to produce an interpreter that runs on the new machine.

Exercise 5.45.  By comparing the stack operations used by compiled code to the stack operations used by the evaluator for the same computation, we can determine the extent to which the compiler optimizes use of the stack, both in speed (reducing the total number of stack operations) and in space (reducing the maximum stack depth). Comparing this optimized stack use to the performance of a special-purpose machine for the same computation gives some indication of the quality of the compiler.

a. Exercise 5.27 asked you to determine, as a function of n, the number of pushes and the maximum stack depth needed by the evaluator to compute n! using the recursive factorial procedure given above. Exercise 5.14 asked you to do the same measurements for the special-purpose factorial machine shown in figure 5.11. Now perform the same analysis using the compiled factorial procedure.

Take the ratio of the number of pushes in the compiled version to the number of pushes in the interpreted version, and do the same for the maximum stack depth. Since the number of operations and the stack depth used to compute n! are linear in n, these ratios should approach constants as n becomes large. What are these constants? Similarly, find the ratios of the stack usage in the special-purpose machine to the usage in the interpreted version.

Compare the ratios for special-purpose versus interpreted code to the ratios for compiled versus interpreted code. You should find that the special-purpose machine does much better than the compiled code, since the hand-tailored controller code should be much better than what is produced by our rudimentary general-purpose compiler.

b. Can you suggest improvements to the compiler that would help it generate code that would come closer in performance to the hand-tailored version?

Exercise 5.46.  Carry out an analysis like the one in exercise 5.45 to determine the effectiveness of compiling the tree-recursive Fibonacci procedure

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

compared to the effectiveness of using the special-purpose Fibonacci machine of figure 5.12. (For measurement of the interpreted performance, see exercise 5.29.) For Fibonacci, the time resource used is not linear in n; hence the ratios of stack operations will not approach a limiting value that is independent of n.

Exercise 5.47.  This section described how to modify the explicit-control evaluator so that interpreted code can call compiled procedures. Show how to modify the compiler so that compiled procedures can call not only primitive procedures and compiled procedures, but interpreted procedures as well. This requires modifying compile-procedure-call to handle the case of compound (interpreted) procedures. Be sure to handle all the same target and linkage combinations as in compile-proc-appl. To do the actual procedure application, the code needs to jump to the evaluator's compound-apply entry point. This label cannot be directly referenced in object code (since the assembler requires that all labels referenced by the code it is assembling be defined there), so we will add a register called compapp to the evaluator machine to hold this entry point, and add an instruction to initialize it:

  (assign compapp (label compound-apply))
  (branch (label external-entry))      ; branches if flag is set
read-eval-print-loop
  ...

To test your code, start by defining a procedure f that calls a procedure g. Use compile-and-go to compile the definition of f and start the evaluator. Now, typing at the evaluator, define g and try to call f.

Name: Anonymous 2021-03-16 10:22

Exercise 5.48.  The compile-and-go interface implemented in this section is awkward, since the compiler can be called only once (when the evaluator machine is started). Augment the compiler-interpreter interface by providing a compile-and-run primitive that can be called from within the explicit-control evaluator as follows:

;;; EC-Eval input:
(compile-and-run
 '(define (factorial n)
    (if (= n 1)
        1
        (* (factorial (- n 1)) n))))
;;; EC-Eval value:
ok
;;; EC-Eval input:
(factorial 5)
;;; EC-Eval value:
120

Exercise 5.49.  As an alternative to using the explicit-control evaluator's read-eval-print loop, design a register machine that performs a read-compile-execute-print loop. That is, the machine should run a loop that reads an expression, compiles it, assembles and executes the resulting code, and prints the result. This is easy to run in our simulated setup, since we can arrange to call the procedures compile and assemble as ``register-machine operations.''

Exercise 5.50.  Use the compiler to compile the metacircular evaluator of section 4.1 and run this program using the register-machine simulator. (To compile more than one definition at a time, you can package the definitions in a begin.) The resulting interpreter will run very slowly because of the multiple levels of interpretation, but getting all the details to work is an instructive exercise.

Exercise 5.51.  Develop a rudimentary implementation of Scheme in C (or some other low-level language of your choice) by translating the explicit-control evaluator of section 5.4 into C. In order to run this code you will need to also provide appropriate storage-allocation routines and other run-time support.

Exercise 5.52.  As a counterpoint to exercise 5.51, modify the compiler so that it compiles Scheme procedures into sequences of C instructions. Compile the metacircular evaluator of section 4.1 to produce a Scheme interpreter written in C.

33 This is a theoretical statement. We are not claiming that the evaluator's data paths are a particularly convenient or efficient set of data paths for a general-purpose computer. For example, they are not very good for implementing high-performance floating-point calculations or calculations that intensively manipulate bit vectors.

34 Actually, the machine that runs compiled code can be simpler than the interpreter machine, because we won't use the exp and unev registers. The interpreter used these to hold pieces of unevaluated expressions. With the compiler, however, these expressions get built into the compiled code that the register machine will run. For the same reason, we don't need the machine operations that deal with expression syntax. But compiled code will use a few additional machine operations (to represent compiled procedure objects) that didn't appear in the explicit-control evaluator machine.

35 Notice, however, that our compiler is a Scheme program, and the syntax procedures that it uses to manipulate expressions are the actual Scheme procedures used with the metacircular evaluator. For the explicit-control evaluator, in contrast, we assumed that equivalent syntax operations were available as operations for the register machine. (Of course, when we simulated the register machine in Scheme, we used the actual Scheme procedures in our register machine simulation.)

36 This procedure uses a feature of Lisp called backquote (or quasiquote) that is handy for constructing lists. Preceding a list with a backquote symbol is much like quoting it, except that anything in the list that is flagged with a comma is evaluated.

For example, if the value of linkage is the symbol branch25, then the expression `((goto (label ,linkage))) evaluates to the list ((goto (label branch25))). Similarly, if the value of x is the list (a b c), then `(1 2 ,(car x)) evaluates to the list (1 2 a).

37 We can't just use the labels true-branch, false-branch, and after-if as shown above, because there might be more than one if in the program. The compiler uses the procedure make-label to generate labels. Make-label takes a symbol as argument and returns a new symbol that begins with the given symbol. For example, successive calls to (make-label 'a) would return a1, a2, and so on. Make-label can be implemented similarly to the generation of unique variable names in the query language, as follows:

(define label-counter 0)

(define (new-label-number)
  (set! label-counter (+ 1 label-counter))
  label-counter)

(define (make-label name)
  (string->symbol
    (string-append (symbol->string name)
                   (number->string (new-label-number)))))

38 We need machine operations to implement a data structure for representing compiled procedures, analogous to the structure for compound procedures described in section 4.1.3:

(define (make-compiled-procedure entry env)
  (list 'compiled-procedure entry env))

(define (compiled-procedure? proc)
  (tagged-list? proc 'compiled-procedure))

(define (compiled-procedure-entry c-proc) (cadr c-proc))

(define (compiled-procedure-env c-proc) (caddr c-proc))

39 Actually, we signal an error when the target is not val and the linkage is return, since the only place we request return linkages is in compiling procedures, and our convention is that procedures return their values in val.

40 Making a compiler generate tail-recursive code might seem like a straightforward idea. But most compilers for common languages, including C and Pascal, do not do this, and therefore these languages cannot represent iterative processes in terms of procedure call alone. The difficulty with tail recursion in these languages is that their implementations use the stack to store procedure arguments and local variables as well as return addresses. The Scheme implementations described in this book store arguments and variables in memory to be garbage-collected. The reason for using the stack for variables and arguments is that it avoids the need for garbage collection in languages that would not otherwise require it, and is generally believed to be more efficient. Sophisticated Lisp compilers can, in fact, use the stack for arguments without destroying tail recursion. (See Hanson 1990 for a description.) There is also some debate about whether stack allocation is actually more efficient than garbage collection in the first place, but the details seem to hinge on fine points of computer architecture. (See Appel 1987 and Miller and Rozas 1994 for opposing views on this issue.)

41 The variable all-regs is bound to the list of names of all the registers:

(define all-regs '(env proc val argl continue))

42 Note that preserving calls append with three arguments. Though the definition of append shown in this book accepts only two arguments, Scheme standardly provides an append procedure that takes an arbitrary number of arguments.

43 We have used the same symbol + here to denote both the source-language procedure and the machine operation. In general there will not be a one-to-one correspondence between primitives of the source language and primitives of the machine.

44 Making the primitives into reserved words is in general a bad idea, since a user cannot then rebind these names to different procedures. Moreover, if we add reserved words to a compiler that is in use, existing programs that define procedures with these names will stop working. See exercise 5.44 for ideas on how to avoid this problem.

45 This is not true if we allow internal definitions, unless we scan them out. See exercise 5.43.

46 This is the modification to variable lookup required if we implement the scanning method to eliminate internal definitions (exercise 5.43). We will need to eliminate these definitions in order for lexical addressing to work.

47 Lexical addresses cannot be used to access variables in the global environment, because these names can be defined and redefined interactively at any time. With internal definitions scanned out, as in exercise 5.43, the only definitions the compiler sees are those at top level, which act on the global environment. Compilation of a definition does not cause the defined name to be entered in the compile-time environment.

48 Of course, compiled procedures as well as interpreted procedures are compound (nonprimitive). For compatibility with the terminology used in the explicit-control evaluator, in this section we will use ``compound'' to mean interpreted (as opposed to compiled).

49 Now that the evaluator machine starts with a branch, we must always initialize the flag register before starting the evaluator machine. To start the machine at its ordinary read-eval-print loop, we could use

(define (start-eceval)
  (set! the-global-environment (setup-environment))
  (set-register-contents! eceval 'flag false)
  (start eceval))

50 Since a compiled procedure is an object that the system may try to print, we also modify the system print operation user-print (from section 4.1.4) so that it will not attempt to print the components of a compiled procedure:

(define (user-print object)
  (cond ((compound-procedure? object)
         (display (list 'compound-procedure
                        (procedure-parameters object)
                        (procedure-body object)
                        '<procedure-env>)))
        ((compiled-procedure? object)
         (display '<compiled-procedure>))
        (else (display object))))

51 We can do even better by extending the compiler to allow compiled code to call interpreted procedures. See exercise 5.47.

52 Independent of the strategy of execution, we incur significant overhead if we insist that errors encountered in execution of a user program be detected and signaled, rather than being allowed to kill the system or produce wrong answers. For example, an out-of-bounds array reference can be detected by checking the validity of the reference before performing it. The overhead of checking, however, can be many times the cost of the array reference itself, and a programmer should weigh speed against safety in determining whether such a check is desirable. A good compiler should be able to produce code with such checks, should avoid redundant checks, and should allow programmers to control the extent and type of error checking in the compiled code.

Compilers for popular languages, such as C and C++, put hardly any error-checking operations into running code, so as to make things run as fast as possible. As a result, it falls to programmers to explicitly provide error checking. Unfortunately, people often neglect to do this, even in critical applications where speed is not a constraint. Their programs lead fast and dangerous lives. For example, the notorious ``Worm'' that paralyzed the Internet in 1988 exploited the UNIX TM operating system's failure to check whether the input buffer has overflowed in the finger daemon. (See Spafford 1989.)

53 Of course, with either the interpretation or the compilation strategy we must also implement for the new machine storage allocation, input and output, and all the various operations that we took as ``primitive'' in our discussion of the evaluator and compiler. One strategy for minimizing work here is to write as many of these operations as possible in Lisp and then compile them for the new machine. Ultimately, everything reduces to a small kernel (such as garbage collection and the mechanism for applying actual machine primitives) that is hand-coded for the new machine.

54 This strategy leads to amusing tests of correctness of the compiler, such as checking whether the compilation of a program on the new machine, using the compiled compiler, is identical with the compilation of the program on the original Lisp system. Tracking down the source of differences is fun but often frustrating, because the results are extremely sensitive to minuscule details.

Name: Anonymous 2021-03-16 10:24

References

Abelson, Harold, Andrew Berlin, Jacob Katzenelson, William McAllister, Guillermo Rozas, Gerald Jay Sussman, and Jack Wisdom. 1992. The Supercomputer Toolkit: A general framework for special-purpose computing. International Journal of High-Speed Electronics 3(3):337-361.

Allen, John. 1978. Anatomy of Lisp. New York: McGraw-Hill.

ANSI X3.226-1994. American National Standard for Information Systems -- Programming Language -- Common Lisp.

Appel, Andrew W. 1987. Garbage collection can be faster than stack allocation. Information Processing Letters 25(4):275-279.

Backus, John. 1978. Can programming be liberated from the von Neumann style? Communications of the ACM 21(8):613-641.

Baker, Henry G., Jr. 1978. List processing in real time on a serial computer. Communications of the ACM 21(4):280-293.

Batali, John, Neil Mayle, Howard Shrobe, Gerald Jay Sussman, and Daniel Weise. 1982. The Scheme-81 architecture -- System and chip. In Proceedings of the MIT Conference on Advanced Research in VLSI, edited by Paul Penfield, Jr. Dedham, MA: Artech House.

Borning, Alan. 1977. ThingLab -- An object-oriented system for building simulations using constraints. In Proceedings of the 5th International Joint Conference on Artificial Intelligence.

Borodin, Alan, and Ian Munro. 1975. The Computational Complexity of Algebraic and Numeric Problems. New York: American Elsevier.

Chaitin, Gregory J. 1975. Randomness and mathematical proof. Scientific American 232(5):47-52.

Church, Alonzo. 1941. The Calculi of Lambda-Conversion. Princeton, N.J.: Princeton University Press.

Clark, Keith L. 1978. Negation as failure. In Logic and Data Bases. New York: Plenum Press, pp. 293-322.

Clinger, William. 1982. Nondeterministic call by need is neither lazy nor by name. In Proceedings of the ACM Symposium on Lisp and Functional Programming, pp. 226-234.

Clinger, William, and Jonathan Rees. 1991. Macros that work. In Proceedings of the 1991 ACM Conference on Principles of Programming Languages, pp. 155-162.

Colmerauer A., H. Kanoui, R. Pasero, and P. Roussel. 1973. Un système de communication homme-machine en français. Technical report, Groupe Intelligence Artificielle, Université d'Aix Marseille, Luminy.

Cormen, Thomas, Charles Leiserson, and Ronald Rivest. 1990. Introduction to Algorithms. Cambridge, MA: MIT Press.

Darlington, John, Peter Henderson, and David Turner. 1982. Functional Programming and Its Applications. New York: Cambridge University Press.

Dijkstra, Edsger W. 1968a. The structure of the ``THE'' multiprogramming system. Communications of the ACM 11(5):341-346.

Dijkstra, Edsger W. 1968b. Cooperating sequential processes. In Programming Languages, edited by F. Genuys. New York: Academic Press, pp. 43-112.

Dinesman, Howard P. 1968. Superior Mathematical Puzzles. New York: Simon and Schuster.

deKleer, Johan, Jon Doyle, Guy Steele, and Gerald J. Sussman. 1977. AMORD: Explicit control of reasoning. In Proceedings of the ACM Symposium on Artificial Intelligence and Programming Languages, pp. 116-125.

Doyle, Jon. 1979. A truth maintenance system. Artificial Intelligence 12:231-272.

Feigenbaum, Edward, and Howard Shrobe. 1993. The Japanese National Fifth Generation Project: Introduction, survey, and evaluation. In Future Generation Computer Systems, vol. 9, pp. 105-117.

Feeley, Marc. 1986. Deux approches à l'implantation du language Scheme. Masters thesis, Université de Montréal.

Feeley, Marc and Guy Lapalme. 1987. Using closures for code generation. Journal of Computer Languages 12(1):47-66.

Feller, William. 1957. An Introduction to Probability Theory and Its Applications, volume 1. New York: John Wiley & Sons.

Fenichel, R., and J. Yochelson. 1969. A Lisp garbage collector for virtual memory computer systems. Communications of the ACM 12(11):611-612.

Floyd, Robert. 1967. Nondeterministic algorithms. JACM, 14(4):636-644.

Forbus, Kenneth D., and Johan deKleer. 1993. Building Problem Solvers. Cambridge, MA: MIT Press.

Friedman, Daniel P., and David S. Wise. 1976. CONS should not evaluate its arguments. In Automata, Languages, and Programming: Third International Colloquium, edited by S. Michaelson and R. Milner, pp. 257-284.

Friedman, Daniel P., Mitchell Wand, and Christopher T. Haynes. 1992. Essentials of Programming Languages. Cambridge, MA: MIT Press/McGraw-Hill.

Gabriel, Richard P. 1988. The Why of Y. Lisp Pointers 2(2):15-25.

Goldberg, Adele, and David Robson. 1983. Smalltalk-80: The Language and Its Implementation. Reading, MA: Addison-Wesley.

Gordon, Michael, Robin Milner, and Christopher Wadsworth. 1979. Edinburgh LCF. Lecture Notes in Computer Science, volume 78. New York: Springer-Verlag.

Gray, Jim, and Andreas Reuter. 1993. Transaction Processing: Concepts and Models. San Mateo, CA: Morgan-Kaufman.

Green, Cordell. 1969. Application of theorem proving to problem solving. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 219-240.

Green, Cordell, and Bertram Raphael. 1968. The use of theorem-proving techniques in question-answering systems. In Proceedings of the ACM National Conference, pp. 169-181.

Griss, Martin L. 1981. Portable Standard Lisp, a brief overview. Utah Symbolic Computation Group Operating Note 58, University of Utah.

Guttag, John V. 1977. Abstract data types and the development of data structures. Communications of the ACM 20(6):397-404.

Hamming, Richard W. 1980. Coding and Information Theory. Englewood Cliffs, N.J.: Prentice-Hall.

Hanson, Christopher P. 1990. Efficient stack allocation for tail-recursive languages. In Proceedings of ACM Conference on Lisp and Functional Programming, pp. 106-118.

Hanson, Christopher P. 1991. A syntactic closures macro facility. Lisp Pointers, 4(3).

Hardy, Godfrey H. 1921. Srinivasa Ramanujan. Proceedings of the London Mathematical Society XIX(2).

Hardy, Godfrey H., and E. M. Wright. 1960. An Introduction to the Theory of Numbers. 4th edition. New York: Oxford University Press.

Havender, J. 1968. Avoiding deadlocks in multi-tasking systems. IBM Systems Journal 7(2):74-84.

Hearn, Anthony C. 1969. Standard Lisp. Technical report AIM-90, Artificial Intelligence Project, Stanford University.

Henderson, Peter. 1980. Functional Programming: Application and Implementation. Englewood Cliffs, N.J.: Prentice-Hall.

Henderson. Peter. 1982. Functional Geometry. In Conference Record of the 1982 ACM Symposium on Lisp and Functional Programming, pp. 179-187.

Hewitt, Carl E. 1969. PLANNER: A language for proving theorems in robots. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 295-301.

Hewitt, Carl E. 1977. Viewing control structures as patterns of passing messages. Journal of Artificial Intelligence 8(3):323-364.

Hoare, C. A. R. 1972. Proof of correctness of data representations. Acta Informatica 1(1).

Hodges, Andrew. 1983. Alan Turing: The Enigma. New York: Simon and Schuster.

Hofstadter, Douglas R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic Books.

Hughes, R. J. M. 1990. Why functional programming matters. In Research Topics in Functional Programming, edited by David Turner. Reading, MA: Addison-Wesley, pp. 17-42.

IEEE Std 1178-1990. 1990. IEEE Standard for the Scheme Programming Language.

Ingerman, Peter, Edgar Irons, Kirk Sattley, and Wallace Feurzeig; assisted by M. Lind, Herbert Kanner, and Robert Floyd. 1960. THUNKS: A way of compiling procedure statements, with some comments on procedure declarations. Unpublished manuscript. (Also, private communication from Wallace Feurzeig.)

Kaldewaij, Anne. 1990. Programming: The Derivation of Algorithms. New York: Prentice-Hall.

Kohlbecker, Eugene Edmund, Jr. 1986. Syntactic extensions in the programming language Lisp. Ph.D. thesis, Indiana University.

Konopasek, Milos, and Sundaresan Jayaraman. 1984. The TK!Solver Book: A Guide to Problem-Solving in Science, Engineering, Business, and Education. Berkeley, CA: Osborne/McGraw-Hill.

Knuth, Donald E. 1973. Fundamental Algorithms. Volume 1 of The Art of Computer Programming. 2nd edition. Reading, MA: Addison-Wesley.

Knuth, Donald E. 1981. Seminumerical Algorithms. Volume 2 of The Art of Computer Programming. 2nd edition. Reading, MA: Addison-Wesley.

Kowalski, Robert. 1973. Predicate logic as a programming language. Technical report 70, Department of Computational Logic, School of Artificial Intelligence, University of Edinburgh.

Kowalski, Robert. 1979. Logic for Problem Solving. New York: North-Holland.

Lamport, Leslie. 1978. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21(7):558-565.

Lampson, Butler, J. J. Horning, R. London, J. G. Mitchell, and G. K. Popek. 1981. Report on the programming language Euclid. Technical report, Computer Systems Research Group, University of Toronto.

Landin, Peter. 1965. A correspondence between Algol 60 and Church's lambda notation: Part I. Communications of the ACM 8(2):89-101.

Lieberman, Henry, and Carl E. Hewitt. 1983. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM 26(6):419-429.

Liskov, Barbara H., and Stephen N. Zilles. 1975. Specification techniques for data abstractions. IEEE Transactions on Software Engineering 1(1):7-19.

McAllester, David Allen. 1978. A three-valued truth-maintenance system. Memo 473, MIT Artificial Intelligence Laboratory.

McAllester, David Allen. 1980. An outlook on truth maintenance. Memo 551, MIT Artificial Intelligence Laboratory.

McCarthy, John. 1960. Recursive functions of symbolic expressions and their computation by machine. Communications of the ACM 3(4):184-195.

McCarthy, John. 1967. A basis for a mathematical theory of computation. In Computer Programing and Formal Systems, edited by P. Braffort and D. Hirschberg. North-Holland.

McCarthy, John. 1978. The history of Lisp. In Proceedings of the ACM SIGPLAN Conference on the History of Programming Languages.

McCarthy, John, P. W. Abrahams, D. J. Edwards, T. P. Hart, and M. I. Levin. 1965. Lisp 1.5 Programmer's Manual. 2nd edition. Cambridge, MA: MIT Press.

Name: Anonymous 2021-03-16 10:25

McDermott, Drew, and Gerald Jay Sussman. 1972. Conniver reference manual. Memo 259, MIT Artificial Intelligence Laboratory.

Miller, Gary L. 1976. Riemann's Hypothesis and tests for primality. Journal of Computer and System Sciences 13(3):300-317.

Miller, James S., and Guillermo J. Rozas. 1994. Garbage collection is fast, but a stack is faster. Memo 1462, MIT Artificial Intelligence Laboratory.

Moon, David. 1978. MacLisp reference manual, Version 0. Technical report, MIT Laboratory for Computer Science.

Moon, David, and Daniel Weinreb. 1981. Lisp machine manual. Technical report, MIT Artificial Intelligence Laboratory.

Morris, J. H., Eric Schmidt, and Philip Wadler. 1980. Experience with an applicative string processing language. In Proceedings of the 7th Annual ACM SIGACT/SIGPLAN Symposium on the Principles of Programming Languages.

Phillips, Hubert. 1934. The Sphinx Problem Book. London: Faber and Faber.

Pitman, Kent. 1983. The revised MacLisp Manual (Saturday evening edition). Technical report 295, MIT Laboratory for Computer Science.

Rabin, Michael O. 1980. Probabilistic algorithm for testing primality. Journal of Number Theory 12:128-138.

Raymond, Eric. 1993. The New Hacker's Dictionary. 2nd edition. Cambridge, MA: MIT Press.

Raynal, Michel. 1986. Algorithms for Mutual Exclusion. Cambridge, MA: MIT Press.

Rees, Jonathan A., and Norman I. Adams IV. 1982. T: A dialect of Lisp or, lambda: The ultimate software tool. In Conference Record of the 1982 ACM Symposium on Lisp and Functional Programming, pp. 114-122.

Rees, Jonathan, and William Clinger (eds). 1991. The revised4 report on the algorithmic language Scheme. Lisp Pointers, 4(3).

Rivest, Ronald, Adi Shamir, and Leonard Adleman. 1977. A method for obtaining digital signatures and public-key cryptosystems. Technical memo LCS/TM82, MIT Laboratory for Computer Science.

Robinson, J. A. 1965. A machine-oriented logic based on the resolution principle. Journal of the ACM 12(1):23.

Robinson, J. A. 1983. Logic programming -- Past, present, and future. New Generation Computing 1:107-124.

Spafford, Eugene H. 1989. The Internet Worm: Crisis and aftermath. Communications of the ACM 32(6):678-688.

Steele, Guy Lewis, Jr. 1977. Debunking the ``expensive procedure call'' myth. In Proceedings of the National Conference of the ACM, pp. 153-62.

Steele, Guy Lewis, Jr. 1982. An overview of Common Lisp. In Proceedings of the ACM Symposium on Lisp and Functional Programming, pp. 98-107.

Steele, Guy Lewis, Jr. 1990. Common Lisp: The Language. 2nd edition. Digital Press.

Steele, Guy Lewis, Jr., and Gerald Jay Sussman. 1975. Scheme: An interpreter for the extended lambda calculus. Memo 349, MIT Artificial Intelligence Laboratory.

Steele, Guy Lewis, Jr., Donald R. Woods, Raphael A. Finkel, Mark R. Crispin, Richard M. Stallman, and Geoffrey S. Goodfellow. 1983. The Hacker's Dictionary. New York: Harper & Row.

Stoy, Joseph E. 1977. Denotational Semantics. Cambridge, MA: MIT Press.

Sussman, Gerald Jay, and Richard M. Stallman. 1975. Heuristic techniques in computer-aided circuit analysis. IEEE Transactions on Circuits and Systems CAS-22(11):857-865.

Sussman, Gerald Jay, and Guy Lewis Steele Jr. 1980. Constraints -- A language for expressing almost-hierachical descriptions. AI Journal 14:1-39.

Sussman, Gerald Jay, and Jack Wisdom. 1992. Chaotic evolution of the solar system. Science 257:256-262.

Sussman, Gerald Jay, Terry Winograd, and Eugene Charniak. 1971. Microplanner reference manual. Memo 203A, MIT Artificial Intelligence Laboratory.

Sutherland, Ivan E. 1963. SKETCHPAD: A man-machine graphical communication system. Technical report 296, MIT Lincoln Laboratory.

Teitelman, Warren. 1974. Interlisp reference manual. Technical report, Xerox Palo Alto Research Center.

Thatcher, James W., Eric G. Wagner, and Jesse B. Wright. 1978. Data type specification: Parameterization and the power of specification techniques. In Conference Record of the Tenth Annual ACM Symposium on Theory of Computing, pp. 119-132. Turner, David. 1981. The future of applicative languages. In Proceedings of the 3rd European Conference on Informatics, Lecture Notes in Computer Science, volume 123. New York: Springer-Verlag, pp. 334-348.

Wand, Mitchell. 1980. Continuation-based program transformation strategies. Journal of the ACM 27(1):164-180.

Waters, Richard C. 1979. A method for analyzing loop programs. IEEE Transactions on Software Engineering 5(3):237-247.

Winograd, Terry. 1971. Procedures as a representation for data in a computer program for understanding natural language. Technical report AI TR-17, MIT Artificial Intelligence Laboratory.

Winston, Patrick. 1992. Artificial Intelligence. 3rd edition. Reading, MA: Addison-Wesley.

Zabih, Ramin, David McAllester, and David Chapman. 1987. Non-deterministic Lisp with dependency-directed backtracking. AAAI-87, pp. 59-64.

Zippel, Richard. 1979. Probabilistic algorithms for sparse polynomials. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, MIT.

Zippel, Richard. 1993. Effective Polynomial Computation. Boston, MA: Kluwer Academic Publishers.

Name: Anonymous 2021-03-16 10:31

>>85-86 correction for >>85 has a mission page(i accidently pasted it in another thread):

 Exercise 4.5.  Scheme allows an additional syntax for cond clauses, (<test> => <recipient>). If <test> evaluates to a true value, then <recipient> is evaluated. Its value must be a procedure of one argument; this procedure is then invoked on the value of the <test>, and the result is returned as the value of the cond expression. For example

(cond ((assoc 'b '((a 1) (b 2))) => cadr)
      (else false))

returns 2. Modify the handling of cond so that it supports this extended syntax.

Exercise 4.6.  Let expressions are derived expressions, because

(let ((<var1> <exp1>) ... (<varn> <expn>))
  <body>)

is equivalent to

((lambda (<var1> ... <varn>)
   <body>)
 <exp1>
 
 <expn>)

Implement a syntactic transformation let->combination that reduces evaluating let expressions to evaluating combinations of the type shown above, and add the appropriate clause to eval to handle let expressions.

Exercise 4.7.  Let* is similar to let, except that the bindings of the let variables are performed sequentially from left to right, and each binding is made in an environment in which all of the preceding bindings are visible. For example

(let* ((x 3)
       (y (+ x 2))
       (z (+ x y 5)))
  (* x z))

returns 39. Explain how a let* expression can be rewritten as a set of nested let expressions, and write a procedure let*->nested-lets that performs this transformation. If we have already implemented let (exercise 4.6) and we want to extend the evaluator to handle let*, is it sufficient to add a clause to eval whose action is

(eval (let*->nested-lets exp) env)

or must we explicitly expand let* in terms of non-derived expressions?

Exercise 4.8.  ``Named let'' is a variant of let that has the form

(let <var> <bindings> <body>)

The <bindings> and <body> are just as in ordinary let, except that <var> is bound within <body> to a procedure whose body is <body> and whose parameters are the variables in the <bindings>. Thus, one can repeatedly execute the <body> by invoking the procedure named <var>. For example, the iterative Fibonacci procedure (section 1.2.2) can be rewritten using named let as follows:

(define (fib n)
  (let fib-iter ((a 1)
                 (b 0)
                 (count n))
    (if (= count 0)
        b
        (fib-iter (+ a b) a (- count 1)))))

Modify let->combination of exercise 4.6 to also support named let.

Exercise 4.9.  Many languages support a variety of iteration constructs, such as do, for, while, and until. In Scheme, iterative processes can be expressed in terms of ordinary procedure calls, so special iteration constructs provide no essential gain in computational power. On the other hand, such constructs are often convenient. Design some iteration constructs, give examples of their use, and show how to implement them as derived expressions.

Exercise 4.10.  By using data abstraction, we were able to write an eval procedure that is independent of the particular syntax of the language to be evaluated. To illustrate this, design and implement a new syntax for Scheme by modifying the procedures in this section, without changing eval or apply.

4.1.3  Evaluator Data Structures

In addition to defining the external syntax of expressions, the evaluator implementation must also define the data structures that the evaluator manipulates internally, as part of the execution of a program, such as the representation of procedures and environments and the representation of true and false.

Testing of predicates

For conditionals, we accept anything to be true that is not the explicit false object.

(define (true? x)
  (not (eq? x false)))
(define (false? x)
  (eq? x false))

Representing procedures

To handle primitives, we assume that we have available the following procedures:

    (apply-primitive-procedure <proc> <args>)
    applies the given primitive procedure to the argument values in the list <args> and returns the result of the application.

    (primitive-procedure? <proc>)
    tests whether <proc> is a primitive procedure. 

These mechanisms for handling primitives are further described in section 4.1.4.

Compound procedures are constructed from parameters, procedure bodies, and environments using the constructor make-procedure:

(define (make-procedure parameters body env)
  (list 'procedure parameters body env))
(define (compound-procedure? p)
  (tagged-list? p 'procedure))
(define (procedure-parameters p) (cadr p))
(define (procedure-body p) (caddr p))
(define (procedure-environment p) (cadddr p))

Operations on Environments

The evaluator needs operations for manipulating environments. As explained in section 3.2, an environment is a sequence of frames, where each frame is a table of bindings that associate variables with their corresponding values. We use the following operations for manipulating environments:

    (lookup-variable-value <var> <env>)
    returns the value that is bound to the symbol <var> in the environment <env>, or signals an error if the variable is unbound.

    (extend-environment <variables> <values> <base-env>)
    returns a new environment, consisting of a new frame in which the symbols in the list <variables> are bound to the corresponding elements in the list <values>, where the enclosing environment is the environment <base-env>.

    (define-variable! <var> <value> <env>)
    adds to the first frame in the environment <env> a new binding that associates the variable <var> with the value <value>.

    (set-variable-value! <var> <value> <env>)
    changes the binding of the variable <var> in the environment <env> so that the variable is now bound to the value <value>, or signals an error if the variable is unbound. 

To implement these operations we represent an environment as a list of frames. The enclosing environment of an environment is the cdr of the list. The empty environment is simply the empty list.

(define (enclosing-environment env) (cdr env))
(define (first-frame env) (car env))
(define the-empty-environment '())

Each frame of an environment is represented as a pair of lists: a list of the variables bound in that frame and a list of the associated values.14

(define (make-frame variables values)
  (cons variables values))
(define (frame-variables frame) (car frame))
(define (frame-values frame) (cdr frame))
(define (add-binding-to-frame! var val frame)
  (set-car! frame (cons var (car frame)))
  (set-cdr! frame (cons val (cdr frame))))

To extend an environment by a new frame that associates variables with values, we make a frame consisting of the list of variables and the list of values, and we adjoin this to the environment. We signal an error if the number of variables does not match the number of values.

(define (extend-environment vars vals base-env)
  (if (= (length vars) (length vals))
      (cons (make-frame vars vals) base-env)
      (if (< (length vars) (length vals))
          (error "Too many arguments supplied" vars vals)
          (error "Too few arguments supplied" vars vals))))

To look up a variable in an environment, we scan the list of variables in the first frame. If we find the desired variable, we return the corresponding element in the list of values. If we do not find the variable in the current frame, we search the enclosing environment, and so on. If we reach the empty environment, we signal an ``unbound variable'' error.

(define (lookup-variable-value var env)
  (define (env-loop env)
    (define (scan vars vals)
      (cond ((null? vars)
             (env-loop (enclosing-environment env)))
            ((eq? var (car vars))
             (car vals))
            (else (scan (cdr vars) (cdr vals)))))
    (if (eq? env the-empty-environment)
        (error "Unbound variable" var)
        (let ((frame (first-frame env)))
          (scan (frame-variables frame)
                (frame-values frame)))))
  (env-loop env))

To set a variable to a new value in a specified environment, we scan for the variable, just as in lookup-variable-value, and change the corresponding value when we find it.

(define (set-variable-value! var val env)
  (define (env-loop env)
    (define (scan vars vals)
      (cond ((null? vars)
             (env-loop (enclosing-environment env)))
            ((eq? var (car vars))
             (set-car! vals val))
            (else (scan (cdr vars) (cdr vals)))))
    (if (eq? env the-empty-environment)
        (error "Unbound variable -- SET!" var)
        (let ((frame (first-frame env)))
          (scan (frame-variables frame)
                (frame-values frame)))))
  (env-loop env))

To define a variable, we search the first frame for a binding for the variable, and change the binding if it exists (just as in set-variable-value!). If no such binding exists, we adjoin one to the first frame.

(define (define-variable! var val env)
  (let ((frame (first-frame env)))
    (define (scan vars vals)
      (cond ((null? vars)
             (add-binding-to-frame! var val frame))
            ((eq? var (car vars))
             (set-car! vals val))
            (else (scan (cdr vars) (cdr vals)))))
    (scan (frame-variables frame)
          (frame-values frame))))

The method described here is only one of many plausible ways to represent environments. Since we used data abstraction to isolate the rest of the evaluator from the detailed choice of representation, we could change the environment representation if we wanted to. (See exercise 4.11.) In a production-quality Lisp system, the speed of the evaluator's environment operations -- especially that of variable lookup -- has a major impact on the performance of the system. The representation described here, although conceptually simple, is not efficient and would not ordinarily be used in a production system.15

Exercise 4.11.  Instead of representing a frame as a pair of lists, we can represent a frame as a list of bindings, where each binding is a name-value pair. Rewrite the environment operations to use this alternative representation.

Exercise 4.12.  The procedures set-variable-value!, define-variable!, and lookup-variable-value can be expressed in terms of more abstract procedures for traversing the environment structure. Define abstractions that capture the common patterns and redefine the three procedures in terms of these abstractions.

Exercise 4.13.  Scheme allows us to create new bindings for variables by means of define, but provides no way to get rid of bindings. Implement for the evaluator a special form make-unbound! that removes the binding of a given symbol from the environment in which the make-unbound! expression is evaluated. This problem is not completely specified. For example, should we remove only the binding in the first frame of the environment? Complete the specification and justify any choices you make.

Writing SICP, since you never read it.

1 Name: Anonymous 2021-03-16 8:46

121 Name: Anonymous 2021-03-16 10:14

122 Name: Anonymous 2021-03-16 10:15

123 Name: Anonymous 2021-03-16 10:15

124 Name: Anonymous 2021-03-16 10:16

125 Name: Anonymous 2021-03-16 10:16

126 Name: Anonymous 2021-03-16 10:17

127 Name: Anonymous 2021-03-16 10:17

128 Name: Anonymous 2021-03-16 10:17

129 Name: Anonymous 2021-03-16 10:18

130 Name: Anonymous 2021-03-16 10:18

131 Name: Anonymous 2021-03-16 10:19

132 Name: Anonymous 2021-03-16 10:20

133 Name: Anonymous 2021-03-16 10:21

134 Name: Anonymous 2021-03-16 10:21

135 Name: Anonymous 2021-03-16 10:22

136 Name: Anonymous 2021-03-16 10:22

137 Name: Anonymous 2021-03-16 10:22

138 Name: Anonymous 2021-03-16 10:24

139 Name: Anonymous 2021-03-16 10:25

140 Name: Anonymous 2021-03-16 10:31

141 Name: Anonymous 2021-03-16 20:17

Name: Anonymous 2021-03-16 8:46

Name: Anonymous 2021-03-16 10:14

Name: Anonymous 2021-03-16 10:15

Name: Anonymous 2021-03-16 10:15

Name: Anonymous 2021-03-16 10:16

Name: Anonymous 2021-03-16 10:16

Name: Anonymous 2021-03-16 10:17

Name: Anonymous 2021-03-16 10:17

Name: Anonymous 2021-03-16 10:17

Name: Anonymous 2021-03-16 10:18

Name: Anonymous 2021-03-16 10:18

Name: Anonymous 2021-03-16 10:19

Name: Anonymous 2021-03-16 10:20

Name: Anonymous 2021-03-16 10:21

Name: Anonymous 2021-03-16 10:21

Name: Anonymous 2021-03-16 10:22

Name: Anonymous 2021-03-16 10:22

Name: Anonymous 2021-03-16 10:22

Name: Anonymous 2021-03-16 10:24

Name: Anonymous 2021-03-16 10:25

Name: Anonymous 2021-03-16 10:31

Name: Anonymous 2021-03-16 20:17