Binary operations
So far, the programs in our language have only had to deal with one value at a
time. That’s quite intentional–by restricting our language in this way we’ve
been able to compile everything using only the rax register! Today, that
changes. Instead of dealing with one value, we’re going to introduce operations
that deal with–get this--two values. As it turns out, this is much more
challenging!
Binary operations in the interpreter
…Or at least, it’s much more challenging in the compiler. Binary operations in our interpreter really won’t be very different from unary operations! First off, here are the operations we will support:
(+ e1 e2)adds the values of the expressionse1ande2.e1ande2should evaluate to numbers.(- e1 e2)subtracts the values of the expressionse1ande2.e1ande2should evaluate to numbers.(= e1 e2)evaluates totrueife1ande2evaluate to the same value andfalseotherwise(< e1 e2)evaluates totrueife1evaluates to a smaller number thane2andfalseotherwise.e1ande2should evaluate to numbers.
Here are the cases we’ll need to add to interp_exp:
let interp_exp exp =
match exp with
(* some cases elided... *)
| Lst [Sym "+"; e1; e2] -> (
match (interp_exp e1, interp_exp e2) with
| Number n1, Number n2 ->
Number (n1 + n2)
| _ ->
raise (BadExpression exp) )
| Lst [Sym "-"; e1; e2] -> (
match (interp_exp e1, interp_exp e2) with
| Number n1, Number n2 ->
Number (n1 - n2)
| _ ->
raise (BadExpression exp) )
| Lst [Sym "="; e1; e2] ->
Boolean (interp_exp e1 = interp_exp e2)
| Lst [Sym "<"; e1; e2] -> (
match (interp_exp e1, interp_exp e2) with
| Number n1, Number n2 ->
Boolean (n1 < n2)
| _ ->
raise (BadExpression exp) )
Notice that this code enforces type-correctness: + and < will only work on
numbers. Just as we’ve seen with unary operations and conditionals, the
interpreter is just relying on OCaml’s implementations of these features.
Binary operations in the compiler
Here’s where it gets tricky.
Let’s try to sort of “naively” translate the interpreter version of +
(reminder: right now the compiler, unlike the interpreter, does not enforce
type-correctness):
let compile_exp exp =
match exp with
(* some cases elided... *)
| Lst [Sym "+"; e1; e2] ->
compile_exp e1 @ compile_exp e2 @ ...
Remember: compile_exp emits instructions to compute the value of exp and to
store that value in rax. So by the time we want to add the two values, e2 is
going to be in rax and e1 is going to be lost! So, we’ll somehow need to
“save” the value of e1. Here’s an idea: we could save it to a different
register! x86-64 has 16 general-purpose registers; let’s use r8, written Reg
R8 in our OCaml assembly library:
let compile_exp exp =
match exp with
(* some cases elided... *)
| Lst [Sym "+"; e1; e2] ->
compile_exp e1 @ [Mov (Reg R8, Reg Rax)] @ compile_exp e2 @ [Add (Reg Rax, Reg R8)]
Here we’re saving compiling e1 into rax, saving rax into r8, compiling
e2 into rax, then adding the results of the two expressions together. This
seems to work great!
$ compile_and_run "(+ 1 2)";; 3 $ compile_and_run "(+ (+ 1 2) 3)";; 6 $ compile_and_run "(+ 1 (+ 2 3))";; 7
Wait, what was that last result? Something’s not right here. Let’s look at the assembly we’re producing:
entry:
mov rax, 4
mov r8, rax
mov rax, 8
mov r8, rax
mov rax, 12
add rax, r8
add rax, r8
ret
We’re compiling (+ 1 (+ 2 3)) by first storing the runtime representation of
1 in r8, then compiling the second argument to +. But since the second
argument is also a call to +, the first thing it’s going to do is do overwrite
the value in r8 (in this case, with the runtime representation of 2).
We could try to get around this by using more registers. We could imagine having
our compiler take a list of registers it’s not allowed to use when compiling an
expression–here, since r8 is being used to store 1, we couldn’t use r8
when compiling (+ 2 3). If we had an infinite number of registers, a scheme
like this could work. But since we only have 16, there are going to be
expressions that we won’t be able to compile with that kind of scheme.
So we need someplace to store intermediate values during computation, where we won’t run out of room. How about memory?
The stack
The region of memory that our program has available for temporary use during computations is called the stack. (Longer-lived values live in the heap, which we’ll talk about in a few lectures.) We’ll start with a simple model of this region of memory; we’ll make this model more complex, and somewhat more accurate, when we talk about functions.
Imagine the stack as an array of cells, each of which has an address. The bottom
of our stack is at the highest address. When our program starts executing, the
register rsp contains this address. The memory cell at this address contains
the function’s return address. We’ll learn more about what that means later; for
now, just know that we shouldn’t overwrite the data at that address.
The “next” memory cell in the stack–that is, the first cell that we can write
data into–is at (rsp - 8). Why -? Because the stack grows “up”, from higher
addresses to lower addresses. rsp + 8 probably contains data used by the
calling function. Why 8? Because the word size on x86-64 is 8 bytes (64
bits). x86-64 memory addresses are 8 bytes; x86-64 registers are 8 bytes; all of
our program values are 8 bytes. So the stack looks like this:
| address | data |
|---|---|
| … | … |
rsp - 16 |
unused |
rsp - 8 |
unused |
rsp |
address of caller stack frame |
Accessing the stack from assembly
We’ve seen the mov instruction before–it lets us move immediate data into
registers, or move data between registers. It also lets us move data between
registers and memory. So, let’s modify our compiler to save the value of the
first argument to + to memory instead of r8.
let compile_exp exp =
match exp with
(* some cases elided... *)
| Lst [Sym "+"; e1; e2] ->
compile_exp e1
@ [Mov (MemOffset (Reg Rsp, Imm (-8)), Reg Rax)]
@ compile_exp e2
@ [Mov (Reg R8, MemOffset (Reg Rsp, Imm (-8)))]
@ [Add (Reg Rax, Reg R8)]
If we compile (+ 1 2) now, we get this:
entry:
mov rax, 4
mov [rsp + -8], rax
mov rax, 8
mov r8, [rsp + -8]
add rax, r8
ret
Those square-bracketed expressions are how our assembly language represents
memory accesses. As we see, offsets into memory (of the form <operand> +
<operand>) can be used as operands to instructions like mov and add.
What happens if we compile (+ 1 (+ 2 3)) now? We still have the same
problem we did before--2 is overwriting 1, this time at [rsp - 8] instead
of in r8:
entry:
mov rax, 4
mov [rsp + -8], rax
mov rax, 8
mov [rsp + -8], rax
mov rax, 12
mov r8, [rsp + -8]
add rax, r8
mov r8, [rsp + -8]
add rax, r8
ret
Now, though, we’ll be able to fix this issue.
Tracking the stack index
Instead of storing the intermediate value 2 at [rsp - 8], the compiler
should store it at the next available stack address: [rsp - 16]. So when we
call compile_exp e2, we will need to let it know that [rsp - 16] is the new
first stack address.
We can implement this by adding an argument to compile_exp representing the
next available stack index:
let compile_exp (stack_index : int) exp = ...
Most of the time, this stack_index argument will remain unchanged through
recursive calls. But if we store something on the stack, we’ll need to update
it. Right now, we need to do that in exactly one place: that compile_exp e2
call. We’ll modify our code to store the intermediate value at [rsp +
stack_index], and to subtract 8 from the stack index for that recursive call:
let compile_exp stack_index exp =
match exp with
(* some cases elided... *)
| Lst [Sym "+"; e1; e2] ->
compile_exp stack_index e1
@ [Mov (MemOffset (Reg Rsp, Imm stack_index), Reg Rax)]
@ compile_exp (stack_index - 8) e2
@ [Mov (Reg R8, MemOffset (Reg Rsp, Imm stack_index))]
@ [Add (Reg Rax, Reg R8)]
We now get the following code for (+ 1 (+ 2 3)):
entry:
mov rax, 4
mov [rsp + -8], rax
mov rax, 8
mov [rsp + -16], rax
mov rax, 12
mov r8, [rsp + -16]
add rax, r8
mov r8, [rsp + -8]
add rax, r8
ret
This now works great! We’ve successfully implemented addition.
Other binary operations
Our code for the other binary operations we support looks similar:
let compile_exp stack_index exp =
match exp with
(* some cases elided... *)
| Lst [Sym "-"; e1; e2] ->
compile_exp stack_index e1
@ [Mov (MemOffset (Reg Rsp, Imm stack_index), Reg Rax)]
@ compile_exp (stack_index - 8) e2
@ [Mov (Reg R8, Reg Rax)]
@ [Mov (Reg Rax, MemOffset (Reg Rsp, Imm stack_index))]
@ [Sub (Reg Rax, Reg R8)]
| Lst [Sym "="; e1; e2] ->
compile_exp stack_index e1
@ [Mov (MemOffset (Reg Rsp, Imm stack_index), Reg Rax)]
@ compile_exp (stack_index - 8) e2
@ [Mov (Reg R8, MemOffset (Reg Rsp, Imm stack_index))]
@ [Cmp (Reg Rax, Reg R8)]
@ zf_to_bool
| Lst [Sym "<"; e1; e2] ->
compile_exp stack_index e1
@ [Mov (MemOffset (Reg Rsp, Imm stack_index), Reg Rax)]
@ compile_exp (stack_index - 8) e2
@ [Mov (Reg R8, MemOffset (Reg Rsp, Imm stack_index))]
@ [Cmp (Reg R8, Reg Rax)]
@ lf_to_bool
< uses lf_to_bool, which calls setl instead of setz. setl reads the
SF and OF flags; after a comparison operation, it will set its operand to
1 if the first comparison argument was strictly less than the second.