Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

How to compile arithmetic expressions to x86?

Name: Anonymous 2015-07-10 21:16

hi prog, say I want to take as given some variables on the stack x,y,z,w and compile expressions (x+y)*(z+w) to x86 asm.

first off the instructions take registers or pointers and well x86 is weird and has instructions like this:

add x, y ;x OR y must be a register, result in x
mul x ;multiplies eax with x, result is put into eax


so this would work:

mov eax, stack0
add eax, stack1
mov ebx, stack2
add ebx, stack3
mul ebx


but how would you make an algorithm to do this in general?

Name: Anonymous 2015-07-10 23:57

This is not the only way, but this is the traditional way. Introduce temporaries to generate code like the following.

t1 = x + y
t2 = z + w
t3 = t1*t2


Each node of the expression tree becomes a variable. You can also locate common subexpressions if you wish.

Then assign the variables to registers and stack positions and generate assembly like you wrote.

Name: Cudder !cXCudderUE 2015-07-11 1:23

some variables on the stack
The question you should be asking yourself is why aren't they in the registers already?

If you don't need x again (or one of the other ones, similarly), this uses one less register:
mov eax, [ebp+_y]
add [ebp+_x], eax
mov eax, [ebp+_z]
add eax, [ebp+_w]
mul [ebp+_x]


The general algorithm depends on how many of those values will need to be used after that single expression. It produces a dataflow graph and then matches the available instructions against it.

Name: Anonymous 2015-07-11 1:47

Please marry me, Cudder-sama ;_;

Name: Anonymous 2015-07-11 1:48

>>3
>>4
fuck off and kill yourself

Name: Anonymous 2015-07-11 1:55

>>3
That doesn't help you if you need to pass more values than there are registers, which is stupidly common on your beloved x86 because it only has a handful. It is also not fun at all to debug code for architectures with limited registers because an optimal register scheduling scheme will destroy even very recent and useful intermediate values with great eagerness.

Sure, you can always dial back the optimization level in your code generator or use tracing features, but on non prematurely optimized ISAs you aren't forced into doing this immediately for the most trivial cases. With x86, you have to reach for a bazooka to swat a fly.

Name: Anonymous 2015-07-11 4:42

>>4
fuck off cockring-nigger.

Name: Anonymous 2015-07-11 13:28

>>5
fuck off and optimize your quotes

Name: Cudder !cXCudderUE 2015-07-11 13:41

>>6
If you don't need to use memory then don't. Start using memory only when it doesn't all fit into registers - and arrange things suitably so that the values that are used first/most are the ones that get chosen for registers first. That's all there is to it.

It is also not fun at all to debug code for architectures with limited registers because an optimal register scheduling scheme will destroy even very recent and useful intermediate values with great eagerness.
I have no idea WTF you're complaining about. "It's too hard to debug"? It should be. This is not BASIC. If you're working at Asm level and worrying about debugging then YOU ARE DOING IT WRONG.

Name: Anonymous 2015-07-11 14:15

How to compile arithmetic expressions to x86?
You don't. First you wait for garbage collector to run and then, if you're lucky, the interpreter parses the expression and evaluates to some result.

Name: Anonymous 2015-07-11 14:58

>>9
kill yourself namefag, no one wants you here

>>10
lmao nice prog memes I can tell youre a true old/prog/

Name: Anonymous 2015-07-11 20:12

https://en.wikipedia.org/wiki/Three-address_code

I can easily compile to 3 address code, it's going from that to x86 that is hard

Name: Anonymous 2015-07-12 1:23

>>12
Then compile to two address code.

Name: Anonymous 2015-07-12 21:40

>>9
If you're working at Asm level and worrying about debugging then YOU ARE DOING IT WRONG.

This problem doesn't just apply to people who are writing assembly by hand. Any language implementation that compiles to native code and implements symbolic debugging by mapping the generated code back to high level constructs will have the same problem. It is usually hopeless to try and debug optimized x86 code generated by a compiler symbolically because the compiler will almost never generate code that maps back to the source in a straightforward fashion. Lack of registers is a major cause for this.

Name: Cudder !cXCudderUE 2015-07-13 6:38

>>14
"I'm too stupid to read Asm."

Go back to mental masturbation with your ultra-high-level languages, quiche-eater.

Name: Anonymous 2015-07-13 10:40

>>15
Who are you quoting.

Name: Anonymous 2015-07-13 11:44

>>16
Who are you quoting.
Whom are you quoting?

Name: Anonymous 2015-07-13 11:59

>>15 kill
>>16 your
>>17 self

Name: Anonymous 2015-07-13 14:36

Name: Anonymous 2015-07-14 18:29

>>15
You're a stupid codemonkey. You're an autist who would rather build spaceships out of legos than think about real spaceships because you can put yours together piece by piece. But in the end, all you'll have is a toy, while actual humans will have a ship that, while sometimes explosive, actually exist and can do everything your Assembly trinkets can't.

Name: Anonymous 2015-07-14 19:43

Who else loves quiche?

Name: Anonymous 2015-07-14 19:55

No real man, that's who

Name: Anonymous 2015-07-14 21:10

>>20
>>21
>>22
you retards can shut up until i get helped

Name: Anonymous 2015-07-14 21:18

>>23
You want some quiche too?

Name: Anonymous 2015-07-14 21:26

>>15
I wouldn't call C an ultra high level language. Not when targeting x86, anyway. I'm only familar with a handful of the most common optimizations, but the x86 output of most C compilers is rarely surprising to me. The main thing that makes things hard to follow is the incessant bouncing of values to the stack frame and back, which is a direct consequence of the shortage of registers.

Name: Anonymous 2015-07-14 23:52

>>24
Will it optimize his quotes?

Name: Anonymous 2015-07-15 1:15

OPTIMIZE THE QUICHE

Name: Anonymous 2015-07-15 4:08

>>25
I am Haskal programmer.

Name: expert !Up3JQchEFY 2015-07-15 5:59

>>12
It is my opinion that Three Address Code would be drastically overcomplicated in this situation.
This is because simple arithmetic expressions require no concept of looping or jumping, which would serve to only complicate this endeavor.
Surely a better approach would be utilizing Dijkstra's Shunting Yard algorithm1 to translate a given infix expressoan to a Postfix2 expression.
The advantages of this are obvious, of course, in that you could allocate registers3 and generate x86 Assembly4 efficiently in one pass over the postfix expression.

As an example, let us consider the infix expression, "(3 * 4) + 2".
The tokenization of this expression is displayed in figure A.
The translation of this expression to a linear stream of assembly instructions is not immediately clear in infix form, but if we are to translate it to a postfix expression, as in figure B, the (slightly naive) translation is simple and straightforward, as displayed in figure C.

I believe it is now adaquately obvious that translation to postfix is highly preferable to three address code. There is one alternative worth mentioning, however, which is building a tree5 from the infix expression, and generating assembly by recursively iterating through it. This is approximately equivalent in difficulty, but can make traversal and register allocation slightly more complex.

(
3
*
4
)
+
2

Figure A

3
4
*
2
+

Figure B

mov $3, %eax
mov $4, %ebx
mul %ebx
mov $2, %ebx
add %ebx, %eax
; The end result of the expression is now stored in %eax.

Figure C

References:
1: https://en.wikipedia.org/wiki/Shunting_yard_algorithm
2: https://en.wikipedia.org/wiki/Reverse_Polish_notation
3: https://en.wikipedia.org/wiki/Register_allocation
4: https://en.wikipedia.org/wiki/X86_instruction_listings
5: https://en.wikipedia.org/wiki/Parse_tree

Name: Anonymous 2015-07-15 7:48

>>29
You are actually posting something that makes me think! It's been so long since this has happened!

In that example the size of the stack never exceeded two, so you could do the whole thing directly in registers by just assigning registers to stack positions. If the stack grows larger it's a little harder, but I'm sure there are many ways of adapting it. The stack would be the usual stack and the registers would act like a sliding circular buffer cache for the top elements of the stack.

Name: Anonymous 2015-07-15 17:13

>>29
Welcome to the 1970s, here's your peephole optimizer and an absence of common subexpression elimination.

Name: Anonymous 2015-07-15 17:56

>>31
You can do subexpression elimination by duping a value at a specified depth in the stack.

Name: Anonymous 2015-07-15 19:25

>>32
Yes, that's the easy part, i.e. what happens once the values have been identified as duplicates.

Name: Anonymous 2015-07-15 20:45

>>33
And identifying duplicates isn't too bad either. Just search for maximal identical subtrees of the expression tree.

Name: Anonymous 2015-07-15 20:49

the problem is to produce asm, not search for common subexpressions

Name: Anonymous 2015-07-15 22:14

>>31
If it was the 1970s, he would have to give that compiler away for free if he wanted anyone to use it.

Name: Anonymous 2015-07-15 23:48

>>35
Search for my anus.

>>36
Use my anus.

Name: Anonymous 2015-07-16 3:00

>>36
UNIX QUALITY!

Name: Cudder !cXCudderUE 2015-07-18 15:41

>>29
RPN just gets you the order of operations. That's trivial stuff. Mapping values to registers and selecting instructions is the interesting part. Relevant links:

https://en.wikipedia.org/wiki/Strahler_number
https://en.wikipedia.org/wiki/Sethi%E2%80%93Ullman_algorithm

And use the standard syntax, not that backwards GNU shit.

>>30
Once you need more storage than available registers then a dataflow-directed method should be far better than "stack emulation", although testing this will be left as an exercise for the reader.

Name: Anonymous 2015-07-18 15:54

Ullman claims in his personal page at Stanford to be against the Iranian government,[5] but it's also alleged that he has demonstrated anti-Iranian sentiments. In one case, he responded to an email from an Iranian student who had inquired about admission at Stanford with an off-topic political rant and went on to say that he would not help Iranian students even if he could:

And even if I were in a position to help, I will not help Iranian students until Iran recognizes and respects Israel as the land of the Jewish people. I know that you may not hold the same insane position as the mullahs that run your country, but it is a matter of principle. If Iranians want the benefits of Stanford and other institutions in the US, they have to respect the values we hold in the US, including freedom of religion and respect for human rights.

Following that the National Iranian American Council issued a formal complaint to Stanford University,[6] to which Stanford spokesperson, Lisa Lapin responded that Ullman was expressing his own personal views and not the views of the University and that "he has no involvement in admission, and Stanford doesn’t discriminate in their admission process"[7][8][9][10]

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List