/prog/ - VM instruction implementation without huge switch()?

Name: Anonymous 2014-06-09 23:26

How would one go about making a virtual machine without using a huge and clumsy switch( opcode ) statement?

The architecture I'm implementing (DCPU-16) has about 35-40 different instructions.

Name: Anonymous 2014-06-09 23:41

Make a data structure with an opcode's code and a function pointer to the opcode's operation.
Sort an array of those structures by opcode.
Every time you get an opcode at runtime, do a binary search on the array and, if the opcode is found, call the function pointer with the opcode's arguments (if any). If the opcode is not found, you know what to do.

Name: Anonymous 2014-06-09 23:59

switch runs in constant time, >>2-san's binary search runs in logarithmic time.

Don't listen to >>2-san and go for a switch.

Name: Anonymous 2014-06-10 0:13

Store an array for function pointers indexed by the opcode which calls them.

#define EAT 0x00
#define FUCK 0x01
#define DIE 0x02
#define INSTRUCTIONS 0x03
void eat(){}
void fuck(){}
void die(){}
int main(int a, char ** b){
     void (*i[3])();
     i[EAT]=eat; i[FUCK]=fuck; i[DIE]=die;
     while(peakNextInstruction() < INSTRUCTIONS){
          i[popNextInstruction()]();
     }
return -1;
}

Name: Anonymous 2014-06-10 0:35

>>1

Yes. You can translate opcode into x86 opcode, write it into array and execute directly. Be careful with self modifying code.

Name: Anonymous 2014-06-10 1:48

>>4

eat fuck die

And having the mind of a 13 year old is also a sign of apping ability.

Name: Anonymous 2014-06-10 1:50

>>3
switch is not constant. Its upper bound is N. If the opcode being executed is the last opcode in your switch, you need to make N, or in this case 40, comparisons, whereas lg(40) = 5.

A switch would be better if you know that certain opcodes will be executed from frequently than others, in which case you can order your switch by the frequency of the opcode.

Name: Anonymous 2014-06-10 1:53

>>7
http://en.wikipedia.org/wiki/Branch_table
switch is O(1) if your compiler isn't retarded.

Name: Anonymous 2014-06-10 1:56

>>8

You should really inspect resulting assembly, before making any dicision.

Name: Anonymous 2014-06-10 1:59

>>9
You should inspect my anus before making any incision.

Name: Anonymous 2014-06-10 3:38

>>8-10
it's funny because while switch can be very easily implemented O(1) or very close to it (ex O(log n)) MS Visual C++^tm Enterprise edition uses O(n)

>>1

How would one go about making a virtual machine without using a huge and clumsy switch( opcode ) statement?

you can use a array with function pointers but I remember some other ways too
another way is to have a huge and clumsy switch but let the preprocessor handle it (just like I did)
I had made a script in bash and with the help of the preprocessor I had made it so it will map every opcode in a switch statement to a function or some lines of code while I do not give it a list of the functions (the were in another file and the list was generated by the script in the stile X(functionName, something, somethingElse) and was used by many parts of my program
sorry if I am not understandable, I am sleepy

Name: Anonymous 2014-06-10 4:55

Can you guys explain how switch can be implemented in O(1)? Not only I stand in disbelief of this fact, I also don't understand the wikipedia page.

Name: Anonymous 2014-06-10 5:09

>>11
Sure, you save a lot of space on transient switches that way.

>>12
The input to the switch effectively becomes an index into a lookup table of branch targets. Array lookup is O(1), as is the branch.

Name: Anonymous 2014-06-10 6:46

>>9
gcc -O2 with sequential case values and more than 3 cases will generate a jump table. If you care so very much you could also use an explicitly computed goto (assuming your compiler supports it).

Using a call table (i.e., with function pointers) is not a good idea for simple opcodes since most compilers are not smart enough to inline any of the called functions once their address is taken (even if scope of the functions and the resulting address is non-global).

tl;dr get off your high horse and use a switch/case. If it's good enough for bpf-jit it's good enough for you.

Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 7:17

WHO TAUGHT U RETOIDS HOW 2 DESIGN PROGRAMS?

#include <stdio.h>
#include <stdlib.h>

enum {
	NOPERATIONS = 40
};

enum {
	SUCCESS, CONTINUE, ARITY, INVALID_OPCODE
};

struct cpu {
	unsigned char *mem;
	int ip;
	int size;
	/* ... */
};

int nop(struct cpu *cp, const unsigned char *p, int n)
{
	return 1;
}

int set(struct cpu *cp, const unsigned char *p, int n)
{
	unsigned index;

	if (n < 5)
		return -ARITY;
	index = p[1] << 8 | p[2];
	cp->mem[index] = p[3];
	cp->mem[index + 1] = p[4];
	return 5;
}

int (*operations[NOPERATIONS]) (struct cpu *, const unsigned char *, int) = {
	nop, set /* , ... */
};

int cpu_tick(struct cpu *cp)
{
	int n, op;

	if (cp->ip == cp->size)
		return SUCCESS;

	op = cp->mem[cp->ip];
	if (op >= NOPERATIONS || operations[op] == NULL)
		return INVALID_OPCODE;

	if ((n = operations[op](cp, &cp->mem[cp->ip], cp->size - cp->ip)) < 0)
		return -n;

	cp->ip += n;
	return CONTINUE;
}

int main(void)
{
	unsigned char b[] = { 0, 0, 1, 0, 0, 69, 96, 0 };
	struct cpu cpu;
	int e;

	cpu.mem = b;
	cpu.ip = 0;
	cpu.size = (int) (sizeof b / sizeof b[0]);

	while ((e = cpu_tick(&cpu)) == CONTINUE)
		;

	switch (e) {
	case INVALID_OPCODE:
		fprintf(stderr, "invalid opcode: %d\n", cpu.mem[cpu.ip]);
		return EXIT_FAILURE;
	case ARITY:
		fprintf(stderr, "invalid arity to op %d\n", cpu.mem[cpu.ip]);
		return EXIT_FAILURE;
	case SUCCESS:
		printf("%d %d\n", b[0], b[1]);
		return 0;
	default:
		fprintf(stderr, "TELL DA RETOID WHO WROTE DIS EMULATOR 2 REED"
		                " DA FUKIN STANDARD.\n");
		return EXIT_FAILURE;
	}
}

Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 7:27

O BTW, I WUD HAV COMMENTED IT, BUT I WAS AFRAID DAT IF ID HAVE DONE DAT, U RETOIDS WUD START BITCHING ABOUT HOW COMMENTS DELAY DA COMPILATION TIME BY A FEW FUKIN NANOSECONDS.

ROFL

Name: Anonymous 2014-06-10 9:18

>>15

struct cpu {
[…]
int ip;

That was NSA QUALITY!

Name: Anonymous 2014-06-10 9:35

>>6
Kill yourself. I'm a much better programmer than you, and you know it.

Name: Anonymous 2014-06-10 9:41

>>15

int size;

I love size_t more than you stackboi

Name: Anonymous 2014-06-10 9:44

seize my anus

Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 9:46

>>17
DATZ MY INSTRUCTION POINTER

Name: Anonymous 2014-06-10 10:31

enum {

NOPERATIONS = 40
};

What the fuck is the point of that enum?

Name: Anonymous 2014-06-10 11:22

>>22
It's a Jewish thing. You're not meant to understand it.

Name: Anonymous 2014-06-10 14:23

// Only the chosen are expected to understand this.

Name: Anonymous 2014-06-10 16:46

>>22
Number of operators I think

Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 16:59

>>22
DER ISNT ONE. HOW DO U LIKE DAT?

Name: Anonymous 2014-06-10 19:38

You're writing in assembly, you don't have to be limited to the constraints of C.

http://www.complang.tuwien.ac.at/forth/threaded-code.html

Name: Anonymous 2014-06-10 19:57

>>27
Oops, I misread the OP. Thought they were writing a virtual machine in DCPU-16 assembly, but they were actually talking about DCPU-16 emulator for PCs.

Yeah, a switch statement is the best you can do in portable C. You can get something close to token-threaded code if you use the gcc labels-as-values extension, but then it won't be portable to all C compilers, and the code generated by even the best C compiler is going to be shit compared to a naive assembly implementation.

Name: Anonymous 2014-06-11 4:15

>>27-28
Your link is cool so I forgive you

Name: Anonymous 2014-06-11 4:26

>>19
In the beginning there were only int and char,

>>25
I find that sort of usage annoying since it will trigger "enum constant not covered" warnings in switch statements. One of many annoyances with enums in C... (I'll confess I (ab)use them frequently, solely because it makes it easy to traverse definitions with ctags.)

I prefer this ENTERPRISE QUALITY ENUM HANDING:

#include <errno.h>

typedef enum {  /* yes typedef, I hate typing "enum" */
        FOO,
        BAR,
        BAZ,
        QUX,
} meta meta_t;  /* and the POSIX committee can bite me */

static void do_foo();
static void do_bar();
static void do_baz();
static void do_qux();

/* ... */

int dispatch_meta(meta_t val)
{
        switch (val) {
        case FOO:
                return do_foo();
        case BAR:
                return do_bar();
        case BAZ:
                return do_baz();
        case QUX:
                return do_qux();
        }

        return -EINVAL;
}

Name: L. A. Calculus !jYCj6s4P.g 2014-06-11 5:19

>>30
YAINT RED DA FUKIN STANDARD.

return do_foo();

do_foo DOESNT RETURN ANYTHING. I DNUNO WAT PLANET UR FROM BUT U KANT CONVERT NOTHING TO A FUKIN int!

typedef enum { ... } meta meta_t;

EEEEH EEEEEEEH EEEEEEEEH EEEEEEEEEH

U MEAN typedef enum { ... } meta_t;

I DUNNO WHO TAUGHT U HOW 2 PROGRAM BUT NEXT TIME U SHUD AT LEAST RUN UR PROGRAM THROUGH DA FUKIN COMPILER

N DATZ NOT ME TALKIN DATZ BARNEY FUKIN STROOTSTROOP! (DA DANISH ONE, JUNIOR)

Name: Anonymous 2014-06-11 5:32

>>30

In the beginning there were only int and char,

epic xD

return -EINVAL

epic xD

Name: Anonymous 2014-06-11 5:34

>>31
So goes freehanding before coffee. Please accept this patch:

--- a/foo.c	2014-06-11 00:28:16.679040075 -0700
+++ b/foo.c	2014-06-11 00:28:38.001086477 -0700
@@ -1,16 +1,16 @@
 #include <errno.h>
 
-typedef enum {	/* yes typedef, I hate typing enum */
+typedef enum meta {	/* yes typedef, I hate typing enum */
 	FOO,
 	BAR,
 	BAZ,
 	QUX,
-} meta meta_t;	/* and the POSIX commitee can bite me */
+} meta_t;		/* and the POSIX commitee can bite me */
 
-static void do_foo();
-static void do_bar();
-static void do_baz();
-static void do_qux();
+static int do_foo();
+static int do_bar();
+static int do_baz();
+static int do_qux();
 
 int dispatch_meta(meta_t val)
 {

Name: Anonymous 2014-06-11 5:35

>>31
good post. keep it up, nigger.

Name: Anonymous 2014-06-11 5:39

>>1
You can do function pointers or threaded code

Threaded code basically jumps to the code that implements the next instruction directly without a dispatch.

prog = { op_add, op_add, op_sub, ... };
op_add:
reg_a = reg_b + reg_a;
goto *ins_stream++;

op_sub:
reg_a = reg_a - reg_b;
goto *ins_stream++;

etc...

requires gcc's labels as values extensions to implement in C. You can use function pointers for the same idea, but it might be slower.

Name: Anonymous 2014-06-11 5:43

>>35
indirectly*

Name: Anonymous 2014-06-11 5:56

>>6
Your just mad because youll never get 2 fuk a 2hu; btw wich 2hu wud u fuk?

Name: Anonymous 2014-06-11 7:55

>>29
One of the articles linked at the end is the one that led me to FORTH SATORI. Make sure to check it out too:
http://www.bradrodriguez.com/papers/moving1.htm

Name: Anonymous 2014-06-11 12:26

>>37

epic meme, /g/ro, shalom.

Name: Anonymous 2016-10-24 15:24

>>1
You can replace the switch block with if...else if...else if...else, but that's not really a solution. Another option is to implement the instructions as functions and use an array of function pointers (i.e. opcode 5 is the function stored in instructions[5]).

Essentially though, the construct you want is the old ON OPCODE GOTO 100, 200, 300, 400, 500, and the switch statement is the closest thing C has to that.

Name: Anonymous 2016-10-24 15:56

First we familiarize ourselves with https://en.wikipedia.org/wiki/Visitor_pattern and refactor the bindings into container classes which would be dependently injected in polymorphic virtual functions using the https://en.wikipedia.org/wiki/Command_pattern encapsulated inside safe AbstractFactory methods.

Name: Anonymous 2016-10-24 17:32

You write the VM in something other than C.

Name: mentifex 2016-10-24 18:29

>>1

Why not make an array with function pointers? Enumerate each instruction and then lookup the function pointer and pass it the auxiliary data.

Name: mentifex 2016-10-24 18:37

Or better yet, write it in vhdl, compile it with ghdl and you have your processor ready for fpga and a nice event based simulation in C.

Name: Anonymous 2016-10-25 1:43

>>41

You're insane. Or joking. One of the two.

>>1

As far as maintainability and whatnot go, branch tables using arrays of function pointers are the way to go. But the compiler can't optimize as well.

Better yet, write a perl script that takes an input file mapping opcodes to functions and generates the switch table for you.

This is a great idea, and totally can't go wrong in literally 100s of ways.

Name: Anonymous 2016-10-25 2:02

>>45
How exactly are arrays of function pointers more maintainable than a switch statement?

Name: Anonymous 2016-10-25 2:14

>>46 Its easy, you use the techniques described in >>41

Name: Anonymous 2016-10-25 13:20

>>47

Not at all. It's just that you're using less verbose code, and it reads cleaner, IMHO, which counts for a lot.

Name: Anonymous 2016-10-25 20:20

write it in assembly code instead of c.

then you can compute your jumps with arithemtic!

Name: Anonymous 2016-10-26 1:05

So when I write a big switch with like 100 or so branches, like this:

switch(op) {
case 0: /*...*/ break;
case 1: /*...*/ break;
case 2: /*...*/ break;
case 3: /*...*/ break;
...
case 100: /*...*/ break;
}

with the conditions always incrementing and being in order, won't a modern compiler optimize it for me?

Name: Anonymous 2016-10-26 1:19

>>50

won't a modern compiler optimize it for me?

Microsoft C converts these to jump table and HexRays then has problems decompiling it properly, thinking that jumps are funcalls.

Name: Anonymous 2016-10-26 1:21

>>50

Also, for smaller tables, it can be much more efficient to use if/else, because of CPU branch prediction.

Name: Anonymous 2016-10-26 1:30

http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables

Name: Anonymous 2016-10-26 2:51

>>52
Why wouldn't branch prediction apply to switch statements? They're still just conditional jumps in machine code.

Name: Anonymous 2016-10-26 3:14

>>54
It is harder to predict table[opcode]() than to predict if (opcode==1) {...} else {...}. Unconditional jump can be predicted only when it usually resolves to some single function.

Name: Anonymous 2016-10-26 4:09

>>54
he's an idiot/liar. nothing he says is correct. He's just guessing to try to sound smart at the smart peoples table (/prog/).

This shit never used to happen on old /prog/, we just knew that everybody only spoke when they had something of value to contribute and nobody would recommend anything if they had not actually done it themselves and performed the measurements.

Todays nu-/prog/ is full of the same cancer that filled up IRC channels on freenode. Casuals that make 'witty' little remarks with no content to look smart in front of everybody else. The real wizards are long gone. What a tragedy.

Name: Anonymous 2016-10-26 4:34

>>56
enjoy being trolled like a dolt.

Name: Anonymous 2016-10-26 8:11

>>53
Yeah, I've already seen that. I just prefer to write C89, not GNU C.
But the article does indeed explain my question: apparently switch has to do some additional bounds checking (C89 3.6.4.2).

Name: Cudder !cXCudderUE 2016-10-26 10:59

>>45
That is what those infected with the disease of ENTERPRISE will seriously say.

Just use a switch. 35-40 cases is not at all large for a switch. Many 8-bit emulators will have 256, and the VNVM I'm analysing at the moment has around 270.

Name: Anonymous 2016-10-26 17:42

>>4
this, or a hash table that references function pointers

Name: Anonymous 2016-10-27 1:28

>>59

40 cases is not at all large for a switch

Name: xD !sDNR5LgF8Y 2016-10-27 5:20

>>60,61

le pedophile sage

Name: Cudder !cXCudderUE 2016-10-29 3:43

>>61
Considering the C89 standard requires up to 257 (default + one for each byte, think of state machines), and C99 increased that to 1023 (no idea of the rationale for that one), 40 is certainly not large at all. C++ increased that even more to 16K(!)

Name: Anonymous 2016-10-29 8:06

COMMODORE 64 GET

Name: Anonymous 2016-10-29 15:28

>>63

C99 increased that to 1023

Probably related to the compiler having to store a jump table with that much entries, not byte size.

C++ increased that [...] to 16K

That's just wasteful.
Remember that a C89 compiler can be implemented for a machine with like 1MiB of RAM (probably the biggest culprit: #defines: up to 1024*(499+31) bytes).

Polite sage for offtopic.

Name: Anonymous 2016-10-29 17:27

Dubs check dubs

Name: Anonymous 2016-10-29 17:43

>>66
Nice dubz, bro!

Name: Anonymous 2016-10-29 20:08

>>65

Remember that a C89 compiler can be implemented for a machine with like 1MiB of RAM

A PL/I compiler can run in 32 KB of RAM and the compiled programs can run on even smaller machines. I'd rather use PL/I than any C-based or dynamically typed language any day.

Name: Anonymous 2016-10-30 1:20

Check em

Name: Anonymous 2016-10-30 6:12

Why dcpu 16? Notch is never making that game

VM instruction implementation without huge switch()?

1 Name: Anonymous 2014-06-09 23:26

2 Name: Anonymous 2014-06-09 23:41

3 Name: Anonymous 2014-06-09 23:59

4 Name: Anonymous 2014-06-10 0:13

5 Name: Anonymous 2014-06-10 0:35

6 Name: Anonymous 2014-06-10 1:48

7 Name: Anonymous 2014-06-10 1:50

8 Name: Anonymous 2014-06-10 1:53

9 Name: Anonymous 2014-06-10 1:56

10 Name: Anonymous 2014-06-10 1:59

11 Name: Anonymous 2014-06-10 3:38

12 Name: Anonymous 2014-06-10 4:55

13 Name: Anonymous 2014-06-10 5:09

14 Name: Anonymous 2014-06-10 6:46

15 Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 7:17

16 Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 7:27

17 Name: Anonymous 2014-06-10 9:18

18 Name: Anonymous 2014-06-10 9:35

19 Name: Anonymous 2014-06-10 9:41

20 Name: Anonymous 2014-06-10 9:44

21 Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 9:46

22 Name: Anonymous 2014-06-10 10:31

23 Name: Anonymous 2014-06-10 11:22

24 Name: Anonymous 2014-06-10 14:23

25 Name: Anonymous 2014-06-10 16:46

26 Name: L. A. Calculus !jYCj6s4P.g 2014-06-10 16:59

27 Name: Anonymous 2014-06-10 19:38

28 Name: Anonymous 2014-06-10 19:57

29 Name: Anonymous 2014-06-11 4:15

30 Name: Anonymous 2014-06-11 4:26

31 Name: L. A. Calculus !jYCj6s4P.g 2014-06-11 5:19

32 Name: Anonymous 2014-06-11 5:32

33 Name: Anonymous 2014-06-11 5:34

34 Name: Anonymous 2014-06-11 5:35

35 Name: Anonymous 2014-06-11 5:39

36 Name: Anonymous 2014-06-11 5:43

37 Name: Anonymous 2014-06-11 5:56

38 Name: Anonymous 2014-06-11 7:55

39 Name: Anonymous 2014-06-11 12:26

40 Name: Anonymous 2016-10-24 15:24

41 Name: Anonymous 2016-10-24 15:56

42 Name: Anonymous 2016-10-24 17:32

43 Name: mentifex 2016-10-24 18:29

44 Name: mentifex 2016-10-24 18:37

45 Name: Anonymous 2016-10-25 1:43

46 Name: Anonymous 2016-10-25 2:02

47 Name: Anonymous 2016-10-25 2:14

48 Name: Anonymous 2016-10-25 13:20

49 Name: Anonymous 2016-10-25 20:20

50 Name: Anonymous 2016-10-26 1:05

51 Name: Anonymous 2016-10-26 1:19

52 Name: Anonymous 2016-10-26 1:21

53 Name: Anonymous 2016-10-26 1:30

54 Name: Anonymous 2016-10-26 2:51

55 Name: Anonymous 2016-10-26 3:14

56 Name: Anonymous 2016-10-26 4:09

57 Name: Anonymous 2016-10-26 4:34

58 Name: Anonymous 2016-10-26 8:11

59 Name: Cudder !cXCudderUE 2016-10-26 10:59

60 Name: Anonymous 2016-10-26 17:42

61 Name: Anonymous 2016-10-27 1:28

62 Name: xD !sDNR5LgF8Y 2016-10-27 5:20

63 Name: Cudder !cXCudderUE 2016-10-29 3:43

64 Name: Anonymous 2016-10-29 8:06

65 Name: Anonymous 2016-10-29 15:28

66 Name: Anonymous 2016-10-29 17:27

67 Name: Anonymous 2016-10-29 17:43

68 Name: Anonymous 2016-10-29 20:08

69 Name: Anonymous 2016-10-30 1:20

70 Name: Anonymous 2016-10-30 6:12