Stack Computers: the new wave (1989) [pdf] (cmu.edu)
72 points by LAC-Tech on Nov 27, 2022 | hide | past | favorite | 28 comments


Modern processors do not execute one instruction at a time, which is one of the reasons we rename registers (to recover the data flow independent of the original register names). However, a stack machine isn't a significant hindrance to renaming as long as the ISA is otherwise amenable to super-scalar fetch, decode, and rename.

The biggest reason we do not use stack machines today is that it's actually harder for optimizing compilers to generate good code for them, and it's awkward when joining control flow from multiple blocks (the stacks have to be in the same state).

However, amusingly, almost all modern processors have a small-ish control stack; it's called the RAS, the Return Address Stack predictor, and it allows the frontend to fetch through calls and returns long before the actual instructions are executed. There's definitely some redundancy here, as everything is done again at execution time (and we check that the frontend got it right).

ADD: Knuth's MMIX and SPARC's register windows effectively implement a coarse-grained stack, but both completely miss out on the instruction density that a true stack computer has.


The basic stack engine is also pretty neat: it tracks the value of the stack pointer separately and has its own adder for it. If all you use is PUSH and POP, you almost never have to wait for the value of the stack pointer to be available in the next instruction.

It seems like it would be great for implementing a Forth if you used RSP to hold the data stack, but you'd constantly have to PUSH and POP the return addresses out of the way, somewhat defeating the RAS. I wonder which is more of a win?


The RAS assumption is that you don't overwrite the return address, which we verify at execution time. As long as you don't, your code will run fast. If you do, you'll take some very expensive pipeline restarts.

But you are right that x86 processors can (do they all?) track stack pointer movement in the frontend. For RISC you can track affine transformations of registers, or even just of the ABI-designated stack pointer, but it's usually not worth the high cost.


They still make good targets for anyone getting their feet wet in compiler design and bytecode formats.

With a good macro assembler it is super easy to translate those stack opcodes into good enough Assembly.

It won't win any performance prizes, but it will give a sense of accomplishment, and even provide a path for bootstrapping, if wanted.

Then the whole SSA generation, register colouring and whatnot can come later, if the author is really interested in diving deep into compilers.
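As a sketch of the kind of template-based translation described above, here is a Python toy that emits x86-64-flavored assembly text (the stack opcode names, register choices, and templates are all made up for illustration, not taken from any real macro assembler):

```python
# Naive stack-opcode -> assembly translation: each stack opcode expands
# to a fixed template, keeping all operands on the machine stack.
TEMPLATES = {
    "PUSH": "    push {0}",
    "ADD":  "    pop rax\n    pop rbx\n    add rax, rbx\n    push rax",
    "MUL":  "    pop rax\n    pop rbx\n    imul rax, rbx\n    push rax",
}

def translate(program):
    """Expand a list of (opcode, *args) tuples into assembly text."""
    lines = []
    for op, *args in program:
        lines.append(TEMPLATES[op].format(*args))
    return "\n".join(lines)

# 2 + 3, left on top of the machine stack
print(translate([("PUSH", 2), ("PUSH", 3), ("ADD",)]))
```

The output is far from optimal (every intermediate value round-trips through memory), but as the comment says, it's good enough to get something running and bootstrap from there.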


Related:

Stack Computers: the new wave (1989) - https://news.ycombinator.com/item?id=32456981 - Aug 2022 (48 comments)

Stack Computers: the new wave (1989) - https://news.ycombinator.com/item?id=12237539 - Aug 2016 (28 comments)

Stack Computers: the new wave (1989) - https://news.ycombinator.com/item?id=8657654 - Nov 2014 (3 comments)

Stack Computers: the new wave (1989) - https://news.ycombinator.com/item?id=4620423 - Oct 2012 (37 comments)

Edit: Also these. Others?

Stack Computers (1989) Chapter 5.3: Architecture Of The RTX 32P - https://news.ycombinator.com/item?id=25325531 - Dec 2020 (1 comment)

Stack Computers: 4.4 Architecture of the Novix NC4016 (1989) - https://news.ycombinator.com/item?id=21002670 - Sept 2019 (21 comments)


From the '70s to the '90s I worked on Burroughs / Unisys B6700 - A Series computers. They are all stack based; how they worked was very interesting, and they could do some very cool things. Languages like Algol, Cobol and Pascal ran really well. Multi-dimensional arrays took a big hit from needing array descriptors, so big Fortran wasn't the best.

Currently it runs as an emulation on Intel hardware, so as a VM it would be the best example out there.


Interesting. Only somewhat related: any pointers to articles/documents on implementing a stack-based virtual machine?


This talk covers some of the design decisions that led to the design of the uxn stack machine.

https://vimeo.com/771406693#t=85m30s

C source code of the VM, about 100 lines:

https://git.sr.ht/~rabbits/uxn/tree/main/item/src/uxn.c

An example assembly file for that stack machine:

https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examp...

Another stack machine with an elegant design, targeting FPGAs, mentioned in the talk above:

https://excamera.com/sphinx/article-j1a-swapforth.html


This might have what you need… https://wiki.xxiivv.com/site/uxn.html — an interesting design and a fun little project.



The virtual machine in nand2tetris (the book is "The Elements of Computing Systems") is stack-based.

https://www.nand2tetris.org/

https://mitpress.mit.edu/9780262539807/the-elements-of-compu...

https://www.coursera.org/learn/build-a-computer


Chapter 3 of this book is pretty useful for that, as each of the instructions has a pseudocode description. Worth noting they only pop once for the ADD instruction, not twice!
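One way to read that single-pop ADD: the top of stack lives in a dedicated register, so a binary op pops only its second operand and leaves the result in that register. A toy sketch in Python under that assumption (the opcode names are made up):

```python
# Minimal stack VM sketch: the top of stack is kept in `tos`, mimicking
# a hardware TOS register, so ADD/MUL pop only once.
PUSH, ADD, MUL, HALT = range(4)

def run(code):
    stack, tos, pc = [], 0, 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(tos)        # spill the old TOS onto the stack
            tos = code[pc]; pc += 1
        elif op == ADD:
            tos = stack.pop() + tos  # one pop: the other operand is TOS
        elif op == MUL:
            tos = stack.pop() * tos
        elif op == HALT:
            return tos

# (2 + 3) * 4
run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])  # -> 20
```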


Look at the Green Arrays F18, the "conclusion" that Forth reached:

https://www.greenarraychips.com/home/documents/greg/PB003-11...

Here's a clear description of what each instruction does:

https://colorforth.github.io/forth.html

Hilariously, the system has no logical OR, only AND, XOR and NOT, because "Inclusive-or is rarely needed."
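For what it's worth, inclusive-or is easy to synthesize from the ops it does have. A quick Python sketch of two standard constructions (the 18-bit cell width is an assumption based on the F18's word size):

```python
def ior(a, b):
    # a|b from XOR and AND: XOR covers bits set in exactly one operand,
    # AND covers bits set in both; XOR-ing the two combines them.
    return (a ^ b) ^ (a & b)

def ior_demorgan(a, b, width=18):
    # De Morgan: a|b == ~(~a & ~b); NOT is modeled as XOR with all-ones.
    mask = (1 << width) - 1
    return ((a ^ mask) & (b ^ mask)) ^ mask

ior(0b1100, 0b1010)           # -> 0b1110
ior_demorgan(0b1100, 0b1010)  # -> 0b1110
```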

This system was designed by Chuck Moore, father of Forth. Here is an entertaining video of him explaining the F18A stack machine and programming system:

https://youtu.be/0PclgBd6_Zs

This is such a simple machine. I am planning to make a tiny emulator for my site. One could probably write an emulator in 80 lines of Go (one goroutine for each of the 144 cores).


>The word */ multiplies by a ratio, with a double-length intermediate product. It eliminates the need for floating-point.

Anyone who knows a lot about numeric stuff care to comment on this statement?


It's just a fixed point instruction.

Fixed-point multiply: a*m times b*m yields (a*b)*m², so (a*b)*m = (a*m * b*m) / m.

In the above, m is the fixed point 1. For example, 65536 for a 16.16 fixed point.

The instruction allows you to multiply a*m by b*m and then divide by m, renormalizing your fixed point result.
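A sketch of that renormalizing multiply in Python (m = 65536, i.e. 16.16 fixed point, following the example above):

```python
def star_slash(a, b, m=65536):
    """Forth's */ : multiply with a double-length intermediate, then divide.

    Python ints are arbitrary precision, which stands in for the
    hardware's double-length intermediate product.
    """
    return (a * b) // m

# 1.5 * 2.25 in 16.16 fixed point
a = int(1.5 * 65536)    # 98304
b = int(2.25 * 65536)   # 147456
star_slash(a, b)        # -> 221184, i.e. 3.375 * 65536
```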

Chuck Moore thinks nobody needs floating point because fixed point is sufficient!


And, importantly, the double-length intermediate result prevents the rapid loss of precision, compared to the naive alternatives of

  : */ * 65536 / ;
and

  : */ 65536 / * ;
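To see the loss concretely, here is a Python sketch simulating single cells (the 32-bit cell width is an assumption for illustration):

```python
CELL = 1 << 32  # single-cell width, assumed 32 bits for this demo

def naive_mul_then_div(a, b, m=65536):
    # : */  * 65536 / ;  -- the product wraps in a single cell
    return ((a * b) % CELL) // m

def naive_div_then_mul(a, b, m=65536):
    # : */  65536 / * ;  -- b's fractional bits are discarded up front
    return ((b // m) * a) % CELL

a = int(1.5 * 65536)   # 98304
b = int(2.25 * 65536)  # 147456
# correct answer: 3.375 * 65536 == 221184
naive_mul_then_div(a, b)  # -> 24576  (0.375 -- the product overflowed)
naive_div_then_mul(a, b)  # -> 196608 (3.0   -- fraction lost to truncation)
```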


Aside: Koopman (the author of this book) was the expert witness who had to sit in a room and read Toyota's ̶s̶o̶u̶r̶c̶e̶ ̶c̶o̶d̶e̶ for the "Unintended Acceleration" lawsuit.

https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_...


Are stacks mainly used in software design while registers are mainly used in hardware design? What is the trade off?


Code density is the biggest one. If I want to add two numbers on a stack design, I have an add opcode; with 64 instructions, that's 6 bits of opcode, and that's all I need, because the add opcode pops the two numbers off the top of the stack and pushes the result back on. If I want to add two numbers on a register design, I need to specify which registers, and with 16 registers that's 4 bits per operand minimum: 12 bits just for the two sources and the destination. If you're writing for an embedded system, you can't get higher code density than with a stack machine.
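A back-of-the-envelope sketch of that density arithmetic in Python (the field widths are the ones from the comment above; the encodings themselves are made up for illustration):

```python
OPCODE_BITS = 6     # 64 distinct opcodes

# Stack machine: ADD carries no operand fields -- both sources and the
# destination are implicit on the stack.
stack_add_bits = OPCODE_BITS

# Register machine: ADD names dst, src1, src2 out of 16 registers.
REG_FIELD_BITS = 4  # 16 registers
reg_add_bits = OPCODE_BITS + 3 * REG_FIELD_BITS

print(stack_add_bits, reg_add_bits)  # 6 vs 18 bits per add
```

Of course, a stack machine pays elsewhere (explicit pushes, DUPs and swaps to get operands into place), which is part of why the density win shows up most clearly on small embedded code.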

The instructions can also execute faster because you don't have to do things like decode register fields and look them up in the register rename table, but modern CPUs pipeline this shit so well you never have to worry about instruction latencies.

One big advantage of register machines is that it's way easier for compilers to optimize for them. If I have a common value that I'm using constantly, on a stack-based arch I'd have to keep pushing that value onto the stack. With a register machine I just slap the value in a register and use it as an operand as many times as I need.
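A tiny sketch of that reuse difference for computing x*x + x (the instruction names are made up; the point is only the instruction counts):

```python
# Stack machine: x must be re-materialized (PUSH or DUP) for every use.
stack_version = [
    ("PUSH_X",), ("DUP",), ("MUL",),   # x*x
    ("PUSH_X",), ("ADD",),             # ... + x: push x again
]

# Register machine: r1 holds x once and is just named at each use.
register_version = [
    ("MUL", "r2", "r1", "r1"),         # r2 = x*x
    ("ADD", "r2", "r2", "r1"),         # r2 = x*x + x
]

print(len(stack_version), len(register_version))  # 5 vs 2 instructions
```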


> modern CPUs pipeline this shit

Quite. In practice, a high-performance processor today, either stack or register, will internally bear little resemblance to the presented instruction set. The stack will become a huge array of registers that are the inputs and outputs of various computation units, and they will be tracked and renamed as necessary to align with the various slots in the stack, rather than actually moving things around on the stack — just as the integer register file on a register machine becomes a huge array of registers that are renamed as necessary. Any difference between them, in terms of hardware implementation, has largely disappeared over the last couple of decades.


Are these mutually exclusive? Are there any compilers that use both stack and registers?


Almost all compilers start using the stack when they run out of registers.


It's funny, because these are the same kinds of properties that appealed to me in FP. You compose stuff without having to "encode" parameters. Less (unnecessary) information for my brain.

And to a point… point-free style in Haskell is an extreme of that, albeit at some point you start having to remember more information if you stack too many operators.


Performance. It takes longer to access performance-critical data on a stack machine than on a register machine. You spend a lot of cycles rolling the stack to retrieve the value you need.

On the other hand, the circuitry is simpler, and implementing a compiler for such an architecture is close to trivial. Early stack computers, especially from Burroughs, were the first to be mainly programmed in a high-level language; assembly was barely used, if at all. This was revolutionary at the time.


Only if you're reading directly from memory. Most stack machines have a scratchpad of 16 or so values kept on chip, much like registers. In the case of something like x87 you can even arbitrarily exchange them.


Performance as in operations per second. I believe power efficiency, operations per transistor, and many other metrics are better on stack machines.


Burroughs doesn't do assembly at all; all the CPU primitives are available as ESPOL/NEWP intrinsics.


> Are stacks mainly used in software design while registers are mainly used in hardware design?

Not necessarily. BEAM, the Erlang VM used by Erlang, Elixir and other languages, is a register-based virtual machine. https://www.erlang.org/blog/a-brief-beam-primer/



