milliForth (github.com/fuzzballcat)
285 points by binarycrusader on Nov 6, 2023 | hide | past | favorite | 83 comments


> However, milliFORTH appears to be the smallest programming language implementation ever, beating out sectorLISP2, a mind-blowing 436 byte implementation of LISP, by 14 bytes.

The sectorlambda implementation of Binary Lambda Calculus is shorter yet at 383 bytes [1].

And the BLC self-interpreter is only 29 bytes [2].

> A FORTH in 422 bytes — the smallest real programming language ever, as of yet.

That may still be true, as BLC is an esoteric programming language.

[1] https://justine.lol/lambda/

[2] https://ioccc.org/2012/tromp/hint.html


Author here! I was trying to be very careful about making sure to phrase things in a way that was clear I was referring to "real" languages (ones that are used for actual production purposes), but inevitably I forgot to do so where you've highlighted. I've updated the wording so it's more clear.


And for those that aren't aware: https://en.m.wikipedia.org/wiki/Open_Firmware was a Forth-based bootloader for PowerPC.

This is a very cool project, author, love it!


The main advantage of a miniature Forth like this over BLC, Lisp, Brainfuck, etc, is that Forth grants low-level access to the hardware; you can easily bootstrap from it to something indistinguishable from a feature-rich Forth with a full suite of metaprogramming capabilities, or implement an entire operating system on top of it.


Miniature Lisps are similar to Forth in this way. Having worked with both, I'm inclined to favour Lisps for providing cleaner structure and abstractions for code and data to build up from, with low overhead despite the superficial differences. (Compare sizes: SectorLISP2 at 436 bytes vs. sectorforth at 491 bytes: https://justine.lol/sectorlisp2/)

Lisps are given low-level hardware access primitives (peek/poke etc) when they are designed for bootstrapping an OS, and low-level OS primitives (such as system calls) when they are designed for bootstrapping a rich environment on a different OS. Basically the same primitives as Forth for the same purposes.
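For a sense of what those primitives amount to, here's a minimal sketch in Python, with a `bytearray` standing in for raw memory (a real bootstrap Lisp or Forth would read and write physical addresses or make system calls instead; the names are just the conventional ones):

```python
# Hypothetical sketch: peek/poke modeled over a bytearray that stands
# in for a raw 64 KiB address space.
MEMORY = bytearray(64 * 1024)

def poke(addr, byte):
    """Write one byte to 'memory' at addr."""
    MEMORY[addr] = byte & 0xFF

def peek(addr):
    """Read one byte of 'memory' at addr."""
    return MEMORY[addr]
```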


Is low-level hardware access part of the language definition [1] ?

[1] https://forth-standard.org/standard/words ?


Nothing is stopping Lisp from having low-level access to hardware.


Hmm, I'd be a little surprised if this was smaller than some of the 8-bit era forths (the TIL book had a 50ish-byte "inner interpreter" for Z80 but that didn't include any of the baseline "words" that this does.) I'm sure it wins for 32-bit systems though.
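For anyone unfamiliar with the term, the "inner interpreter" is just the NEXT loop that walks threaded code and dispatches each word. A rough conceptual model in Python (obviously nothing like the actual Z80 code):

```python
# Toy model of a threaded-code inner interpreter: each cell is a
# callable standing in for a primitive's code-field address, and NEXT
# is just "advance ip, dispatch".
def inner_interpreter(threaded_code):
    stack, ip = [], 0
    while ip < len(threaded_code):
        primitive = threaded_code[ip]
        ip += 1
        primitive(stack)
    return stack

lit5 = lambda s: s.append(5)               # pretend literal-pushing words
lit7 = lambda s: s.append(7)
plus = lambda s: s.append(s.pop() + s.pop())
```

Running `inner_interpreter([lit5, lit7, plus])` leaves `[12]` on the stack.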


TIL Book https://archive.org/details/R.G.LoeligerThreadedInterpretive...

I bought a couple of hard copies of this before it got popular. Most folks will only ever encounter the above PDF. It is an absolutely wonderful book, written in such a matter-of-fact style; the only thing comparable are the works of Nils Holm.

https://t3x.org/


Funnily enough, milliFORTH is now smaller than the Binary Lambda Calculus 383 byte implementation.


[flagged]


um, sir... please show us the implementation of eval.


it's evals all the way down.


Where do I find the implementation of int 0x16 and int 0x10?


Easy, just make a CPU where "eval" is an instruction


Well there are already forth chips, so forth still wins.


When I was in high school I wrote a nice FORTH for the TRS-80 Color Computer using the OS-9 operating system which was a Unix-like multitasking OS that would fit on a 6809 microcomputer.

I think it was around 2000 lines of assembly code to implement most of the FORTH-83 standard although mine was unusual in that it did not support the block-based I/O that was common on “language system” FORTHs but instead it had handle-based API for accessing files similar to Unix, C and MS-DOS in version 2 and up.

The programming environment was a lot like Linux overall in that I’d use an ed or vi clone to edit files, then run something like an assembler or C compiler. I’d run my FORTH binary and it would present an interactive environment like most FORTHs.


I love projects like these, reminds me of the magic within the machine, as opposed to the normal cacophony of the world that comes via the machine.


I do like this reasoning. Magic within the machine for me was programming forth as part of my post grad (controlling and processing proton precession magnetometers) and the device was a Triangle Digital Services (now defunct as a viable company I think) TDS2020 forth SBC.

Almost a zen like experience - you, the machine and your focus. No world.


You could save one byte by replacing "0=" with "0<>" (well, except for the name, maybe use "0#"?):

    pop ax
    neg ax     ;sets carry flag if nonzero
    sbb ax,ax
    push ax
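For anyone who wants to convince themselves the NEG/SBB trick works, here's the same logic modeled in Python over 16-bit values (NEG sets the carry flag for any nonzero operand, and SBB ax,ax then yields 0 or -1):

```python
def zero_ne(ax):
    """Model of the two-instruction 0<>: returns 0xFFFF (Forth true)
    for any nonzero 16-bit input, 0 otherwise."""
    carry = 1 if ax != 0 else 0      # NEG sets CF iff the operand != 0
    ax = (-ax) & 0xFFFF              # NEG AX (two's-complement negate)
    ax = (ax - ax - carry) & 0xFFFF  # SBB AX,AX -> 0 - CF
    return ax
```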
And another two bytes in DOCOL:

    ;add ax,4
    ;mov si,ax
    xchg ax,si
    lodsw
    lodsw
Maybe even free up a word register to always hold the address of DOCOL, then you only need two bytes at each colon definition. If it's possible without adding any extra instructions, this should save 4 bytes in total (one stosw in COLON, the DOCOL.addr variable, one lodsw in the above version of DOCOL)


Good thinking! I wouldn't have thought of that `xchg` one in DOCOL, very clever.

I'll futz around with making DOCOL's address as a register and see if I can make that work, though I don't know if that will work since I think some of the mechanisms in place have additional functions.

EDIT: So, I don't think this will work. I'd be using `dx` to store DOCOL's address, but this appears to get clobbered somewhere, fixing which would outweigh the 1 byte benefit you end up actually getting. I'll keep thinking about it though!


Porting JonesFORTH to x86-64 was one of the most rewarding fun-project experiences lately. Forth is fun to play with.


Looking at the code, this looks remarkably similar to sectorforth? Thing is, with sectorforth and this, 2 of the primitives are not required, reducing the VM to 6 ops, so you can probably go smaller. Looks as though Cesar was happy with fitting into a sector: https://github.com/cesarblum/sectorforth/issues


[Author here.] Indeed, sectorFORTH (as I have mentioned) was the primary inspiration and guide for this project! Ironically, in some places I even started with a rather different design which converged on something more sectorFORTH-like - it's really well put together.

I've removed some primitives further from milliFORTH, but I haven't touched the arithmetic base. I hadn't seen those issues on sectorFORTH though, very interesting - I will look into it!


Total tangent - I've often been annoyed at some library or whatever because I didn't like the way it was implemented or the api it exposed or whatever. So I try to reimplement it myself to my tastes, and eventually once I understand the problem and iterate a few times I end up with something remarkably similar to the library I initially didn't like.

It's a good reason to at least try to implement your own sometimes, rather than just use libraries - even if you end up just using the supported external dep, you'll understand it a lot better.

(not saying that's the case here, just that you reminded me of that with your convergence...)


Ah yes apologies, that'll teach me to look at the code before the readme!


When SectorLISP claimed to be the tiniest programming language in the world, it included (1) a Turing machine program, (2) an implementation of SectorLISP written in SectorLISP, and (3) a proof that it could run software that had been written for the original LISP 1.5 system built by LISP's inventor. If milliForth is making the same claim, then it should hold itself to the same standard. Otherwise it's just a calculator written by an anonymous person. I filed an issue here: https://github.com/fuzzballcat/milliForth/issues/8


I recommend people follow that link just to see the post at the bottom showing numbers being defined from nothing (well, sp@ and 0 also have to be defined)


I've always wondered how to boot a virtual machine from a single sector with no other OS... now I know, which is cool.


Does anyone in the Forth community know if Charles Moore is still with us?


He has recently done some amazing low power work with GreenArrays:

"GreenArrays is shipping its 144-core asynchronous chip that needs little energy (7 pJ/inst). Idle cores use no power (100 nW). Active ones (4 mW) run fast (666 Mips), then wait for communication (idle).

Tight coding to minimize instructions executed will minimize power. The programmer can also reduce instruction fetches, transistor switching and duty cycle."

https://youtu.be/0PclgBd6_Zs

It is like the elegance of Forth in low power, multi-core hardware.


I think you're right that he's still at Green Arrays. Still a director there apparently.

I do have to note that the linked video is from 2013 so I assume they've moved on from that.


Well, he surely isn't doing it for the money.

The Moore Microprocessor Patent Portfolio was very lucrative. Eventually it ended up with a patent marketer. Nearly everyone doing processors had to buy in.

And as the inventor Chuck got his share out of it...

https://web.archive.org/web/20071226155608/http://www.linuxe...


I have heard nothing new out of Green Arrays but the site still seems to be selling the boards. If I had free cash available I'd try ordering one just to see. It's a really interesting concept and I'd love to see it developed.


I just can't manage to like colorForth.

Forth is great, but colorForth takes it one step too far.

It is very difficult to know when not to add a seemingly cool feature. Especially when you were right against most odds before.


My understanding was that part of the motivation for ColorForth was to help Moore continue programming as his eyesight deteriorated.


I'd like to know the source. The reasons I knew of from long ago were about more innovative work: more compression, keyboard-layout optimization, having the code compiled in the source and utilizing the space to embed the meaning so there is no repeated parsing and compiling; even variables live in the code. And others I don't remember now.


Wikipedia [0] via Dave Gauer's Forth webpage on Ratfactor.com [1]

[0] https://en.wikipedia.org/wiki/ColorForth

[1] https://ratfactor.com/forth/the_programming_language_that_wr...


He is. SVFIG's annual "Forth Day" meeting is on November 18th and he's likely to be there giving the traditional "Fireside Chat".


That's awesome to know. I remember reading his blog years ago when he was setting up Green Arrays


Yes, and so is Elizabeth Rather.



Still going strong at 85 years old...


justine is probably working on a version so small it will trigger the singularity irrevocably


Indeed! It's going to somehow be smaller than the 99 byte BF one...


Does the benefit of it being embeddable in a QR code outweigh the lack of quality-of-life features like subtraction? Nevertheless, it's truly an impressive feat that shows how simple computers can be without all these modern API layers.


For one specific purpose, the benefit could be worth it: bootstrapping.

Guix bootstraps from a tiny audited binary, and milliForth could be used for the same purpose.

Imagine bootstrapping a full Linux distro from a milliForth binary and source code for everything. Everything would be fully auditable, no Trusting Trust problem, and a full Software Bill of Materials.


I always thought of FORTH as the #1 hardest to think of uses for, out of all the non-esoteric and non-obsolete languages. That's a really neat application!


Booting from a firmware Forth loader was how the early Sun SunOS (BSD) workstations did their thing - you could hotkey to stop the default OS load and boot from an image on an alternative drive, across a network, or modify the Forth loader to <imagination>


That forth system is also where Device Trees originated. Parts of the Linux kernel driver interface show that heritage in the naming conventions, OF_* == OpenFirmware.


Beside bootstrapping, Forth works well on tiny MCUs.


I believe Forth is used quite a bit in the embedded world - or used to be. I read, long ago from the Usenet days, you use it when you don't want to write assembly but C is too "heavy" for your hardware.


Bitcoin's smart contracting language is a Forth variant, and arguably this was a very astute and smart choice by Satoshi.

When designing a multi-party contract, the essential problem is that two people specify the terms they each want to apply to any case in which the funds must be spent, then these two sets of requirements must be merged into a single program. This is, in general, difficult to do securely. We can establish conventions that are relatively easy to follow, but it would be a lot nicer and more powerful if we could syntactically enforce that both sets of requirements are enforced in the combined program.

Concatenative languages like Forth meet this requirement nicely. If you represent the spend requirements as a program, then combining the two programs in such a way that both are equally enforced is as simple as literally concatenating the two programs together.

For example, suppose Alice's requirement is that Alice signs the transaction with her key, and Bob's requirement is that Bob signs with his key, and the spending transaction is after some specified time T. Expressed in bitcoin script:

Alice: <AlicePubkey> CHECKSIGVERIFY

Bob: <T> CHECKLOCKTIMEVERIFY DROP <BobPubkey> CHECKSIGVERIFY

The combined script that meets both these sets of requirements is as simple as putting Alice's script, then Bob's, unaltered:

Alice&Bob: <AlicePubkey> CHECKSIGVERIFY <T> CHECKLOCKTIMEVERIFY DROP <BobPubkey> CHECKSIGVERIFY 1

(The `1` at the end is a quirk of bitcoin: a script has to finish with a non-zero value on the stack. This combined script could also be simplified in a couple of ways. There are also a couple of ways in which this can fail in practice: Alice's script could contain OP_RETURN, for example, which causes the entire script to become unspendable, or a mismatched IF/ELSE. A better-designed, strongly typed Forth dialect would fix these issues.)
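The composition property being described can be sketched with a toy model in plain Python (nothing Bitcoin-specific; the names are made up): each party's requirements are a list of checks, and composing them is literally list concatenation.

```python
# Toy model: a "script" is a list of predicates over a spending
# context; composing two parties' conditions is list concatenation,
# and verification runs every check in order.
def verify(script, ctx):
    return all(check(ctx) for check in script)

alice_script = [lambda ctx: "alice_sig" in ctx["sigs"]]
bob_script = [
    lambda ctx: ctx["locktime"] >= 100,   # CHECKLOCKTIMEVERIFY-flavoured
    lambda ctx: "bob_sig" in ctx["sigs"],
]

combined = alice_script + bob_script  # concatenation composes requirements
```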

Bitcoin's Forth is not type checked, but suppose that it were. And furthermore, suppose that it had a powerful dependently typed system that captured various key signing and stack requirements at the type level. It could be used to track not just what a program does, but also the properties of a program. Alice could put a constraint in her program that says "lock time can be no later than April 2024," and this becomes part of both the input and output type requirements of her program. Then when Bob's program is specified with T=15 May 2024, then his program no longer type checks when concatenated to the end of Alice's.
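A crude illustration of that kind of checking (entirely hypothetical; the `Fragment` type and its fields are made up for this sketch, and a real system would do this in the type system rather than at runtime): each fragment exports its constraints as part of its interface, and composition fails when they conflict.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fragment:
    code: str
    max_locktime: Optional[int] = None    # bound this party imposes
    uses_locktime: Optional[int] = None   # locktime this fragment requires

def concat(a, b):
    """Concatenate two fragments, rejecting combinations whose locktime
    requirements violate the other party's stated bound."""
    for frag, other in ((a, b), (b, a)):
        if (frag.max_locktime is not None
                and other.uses_locktime is not None
                and other.uses_locktime > frag.max_locktime):
            raise TypeError("locktime exceeds the other party's bound")
    return Fragment(a.code + " " + b.code)
```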

No one has yet written a system like this, but it would be really powerful if it did exist. Alice writes her smart contract conditions all by her lonesome self, and Bob writes his. Then they literally concatenate one program to the other, and if it type checks then Alice and Bob can be certain that both sets of conditions are satisfied.


Bitcoin script is a large source of complexity, and it turns out to be mostly redundant. Schnorr signatures allow the use of so-called scriptless scripts [1] [2], which can do most of the things that Bitcoin script is used for, including multisig, absolute and relative timelocks, payment channels, discrete log contracts, and atomic swaps (using adaptor signatures). All without the need for a scripting language. The example you gave of a multisig output that can be spent by either Alice, or Bob after some time, is easily handled too.

[1] https://github.com/BlockstreamResearch/scriptless-scripts

[2] https://tlu.tarilabs.com/cryptography/introduction-to-script...


> absolute and relative timelocks

That's a stretch. Replacing timelocks with hashcash (or equivalent) is hardly the same thing.

But generally speaking "scriptless scripts" suffer from the need for interactive construction. There's a long way to go until scriptless scripts is a full replacement for script, if it is even possible.


Who said anything about replacing timelocks with hashcash? I'm talking about actual relative timelocks, that put a minimum block distance between 2 particular transactions.

> suffer from the need for interactive construction.

Interactive construction is already widely in use in relative timelocks' prime application of payment channels.


The scope of potential smart contract applications is much broader, and vastly more interesting than payment channels (yawn). And payment channels are pretty much the only application (definitionally) in which you can get away with "all parties are online, or this contract will be cancelled anyway" assumptions.

Do you have a citation for a block-based relative timelock? The only such things I've seen are verifiable delay functions, which are a lot more like hashcash.

What about absolute lock times? Those are far more interesting in applications of business logic, like logistics contracts. Most of which require non-interactive state updates.


> payment channels (yawn)

Don't you find 2nd layers like Lightning interesting, as a way to scale far beyond the limited L1 capacity?

> Do you have a citation for a block-based relative timelock?

See the thread starting with https://lists.launchpad.net/mimblewimble/msg00546.html

> What about absolute lock times?

Those are trivially supported in any Mimblewimble chain.


Not really, no. Lightning is still payments, multi-party and multi-hop. If the entire realm of blockchain and smart contract technology were restricted to just payments, well, I suppose that would be something - but far short of the total revolution of societal structure that was promised. I personally wouldn't care to work on it if that were the case. Lightning is an implementation of distributed payments. Smart contracts are an implementation of decentralized law.

Your link to the mailing list thread describes essentially the same approach that bitcoin has for dealing with lock times. I would not normally describe this as "scriptless scripts," which I have understood to mean using the signing operation itself to achieve some goal that would otherwise be done in script. The locktime here is still enforced by special-cased code in the consensus algorithm. If the number of things you want to do is fixed and enumerable then I suppose this is a valid approach. But it does not allow for general, arbitrary, future-defined constraints.

To give a concrete, simple example: show me how to do a return peg validation on scriptless scripts. If it can’t even do that, it is not a replacement for scripts.


> far short of the total revolution of societal structure that was promised.

Satoshi promised no such thing. That sounds more like an Ethereum promise (to which I don't subscribe).

> show me how to do a return peg validation on scriptless scripts

How does one do return peg validation in bitcoin script?


The manual way. Validate SPV inclusion proofs using CAT, hash opcodes, bignum addition, etc. To be practical it requires block header commitments, but there are other reasons for wanting that. You also need relative locktime, which original bitcoin did not have. But that should be all that is required to be added. Other commonly cited things like Merkle branch verification opcodes just make the script more compact, or get around the fact that some required opcodes were later disabled and/or restricted from use with bignums.
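For readers unfamiliar with what "validate SPV inclusion proofs" entails, this Python sketch shows the core check such a script would have to replicate (the branch format here is my own simplification, not any wire format):

```python
import hashlib

def dhash(b):
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_merkle_branch(txid, branch, merkle_root):
    """Fold sibling hashes up from the leaf to the root; branch is a
    list of (sibling_hash, sibling_is_on_left) pairs, leaf end first."""
    h = txid
    for sibling, left in branch:
        h = dhash(sibling + h) if left else dhash(h + sibling)
    return h == merkle_root
```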

Re: bitcoin vs. ethereum, the bitcoin community pre-ethereum was very invested in the idea of decentralizing all the things. I remember: I was there and part of that history. But over time people interested in more than just payments or the bitcoin asset have left or switched to ethereum. Originally expressive smart contracting was very much in scope.


> But that should be all that is required to be added.

If you're talking about SPV inclusion on another chain, wouldn't one also need to verify cumulative diff on that other chain (if PoW based) ?


You can add and compare bignums in bitcoin script.


Verifying cumulative diff requires verifying possibly thousands of headers (all those between peg out and peg in), which is a completely impractical amount of witness data for one script.


Committing headers in Merkle tree form or skip lists allows for a logarithmic reduction in that number, as was worked out in the (2014?) sidechains whitepaper. Tens of headers would be doable.


Beyond requiring intrusive changes to Bitcoin like header commitments, and still needing dozen-KB scripts for pegs, it also limits properties of the sidechain. For instance, if the latter wants to use a memory-hard PoW like Cuckoo Cycle, I don't see how Bitcoin script (which cannot even loop) could verify a cycle in a pseudorandom graph.


I could argue that there’s no reason for any proof of work other than sha256 as these side chains should be merge mining, but we’re way off in a tangent thread.

The point is that smart contracts actually do something other than “authorize payment”, and fancy Schnorr signing modes don’t come close to replicating those capabilities.

And so long as there is a need for a programmable verification architecture, Forth-like languages are pretty ideal.


*provably, not definitionally.


Forth doesn't seem to do much to stop you from making mistakes, unless Bitcoin has added extra stuff.

I don't use or study anything crypto related, so I'm just guessing, but wouldn't something like Prolog work for describing contracts?


Yes, I’ve long advocated for a high level smart contracting language that would be similar to prolog in structure and semantics. It would compile down to a state machine, and then the state machine would be interpreted as a series of outputs (nodes on the state machine graph) and spending transactions (edges).

The actual spend condition language would be lower level, however, and Forth-like languages are a good fit for this part of the stack.


Things that make sense and crypto are two sets with a very small intersection.


But that doesn't directly relate to the verifiability of milliForth itself. Extremely compact code can be harder to verify; for example, it may work as intended until a very specific input is used to break out of its sandbox (so to speak). Bootstrapping needs a short and readable enough seed for that reason, and I can't be entirely sure that is indeed the case for milliForth.


The sneakiest possible program representable in n bytes is never less sneaky than the sneakiest possible program representable in n-1 bytes, assuming you can pad a program out with nops.


You don't need a sneaky program, you only need a program that misbehaves on sneaky inputs.


Those are the sneakiest programs.


Technically I guess what you'd do is hand type in binary code to create a hex editor, then bootstrap this forth off that, by hand-typing it in.

If you include formal verification tools as part of the stack then you can verify the tools by eye, type them in, and use them to verify the function of the stack. Admittedly Forth is tricky to do here, but something like lisp/scheme/wat can actually be proven out with say microkanren.

From there we can trust the software stack.


subtraction is trivial to add in userspace once you've loaded it though

  : - sp@ @ nand 1 + + ;
or as it appears in the hello_world.FORTH file

  : dup sp@ @ ;
  : invert dup nand ;
  : negate invert 1 + ;
  : - negate + ;
There is of course no benefit to this thing at all other than reclaiming the crown from those deviant lispers... well I suppose if you're making your own computer from scratch like https://www.homebrewcpuring.org/ it might be a useful starting point.
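The derivation works because NAND of a value with itself is bitwise NOT, and in two's complement, NOT x + 1 = -x. A quick Python check of the same chain over 16-bit words (mirroring the Forth definitions above, names mine):

```python
MASK = 0xFFFF  # 16-bit cells

def nand(a, b):     return ~(a & b) & MASK
def invert(a):      return nand(a, a)              # dup nand
def negate(a):      return (invert(a) + 1) & MASK  # invert 1 +
def subtract(a, b): return (a + negate(b)) & MASK  # negate +
```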


: - not 1 + + ;

Disclaimer: I've never written forth, so this may not be valid for any particular forth implementation.


Look at how much room you have for data! I wonder what we can fit in there.

More seriously, a metacircular example to draw from would be: https://github.com/kragen/stoneknifeforth


Would it be possible to do a similar thing in a similar size for, for example, the 6502? I would love to give that a try, even though I probably lack the skills to make it that small. Or maybe it's even impossible.


How fast is the garbage collector? And is it web scale? I'm concerned by the lack of native JSON support.


Heh.


Has anyone figured out how to pipe input to the qemu vm under WSL 2? I've tried all sorts of things, and nothing seems to work.

The included py_autotype.py takes forever, but nothing outputs in the qemu vm as it is running.

WSL 2 does have an X server running (xeyes works).

Update: running the script against gedit doesn't work either... it's not a qemu issue


> The included py_autotype.py takes forever, but nothing outputs in the qemu vm as it is running.

Keep in mind, nothing SHOULD output while it's running, until you get to the very end and there's a 'hello world' printed.

This being said, the autotype's lethargy severely irks me. But I couldn't make it go faster, as pyautogui kept tripping up. If someone finds a better way (i.e. a way that isn't my first random thought), it will be much appreciated.



