> However, milliFORTH appears to be the smallest programming language implementation ever, beating out sectorLISP2, a mind-blowing 436 byte implementation of LISP, by 14 bytes.
The sectorlambda implementation of Binary Lambda Calculus is shorter yet at 383 bytes [1].
And the BLC self-interpreter is only 29 bytes [2].
> A FORTH in 422 bytes — the smallest real programming language ever, as of yet.
That may still be true, as BLC is an esoteric programming language.
Author here! I was trying to be very careful to phrase things in a way that made it clear I was referring to "real" languages (ones that are used for actual production purposes), but inevitably I forgot to do so where you've highlighted. I've updated the wording so it's clearer.
The main advantage of a miniature Forth like this over BLC, Lisp, Brainfuck, etc, is that Forth grants low-level access to the hardware; you can easily bootstrap from it to something indistinguishable from a feature-rich Forth with a full suite of metaprogramming capabilities, or implement an entire operating system on top of it.
Miniature Lisps are similar to Forth in this way. Having worked with both, I'm inclined to favour Lisps for providing cleaner structure and abstractions for code and data to build up from, with low overhead despite the superficial differences. (Compare sizes: SectorLisp2 (436 bytes) vs SectorForth (491 bytes): https://justine.lol/sectorlisp2/)
Lisps are given low-level hardware access primitives (peek/poke etc) when they are designed for bootstrapping an OS, and low-level OS primitives (such as system calls) when they are designed for bootstrapping a rich environment on a different OS. Basically the same primitives as Forth for the same purposes.
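As a rough illustration, the kind of peek/poke primitives described above can be modeled over a plain bytearray standing in for RAM (a hypothetical Python sketch; the names `peek`/`poke` follow the classic BASIC/Forth convention, not any particular Lisp's API):

```python
# Hypothetical sketch of bootstrap-level memory primitives.
MEM = bytearray(64 * 1024)  # 64 KiB of "RAM"

def peek(addr: int) -> int:
    """Read one byte of 'memory'."""
    return MEM[addr]

def poke(addr: int, value: int) -> None:
    """Write one byte of 'memory' (truncated to 8 bits)."""
    MEM[addr] = value & 0xFF

poke(0x1000, 0x42)
assert peek(0x1000) == 0x42
```

Everything else (allocators, I/O drivers, a file system) can then be written in the language itself on top of exactly these two operations.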
Hmm, I'd be a little surprised if this was smaller than some of the 8-bit era forths (the TIL book had a 50ish-byte "inner interpreter" for Z80 but that didn't include any of the baseline "words" that this does.) I'm sure it wins for 32-bit systems though.
I've owned a couple of hard copies of this since before it got popular. Most folks will only ever encounter the above PDF. It is an absolutely wonderful book, written in such a matter-of-fact style; the only things comparable are the works of Nils Holm.
When I was in high school I wrote a nice FORTH for the TRS-80 Color Computer using the OS-9 operating system which was a Unix-like multitasking OS that would fit on a 6809 microcomputer.
I think it was around 2000 lines of assembly code implementing most of the FORTH-83 standard. Mine was unusual in that it did not support the block-based I/O that was common on "language system" FORTHs; instead it had a handle-based API for accessing files, similar to Unix, C, and MS-DOS version 2 and up.
The programming environment was a lot like Linux overall in that I’d use an ed or vi clone to edit files, then run something like an assembler or C compiler. I’d run my FORTH binary and it would present an interactive environment like most FORTHs.
I do like this reasoning. Magic within the machine for me was programming forth as part of my post grad (controlling and processing proton precession magnetometers) and the device was a Triangle Digital Services (now defunct as a viable company I think) TDS2020 forth SBC.
Almost a zen-like experience - you, the machine and your focus. No world.
You could save one byte by replacing "0=" with "0<>" (well, except for the name, maybe use "0#"?):
pop ax
neg ax ;sets carry flag if nonzero
sbb ax,ax
push ax
And another two bytes in DOCOL:
;add ax,4
;mov si,ax
xchg ax,si
lodsw
lodsw
Maybe even free up a word register to always hold the address of DOCOL, then you only need two bytes at each colon definition. If it's possible without adding any extra instructions, this should save 4 bytes in total (one stosw in COLON, the DOCOL.addr variable, one lodsw in the above version of DOCOL)
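For what it's worth, the neg/sbb flag trick can be sanity-checked with a toy model of the two instructions (a hypothetical Python sketch, not part of either Forth's source):

```python
# Models the suggested x86 sequence for "0<>":
#   neg ax      ; CF is set iff ax was nonzero
#   sbb ax, ax  ; ax = ax - ax - CF = -CF, i.e. 0xFFFF or 0x0000
def zero_ne(ax: int) -> int:
    carry = 1 if (ax & 0xFFFF) != 0 else 0  # NEG sets CF iff operand != 0
    return (0 - carry) & 0xFFFF             # SBB ax,ax leaves -CF

assert zero_ne(0) == 0x0000       # 0<> of zero is false (0)
assert zero_ne(1) == 0xFFFF       # 0<> of nonzero is true (-1)
assert zero_ne(0x8000) == 0xFFFF
```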
Good thinking! I wouldn't have thought of that `xchg` one in DOCOL, very clever.
I'll futz around with keeping DOCOL's address in a register and see if I can make that work, though I don't know if it will, since I think some of the mechanisms in place have additional functions.
EDIT: So, I don't think this will work. I'd be using `dx` to store DOCOL's address, but this appears to get clobbered somewhere, and fixing that would outweigh the 1-byte benefit you actually end up getting. I'll keep thinking about it though!
Looking at the code, this looks remarkably similar to sectorforth?
Thing is, with sectorforth and this, 2 of the primitives are not required, reducing the VM to 6 ops, so you can probably go smaller. Looks as though Cesar was happy with fitting into a sector:
https://github.com/cesarblum/sectorforth/issues
[Author here.] Indeed, sectorFORTH (as I have mentioned) was the primary inspiration and guide for this project! Ironically, in some places I started with a rather different design which converged on a more sectorFORTH-like one - it's really well put together.
I've removed some further primitives from milliFORTH, but I haven't touched the arithmetic base. I hadn't seen those issues on sectorFORTH though, very interesting - I will look into it!
Total tangent - I've often been annoyed at some library because I didn't like the way it was implemented or the API it exposed. So I try to reimplement it myself to my tastes, and eventually, once I understand the problem and iterate a few times, I end up with something remarkably similar to the library I initially didn't like.
It's a good reason to at least try to implement your own sometimes, rather than just use libraries - even if you end up just using the supported external dep, you'll understand it a lot better.
(not saying that's the case here, just that you reminded me of that with your convergence...)
When SectorLISP claimed to be the tiniest programming language in the world, it included (1) a Turing machine program, (2) an implementation of SectorLISP written in SectorLISP, and (3) a proof that it could run software that had been written for the original LISP 1.5 system built by LISP's inventor. If milliForth is making the same claim, then it should hold itself to the same standard. Otherwise it's just a calculator written by an anonymous person. I filed an issue here: https://github.com/fuzzballcat/milliForth/issues/8
He has recently done some amazing low power work with GreenArrays:
"GreenArrays is shipping its 144-core asynchronous chip that needs little energy (7 pJ/inst). Idle cores use no power (100 nW). Active ones (4 mW) run fast (666 Mips), then wait for communication (idle).
Tight coding to minimize instructions executed will minimize power. The programmer can also reduce instruction fetches, transistor switching and duty cycle."
The Moore Microprocessor Patent Portfolio was very lucrative.
Eventually it ended up with a patent marketer.
Nearly everyone doing processors had to buy in.
And as the inventor Chuck got his share out of it...
I have heard nothing new out of Green Arrays but the site still seems to be selling the boards. If I had free cash available I'd try ordering one just to see. It's a really interesting concept and I'd love to see it developed.
I'd like to know the source.
And the reasons I knew of from long ago are about more innovative work: more compression, keyboard layout optimization, having the code compiled in the source and utilizing the space to embed the meaning so there is no repeated parsing and compiling. Even vars are living in the code, alive. And others I don't remember now.
Does the benefit of it being embeddable on a QR code outweigh the lack of quality of life features like subtraction? Nevertheless, truly an impressive feat that shows how simple computers can be without all these modern API layers.
For one specific purpose, the benefit could be worth it: bootstrapping.
Guix bootstraps from a tiny audited binary, and milliForth could be used for the same purpose.
Imagine bootstrapping a full Linux distro from a milliForth binary and source code for everything. Everything would be fully auditable, no Trusting Trust problem, and a full Software Bill of Materials.
I always thought of FORTH as the #1 hardest to think of uses for, out of all the non-esoteric and non-obsolete languages. That's a really neat application!
Booting from a firmware Forth loader was how the early Sun SunOS (BSD) workstations did their thing - you could hotkey to stop the default OS load and boot from an image on an alternative drive, across a network, or modify the Forth loader to <imagination>
That forth system is also where Device Trees originated. Parts of the Linux kernel driver interface show that heritage in the naming conventions, OF_* == OpenFirmware.
I believe Forth is used quite a bit in the embedded world - or used to be. I read, long ago from the Usenet days, you use it when you don't want to write assembly but C is too "heavy" for your hardware.
Bitcoin's smart contracting language is a Forth variant, and arguably this was a very astute and smart choice by Satoshi.
When designing a multi-party contract, the essential problem is that two people specify the terms they each want to apply to any case in which the funds must be spent, then these two sets of requirements must be merged into a single program. This is, in general, difficult to do securely. We can establish conventions that are relatively easy to follow, but it would be a lot nicer and more powerful if we could syntactically enforce that both sets of requirements are enforced in the combined program.
Concatenative languages like Forth meet this requirement nicely. If you represent the spend requirements as a program, then combining the two programs in such a way that both are equally enforced is as simple as literally concatenating them.
For example, suppose Alice's requirement is that Alice signs the transaction with her key, and Bob's requirement is that Bob signs with his key, and the spending transaction is after some specified time T. Expressed in bitcoin script:
Alice: <AlicePubkey> CHECKSIGVERIFY
Bob: <T> CHECKLOCKTIMEVERIFY DROP <BobPubkey> CHECKSIGVERIFY
The combined script that meets both these sets of requirements is as simple as putting Alice's script, then Bob's, unaltered:
Alice&Bob: <AlicePubkey> CHECKSIGVERIFY <T> CHECKLOCKTIMEVERIFY DROP <BobPubkey> CHECKSIGVERIFY 1
(The `1` at the end is a quirk of bitcoin: a script has to finish with a non-zero value on the stack. This combined script could also be simplified in a couple of ways. There are also a couple of ways in which this can fail in practice. Alice's script could contain OP_RETURN, for example, which causes the entire script to become unspendable. Or a mismatched IF/ELSE. A better-designed and strongly typed Forth dialect would fix these issues.)
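The concatenation-as-conjunction property can be illustrated with a toy stack machine (a hypothetical Python sketch; the opcode names mimic Bitcoin script, but the verification model is heavily simplified - real CHECKSIGVERIFY pops a signature and a pubkey and verifies a cryptographic signature):

```python
# Toy verifier: each VERIFY-style op aborts the whole script on failure,
# so concatenating two scripts enforces both sets of conditions.
def run(script, context):
    stack = []
    for op in script:
        if isinstance(op, tuple) and op[0] == "PUSH":
            stack.append(op[1])
        elif op == "CHECKSIGVERIFY":
            key = stack.pop()
            if key not in context["signatures"]:
                return False  # missing signature aborts everything
        elif op == "CHECKLOCKTIMEVERIFY":
            if context["locktime"] < stack[-1]:  # CLTV peeks, doesn't pop
                return False
        elif op == "DROP":
            stack.pop()
    return True

alice = [("PUSH", "alice_key"), "CHECKSIGVERIFY"]
bob   = [("PUSH", 800_000), "CHECKLOCKTIMEVERIFY", "DROP",
         ("PUSH", "bob_key"), "CHECKSIGVERIFY"]

ctx = {"signatures": {"alice_key", "bob_key"}, "locktime": 800_500}
assert run(alice + bob, ctx)                                    # both satisfied
assert not run(alice + bob, {**ctx, "signatures": {"alice_key"}})  # Bob's fails
```

Because every VERIFY-style opcode aborts the whole script on failure, appending Bob's program after Alice's means a spend must satisfy both.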
Bitcoin's Forth is not type checked, but suppose that it were. And furthermore, suppose that it had a powerful dependently typed system that captured various key signing and stack requirements at the type level. It could be used to track not just what a program does, but also the properties of a program. Alice could put a constraint in her program that says "lock time can be no later than April 2024," and this becomes part of both the input and output type requirements of her program. Then when Bob's program is specified with T=15 May 2024, then his program no longer type checks when concatenated to the end of Alice's.
No one has yet written a system like this, but it would be really powerful if it existed. Alice writes her smart contract conditions all by her lonesome self, and Bob writes his. Then they literally concatenate one program to the other, and if the result type checks, Alice and Bob can be certain that both sets of conditions are satisfied.
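As a sketch of what such a (so far nonexistent, as noted) type-checked concatenation might look like: each fragment could carry an allowed locktime interval, with concatenation failing when the intervals don't intersect (purely hypothetical Python; the block heights standing in for the dates are made up):

```python
from dataclasses import dataclass

@dataclass
class Typed:
    """A script fragment plus its locktime constraint, as a closed interval."""
    ops: list
    lock_min: int = 0
    lock_max: float = float("inf")

def concat(a: Typed, b: Typed) -> Typed:
    lo, hi = max(a.lock_min, b.lock_min), min(a.lock_max, b.lock_max)
    if lo > hi:
        raise TypeError("locktime constraints are incompatible")
    return Typed(a.ops + b.ops, lo, hi)

april_2024 = 840_000  # made-up block height for "April 2024"
may_2024   = 843_000  # made-up block height for "15 May 2024"
alice = Typed(["CHECKSIGVERIFY"], lock_max=april_2024)      # "no later than April"
bob   = Typed(["CHECKLOCKTIMEVERIFY"], lock_min=may_2024)   # T = 15 May

try:
    concat(alice, bob)
except TypeError:
    pass  # Bob's program fails to "type check" against Alice's
```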
Bitcoin script is a large source of complexity, and it turns out to be mostly redundant. Schnorr signatures allow the use of so-called scriptless scripts [1] [2], which can do most of the things that Bitcoin script is used for, including multisig, absolute and relative timelocks, payment channels, discrete log contracts, and atomic swaps (using adaptor signatures). All without the need for a scripting language.
The example you gave of a multisig output that can be spent by either Alice, or Bob after some time, is easily handled too.
That's a stretch. Replacing timelocks with hashcash (or equivalent) is hardly the same thing.
But generally speaking "scriptless scripts" suffer from the need for interactive construction. There's a long way to go until scriptless scripts is a full replacement for script, if it is even possible.
Who said anything about replacing timelocks with hashcash?
I'm talking about actual relative timelocks, that put a minimum block distance between 2 particular transactions.
> suffer from the need for interactive construction.
Interactive construction is already widely in use in relative timelocks' prime application of payment channels.
The scope of potential smart contract applications is much broader, and vastly more interesting than payment channels (yawn). And payment channels are pretty much the only application (definitionally) in which you can get away with "all parties are online, or this contract will be cancelled anyway" assumptions.
Do you have a citation for a block-based relative timelock? The only such things I've seen are verifiable delay functions, which are a lot more like hashcash.
What about absolute lock times? Those are far more interesting in applications of business logic, like logistics contracts. Most of which require non-interactive state updates.
Not really, no. Lightning is still payments, multi-party and multi-hop. If the entire realm of Blockchain and smart contract technology was restricted to just payments, well I suppose that would be something. But far short of the total revolution of societal structure that was promised. I personally wouldn’t care to work on it if that was the case. Lightning is an implementation of distributed payments. Smart contracts are an implementation of decentralized law.
Your link to the mailing list thread describes essentially the same approach that bitcoin has for dealing with lock times. I would not normally describe this as “scriptless scripts,” which I have understood to mean using the signing operation itself to achieve some goal that would otherwise be done in script. The locktime here is still enforced by special-cased code in the consensus algorithm. If the number of things you want to do is fixed and enumerable, then I suppose this is a valid approach. But it does not allow for general, arbitrary, future-defined constraints.
To give a concrete, simple example: show me how to do a return peg validation on scriptless scripts. If it can’t even do that, it is not a replacement for scripts.
The manual way. Validate SPV inclusion proofs using CAT, hash opcodes, bignum addition, etc. To be practical it requires block header commitments, but there are other reasons for wanting that. You also need relative locktime, which original bitcoin did not have. But that should be all that is required to be added. Other commonly cited things like Merkle branch verification opcodes just make the script more compact, or get around the fact that some required opcodes were later disabled and/or restricted from use with bignums.
Re: bitcoin vs. ethereum, the bitcoin community pre-ethereum was very invested in the idea of decentralizing all the things. I remember: I was there and part of that history. But over time people interested in more than just payments or the bitcoin asset have left or switched to ethereum. Originally expressive smart contracting was very much in scope.
Verifying cumulative diff requires verifying possibly thousands of headers (all those between peg out and peg in), which is a completely impractical amount of witness data for one script.
Committing headers in Merkle tree form or as skip lists allows a logarithmic reduction in that number, as was worked out in the (2014?) sidechains whitepaper. Tens of headers would be doable.
Beyond requiring intrusive changes to Bitcoin like header commitments, and still needing dozen-KB scripts for pegs, it also limits properties of the sidechain. For instance, if the latter wants to use a memory-hard PoW like Cuckoo Cycle, I don't see how Bitcoin script (which cannot even loop) could verify a cycle in a pseudorandom graph.
I could argue that there’s no reason for any proof of work other than sha256 as these side chains should be merge mining, but we’re way off in a tangent thread.
The point is that smart contracts actually do something other than “authorize payment”, and fancy Schnorr signing modes don’t come close to replicating those capabilities.
And so long as there is a need for a programmable verification architecture, Forth-like languages are pretty ideal.
Yes, I’ve long advocated for a high level smart contracting language that would be similar to prolog in structure and semantics. It would compile down to a state machine, and then the state machine would be interpreted as a series of outputs (nodes on the state machine graph) and spending transactions (edges).
The actual spend condition language would be lower level, however, and Forth-like languages are a good fit for this part of the stack.
But that doesn't directly relate to the verifiability of milliForth itself. Extremely compact code can be harder to verify; for example, it may work as intended unless a very specific input is used to break out of its sandbox (so to say). Bootstrapping needs a short and readable enough seed for that reason, and I can't be entirely sure that is indeed the case for milliForth.
The sneakiest possible program representable in n bytes is never less sneaky than the sneakiest possible program representable in n-1 bytes, assuming you can pad a program out with nops.
Technically I guess what you'd do is hand type in binary code to create a hex editor, then bootstrap this forth off that, by hand-typing it in.
If you include formal verification tools as part of the stack then you can verify the tools by eye, type them in, and use them to verify the function of the stack.
Admittedly Forth is tricky to do here, but something like lisp/scheme/wat can actually be proven out with say microkanren.
There is of course no benefit to this thing at all other than reclaiming the crown from those deviant lispers... well, I suppose if you're making your own computer from scratch like https://www.homebrewcpuring.org/ it might be a useful starting point.
Would it be possible to do a similar thing in a similar size for, for example, the 6502? I would love to give that a try, even though I probably lack the skills to make it that small. Or maybe it's even impossible.
> The included py_autotype.py takes forever, but nothing outputs in the qemu vm as it is running.
Keep in mind, nothing SHOULD output while it's running, until you get to the very end and there's a 'hello world' printed.
This being said, the autotype's lethargy severely irks me. But I couldn't make it go faster, as pyautogui kept tripping up. If someone finds a better way (i.e. a way that isn't my first random thought), it will be much appreciated.
[1] https://justine.lol/lambda/
[2] https://ioccc.org/2012/tromp/hint.html