Turbo Pascal Deconstructed (simonwillison.net)
6 points by alberto-m 2 days ago | 6 comments



That raises questions though: How does he know that the chart is correct? Furthermore, whether it's correct or not, what was the process by which the LLM reached its conclusions, i.e. did it disassemble and examine the source itself or did it use pre-existing reverse engineering done by others on the internet like https://www.pcengines.ch/tp3.htm?

You can see hints of what it did in https://claude.ai/share/260d2eed-8d4a-4b9f-8a75-727c3ec4274e - annoyingly though it looks like Claude sharing doesn't detail actual code it ran.

Here's the zip file it gave me of the files it generated along the way: https://static.simonwillison.net/static/2026/turbo-pascal-an...

I had Codex GPT-5.4 xhigh run a check of those files to see if the artifact at the end appeared to use the right data, which isn't 100% foolproof but gave me enough confidence to publish since this is a pretty low stakes project!


Thank you very much for that super analysis done with the help of AI - I really enjoyed reading that. May I ask whether you are paying for that service? And if so, how much?

Anyhow, I downloaded your ZIP file and looked into the disassembly. It seems that the disassembler simply disassembled byte by byte, not taking into account that TURBO.COM is both code and data. Since the x86 instruction set is very dense, pretty much every byte sequence turns into legal instructions. Even the ASCII strings were disassembled. Look at address hex 4864 in the file for example - it should be the string "Write block to file" but got disassembled. I wonder how the AI managed to work with that obscure file.
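
For anyone wanting to spot such regions before trusting a linear disassembly, here is a minimal Python sketch that flags runs of printable ASCII as probable data. The function name, the 6-character threshold and the sample bytes are all made up for illustration; it is not what Claude or the disassembler did, and real code bytes can also fall in the printable range, so treat hits as hints only.

```python
def ascii_runs(data: bytes, min_len: int = 6):
    """Return (offset, text) for each printable-ASCII run of at least min_len bytes."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E:          # printable ASCII range
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode("ascii")))
    return runs


# Made-up sample: push ax, ret, then a string, a NUL terminator and another ret.
sample = b"\x50\xc3" + b"Write block to file" + b"\x00\xc3"
found = ascii_runs(sample)
for offset, text in found:
    print(f"{offset:04X}: {text!r}")
```

Running the same function over the real file contents would give offsets relative to the start of the file; a .COM image is loaded at offset 100h, so add that to get the addresses a disassembler shows.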


I ran the analysis using regular Claude. I'm paying $200/month but the $20/month subscription should work fine too, and it might even work with the free plan.

For the code generator, it produced this annotated disassembly:

    2100 push ax            ;--- EmitByte: write one byte to code output ---
    2101 mov di, [code_ptr] ;DI → current position in output buffer
    2104 stosb              ;Write AL to output, advance DI
    2105 mov [code_ptr], di ;Update code pointer
    2108 pop ax             ;Restore AX
    2109 ret                ;Every compiled instruction flows through this 6-instruction emitter
    2110 mov al, 0E8h       ;--- EmitCall: generate CALL instruction ---
    2112 call EmitByte      ;Emit opcode byte E8h (near CALL)
    2115 sub bx, [code_ptr] ;Calculate relative offset
    2118 sub bx, 2          ;Adjust for instruction length
    211A xchg ax, bx        ;AX = relative offset
    211B call EmitWord      ;Emit 16-bit relative displacement
    211E ret                ;Generated: E8 lo hi — a complete CALL instruction
Obviously, there has to be a lot more to even a simple-minded x86 code generator than just a generic "emit opcode byte" and "emit call" routine. In general, what A"I" produced here is not a full disassembly but a collection of short snippets, potentially not even including the really interesting ones. But is it even correct?

EmitByte here is unnecessarily pushing/popping AX, which isn't modified by the few instructions in between at all. No competent assembly language programmer would do this. So maybe against all expectations, Turbo Pascal is just really badly coded? No, it's of course a hallucination: those instructions don't appear in the binary at all!

That the hex addresses are wrong can already be seen in the instruction "mov di,[code_ptr]" here being apparently only three bytes long. In reality it would take four! And it's easy to confirm that this code isn't present at the addresses shown.
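
The address check can be done mechanically. The sketch below is only an illustration: the instruction lengths are my own assumption based on the standard 16-bit encodings (for example 8B 3E lo hi for "mov di,[mem]" and 83 EB 02 for "sub bx,2"), and the helper name is invented. It flags every instruction whose successor starts before the instruction itself could have ended; larger gaps are ignored, since those could be padding or data.

```python
# (address, instruction, assumed encoded length in bytes) for the quoted listing.
LISTING = [
    (0x2100, "push ax", 1),
    (0x2101, "mov di,[code_ptr]", 4),   # 8B 3E lo hi
    (0x2104, "stosb", 1),
    (0x2105, "mov [code_ptr],di", 4),   # 89 3E lo hi
    (0x2108, "pop ax", 1),
    (0x2109, "ret", 1),
    (0x2110, "mov al,0E8h", 2),         # B0 E8
    (0x2112, "call EmitByte", 3),       # E8 lo hi
    (0x2115, "sub bx,[code_ptr]", 4),   # 2B 1E lo hi
    (0x2118, "sub bx,2", 3),            # 83 EB 02
    (0x211A, "xchg ax,bx", 1),
    (0x211B, "call EmitWord", 3),
    (0x211E, "ret", 1),
]


def overlapping(listing):
    """Return entries whose next listed address leaves too little room for them."""
    bad = []
    for (addr, text, size), (next_addr, _, _) in zip(listing, listing[1:]):
        gap = next_addr - addr
        if gap < size:
            bad.append((addr, text, size, gap))
    return bad


problems = overlapping(LISTING)
for addr, text, size, gap in problems:
    print(f"{addr:04X} {text}: needs {size} bytes, listing leaves {gap}")
```

Under those assumed lengths, four of the thirteen lines cannot fit at the addresses given, all of them instructions with a memory operand or an immediate.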

So maybe it's somewhere else? x86 disassembly can be complicated because the opcodes are variable length, and particularly in old programs like this the code and data are often not cleanly separated. Claude apparently ran it through NDISASM, which doesn't even attempt to handle that task.

But searching for e.g. the hex opcode B0 E8 ('mov al,0xe8') is enough to confirm that this code snippet isn't to be found anywhere.
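
A search like that takes a few lines of Python. This is a sketch with a made-up sample buffer standing in for the binary; to run it on the real thing you would read the file into `data` yourself (the filename and path are up to you).

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs in data, overlaps included."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


PATTERN = bytes.fromhex("B0E8")   # mov al,0E8h

# Made-up sample bytes; replace with the contents of the binary under test.
sample = bytes.fromhex("50 B0 E8 C3 90 B0 E8")
hits = find_all(sample, PATTERN)
print([f"{h:04X}" for h in hits])

# An empty result means the byte pair does not occur anywhere in the buffer.
print(find_all(bytes.fromhex("50 C3 90"), PATTERN))
```

Because this works on raw bytes, it does not depend on where a disassembler happened to start decoding, which is what makes it a useful cross-check.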

There is a lot more suspicious code, including some that couldn't possibly work (like the "ret 1" in the system call dispatcher, which would misalign the stack).
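
To spell out the "ret 1" problem: in 16-bit code a near RET pops the 2-byte return address and then adds the immediate to SP. Since PUSH and CALL always move SP by 2, SP normally stays even, and "ret 1" would leave it odd. A tiny sketch, with an arbitrary starting SP chosen for illustration:

```python
def sp_after_near_ret(sp: int, imm: int = 0) -> int:
    """SP after a 16-bit near RET imm: pop the return address, then discard imm bytes."""
    return (sp + 2 + imm) & 0xFFFF


sp = 0xFFF0                              # arbitrary even value, for illustration
after_plain = sp_after_near_ret(sp)      # plain "ret"
after_ret_2 = sp_after_near_ret(sp, 2)   # "ret 2": one word of arguments removed
after_ret_1 = sp_after_near_ret(sp, 1)   # "ret 1": SP ends up odd

for name, value in [("ret", after_plain), ("ret 2", after_ret_2), ("ret 1", after_ret_1)]:
    print(f"{name}: SP={value:04X} even={value % 2 == 0}")
```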

Conclusion: it's slop


Thanks for this, I've added that to my write-up of the project here: https://simonwillison.net/2026/Mar/20/turbo-pascal/#hallucin...


