+ recording runs as tests
+ one library for all layers of abstraction
+ building in assembler to avoid the big runtime.
+ using literate programming techniques to manage coding in assembly.
+ scenario, "screen-should-contain" and "assume-keyboard". awesome.
+ spaces: a nice primitive for closures and similar concepts
+ attributes for meta programs. again, awesome.
+ labels and using [,] as labels. genius.
Suggestions:
* If there's a valid reason to call them recipes, go ahead. If they're actually just functions, please stick with what everyone knows and gets. Same thing with "reagents", "reply", "ingredients" and "products". Update: I did read your blog post about this. I still think the regular names are better in the long run.
* I don't know where you're going with multiple types of the "number:list" kind, but if so, the map must look the same, not lispy.
In all, I really like where you're going with this. Kudos!
I find a lot of resonance of ideas I've had for a long time now.
Wow, that's a very detailed look. Thanks! Feel free to email me; my address is in my profile.
The terminology is definitely a work in progress. It's mostly for my attempts to teach programming using Mu. I noticed that mathematical words intimidated some students. I tried to write up my rationale here: http://akkartik.name/post/mu. But I've certainly started mixing up the terms with my students, so it might not last.
> Functions, arguments, classes, methods, objects, threads, locks, all these are reassuring everyday words, and yet their meaning in programming (and math) bears no relation to their everyday meaning. (from http://akkartik.name/post/mu)
I can see how the use of the term "arguments" could be confusing ("parameters" or "inputs" would make more sense), and "threads" is a rather tenuous metaphor for how scheduling works within a kernel, but all of the others mirror their real-world meaning pretty well. I'd be more worried about students needing to unlearn "containers", "ingredients", and "reagents" when they start reading material from outside of Mu, talking to other developers, or learning calculus (which uses functions, sets, and arrays).
Actually, the only thread a ten-year-old knows is the one you might play cat's cradle with. It isn't obvious why there should be anything exclusive about one, or why you're better off keeping multiple of them 'separate'.
Functions are what something is for, as opposed to form. It isn't natural to think about their inputs and outputs.
It's quite possible my solutions are too blunt and problematic, but these seem like real problems.
Ada called them tasks. I always thought a multi-tasking program made more sense intuitively if one knew what a task was. A language designed for easy learning might call them actions, activities, jobs à la the 1960s... something more intuitive.
Interesting. As I tried a quick brainstorm, I thought of a group of kids sitting together playing with the same toys: actions that only one of them could do on a toy at the same time, trying to convey the problem of exclusivity. Mentally I came to the problem of sharing. Then that people often borrowed toys temporarily and the owner checked up on them.
(lightbulb) Rust has a borrow-checker. And ownership. Now I wonder where they came up with that haha.
I'm not sure if a term like "tasks" / "jobs" / "actions" makes more sense. You wouldn't necessarily separate each thing that a program does into a new thread... Functions are for separating each body of work, while threads represent a path of execution through a process. In this way multiple lines of execution are intertwined to form a process, just as real threads are intertwined to form a rope... It's not a very good metaphor, but I can't think of a better one.
In Ada, "tasks" makes sense, but only because it is an abstraction that makes use of threads while managing all memory access and separation from the rest of the process.
Hmm. Good points. Maybe make an analogy to something like traffic lanes? More lanes equals more throughput, though not always utilized, and with diminishing gains as you add lanes for a given workload. Problems occur at intersections, where lanes share a resource; that requires synchronization, signalling, and/or ordering protocols to maintain safety.
How do you like THAT! Maybe we need a similar analogy closer to what's going on, but the spirit of it seems accurate. Maybe factory workers on assembly lines, or office workers at desks.
Yeah, I agree about the term "threads" - it doesn't make for a very direct metaphor.
As for functions, I think the term makes sense, even with the non-mathematical definition, and especially in the context of OOP:
> an activity or purpose natural to or intended for a person or thing. synonyms: purpose, task, use, role
The "purpose" part doesn't make much sense with the way that we use the word, but it's certainly an activity. For example, a function of a stove is boiling water, its input would be liquid water, and its output would be steam & hot water. In OOP, "stove" would be the name of the class, and "boil_water" would be the function.
Here's an argument against using the term "ingredients" for arguments: in the real world, if you bake a cake, its ingredients are gone. In your language, they still exist (if you write a recipe for eat in Mu, you can have your cake and eat it).
From that observation, I think one should conclude that using analogies from cooking, as you do with "recipe" in the kitchen sense, is not the best idea.
- creating an array w/ create-array is possible, however I could not find how to take its address and pass it around
- one cannot get the address of an invalid position of the array, so one cannot construct bounded ranges (STL style) using a begin/end pair of addresses
- the interpreter strongly suggests using refcounted pointers (address:shared) instead of plain addresses even when one does not borrow the memory
- could not figure out how to write a test scenario that checks named memory locations rather than ordinal memory addresses (the memory-should-contain instruction)
That's great!! Yes, create-array was just a version I created so I could teach arrays separately from new and addresses. But it creates an array on the 'stack' (default space), so there's no way to take an address of it. I quickly take my students past it to new.
memory-should-contain doesn't currently support named locations, sorry. Part of the problem is that with spaces a name can have many different addresses in different functions. So my tests write stuff I want to check in raw numbered locations, and the first 1000 addresses are reserved for tests so that names can never be clobbered.
I looked at Rust's borrow checker for a bit but wasn't smart enough to understand how it works or transplant it easily. I also noticed that it wasn't smart enough to deal with things like doubly-linked lists without reaching for ref-counting. So I figured I'd keep things simple and just use ref-counting for everything. That way I punt on all the complicated static checks in favor of a simple runtime one.
The rule is: new returns shared:address, and get-address and index-address (and maybe-convert) return address. Use shared:address to pass things around between functions, and reserve non-shared addresses only for short-term operations, usually mutations. Since non-shared addresses are not dynamically allocated, there's no possibility of use-after-free so they don't need to be refcounted, and you can copy them around as much as you like.
Use-after-free and related memory corruption is really the only thing I'm concerned about protecting my users from. Memory leaks I plan to have tools for, so that programmers can identify and break them down when memory becomes a concern, but not worry about them until then. I call this "zero-developer-cost abstractions" :) It feels less restrictive and more dynamic than Rust.
Edit: I just took a look at your code, and it feels perfectly idiomatic. Nice job. Only issue I found was that you forgot to specify the outputs of find in its header. So its calls end up doing some runtime type-checking. I should probably raise a warning in this situation. Thanks again.
Indeed, I missed the output for find! A warning would definitely be helpful.
I like that you have users and thus will be able to test things out and get some genuine feedback.
This might however introduce a bias guiding your language and standard library in a certain direction (I think it's inevitable, and happens in all languages anyway). Therefore I would add some of your peers into the mix.
Otherwise I really like the directions you are exploring, as I've come to very similar conclusions at this point in my career:
- being able to modify a system through safe additions
- not focusing too much on local details of a system
Yeah, the reactions here have been invaluable. It's interesting: I actually got into teaching because I had trouble getting programmers to try Mu out. Most people tend to unconsciously filter out what seems gratuitously new and different. But I'll keep trying.
I'm a little hesitant to learn how to 'understand large programs', or how to structure them for that matter, from a repository that has 100 code files in the root directory.
Layers of code are filled in as you read down the list of files. It's an interesting concept, similar to literate programming. (And FWIW, these source files are C++ code for the compiler, not an example of Mu code.)
Author here. For my part, I have never understood why people consider it to be a good thing to squirrel code into a bunch of different sub-directories.
a) It makes the build scripts more complicated, which means they'll be more likely to break on some poor noob, and that when they break it'll be less likely the noob will be able to tell how to fix it.
b) Invariably the codebase accumulates dependencies between directories that are uneconomic to reorganize. At least a flat directory has a shot at under-promising and over-delivering.
c) Maybe people do it to make the place look neat, the way I used to 'clean my room' by stuffing all my dirty clothes into drawers. But codebases that are messes at a deep level are less likely to be cleaned up if they look clean at a superficial level. And all codebases eventually turn into messes, the way we've done things so far.
Sure, directories can be used badly (and the Java and Ruby projects I've seen are often very difficult to navigate because so much is the structure of the language, rather than the structure of the project).
But they can also be used well, and the promise of a meaningful directory structure is really appealing.
Here's a question, given all of what I understand about your layered way of structuring code: how easy is it to spin off logical portions of a project? If my project accretes its own HTTP server and I want to turn that into a stand alone library, could I? Would it be possible to "rewrite history" so that the HTTP-related changes never appeared in the layers to start with?
Yeah, that's a good question. Even though the mu core is at the top directory, there are a couple of apps in sub-directories of their own. To run a single-file mu program you say:
$ ./mu factorial.mu
To run a more complex app, you just give a directory name rather than a filename:
$ ./mu edit
Files inside the directory are loaded in numeric order just like at the top level. You can also run just a subset of the layers for the editor:
$ ./mu test edit/001*
$ ./mu test edit/00[12]*
$ ./mu test edit/00[1-3]*
(Notice how the number of tests/dots grows at each layer.)
Since the layers are just regular files there's nothing stopping you from rewriting history as much as you want. That ability was precisely what I built them for.
I've only recently started using directories, so I'm sure there's stuff here I haven't considered. Feedback most welcome.
I think the reason people (I put myself in this camp as well) prefer directories for organization is that there is an entire ecosystem built around them. Backing up, recursive search, exploring via a UI, and programmatic access using the language of one's choice are all available to the user. Code organization isn't the only place this shows up; organizing music and photos has similar semantics, and apps that try to do the organization via higher-level metadata rarely put in the engineering effort to come up with robust solutions for all the use cases I cited above.
It's not the same. I can open, process, or skim a directory containing a thousand files without much trouble or CPU activity. Having a thousand files' worth of information in one file means I have to run through all that data at once. This has implications for data integrity as well, if I'm making changes or worried about bitrot.
So, no, there's a difference. I'll also note that Daniel Bernstein of qmail & NaCl often used a filesystem how some use databases under the philosophy of "Why create new, complex functionality when you have well-tested code that gets the job done?"
I have to admit, I like long files. Arc's compiler is all in one file (https://github.com/arclanguage/anarki/blob/master/ac.scm) and it was a big influence on me. It was just so great to be able to have it open on one side while I wrote programs in Arc.
You're right that there are points beyond which a single file becomes too unwieldy, but with a good editor that limit is quite high for me. Maybe 10k LoC. Directories can grow even larger before they start having problems.
His idea is that the way to understand a large computer program is by running it, testing it. So he's made (from what I understand) a very nice UI for introspection of a running computer program. You can see what is happening while it's running.
Designing a programming language is kind of like a 'rite of passage.' I think most of us have at least thought about how to make languages better.
I like your idea, and I think there's room for improved introspection in debugging.
At the same time, I think that if a program can't be understood without being executed, then it's already lost (in terms of readability).
Certainly, for example, if you have a threaded program, you need to be able to visually inspect and verify that there will be no deadlocks or race conditions when the program is run, because you won't be able to test every possible condition in a debugger.
It might partly be a personality-type thing. I can't imagine visually inspecting and verifying anything if I didn't write it to begin with. I wouldn't know where to begin thinking of possible ways to break it. On the other hand, it seems far more natural for the author to say, "here are the race conditions I considered, these points where a context switch would be maximally inconvenient. And still, lo, the program works." This way the reader doesn't have to recreate such situations from scratch. He or she just has to verify the provided situations, or notice a scenario that was missed. That seems like an easier ask of someone new to the project.
(Mu scenarios can't insert context switches yet, but it's planned.)
Perhaps it would help to think of the dichotomy as between the rules and the state space of inputs they handle, rather than between reading and running. Seemingly simple code can often hide surprising subtleties. Why is this line written just like so and not thus? How does everything turn out just right in this one situation? Tests help to record the right questions for the reader to ask.
> I can't imagine visually inspecting and verifying anything if I didn't write it to begin with.
It's a skill, you need to develop it. You develop it by doing it :)
> I wouldn't know where to begin thinking of possible ways to break it.
For deadlocks, look at every single lock used. Show that one of the Coffman conditions doesn't apply, and you've proven that there can never be a deadlock.
For race conditions, look at every piece of shared memory (or other shared resource). That is where race conditions always occur.
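As a concrete instance of both checks (Python; the names are mine, purely for illustration): a fixed global lock order breaks the circular-wait Coffman condition, so deadlock becomes impossible, and routing every access to the shared state through those locks removes the race.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []  # shared memory: only ever touched while holding both locks

# Rule: every thread acquires lock_a before lock_b, never the reverse.
# A fixed global order breaks the circular-wait Coffman condition,
# so no deadlock can occur; holding the locks removes the race.
def worker(n):
    with lock_a:
        with lock_b:
            results.append(n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Auditing a codebase this way means checking that every thread follows the same order and that no shared resource is touched outside its lock.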
For understanding a program, the key is to understand the structure. That is why people put things in subdirectories, to help communicate the structure of the program, and show which things are closely related.
Do you really look at every single piece of shared memory? When you inherit a large codebase from someone? How do you reconcile that with having to deliver changes in the first weeks? Everywhere I've been, people half-ass these things, with inevitable bugs. I think you're under-estimating the possibility that you're just a better programmer than me :/
> Do you really look at every single piece of shared memory?
Yes
> When you inherit a large codebase from someone?
If the codebase has lots of threading errors, then yes, it's the only way. If it's a large program, it can take months to go through and check every piece of shared memory (and removing threads along the way, removing shared stuff); but the alternative could take years.
In one case, I moved every single lock/unlock to the top of a file so I could quickly see all the locks and unlocks, and that each lock had a matching unlock, even in error conditions.
How did you get into such issues? Were there classes where you got started, or was it all on the job?
I think I'm not concerned so much about inheriting a codebase with lots of threading errors. It's more about a codebase that's almost perfectly right, but where I'm scared to change anything because I don't have a big-picture model of the concurrency in my head..
Do you have any code samples I can try to learn from? For example, I'm not sure how you would move lock/unlock pairs to the top of a file from different functions/scopes. Unless you were doing some sort of literate programming as well?
> How did you get into such issues? Were there classes where you got started, or was it all on the job?
I took the usual college classes dealing with concurrency (in my school it was in the OS class), but it took several years in the industry to really feel confident with threads (it took less time to feel confident with networking). I wrote down my knowledge (for what it's worth) in this book: http://www.amazon.com/dp/0996193308
> I'm not sure how you would move lock/unlock pairs to the top of a file from different functions/scopes.
That solution might not work in every case. It worked in the particular case I was referring to.
> It's more about a codebase that's almost perfectly right, but where I'm scared to change anything because I don't have a big-picture model of the concurrency in my head..
Hmmmmm, that's an interesting question. Usually when the codebase is written, the person who wrote it had an idea in his head that, "this is how things will be locked to avoid problems." I try to figure out what that idea was.
Sometimes there is a critical 'zone,' where a thread acquires the lock when it enters and releases it when it leaves. For example, it could acquire the lock when it enters a class method, and release it when it leaves; then the class becomes the critical zone.
Maybe learning to think of 'critical zones' is the most important skill to understanding the big picture?
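That 'critical zone' idea is essentially the monitor pattern: every method acquires the same lock on entry and releases it on exit, so the whole object becomes the zone. A minimal sketch (Python; the class and names are mine):

```python
import threading

class Counter:
    """Monitor-style class: every method runs under the same lock,
    so the whole object is the critical 'zone'."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:      # lock acquired on entering the method...
            self._value += 1  # ...and released automatically on leaving

    def value(self):
        with self._lock:
            return self._value

c = Counter()

def hammer():
    for _ in range(1000):
        c.increment()

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the four threads' read-modify-write increments could interleave and lose updates; with the monitor, the total always comes out exact.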
I think the project would benefit from a Terminology / Definitions section which lists in one table the names and crisp definitions of the concepts being introduced. It appears that you are using non standard terms so having a single list of the terms would allow readers to have that list open when they are reading through the layers.
That's a good idea. Might http://akkartik.github.io/mu/html/010vm.cc.html be such a big-picture map? Let me think about how to improve it; I haven't touched it in a while. Further suggestions most welcome.
I was thinking something much simpler. If you were writing a book about Mu what would be the definitions you would print on the back inner cover? A table with two columns - first column being a term and the second column containing its description in one or two sentences.
Could you elaborate on the target audience for the doc you linked to? I am unable to make out whether that doc is intended for serious programmers in other languages coming across Mu for the first time or whether it's intended for newcomers to programming.
Yeah the intended audience is existing C++ programmers coming across Mu for the first time. For teaching programming I mostly imagine I'll be doing it in person. Mu isn't yet ready for learning programming unassisted.
It translates to assembly/machine code with a lot less code. I don't need a compiler, I don't need to write optimizations.[1]
I actually don't consider Mu to be a language. (I'm the author.) It's a low-level starting point to explore ways to make the standard OS primitives more testable. I don't need it to look nice. Higher level languages can come later, once the foundations are in place.
I actually am growing increasingly suspicious of all our navel-gazing about syntax. We spend all our time in places like HN thinking about how to make small screenfuls of code look nice, but all that effort hasn't really translated into helping me understand the large-scale structure of random open-source codebases more easily.
Perhaps one way to think about it is that syntax helps insiders keep a codebase in their head. I'm more concerned with ways to help more outsiders import a codebase in the first place. Because people leave and move on, and it's very hard for software projects today to improve once their original authors move on. More on this: http://akkartik.name/post/readable-bad
(I'm susceptible to carpal tunnel, but it hasn't become a bigger issue. I take wrist breaks.)
[1] I can even imagine doing the translation in machine code, so that Mu then becomes self-hosting without needing any of the current stack. I might go down that road..
"I'm more concerned with ways to help more outsiders import a codebase in the first place."
This. The pain point that mu addresses is huge. I haven't really looked at the project in much depth yet, or played around with it, but I love that you are tackling this issue. I have spent way too much time wandering around a big code base trying to build mental models of what's going on, and thinking to myself the entire time "there's got to be a better way."
> I don't need a compiler, I don't need to write optimizations.
You've effectively created a sort of portable assembly language, but it's more verbose than usual assembly language ("sub eax, 3").
> but all that effort hasn't really translated into helping me understand the large-scale structure of random open-source codebases more easily.
I think the root of the problem is that software is being written to be more complex than it could/should be, so the solution is to encourage reducing complexity.
> You've effectively created a sort of portable assembly language
Exactly. The verbosity is mostly because of my teaching project. I think teaching assembly can be just as ergonomic as teaching a high-level language. Indeed, most of us from a generation ago learned programming using assembly. It just needs to get a little more ergonomic.
> the solution is to encourage reducing complexity.
Yup. The problem is that human beings have a _terrible_ track record at managing complexity. It's not just every software project ever; think about the creep of bureaucracy in ancient China, or the creep of legislation in ancient Rome. Every time we've created a repository of rules it's gotten complex (and then gamed by smart operators). I think the problem is that it's very hard to justify removing a rule, so such repositories grow monotonically. The only way to periodically prune unnecessary rules is to first track why you created them in the first place (https://en.wikipedia.org/wiki/Wikipedia:Chesterton's_fence). Hence: tests! I think they are the great advance bequeathed to the human race by software. And they're far more broadly applicable outside software -- though I have no idea how to apply them there. I wrote up some speculative ideas at http://www.ribbonfarm.com/2014/04/09/the-legibility-tradeoff a couple of years ago, but that piece isn't very clear.
While working on large code bases I have often (in half jest) wished for the concept of a runtime performance tax on code that is engineered poorly. Some variants of this could be
a. Imposing an exponentially increasing programmatic sleep on functions based on their length.
b. Database / filesystem / network requests that fetch identical resources multiple times in a program should again incur a slowdown.
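In the same half-joking spirit, variant (a) could be sketched as a decorator (Python; the threshold, the growth rate, and the use of bytecode length as a crude proxy for function length are all arbitrary choices of mine):

```python
import functools
import time

def length_tax(threshold=50, base=1.5):
    """Half-joking 'performance tax': sleep exponentially longer as a
    function's size (bytecode length, a crude proxy for line count)
    exceeds a threshold."""
    def decorate(fn):
        size = len(fn.__code__.co_code)      # bytecode bytes, not source lines
        excess = max(0, size - threshold)
        tax = (base ** excess - 1) / 1000.0  # seconds of imposed sleep
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            time.sleep(tax)
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@length_tax()
def tiny(x):
    # well under the threshold, so the tax is zero
    return x + 1
```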
This kind of consideration has led me down the road towards XML, because it's the syntax that is powerful enough to model documents that bundle a mix of schemas (including source formats with their own idiosyncratic syntax). It is not a syntax anyone wants to type in or read, but that one advantage counts for a lot, given the right problem domain, because it pushes the semantic content to the center.
Some say (for example Moore) that arithmetic is only a tiny part of programming and doesn't deserve special syntax (especially in a language that lacks C-style for loops).
Programming languages (at least the not-so-academic kind) need to mesh with the problem-spaces being solved, and there are definitely some contexts where most of the work is encoding and applying arithmetic rules.
I haven't been posting here long, so I wasn't sure if it was acceptable to type 'fuck.' So I self-censored by replacing it with asterisks, which hackernews promptly ate. So yes, you are right of course.
> [...] think of a tiny improvement to a program you use, clone its sources, orient yourself on its organization and make your tiny improvement, all in a single afternoon.
If we alter that to "three programs you use [...] all in a single afternoon", we can plonk that into my resume.