Personally I think the problem starts at “shared understanding” and I think the most promising solution lies within combining ontologies (such as ones based on BFO, perhaps) with Abstract Syntax Trees or Concrete Syntax Trees.
I suspect future development will involve merging tests, as examples of program functionality, with data models and call graphs derived from, and annotated against, common ontologies.
We would have, in this future, the ability to “translate” programs from one language to another the way Google Translate does: not as faithfully as someone fluent in the target language’s native idioms, but as if one had a dictionary of words and phrases and their definitions and could translate snippets, relating an unfamiliar codebase to familiar patterns.
This would be clearer if tagging code with ontology terms were baked into the language the way type annotations are baked into TypeScript.
And TypeScript itself is an excellent example of how, even when fine-grained annotations are useful mainly to programmers actively developing the program, they are still very much a win.
I find myself frustrated now that I can’t always alias string to a custom type when I want a string’s type to express meaning; similarly, not every language supports string literal types.
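TypeScript can approximate both ideas today. A minimal sketch (the type names are illustrative): a “branded” string type gives a plain string nominal meaning, and a string literal union restricts which exact strings a function accepts.

```typescript
// Branded (nominal) string types: a string tagged with a phantom
// property so UserId and OrderId cannot be mixed up at compile time.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type UserId = Brand<string, "UserId">;
type OrderId = Brand<string, "OrderId">;

const userId = (s: string): UserId => s as UserId;

// String literal types: only these exact strings are allowed.
type LogLevel = "debug" | "info" | "warn" | "error";

function log(level: LogLevel, msg: string): string {
  return `[${level}] ${msg}`;
}

// log("verbose", "x") would be a compile-time error;
// passing a UserId where an OrderId is expected would be too.
const line = log("info", "started");
```

At runtime a UserId is still just a string; the meaning lives entirely in the type layer, which is exactly the kind of cheap, erasable annotation being argued for here.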
Languages are more expressive about types than they used to be, and it’s possible to derive new types at compile time, so languages have themselves become more flexible.
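As a concrete sketch of compile-time type derivation, TypeScript’s template literal and mapped types can compute new string types from existing ones with no runtime cost (the event names here are made up):

```typescript
// Types computed at compile time: a template literal type builds new
// string types, and a mapped type derives a whole record shape.
type EventName = "click" | "focus";
type HandlerName = `on${Capitalize<EventName>}`; // "onClick" | "onFocus"

type Handlers = { [E in EventName as `on${Capitalize<E>}`]: () => void };

const handlers: Handlers = {
  onClick: () => {},
  onFocus: () => {},
};

// The same derivation at runtime, for comparison:
const handlerName = (e: EventName): HandlerName =>
  `on${e[0].toUpperCase()}${e.slice(1)}` as HandlerName;
```

The compiler checks that the runtime object and the derived type stay in sync, which is a small taste of the model-to-code bidirectionality discussed below.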
The ultimate goal would be to encode the system model so concisely that you would want to reuse the model or ontology across a number of systems, yet maintain a bidirectional relationship: if your database or a third-party system adds a constraint to the model, the model reflects that automatically. And vice versa: if your model incorrectly encodes the real world, you should be able to refactor your programs by changing the model.
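One half of that bidirectionality can be sketched today: a single model definition from which both the static type and a runtime validator are derived, so a change to the model ripples into both. The field names here are hypothetical.

```typescript
// One model definition, two derived artifacts: a static type and a
// runtime validator that can never drift apart from each other.
const UserModel = {
  name: "string",
  age: "number",
} as const;

type FieldType = { string: string; number: number };
type FromModel<M extends Record<string, keyof FieldType>> = {
  [K in keyof M]: FieldType[M[K]];
};
type User = FromModel<typeof UserModel>; // { name: string; age: number }

function validate(model: typeof UserModel, value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  return Object.entries(model).every(
    ([key, kind]) => typeof (value as Record<string, unknown>)[key] === kind
  );
}
```

Adding a field to UserModel changes the User type and the validator at once; the missing half, propagating constraints back from a database or third party into the model, is the genuinely hard part.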
I suppose that to make this ontology-based solution a bit easier, it should be broken into two parts: an ontology of computer software and hardware terms (based on the ISO-standardized BFO, say), and a separate ontology representing the program’s problem domain, which often lies outside computer science.
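A minimal sketch of such dual tagging, where each function carries one term from a software ontology and one from the problem domain. The term strings below are invented for illustration, not real BFO identifiers.

```typescript
// A hypothetical tagging scheme: a side table associating functions
// with one software-ontology term and one domain-ontology term.
type OntologyTag = {
  software: string; // e.g. "sw:PureFunction" (invented term)
  domain: string;   // e.g. "finance:InterestCalculation" (invented term)
};

const tags = new Map<Function, OntologyTag>();

function tag(fn: Function, t: OntologyTag): void {
  tags.set(fn, t);
}

function monthlyInterest(principal: number, annualRate: number): number {
  return (principal * annualRate) / 12;
}

tag(monthlyInterest, {
  software: "sw:PureFunction",
  domain: "finance:InterestCalculation",
});
```

A side table like this is the weakest possible binding; the argument above is precisely that such tags should instead be first-class syntax, checked by the compiler.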
There is something of a flaw in this logic: models rarely map exactly to the real world, so while you can annotate or tag software, nothing can save you from a bad model, or from one that needs to evolve.
To that end, ASTs and CSTs combined with automated code formatting can help again. There are programs that can mutate a program until its tests pass, automatically suggesting fixes, and programs you can write to rewrite other programs automatically.
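Programs rewriting programs can be shown in miniature without any compiler machinery: a toy expression AST and a transform that constant-folds additions, the same shape of operation a codemod performs on a real syntax tree.

```typescript
// A toy AST-to-AST transform: recursively constant-fold additions.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; left: Expr; right: Expr };

function fold(e: Expr): Expr {
  if (e.kind === "num") return e;
  const left = fold(e.left);
  const right = fold(e.right);
  // If both sides reduced to literals, replace the addition node.
  if (left.kind === "num" && right.kind === "num") {
    return { kind: "num", value: left.value + right.value };
  }
  return { kind: "add", left, right };
}
```

Real-world versions of this idea operate on the trees produced by parsers such as the TypeScript compiler API, but the principle, walk the tree and emit a rewritten tree, is the same.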
I actually think one part of the article has aged poorly: the section claiming program modifications are hard to do at scale. These days program changes can be trivial at scale, assuming you can avoid PR merge conflicts, of course.
The tough part is ensuring you have enough knowledge of what the program is currently doing, its current behaviours and environment, as well as what it was meant to do.
One last thing: a program may well be entirely theory rather than code, but as a counterpoint, any behaviour left undefined by the program’s model or spec will eventually be relied upon by somebody, at scale.
Which is another way of saying that sometimes a program dies because it is adopted too widely and thus can never evolve without confusing everyone and everything that uses it.
This is perhaps an argument that programs should constantly evolve and be built for evolution, and that models should be too. If so, git and GitHub help tremendously, but we don’t have enough similar tools for modelling and ontology yet. We don’t have a standardized equivalent of git or TypeScript for adding model annotations to source code, syntax trees, or derived program artifacts. Commit messages help, but only a bit; they aren’t descriptive enough. Can we relate a commit to a model change? Or to a production incident? How interlinked, yet machine-interpretable, are our models and their corresponding representations in code and in commit history?
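One low-tech sketch of relating commits to models: a commit-message trailer convention, parsed mechanically. The trailer names and identifiers below are invented; real tooling would build on git’s own trailer support rather than this naive line scan.

```typescript
// Parse "Key: value" trailer lines from a commit message so a commit
// can be mechanically linked to a model change or an incident.
function parseTrailers(message: string): Record<string, string> {
  const trailers: Record<string, string> = {};
  for (const line of message.split("\n")) {
    const m = line.match(/^([A-Za-z-]+):\s*(.+)$/);
    if (m) trailers[m[1]] = m[2];
  }
  return trailers;
}

// A hypothetical commit message using invented trailer names.
const commit = [
  "Tighten the order-total invariant",
  "",
  "Model-Change: ontology/order.ttl#TotalConstraint",
  "Incident: INC-2041",
].join("\n");
```

Even this much would let a pipeline answer “which commits touched which model terms,” which is a start on the interlinking asked for above.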
And finally, can we make models and ontologies easy to use and update with less training and distraction? Could a system be built to help reverse engineer models from code by illustrating possible shapes and a human then does the work of researching the correct details and aligning all possible representations into one derived model? I’m thinking of how human-computer systems generally outpace human or computer decision making alone. If so, we rely far too much on humans to understand models encoded in code today, and should shift that burden back to the machine as much as possible to instead assist us and where possible, spot mistakes in our models.
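A sketch of that reverse-engineering step: infer candidate field shapes from sample records, and leave the human the work of reconciling the candidates into one correct model.

```typescript
// Infer a field -> set-of-observed-types map from sample records,
// as a starting point for a human-refined model.
type Shape = Record<string, Set<string>>;

function inferShape(samples: Array<Record<string, unknown>>): Shape {
  const shape: Shape = {};
  for (const sample of samples) {
    for (const [key, value] of Object.entries(sample)) {
      (shape[key] ??= new Set()).add(value === null ? "null" : typeof value);
    }
  }
  return shape;
}
```

A field observed as both "number" and "string" flags exactly the kind of ambiguity a human should research, which is the human-computer division of labour being proposed.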
I fully agree about the role of an expressive test suite. One of the key insights of Kuhn as cited here is that a "theory" of gravitation requires examples like planetary motion and pendula. A UML Activity diagram can help to convey the application domain globally, but a good suite of unit tests helps me understand the micro-domain of a code base.