I don't know Ada / SPARK, and I've been trying to figure this out. Based on the hallucinations I got from ChatGPT, it seems Ada itself is nowhere near as powerful as Rust in safety, while Ada with SPARK disallows some things I was considering to be quite basic, such as shared aliasing of data.
For example, it seems it's not possible to get a sub-string slice reference into an original unbounded string. In Rust, a &str -> &str signature is trivial.
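To make concrete what I mean by a &str -> &str signature, here is a minimal Rust sketch. The helper name `skip_prefix` is made up for illustration; the point is only that the returned slice borrows from the input and no bytes are copied.

```rust
/// Returns the part of `s` starting at byte offset `n`, borrowed from `s`.
/// `skip_prefix` is a hypothetical helper, not a standard library function.
fn skip_prefix(s: &str, n: usize) -> &str {
    // `get` yields None if `n` is out of range or not on a char boundary,
    // in which case we fall back to an empty slice instead of panicking.
    s.get(n..).unwrap_or("")
}

fn main() {
    let original = String::from("hello world");
    let tail = skip_prefix(&original, 6);
    println!("{tail}");

    // The slice points into the original allocation, 6 bytes in.
    assert_eq!(tail.as_ptr(), original.as_ptr().wrapping_add(6));
}
```

The result is just a pointer plus a length, and the borrow checker ties its lifetime to `original`.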
So it seems Ada still relies on discipline, while SPARK does not have the zero-cost abstractions that C++ and Rust have.
If that's true (is it?), then I'd definitely choose C++/Rust over Ada any time, since performance is very important to me.
Both Ada and SPARK have zero-cost abstractions; they're designed to run on embedded platforms.
SPARK is a different use case from Rust: it's a full prover, and the goal is formal verification, typically in contexts where human life is at stake (say, you're writing software for an artificial heart, to take an extreme example). This comes at the cost of being less flexible, but they've been slowly evolving SPARK so that it can handle increasingly complex cases.
Less ChatGPT and more language reference manuals. ChatGPT isn't an ancient oracle that knows it all, even though Microsoft's marketing sells it as such.
Ada has as many zero-cost abstractions as C++ and Rust have, and one purpose of the Ravenscar profile is precisely to specify what to turn off for bare-metal and real-time OS deployments.
By letting the compilers decide the best way to implement them.
I also find this point of view on Ada funny, given the poor examples WG21 has added to the C++ standard library, which will never be fixed because of the never-ending ABI drama.
But doesn't this copy the entirety of that slice? That's not what I meant, I was referring to a shared reference, akin to &str in Rust or std::string_view in C++.
Doing it normally should create a copy of the value, as far as I can tell. Unless you use it to create a renamed variable or by-reference parameter, the same way you can create references in C++.
I think the closest thing to a &str in Ada would be an "access String" or "access constant String", which you would get either from an allocated "String" or from a declared "aliased String". You'd create a subslice with "string(x..y)'Access". Though I'm not sure whether that actually works without explicitly declaring an array of "aliased Character", the Manual is dense (as with C++, it's nearly meaningless unless you already know what it's supposed to mean) and the tutorials generally avoid talking about access objects.
Thank you! I assume 'Access types are not memory-safe, right? Is there a SPARK equivalent which still does not copy (i.e. it only references) that is memory safe?
To be more specific, how would one implement something like a "Get_First_Word" or "Trim_Whitespace" without copying?
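For comparison, this is roughly what I have in mind on the Rust side. It is only a sketch: the function names mirror the ones above and are not from any library, and both functions return slices that borrow from the input rather than copying it.

```rust
/// Returns the first whitespace-delimited word of `s`, borrowed from `s`.
/// Returns an empty slice if `s` contains no word.
fn get_first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Returns `s` without leading and trailing whitespace, borrowed from `s`.
fn trim_whitespace(s: &str) -> &str {
    s.trim()
}

fn main() {
    let line = String::from("  hello world \n");
    let word = get_first_word(&line);
    let trimmed = trim_whitespace(&line);
    println!("{word:?} {trimmed:?}");

    // Both results start 2 bytes into the original buffer: nothing was copied.
    assert_eq!(word.as_ptr(), line.as_ptr().wrapping_add(2));
    assert_eq!(trimmed.as_ptr(), line.as_ptr().wrapping_add(2));
}
```

What I'm after is the Ada or SPARK equivalent of this: a checked, non-owning view into the same buffer.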
I think it's supposed to be safe in Ada by default? As far as I can tell, basic allocated objects cannot be deallocated without an "unchecked" operation, and access objects created from "aliased" declarations are subject to scoping rules [0]. (If you want to know the full details, go figure out how "accessibility levels" are supposed to work [1].) It should preclude functions from returning access objects without a "prefix'Unchecked_Access" operation. I'm not sure how SPARK's borrowing system is tied into all of this.
From the documentation you linked, it seems slicing creates a brand new String, which is more like std::string than std::string_view. In other words, it allocates and copies all of the string's characters (although it might allocate on the stack and copy those bytes there).
Also, the Unbounded_String owns its copy of the data, as opposed to referencing other data. The difference with String seems to be just that it can grow. It's still more like std::string than std::string_view.
Note that both std::string and std::string_view are essentially just a pointer and a length (std::string also has a capacity, but let's ignore that). The difference is that duplicating a std::string also duplicates (deep-copies) the data behind the pointer, whereas duplicating a std::string_view does not.
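The same distinction can be sketched in Rust, where String plays roughly the role of std::string and &str the role of std::string_view. This is an analogy rather than a C++ example, and `shares_buffer` is a made-up helper that just compares data pointers.

```rust
/// Reports whether two string slices start at the same address.
/// Hypothetical helper for illustration only.
fn shares_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = String::from("hello");

    // Cloning the owning type deep-copies the bytes into a new allocation.
    let deep_copy = owned.clone();
    assert!(!shares_buffer(&owned, &deep_copy));

    // Copying the borrowed view only copies the pointer and the length.
    let view: &str = &owned;
    let view_copy = view;
    assert!(shares_buffer(view, view_copy));
    assert!(shares_buffer(view, &owned));

    println!("clone copies the data, a &str copy does not");
}
```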
Could you help me understand/interpret that link the same way you do, in case I'm missing something?
Allocating a new String requires... "new String". You issue the "new" command, or the source does.
But what Unbounded does is... "U.Reference (U.First .. U.Last)". It returns a reference. It's not duplicating, because that would defeat the point of its entire existence. It's the buffer, containing one or more string objects, and you're just slicing a reference out of it, because that's the point.
If you want a String, you need to allocate one.
  function Slice
    (Source : in Unbounded_String;
     Low    : in Positive;
     High   : in Natural)
     return String;
For this - there is no `out` marked. What you're grabbing is part of Unbounded, and the compiler won't let you deallocate it. Because it's owned, as a reference, to the Unbounded String.
For example, here's the actual GNAT source of the function [1]:
  function Slice
    (Source : Unbounded_String;
     Low    : Positive;
     High   : Natural) return String
  is
  begin
     --  Note: test of High > Length is in accordance with AI95-00128
     if Low - 1 > Source.Last or else High > Source.Last then
        raise Index_Error;
     else
        return Source.Reference (Low .. High);
     end if;
  end Slice;
A pre-existing reference is returned. There is no allocation that happens whatsoever.
I tried to get some sort of proof based on Godbolt, to see if it generates any memcpy's, but I couldn't manage to do that after quite a few tries. :(
It's really difficult to understand this given how much I know about Ada, so the best I can do is to keep throwing questions at ChatGPT. And I keep getting results that go against what you said.
I've also tried a couple direct review examples from o4-mini-high, one without the documentation link [1] and one with it [2].
It matches what I've managed to learn as well. I know how LLMs work and that they hallucinate a lot, so I can't tell who here is wrong, since you seem to be really experienced, and I barely know anything... what are your thoughts?
Oh, and I really appreciate you walking me through this! Like, a lot a lot! Thank you very much!
Never, ever, throw any non-mainstream language at an LLM. You will get absolutely nothing but bullshit back. They do not comprehend, and so they cannot move away from generalisation to actually speak about the language. There is not a large enough public source to train the model.
If you're throwing something to a con-artist, don't be surprised if everything you get back does not line up with reality.
I see it does what you're saying on the "else" branch, but that just returns the previous string unmodified, which makes sense. The more important one is the "if" branch though.
Looking at your Godbolt example's assembly, even if I add -O3, I see it does a call to "ada__strings__unbounded__unbounded_slice", but until I know the contents of that I can't say whether the pointer it returns is derived from the same allocation as the original one, or from a new allocation that the string was copied to.
You're using the Unbounded_Slice function [1], which calls To_Unbounded_String [2], which calls `new String` [3], which, as you mentioned in a previous comment, allocates. Right?
The kind of operation I'm looking for is something like a "Trim_Whitespace" function that re-uses the old allocation without copying all the data, even when there is whitespace to trim.
(The exact definition of a string is implementation-defined. But that's the concept.)
Ada enforces safe ranges, which means you need to carry the length of the slice somehow. It does not use C's 0-terminated strings. So slicing does not work the same way as strtok or other self-modifying systems - the length isn't guessed, it's known.
But if you change one character in the buffer of the slice, it'll be changed in the original Unbounded_String too.
For trimming whitespace, you're right that Unbounded's standard Trim may reallocate. It carries multiple buffers, and when you Trim, sometimes it will just hand the original back and other times it'll reallocate. [0] That's mostly a performance tradeoff: keeping the original can make iteration slower, as it holds multiple buffers.
So, to implement our own - with one caveat. Slice can't handle 0-length, because range safety is enforced. So in the case of a wholly whitespace string, we'll be doing a whole new allocation.
  --  This line is just for pasting into godbolt
  pragma Source_File_Name (NTrim, Body_File_Name => "example.adb");

  with Ada.Strings.Unbounded;
  with Ada.Strings.Maps;
  use Ada.Strings.Unbounded;

  function NTrim (Source : Unbounded_String) return Unbounded_String is
     Len         : constant Natural := Length (Source);
     First, Last : Natural;
     Whitespace  : constant Ada.Strings.Maps.Character_Set :=
       Ada.Strings.Maps.To_Set (" " & ASCII.HT & ASCII.LF & ASCII.CR);
  begin
     --  Nothing to trim in an empty string
     if Len = 0 then
        return Source;
     end if;

     --  Skip leading whitespace
     First := 1;
     while First <= Len
       and then Ada.Strings.Maps.Is_In (Element (Source, First), Whitespace)
     loop
        First := First + 1;
     end loop;

     --  Skip trailing whitespace
     Last := Len;
     while Last >= First
       and then Ada.Strings.Maps.Is_In (Element (Source, Last), Whitespace)
     loop
        Last := Last - 1;
     end loop;

     --  The string was all whitespace
     if First > Last then
        return To_Unbounded_String ("");
     end if;

     return Unbounded_Slice (Source, First, Last);
  end NTrim;
The resulting compilation [1] has a few things. Our whitespace map gets allocated and deallocated most of the time. A map is harder to treat as a constant, and the compiler doesn't always optimise that nicely. Most of the code is bounds checking. No off-by-one allowed, here. Where first is greater than last, you get a new full allocation.
A few days ago, I had ChatGPT compare Rust and Ada. It tended to penalize Ada for its runtime checks and access values (aka pointers). However, ChatGPT didn't account for the fact that many of Ada's runtime checks would need to be manually implemented by developers in other languages. An Ada compiler can often optimize these checks away, knowing where they're genuinely needed and where they can be removed. This often explains why speed comparisons between C and Ada code can be misleading, as they rarely factor in the extra manual effort required to make C code equivalently robust with the necessary safety checks.
Regarding access values, I listed out some of Ada's various restrictions. Its scope rules prevent referencing objects at deeper levels, objects must be explicitly marked aliased to create an access value to them, and there's far less need for access values (for instance, no pointers are needed to pass parameters by reference). Additionally, Ada offers the ability to dynamically size some objects and reclaim their memory without explicit memory allocation.
After I highlighted these details, ChatGPT admitted it had unfairly evaluated Ada, concluding it's a very safe and robust language, albeit using different techniques than Rust.