Eventually people will have enough of Little Bobby Tables and url spoofing and then query engines won’t allow string concatenation at all.
The only alternative I know of is to make a query engine that exactly emulates the String Interpolation syntax of the host language and can detect string concatenation in the inputs.
But the problem with non-builders is always going to be GraphQL and advanced search boxes, where there are any of a couple dozen possible parameters and you either build one query that returns * for every unused clause or you have a factorial number of possible queries. If you don’t use a builder then Bobby always shows up. He even shows up sometimes with a builder.
Something I’ve ran into a lot over the years is people not realising that (at least in MySQL) prepared statement placeholders can only be used for values, not identifiers like column names.
Because many query builders abstract away the creation of a prepared statement, people pass variables directly into column fields and introduce injection vulns.
Number one place I see this is data tables: you have some fancy table component where the user can control which columns to see and which to sort by. If you’re not checking these against a known good allow list, you’re gonna have a bad time.
Yes, it varies by database whether placeholders can be used for table names.
Personally I find table names sufficiently non-dynamic that an enum of accepted values & string concatenation works. Whenever I've wanted to template user input into table names, I've realised that I should probably refactor my schema or add some views.
Your post made me realize that custom string interpolation handlers from C# could probably be used to do injection-safe SQL querying (https://learn.microsoft.com/en-us/dotnet/csharp/advanced-top...). Makes me wonder whether scenarios like that were considered when they built that feature or if it's just a fun coincidence from making string interpolation extensible.
Interpolated string handlers are a newer API made primarily with customization and efficiency of interpolation in mind. They are more complicated to implement and use but enable zero-cost and/or zero-allocation use cases and performing custom behavior on appending literals or data. They are also used to implement the default string interpolation handler and a few auxiliary ones like Append handler for a string builder which writes directly into it, bypassing intermediate string construction.
It's quite a mouthful to say though but very useful for advanced scenarios. Haven't seen any other language provide anything comparable.
Javascript added 'tagged templates' a while ago that allows you write a function that gets passed the 'parse tree' of a template string, so js libs do a lot of 'interpolation magic' for sql queries
I mean, in C++ (17? 20? Whenever constexpr was introduced) it's totally possible to create a library that allows you to build a SQL query via the language's string concatenation libraries/etc., but only allows you to do it with static strings unless you use ~shenanigans. (C++ unfortunately always allows ~shenanigans...)
I guess you do wind up needing to potentially re-implement some basic things (or I guess more complex, if you want format string support too). But for basic string concatenation & interpolation, it's reasonable.
That's a pretty useful way to get basic string concatenation while also preventing it from creating opportunities for SQL injection.
For example, you have a class that requires a constexpr input & can be appended to/concatenated/etc.:
SqlStringPart(constexpr ...)
operator+(SqlStringPart ...)
(so on)
And you have a Query API that only takes SQL string expressions that are built out of compile time constants + parameters:
SqlQuery(SqlStringPart ..., Parameters ...);
This doesn't solve the problem mentioned in the article around pagination & memory usage, but at least it avoids letting someone run arbitrary SQL on your database.
The only alternative I know of is to make a query engine that exactly emulates the String Interpolation syntax of the host language and can detect string concatenation in the inputs.
But the problem with non-builders is always going to be GraphQL and advanced search boxes, where there are any of a couple dozen possible parameters and you either build one query that returns * for every unused clause or you have a factorial number of possible queries. If you don’t use a builder then Bobby always shows up. He even shows up sometimes with a builder.