In a nutshell, differentiable programming is a programming paradigm in which your program itself can be differentiated. This allows you to set a certain objective you want to optimize, have your program automatically calculate the gradient of itself with regard to this objective, and then fine-tune itself in the direction of this gradient. This is exactly what you do when you train a neural network.
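To make that concrete, here's a minimal sketch of the idea in JAX (my choice of framework for illustration; the quote doesn't name one):

    import jax

    def loss(w):
        # Objective to minimize: how far w**2 is from 4.
        return (w ** 2 - 4.0) ** 2

    grad_loss = jax.grad(loss)   # the program, differentiated

    w = 3.0
    print(loss(w))               # 25.0
    print(grad_loss(w))          # 60.0
    w = w - 0.01 * grad_loss(w)  # one step "in the direction of the gradient"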
Isn't this just declarative programming as we know it from e.g. Prolog, SQL or other places where the programmer declares what their objective is, and it's left up to the interpreter, compiler or scheduler to figure out the best way to achieve that? And now that's being applied to ML (which probably makes sense, since it involves a lot of manual tweaking). Sounds like a great use case for a library, but hardly worthy of being called a new programming paradigm.
• Sure, it could come under the umbrella of "declarative programming", but that's an enormous umbrella, so that doesn't really say much.
• I fail to see how differentiable programming (the idea of expressing the desired computation in terms of differentiable objective functions) is any less of a "paradigm" than logic programming (the idea of expressing the desired computation in terms of logical predicates).
• Depending on the expressiveness of your programming language, every paradigm can seem like it's "a great use case for a library, but hardly worthy of being called a new programming paradigm."
Eh, differentiable programming is a lot simpler and more specific than declarative programming, and only marginally a "paradigm". It just means you can take any (reasonably deterministic and parametrized) function specified in the programming language and calculate its partial derivative wrt some parameter.
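For example, in JAX (one such framework, just as an illustration) you pick the parameter with argnums:

    import jax

    # A parametrized function f(x, theta).
    def f(x, theta):
        return theta * x ** 2

    df_dx = jax.grad(f, argnums=0)      # partial derivative wrt x:     2 * theta * x
    df_dtheta = jax.grad(f, argnums=1)  # partial derivative wrt theta: x ** 2

    print(df_dx(3.0, 2.0))      # 12.0
    print(df_dtheta(3.0, 2.0))  # 9.0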
Is the presumption that these parameters are floats, or at least numeric? It seems like if I took the functions from some random program (e.g., GNU diff), most would not be meaningfully differentiable. Or perhaps I'm missing something?
Yes, the parameters you're differentiating with respect to need to be floats.
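JAX, for instance, enforces this at the API level (again, just an example framework):

    import jax

    square = lambda x: x ** 2

    print(jax.grad(square)(2.0))  # fine: float input, prints 4.0
    # jax.grad(square)(2) raises a TypeError: grad requires
    # real- or complex-valued inputs, not integers.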
Though there might be potential for extending the frameworks to e.g. differential cryptanalysis; I'm not knowledgeable enough about it to say how much of that can be done programmatically.
It's not Prolog/SQL-style declarative programming; the part about setting the objective and calculating the gradient is an example use case, not the whole thing.
You'd like to be able to write your functions using the language's standard function syntax, but have access to both the function and its differentiated form. You can achieve this with macros (in an ad-hoc way), or as a custom language feature (what's being done here, again kinda ad-hoc), or with an algebra-plus-interpreter style, at the cost of a less natural syntax (Haskell do notation or similar). The thing that I'd say is closest to the ideal solution is something like stage polymorphism (which is genuinely an exciting new paradigm in my book: it squares the circle of macros versus strong typing). There, you can write a function definition in natural function syntax and have access both to the function itself and to the AST of the function in a much richer form than what a macro gets, and that AST can be interpreted to produce a differentiated version of the function.
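JAX's tracing is one concrete approximation of that last idea (my example, not something the parent mentions): you write a plain function, and the framework can hand you both the function and a typed IR of it, which is what gets interpreted to build the derivative:

    import jax

    def f(x):
        return x * x + 3.0 * x

    print(f(2.0))                  # the function itself: 10.0
    print(jax.make_jaxpr(f)(2.0))  # a typed IR of f, richer than raw macro syntax
    print(jax.grad(f)(2.0))        # the differentiated form: 2*x + 3 = 7.0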
This is the declarative paradigm in the same way that Python maintaining an abstract representation of your code, which it then hands to an interpreter written in C, is a "declarative" paradigm.
Sure, there are some analogies, but it is stretching the definition a bit.
I'm no expert, but isn't there a case for making it a language feature rather than a library? When you use a library you effectively have a DSL for constructing your differentiable program, and you lose the facilities of the host language.
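For what it's worth, tracing-based systems like JAX (my example) try to have it both ways: you keep ordinary host-language control flow while still getting derivatives:

    import jax

    # Plain Python control flow, not a graph-building DSL; grad traces
    # with concrete values, so the `if` just works.
    def f(x):
        if x > 0:
            return x ** 2
        return -x

    print(jax.grad(f)(3.0))   # 6.0  (takes the x**2 branch)
    print(jax.grad(f)(-3.0))  # -1.0 (takes the -x branch)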
It's not quite declarative programming either. In the same way that you build ordinary executable programs out of smaller executable parts, you build differentiable programs out of smaller differentiable parts. You aren't declaring what outcome you want; you are simply restricting what you build your program out of, so that it has properties that allow you to interpret the program differently from how it will run "normally".
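A sketch of that composition idea (JAX again, purely as an example):

    import jax
    import jax.numpy as jnp

    # Smaller differentiable parts...
    def scale(x):
        return 2.0 * x

    def shift(x):
        return x + 1.0

    # ...composed into a differentiable program.
    def program(x):
        return jnp.sin(shift(scale(x)))

    print(program(0.0))            # run it "normally": sin(1.0)
    print(jax.grad(program)(0.0))  # reinterpret it: 2 * cos(1.0), via the chain rule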
I think I can explain a little, but the caveat is that it's a bit of the blind leading the blind, i.e. this isn't my field, but I work with people who are working on this and they have talked to me a little about it.
Declarative programming is more or less orthogonal to differentiable programming. You're right that declarative as a paradigm leaves the implementation details up to the "compiler," so you can have arbitrary implementations created in response to one declared specification. And often that means the compiler can be tweaked under the hood to output better results. But the thing is that those compiler changes and optimizations are just that: arbitrary. You can't "know" or "prove" anything about them, and that means it mostly requires human creativity and intelligence to make progress.
That is fine and all, but differentiable programming is asking/answering the question: Ok, but what if we COULD prove something here? What if the units of computation could be guaranteed to have certain mathematical properties that make them isomorphic to other formalisms?
And why do we care about that?
Well, in math what happens is that some people will start with their favorite formalism and prove a bunch of stuff about them, and figure out how to do interesting calculations or transformations on them. Like "Ah, well if you have a piece of data that conforms strictly to the following limitations, then from this alone we can calculate this very interesting answer to a very interesting question."
But a lot of the time those mathematical paradigms can't talk to each other -- in programming terms, their APIs just aren't compatible. Like raster vs vector images. Both image storage/display paradigms are "about" the same thing, but a lot of the operations you can do on one don't even make sense to try on the other, and our ability to translate back and forth between them is a little wonky. Math formalisms are a bit like that, a lot of the time.
So it's very interesting in math world when someone proves that a formalism in one paradigm can be transformed perfectly into a formalism from another paradigm. All of a sudden all the operations available in either paradigm become available in both paradigms because you can always just take your data, transform it, do the operation, then transform the answer back.
(Side note: this is why some people are excited about Category Theory: it's like a mathematical Rosetta Stone. I.e. a lot of things can be translated into and out of category theory, and in turn all those things that were previously in separate magisteria become interchangeable.)
Ok so, back to differentiable programming. If you suddenly have a way to conceive of your program / unit of computation as a differentiable function, then right off the bat you get access to all the tools ever created for calculus. The optimization thing where you find the gradient of the program and follow it toward some target is just one of the things. You also get a huge suite of tools that let you enter the world of provably correct programs, for example.
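That optimization use looks like this in practice (a toy sketch; the target and step size are made up for illustration):

    import jax

    TARGET = 5.0  # made-up target for illustration

    def objective(w):
        return (w - TARGET) ** 2

    grad_fn = jax.grad(objective)

    w = 0.0
    for _ in range(100):
        w = w - 0.1 * grad_fn(w)  # follow the gradient toward the target

    print(w)  # ~5.0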
You also get access to all the tools of all the math that can be translated to and from calculus, which is... a lot of them. I wish I knew more about math so I could rattle off the 100 ways that would help, but I can't, so instead I'll just say that I think it would be a game-changer for creating optimized, robust systems that work way, way better than our current tech.