Sure, it would bloat the code a little to inline the optimized version, but it could be done in tight inner loops if required.