Any OpenCL/CUDA program for GPGPU would require this. Different GPUs have different characteristics (core count, available instructions, memory sizes and speeds) that need to be taken into account and can be optimized for. That is only possible once the target device is known, and it is not known until run-time.
This is why GPGPU programs often supply the kernel/shader as C-like source code or in an intermediate representation (such as vendor-specific assembly), and leave the final compilation step to the GPU driver.
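As a rough illustration, here is a minimal Python sketch of the host side of that idea. It does not talk to a real driver: `DeviceInfo`, `build_options` and the `WG_SIZE` macro are hypothetical names made up for this example. It only shows how a value that is known only at run-time could end up in the options string handed to the driver's compiler (in OpenCL that string would go to `clBuildProgram`).

```python
from dataclasses import dataclass

# The kernel ships as plain source text; WG_SIZE is left undefined on purpose
# and is filled in at run-time through a -D compiler option.
KERNEL_SOURCE = """
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void scale(__global float *data, const float factor) {
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


@dataclass
class DeviceInfo:
    """Hypothetical stand-in for what a driver reports about a device."""
    name: str
    max_work_group_size: int


def build_options(device: DeviceInfo, preferred: int = 256) -> str:
    """Pick a work-group size the device can actually run.

    The preferred size is an assumption of this sketch; it is capped
    at whatever limit the device reports.
    """
    wg_size = min(preferred, device.max_work_group_size)
    return f"-DWG_SIZE={wg_size}"


if __name__ == "__main__":
    # Two made-up devices with different limits.
    for dev in (DeviceInfo("big-gpu", 1024), DeviceInfo("small-gpu", 128)):
        print(dev.name, build_options(dev))
```

The same kernel source gets different options on the two made-up devices, which is exactly the part that cannot be decided ahead of time.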
Are there examples of other programs that require this?