Some things to note:
- This work was completed last year; it is impressive now, and was even more so at the time
- No code or models were released, but several of the authors have since moved to Stability AI and are working on their own improved open video models, which is encouraging as the field continues to advance
- The paper builds on existing image models as its base, so a stronger base model (such as the new Stable Diffusion XL variant, or Midjourney's underlying model) should give even better results.