Seems super fast, some are saying 600x faster [0] than the version built from Google's paper, though it is a little less accurate. Point clouds are less useful, but some people on Reddit and the authors have tools to try to convert them to meshes [1][2]. It does feel like Stable Diffusion-level generation of good 3D assets is right around the corner. It will be interesting to see which tech wins out: some variant of depth estimation like SD2 and non-AI tools can do, object spinning/multi-angle views like Google's tool does, or whatever this tool does.
> The main problem with mesh generation from stuff like this is that usually the topology is a mess and needs a lot of cleanup to be usable. It's not quite so bad for static, non-deforming objects, but anything that needs to be animated/deformed or that is organic-looking would likely need retopologizing by hand.
>
> That's one of the worst parts of 3D modeling, so it's like you're getting the AI to do the fun part and leaving you to do all the boring cleanup.
From [1]. Seems like there is a pattern in AI generators of "AI asked to generate final results with only final results to learn from, then immediately asked for the apple in the picture". I suppose the lack of specialization in the application domains of NNs is a deliberate design choice for these high-profile projects, in a vague hope of simulating emergent behaviors as seen in nature and of avoiding becoming another expert system (while being one!), but that attitude seems to limit usefulness, here as elsewhere.
People developing these models are very aware of what 3D workflow is like.
The issue is that image->point cloud training data is very easy to get, whereas image or point cloud -> clean 3d mesh training data is very hard to get in unconstrained domains.
Generating point clouds is where the state of the art is now. That doesn't mean the field isn't fully aware that text->3D mesh would unlock many more capabilities.
Seems like video game engines and the like would be a useful way to get lots of 3D models paired with corresponding point cloud data. What's the blocker to doing that? The models shown on that page look like 3D graphics from circa the 2000s or earlier.
I agree that randomly sampling the surfaces of 3D meshes seems like a reasonable way to generate synthetic mesh -> point cloud data (a rough sketch of that kind of sampling is below the list).
Without knowing a dang thing about AI, it feels like the problem lies more in:
1. Math related to topology: vertices, faces, edges, tris vs. quads, etc.
2. Different topologies for the same object are better for different use cases. Rendering, skinning, morphing, physics, etc. all have different optimal topologies, and the definition of optimal varies based on workflow, scene specifics, or even the individual artist's topological preferences. In other words, I'm not even sure how much of 3D workflows are standardized -- getting the topological data for workflows is no easy task, and the output isn't very usable until it can plug right into a workflow and the existing DCC ecosystem.
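For the mesh -> point cloud sampling I mentioned above, here's a minimal sketch of area-weighted surface sampling, assuming meshes come as plain vertex/face NumPy arrays (every name here is just a placeholder, not anyone's actual pipeline):

    import numpy as np

    def sample_point_cloud(vertices, faces, n_points=2048, rng=None):
        """Sample points uniformly by area from a triangle mesh surface.

        vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
        Returns an (n_points, 3) array of surface samples.
        """
        if rng is None:
            rng = np.random.default_rng()
        tris = vertices[faces]  # (F, 3, 3) triangle corner coordinates
        # Triangle areas via the cross product; used to weight the sampling.
        cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
        areas = 0.5 * np.linalg.norm(cross, axis=1)
        # Pick triangles proportionally to area, then sample barycentric coords.
        idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
        u, v = rng.random(n_points), rng.random(n_points)
        outside = u + v > 1.0  # reflect samples that fall outside the triangle
        u[outside], v[outside] = 1.0 - u[outside], 1.0 - v[outside]
        a, b, c = tris[idx, 0], tris[idx, 1], tris[idx, 2]
        return a + u[:, None] * (b - a) + v[:, None] * (c - a)

That gives you mesh -> point cloud pairs cheaply; it's the inverse direction (and the topology issues above) that stay hard.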
text2img generates a static asset; text2mesh is far more interesting beyond just static rendering, which is where mesh topology becomes a big sticking point.
* There isn't software that generates point clouds from video games. This should be solvable but AFAIK hasn't been done yet.
* The diversity of models in video games is much lower than in the real world.
* Games use a bunch of techniques to reduce the poly count while making assets look high-poly (e.g., baking detail into textures/normal maps). It's unclear what should be generated here.
Or ask CG designers, with consent and credit, for recordings of their intermediate steps. Same for illustrations. It almost seems like circumventing experts is the point.
Don't human designers do image or point cloud -> clean 3D mesh in an iterative manner? I can see that it would be significantly more computationally expensive to have an NN iteratively deform a cube into a tree, but I don't see why it isn't a solution.
The thing is that it's been shown time and time again (with ChatGPT, for example) that you can get really good results by giving massive amounts of final results to the model. This approach is far better than anything we've ever had in either text or image generation AI.
It’s a fun demo. Worth noting that on mobile there was no button to download the generated point cloud data itself, at least not that I could find. Might be the same on desktop too.
Additionally, I think the time taken depends on the number of visitors. I had to wait about 7 minutes for it to finish.
Too many users. I don't know Hugging Face's rules, but they seem to cap how many resources each demo can use. When I ran it originally there were about 12 people using it; the queue now looks to be around 300, and Hugging Face doesn't spin up more instances. That being said, the model is relatively small and can be run locally with at least 5 GB of VRAM, according to the Stable Diffusion subreddit.
I understand this is point-cloud diffusion, but Ajay Jain et al. (BAIR + Google Research) accomplished the first version I saw of this back in June with their Dream Fields paper (CVPR '22).
As always, this goes to show that if you can't be the first, be the loudest. OpenAI has the most well-oiled media machine I've seen in a while.
Seeing the waves of publicity OpenAI gets with every new release, I think we're seeing a new model for big-tech AI research groups. It isn't enough to just hire world-class research talent that publishes area-defining papers. There has to be a commensurate investment in media to publicize the research. Obviously, if you don't have the research, you have nothing to market. But it should say something that OpenAI prioritizes great design, communication, and publicity in addition to its world-class research team. It wouldn't surprise me if we see Google AI / DeepMind / FAIR double down with their own investments to expand the media presence of their AI orgs.
Maybe the lack of commenter enthusiasm is because point clouds are fairly specialized. Most people don’t have interesting point cloud data lying around to test this with, or the means to capture such data.
3D sensors are slowly but surely becoming more common. The iPhone Pro series has one, and AR hardware designs tend to include these capabilities. So this model synthesis seems a bit ahead of the curve, in a good way.
Agree with you that point clouds aren't mainstream at all and most people aren't sure what they'd use them for.
I think the premise of this is text-to-3D, and that because generation is quicker you don't really need anything besides a GPU to start playing around with it.
Can't you easily convert point clouds to polygons? If you know the sampling frequency, it's trivial to identify holes/edges/faces and convert them to a rough polygonal model. Then you can run edge collapses with a desired error to make it smoother.
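For example, something along these lines with Open3D (just a sketch of the generic surface-reconstruction-plus-decimation route, not necessarily what's meant above; the file path and parameters are placeholders):

    import open3d as o3d

    # Load a point cloud (placeholder path) and estimate normals,
    # which Poisson surface reconstruction needs.
    pcd = o3d.io.read_point_cloud("cloud.ply")
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

    # Reconstruct a dense triangle mesh from the oriented points.
    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=8)

    # Simplify via quadric edge collapses down to a target triangle budget.
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=10000)
    o3d.io.write_triangle_mesh("mesh.obj", mesh)

The catch, as noted upthread, is that what comes out is triangle soup rather than animation-friendly topology.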
Sorry, I don’t know of any store apps for this; my only experience is through personal correspondence/demos with developers experimenting with the hardware feature and SDK. Quick googling turns up some contenders, but I can’t vouch for them:
It generates NeRFs (which have advantages as well as disadvantages depending on the application), but Luma AI is arguably SotA for photogrammetry on iPhones.
An alternate, though brute-force, approach would be generating an image set using prompts and then using photogrammetry to convert it to 3D. Either way, I'm excited for this space to grow, both in 3D prompt generation and in alternate inputs through scanning. There's a difference between the creative and functional use cases.
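As a rough sketch of the photogrammetry half, assuming a folder of prompt-generated views of one object already exists and COLMAP is installed (directory names are placeholders):

    import subprocess

    # Hypothetical folder of prompt-generated renders of one object from many
    # viewpoints; COLMAP's automatic pipeline tries to recover a 3D model from it.
    IMAGE_DIR = "renders/"
    WORKSPACE = "colmap_out/"

    subprocess.run(
        ["colmap", "automatic_reconstructor",
         "--workspace_path", WORKSPACE,
         "--image_path", IMAGE_DIR],
        check=True,
    )

The brute-force part is that prompt-generated views rarely stay geometrically consistent across angles, so reconstruction quality is hit or miss.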
Looking at their prices and the (impressive) quality of the mesh topology in the demo movie on their web page (the rat), I couldn't help but think this is a front that pretends to use pure AI but actually has real people (specialized Mechanical Turks) involved.
Specifically, for guiding generation of the mesh from a possibly AI-generated point cloud (PTC). E.g., using manual constraints on a mostly automatic quad (re-)mesher run as a post-process on the triangle soup obtained from meshing the original, AI-generated PTC.
I.e.:
1. AI-generate the PTC from image(s).
2. Auto-generate a triangle mesh from the PTC via marching cubes or whatever (rough sketch below).
3. Have humans guide the quad re-meshing of that triangle soup.
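For step 2, a very rough sketch of the marching-cubes route (not their product, just the generic technique), assuming the PTC comes as an (N, 3) NumPy array; a real pipeline would fit a proper implicit surface rather than blurring an occupancy grid:

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage import measure

    def ptc_to_triangle_soup(points, res=64):
        """Crude point cloud -> triangle mesh via occupancy grid + marching cubes."""
        # Normalize the cloud into the unit cube and bin it into a res^3 grid.
        lo = points.min(axis=0)
        scale = (points.max(axis=0) - lo).max()
        normed = (points - lo) / scale
        grid, _ = np.histogramdd(normed, bins=res, range=[(0, 1)] * 3)
        # Blur the occupancy so marching cubes has a smooth isosurface to extract.
        occ = gaussian_filter((grid > 0).astype(float), sigma=1.0)
        verts, faces, _normals, _values = measure.marching_cubes(occ, level=0.2)
        return verts, faces  # vertices are in grid coordinates, not world units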
Human involvement like that would explain their pricing, which seems a tad too high for a fully automatic solution.
$600 for 30 models x 10 iterations. I.e. each iteration would cost $2.
Or maybe it's just so niche that the pricing simply reflects the small number of users for now, and it is indeed fully automatic.
Curious to hear what other people involved in 3D and cloud compute think.
Sometimes the business plan is to use human guidance initially but use it to improve the automated models, which gives you both first-mover advantage and better training data… it would be a smart play in this space.
[0] https://twitter.com/DrJimFan/status/1605175485897625602?t=H_...
[1] https://www.reddit.com/r/StableDiffusion/comments/zqq1ha/ope...
[2] https://github.com/openai/point-e/blob/main/point_e/examples...