Seems super fast, some are saying 600x faster [0] than the version built from Google's paper, though it is a little less accurate. Point clouds are less useful, but some people on Reddit and the authors have tools to try to convert them to meshes [1][2]. It does feel like Stable Diffusion-level generation of good 3D assets is right around the corner. It will be interesting to see which tech wins out: some variant of depth estimation like SD2 and non-AI tools can do, object spinning/multi-angle views like Google's tool does, or whatever this tool does.
> The main problem with mesh generation from stuff like this is that usually the topology is a mess and needs a lot of cleanup to be usable. It's not quite so bad for static, non-deforming objects, but anything that needs to be animated/deformed or that is organic-looking would likely need retopologizing by hand.
>
> That's one of the worst parts of 3D modeling, so it's like you're getting the AI to do the fun part and leaving you to do all the boring cleanup.
From [1]. Seems like there is a pattern in AI generators of "AI asked to generate final results with only final results to learn from, then immediately asked for the apple in the picture". I suppose the lack of specialization in the application domains of NNs is a deliberate design choice for these high-profile projects, in a vague hope of simulating emergent behaviors as seen in nature and of avoiding becoming another expert system (while being one!), but that attitude seems to limit usefulness, here as elsewhere.
People developing these models are very aware of what 3D workflow is like.
The issue is that image->point cloud training data is very easy to get, whereas image or point cloud -> clean 3d mesh training data is very hard to get in unconstrained domains.
Generating point clouds is where the state of the art is now. That doesn't mean the field isn't fully aware that text->3D mesh would unlock many more capabilities.
Seems like video game engines and the like would be a useful way to get lots of 3D models paired with corresponding point cloud data. What's the blocker to doing that? The models shown on that page look like 3D graphics from circa the 2000s or earlier.
I agree that randomly sampling the surfaces of 3D meshes seems like a reasonable way to generate synthetic mesh -> point cloud data (a rough sketch of that kind of sampling is below the list).
Without knowing a dang thing about AI, it feels like the problem lies more in:
1. Math related to topology: vertices, faces, edges, tris vs. quads, etc.
2. Different topologies for the same object are better for different use cases. Rendering, skinning, morphing, physics, etc. all have different optimal topologies, and the definition of optimal varies based on workflow, scene specifics, or even the individual artist's topological preferences. In other words, I'm not even sure how much of 3D workflows are standardized -- getting the topological data for workflows is no easy task, and the output isn't very usable until it can plug right into a workflow and the existing DCC ecosystem.
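For the mesh -> point cloud sampling I mentioned above, here's a minimal sketch of area-weighted surface sampling, assuming meshes come as plain vertex/face NumPy arrays (every name here is just a placeholder, not anyone's actual pipeline):

    import numpy as np

    def sample_point_cloud(vertices, faces, n_points=2048, rng=None):
        """Sample points uniformly by area from a triangle mesh surface.

        vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
        Returns an (n_points, 3) array of surface samples.
        """
        if rng is None:
            rng = np.random.default_rng()
        tris = vertices[faces]  # (F, 3, 3) triangle corner coordinates
        # Triangle areas via the cross product; used to weight the sampling.
        cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
        areas = 0.5 * np.linalg.norm(cross, axis=1)
        # Pick triangles proportionally to area, then sample barycentric coords.
        idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
        u, v = rng.random(n_points), rng.random(n_points)
        outside = u + v > 1.0  # reflect samples that fall outside the triangle
        u[outside], v[outside] = 1.0 - u[outside], 1.0 - v[outside]
        a, b, c = tris[idx, 0], tris[idx, 1], tris[idx, 2]
        return a + u[:, None] * (b - a) + v[:, None] * (c - a)

That gives you mesh -> point cloud pairs cheaply; it's the inverse direction (and the topology issues above) that stay hard.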
text2img generates a static asset; text2mesh is far more interesting beyond just static rendering, which is where mesh topology becomes a big sticking point.
* There isn't software that generates point clouds from video games. This should be solvable but AFAIK hasn't been done yet.
* The diversity of models in video games is much lower than in the real world.
* Games use a bunch of techniques to reduce the poly count while making assets look high-poly (e.g., baking detail into textures/normal maps). It's unclear what should be generated here.
Or ask CG designers, with consent and credit, for recordings of their intermediate steps. Same for illustrations. It almost seems like circumventing experts is the point.
Don't human designers do image or point cloud -> clean 3D mesh in an iterative manner? I can see that it would be significantly more computationally expensive to have an NN iteratively deform a cube into a tree, but I don't see why it isn't a solution.
The thing is that it's been shown time and time again (with ChatGPT, for example) that you can get really good results by giving massive amounts of final results to the model. This approach is far better than anything we've ever had in either text or image generation AI.
It’s a fun demo. Worth noting that on mobile there was no button to download the generated point cloud data itself, at least not that I could find. Might be the same on desktop too.
Additionally, I think the time taken depends on the number of visitors. I had to wait about 7 minutes for it to finish.
Too many users. I don't know Hugging Face's rules, but they seem to cap how many resources each demo can use. When I ran it originally there were about 12 people using it; the queue now looks to be around 300, and Hugging Face doesn't spin up more instances. That being said, the model is relatively small and can be run locally with at least 5 GB of VRAM, according to the Stable Diffusion subreddit.
I understand this is point-cloud diffusion, but Ajay Jain et al. (BAIR + Google Research) accomplished the first version I saw of this back in June with their Dream Fields paper (CVPR '22).
As always, this goes to show that if you can't be the first, be the loudest. OpenAI has the most well-oiled media machine I've seen in a while.
Seeing the waves of publicity OpenAI gets with every new release, I think we're seeing a new model for big-tech AI research groups. It isn't enough to just hire world-class research talent that publishes area-defining papers. There has to be a commensurate investment in media to publicize the research. Obviously, if you don't have the research, you have nothing to market. But it should say something that OpenAI prioritizes great design, communication, and publicity in addition to its world-class research team. It wouldn't surprise me if we see Google AI / DeepMind / FAIR double down with their own investments to expand the media presence of their AI orgs.
Maybe the lack of commenter enthusiasm is because point clouds are fairly specialized. Most people don’t have interesting point cloud data lying around to test this with, or the means to capture such data.
3D sensors are slowly but surely becoming more common. The iPhone Pro series has one, and AR hardware designs tend to include these capabilities. So this model synthesis seems a bit ahead of the curve, in a good way.
Agree with you that point clouds aren't mainstream at all and most people aren't sure what they'd use them for.
I think the premise of this is text-to-3D, and that because generation is quicker you don't really need anything besides a GPU to start playing around with it.
Can't you easily convert point clouds to polygons? If you know the sampling frequency, it's trivial to identify holes/edges/faces and convert them to a rough polygonal model. Then you can run edge collapses with a desired error to make it smoother.
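For example, something along these lines with Open3D (just a sketch of the generic surface-reconstruction-plus-decimation route, not necessarily what's meant above; the file path and parameters are placeholders):

    import open3d as o3d

    # Load a point cloud (placeholder path) and estimate normals,
    # which Poisson surface reconstruction needs.
    pcd = o3d.io.read_point_cloud("cloud.ply")
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

    # Reconstruct a dense triangle mesh from the oriented points.
    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=8)

    # Simplify via quadric edge collapses down to a target triangle budget.
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=10000)
    o3d.io.write_triangle_mesh("mesh.obj", mesh)

The catch, as noted upthread, is that what comes out is triangle soup rather than animation-friendly topology.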
Sorry, I don’t know of any store apps for this; my only experience is through personal correspondence/demos with developers experimenting with the hardware feature and SDK. Quick googling turns up some contenders, but I can’t vouch for them:
It generates NeRFs (which have advantages as well as disadvantages depending on the application), but Luma AI is arguably SotA for photogrammetry on iPhones.
An alternate, though brute-force, approach would be generating an image set using prompts and then using photogrammetry to convert it to 3D. Either way, I'm excited for this space to grow, both in 3D prompt generation and in alternate inputs through scanning. There's a difference between the creative and functional use cases.
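As a rough sketch of the photogrammetry half, assuming a folder of prompt-generated views of one object already exists and COLMAP is installed (directory names are placeholders):

    import subprocess

    # Hypothetical folder of prompt-generated renders of one object from many
    # viewpoints; COLMAP's automatic pipeline tries to recover a 3D model from it.
    IMAGE_DIR = "renders/"
    WORKSPACE = "colmap_out/"

    subprocess.run(
        ["colmap", "automatic_reconstructor",
         "--workspace_path", WORKSPACE,
         "--image_path", IMAGE_DIR],
        check=True,
    )

The brute-force part is that prompt-generated views rarely stay geometrically consistent across angles, so reconstruction quality is hit or miss.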
Looking at their prices and the (impressive) quality of the mesh topology in the demo movie on their web page (the rat), I couldn't help but think this is a front that pretends to use pure AI but actually has real people (specialized Mechanical Turks) involved.
Specifically, for guiding generation of the mesh from a possibly AI-generated point cloud (PTC). E.g., using manual constraints on a mostly automatic quad (re-)mesher run as a post-process on the triangle soup obtained from meshing the original, AI-generated PTC.
I.e.:
1. AI-generate the PTC from image(s).
2. Auto-generate a triangle mesh from the PTC via marching cubes or whatever (rough sketch below).
3. Have humans guide the quad re-meshing of that triangle soup.
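For step 2, a very rough sketch of the marching-cubes route (not their product, just the generic technique), assuming the PTC comes as an (N, 3) NumPy array; a real pipeline would fit a proper implicit surface rather than blurring an occupancy grid:

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage import measure

    def ptc_to_triangle_soup(points, res=64):
        """Crude point cloud -> triangle mesh via occupancy grid + marching cubes."""
        # Normalize the cloud into the unit cube and bin it into a res^3 grid.
        lo = points.min(axis=0)
        scale = (points.max(axis=0) - lo).max()
        normed = (points - lo) / scale
        grid, _ = np.histogramdd(normed, bins=res, range=[(0, 1)] * 3)
        # Blur the occupancy so marching cubes has a smooth isosurface to extract.
        occ = gaussian_filter((grid > 0).astype(float), sigma=1.0)
        verts, faces, _normals, _values = measure.marching_cubes(occ, level=0.2)
        return verts, faces  # vertices are in grid coordinates, not world units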
Human involvement like that would explain their pricing, which seems a tad too high for a fully automatic solution.
$600 for 30 models x 10 iterations. I.e. each iteration would cost $2.
Or maybe it's just so niche that the pricing simply reflects the small number of users for now, and it is indeed fully automatic.
Curious to hear what other people involved in 3D and cloud compute think.
Sometimes the business plan is to use human guidance initially but use it to improve the automated models, which gives you both first-mover advantage and better training data… it would be a smart play in this space.
[0] https://twitter.com/DrJimFan/status/1605175485897625602?t=H_...
[1] https://www.reddit.com/r/StableDiffusion/comments/zqq1ha/ope...
[2] https://github.com/openai/point-e/blob/main/point_e/examples...