Nvidia’s novel AI tech can produce 3D models from text prompts

“We can democratize 3D synthesis and open up everyone’s creativity in 3D content creation.”

Nergis Firtina
“A blue poison-dart frog sitting on a water lily,” an example 3D model generated by Magic3D. Image: Nvidia

Nvidia, the US-based GPU manufacturer, recently unveiled an AI technology called “Magic3D” that can generate 3D mesh models from a text prompt.

Magic3D, a brand-new text-to-3D content creation tool, produces high-quality 3D mesh models. Together with image-conditioning techniques and prompt-based editing methods, it gives users new ways to control 3D synthesis, opening up new vistas for numerous creative applications, the Magic3D developers explained.

In their paper, published on November 18, 2022, the researchers describe how the technique could let anyone produce 3D models without specialized expertise.

“Once refined, the resulting technology could speed up video game (and VR) development and perhaps eventually find applications in special effects for film and TV. We hope with Magic3D, we can democratize 3D synthesis and open up everyone’s creativity in 3D content creation,” they said.

What jobs can Magic3D handle?

Like DreamFusion, which uses a text-to-image diffusion model to create 2D images that are then optimized into volumetric NeRF (Neural Radiance Field) data, Magic3D lets a pre-trained image model steer the 3D optimization; a minimal sketch of that shared trick follows below. On top of it, Magic3D uses a two-stage process that takes a crude model made in low resolution and optimizes it to a higher resolution. According to the paper’s authors, the Magic3D method can create 3D objects twice as fast as DreamFusion.
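To make that shared trick concrete, here is a minimal, hypothetical sketch of the Score Distillation Sampling idea DreamFusion introduced and Magic3D builds on: a frozen text-to-image diffusion model scores noisy versions of a rendered image, and its denoising error is pushed back into the scene parameters. Everything below (the ToyDenoiser, the single learnable image standing in for a NeRF render, the linear noise schedule) is a stand-in, not either paper’s actual model.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a frozen, text-conditioned diffusion denoiser."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, noisy_image, t):
        # A real model would also condition on a text embedding; omitted here.
        return self.net(noisy_image)

# Stand-in "scene": one learnable image instead of a rendered NeRF view.
scene = torch.zeros(1, 3, 64, 64, requires_grad=True)
denoiser = ToyDenoiser().requires_grad_(False)   # frozen prior
optimizer = torch.optim.Adam([scene], lr=1e-2)

for step in range(100):
    rendered = torch.sigmoid(scene)              # differentiable "render"
    t = torch.randint(0, 1000, (1,))             # random diffusion timestep
    alpha = 1.0 - t.item() / 1000.0              # toy noise schedule
    noise = torch.randn_like(rendered)
    noisy = alpha * rendered.detach() + (1.0 - alpha) * noise
    pred_noise = denoiser(noisy, t)
    # Core trick: use (predicted noise - injected noise) as a gradient on the
    # rendered image, without backpropagating through the diffusion model.
    grad = (pred_noise - noise).detach()
    optimizer.zero_grad()
    rendered.backward(gradient=grad)
    optimizer.step()
```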

They use a two-stage coarse-to-fine optimization framework to produce text-to-3D content that is both fast and high-quality. In the first stage, they obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure.
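As a rough illustration of why a hash grid makes the coarse stage cheap, here is a simplified, single-level sketch in the spirit of Instant NGP’s multiresolution hash encoding: 3D coordinates are hashed into a small learnable feature table instead of a dense voxel grid. The table size, feature dimension, and resolution are arbitrary illustrative values; real implementations use many resolution levels plus trilinear interpolation, not nearest-vertex lookup.

```python
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    """Single-level toy hash-grid feature lookup (illustrative only)."""
    def __init__(self, table_size=2**14, feature_dim=2, resolution=64):
        super().__init__()
        self.table = nn.Parameter(torch.randn(table_size, feature_dim) * 0.01)
        self.resolution = resolution
        self.table_size = table_size
        # Large primes from Instant NGP's spatial hash.
        self.primes = torch.tensor([1, 2654435761, 805459861])

    def forward(self, xyz):
        # xyz: (N, 3) points in [0, 1]; snap to the nearest grid vertex.
        idx = (xyz * self.resolution).long()
        h = ((idx[:, 0] * self.primes[0])
             ^ (idx[:, 1] * self.primes[1])
             ^ (idx[:, 2] * self.primes[2])) % self.table_size
        return self.table[h]   # (N, feature_dim) features to feed an MLP

points = torch.rand(1024, 3)
features = HashGridEncoding()(points)   # (1024, 2)
```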

In the second stage, they initialize a textured mesh model from the coarse neural representation, which enables optimization with a high-resolution latent diffusion model through an efficient differentiable renderer; the toy example below mirrors this coarse-to-fine handoff.
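This toy reduces both stages to learnable images fit to a fixed target so the sketch stays self-contained; Magic3D’s real stages are a NeRF-like model and a textured mesh guided by diffusion priors, not an MSE loss against a known image.

```python
import torch
import torch.nn.functional as F

# Stand-in for "what the diffusion prior wants to see"; a real pipeline has
# no fixed target image, only scores from the frozen diffusion model.
target = torch.rand(1, 3, 256, 256)

# Stage 1: fit a cheap, coarse 64x64 representation.
coarse = torch.zeros(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([coarse], lr=0.1)
coarse_target = F.interpolate(target, size=64, mode="bilinear",
                              align_corners=False)
for _ in range(200):
    loss = F.mse_loss(torch.sigmoid(coarse), coarse_target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: upsample the coarse result to initialize the fine model, then
# refine at full 256x256 resolution, mirroring the handoff to the mesh stage.
fine = F.interpolate(coarse.detach(), size=256, mode="bilinear",
                     align_corners=False).clone().requires_grad_(True)
opt = torch.optim.Adam([fine], lr=0.01)
for _ in range(200):
    loss = F.mse_loss(torch.sigmoid(fine), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```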

Study abstract:

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show 61.7% raters to prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.