• By recording only points on object surfaces, the new method gives users complete control over the shape and appearance of the 3D models it generates.
• The innovation could soon pave the way for the generation of 3D personal avatars and the reconstruction of 3D models of sites from standard photographs.
Creating 3D models from two-dimensional photos has long been a complex computing challenge, but artificial intelligence is now poised to make it far easier. A research team at Simon Fraser University has developed a method that combines sets of photos taken from different angles to generate manipulable, editable 3D objects with accurately represented and transferable textures and fully controlled exposure values. In an article entitled PAPR: Proximity Attention Point Rendering, the researchers explain how their approach overcomes the current limitations of photogrammetry, a widely used 3D modelling method that relies on parallax measurements across images taken from different camera positions.
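The parallax principle behind photogrammetry can be illustrated with a simple stereo-triangulation formula: a point's apparent shift between two camera positions (its disparity) determines its depth. The short sketch below is purely illustrative; the focal length, baseline, and disparity values are made up.

```python
# Illustrative sketch of the parallax principle used in photogrammetry:
# two cameras a known distance apart see the same point shifted by a
# disparity that is inversely proportional to its depth (pinhole model).
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (in metres) of a scene point from its stereo disparity."""
    return focal_px * baseline_m / disparity_px

# Made-up camera values: 1200 px focal length, 0.3 m baseline, 24 px disparity
print(depth_from_disparity(1200.0, 0.3, 24.0))  # 15.0 m
```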
Overcoming the limits of traditional photogrammetry
“Photogrammetry cannot precisely reconstruct object shapes, because it is not based on accurate geometry, nor can it be used to reconstruct textures or an overview of a 3D scene,” points out Ke Li, an assistant professor and director of the APEX lab, which specializes in AI and computer vision at Simon Fraser University (SFU). “Other approaches, such as neural radiance fields (NeRF), can do similar things, but they do not allow for editing of the 3D objects they create, because of the black-box nature of deep learning models.” In short, if you were to photograph a statue using NeRF and then model it in 3D, “you wouldn’t be able to move its head.” Animation in models of this kind is extremely labour-intensive, requiring users to describe what happens to each continuous coordinate.
Rendering that is faster and more accurate than LiDAR
The Proximity Attention Point Rendering method developed by the research team only records points on object surfaces, relying on the principle that if a sufficient number of points are collected, surface shape can be deduced. And when an individual point is moved, the surface on which it is located automatically adapts to accommodate the change. “The difference with LiDAR modelling is that LiDAR builds a point cloud from what it sees, with no consideration of what is hidden. With our approach, we can reconstruct a 3D object from all angles. What’s more, it is also faster than LiDAR.” The AI model used by the researchers interpolates the various points to ‘guess’ whether or not there is a surface to model. This approach offers users the advantage of being able to control the object, modify its shape and appearance, and view it from any angle, so that, for example, it can be transformed back into a 2D photo that appears to have been taken from another point of view.
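To give a feel for the general idea (and not the researchers' actual implementation), the hypothetical Python sketch below weights stored surface points by their proximity to a query location and blends their features, one simple way a model can 'guess' whether a surface lies between recorded points. Every name, shape, and the temperature value here is an assumption made for illustration.

```python
import numpy as np

def proximity_attention(query, points, features, temperature=0.1):
    """Hypothetical sketch: softly weight scene points by how close they
    are to a query location and blend their features accordingly."""
    # Squared distance from the query to every recorded surface point
    d2 = np.sum((points - query) ** 2, axis=1)
    # Softmax over negative distances: nearer points dominate the blend
    logits = -d2 / temperature
    logits -= logits.max()          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    # The blended feature acts as a soft "is there a surface here?" signal
    return weights @ features

# Toy usage: 100 random surface points, each carrying an 8-dimensional feature
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
features = rng.normal(size=(100, 8))
query = np.array([0.1, -0.2, 0.3])
print(proximity_attention(query, points, features).shape)  # (8,)
```

Because the weights depend only on where the points are, dragging a point to a new position immediately changes the blend around it, which mirrors the article's description of the surface adapting when an individual point is moved.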
A wide range of professional and recreational uses
There are many possible uses for the new technology. Everyday consumers could use it to create and animate 3D avatars of themselves and others from ordinary smartphone pictures. “If we return to the example of the statue, we could also imagine animating it in different ways, or we could have the trees in the park start to move,” points out Ke Li. In industry, it has the potential to help construction professionals overcome the limitations of LiDAR by producing 3D models of sites from photographs. For the time being, the computer processing required to generate 3D objects with the new technology can only be done in the cloud, but the research team plans to improve its AI’s neural networks so that the operations can be carried out directly on a smartphone.
Video: https://zvict.github.io/papr/static/videos/Ignatius-shake.mp4
Sources:
Neural radiance fields (NeRF) use AI algorithms to generate new views of complex 3D scenes from partial sets of 2D images. NeRFs are trained with a rendering loss to reproduce their input views and to interpolate between them to generate complete scene representations.
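For readers curious about the rendering loss mentioned above, the sketch below shows a bare-bones, assumed version of NeRF-style volume rendering along a single camera ray, followed by the photometric loss against a photographed pixel. Sample counts and values are illustrative only.

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Minimal sketch of NeRF-style volume rendering along one ray.
    densities: (N,) non-negative opacity values at sampled depths
    colors:    (N, 3) RGB predictions at those samples
    deltas:    (N,) spacing between consecutive samples
    (Shapes and names are assumptions made for this illustration.)"""
    alpha = 1.0 - np.exp(-densities * deltas)                       # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]   # light surviving so far
    weights = trans * alpha
    return weights @ colors                                         # composited pixel colour

def rendering_loss(rendered_pixel, photo_pixel):
    """Photometric (mean squared error) loss the network is trained to minimise."""
    return np.mean((rendered_pixel - photo_pixel) ** 2)

# Toy example: 64 samples along a single ray, compared against one photo pixel
rng = np.random.default_rng(0)
densities = rng.uniform(0.0, 2.0, size=64)
colors = rng.uniform(0.0, 1.0, size=(64, 3))
deltas = np.full(64, 0.05)
print(rendering_loss(volume_render(densities, colors, deltas), np.array([0.5, 0.4, 0.3])))
```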