Recovering the 3-dimensional (3D) structure of a scene from 2-dimensional (2D) images is a fundamental problem in computer vision. This technology has many applications in computer graphics, entertainment, robotics, transportation, manufacturing, security, and other fields. One application is 3D mapping. For example, Google Earth and Microsoft Bing Maps provide a 3D virtual replica of many of the Earth's cities. However, these 3D models are low-detail and lack ground-level realism. Google Street View and Bing Street Side provide high-resolution panoramas captured from the streets of many cities, but these stills cannot provide free navigation through the virtual world. In this dissertation, I will show how to automatically and efficiently create detailed 3D models of urban environments from street-level imagery.
A major goal of this dissertation is to model large urban areas, even entire cities, which is an enormous challenge due to the sheer scale of the problem. Even a partial data capture of the town of Chapel Hill requires millions of frames of street-level video. The methods presented in this dissertation are highly parallel and use little memory, and can therefore utilize modern graphics hardware (GPU) technology to process video at the recording frame rate. Also, the structure in urban scenes such as planarity, orthogonality, verticality, and texture regularity can be exploited to achieve 3D reconstructions with greater efficiency, higher quality, and lower complexity.
By exploiting the structure of an urban scene, I perform multiple-direction plane-sweep stereo on the GPU in real time. An analysis of stereo precision leads to a view selection strategy that guarantees constant depth resolution and improves bounds on time complexity. Depth measurements are further improved by segmenting the scene into piecewise-planar and non-planar regions, a process that is aided by learned planar surface appearance. Finally, depth measurements are fused and the final 3D surface is recovered using a multi-layer heightmap model that produces clean, complete, and compact 3D reconstructions. The effectiveness of these methods is demonstrated by results from thousands of frames of video from a variety of urban scenes.
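To illustrate why view selection can hold depth resolution constant, the sketch below uses the textbook two-view stereo relation z = b·f/d, from which a disparity error δd gives a depth error of roughly δz ≈ z²·δd/(b·f). It is not taken from the dissertation. The function names, parameters, and the rectified two-view setup are all assumptions made for illustration, and the dissertation's actual analysis may differ.

```python
def depth_error(z, baseline, focal_px, disparity_error_px=1.0):
    """Approximate depth error for a rectified stereo pair (assumed model).

    Differentiating z = baseline * focal_px / d with respect to the
    disparity d gives |dz| ~= z**2 * |dd| / (baseline * focal_px).
    """
    return z * z * disparity_error_px / (baseline * focal_px)


def baseline_for_resolution(z, target_error, focal_px, disparity_error_px=1.0):
    """Baseline needed to reach a target depth error at depth z.

    Solves the relation above for the baseline: it grows with z**2,
    so distant surfaces need views that are farther apart.
    """
    return z * z * disparity_error_px / (target_error * focal_px)


if __name__ == "__main__":
    focal_px = 1000.0  # hypothetical focal length in pixels
    for z in (5.0, 10.0, 20.0):
        b = baseline_for_resolution(z, target_error=0.2, focal_px=focal_px)
        print(f"depth {z:5.1f} m -> baseline {b:.3f} m, "
              f"error {depth_error(z, b, focal_px):.3f} m")
```

Under this model, choosing a wider baseline for more distant surfaces keeps the depth error at the chosen target, which is the intuition behind selecting views per depth range rather than using a fixed pair of frames.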
|Committee:||Frahm, Jan-Michael, Lazebnik, Svetlana, Tomasi, Carlo, Welch, Greg|
|School:||The University of North Carolina at Chapel Hill|
|School Location:||United States -- North Carolina|
|Source:||DAI-B 72/08, Dissertation Abstracts International|
|Keywords:||Computer vision, Graphics, Large-scale urban environments, Reconstruction, Stereo, Street-level video|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved