A camera is a device for compressing rich information about the visual appearance of the three-dimensional world into a two-dimensional image. This process is inherently lossy: given an image, we can make educated guesses about the world’s shape and appearance, but there is not enough information for our guesses to be certain.
However, if we take several pictures from different viewpoints we can reason more confidently. This thesis focuses on shape: How can we determine the geometry of the world from a collection of photos? This problem is classically called Structure from Motion. We find the structure (shape of the world) from camera motion (different viewpoints).
Internet photo collections are an especially interesting source of data. With simple searches we can collect the raw information to reconstruct 3D models of famous world landmarks or entire cities. However, the photos we download are disorganized and noisy, and were not collected with 3D reconstruction in mind.
While some impressive demonstrations of Structure from Motion systems exist, the next generation of solvers will need to be far more robust to the many types of difficulties encountered in the wild. To this end, many recent solvers pose the problem in a new way, using relative relationships between images to infer first the orientations, and then the positions of every camera in a scene. This framework promises faster runtime and greater robustness. I contribute a theoretical analysis of the difficulty of finding camera orientations, giving a way to decide which problems are tractable and which ones might be too hard and should be reformulated. I also propose a new solver with an accompanying outlier filter for finding camera positions.
Some of the hardest scenes, however, are those that contain ambiguous structures: distinct objects that look the same. These induce self-consistent errors that are often too confusing for solvers to resolve correctly. I describe a scalable system that uses a graph-topological cue from a visibility graph to detect and remove these sources of error.
Together these improvements work towards a robust solution to the Structure from Motion problem, so that we can reliably build 3D models of the world, even from noisy and confusing internet photo collections.
School Location: United States -- New York
Source: DAI-B 78/04(E), Dissertation Abstracts International
Subjects: Applied Mathematics, Computer science
Keywords: 3d reconstruction, Computer vision, Structure from motion
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved