For the past few years, Convolutional Neural Networks (CNNs) have had tremendous impact, and not only within the field of Computer Science: by 2018, their rapid proliferation in free, publicly available tools had thoroughly permeated data science, signal processing, and cybersecurity, and had affected most human endeavors, including entertainment, finance, transportation, agriculture, and medicine, to name a few. In this dissertation I use CNNs specifically to achieve better classification of zooplankton in scientific images, but I also use the zooplankton images to better understand CNNs, as a benchmark to quantify the performance benefit CNNs provide over the previous state of the art, and as raw material to inspire my own contribution to the growing body of knowledge regarding CNNs. The performance benefit I achieve by combining contextual metadata with pixel images may no longer be novel, but it provides a concrete benefit, leverages the past body of work, and is likely applicable to numerous other application areas.
Before tackling any specifics, I provide a comprehensive overview of historical and contemporary feature extraction techniques that are particularly applicable to biological object classification in images. This overview includes a literature review and sorts previous feature extraction techniques into three categories: statistical analysis methods, topology-based methods, and point/patch correspondence methods.
We then baselined the performance of existing non-CNN approaches and considered how they could be made more efficient. This work also quantified the number of expertly labeled images required and identified which algorithms worked best (in our case, SVM and Gradient Boosted Random Forest, as well as the multi-layer perceptron). Some minor points investigated were whether abstaining provided any meaningful gain, whether size fractionation improved performance (no, but it significantly increased speed at a minimal performance cost), and whether ensembling two diverse approaches resulted in better performance than either individually (maybe, but only by a small amount).
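The abstract does not say how abstention or ensembling were implemented. As a minimal sketch under stated assumptions, one common approach averages the class probabilities of two classifiers and abstains when the top averaged probability falls below a threshold. The function name `ensemble_predict` and the default threshold are hypothetical and do not come from the dissertation.

```python
def ensemble_predict(probs_a, probs_b, threshold=0.5):
    """Average two classifiers' class probabilities; abstain if unsure.

    probs_a, probs_b: per-class probabilities from two different models
    (hypothetical inputs, e.g. one from an SVM and one from a tree ensemble).
    Returns the index of the winning class, or None to signal abstention.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same classes")
    # Simple unweighted average of the two probability vectors.
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    # Index of the class with the highest averaged probability.
    best = max(range(len(avg)), key=avg.__getitem__)
    # Abstain when even the best class is below the confidence threshold.
    return best if avg[best] >= threshold else None
```

Raising the threshold trades coverage for accuracy: more images are left unlabeled, but the labels that are assigned should be more reliable. Whether that trade is worthwhile is the kind of question the abstention experiment addresses.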
Next, we use convolutional neural networks (CNNs) on various types of images, including plankton, looking for clues useful when training models from scratch. This publication identifies a correlation between the statistical distribution of the filter weights in a fully trained network and the overall accuracy of that network. The implication is that, given multiple instances of trained networks, it may be possible to predict future performance.
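The abstract does not name the statistic that was correlated with accuracy. The sketch below assumes, purely for illustration, that each trained network is summarized by the standard deviation of its filter weights, and computes the Pearson correlation between that summary and accuracy across several networks. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def weight_spread(weights):
    """Population standard deviation of a flat list of filter weights."""
    return pstdev(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def correlate_spread_with_accuracy(networks):
    """networks: list of (flat_weights, accuracy) pairs, one per trained model."""
    spreads = [weight_spread(w) for w, _ in networks]
    accuracies = [acc for _, acc in networks]
    return pearson(spreads, accuracies)
```

A coefficient near +1 or -1 across many independently trained instances would suggest that the weight statistic carries information about accuracy. A value near 0 would suggest that this particular statistic does not.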
Detouring from machine learning experimentation, I cover some additional image processing work I completed in order to more cleanly segment objects in images obtained by our Zooglider.
Finally, we combine all of these contributions to improve plankton image classification: we utilize CNNs on segmented image tiles of plankton images. We also tested the hypothesis that the previously used geometric features, as well as geotemporal and hydrographic metadata, would improve performance beyond using pixel data alone. Evidence strongly supports the intuition that context does help identify the image contents. We evaluated both trivial and novel strategies to incorporate this data into the CNN framework, and found that one of our novel strategies significantly outperformed all others.
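The abstract does not describe the individual strategies. As a hedged illustration of what a simple strategy could look like, the sketch below standardizes the metadata values and appends them to a CNN feature vector before a final classifier sees them. This is an assumed baseline, not the author's novel method, and the name `fuse` and its parameters are hypothetical.

```python
def fuse(cnn_features, metadata, meta_means, meta_stds):
    """Concatenate CNN features with z-scored contextual metadata.

    cnn_features: activations from a CNN layer for one image tile.
    metadata: raw context values for the same tile (for example depth or
        temperature; which fields to use is an assumption here).
    meta_means, meta_stds: per-field statistics from the training set.
    """
    if not (len(metadata) == len(meta_means) == len(meta_stds)):
        raise ValueError("metadata and its statistics must align")
    if any(sd == 0 for sd in meta_stds):
        raise ValueError("standard deviations must be non-zero")
    # Standardize so metadata is on a scale comparable to the activations.
    scaled = [(m - mu) / sd for m, mu, sd in zip(metadata, meta_means, meta_stds)]
    # The fused vector is what a downstream classifier layer would consume.
    return list(cnn_features) + scaled
```

Scaling matters in a design like this because raw metadata such as depth in meters can be orders of magnitude larger than typical activations and could otherwise dominate the fused representation.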
|Advisor:||Elkan, Charles, Ohman, Mark D.|
|Committee:||Saul, Lawrence K., Tu, Zhuowen, de Sa, Virginia R.|
|School:||University of California, San Diego|
|Department:||Computer Science and Engineering|
|School Location:||United States -- California|
|Source:||DAI-B 80/05(E), Dissertation Abstracts International|
|Subjects:||Biological oceanography, Artificial intelligence|
|Keywords:||Convolutional neural networks, Image processing, Machine learning, Zooplankton|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved