Use of Context for Object Recognition in Convolutional Neural Networks
Contextual associations are known to aid object recognition in human vision, yet the role of context in artificial vision remains largely unknown. This thesis explores whether context is implicitly captured by CNNs and examines two engineering approaches that incorporate context to improve object classification accuracy. The present work presents novel evidence that context is incidentally captured during the training of CNNs for object recognition, and that this information is retained when objects are viewed with no background context present. Contextual information is found to be present in all nine CNNs examined, whether trained on object or scene recognition tasks. The shallower networks VGG19, VGG16, and AlexNet pretrained on ImageNet capture more contextual information than the six deeper CNNs. The present work further introduces a promising new framework that explicitly incorporates context and known object-scene relations to aid object recognition. Unfortunately, the techniques used to transfer the contextual information captured by ResNet50 pretrained on COCO to FCN-ResNet101 did not produce improvements in classification accuracy. However, the new framework lays a valuable foundation for further exploration of the use of context to improve object recognition. Incorporating context along with the accuracy obtained by ResNet50 pretrained on Places365 performed better than incorporating context without accuracy.
Roginek, Eric William, "Use of Context for Object Recognition in Convolutional Neural Networks" (2022). ETD Collection for Fordham University. AAI29168496.