    The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object is difficult when images present occlusion, poor image quality, noise, or background clutter, and the task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can identify object classes only up to a certain extent. Context information, based on the interactions among objects in the scene or on global scene statistics, can help disambiguate appearance inputs in recognition tasks. In this work we review different approaches to using contextual information in the field of object categorization and discuss scalability, optimizations, and possible future directions.

    The goal of object recognition is to locate and identify instances of an object within an image. Examples of this task include recognition of faces, logos, scenes, and landmarks. This technology can be advantageous in guiding a blind user to recognize objects in real time and in augmenting the ability of search engines to permit searches based on image content. Traditional approaches to object recognition use appearance features, e.g., color, edge responses, texture, and shape cues, as the only source of information for recognizing objects in images. These features are often unable to fully capture the variability in object classes, since objects may vary in scale, position, and viewpoint when presented in real-world scenes. Moreover, they may introduce noisy signals when objects are occluded, surrounded by other objects in the scene, or obscured by poor image quality. Since appearance features are insufficient to accurately discriminate objects in images, an object's identity can be disambiguated by modeling features obtained from other object properties, such as the surroundings and the composition of objects in real-world scenes. Context, obtained from the object's nearby image data, image annotations, and the presence and location of other objects, can help to disambiguate appearance inputs in recognition tasks. Recent context-based models have successfully improved recognition performance; however, several questions remain unanswered with respect to modeling contextual interactions at different levels of detail, integrating multiple contextual cues efficiently into a unified model, and understanding the explicit contributions of contextual relationships. Motivated by these issues, this dissertation proposes novel approaches for investigating new types of contextual features and integrating this knowledge into appearance-based object recognition models.
    We analyze the contributions and trade-offs of integrating context and investigate contextual interactions between pixels, regions, and objects in the scene. Furthermore, we study context as (i) part of recognizing objects in images and (ii) an advocate for label agreement to disambiguate object identity in recognition systems. Finally, we harness these discoveries to address other challenges in object recognition, such as discovering object categories in weakly labeled data.
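    One simple way to picture how context can disambiguate an ambiguous appearance score, sketched here with made-up numbers rather than any model from the dissertation, is to re-weight a detector's per-label scores by how often labels co-occur in the same scene:

```python
import numpy as np

# Hypothetical appearance scores for two detections (rows) over three
# labels; all values are illustrative, not from any real detector.
labels = ["keyboard", "mouse", "banana"]
appearance = np.array([
    [0.60, 0.25, 0.15],   # detection A: confidently a keyboard
    [0.30, 0.34, 0.36],   # detection B: appearance slightly favors banana
])

# Hypothetical co-occurrence prior: cooccur[i, j] is how strongly label j
# is expected in scenes containing label i (made-up, symmetric numbers).
cooccur = np.array([
    [1.0, 0.9, 0.1],   # keyboard co-occurs with {keyboard, mouse, banana}
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])

# Re-score detection B using detection A's confident label as context:
# multiply B's appearance scores by their co-occurrence with that label.
context_label = int(np.argmax(appearance[0]))       # index of "keyboard"
rescored = appearance[1] * cooccur[context_label]
rescored /= rescored.sum()                          # renormalize
print(labels[int(np.argmax(rescored))])             # prints "mouse"
```

    With appearance alone, detection B would be labeled "banana"; the co-occurrence prior from the confidently detected keyboard flips the decision to the contextually plausible "mouse".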

    The problem of using pictures of objects captured under ideal imaging conditions (here referred to as in vitro) to recognize objects in natural environments (in situ) is an emerging area of interest in computer vision and pattern recognition. Examples of tasks in this vein include assistive vision systems for the blind and object recognition for mobile robots; the proliferation of image databases on the web is bound to lead to more examples in the near future. Despite its importance, there is still a need for a freely available database to facilitate study of this kind of training/testing dichotomy. In this work one of our contributions is a new multimedia database of 120 grocery products, GroZi-120. For every product, two different recordings are available: in vitro images extracted from online grocery websites, and in situ images extracted from camcorder video collected inside a grocery store. As an additional contribution, we present the results of applying three commonly used object recognition/detection algorithms (color histogram matching, SIFT matching, and boosted Haar-like features) to the dataset. Finally, we analyze the successes and failures of these algorithms against product type and imaging conditions, both in terms of recognition rate and localization accuracy, in order to suggest ways forward for further research in this domain.
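    Of the three baselines applied to GroZi-120, color histogram matching is the simplest to illustrate. The following is a minimal sketch, not the paper's implementation: it builds a joint RGB histogram per image and compares images with histogram intersection, where 1.0 means identical normalized histograms.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize an RGB image (H x W x 3, uint8) into a joint color
    histogram with `bins` levels per channel, L1-normalized."""
    quantized = (image.astype(np.int32) * bins) // 256   # per-channel bin index in [0, bins)
    # Fold the three channel indices into a single joint-bin id.
    ids = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(ids.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return float(np.minimum(h1, h2).sum())

# Toy check with a synthetic image: matched against itself, the score is 1.0.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(histogram_intersection(color_histogram(img), color_histogram(img)))  # 1.0
```

    In a matching setting, the in situ query histogram would be compared against the histograms of all in vitro product images and the highest-scoring product returned; the color quantization (8 bins per channel here) trades discrimination against robustness to lighting changes between the two imaging conditions.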
