Understanding Levels of Object Recognition in Computational Models


Explore the levels of object recognition in computational models, from single-object recognition to recognizing local configurations. Discover how minimizing variability aids in interpreting complex scenes and the challenges faced by deep neural networks in achieving human-level recognition on minimal images.


Uploaded on Sep 13, 2024



Presentation Transcript


  1. Above and below the object level

  2. Object level: Single object recognition (ImageNet)

  3. Below the object level: recognizing local configurations

  4. Minimizing variability: useful for the interpretation of complex scenes

  5. Searching for minimal images: over 15,000 Mechanical Turk subjects ("Atoms of Recognition", PNAS 2016)

  6. Pairs: parent MIRC and child sub-MIRC
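The search for minimal images described in the slides above (gradually reduce a recognizable image until every further reduction loses recognition) can be sketched as follows. This is a minimal sketch, not the published procedure: a patch is assumed to be a tuple `(x, y, size, resolution)`, `recognition_rate` stands in for the measured Mechanical Turk recognition rate, and `toy_rate` is a hypothetical stand-in used only for demonstration.

```python
# Sketch of the minimal-image (MIRC) search: descend from a recognizable
# patch through reduced descendants (crops and a resolution reduction);
# a patch whose descendants are all unrecognizable is a MIRC, and those
# descendants are sub-MIRCs. Patch format and rates are hypothetical.

THRESHOLD = 0.5  # a patch counts as "recognizable" above this rate


def children(patch):
    """Five reduced descendants: four ~20% crops plus one 20%
    resolution reduction."""
    x, y, size, resolution = patch
    step = max(1, size // 5)
    return [
        (x, y, size - step, resolution),
        (x + step, y, size - step, resolution),
        (x, y + step, size - step, resolution),
        (x + step, y + step, size - step, resolution),
        (x, y, size, int(resolution * 0.8)),
    ]


def find_mircs(patch, recognition_rate):
    """Collect patches that are recognizable while none of their
    descendants is (the parent-MIRC / sub-MIRC boundary)."""
    if recognition_rate(patch) < THRESHOLD:
        return []
    recognizable = [c for c in children(patch)
                    if recognition_rate(c) >= THRESHOLD]
    if not recognizable:
        return [patch]  # every descendant is an unrecognizable sub-MIRC
    mircs = []
    for child in recognizable:
        mircs.extend(find_mircs(child, recognition_rate))
    return mircs


def toy_rate(patch):
    """Hypothetical stand-in for human recognition measurements."""
    _, _, size, resolution = patch
    return 1.0 if size >= 10 and resolution >= 10 else 0.0
```

With `toy_rate`, every patch the search returns sits exactly at the recognition boundary: it is recognizable, but all five of its descendants fall below threshold.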

  7.–9. [Figure slides: recognition-rate pairs for parent MIRCs and their child sub-MIRCs: 0.79 / 0.0, 0.14 / 0.88, 0.88 / 0.16]

  10. Cover: an average of 16.9 MIRCs per class. The cover is highly redundant, but each MIRC itself is non-redundant: each feature is important

  11. Testing computational models

  12. DNNs do not reach human-level recognition on minimal images. Recognition of minimal images does not emerge from training any of the existing models tested. The large gap at the minimal level is not reproduced; the accuracy of recognition is lower than humans'. Representations used by existing models do not capture differences that human recognition is sensitive to
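The gap in question can be quantified as the drop in recognition rate from a parent MIRC to its child sub-MIRC. Below is a minimal sketch of that comparison. The human pairs reuse the rates quoted on the earlier slides (assuming the higher rate in each quoted pair is the parent MIRC); the model scores are invented purely to illustrate the reported pattern of a much smaller, more gradual drop, and are not published data.

```python
# Recognition-gap comparison at the MIRC / sub-MIRC boundary.
# Human pairs: rates shown on the earlier slides.
# Model pairs: hypothetical illustration of a small, gradual drop.

def recognition_gap(parent_score, child_score):
    """Drop in recognition rate from a parent MIRC to its sub-MIRC."""
    return parent_score - child_score


human_pairs = [(0.79, 0.00), (0.88, 0.14), (0.88, 0.16)]
model_pairs = [(0.55, 0.48), (0.60, 0.52), (0.50, 0.45)]  # hypothetical

human_gap = sum(recognition_gap(p, c) for p, c in human_pairs) / len(human_pairs)
model_gap = sum(recognition_gap(p, c) for p, c in model_pairs) / len(model_pairs)
```

On these numbers the average human drop is abrupt (about 0.75), while the illustrative model drop is an order of magnitude smaller, which is the shape of the mismatch the slide describes.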

  13. Minimal images: internal interpretation. Humans can interpret detailed sub-structures within the atomic MIRC; this cannot be done by current feed-forward models

  14. Internal interpretation: the internal interpretations perceived by humans cannot be produced by existing feed-forward models

  15. Current recognition models are feed-forward only, which likely limits their ability to provide an interpretation of fine details. A model that can produce interpretations of MIRCs uses a top-down stage (back to V1): Ben-Yossef et al., CogSci 2015, a model for full local image interpretation
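The two-stage scheme the slide describes can be sketched as a feed-forward pass that proposes a category, followed by a top-down stage that projects that category's stored part model back onto the image to recover internal sub-structures. All names, part templates, and coordinates below are hypothetical illustrations of the idea, not the actual Ben-Yossef et al. model.

```python
# Sketch of feed-forward classification plus a top-down interpretation
# stage. The top-down stage maps a stored part model for the proposed
# category back onto the patch, producing the internal interpretation
# that the feed-forward pass alone does not. All data is hypothetical.

# Hypothetical part templates: part name -> expected normalized location
PART_MODELS = {
    "horse-head": {"ear": (0.1, 0.0), "eye": (0.3, 0.4), "muzzle": (0.8, 0.5)},
}


def feed_forward(image):
    """Stand-in bottom-up classifier: returns a category label.
    (Toy version: the label is carried with the image dict.)"""
    return image["label"]


def top_down_interpret(image, category):
    """Project the category's part model back onto the patch, keeping
    the parts whose expected location falls inside the patch extent."""
    parts = PART_MODELS.get(category, {})
    h, w = image["extent"]  # normalized patch extent, e.g. (1.0, 1.0)
    return {name: loc for name, loc in parts.items()
            if loc[0] <= h and loc[1] <= w}


def interpret(image):
    """Feed-forward pass, then top-down internal interpretation."""
    category = feed_forward(image)
    return category, top_down_interpret(image, category)
```

A cropped patch (smaller extent) keeps the category but loses the parts that fall outside it, which mirrors how sub-structure interpretation degrades on reduced images.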

  16. Above the object level

  17. Image Captioning

  18. Automatic caption: A brown horse standing next to a building. (2017)

  19. Automatic caption: a man is standing in front woman in white shirt.

  20. Human description: Two women sitting at a restaurant table while a man in a black shirt takes one of their purses off the chair, while they are not looking

  21. Components: man, woman, purse, chair. Properties: young, dark-hair, red. Relations: man grabs purse, purse hanging off chair, woman sitting on chair

  22. [Scene-graph slide: nodes Man, Woman, Purse, Chair with properties Red and Dark, and relations Grabbing, Sitting, Hanging-off, Not-looking.] This is what vision produces, and we identify the event as stealing

  23. [Abstract "stealing" schema: Person B grabbing Object X; its Owner, Person A, not-looking.] Abstract and cognitive (shared with the non-sighted), broad generalization

  24. [Scene graph from slide 22 extended with an Owns relation between Woman and Purse.] The ownership is added to the structure; this cognitive addition contributes to stealing
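The structure in the last three slides can be sketched as a small scene graph, assuming relations are stored as (subject, relation, object) triples. The abstract "stealing" schema then matches any structure in which person B grabs object X, person A owns X, and A is not looking, independent of the purse, chair, or colors of the concrete scene. The representation and matching rule below are hypothetical illustrations.

```python
# A scene graph as (subject, relation, object) triples, and the abstract
# "stealing" pattern matched against it. The "owns" triple is the
# cognitive addition from the slides; without it, the pattern fails.

scene = {
    ("man", "grabs", "purse"),
    ("purse", "hanging-off", "chair"),
    ("woman", "sitting-on", "chair"),
    ("woman", "not-looking", None),
    ("woman", "owns", "purse"),  # the cognitive addition
}


def is_stealing(triples):
    """Abstract schema: person B grabs object X, person A owns X,
    and A is not looking."""
    for b, rel, x in triples:
        if rel != "grabs":
            continue
        owners = {a for a, r, o in triples if r == "owns" and o == x}
        for a in owners:
            if a != b and (a, "not-looking", None) in triples:
                return True
    return False
```

Removing either the ownership triple or the not-looking triple breaks the match, which is the point of the slide: the perceptual structure alone does not yield "stealing" until the cognitive ownership relation is added.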
