Parts and Attributes

Third International Workshop on Parts and Attributes

In Conjunction with the European Conference on Computer Vision (ECCV 2014)

Date: September 12th, 2014

Venue: Zurich, Switzerland

Photos

Overview

The workshop will bring together researchers from the established field of part-based methods and from the field of attribute-based methods, which has recently gained popularity. Participants will learn from each other about recent developments and applications, for example in object recognition, scene classification, and image retrieval, and will have the opportunity to discuss the similarities, differences, advantages, and disadvantages of the two approaches.

 

Organizers




Rogerio Feris

Christoph Lampert

Devi Parikh

 

Schedule

08:40   Welcome
 
08:45 Invited Talk: Thomas Mensink (University of Amsterdam) - "COSTA: Co-Occurrence Statistics for Zero-Shot Classification"
[Slides]

In this talk I will introduce the first zero-shot classification method for multi-labeled image datasets. Our method, COSTA, exploits co-occurrences of visual concepts in images for knowledge transfer. These inter-dependencies arise naturally between concepts in multi-labeled datasets and are easy to obtain from existing annotations or web-search hit counts. We estimate a classifier for a new label as a weighted combination of related classes, using the co-occurrences to define the weights. We also propose a regression model, learned in a leave-one-out setting, that assigns a weight to each label in the training set and significantly improves performance. Finally, we show that our zero-shot classifiers can serve as priors for few-shot learning. Experiments on three multi-labeled datasets reveal that our proposed zero-shot methods approach, and occasionally outperform, fully supervised SVMs. We conclude that co-occurrence statistics suffice for zero-shot classification.
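The abstract above describes the core recipe, so a minimal illustrative sketch may help: it builds a zero-shot classifier for an unseen label as a co-occurrence-weighted combination of known-label classifiers. This is a toy example in the spirit of the talk, not the authors' released code; the function name, the toy weight matrix, and the co-occurrence vector are all assumptions.

```python
import numpy as np

def costa_zero_shot(W_known, cooc):
    """Toy zero-shot classifier in the spirit of COSTA.

    W_known : (K, D) linear classifier weights for K known labels.
    cooc    : (K,) co-occurrence scores between the unseen label and each
              known label (e.g. counted from annotations or web hit counts).
    Returns a (D,) weight vector for the unseen label.
    """
    alpha = cooc / (cooc.sum() + 1e-12)   # normalize co-occurrences into mixing weights
    return alpha @ W_known                # weighted combination of known classifiers

# Hypothetical usage on random data:
rng = np.random.default_rng(0)
W_known = rng.normal(size=(5, 128))       # 5 known labels, 128-dim image features
cooc = np.array([3.0, 0.0, 12.0, 1.0, 0.0])
w_new = costa_zero_shot(W_known, cooc)
score = w_new @ rng.normal(size=128)      # dot product scores a test image for the new label
```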
09:20 Invited Talk: Ali Farhadi (University of Washington) - "Attributes at Scale"
09:55 Invited Talk: Raquel Urtasun (University of Toronto) - "Understanding Complex Scenes and People that Talk about Them"
 
10:30   Coffee Break
 
11:00 Invited Talk: Gregory Murphy (New York University) - "When Are Categories More Useful Than Attributes? A Perspective From Induction"
[Slides]
 
Categories are useful because they allow us to infer attributes of an object that were not themselves observed during categorization. However, some attributes can be directly inferred from an object's perceptible properties. I discuss two sets of experiments that test whether people make such attribute-to-attribute inductions, and whether they rely more on these or on category-to-attribute inductions.
11:35 Invited Talk: Adriana Kovashka (University of Pittsburgh) - "Interactive Image Search with Attributes"
[Slides]

Search engines have come a long way, but searching for images is still primarily restricted to meta-information such as keywords, as opposed to the images' visual content. We introduce a new form of interaction for image retrieval, where the user gives rich feedback to the system via semantic visual attributes. The proposed WhittleSearch approach allows users to narrow down the pool of relevant images by comparing the properties of the results to those of the desired target. Building on this idea, we develop a system-guided version of the method that actively engages the user in a 20-questions-like game where the answers are visual comparisons, enabling the system to elicit exactly the information it most needs. To ensure that the system interprets the user's attribute-based queries and feedback as intended, we further show how to efficiently adapt a generic attribute model to more closely align with an individual user's perception. Our work transforms the interaction between the image search system and its user from keywords and clicks to precise, natural-language-based communication. We demonstrate the dramatic impact of this new search modality for effective retrieval on databases ranging from consumer products to human faces. This is an important step in making the output of vision systems more useful, allowing users both to express their needs more precisely and to better interpret the system's predictions.
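As a rough illustration of the "whittling" idea described above (not the authors' implementation), the sketch below keeps only database images whose predicted attribute strengths are consistent with the user's relative comparisons; the array layout, feedback format, and function name are assumptions for illustration.

```python
import numpy as np

def whittle(attr_scores, feedback):
    """Filter a database by relative-attribute feedback, WhittleSearch-style.

    attr_scores : (N, A) predicted attribute strengths for N database images.
    feedback    : list of (ref_idx, attr_idx, direction) triples, where
                  direction is +1 for "my target has MORE of this attribute
                  than reference image ref_idx" and -1 for "less".
    Returns indices of images consistent with all comparisons.
    """
    keep = np.ones(attr_scores.shape[0], dtype=bool)
    for ref, a, direction in feedback:
        if direction > 0:
            keep &= attr_scores[:, a] > attr_scores[ref, a]
        else:
            keep &= attr_scores[:, a] < attr_scores[ref, a]
    return np.nonzero(keep)[0]

# Hypothetical usage: "more formal than image 3, less shiny than image 7"
scores = np.random.default_rng(1).random((100, 5))
candidates = whittle(scores, [(3, 0, +1), (7, 2, -1)])
```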
 
12:10   Lunch Break
 
14:00 Invited Talk: Niloy Mitra (University College London) - "Abstracting Collections of Objects and Scenes"
[Slides]

3D data continues to grow in the form of collections of models, scenes, and scans, and of course as image collections. Such data, when appropriately abstracted and represented, can provide valuable priors for many geometry processing tasks, including editing, synthesis, and form-finding. In this talk, I will discuss the algorithms we have developed in recent years to co-analyze large 3D data collections and represent them as probability distributions over part-based abstractions. Such an approach focuses on the global semantic relations among and within the parts of a shape rather than on their local geometric details. Beyond analysis techniques, I will discuss applications in editing, modeling, and fabrication.
 
14:35 Invited Talk: Peter Gehler (Max Planck Institute) - "Fields of Parts"
[Slides]

Part-based models are ubiquitous in human pose estimation and object detection. In this talk I will present the Fields of Parts model, which offers a different viewpoint on the classical Pictorial Structures (PS) model. The Fields of Parts model can be understood as an unrolled mean-field inference machine; we train it with a maximum-margin estimator using mean-field backpropagation. I will establish the link between the PS model, the Fields of Parts model, and a multilayer neural network with convolutional kernels, and I will argue that it offers interesting new modeling flexibility, as it paves the way to joint body pose estimation and segmentation.
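To make the "unrolled mean-field inference machine" idea concrete, here is a generic sketch assuming a simple chain of parts with discrete candidate states (this is my simplification, not the actual Fields of Parts model): a fixed number of differentiable mean-field updates over per-part state distributions, which is what allows the whole procedure to be trained end-to-end like a stack of network layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unrolled_mean_field(unary, pairwise, n_iters=5):
    """Fixed number of mean-field updates on a chain of parts.

    unary    : (P, S) unary scores for P parts over S candidate states.
    pairwise : (P-1, S, S) compatibilities between neighbouring parts.
    Every update is differentiable, so the unrolled loop behaves like
    a stack of network layers and can be trained with backpropagation.
    """
    q = softmax(unary)                         # initial per-part beliefs
    P = unary.shape[0]
    for _ in range(n_iters):
        msg = np.zeros_like(unary)
        for p in range(P):
            if p > 0:                          # message from the previous part
                msg[p] += q[p - 1] @ pairwise[p - 1]
            if p < P - 1:                      # message from the next part
                msg[p] += q[p + 1] @ pairwise[p].T
        q = softmax(unary + msg)               # mean-field update
    return q
```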
 
15:10 Poster Session (and Coffee Break)

Check the list of accepted posters
 
16:40 Invited Talk: Shih-Fu Chang (Columbia University) - "Concept-Based Framework for Detecting High-Level Events in Video"

Attributes and parts are intuitive representations for real-world objects and have been shown to be effective in recent research on object recognition. An analogous framework has been used in the multimedia community, where "concepts" describe high-level complex events such as "birthday party" or "changing a vehicle tire." Concepts involve objects, scenes, actions, activities, and other syntactic elements usually seen in video events. In this talk, I will address several fundamental issues encountered when developing a concept-based event framework: how to determine the basic concepts needed by humans when annotating video events; how to use Web mining to automatically discover a large concept pool for event representation; how to handle the weak-supervision problem when concept labels are assigned to long video clips without precise timing; and finally, how the concept classifier pool can be used to help retrieve novel events that have not been seen before (the zero-shot retrieval problem).
 
17:15 Invited Talk: Serge Belongie (Cornell Tech) - "Visipedia Tool Ecosystem"
[Slides]

To support scalable computer vision applications, we have built a suite of tools that allow for efficient collection and annotation of large image datasets. The tools are designed to both reduce data management overhead and foster collaborations between vision researchers and groups seeking the benefits of a computer vision application.
 
17:50   Concluding Remarks

Important Dates

Submission deadline: July 25th, 2014, 11:59 pm EST (extended from June 30th)
Notification of acceptance: July 31st, 2014 (previously July 10th)
Camera ready submission: July 17th, 2014
Workshop date: September 12th, 2014

Submissions

We invite four-page extended abstracts (excluding references) in ECCV 2014 format. Abstracts describing new, previously published (e.g., at the main ECCV 2014 conference), or ongoing work are welcome. There will be no proceedings.

The extended abstract should be submitted as a single PDF file via email to the workshop organizers.

Contributions from the following domains or closely-related areas are especially welcome:

Deformable and rigid part-based models
Generative and discriminative part-based models
Unsupervised discovery of parts
Context and hierarchy in part-based models
Part-sharing methods for visual recognition
Learning visual attributes across object classes
Attribute-based classification and search
Semantic attributes as object representations
Mid-level representations based on parts/attributes
Transfer learning / Zero-shot learning
Fine-grained visual categorization based on parts and attributes
Innovative applications related to parts and attributes

Accepted submissions will be presented as posters at the workshop.  

Reviewing of abstract submissions will be single-blind, i.e., submissions need not be anonymized.

Previous Iterations

In conjunction with ECCV 2012

In conjunction with ECCV 2010