Members: Petia Radeva, Mariella Dimiccoli, Marc Bolaños, Gabriel de Oliveira, Maedeh Aghaei, Estefania Talavera, Alejandro Cartas
Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure and the quantified self. Egocentric vision, or first-person camera vision, refers to the set of Computer Vision and Machine Learning methods that extract semantic information from visual lifelogging data. The field has attracted rapidly growing interest over the last few years, as reflected in the number of papers published on images acquired with different wearable cameras.
References:
- Marc Bolaños, Mariella Dimiccoli, Petia Radeva: Towards Storytelling from Visual Lifelogging: An Overview. CoRR abs/1507.06120 (2015)
Within our visual lifelogging project, we tackle the following egocentric (first-person camera) vision problems:
TEMPORAL SEGMENTATION, SUMMARIZATION AND KEY-FRAME EXTRACTION
While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This work addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, an important step towards automatically annotating these photos for browsing and retrieval. In the proposed method, contextual and semantic information is first extracted for each image using a Convolutional Neural Network. Then, a vocabulary of concepts is defined in a semantic space by relying on linguistic information. Finally, by exploiting the temporal coherence of concepts in photo streams, images that share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from event recognition to semantic indexing and summarization. Experimental results on an egocentric set of nearly 31,000 images show that the proposed approach outperforms state-of-the-art methods.
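To make the grouping step concrete, the sketch below segments a photo stream by cutting wherever the similarity between consecutive CNN descriptors drops. The precomputed feature matrix, the cosine-similarity criterion and the threshold value are illustrative assumptions for the example, not the exact published procedure.

```python
# Minimal sketch: split a photo stream into segments by detecting drops in
# similarity between temporally adjacent CNN descriptors (illustrative only).
import numpy as np

def segment_photo_stream(features: np.ndarray, threshold: float = 0.7):
    """features  : (n_images, feat_dim) L2-normalised CNN descriptors,
                    one row per image, in temporal order.
       threshold : minimum cosine similarity between consecutive images
                    for them to stay in the same segment.
       Returns a list of (start_index, end_index) tuples, end exclusive."""
    n = len(features)
    boundaries = [0]
    for i in range(1, n):
        sim = float(features[i - 1] @ features[i])  # cosine similarity of unit vectors
        if sim < threshold:                         # a similarity drop marks an event change
            boundaries.append(i)
    boundaries.append(n)
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]

# Toy usage: 10 random unit-norm "descriptors"
feats = np.random.default_rng(0).normal(size=(10, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(segment_photo_stream(feats, threshold=0.2))
```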
Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g. memory reinforcement). In this work, we propose a new summarization method based on keyframe selection that uses visual features extracted by means of a convolutional neural network. Our method applies unsupervised clustering to divide the photostream into events and then extracts the most relevant keyframe for each event. We evaluated the results through a blind test in which a group of 20 people assessed the quality of the summaries.
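A minimal sketch of the keyframe-selection idea is given below: for each event, it picks the image whose CNN descriptor is closest to the event centroid. This is an illustrative baseline under assumed inputs (a feature matrix and precomputed event boundaries), not the exact relevance criterion used in the paper.

```python
# Minimal sketch: choose one representative image per event as the frame
# closest to the event centroid in CNN-feature space (illustrative only).
import numpy as np

def select_keyframes(features: np.ndarray, segments):
    """Return one image index per (start, end) segment."""
    keyframes = []
    for start, end in segments:
        event = features[start:end]
        centroid = event.mean(axis=0)
        dists = np.linalg.norm(event - centroid, axis=1)
        keyframes.append(start + int(np.argmin(dists)))
    return keyframes

# Toy usage: two events over 8 images with 128-D descriptors
feats = np.random.default_rng(0).normal(size=(8, 128))
print(select_keyframes(feats, [(0, 5), (5, 8)]))
```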
References:
- M. Dimiccoli, M. Bolaños, E. Talavera, M. Aghaei, S. Nikolov, P. Radeva: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation. To appear in Computer Vision and Image Understanding, 2016.
- M. Dimiccoli, H. Xu, P. Radeva: A cognitive-based model for event learning. Women in Machine Learning Workshop (WiML), in conjunction with the International Conference on Neural Information Processing Systems (NIPS), December 2016, Barcelona, Spain.
- Estefanía Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, Petia Radeva: R-Clustering for Egocentric Video Segmentation. IbPRIA 2015: 327-336
- Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró i Nieto, Petia Radeva: Visual summary of egocentric photostreams by representative keyframes. ICME Workshops 2015: 1-6
- Aniol Lidon, Marc Bolaños, Mariella Dimiccoli, Petia Radeva, Maite Garolera, Xavier Giró i Nieto: Semantic Summarization of Egocentric Photo Stream Events. CoRR abs/1511.00438 (2015)
- Marc Bolaños, Maite Garolera, Petia Radeva: Video Segmentation of Life-Logging Videos. AMDO 2014: 1-9
EGOCENTRIC SOCIAL INTERACTION ANALYSIS
Given a user wearing a low-frame-rate wearable camera during a day, this work aims to automatically detect the moments when the user engages in a social interaction, solely from the photos automatically captured by the worn camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation of the individuals appearing in the scene with respect to the user, seen from a bird's-eye view. As a result, the interaction pattern over a sequence can be described as a two-dimensional time series corresponding to the temporal evolution of the distance and orientation features. A Long Short-Term Memory (LSTM) recurrent neural network is then trained to classify each time series. Experimental evaluation on a dataset of 30,000 images has shown promising results of the proposed method for social interaction detection in egocentric photo-streams.
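The sketch below shows how such a two-dimensional (distance, orientation) time series can be classified with an LSTM, written here in PyTorch; the layer sizes, the binary interaction label and all variable names are illustrative assumptions rather than the published architecture.

```python
# Minimal sketch of an LSTM classifier over (distance, orientation) sequences
# (illustrative assumptions; not the published network).
import torch
import torch.nn as nn

class InteractionLSTM(nn.Module):
    def __init__(self, input_size=2, hidden_size=32, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, sequence_length, 2) -- distance and orientation per frame
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_size)
        return self.classifier(h_n[-1])  # class logits per sequence

# Toy usage: a batch of 4 sequences of 20 frames each
model = InteractionLSTM()
series = torch.randn(4, 20, 2)
logits = model(series)                   # shape: (4, 2)
```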
References:
- M. Aghaei, M. Dimiccoli, P. Radeva: Extended Bag-of-Tracklets for Multi-Face Tracking in Egocentric Photo Streams. Computer Vision and Image Understanding, Volume 149, 146-156, Special Issue on Assistive Computer Vision and Robotics, Elsevier, 2016.
- M. Aghaei, M. Dimiccoli, P. Radeva: With Whom Do I Interact? Detecting Social Interactions in Egocentric Photo Streams. To be presented at the International Conference on Pattern Recognition (ICPR), December 2016, Cancun, Mexico.
- M. Aghaei, M. Dimiccoli, P. Radeva: Towards Social Interaction Detection in Egocentric Photo Streams. Workshop on Egocentric (first-person) Vision, in conjunction with the Computer Vision and Pattern Recognition Conference (CVPR), June 2016, Las Vegas, USA.
- M. Aghaei, M. Dimiccoli, P. Radeva: Towards Social Interaction Detection in Egocentric Photo Streams. Proceedings of the International Conference on Machine Vision (ICMV), November 2015, Barcelona, Spain.
EGO-OBJECT DISCOVERY
Lifelogging devices are spreading faster every day. This growth creates great opportunities to develop methods for extracting meaningful information about the user wearing the device and his/her environment. In this work, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given an egocentric video or image sequence acquired by the camera, our algorithm uses both appearance features extracted by means of a convolutional neural network and an object refill methodology that allows discovering objects even when they appear only rarely in the collection of images. An SVM filtering strategy is applied to remove the large number of false-positive object candidates produced by most state-of-the-art object detectors. We validate our method on a new egocentric dataset of 4,912 daily images acquired by 4 persons, as well as on the PASCAL 2012 and MSRC datasets, and obtain results that largely outperform the state-of-the-art approach on all of them. We make both the EDUB dataset and the algorithm code publicly available.
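As a rough illustration of the SVM filtering step, the sketch below trains a binary classifier on CNN descriptors of labelled candidates and keeps only the proposals predicted as true objects; the features, labels and the use of a linear SVM are assumptions made for the example, not the exact published pipeline.

```python
# Minimal sketch: filter false-positive object proposals with a linear SVM
# trained on CNN descriptors (illustrative assumptions only).
import numpy as np
from sklearn.svm import LinearSVC

def filter_candidates(train_feats, train_labels, candidate_feats):
    """train_labels: 1 = true object, 0 = false positive (background)."""
    clf = LinearSVC(C=1.0)
    clf.fit(train_feats, train_labels)
    keep = clf.predict(candidate_feats) == 1
    return np.flatnonzero(keep)  # indices of candidates kept as true objects

# Toy usage with random stand-ins for CNN features
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4096))      # e.g. fc7-like descriptors
y_train = rng.integers(0, 2, size=100)
X_cand = rng.normal(size=(20, 4096))
print(filter_candidates(X_train, y_train, X_cand))
```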
References:
- Marc Bolaños, Petia Radeva: Ego-object discovery. CoRR abs/1504.01639 (2015)
- Marc Bolaños, Maite Garolera, Petia Radeva: Object Discovery Using CNN Features in Egocentric Videos. IbPRIA 2015: 67-74