Members: Petia Radeva, Mariella Dimiccoli, Marc Bolaños, Gabriel de Oliveira, Maedeh Aghaei, Estefania Talavera, Alejandro Cartas
Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure and the quantified self. Egocentric vision, or first-person camera vision, refers to the set of Computer Vision and Machine Learning methods that extract semantic information from visual lifelogging data. The field has attracted rapidly growing interest over the last few years, as reflected in the number of papers published on images acquired with different wearable cameras.
References:
- Marc Bolaños, Mariella Dimiccoli, Petia Radeva: Towards Storytelling from Visual Lifelogging: An Overview. CoRR abs/1507.06120 (2015)
Within our visual lifelogging project, we tackle the following egocentric (first-person camera) vision problems:
TEMPORAL SEGMENTATION, SUMMARIZATION AND KEY-FRAME EXTRACTION
While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This work addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, an important step towards automatically annotating these photos for browsing and retrieval. In the proposed method, contextual and semantic information is first extracted for each image using a Convolutional Neural Network. Then, a vocabulary of concepts is defined in a semantic space by relying on linguistic information. Finally, by exploiting the temporal coherence of concepts in photo streams, images that share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from event recognition to semantic indexing and summarization. Experimental results on an egocentric set of nearly 31,000 images show that the proposed approach outperforms state-of-the-art methods.
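To make the grouping step concrete, the sketch below segments a photo stream by cutting wherever the similarity between consecutive CNN descriptors drops. The precomputed feature matrix, the cosine-similarity criterion and the threshold value are illustrative assumptions for the example, not the exact published procedure.

```python
# Minimal sketch: split a photo stream into segments by detecting drops in
# similarity between temporally adjacent CNN descriptors (illustrative only).
import numpy as np

def segment_photo_stream(features: np.ndarray, threshold: float = 0.7):
    """features  : (n_images, feat_dim) L2-normalised CNN descriptors,
                    one row per image, in temporal order.
       threshold : minimum cosine similarity between consecutive images
                    for them to stay in the same segment.
       Returns a list of (start_index, end_index) tuples, end exclusive."""
    n = len(features)
    boundaries = [0]
    for i in range(1, n):
        sim = float(features[i - 1] @ features[i])  # cosine similarity of unit vectors
        if sim < threshold:                         # a similarity drop marks an event change
            boundaries.append(i)
    boundaries.append(n)
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]

# Toy usage: 10 random unit-norm "descriptors"
feats = np.random.default_rng(0).normal(size=(10, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(segment_photo_stream(feats, threshold=0.2))
```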
Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g. memory reinforcement). In this work, we propose a new summarization method based on keyframe selection that uses visual features extracted by means of a convolutional neural network. Our method applies unsupervised clustering to divide the photostream into events and then extracts the most relevant keyframe for each event. We evaluated the results through a blind test in which a group of 20 people assessed the quality of the summaries.
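A minimal sketch of the keyframe-selection idea is given below: for each event, it picks the image whose CNN descriptor is closest to the event centroid. This is an illustrative baseline under assumed inputs (a feature matrix and precomputed event boundaries), not the exact relevance criterion used in the paper.

```python
# Minimal sketch: choose one representative image per event as the frame
# closest to the event centroid in CNN-feature space (illustrative only).
import numpy as np

def select_keyframes(features: np.ndarray, segments):
    """Return one image index per (start, end) segment."""
    keyframes = []
    for start, end in segments:
        event = features[start:end]
        centroid = event.mean(axis=0)
        dists = np.linalg.norm(event - centroid, axis=1)
        keyframes.append(start + int(np.argmin(dists)))
    return keyframes

# Toy usage: two events over 8 images with 128-D descriptors
feats = np.random.default_rng(0).normal(size=(8, 128))
print(select_keyframes(feats, [(0, 5), (5, 8)]))
```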
References:
- M. Dimiccoli, M. Bolaños, E. Talavera, M. Aghaei, S. Nikolov, P. Radeva: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation. To appear in Computer Vision and Image Understanding, 2016.
- M. Dimiccoli, H. Xu, P. Radeva: A cognitive-based model for event learning. Women in Machine Learning Workshop (WiML), in conjunction with the International Conference on Neural Information Processing Systems (NIPS), December 2016, Barcelona, Spain.
- Estefanía Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, Petia Radeva: R-Clustering for Egocentric Video Segmentation. IbPRIA 2015: 327-336
- Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró i Nieto, Petia Radeva: Visual summary of egocentric photostreams by representative keyframes. ICME Workshops 2015: 1-6
- Aniol Lidon, Marc Bolaños, Mariella Dimiccoli, Petia Radeva, Maite Garolera, Xavier Giró i Nieto: Semantic Summarization of Egocentric Photo Stream Events. CoRR abs/1511.00438 (2015)
- Marc Bolaños, Maite Garolera, Petia Radeva: Video Segmentation of Life-Logging Videos. AMDO 2014: 1-9
EGOCENTRIC SOCIAL INTERACTION ANALYSIS
Given a user wearing a low-frame-rate wearable camera during a day, this work aims to automatically detect the moments when the user engages in a social interaction, solely from the photos automatically captured by the worn camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation of the individuals appearing in the scene with respect to the user, seen from a bird's-eye view. As a result, the interaction pattern over a sequence can be described as a two-dimensional time series corresponding to the temporal evolution of the distance and orientation features. A Long Short-Term Memory (LSTM) recurrent neural network is then trained to classify each time series. Experimental evaluation on a dataset of 30,000 images has shown promising results of the proposed method for social interaction detection in egocentric photo-streams.
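The sketch below shows how such a two-dimensional (distance, orientation) time series can be classified with an LSTM, written here in PyTorch; the layer sizes, the binary interaction label and all variable names are illustrative assumptions rather than the published architecture.

```python
# Minimal sketch of an LSTM classifier over (distance, orientation) sequences
# (illustrative assumptions; not the published network).
import torch
import torch.nn as nn

class InteractionLSTM(nn.Module):
    def __init__(self, input_size=2, hidden_size=32, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, sequence_length, 2) -- distance and orientation per frame
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_size)
        return self.classifier(h_n[-1])  # class logits per sequence

# Toy usage: a batch of 4 sequences of 20 frames each
model = InteractionLSTM()
series = torch.randn(4, 20, 2)
logits = model(series)                   # shape: (4, 2)
```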
References:
- M. Aghaei, M. Dimiccoli, P. Radeva: Extended Bag-of-Tracklets for Multi-Face Tracking in Egocentric Photo Streams. Computer Vision and Image Understanding, Volume 149, 146-156, Special Issue on Assistive Computer Vision and Robotics, Elsevier, 2016.
- M. Aghaei, M. Dimiccoli, P. Radeva: With Whom Do I Interact? Detecting Social Interactions in Egocentric Photo Streams. To be presented at the International Conference on Pattern Recognition (ICPR), December 2016, Cancun, Mexico.
- M. Aghaei, M. Dimiccoli, P. Radeva: Towards Social Interaction Detection in Egocentric Photo Streams. Workshop on Egocentric (first-person) Vision, in conjunction with the Computer Vision and Pattern Recognition Conference (CVPR), June 2016, Las Vegas, USA.
- M. Aghaei, M. Dimiccoli, P. Radeva: Towards Social Interaction Detection in Egocentric Photo Streams. Proceedings of the International Conference on Machine Vision (ICMV), November 2015, Barcelona, Spain.
EGO-OBJECT DISCOVERY
Lifelogging devices are spreading faster every day. This growth creates great opportunities to develop methods for extracting meaningful information about the user wearing the device and his/her environment. In this work, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given an egocentric video or image sequence acquired by the camera, our algorithm uses both appearance features extracted by means of a convolutional neural network and an object refill methodology that allows discovering objects even when they appear only rarely in the collection of images. An SVM filtering strategy is applied to remove the large number of false-positive object candidates produced by most state-of-the-art object detectors. We validate our method on a new egocentric dataset of 4,912 daily images acquired by 4 persons, as well as on the PASCAL 2012 and MSRC datasets, and obtain results that largely outperform the state-of-the-art approach on all of them. We make both the EDUB dataset and the algorithm code publicly available.
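As a rough illustration of the SVM filtering step, the sketch below trains a binary classifier on CNN descriptors of labelled candidates and keeps only the proposals predicted as true objects; the features, labels and the use of a linear SVM are assumptions made for the example, not the exact published pipeline.

```python
# Minimal sketch: filter false-positive object proposals with a linear SVM
# trained on CNN descriptors (illustrative assumptions only).
import numpy as np
from sklearn.svm import LinearSVC

def filter_candidates(train_feats, train_labels, candidate_feats):
    """train_labels: 1 = true object, 0 = false positive (background)."""
    clf = LinearSVC(C=1.0)
    clf.fit(train_feats, train_labels)
    keep = clf.predict(candidate_feats) == 1
    return np.flatnonzero(keep)  # indices of candidates kept as true objects

# Toy usage with random stand-ins for CNN features
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4096))      # e.g. fc7-like descriptors
y_train = rng.integers(0, 2, size=100)
X_cand = rng.normal(size=(20, 4096))
print(filter_candidates(X_train, y_train, X_cand))
```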
References:
- Marc Bolaños, Petia Radeva: Ego-object discovery. CoRR abs/1504.01639 (2015)
- Marc Bolaños, Maite Garolera, Petia Radeva: Object Discovery Using CNN Features in Egocentric Videos. IbPRIA 2015: 67-74