Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each other's solutions; however, because these relations can be complex, this property is rarely exploited. When task relations are explicitly defined based on domain knowledge, multi-task learning (MTL) offers such concurrent solutions while exploiting the relatedness between multiple tasks performed over the same dataset. In most cases, however, this relatedness is not explicitly defined, and the domain expert knowledge that would define it is not available. To address this issue, we introduce Selective Sharing, a method that learns inter-task relatedness from secondary latent features while the model trains. Using this insight, we can automatically group tasks and allow them to share knowledge in a mutually beneficial way. We support our method with experiments on 5 datasets covering classification, regression, and ranking tasks, and compare it to strong baselines and state-of-the-art approaches, showing a consistent improvement in terms of accuracy and parameter count. In addition, we perform an activation region analysis showing how Selective Sharing affects the learned representation.
Oral at ICMR 2019
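As a rough illustration of grouping tasks automatically from latent features, the sketch below assumes each task is summarized by one latent feature vector (for instance, averaged task-specific activations; this choice is an assumption, not the paper's exact definition), measures pairwise cosine similarity, and greedily groups tasks whose similarity exceeds a hypothetical `threshold`. It is a generic similarity-based grouping, not the Selective Sharing implementation.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_tasks(latents: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping (hypothetical): a task joins the first group in which it is
    at least `threshold`-similar to every existing member; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for task, vec in latents.items():
        for group in groups:
            if all(cosine(vec, latents[other]) >= threshold for other in group):
                group.append(task)
                break
        else:
            groups.append([task])
    return groups
```

With toy vectors such as `{"artist": [1.0, 0.9, 0.1], "school": [0.9, 1.0, 0.2], "year": [0.0, 0.1, 1.0]}`, the two strongly aligned tasks end up sharing a group while the third is kept separate.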
Many Task Learning with Task Routing
Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, as the number of tasks increases, so do the complexity of the architectural adjustments and the resource requirements. In this paper, we introduce a method that applies a conditional feature-wise transformation over the convolutional activations, enabling a model to successfully perform a large number of tasks. To distinguish it from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model. Our method, dubbed Task Routing (TR), is encapsulated in a layer we call the Task Routing Layer (TRL), which, applied in a MaTL scenario, successfully fits hundreds of classification tasks in one model. We evaluate our method on 5 datasets against strong baselines and state-of-the-art approaches.
on arXiv (Computer Vision and Pattern Recognition)
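One simple form of a conditional feature-wise transformation is a per-task channel mask: each task owns a fixed binary vector that switches convolutional channels on or off, so different tasks route through different (partly overlapping) subsets of units. The sketch below assumes randomly drawn masks controlled by a hypothetical `active_ratio` parameter and represents activations as a plain list of channels; it illustrates the idea and is not the TRL implementation.

```python
import random
from typing import List


class TaskRoutingSketch:
    """Hypothetical task-conditioned routing: one fixed random binary channel mask per task."""

    def __init__(self, num_channels: int, num_tasks: int,
                 active_ratio: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        active = max(1, round(num_channels * active_ratio))
        self.masks: List[List[int]] = []
        for _ in range(num_tasks):
            # Pick which channels this task is allowed to use; masks stay fixed afterwards.
            on = set(rng.sample(range(num_channels), active))
            self.masks.append([1 if c in on else 0 for c in range(num_channels)])

    def forward(self, channels: List[List[float]], task: int) -> List[List[float]]:
        """Multiply each channel by the task's mask bit, zeroing the channels it does not use."""
        mask = self.masks[task]
        return [[x * m for x in ch] for ch, m in zip(channels, mask)]
```

Because the masks are fixed and cheap to apply, adding a task costs only one extra binary vector per routed layer rather than a new branch of parameters, which is the kind of property that matters when scaling to many tasks.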
OmniArt: A Large-scale Artistic Benchmark
Baselines are the starting point of any quantitative multimedia research, and benchmarks are essential for pushing those baselines further. In this article, we present baselines for the artistic domain with a new benchmark dataset, dubbed OmniArt, featuring over 2 million images with rich structured metadata. OmniArt contains annotations for dozens of attribute types and provides semantic context information through concepts, IconClass labels, color information, and (limited) object-level bounding boxes. For this dataset, we establish and present baseline scores on multiple tasks, such as artist attribution, creation period estimation, and type, style, and school prediction. In addition to our metadata-related experiments, we explore the color spaces of art across different artwork types and evaluate a transfer learning object recognition pipeline.
in ACM Transactions on Multimedia Computing, Communications, and Applications
Plug-and-Play Interactive Deep Network Visualization
Deep models have recently been at the heart of computer vision research. With a significant performance boost over conventional approaches, it is relatively easy to treat them like black boxes and enjoy the benefits they offer. However, if we are to improve and develop them further, understanding their reasoning process is key. Motivated by making this understanding effortless both for the scientists who develop these models and for the professionals using them, in this paper we present an interactive, plug-and-play, web-based deep learning visualization system. Our system allows users to upload their trained models and visualize the maximum activations of specific units or create attention/saliency maps over their input. It operates on top of the most popular deep learning frameworks and is platform independent due to its web-based implementation. We demonstrate the practical aspects of our two main features, MaxOut and Reason, through visualizations on models trained with artistic paintings from the OmniArt dataset, and elaborate on the results.
in VAST/VADL proceedings
OmniArt: Multi-task Deep Learning for Artistic Data Analysis
Vast amounts of artistic data are scattered online, from both museums and art applications. Collecting, processing, and studying this data with respect to all accompanying attributes is an expensive process. With the motivation to speed up and improve the quality of categorical analysis in the artistic domain, in this paper we propose an efficient and accurate method for multi-task learning with a shared representation, applied to the artistic domain. We further show how different multi-task configurations of our method behave on artistic data and outperform handcrafted feature approaches as well as convolutional neural networks. In addition to the method and analysis, we present the new aggregated dataset, with almost half a million samples and structured metadata, as a challenge to encourage further research and societal engagement.
on arXiv (Multimedia)
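Multi-task learning with a shared representation is typically trained by feeding one shared feature vector into several task-specific output heads and minimizing a weighted sum of the per-task losses. The sketch below computes that combined loss for linear classification heads in plain Python; the task names, head weights, and equal default loss weighting are illustrative assumptions, not the configuration used in the paper.

```python
import math
from typing import Dict, List, Optional


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits` for the index `target`."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def linear_head(features: List[float], weights: List[List[float]]) -> List[float]:
    """One weight row per class; returns unnormalized class scores."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def multitask_loss(features: List[float],
                   heads: Dict[str, List[List[float]]],
                   targets: Dict[str, int],
                   task_weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-task losses computed on the SAME shared feature vector."""
    task_weights = task_weights or {t: 1.0 for t in heads}
    return sum(
        task_weights[t] * softmax_cross_entropy(linear_head(features, heads[t]), targets[t])
        for t in heads
    )
```

Since every head reads the same features, gradients from all tasks flow back into the shared representation, which is what lets related attributes (for example artist and school) inform each other.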
Hand Gesture Recognition using Deep Convolutional Neural Networks
Hand gesture recognition is the process of recognizing meaningful expressions of form and motion by a human involving only the hands. There are plenty of applications where hand gesture recognition can be applied to improve control, accessibility, communication, and learning. In the work presented in this paper, we conducted experiments with different types of convolutional neural networks, including our own proprietary model. The performance of each model was evaluated on the Marcel dataset, providing insight into how different architectures influence performance. The best results were obtained using the GoogLeNet approach featuring the Inception architecture, followed by our proprietary model and the VGG model.
Multimodal medical image retrieval system
In this paper, we describe an implemented system for medical image retrieval. Our system performs retrieval based on both textual and visual content, separately and combined, using advanced encoding and quantization techniques. The text-based retrieval subsystem uses textual data acquired from an image’s corresponding article to generate a suitable representation. Using a vector space model, the structure of the generated representations is altered to increase performance. Query expansion with pseudo-relevance feedback is applied to fine-tune the results. The content-based retrieval subsystem performs retrieval based on visual features extracted from the images, which are described using state-of-the-art OpponentSIFT features. Classification was performed using Support Vector Machines (SVMs), whose predictions are used to re-rank the resulting images based on their modality and the modality of the query. The system was evaluated on the standardized ImageCLEF 2013, 2012, and 2011 medical datasets and achieved state-of-the-art performance on all of them.
in Springer’s Multimedia Tools and Applications, Volume 76, 2017
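Pseudo-relevance feedback in a vector space model is commonly implemented with the Rocchio update: the query vector is moved toward the centroid of the top-k retrieved documents, which are assumed to be relevant. The sketch below uses that standard formulation (positive feedback only) over sparse term-weight dictionaries; `k`, `alpha`, and `beta` are illustrative values, not the settings used in the system.

```python
import math
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: Vector, docs: List[Vector]) -> List[int]:
    """Document indices sorted by decreasing cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def expand_query(query: Vector, docs: List[Vector], k: int = 2,
                 alpha: float = 1.0, beta: float = 0.75) -> Vector:
    """Rocchio-style expansion: alpha * query + beta * centroid(top-k documents)."""
    top = rank(query, docs)[:k]
    expanded: Dict[str, float] = defaultdict(float)
    for t, w in query.items():
        expanded[t] += alpha * w
    for i in top:
        for t, w in docs[i].items():
            expanded[t] += beta * w / len(top)
    return dict(expanded)
```

For a query `{"lung": 1.0}`, terms that co-occur with "lung" in the top-ranked articles (for example "ct" or "xray") enter the expanded query, while terms from unrelated documents do not; the expanded query is then used for a second retrieval pass.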
Deep Learning and Support Vector Machines for Effective Plant Identification
Our planet is blooming with vegetation that consists of hundreds of thousands of plant species. Every species is unique in its own way, which enables people to distinguish one plant from another. Distinguishing plant species is a non-trivial task; in fact, it is challenging even for renowned botanists with many years of experience in the field. With the complexity of the task in mind, in this paper we present a system for plant species identification based on Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs). Combining these two approaches for both feature generation and classification results in a powerful plant identification system. Additionally, we report state-of-the-art results using this approach, as well as a comparison with other types of approaches on the same dataset.
Content based image retrieval for large medical image corpus
In this paper, we address the scalability of content-based image retrieval (CBIR) in large medical image archives. Throughout the text, we focus on explaining how small changes in image representation, using existing technologies, lead to impressive improvements in image indexing, search, and retrieval duration. We used a combination of OpponentSIFT descriptors, Gaussian Mixture Models, Fisher kernels, and product quantization, neatly packaged and ready for web integration. The CBIR feature of the system is demonstrated through a Python-based web client with features such as region-of-interest selection and local image upload.
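Product quantization compresses a descriptor by splitting it into sub-vectors and replacing each sub-vector with the index of its nearest centroid in a per-subspace codebook; distances from a query to database items are then approximated by summing entries from small per-subspace lookup tables (asymmetric distance computation). The sketch below assumes the codebooks are already trained (for example by k-means, omitted here) and uses toy 4-D vectors; it is not the system's actual configuration.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # centroids for one subspace


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec: Sequence[float], m: int) -> List[List[float]]:
    """Cut a D-dimensional vector into m equal sub-vectors (assumes m divides D)."""
    d = len(vec) // m
    return [list(vec[i * d:(i + 1) * d]) for i in range(m)]


def encode(vec: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Store each sub-vector as the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda j: sq_dist(s, cb[j]))
            for s, cb in zip(subs, codebooks)]


def distance_tables(query: Sequence[float], codebooks: List[Codebook]) -> List[List[float]]:
    """Per subspace: squared distance from the query sub-vector to every centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]


def adc_distance(tables: List[List[float]], code: List[int]) -> float:
    """Approximate squared distance to an encoded item via table lookups only."""
    return sum(table[c] for table, c in zip(tables, code))
```

The lookup tables are computed once per query, so scanning a large archive costs only a handful of additions per encoded image, which is where much of the speed-up in indexing and search comes from.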
Twitter Sentiment Analysis Using Deep Convolutional Neural Network
In the work presented in this paper, we conduct experiments on sentiment analysis in Twitter messages using a deep convolutional neural network. The network is trained on top of pre-trained word embeddings obtained by unsupervised learning on large text corpora. We use a CNN with multiple filters of varying window sizes, on top of which we add 2 fully connected layers with dropout and a softmax layer. Our research shows the effectiveness of using pre-trained word vectors and the advantage of leveraging Twitter corpora for the unsupervised learning phase. The experimental evaluation is performed on the benchmark datasets provided by the SemEval 2015 competition for the Sentiment Analysis in Twitter task. Although the presented approach does not depend on hand-crafted features, we achieve performance comparable to state-of-the-art methods on the Twitter2015 set, with an F1 score of 64.85%.
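The core of such a sentence CNN is a set of filters of different window sizes slid over the sequence of word vectors, each followed by max-over-time pooling, so that every tweet maps to a fixed-length vector (one value per filter) regardless of its length. The sketch below shows only that feature-extraction step with a ReLU and hand-set toy filters; the embeddings, filter values, and sizes are illustrative assumptions, and the fully connected, dropout, and softmax layers are omitted.

```python
from typing import List

Matrix = List[List[float]]  # rows = word positions, columns = embedding dimensions


def conv_max_pool(sentence: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide one filter spanning len(kernel) words, apply ReLU, keep the maximum over time."""
    h = len(kernel)
    scores = []
    for start in range(len(sentence) - h + 1):
        window = sentence[start:start + h]
        s = sum(w * k for row, krow in zip(window, kernel) for w, k in zip(row, krow)) + bias
        scores.append(max(0.0, s))  # ReLU
    # Design choice in this sketch: a tweet shorter than the window yields 0.0.
    return max(scores) if scores else 0.0


def sentence_features(sentence: Matrix, filters: List[Matrix]) -> List[float]:
    """Fixed-length representation: one max-pooled value per filter."""
    return [conv_max_pool(sentence, f) for f in filters]
```

In a full model, many filters per window size would be learned, and the resulting vector would feed the fully connected layers described above.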
Finki at SemEval-2016 Task 4: Deep learning architecture for Twitter sentiment analysis
In this paper, we present a novel deep learning architecture for sentiment analysis in Twitter messages. Our system, finki, employs both convolutional and gated recurrent neural networks to obtain a more diverse tweet representation. The network is trained on top of GloVe word embeddings pre-trained on the Common Crawl dataset. Both neural networks are used to obtain a fixed-length representation of variable-sized tweets, and the concatenation of these vectors is fed to a fully connected softmax layer with dropout regularization. The system is evaluated on benchmark datasets from the Sentiment Analysis in Twitter task of the SemEval 2016 challenge, where our model achieves the best and second-highest results on the 2-point and 5-point quantification subtasks, respectively. Despite not relying on any hand-crafted features, our system achieves the second-highest average rank on the considered subtasks.
in Proceedings of the SemEval competition
Emotion Identification in FIFA World Cup Tweets using Convolutional Neural Networks
Twitter has gained increasing popularity in recent years, with users generating an enormous amount of data on a variety of topics every day. Many of these posts contain real-time updates and opinions on ongoing sports games. In this paper, we present a convolutional neural network architecture for emotion identification in Twitter messages related to sporting events. The network leverages pre-trained word embeddings obtained by unsupervised learning on large text corpora. The network is trained on tweets automatically annotated with 7 emotions, where messages are labeled based on the presence of emotion-related hashtags; on this data, our approach achieves 55.77% accuracy. The model is then applied to Twitter messages for emotion identification during sports events at the 2014 FIFA World Cup. We also present the results of our analysis of three games that had a significant impact on Twitter users.
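Labeling tweets by emotion hashtags is a form of distant supervision: a tweet receives an emotion label when it contains a hashtag from that emotion's list, and the label-bearing hashtag is usually stripped so the model cannot simply memorize it. The hashtag lists below are hypothetical placeholders (the 7 emotions and their hashtags are not listed here), and discarding tweets that match zero or several emotions is an assumption of this sketch.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical emotion -> hashtag lists; placeholders, not the lists used in the paper.
EMOTION_HASHTAGS: Dict[str, Set[str]] = {
    "joy": {"#happy", "#joy"},
    "anger": {"#angry", "#furious"},
    "sadness": {"#sad", "#heartbroken"},
}

HASHTAG = re.compile(r"#\w+")


def annotate(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (cleaned_text, emotion) if exactly one emotion matches, else None."""
    tags = {t.lower() for t in HASHTAG.findall(tweet)}
    matched = {emo for emo, hs in EMOTION_HASHTAGS.items() if tags & hs}
    if len(matched) != 1:
        return None  # unlabeled or ambiguous tweets are skipped (assumption)
    emotion = matched.pop()
    # Remove only the label-bearing hashtags; keep others such as #WorldCup.
    cleaned = HASHTAG.sub(
        lambda m: "" if m.group(0).lower() in EMOTION_HASHTAGS[emotion] else m.group(0),
        tweet,
    )
    return " ".join(cleaned.split()), emotion
```

For example, `annotate("What a goal! #happy #WorldCup")` keeps the topical hashtag and yields the cleaned text paired with the label `"joy"`.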
Linked Open Data for Medical Institutions and Drug Availability Lists in Macedonia
One of the most active fields of research in the past decade has been data representation, storage, and retrieval. With the vast amount of data available on the Web, this field has initiated the development of data management techniques for distributed datasets over the existing infrastructure of the Web. The Linked Data paradigm is one of them; it aims to provide common practices for publishing and linking data on the Web using Semantic Web technologies. This allows the Web to be transformed from a web of documents into a web of data. As a result, the Web becomes a distributed network for data access, usable by software agents and machines. The interlinked nature of the distributed datasets provides new use-case scenarios for end users, scenarios that are not possible over isolated datasets. In this paper, we describe the process of generating Linked Open Data from the public data of the Health Insurance Fund, along with data from the Associated Pharmacies of Macedonia. With this, we generate and publish an interlinked RDF dataset in a machine-readable format. We also provide examples of newly available use-case scenarios that exploit the Linked Data format of the data. These use cases can be used by applications and services to provide relevant information to end users.
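At its simplest, publishing tabular records as Linked Data means minting a URI for each entity and emitting RDF triples that link them. The sketch below serializes a toy pharmacy record as N-Triples using only the standard library; the `http://example.org/` namespace, the `Pharmacy` class, the `hasDrugAvailable` property, and the sample identifiers are hypothetical, not the vocabulary of the published dataset. Only `rdf:type` and `rdfs:label` are standard W3C terms.

```python
from typing import List, Tuple
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = Tuple[str, str, str]


def uri(kind: str, local_id: str) -> str:
    """Mint an IRI for an entity, percent-encoding unsafe characters in the id."""
    return f"<{BASE}{kind}/{quote(local_id)}>"


def literal(text: str) -> str:
    """Plain string literal with the escapes N-Triples requires for \\, \" and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def pharmacy_triples(pharmacy_id: str, name: str, drug_ids: List[str]) -> List[Triple]:
    subj = uri("pharmacy", pharmacy_id)
    triples = [
        (subj, f"<{RDF_TYPE}>", f"<{BASE}ontology/Pharmacy>"),
        (subj, f"<{RDFS_LABEL}>", literal(name)),
    ]
    # Link the pharmacy to drug entities that another dataset can also describe.
    for drug in drug_ids:
        triples.append((subj, f"<{BASE}ontology/hasDrugAvailable>", uri("drug", drug)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Because drug URIs are shared between the pharmacy records and, for instance, a drug list published by an insurance fund, a query engine can join the two datasets directly, which is the kind of cross-dataset use case Linked Data enables.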