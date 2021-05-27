

Maria: A Visual Experience Powered Conversational Agent

By Zujie Liang, Huang Hu, Can Xu, Chongyang Tao, Xiubo Geng, Yining Chen, Fan Liang, Daxin Jiang
arxiv.org

Arguably, a conversational agent's visual perception of the physical world is a key way for it to exhibit human-like intelligence. Image-grounded conversation has therefore been proposed to address this challenge. Existing works focus on multimodal dialog models that ground the conversation in a given image. In this paper, we go a step further and study image-grounded conversation in a fully open-ended setting where no paired dialogs and images are assumed to be available. Specifically, we present Maria, a neural conversational agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator. The retriever retrieves an image correlated with the dialog from the image index, and the visual concept detector extracts rich visual knowledge from that image. The response generator is then grounded on the extracted visual knowledge and the dialog context to generate the target response. Extensive experiments demonstrate that Maria outperforms previous state-of-the-art methods on both automatic metrics and human evaluation, and that it can generate informative responses that show some visual commonsense about the physical world.
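The three-stage flow described above (retrieve an image for the dialog, extract visual concepts from it, then condition the generator on those concepts plus the dialog context) can be sketched as follows. This is a toy illustration under stated assumptions, not Maria's implementation: a bag-of-words cosine scorer over image captions stands in for the paper's text-to-image retriever, the hypothetical `IndexedImage` records carry precomputed concept labels in place of a real visual concept detector, and the generator stage is reduced to assembling its grounded input with made-up separator tokens (`[VISUAL]`, `[CONTEXT]`, `[SEP]`).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexedImage:
    """One entry in a toy image index (hypothetical structure, not Maria's)."""
    image_id: str
    caption: str  # text used here to match dialogs to this image
    concepts: list[str] = field(default_factory=list)  # stand-in for detector output


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_image(dialog: list[str], index: list[IndexedImage]) -> IndexedImage:
    """Stage 1 (toy retriever): pick the image whose caption best matches the dialog."""
    query = bag_of_words(" ".join(dialog))
    return max(index, key=lambda img: cosine(query, bag_of_words(img.caption)))


def detect_concepts(image: IndexedImage, top_k: int = 3) -> list[str]:
    """Stage 2 (stand-in for a visual concept detector): return the top-k stored labels."""
    return image.concepts[:top_k]


def build_generator_input(dialog: list[str], concepts: list[str]) -> str:
    """Stage 3 input: visual concepts plus dialog context, joined with hypothetical separators.

    A real system would feed this to a trained neural generator; nothing is generated here.
    """
    return "[VISUAL] " + " ".join(concepts) + " [CONTEXT] " + " [SEP] ".join(dialog)


if __name__ == "__main__":
    index = [
        IndexedImage("img1", "a dog catching a frisbee in the park", ["dog", "frisbee", "grass", "tree"]),
        IndexedImage("img2", "a plate of pasta on a kitchen table", ["pasta", "plate", "table"]),
    ]
    dialog = ["I took my dog to the park today.", "Did he play frisbee?"]
    image = retrieve_image(dialog, index)
    print(image.image_id)  # img1
    print(build_generator_input(dialog, detect_concepts(image)))
    # [VISUAL] dog frisbee grass [CONTEXT] I took my dog to the park today. [SEP] Did he play frisbee?
```

In a real system each stage would be a learned model; the sketch only shows how data flows between the three components, with the retrieved image's concepts and the dialog history combined into one grounded input for the generator.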
