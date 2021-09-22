

Learning by Examples Based on Multi-level Optimization

By Shentong Mo, Pengtao Xie
arxiv.org
 6 days ago

Learning by examples, which means learning to solve a new problem by looking into how similar problems are solved, is an effective method in human learning. When a student learns a new topic, they find exemplar topics that are similar to the new topic and study those exemplars to deepen their understanding of it. We aim to investigate whether this powerful learning skill can be borrowed from humans to improve machine learning as well. In this work, we propose a novel learning approach called Learning By Examples (LBE). Our approach automatically retrieves a set of training examples that are similar to query examples and predicts labels for query examples by using class labels of the retrieved examples. We propose a three-level optimization framework to formulate LBE, which involves three stages of learning: (1) learning a Siamese network to retrieve similar examples; (2) learning a matching network to make predictions on query examples by leveraging class labels of retrieved similar examples; (3) learning the "ground-truth" similarities between training examples by minimizing the validation loss. We develop an efficient algorithm to solve the LBE problem and conduct extensive experiments on various benchmarks, where the results demonstrate the effectiveness of our method on both supervised and few-shot learning.
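The retrieve-then-predict step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are assumed to be already computed (in LBE they would come from the learned Siamese network), cosine similarity and a softmax-weighted vote stand in for the learned matching network, and the three-level optimization itself is not shown. The function names `cosine` and `retrieve_and_predict` and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_and_predict(query, train_embs, train_labels, num_classes, k=3):
    """Retrieve the k most similar training examples and vote on a label.

    Assumption: a fixed similarity replaces the learned Siamese and
    matching networks described in the abstract.
    """
    sims = [cosine(query, e) for e in train_embs]
    # Indices of the k training examples most similar to the query.
    top_k = sorted(range(len(sims)), key=lambda i: -sims[i])[:k]
    # Softmax over the retrieved similarities gives the vote weights.
    max_sim = max(sims[i] for i in top_k)
    weights = [math.exp(sims[i] - max_sim) for i in top_k]
    total = sum(weights)
    # Accumulate each retrieved example's weight onto its class label.
    probs = [0.0] * num_classes
    for i, w in zip(top_k, weights):
        probs[train_labels[i]] += w / total
    return top_k, probs

# Toy data (hypothetical): two classes in a 2-D embedding space.
train_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = [0, 0, 1, 1]
top_k, probs = retrieve_and_predict([0.8, 0.2], train_embs, train_labels, num_classes=2)
predicted = probs.index(max(probs))
```

In this toy run the query lies close to the two class-0 examples, so they dominate the vote. In the approach the abstract describes, the similarities are themselves learned by minimizing the validation loss.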


Related
arxiv.org

Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions

We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. Here we take the opposite approach. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and subsequently seek to find an optimal such formulation. Informally, any data-driven formulation can be seen to balance a measure of proximity of the estimated cost to the actual cost while guaranteeing a level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we construct explicitly a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime and a subexponential regime) between which the nature of the optimal data-driven formulation experiences a phase transition. The optimal data-driven formulations can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime and finally a variance-penalized formulation in the subexponential regime. This final observation unveils a surprising connection between these three, at first glance seemingly unrelated, data-driven formulations which until now remained hidden.
COMPUTERS
phoronix.com

oneAPI Level Zero Loader v1.5 Released With VPU Driver Recognition, Multi-Driver Support

Intel has released a new version of their loader for oneAPI Level Zero for loading the Level Zero software driver components. With Intel's oneAPI Level Zero Loader v1.5 release out today their Intel VPU Linux driver (ze_intel_vpu) is added to their known driver list besides their Intel GPU driver. This recognition is for finding detected/enabled L0 drivers on the system. Another change with their L0 loader is working on multi-driver support now that the VPU driver is also being picked up by the loader. The loader will consider initialization a success if at least one driver succeeds.
COMPUTERS
arxiv.org

DSDF: An approach to handle stochastic agents in collaborative multi-agent reinforcement learning

Multi-agent reinforcement learning has received a lot of attention in recent years and has applications in many different areas. Existing methods involving centralized training and decentralized execution attempt to train the agents towards learning a pattern of coordinated actions to arrive at an optimal joint policy. However, if some agents are stochastic to varying degrees, these methods often fail to converge and provide poor coordination among agents. In this paper we show how this stochasticity of agents, which could be a result of malfunction or aging of robots, can add to the uncertainty in coordination and thereby contribute to unsatisfactory global coordination. In this case, the deterministic agents have to understand the behavior and limitations of the stochastic agents while arriving at an optimal joint policy. Our solution, DSDF, tunes the discount factor for the agents according to uncertainty and uses the values to update the utility networks of individual agents. DSDF also helps in imparting a degree of reliability in coordination, thereby granting stochastic agents tasks which are immediate and of shorter trajectory, with deterministic ones taking the tasks which involve longer planning. Such a method enables joint coordination of agents, some of which may be only partially performing, and thereby can reduce or delay the investment in agent/robot replacement in many circumstances. Results on a benchmark environment for different scenarios show the efficacy of the proposed approach when compared with existing approaches.
COMPUTERS
arxiv.org

Multi-Level Features Contrastive Networks for Unsupervised Domain Adaptation

Unsupervised domain adaptation aims to train a model from the labeled source domain to make predictions on the unlabeled target domain when the data distribution of the two domains is different. As a result, it needs to reduce the data distribution difference between the two domains to improve the model's generalization ability. Existing methods tend to align the two domains directly at the domain level, or perform class-level domain alignment based on deep features. The former ignores the relationship between the various classes in the two domains, which may cause serious negative transfer; the latter alleviates it by introducing pseudo-labels of the target domain, but it does not consider the importance of performing class-level alignment on shallow feature representations. In this paper, we build on the method of class-level alignment. The proposed method reduces the difference between two domains dramatically by aligning multi-level features. In the case that the two domains share the label space, the class-level alignment is implemented by introducing Multi-Level Feature Contrastive Networks (MLFCNet). In practice, since the categories of samples in the target domain are unavailable, we iteratively use a clustering algorithm to obtain the pseudo-labels, and then minimize the Multi-Level Contrastive Discrepancy (MLCD) loss to achieve more accurate class-level alignment. Experiments on three real-world benchmarks, ImageCLEF-DA, Office-31 and Office-Home, demonstrate that MLFCNet compares favorably against the existing state-of-the-art domain adaptation methods.
COMPUTERS
arxiv.org

Improved Few-shot Segmentation by Redifinition of the Roles of Multi-level CNN Features

This study is concerned with few-shot segmentation, i.e., segmenting the region of an unseen object class in a query image, given support image(s) of its instances. The current methods rely on the pretrained CNN features of the support and query images. The key to good performance is the proper fusion of their mid-level and high-level features; the former contain shape-oriented information, while the latter carry class-oriented information. Current state-of-the-art methods follow the approach of Tian et al., which gives the mid-level features the primary role and the high-level features the secondary role. In this paper, we reinterpret this widely employed approach by redefining the roles of the multi-level features; we swap the primary and secondary roles. Specifically, we consider that the current methods improve the initial estimate generated from the high-level features using the mid-level features. This reinterpretation suggests a new application of the current methods: to apply the same network multiple times to iteratively update the estimate of the object's region, starting from its initial estimate. Our experiments show that this method is effective and has updated the previous state-of-the-art on COCO-20$^i$ in the 1-shot and 5-shot settings and on PASCAL-5$^i$ in the 1-shot setting.
COMPUTERS
arxiv.org

SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations

While contrastive learning is proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) is only recently adopting it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, for every representation (anchor), we generate augmented altered views. A supervised contrastive loss is then utilized to maximize the system's capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart the samples belonging to the other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERTbase, including 6% absolute improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STSB. We also show consistent gains over self-supervised contrastively learned representations, especially in non-semantic tasks. Finally, we show that these gains are not solely due to augmentation, but rather to a downstream optimized sequence representation. Code: this https URL.
CODING & PROGRAMMING
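The supervised contrastive loss mentioned in the SupCL-Seq entry above can be sketched in a few lines. This is a generic, plain-Python version of the standard supervised contrastive formulation, not the authors' code: the dropout-based view generation is not shown, and the function name `supcon_loss`, the temperature of 0.5 and the toy embeddings are assumptions made for illustration.

```python
import math

def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def supcon_loss(embs, labels, temperature=0.5):
    """Supervised contrastive loss averaged over anchors that have positives.

    For anchor i, every other sample with the same label is a positive;
    all other samples appear in the softmax denominator.
    """
    z = [_normalize(e) for e in embs]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Mean negative log-probability of the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

# Toy embeddings (hypothetical): two tight clusters.
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(embs, [0, 0, 1, 1])   # labels agree with clusters
loss_mixed = supcon_loss(embs, [0, 1, 0, 1])     # labels cut across clusters
```

The loss is lower when samples sharing a label are already close together, which is the pulling-together and pushing-apart behavior the abstract describes.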
arxiv.org

Multi-Task Learning with Sequence-Conditioned Transporter Networks

Michael H. Lim, Andy Zeng, Brian Ichter, Maryam Bandari, Erwin Coumans, Claire Tomlin, Stefan Schaal, Aleksandra Faust. Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of benchmarks specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long-horizon problems. Our analysis suggests that the new framework not only significantly improves pick-and-place performance on 10 novel multi-task benchmark problems, but that multi-task learning with weighted sampling can also vastly improve learning and agent performance on individual tasks.
ENGINEERING
arxiv.org

BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification

Deep Neural Networks have taken Natural Language Processing by storm. While this led to incredible improvements across many tasks, it also initiated a new research field, questioning the robustness of these neural networks by attacking them. In this paper, we investigate four word substitution-based attacks on BERT. We combine a human evaluation of individual word substitutions and a probabilistic analysis to show that between 96% and 99% of the analyzed attacks do not preserve semantics, indicating that their success is mainly based on feeding poor data to the model. To further confirm that, we introduce an efficient data augmentation procedure and show that many adversarial examples can be prevented by including data similar to the attacks during training. An additional post-processing step reduces the success rates of state-of-the-art attacks below 5%. Finally, by looking at more reasonable thresholds on constraints for word substitutions, we conclude that BERT is a lot more robust than research on attacks suggests.
COMPUTERS
arxiv.org

Data-Driven Moment-Based Distributionally Robust Chance-Constrained Optimization

Many stochastic optimization problems include chance constraints that enforce constraint satisfaction with a specific probability; however, solving an optimization problem with chance constraints assumes that the solver has access to the exact underlying probability distribution, which is often unreasonable. In data-driven applications, it is common instead to use historical data samples as a surrogate to the distribution; however, this comes at a significant computational cost from the added time spent either processing the data or, worse, adding additional variables and constraints to the optimization problem. On the other hand, the sample mean and covariance matrix are lightweight to calculate, and it is possible to reframe the chance constraint as a distributionally robust chance constraint. The challenge here is that the sample mean and covariance matrix themselves are random variables, so their uncertainty should be factored into the chance constraint. This work bridges this gap by modifying the standard method of distributionally robust chance constraints to guarantee its satisfaction. The proposed data-driven method is tested on a particularly problematic example. The results show that the computationally fast proposed method is not significantly more conservative than other methods.
COMPUTERS
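As a point of reference for the entry above, the textbook moment-based reformulation (not the modified method that paper proposes) can be written out for a single linear constraint. The symbols are assumptions introduced here for illustration: a random vector $\xi$ whose mean $\mu$ and covariance $\Sigma$ are treated as exactly known, a decision vector $x$, a bound $b$, and a violation tolerance $\epsilon \in (0,1)$.

```latex
\inf_{\mathbb{P}\,:\ \mathbb{E}[\xi]=\mu,\ \operatorname{Cov}[\xi]=\Sigma}
\mathbb{P}\bigl(\xi^{\top} x \le b\bigr) \;\ge\; 1-\epsilon
\quad\Longleftrightarrow\quad
\mu^{\top} x + \sqrt{\frac{1-\epsilon}{\epsilon}}\,\sqrt{x^{\top}\Sigma\, x} \;\le\; b
```

When $\mu$ and $\Sigma$ are replaced by sample estimates, this equivalence no longer guarantees the constraint, which is the gap the abstract says the proposed method addresses.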
arxiv.org

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning

Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considering how an offline RL agent can acquire multiple skills. We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to learn behaviors for all the tasks more effectively rather than training each one in isolation. However, sharing data across all tasks in multi-task offline RL performs surprisingly poorly in practice. Through empirical analysis, we find that sharing data can actually exacerbate the distributional shift between the learned policy and the dataset, which in turn can lead to divergence of the learned policy and poor performance. To address this challenge, we develop a simple technique for data sharing in multi-task offline RL that routes data based on the improvement over the task-specific data. We call this approach conservative data sharing (CDS), and it can be applied with multiple single-task offline RL methods. On a range of challenging multi-task locomotion, navigation, and vision-based robotic manipulation problems, CDS achieves the best or comparable performance compared to prior offline multi-task RL methods and previous data sharing approaches.
COMPUTERS
arxiv.org

Image-Based Multi-UAV Tracking System in a Cluttered Environment

A tracking controller for unmanned aerial vehicles (UAVs) is developed to track moving targets undergoing unknown translational and rotational motions. The main challenges are to control both the relative positions and angles between the target and the UAVs to within desired values, and to guarantee that the generated control inputs to the UAVs are feasible (i.e., within their motion capabilities). Moreover, the UAVs are controlled to ensure that the target always remains within the fields of view of their onboard cameras. To the best of our knowledge, this is the first work to apply multiple UAVs to cooperatively track a dynamic target while ensuring that the UAVs remain connected and that both occlusion and collisions are avoided. To achieve these control objectives, a designed controller solved based on the aforementioned tracking controller using quadratic programming can generate minimally invasive control actions to achieve occlusion avoidance and collision avoidance. Furthermore, control barrier functions (CBFs) with a distributed design are developed in order to reduce the amount of inter-UAV communication. Simulations were performed to assess the efficacy and performance of the developed CBF-based controller for the multi-UAV system in tracking a target.
TECHNOLOGY
arxiv.org

Optimization-based Block Coordinate Gradient Coding

Existing gradient coding schemes introduce identical redundancy across the coordinates of gradients and hence cannot fully utilize the computation results from partial stragglers. This motivates the introduction of diverse redundancies across the coordinates of gradients. This paper considers a distributed computation system consisting of one master and $N$ workers characterized by a general partial straggler model and focuses on solving a general large-scale machine learning problem with $L$ model parameters. We show that it is sufficient to provide at most $N$ levels of redundancies for tolerating $0, 1,\cdots, N-1$ stragglers, respectively. Consequently, we propose an optimal block coordinate gradient coding scheme based on a stochastic optimization problem that optimizes the partition of the $L$ coordinates into $N$ blocks, each with identical redundancy, to minimize the expected overall runtime for collaboratively computing the gradient. We obtain an optimal solution using a stochastic projected subgradient method and propose two low-complexity approximate solutions with closed-form expressions for the stochastic optimization problem. We also show that under a shifted-exponential distribution, for any $L$, the expected overall runtimes of the two approximate solutions and the minimum overall runtime have sub-linear multiplicative gaps in $N$. To the best of our knowledge, this is the first work that optimizes the redundancies of gradient coding introduced across the coordinates of gradients.
COMPUTERS
arxiv.org

MHFC: Multi-Head Feature Collaboration for Few-Shot Learning

Few-shot learning (FSL) aims to address the data-scarce problem. A standard FSL framework is composed of two components: (1) Pre-train. Employ the base data to generate a CNN-based feature extraction model (FEM). (2) Meta-test. Apply the trained FEM to acquire the novel data's features and recognize them. FSL relies heavily on the design of the FEM. However, various FEMs have distinct emphases. For example, several may focus more attention on the contour information, whereas others may lay particular emphasis on the texture information. The single-head feature is only a one-sided representation of the sample. Besides the negative influence of cross-domain (e.g., the trained FEM cannot adapt to the novel class flawlessly), the distribution of novel data may have a certain degree of deviation compared with the ground truth distribution, which is dubbed the distribution-shift-problem (DSP). To address the DSP, we propose the Multi-Head Feature Collaboration (MHFC) algorithm, which attempts to project the multi-head features (e.g., multiple features extracted from a variety of FEMs) to a unified space and fuse them to capture more discriminative information. Specifically, we first introduce a subspace learning method to transform the multi-head features to aligned low-dimensional representations. It corrects the DSP via learning the feature with more powerful discrimination and overcomes the problem of inconsistent measurement scales from different head features. Then, we design an attention block to update combination weights for each head feature automatically. It comprehensively considers the contribution of various perspectives and further improves the discrimination of features. We evaluate the proposed method on five benchmark datasets (including cross-domain experiments) and achieve significant improvements of 2.1%-7.8% compared with state-of-the-art methods.
COMPUTERS
arxiv.org

Multi-Level Visual Similarity Based Personalized Tourist Attraction Recommendation Using Geo-Tagged Photos

Geo-tagged photo based tourist attraction recommendation can discover users' travel preferences from their taken photos, so as to recommend suitable tourist attractions to them. However, existing visual content based methods cannot fully exploit the user and tourist attraction information of photos to extract visual features, and do not differentiate the significances of different photos. In this paper, we propose multi-level visual similarity based personalized tourist attraction recommendation using geo-tagged photos (MEAL). MEAL utilizes the visual contents of photos and interaction behavior data to obtain the final embeddings of users and tourist attractions, which are then used to predict the visit probabilities. Specifically, by crossing the user and tourist attraction information of photos, we define four visual similarity levels and introduce a corresponding quintuplet loss to embed the visual contents of photos. In addition, to capture the significances of different photos, we exploit the self-attention mechanism to obtain the visual representations of users and tourist attractions. We conducted experiments on a dataset crawled from Flickr, and the experimental results proved the advantage of this method.
LIFESTYLE
arxiv.org

Universal Adversarial Attack on Deep Learning Based Prognostics

Deep learning-based time series models are being extensively utilized in engineering and manufacturing industries for process control and optimization, asset monitoring, diagnostic and predictive maintenance. These models have shown great improvement in the prediction of the remaining useful life (RUL) of industrial equipment but suffer from inherent vulnerability to adversarial attacks. These attacks can be easily exploited and can lead to catastrophic failure of critical industrial equipment. In general, different adversarial perturbations are computed for each instance of the input data. This is, however, difficult for the attacker to achieve in real time due to higher computational requirements and lack of uninterrupted access to the input data. Hence, we present the concept of universal adversarial perturbation, a special imperceptible noise to fool regression-based RUL prediction models. Attackers can easily utilize universal adversarial perturbations for real-time attacks, since continuous access to input data and repeated computation of adversarial perturbations are not prerequisites. We evaluate the effect of universal adversarial attacks using the NASA turbofan engine dataset. We show that addition of universal adversarial perturbation to any instance of the input data increases error in the output predicted by the model. To the best of our knowledge, we are the first to study the effect of the universal adversarial perturbation on time series regression models. We further demonstrate the effect of varying the strength of perturbations on RUL prediction models and find that model accuracy decreases with the increase in perturbation strength of the universal adversarial attack. We also show that universal adversarial perturbation can be transferred across different models.
COMPUTERS
arxiv.org

A Logic-based Multi-agent System for Ethical Monitoring and Evaluation of Dialogues

Abeer Dyoub (DISIM, University of L'Aquila, Italy), Stefania Costantini (DISIM, University of L'Aquila, Italy), Ivan Letteri (DISIM, University of L'Aquila, Italy), Francesca A. Lisi (DIB & CILA, University of Bari "Aldo Moro", Italy) Dialogue Systems are tools designed for various practical purposes concerning human-machine interaction. These systems should be built...
CODING & PROGRAMMING
arxiv.org

An Optimization-based Approach for Flow Table Capacity Bottleneck Mitigation in Software-Defined Networks

Flow delegation is a flexible technique to mitigate flow table capacity bottlenecks in Software-defined Networks (SDN). Such bottlenecks occur when SDN switches provide insufficient flow table capacity, which leads to performance degradation and network failures. Flow delegation addresses this problem by automatically relocating flow rules from a bottlenecked switch to neighboring switches with spare capacity. This paper introduces a new algorithm to efficiently perform flow delegation based on a novel delegation template abstraction and multi-period multi-objective optimization. Different from existing work, our approach can include estimated knowledge about future network situations and deal with different optimization criteria such as link and control overhead. We discuss the problem decomposition for the new algorithm and introduce an efficient two-step heuristic. Results show that our approach performs significantly better than the simple greedy algorithm used in earlier work and is capable of handling flow delegation for networks with hundreds of switches.
CODING & PROGRAMMING
arxiv.org

Robust Optimization of Instantaneous Beamforming and Quasi-static Phase Shifts in an IRS-assisted Multi-Cell Network

The impacts of channel estimation errors, inter-cell interference, phase adjustment cost, and computation cost on an intelligent reflecting surface (IRS)-assisted system are severe in practice but have been ignored for simplicity in most existing works. In this paper, we investigate a multi-antenna base station (BS) serving a single-antenna user with the help of a multi-element IRS in a multi-cell network with inter-cell interference. We consider imperfect channel state information (CSI) at the BS, i.e., imperfect CSIT, and focus on the robust optimization of the BS's instantaneous CSI-adaptive beamforming and the IRS's quasi-static phase shifts in two scenarios. In the scenario of coding over many slots, we formulate a robust optimization problem to maximize the user's ergodic rate. In the scenario of coding within each slot, we formulate a robust optimization problem to maximize the user's average goodput under the successful transmission probability constraints. The robust optimization problems are challenging two-timescale stochastic non-convex problems. In both scenarios, we obtain closed-form robust beamforming designs for any given phase shifts and more tractable stochastic non-convex approximate problems only for the phase shifts. Besides, we propose an iterative algorithm to obtain a Karush-Kuhn-Tucker (KKT) point of each of the stochastic problems for the phase shifts. It is worth noting that the proposed methods offer closed-form robust instantaneous CSI-adaptive beamforming designs which can promptly adapt to rapid CSI changes over slots and robust quasi-static phase shift designs of low computation and phase adjustment costs in the presence of imperfect CSIT and inter-cell interference. Numerical results further demonstrate the notable gains of the proposed robust joint designs over existing ones and reveal the practical values of the proposed solutions.
COMPUTERS
arxiv.org

The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders

Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.
SCIENCE
arxiv.org

CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation

Task-relevant grasping is critical for industrial assembly, where downstream manipulation tasks constrain the set of valid grasps. Learning how to perform this task, however, is challenging, since task-relevant grasp labels are hard to define and annotate. There is also yet no consensus on proper representations for modeling or off-the-shelf tools for performing task-relevant grasps. This work proposes a framework to learn task-relevant grasping for industrial objects without the need of time-consuming real-world data collection or manual annotation. To achieve this, the entire framework is trained solely in simulation, including supervised training with synthetic label generation and self-supervised, hand-object interaction. In the context of this framework, this paper proposes a novel, object-centric canonical representation at the category level, which allows establishing dense correspondence across object instances and transferring task-relevant grasps to novel instances. Extensive experiments on task-relevant grasping of densely-cluttered industrial objects are conducted in both simulation and real-world setups, demonstrating the effectiveness of the proposed framework. Code and data will be released upon acceptance at this https URL.
COMPUTERS
