Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

By Shivansh Patel, Saim Wani, Unnat Jain, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang
arxiv.org
 10 days ago

Communication between embodied AI agents has received increasing attention in recent years. Despite its use, it is still unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first


Business Insider

Facebook is working on AI tech that will monitor your every move

Facebook envisions a future where smartglasses "become as useful in everyday life as smartphones," the company said in a new blog post. In order to achieve that future, such devices will require powerful AI software that can read and respond to the world around the headset's user. And the only way to train AI to see and hear the world like humans do is for it to experience the world like we do: from a first-person perspective.
INTERNET
arxiv.org

Using Trust for Heterogeneous Human-Robot Team Task Allocation

Human-robot teams have the ability to perform better across various tasks than human-only and robot-only teams. However, such improvements cannot be realized without proper task allocation. Trust is an important factor in teaming relationships, and can be used in the task allocation strategy. Despite the importance, most existing task allocation strategies do not incorporate trust. This paper reviews select studies on trust and task allocation. We also summarize and discuss how a bi-directional trust model can be used for a task allocation strategy. The bi-directional trust model represents task requirements and agents by their capabilities, and can be used to predict trust for both existing and new tasks. Our task allocation approach uses predicted trust in the agent and expected total reward for task assignment. Finally, we present some directions for future work, including the incorporation of trust from the human and human capacity for task allocation, and a negotiation phase for resolving task disagreements.
COMPUTERS
arxiv.org

Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training

Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate that the state-of-the-art Conformer models generally have multiple ambient layers. We study the stability of these layers across runs and model sizes, propose that group normalization may be used without disrupting their formation, and examine their correlation with model weight updates in each layer. Finally, we apply these findings to Federated Learning in order to improve the training procedure, by targeting Federated Dropout to layers by importance. This allows us to reduce the model size optimized by clients without quality degradation, and shows potential for future exploration.
CODING & PROGRAMMING
arxiv.org

Stability of large complex systems with heterogeneous relaxation dynamics

We study the probability of stability of a large complex system of size $N$ within the framework of a generalized May model, which assumes a linear dynamics of each population size $n_i$ (with respect to its equilibrium value): $ \frac{\mathrm{d}\, n_i}{\mathrm{d}t} = - a_i n_i - \sqrt{T} \sum_{j} J_{ij} n_j $. The $a_i>0$'s are the intrinsic decay rates, $J_{ij}$ is a real symmetric $(N\times N)$ Gaussian random matrix and $\sqrt{T}$ measures the strength of pairwise interaction between different species. Unlike in May's original homogeneous model, each species has now an intrinsic damping $a_i$ that may differ from one another. As the interaction strength $T$ increases, the system undergoes a phase transition from a stable phase to an unstable phase at a critical value $T=T_c$. We reinterpret the probability of stability in terms of the hitting time of the level $b=0$ of an associated Dyson Brownian Motion (DBM), starting at the initial position $a_i$ and evolving in `time' $T$. In the large $N \to \infty$ limit, using this DBM picture, we are able to completely characterize $T_c$ for arbitrary density $\mu(a)$ of the $a_i$'s. For a specific flat configuration $a_i = 1 + \sigma \frac{i-1}{N}$, we obtain an explicit parametric solution for the limiting (as $N\to \infty$) spectral density for arbitrary $T$ and $\sigma$. For finite but large $N$, we also compute the large deviation properties of the probability of stability on the stable side $T < T_c$ using a Coulomb gas representation.
SCIENCE
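
The stability condition in the abstract above can be made concrete with a toy case. Writing the dynamics as $\mathrm{d}n/\mathrm{d}t = -(A + \sqrt{T} J)\,n$ with $A = \mathrm{diag}(a_i)$, the equilibrium is stable when every eigenvalue of $A + \sqrt{T} J$ is positive. The sketch below checks this for $N = 2$ only, using the closed-form eigenvalue of a symmetric 2x2 matrix. The function names and the fixed (non-random) matrix `J` are illustrative assumptions, not taken from the paper, which treats Gaussian random $J$ and large $N$.

```python
import math


def min_eigenvalue_2x2(p, q, r):
    """Smallest eigenvalue of the symmetric matrix [[p, q], [q, r]]."""
    return (p + r) / 2.0 - math.hypot((p - r) / 2.0, q)


def is_stable(a, J, T):
    """Check stability of dn/dt = -(A + sqrt(T) J) n for N = 2.

    a: the two decay rates a_i > 0
    J: symmetric 2x2 interaction matrix (hypothetical fixed example)
    T: interaction strength
    """
    s = math.sqrt(T)
    p = a[0] + s * J[0][0]
    q = s * J[0][1]
    r = a[1] + s * J[1][1]
    # Stable when all eigenvalues of A + sqrt(T) J are positive.
    return min_eigenvalue_2x2(p, q, r) > 0.0


a = [1.0, 1.0]
J = [[0.0, 1.0], [1.0, 0.0]]
weak = is_stable(a, J, 0.25)   # A + 0.5 J has eigenvalues 0.5 and 1.5
strong = is_stable(a, J, 4.0)  # A + 2 J has eigenvalues -1 and 3
```

For this particular `J` and `a`, the smallest eigenvalue is $1 - \sqrt{T}$, so the toy system loses stability at $T = 1$.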
arxiv.org

RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training

With the growth of academic engines, the mining and analysis of massive researcher data, for tasks such as collaborator recommendation and researcher retrieval, has become indispensable. It can improve the quality of services and the intelligence of academic engines. Most existing studies of researcher data mining focus on a single task for a particular application scenario and learn a task-specific model, which is usually unable to transfer to out-of-scope tasks. Pre-training technology provides a generalized and shared model that captures valuable information from enormous amounts of unlabeled data. The model can accomplish multiple downstream tasks via a few fine-tuning steps. In this paper, we propose a multi-task self-supervised learning-based researcher data pre-training model named RPT. Specifically, we divide the researchers' data into semantic document sets and a community graph. We design the hierarchical Transformer and the local community encoder to capture information from the two categories of data, respectively. Then, we propose three self-supervised learning objectives to train the whole model. Finally, we also propose two transfer modes of RPT for fine-tuning in different scenarios. We conduct extensive experiments to evaluate RPT; results on three downstream tasks verify the effectiveness of pre-training for researcher data mining.
COMPUTERS
arxiv.org

Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis

Efficient collaboration between collaborative machine learning and wireless communication technology, forming a Federated Edge Learning (FEEL), has spawned a series of next-generation intelligent applications. However, due to the openness of network connections, the FEEL framework generally involves hundreds of remote devices (or clients), resulting in expensive communication costs, which is not friendly to resource-constrained FEEL. To address this issue, we propose a distributed approximate Newton-type algorithm with fast convergence speed to alleviate the problem of FEEL resource (in terms of communication resources) constraints. Specifically, the proposed algorithm is improved based on distributed L-BFGS algorithm and allows each client to approximate the high-cost Hessian matrix by computing the low-cost Fisher matrix in a distributed manner to find a "better" descent direction, thereby speeding up convergence. Second, we prove that the proposed algorithm has linear convergence in strongly convex and non-convex cases and analyze its computational and communication complexity. Similarly, due to the heterogeneity of the connected remote devices, FEEL faces the challenge of heterogeneous data and non-IID (Independent and Identically Distributed) data. To this end, we design a simple but elegant training scheme, namely FedOVA, to solve the heterogeneous statistical challenge brought by heterogeneous data. In this way, FedOVA first decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning. In particular, the scheme can be well integrated with our communication efficient algorithm to serve FEEL. Numerical results verify the effectiveness and superiority of the proposed algorithm.
SOFTWARE
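
The decomposition step described for FedOVA above (one multi-class problem split into binary problems, with outputs combined afterwards) follows the general one-vs-all pattern. A minimal sketch of that pattern is given below. The function names are hypothetical, and combining by taking the highest score is an assumption: the abstract says only that the outputs are combined "using ensemble learning" and does not specify the rule.

```python
def one_vs_all_labels(labels, classes):
    """Turn multi-class labels into one binary label list per class."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def combine_scores(scores_by_class):
    """Pick the class whose binary classifier reports the highest score.

    Assumed combination rule; the paper's ensemble step may differ.
    """
    return max(scores_by_class, key=scores_by_class.get)


labels = ["cat", "dog", "bird", "dog"]
binary = one_vs_all_labels(labels, ["bird", "cat", "dog"])
prediction = combine_scores({"bird": 0.1, "cat": 0.7, "dog": 0.4})
```

Each entry of `binary` is the training target for one binary classifier, which is what makes the per-class problems simpler than the original multi-class one.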
arxiv.org

Dynamic Inference with Neural Interpreters

Nasim Rahaman, Muhammad Waleed Gondal, Shruti Joshi, Peter Gehler, Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf. Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization.
COMPUTERS
TheConversationAU

Facebook wants AI to find your keys and understand your conversations

Facebook has announced a research project that aims to push the “frontier of first-person perception”, and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of “smart glasses” called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of...
SOFTWARE
arxiv.org

Directed Percolation in Random Temporal Network Models with Heterogeneities

The event graph representation of temporal networks suggests that the connectivity of temporal structures can be mapped to a directed percolation problem. However, similar to percolation theory on static networks, this mapping is valid under the approximation that the structure and interaction dynamics of the temporal network are determined by its local properties, and otherwise, it is maximally random. We challenge these conditions and demonstrate the robustness of this mapping in case of more complicated systems. We systematically analyze random and regular network topologies and heterogeneous link-activation processes driven by bursty renewal or self-exciting processes using numerical simulation and finite-size scaling methods. We find that the critical percolation exponents characterizing the temporal network are not sensitive to many structural and dynamical network heterogeneities, while they recover known scaling exponents characterizing directed percolation on low dimensional lattices. While it is not possible to demonstrate the validity of this mapping for all temporal network models, our results establish the first batch of evidence supporting the robustness of the scaling relationships in the limited-time reachability of temporal networks.
SCIENCE
arxiv.org

Piecewise Interpretable Hilbert Spaces

We define and study piecewise interpretable Hilbert spaces in continuous logic. These are Hilbert spaces which arise as direct limits of imaginary sorts of a model $M$ of a theory $T$. We introduce natural examples of piecewise interpretable Hilbert spaces in a wide variety of contexts. We show that piecewise interpretable Hilbert spaces can be seen as encoding interesting model theoretic information about $M$ or $T$, such as properties of definable measures or Galois-theoretic information. We also show that piecewise interpretable Hilbert spaces encode unitary group representations in various ways. We carry out a systematic structural analysis of piecewise interpretable Hilbert spaces with scattered subsets. As an application of this work, we recover the classification of the unitary representations of oligomorphic groups first discovered by Tsankov (2012). Our main tool is local stability theory in continuous logic.
MATHEMATICS
arxiv.org

Directionality Reinforcement Learning to Operate Multi-Agent System without Communication

This paper establishes the directionality reinforcement learning (DRL) technique to propose a completely decentralized multi-agent reinforcement learning method which can achieve cooperation based on each agent's own learning, with no communication and no observation. Concretely, DRL adds the direction "agents have to learn to reach the farthest goal among reachable ones" to learning agents so that the agents operate cooperatively. Furthermore, to investigate the effectiveness of DRL, this paper compares a Q-learning agent with DRL against a previous learning agent in maze problems. Experimental results show that (1) DRL performs better than the previous method in terms of time spent, (2) the direction makes agents learn yielding actions for others, and (3) DRL suggests that multiagent learning can be achieved at low cost for any number of agents.
COMPUTERS
arxiv.org

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers

Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU heterogeneity combine to limit accuracy and increase the time to convergence. We address these challenges with Adaptive SGD, an adaptive elastic model averaging stochastic gradient descent algorithm for heterogeneous multi-GPUs that is characterized by dynamic scheduling, adaptive batch size scaling, and normalized model merging. Instead of statically partitioning batches to GPUs, batches are routed based on the relative processing speed. Batch size scaling assigns larger batches to the faster GPUs and smaller batches to the slower ones, with the goal to arrive at a steady state in which all the GPUs perform the same number of model updates. Normalized model merging computes optimal weights for every GPU based on the assigned batches such that the combined model achieves better accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy and is scalable with the number of GPUs.
CODING & PROGRAMMING
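
The batch size scaling idea in the abstract above (larger batches to faster GPUs, so that all GPUs complete the same number of model updates) can be illustrated with a simple proportional split. This is a sketch under stated assumptions, not the Adaptive SGD algorithm itself: the function name, the use of measured throughput as the speed signal, and the strictly proportional rule are all hypothetical simplifications.

```python
def scale_batch_sizes(speeds, total_batch):
    """Split total_batch across GPUs in proportion to their speed.

    speeds: relative processing speed per GPU (e.g. samples per second)
    total_batch: number of samples to distribute in one round
    """
    total_speed = sum(speeds)
    # Proportional share, rounded down to whole samples.
    sizes = [int(total_batch * s / total_speed) for s in speeds]
    # Give any samples lost to rounding to the fastest GPUs first.
    leftover = total_batch - sum(sizes)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


even = scale_batch_sizes([100, 200, 100], 400)
uneven = scale_batch_sizes([3, 1], 10)
```

With batch sizes proportional to speed, each GPU needs roughly the same wall-clock time per batch, which is the steady state the abstract describes.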
arxiv.org

Applying quantum approximate optimization to the heterogeneous vehicle routing problem

Quantum computing offers new heuristics for combinatorial problems. With small- and intermediate-scale quantum devices becoming available, it is possible to implement and test these heuristics on small-size problems. A candidate for such combinatorial problems is the heterogeneous vehicle routing problem (HVRP): the problem of finding the optimal set of routes, given a heterogeneous fleet of vehicles with varying loading capacities, to deliver goods to a given set of customers. In this work, we investigate the potential use of a quantum computer to find approximate solutions to the HVRP using the quantum approximate optimization algorithm (QAOA). For this purpose we formulate a mapping of the HVRP to an Ising Hamiltonian and simulate the algorithm on problem instances of up to 21 qubits. We find that the number of qubits needed for this mapping scales quadratically with the number of customers. We compare the performance of different classical optimizers in the QAOA for varying problem size of the HVRP, finding a trade-off between optimizer performance and runtime.
TECHNOLOGY
towardsdatascience.com

Not Merely Averages: Using Machine Learning to Estimate Heterogeneous Treatment Effects

How does the causal impact of a policy or program vary across individuals? This blog post provides a practical introduction on how to use generic machine learning inference on heterogeneous treatment effects in experiments as proposed by Chernozhukov, Demirer, Duflo and Fernández-Val (2020). I wrote this blog post for the statistically minded practitioner who is interested in applying the method in their work. If you want to learn the theory underlying the method, please consult the original paper. The beta version of the GenericML package developed by Welz, Alfons, Demirer and Chernozhukov can be found here, and the code used in this blog post is here.
SCIENCE
arxiv.org

Relation-aware Heterogeneous Graph for User Profiling

User profiling has long been an important problem that investigates user interests in many real applications. Some recent works regard users and the objects they interact with as entities of a graph and turn the problem into a node classification task. However, they neglect the differences between distinct interaction types, e.g. a user clicking an item vs. a user purchasing an item, and thus cannot incorporate such information well. To solve these issues, we propose to leverage the relation-aware heterogeneous graph method for user profiling, which also allows capturing significant meta relations. We adopt the query, key, and value mechanism in a transformer fashion for heterogeneous message passing so that entities can effectively interact with each other. Via such interactions on different relation types, our model can generate representations with rich information for the user profile prediction. We conduct experiments on two real-world e-commerce datasets and observe a significant performance boost of our approach.
SOFTWARE
arxiv.org

Individual versus Social Benefit on the Heterogeneous Networks

The focus of structural balance theory is on social benefits, while in a real network individual benefits are sometimes important as well. In Strauss's model, the local minima are modeled by considering an individual term besides a social one, and the assumption is based on equal strength of individual benefits. The results show that the competition between the two terms leads to a phase transition between individual and social benefits and there is a critical point, $CP$, that represents a first-order phase transition in the network. In a real network of relations, individuals adjust the strength of their relationships based on the benefits they acquire from them. Therefore, addressing heterogeneity in the individual interactions, we study a modified version of Strauss's model in which the first term represents the heterogeneous individual benefit by $\theta_{ij}$, and the coefficient of the second term, $\alpha$, measures the strength of social benefit. Our studies show that there is a region where the triangles are in a crumpled state rather than being dispersed in the network, and increasing the heterogeneity of individual benefits narrows the region of the crumpled state. Outside this region, the network is a mixture of links and triangles and the value of $\alpha$ determines whether the individual benefit or the social benefit prevails. For small values of $\alpha$ the individual benefit dominates, whereas for large values of $\alpha$ the social benefit prevails.
SCIENCE
arxiv.org

Energy-based Accounting Model for Heterogeneous Supercomputers

In this paper we present a new accounting model for heterogeneous supercomputers. An increasing number of supercomputing centres adopt heterogeneous architectures consisting of CPUs and hardware accelerators for their systems. Accounting models using the core hour as unit of measure are redefined to provide an appropriate charging rate based on the computing performance of different processing elements, as well as their energy efficiency and purchase price. In this paper we provide an overview of existing models and define a new model that, while retaining the core hour as a fundamental concept, takes into account the interplay among resources such as CPUs and RAM, and that bases the GPU charging rate on energy consumption. We believe that this model, designed for Pawsey Supercomputing Research Centre's next supercomputer Setonix, has a lot of advantages compared to other models, introducing carbon footprint as a primary driver in determining the allocation of computational workflow on heterogeneous resources.
COMPUTERS
arxiv.org

Array Element Coupling in Radio Interferometry I: A Semi-Analytic Approach

We derive a general formalism for interferometric visibilities, which considers first-order antenna-antenna coupling and assumes steady-state, incident radiation. We simulate such coupling features for non-polarized skies on a compact, redundantly-spaced array and present a phenomenological analysis of the coupling features. Contrary to previous studies, we find mutual coupling features manifest themselves at nonzero fringe rates. We compare power spectrum results for both coupled and non-coupled (noiseless, simulated) data and find coupling effects to be highly dependent on LST, baseline length, and baseline orientation. For all LSTs, lengths, and orientations, coupling features appear at delays which are outside the foreground 'wedge', which has been studied extensively and contains non-coupled astrophysical foreground features. Further, we find that first-order coupling effects threaten our ability to average data from baselines with identical length and orientation. Two filtering strategies are proposed which may mitigate such coupling systematics. The semi-analytic coupling model herein presented may be used to study mutual coupling systematics as a function of LST, baseline length, and baseline orientation. Such a model is not only helpful to the field of 21cm cosmology, but any study involving interferometric measurements, where coupling effects at the level of at least 1 part in 10^4 could corrupt the scientific result. Our model may be used to mitigate coupling systematics in existing radio interferometers and to design future arrays where the configuration of array elements inherently mitigates coupling effects at desired LSTs and angular resolutions.
SCIENCE
arxiv.org

Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles

Increasing amounts of freely available data, in both textual and relational form, allow the exploration of richer document representations, potentially improving model performance and robustness. An emerging problem in the modern era is fake news detection -- many easily available pieces of information are not necessarily factually correct, and can lead to wrong conclusions or be used for manipulation. In this work we explore how different document representations, ranging from simple symbolic bag-of-words to contextual, neural language model-based ones, can be used for efficient fake news identification. One of the key contributions is a set of novel document representation learning methods based solely on knowledge graphs, i.e. extensive collections of (grounded) subject-predicate-object triplets. We demonstrate that knowledge graph-based representations already achieve performance competitive with conventionally accepted representation learners. Furthermore, when combined with existing, contextual representations, knowledge graph-based document representations can achieve state-of-the-art performance. To our knowledge this is the first larger-scale evaluation of how knowledge graph-based representations can be systematically incorporated into the process of fake news classification.
SOCIETY

