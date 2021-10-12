
VarArray: Array-Geometry-Agnostic Continuous Speech Separation

By Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda
arxiv.org
 10 days ago

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription. This paper proposes VarArray, an array-geometry-agnostic speech separation neural network model. The proposed model is applicable to any number

Daily Mail

TikTok star is shocked to discover Amazon has more than 3,000 recordings of her stored from her Echo speakers, as well as a list of her contacts and her LOCATION

A TikTok star was left shocked, after discovering Amazon had more than 3,000 recordings of her voice from an Echo speaker, including her location and contacts. The data privacy campaigner, who goes by the username @my.data.not.yours, asked Amazon to send all data it has on her, including from smart speakers.
BEHIND VIRAL VIDEOS
Business Insider

Facebook is working on AI tech that will monitor your every move

Facebook envisions a future where smartglasses "become as useful in everyday life as smartphones," the company said in a new blog post. In order to achieve that future, such devices will require powerful AI software that can read and respond to the world around the headset's user. And the only way to train AI to see and hear the world like humans do is for it to experience the world like we do: from a first-person perspective.
INTERNET
arxiv.org

Stable Prediction on Graphs with Agnostic Distribution Shift

Graph is a flexible and effective tool to represent complex structures in practice and graph neural networks (GNNs) have been shown to be effective on various graph tasks with randomly separated training and testing data. In real applications, however, the distribution of training graph might be different from that of the test one (e.g., users' interactions on the user-item training graph and their actual preference on items, i.e., testing environment, are known to have inconsistencies in recommender systems). Moreover, the distribution of test data is always agnostic when GNNs are trained. Hence, we are facing the agnostic distribution shift between training and testing on graph learning, which would lead to unstable inference of traditional GNNs across different test environments. To address this problem, we propose a novel stable prediction framework for GNNs, which permits both locally and globally stable learning and prediction on graphs. In particular, since each node is partially represented by its neighbors in GNNs, we propose to capture the stable properties for each node (locally stable) by re-weighting the information propagation/aggregation processes. For global stability, we propose a stable regularizer that reduces the training losses on heterogeneous environments and thus warping the GNNs to generalize well. We conduct extensive experiments on several graph benchmarks and a noisy industrial recommendation dataset that is collected from 5 consecutive days during a product promotion festival. The results demonstrate that our method outperforms various SOTA GNNs for stable prediction on graphs with agnostic distribution shift, including shift caused by node labels and attributes.
COMPUTERS
arxiv.org

Multi-channel Narrow-Band Deep Speech Separation with Full-band Permutation Invariant Training

This paper addresses the problem of multi-channel multi-speech separation based on deep learning techniques. In the short time Fourier transform domain, we propose an end-to-end narrow-band network that directly takes as input the multi-channel mixture signals of one frequency, and outputs the separated signals of this frequency. In narrow-band, the spatial information (or inter-channel difference) can well discriminate between speakers at different positions. This information is intensively used in many narrow-band speech separation methods, such as beamforming and clustering of spatial vectors. The proposed network is trained to learn a rule to automatically exploit this information and perform speech separation. Such a rule should be valid for any frequency, thence the network is shared by all frequencies. In addition, a full-band permutation invariant training criterion is proposed to solve the frequency permutation problem encountered by most narrow-band methods. Experiments show that, by focusing on deeply learning the narrow-band information, the proposed method outperforms the oracle beamforming method and the state-of-the-art deep learning based method.
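The idea behind a full-band permutation invariant criterion can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function name `full_band_pit_loss`, the nested-list layout `[speaker][frequency][frame]`, and the squared-error distance are all assumptions made for this example. The point it shows is that one speaker permutation is chosen for all frequencies together, rather than separately per frequency.

```python
from itertools import permutations


def full_band_pit_loss(est, ref):
    """Return (loss, perm) for the best speaker assignment over the full band.

    est, ref: nested lists indexed [speaker][frequency][frame]
    (hypothetical layout; values may be real or complex).
    """
    n_spk = len(ref)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_spk)):
        # perm[i] is the estimated stream assigned to reference speaker i.
        total, count = 0.0, 0
        for i, j in enumerate(perm):
            # Sum the error over ALL frequencies before comparing permutations,
            # so the same assignment is enforced in every frequency band.
            for ref_f, est_f in zip(ref[i], est[j]):
                for r, e in zip(ref_f, est_f):
                    total += abs(r - e) ** 2
                    count += 1
        loss = total / count
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the error is accumulated across frequencies before the minimum is taken, the estimated streams cannot swap speakers from one frequency to the next without being penalized, which is the frequency permutation problem the abstract refers to.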
IN THIS ARTICLE
#Agnostic #Vararray #Ami #Icassp #Machine Learning #Lg
TheConversationAU

Facebook wants AI to find your keys and understand your conversations

Facebook has announced a research project that aims to push the “frontier of first-person perception”, and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of “smart glasses” called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of...
SOFTWARE
arxiv.org

Platoon Formation in a Mixed Traffic Environment: A Model-Agnostic Optimal Control Approach

Coordination of connected and automated vehicles (CAVs) in a mixed traffic environment poses significant challenges due to the presence of human-driven vehicles (HDVs) with stochastic dynamics and driving behavior. In earlier work, we addressed the problem of platoon formation of HDVs led by a CAV using a model-dependent, open-loop controller. In this paper, we develop a comprehensive model-agnostic, multi-objective optimal controller which ensures platoon formation of the trailing HDVs by directly controlling the leading CAV without having explicit knowledge of the HDV dynamics. We provide a detailed exposition of the control framework that uses instantaneous motion information from multiple successive HDVs to enforce safety while achieving the optimization objectives. To demonstrate the efficacy of the proposed control framework, we evaluate its performance using numerical simulation and provide associated sensitivity and robustness analysis.
CARS
arxiv.org

TCube: Domain-Agnostic Neural Time-series Narration

The task of generating rich and fluent narratives that aptly describe the characteristics, trends, and anomalies of time-series data is invaluable to the sciences (geology, meteorology, epidemiology) or finance (trades, stocks, or sales and inventory). The efforts for time-series narration hitherto are domain-specific and use predefined templates that offer consistency but lead to mechanical narratives. We present TCube (Time-series-to-text), a domain-agnostic neural framework for time-series narration, that couples the representation of essential time-series elements in the form of a dense knowledge graph and the translation of said knowledge graph into rich and fluent narratives through the transfer-learning capabilities of PLMs (Pre-trained Language Models). TCube's design primarily addresses the challenge that lies in building a neural framework in the complete paucity of annotated training data for time-series. The design incorporates knowledge graphs as an intermediary for the representation of essential time-series elements which can be linearized for textual translation. To the best of our knowledge, TCube is the first investigation of the use of neural strategies for time-series narration. Through extensive evaluations, we show that TCube can improve the lexical diversity of the generated narratives by up to 65.38% while still maintaining grammatical integrity. The practicality and deployability of TCube are further validated through an expert review (n=21) where 76.2% of participating experts wary of auto-generated narratives favored TCube as a deployable system for time-series narration due to its richer narratives. Our code-base, models, and datasets, with detailed instructions for reproducibility, are publicly hosted at this https URL.
COMPUTERS
arxiv.org

Cones with convoluted geometry that always scatter or radiate

We investigate fixed energy scattering from conical potentials having an irregular cross-section. The incident wave can be any arbitrary non-trivial Herglotz wave. We show that a large number of such local conical scatterers scatter all incident waves, meaning that the far-field will always be non-zero. In essence there are no incident waves for which these potentials would seem transparent at any given energy. We show more specifically that there is a large collection of star-shaped cones whose local geometries always produce a scattered wave. In fact, except for a countable set, all cones from a family of deformations between a circular and a star-shaped cone will always scatter any non-trivial incident Herglotz wave. Our methods are based on the use of spherical harmonics and a deformation argument. We also investigate the related problem for sources. In particular if the support of the source is locally a thin cone, with an arbitrary cross-section, then it will produce a non-zero far-field.
PHYSICS
arxiv.org

Gromov-Hausdorff class: its completeness and cloud geometry

The paper is devoted to the study of the Gromov-Hausdorff proper class, consisting of all metric spaces considered up to isometry. In this class, a generalized Gromov-Hausdorff pseudometric is introduced and the geometry of the resulting space is investigated. The first main result is a proof of the completeness of the space, i.e., that all fundamental sequences converge in it. Then we partition the space into maximal proper subclasses consisting of spaces at a finite distance from each other. We call such subclasses clouds. A multiplicative similarity group operates on clouds, multiplying all the distances of each metric space by some positive number. We present examples of similarity mappings transferring some clouds into other ones. We also show that if a cloud contains a space that remains at zero distance from itself under action of all similarities, then such a cloud contracted to this space. In the final part, we investigate subsets of the real line with respect to their behavior under various similarities.
MATHEMATICS
arxiv.org

A dual-element, two-dimensional atom array with continuous-mode operation

Quantum processing architectures that include multiple qubit modalities offer compelling strategies for high-fidelity operations and readout, quantum error correction, and a path for scaling to large system sizes. Such hybrid architectures have been realized for leading platforms, including superconducting circuits and trapped ions. Recently, a new approach for constructing large, coherent quantum processors has emerged based on arrays of individually trapped neutral atoms. However, these demonstrations have been limited to arrays of a single atomic element where the identical nature of the atoms makes crosstalk-free control and non-demolition readout of a large number of atomic qubits challenging. Here we introduce a dual-element atom array with individual control of single rubidium and cesium atoms. We demonstrate their independent placement in arrays with up to 512 trapping sites and observe negligible crosstalk between the two elements. Furthermore, by continuously reloading one atomic element while maintaining an array of the other, we demonstrate a new continuous operation mode for atom arrays without any off-time. Our results enable avenues for ancilla-assisted quantum protocols such as quantum non-demolition measurements and quantum error correction, as well as continuously operating quantum processors and sensors.
COMPUTERS
arxiv.org

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we demonstrate how the partial observability constraints can lead to multiple smooth and non-smooth local optimizers and we estimate the number of critical points.
MATHEMATICS
vmware.com

vRA 8.5 - Define Machine Properties by Input Array

I'm trying to define, per instance in a multi-machine Cloud Template, the OS, flavor, and some boolean values. For this case I have an input array, which is defined as follows:

    inputs:
      win:
        type: array
        title: Win2022
        minItems: 0
        maxItems: 10
        items:
          type: object
          properties:
            flavor:
              type: string
              title: Flavor
              description: |-
                <b> Select the size of the deployment. </b> <br>
                Micro = 1 CPU - 1 GB RAM, <br>
                Small = 1 CPU - 2 GB RAM, <br>
                Medium = 2 CPU - 4 GB RAM, <br>
                Large = 2 CPU - 8 GB RAM <br>
              enum:
                - tiny
                - small
                - medium
                - large
            name1:
              type: string
              title: name
            chkOcr:
              type: boolean
              title: OCR allow
              default: false
            os:
              type: string
              title: OS
              default: '2019'
              enum:
                - '2016'
                - '2019'
                - '2022'
CODING & PROGRAMMING
arxiv.org

Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness

The vulnerability of deep neural networks to adversarial examples has motivated an increasing number of defense strategies for promoting model robustness. However, the progress is usually hampered by insufficient robustness evaluations. As the de facto standard to evaluate adversarial robustness, adversarial attacks typically solve an optimization problem of crafting adversarial examples with an iterative process. In this work, we propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the optimizer in adversarial attacks parameterized by a recurrent neural network, which is trained over a class of data samples and defenses to produce effective update directions during adversarial example generation. Furthermore, we develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses. Our approach can be flexibly incorporated with various attacks and consistently improves the performance with little extra computational cost. Extensive experiments demonstrate the effectiveness of the learned attacks by MAMA compared to the state-of-the-art attacks on different defenses, leading to a more reliable evaluation of adversarial robustness.
CODING & PROGRAMMING
The SOLIDWORKS Blog

Revisioning Customer Supplied Geometry on the Platform

For many industries, it’s quite common to receive customer supplied 3D models. Fixturing, dies, molds, & custom automation equipment, all make use of this practice. But what happens when your customer provides you with a new revision of the design? How do you capture it? Luckily, the Collaborative Designer for SOLIDWORKS role on the 3DEXPERIENCE Platform makes this easy. Let’s take a look.
COMPUTERS
arxiv.org

Conformal geometry and half-integrable spacetimes

Using a combination of techniques from conformal and complex geometry, we show the potentialization of 4-dimensional closed Einstein-Weyl structures which are half-algebraically special and admit a "half-integrable" almost-complex structure. That is, we reduce the Einstein-Weyl equations to a single, conformally invariant, non-linear scalar equation, that we call the "conformal HH equation", and we reconstruct the conformal structure (curvature and metric) from a solution to this equation. We show that the conformal metric is composed of: a conformally flat part, a conformally half-flat part related to certain "constants" of integration, and a potential part that encodes the full non-linear curvature, and that coincides in form with the Hertz potential from perturbation theory. We also study the potentialization of the Dirac-Weyl, Maxwell (with and without sources), and Yang-Mills systems. We show how to deal with the ordinary Einstein equations by using a simple trick. Our results give a conformally invariant, coordinate-free, generalization of the hyper-heavenly construction of Plebanski and collaborators.
MATHEMATICS
arxiv.org

Sum-of-Squares Geometry Processing

Geometry processing presents a variety of difficult numerical problems, each seeming to require its own tailored solution. This breadth is largely due to the expansive list of geometric primitives, e.g., splines, triangles, and hexahedra, joined with an ever-expanding variety of objectives one might want to achieve with them. With the recent increase in attention toward higher-order surfaces, we can expect a variety of challenges porting existing solutions that work on triangle meshes to work on these more complex geometry types. In this paper, we present a framework for solving many core geometry processing problems on higher-order surfaces. We achieve this goal through sum-of-squares optimization, which transforms nonlinear polynomial optimization problems into sequences of convex problems whose complexity is captured by a single degree parameter. This allows us to solve a suite of problems on higher-order surfaces, such as continuous collision detection and closest point queries on curved patches, with only minor changes between formulations and geometries.
MATHEMATICS
arxiv.org

Quantized Noncommutative Geometry from Multitrace Matrix Models

In this article the geometry of quantum gravity is quantized in the sense of being noncommutative (first quantization) but it is also quantized in the sense of being emergent (second quantization). A new mechanism for quantum geometry is proposed in which noncommutative geometry can emerge from "one-matrix multitrace scalar matrix models" by probing the statistical physics of commutative phases of matter. This is in contrast to the usual mechanism in which noncommutative geometry emerges from "many-matrix singletrace Yang-Mills matrix models" by probing the statistical physics of noncommutative phases of gauge theory. In this novel scenario quantized geometry emerges in the form of a transition between the two phase diagrams of the real quartic matrix model and the noncommutative scalar phi-four field theory. More precisely, emergence of the geometry is identified here with the emergence of the uniform-ordered phase and the corresponding commutative (Ising) and noncommutative (stripe) coexistence lines. The critical exponents and the Wigner's semicircle law are used to determine the dimension and the metric respectively. Arguments from the saddle point equation, from Monte Carlo simulation and from the matrix renormalization group equation are provided in support of this scenario.
MATHEMATICS
arxiv.org

Counting Objects by Diffused Index: geometry-free and training-free approach

By Mengyi Tang (1), Maryam Yashtini (2), Sung Ha Kang (1) ((1) Georgia Institute of Technology, (2) Georgetown University)

Counting objects is a fundamental but challenging problem. In this paper, we propose diffusion-based, geometry-free, and learning-free methodologies to count the number of objects in images. The main idea is to represent each object by a unique index value regardless of its intensity or size, and to simply count the number of index values. First, we place different vectors, referred to as seed vectors, uniformly throughout the mask image. The mask image has boundary information of the objects to be counted. Secondly, the seeds are diffused using an edge-weighted harmonic variational optimization model within each object. We propose an efficient algorithm based on an operator splitting approach and alternating direction minimization method, and theoretical analysis of this algorithm is given. An optimal solution of the model is obtained when the distributed seeds are completely diffused such that there is a unique intensity within each object, which we refer to as an index. For computational efficiency, we stop the diffusion process before full convergence, and propose to cluster these diffused index values. We refer to this approach as Counting Objects by Diffused Index (CODI). We explore scalar and multi-dimensional seed vectors. For scalar seeds, we use Gaussian fitting in histogram to count, while for vector seeds, we exploit a high-dimensional clustering method for the final step of counting via clustering. The proposed method is flexible even if the boundary of the object is neither clear nor fully enclosed. We present counting results in various applications such as biological cells, agriculture, concert crowd, and transportation. Some comparisons with existing methods are presented.
COMPUTERS
arxiv.org

Continual learning using lattice-free MMI for speech recognition

Continual learning (CL), or domain expansion, recently became a popular topic for automatic speech recognition (ASR) acoustic modeling because practical systems have to be updated frequently in order to work robustly on types of speech not observed during initial training. While sequential adaptation allows tuning a system to a new domain, it may result in performance degradation on the old domains due to catastrophic forgetting. In this work we explore regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion. We simulate domain expansion by incrementally adapting the acoustic model on different public datasets that include several accents and speaking styles. We investigate two well-known CL techniques, elastic weight consolidation (EWC) and learning without forgetting (LWF), which aim to reduce forgetting by preserving model weights or network outputs. We additionally introduce a sequence-level LWF regularization, which exploits posteriors from the denominator graph of LF-MMI to further reduce forgetting. Empirical results show that the proposed sequence-level LWF can improve the best average word error rate across all domains by up to 9.4% relative compared with using regular LWF.
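As a rough illustration of how elastic weight consolidation discourages forgetting, the sketch below computes the standard quadratic EWC penalty: each parameter's squared drift from its value after the old task, weighted by an importance estimate (commonly a diagonal Fisher information value). The function name `ewc_penalty`, the flat-list parameter layout, and the `lam` scaling factor are assumptions for this example; it is not the paper's LF-MMI setup or its sequence-level LWF regularizer.

```python
def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    params:     current parameter values (flat list, hypothetical layout)
    old_params: parameter values saved after training on the previous domain
    fisher:     per-parameter importance weights (e.g. diagonal Fisher estimates)
    lam:        strength of the regularization
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


# During adaptation to a new domain, the penalty would be added to the task loss:
# total_loss = new_domain_loss + ewc_penalty(params, old_params, fisher, lam)
```

Parameters with a large importance weight are held close to their old values, while parameters with a small weight remain free to adapt to the new domain.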
COMPUTERS
arxiv.org

FedSpeech: Federated Text-to-Speech with Continual Learning

Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech of multiple users with a few audio training samples stored locally on their devices. However, federated text-to-speech faces several challenges: very few training samples from each speaker are available, training samples are all stored on the local device of each user, and the global model is vulnerable to various attacks. In this paper, we propose a novel federated learning architecture based on continual learning approaches to overcome the difficulties above. Specifically, 1) we use gradual pruning masks to isolate parameters for preserving speakers' tones; 2) we apply selective masks for effectively reusing knowledge from tasks; 3) a private speaker embedding is introduced to keep users' privacy. Experiments on a reduced VCTK dataset demonstrate the effectiveness of FedSpeech: it nearly matches multi-task training in terms of multi-speaker speech quality; moreover, it sufficiently retains the speakers' tones and even outperforms the multi-task training in the speaker similarity experiment.
COMPUTERS

