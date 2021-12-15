
Apogee to Apogee Path Sampler

By Chris Sherlock, Szymon Urbas, Matthew Ludkin
 4 days ago

Amongst Markov chain Monte Carlo algorithms, Hamiltonian Monte Carlo (HMC) is often the algorithm of choice for complex, high-dimensional target distributions; however, its efficiency is notoriously sensitive to the choice of the integration-time tuning parameter, $T$. When...

Beta News

A path towards end-to-end automation

Customer service bots, digital payroll workflows, fleet management and, lest we forget, the clever programmer who famously automated his own job -- the uses for and adoption of automation continue to expand. In fact, the automation market is expected to reach $136.5 billion by 2026 with a five-year compound annual growth rate (CAGR) of 3.7 percent, according to Industry ARC’s Automation Market Report.
Reconfiguring Shortest Paths in Graphs

Reconfiguring two shortest paths in a graph means modifying one shortest path to the other by changing one vertex at a time so that all the intermediate paths are also shortest paths. This problem has several natural applications, namely: (a) revamping road networks, (b) rerouting data packets in synchronous multiprocessing setting, (c) the shipping container stowage problem, and (d) the train marshalling problem.
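The single-step condition described above can be sketched as follows. This is a minimal illustration, assuming an unweighted, undirected graph stored as an adjacency dictionary; the function names and the toy 4-cycle are hypothetical and do not come from the paper.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph (assumption: unit edge weights)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_shortest_path(adj, path):
    """True if consecutive vertices are adjacent and the length equals the BFS distance."""
    if any(b not in adj[a] for a, b in zip(path, path[1:])):
        return False
    return bfs_dist(adj, path[0]).get(path[-1]) == len(path) - 1

def is_reconfiguration_step(adj, p, q):
    """True if p and q are shortest paths with the same endpoints differing in exactly one vertex."""
    if len(p) != len(q) or p[0] != q[0] or p[-1] != q[-1]:
        return False
    differing = sum(a != b for a, b in zip(p, q))
    return differing == 1 and is_shortest_path(adj, p) and is_shortest_path(adj, q)

# Hypothetical toy graph: the 4-cycle s-a-t-b-s has two shortest s-t paths.
adj = {"s": {"a", "b"}, "a": {"s", "t"}, "b": {"s", "t"}, "t": {"a", "b"}}
step_ok = is_reconfiguration_step(adj, ["s", "a", "t"], ["s", "b", "t"])
```

A full reconfiguration sequence is then a chain of paths in which every consecutive pair passes this check.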
Learning to Prompt for Continual Learning

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister. The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters, which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments under popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results against rehearsal-based methods even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at this https URL.
Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing

The Mixup scheme suggests mixing a pair of samples to create an augmented training sample and has gained considerable attention recently for improving the generalizability of neural networks. A straightforward and widely used extension of Mixup is to combine it with regional dropout-like methods: removing random patches from a sample and replacing them with the features from another sample. Despite their simplicity and effectiveness, these methods are prone to creating harmful samples due to their randomness. To address this issue, 'maximum saliency' strategies were recently proposed: they select only the most informative features to prevent such a phenomenon. However, they now suffer from a lack of sample diversification as they always deterministically select regions with maximum saliency, injecting bias into the augmented data. In this paper, we present Saliency Grafting, a novel yet simple Mixup variant that captures the best of both worlds. Our idea is two-fold. By stochastically sampling the features and 'grafting' them onto another sample, our method effectively generates diverse yet meaningful samples. Its second ingredient is to produce the label of the grafted sample by mixing the labels in a saliency-calibrated fashion, which rectifies supervision misguidance introduced by the random sampling procedure. Our experiments on the CIFAR, Tiny-ImageNet, and ImageNet datasets show that our scheme not only outperforms the current state-of-the-art augmentation strategies in terms of classification accuracy, but is also superior in coping with stress conditions such as data corruption and object occlusion.
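For orientation, the baseline Mixup operation that this abstract builds on can be sketched as below. This is plain Mixup only, not the saliency-guided grafting or calibrated label mixing proposed in the paper; the function name, the `alpha` default, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Baseline Mixup: convex combination of two samples and their one-hot labels.

    The mixing weight is drawn from Beta(alpha, alpha); alpha=1.0 is an
    illustrative default, not a value taken from the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Hypothetical toy inputs: two 4x4 "images" with one-hot labels over 3 classes.
rng = np.random.default_rng(0)
x_a, x_b = np.zeros((4, 4)), np.ones((4, 4))
y_a, y_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, rng=rng)
```

The mixed label stays a valid probability vector because both the input and label use the same weight `lam`.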
Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

Neural autoregressive sequence models smear the probability among many possible sequences including degenerate ones, such as empty or repetitive sequences. In this work, we tackle one specific case where the model assigns a high probability to unreasonably short sequences. We define the oversmoothing rate to quantify this issue. After confirming the high degree of oversmoothing in neural machine translation, we propose to explicitly minimize the oversmoothing rate during training. We conduct a set of experiments to study the effect of the proposed regularization on both model distribution and decoding performance. We use a neural machine translation task as the testbed and consider three different datasets of varying size. Our experiments reveal three major findings. First, we can control the oversmoothing rate of the model by tuning the strength of the regularization. Second, by enhancing the oversmoothing loss contribution, the probability and the rank of the <eos> token decrease heavily at positions where it is not supposed to be. Third, the proposed regularization impacts the outcome of beam search especially when a large beam is used. The degradation of translation quality (measured in BLEU) with a large beam significantly lessens with a lower oversmoothing rate, but the degradation compared to smaller beam sizes remains. From these observations, we conclude that the high degree of oversmoothing is the main reason behind the degenerate case of overly probable short sequences in a neural autoregressive model.
Discrete- and continuous-time random walks in 1D Lévy random medium

A Lévy random medium, in a given space, is a random point process where the distances between points, a.k.a. targets, are long-tailed. Random walks visiting the targets of a Lévy random medium have been used to model many (physical, ecological, social) phenomena that exhibit superdiffusion as the result of interactions between an agent and a sparse, complex environment. In this note we consider the simplest non-trivial Lévy random medium, a sequence of points in the real line with i.i.d. long-tailed distances between consecutive targets. A popular example of a continuous-time random walk in this medium is the so-called Lévy-Lorentz gas. We give an account of a number of recent theorems on generalizations and variations of such model, in discrete and continuous time.
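The one-dimensional medium described above can be simulated in a few lines. The sketch below assumes Pareto-distributed gaps (one possible long-tailed law) and a simple symmetric walk that hops to a neighboring target at each discrete step; both choices, the `simulate` name, and the parameter defaults are illustrative assumptions rather than the models treated in the note.

```python
import numpy as np

def simulate(n_steps, tail_index=1.5, seed=0):
    """Discrete-time walk on targets with i.i.d. long-tailed gaps.

    Gaps are classical Pareto with minimum 1 and the given tail index
    (an assumed choice of long-tailed distribution).
    """
    rng = np.random.default_rng(seed)
    # Enough targets on both sides that the walker can never leave the array.
    n_targets = 2 * n_steps + 1
    gaps = rng.pareto(tail_index, size=n_targets - 1) + 1.0
    targets = np.concatenate(([0.0], np.cumsum(gaps)))
    targets -= targets[n_steps]  # place the starting target at the origin
    # Symmetric +/-1 steps on the target index, not on the real line.
    steps = rng.choice([-1, 1], size=n_steps)
    index = n_steps + np.concatenate(([0], np.cumsum(steps)))
    return targets, targets[index]

targets, trajectory = simulate(n_steps=1000)
```

Because the walker moves between targets rather than by fixed increments, occasional very long gaps translate into very long jumps in `trajectory`.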
Distributed neural network control with dependability guarantees: a compositional port-Hamiltonian approach

Large-scale cyber-physical systems require that control policies are distributed, that is, that they only rely on local real-time measurements and communication with neighboring agents. Optimal Distributed Control (ODC) problems are, however, highly intractable even in seemingly simple cases. Recent work has thus proposed training Neural Network (NN) distributed controllers. A main challenge of NN controllers is that they are not dependable during and after training, that is, the closed-loop system may be unstable, and the training may fail due to vanishing and exploding gradients. In this paper, we address these issues for networks of nonlinear port-Hamiltonian (pH) systems, whose modeling power ranges from energy systems to non-holonomic vehicles and chemical reactions. Specifically, we embrace the compositional properties of pH systems to characterize deep Hamiltonian control policies with built-in closed-loop stability guarantees, irrespective of the interconnection topology and the chosen NN parameters. Furthermore, our setup enables leveraging recent results on well-behaved neural ODEs to prevent the phenomenon of vanishing gradients by design. Numerical experiments corroborate the dependability of the proposed architecture, while matching the performance of general neural network policies.
BayesFlow can reliably detect Model Misspecification and Posterior Errors in Amortized Bayesian Inference

Neural density estimators have proven remarkably powerful in performing efficient simulation-based Bayesian inference in various research domains. In particular, the BayesFlow framework uses a two-step approach to enable amortized parameter estimation in settings where the likelihood function is implicitly defined by a simulation program. But how faithful is such inference when simulations are poor representations of reality? In this paper, we conceptualize the types of model misspecification arising in simulation-based inference and systematically investigate the performance of the BayesFlow framework under these misspecifications. We propose an augmented optimization objective which imposes a probabilistic structure on the latent data space and utilize maximum mean discrepancy (MMD) to detect potentially catastrophic misspecifications during inference undermining the validity of the obtained results. We verify our detection criterion on a number of artificial and realistic misspecifications, ranging from toy conjugate models to complex models of decision making and disease outbreak dynamics applied to real data. Further, we show that posterior inference errors increase as a function of the distance between the true data-generating distribution and the typical set of simulations in the latent summary space. Thus, we demonstrate the dual utility of MMD as a method for detecting model misspecification and as a proxy for verifying the faithfulness of amortized Bayesian inference.
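As a rough illustration of the discrepancy measure mentioned above, here is a minimal sketch of a biased (V-statistic) estimate of squared MMD with a Gaussian kernel. The kernel, the bandwidth, and the toy samples are assumptions; this is not the BayesFlow implementation or its API.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y (rows are observations)."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Hypothetical toy check: a shifted sample should be farther away than a matched one.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 2))
matched = rng.normal(0.0, 1.0, size=(200, 2))
shifted = rng.normal(3.0, 1.0, size=(200, 2))
mmd_matched = mmd_squared(reference, matched)
mmd_shifted = mmd_squared(reference, shifted)
```

In a misspecification check of the kind the abstract describes, a large value relative to what matched samples produce would be the warning signal; how the threshold is set is not specified here.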
UMAD: Universal Model Adaptation under Domain and Category Shift

Learning to reject unknown samples (not present in the source classes) in the target domain is fairly important for unsupervised domain adaptation (UDA). There exist two typical UDA scenarios, i.e., open-set and open-partial-set, and the latter assumes that not all source classes appear in the target domain. However, most prior methods are designed for one UDA scenario and always perform badly on the other UDA scenario. Moreover, they also require the labeled source data during adaptation, limiting their usability in data privacy-sensitive applications. To address these issues, this paper proposes a Universal Model ADaptation (UMAD) framework which handles both UDA scenarios without access to the source data or prior knowledge about the category shift between domains. Specifically, we aim to learn a source model with an elegantly designed two-head classifier and provide it to the target domain. During adaptation, we develop an informative consistency score to help distinguish unknown samples from known samples. To achieve bilateral adaptation in the target domain, we further maximize localized mutual information to align known samples with the source classifier and employ an entropic loss to push unknown samples far away from the source classification boundary, respectively. Experiments on open-set and open-partial-set UDA scenarios demonstrate that UMAD, as a unified approach without access to source data, exhibits comparable, if not superior, performance to state-of-the-art data-dependent methods.
General theory of Josephson Diodes

Motivated by recent progress in nonreciprocal superconducting phenomena, we study the general theory of Josephson diodes. The central ingredient for Josephson diodes is the asymmetric proximity process inside the tunneling barrier. From the symmetry breaking point of view, there are two types of Josephson diodes, inversion breaking and time-reversal breaking. For the inversion breaking case, applying a voltage bias could effectively tune the proximity process, like the voltage-dependent Rashba coupling, giving rise to $I_c(V)\neq I_c(-V)$. For the time-reversal breaking case, the magnetic field and current flow could adjust the internal time-reversal breaking field, like magnetism or time-reversal breaking electron-electron pairing, which leads to $I_c(B)\neq I_c(-B)$ or $I_{c+}\neq I_{c-}$. All these results provide a complete understanding and the general principles of realizing Josephson diodes, especially the recently found NbSe$_2$/Nb$_3$Br$_8$/NbSe$_2$ Josephson diodes.
Conjugate dualities for relative smoothness and strong convexity under the light of generalized convexity

Relative Bregman smoothness and strong convexity have recently gained considerable attention in optimization. However, conjugate dualities for Bregman smoothness and strong convexity remain an open problem as noted earlier by Lu, Freund, and Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM Journal on Optimization, 28(1):333-354, 2018. In this paper we address this question by introducing the notions of relative anisotropic strong convexity and smoothness as the respective dual counterparts of Bregman smoothness and strong convexity. In essence, the duality holds between tilt- and shift-parametrized families of upper and lower bounds and can thus be examined under the light of generalized convexity. In the Euclidean case this specializes to the well-known conjugate duality between Lipschitz smoothness and strong convexity. The two notions here introduced can be thought of as anisotropic generalizations of the well-known descent lemma and the strong convexity subgradient inequality. Alternatively, in the context of generalized convexity these characterizations can be interpreted as generalized subgradient inequalities. In the Euclidean case, the class of strongly convex functions can be described in terms of pointwise maxima over quadratics with uniform curvature. Surprisingly, in contrast, the class of anisotropically strongly convex functions, in general, only forms a proper subset of the corresponding class of pointwise maxima, unless a certain saddle-point property holds. Aside from the Euclidean case, this saddle-point property is shown to hold automatically in the one-dimensional or the essentially smooth case.
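For readers who want the Euclidean special case mentioned above written out, the standard statement is sketched below. It is the classical result for a convex, differentiable function on $\mathbb{R}^n$, given here as background rather than as a formula taken from the paper; the constant $L$ and the quadratic example are illustrative.

```latex
% Classical Euclidean duality (background, not the paper's anisotropic result).
% Let f : R^n -> R be convex and differentiable, with convex conjugate
%   f^*(y) = sup_x { <x, y> - f(x) },   and let L > 0.
\[
  \nabla f \ \text{is $L$-Lipschitz}
  \quad\Longleftrightarrow\quad
  f^* - \tfrac{1}{2L}\,\|\cdot\|^2 \ \text{is convex}
  \quad (\text{i.e. } f^* \text{ is } \tfrac{1}{L}\text{-strongly convex}).
\]
% Worked instance: the supremum below is attained at x = y / L.
\[
  f(x) = \tfrac{L}{2}\,\|x\|^2
  \quad\Longrightarrow\quad
  f^*(y) = \sup_{x}\Bigl\{ \langle x, y\rangle - \tfrac{L}{2}\,\|x\|^2 \Bigr\}
         = \tfrac{1}{2L}\,\|y\|^2 .
\]
```

In the quadratic instance the smoothness constant $L$ of $f$ becomes the strong convexity constant $1/L$ of $f^*$, which is the pattern the relative and anisotropic notions generalize.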
Forbes

Migration To The Cloud: The Path To Least Resistance

Chetan Mathur is CEO of Next Pathway, the Automated Cloud Migration company. Moving your data and workloads to the cloud is an imperative. Delaying may harm your company’s operational efficiency, while doing serious damage to your competitive edge. It’s understandable why companies are resistant. Migration to the cloud is...
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang. Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.
A Heterogeneous Graph Learning Model for Cyber-Attack Detection

A cyber-attack is a malicious attempt by experienced hackers to breach the target information system. Usually, cyber-attacks are characterized by hybrid TTPs (Tactics, Techniques, and Procedures) and long-term adversarial behaviors, making traditional intrusion detection methods ineffective. Most existing cyber-attack detection systems are implemented based on manually designed rules by referring to domain knowledge (e.g., threat models, threat intelligence). However, this process lacks intelligence and generalization ability. To address this limitation, this paper proposes an intelligent cyber-attack detection method based on provenance data. To effectively and efficiently detect cyber-attacks from a huge number of system events in the provenance data, we first model the provenance data as a heterogeneous graph to capture the rich context information of each system entity (e.g., process, file, socket, etc.), and learn a semantic vector representation for each system entity. Then, we perform online cyber-attack detection by sampling a small and compact local graph from the heterogeneous graph, and classifying the key system entities as malicious or benign. We conducted a series of experiments on two provenance datasets with real cyber-attacks. The experimental results show that the proposed method outperforms other learning-based detection models, and has competitive performance against state-of-the-art rule-based cyber-attack detection systems.
Solving Inverse Problems with NerfGANs

We introduce a novel framework for solving inverse problems using NeRF-style generative models. We are interested in the problem of 3-D scene reconstruction given a single 2-D image and known camera parameters. We show that naively optimizing the latent space leads to artifacts and poor novel view rendering. We attribute this problem to volume obstructions that are clear in the 3-D geometry and become visible in the renderings of novel views. We propose a novel radiance field regularization method to obtain better 3-D surfaces and improved novel views given single view observations. Our method naturally extends to general inverse problems including inpainting where one observes only partially a single view. We experimentally evaluate our method, achieving visual improvements and performance boosts over the baselines in a wide range of tasks. Our method achieves $30-40\%$ MSE reduction and $15-25\%$ reduction in LPIPS loss compared to the previous state of the art.
RegionCLIP: Region-based Language-Image Pretraining

Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao. Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at this https URL.
Stable Long-Term Recurrent Video Super-Resolution

Recurrent models have gained popularity in deep learning (DL) based video super-resolution (VSR), due to their increased computational efficiency, temporal receptive field and temporal consistency compared to sliding-window based models. However, when inferring on long video sequences presenting low motion (i.e. in which some parts of the scene barely move), recurrent models diverge through recurrent processing, generating high frequency artifacts. To the best of our knowledge, no study about VSR pointed out this instability problem, which can be critical for some real-world applications. Video surveillance is a typical example where such artifacts would occur, as both the camera and the scene stay static for a long time.
A Globally Convergent Distributed Jacobi Scheme for Block-Structured Nonconvex Constrained Optimization Problems

Motivated by the increasing availability of high-performance parallel computing, we design a distributed parallel algorithm for linearly-coupled block-structured nonconvex constrained optimization problems. Our algorithm performs Jacobi-type proximal updates of the augmented Lagrangian function, requiring only local solutions of separable block nonlinear programming (NLP) problems. We provide a cheap and explicitly computable Lyapunov function that allows us to establish global and local sublinear convergence of our algorithm, its iteration complexity, as well as simple, practical and theoretically convergent rules for automatically tuning its parameters. This is in contrast to existing algorithms for nonconvex constrained optimization based on the alternating direction method of multipliers that rely on at least one of the following: Gauss-Seidel or sequential updates, global solutions of NLP problems, non-computable Lyapunov functions, and hand-tuning of parameters. Numerical experiments showcase its advantages for large-scale problems, including the multi-period optimization of a 9000-bus AC optimal power flow test case over 168 time periods, solved on the Summit supercomputer using an open-source Julia code.
On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Fluid dynamics simulations with the lattice Boltzmann method (LBM) are very memory-intensive. Alongside reduction in memory footprint, significant performance benefits can be achieved by using FP32 (single) precision compared to FP64 (double) precision, especially on GPUs. Here, we evaluate the possibility to use even FP16 and Posit16 (half) precision for storing fluid populations, while still carrying arithmetic operations in FP32. For this, we first show that the commonly occurring number range in the LBM is a lot smaller than the FP16 number range. Based on this observation, we develop novel 16-bit formats - based on a modified IEEE-754 and on a modified Posit standard - that are specifically tailored to the needs of the LBM. We then carry out an in-depth characterization of LBM accuracy for six different test systems with increasing complexity: Poiseuille flow, Taylor-Green vortices, Karman vortex streets, lid-driven cavity, a microcapsule in shear flow (utilizing the immersed-boundary method) and finally the impact of a raindrop (based on a Volume-of-Fluid approach). We find that the difference in accuracy between FP64 and FP32 is negligible in almost all cases, and that for a large number of cases even 16-bit is sufficient. Finally, we provide a detailed performance analysis of all precision levels on a large number of hardware microarchitectures and show that significant speedup is achieved with mixed FP32/16-bit.
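The storage-versus-arithmetic split described above can be illustrated with a small sketch: values are kept in 16-bit memory, promoted to FP32 for the update, and rounded back for storage. The example uses standard IEEE FP16 from NumPy, not the custom 16-bit formats the paper develops, and the single-relaxation update, `relax` name, and toy values are assumptions made for illustration.

```python
import numpy as np

def relax(stored_fp16, f_eq, omega):
    """One relaxation step with FP16 storage and FP32 arithmetic.

    stored_fp16: populations as kept in memory (float16).
    f_eq:        equilibrium populations (hypothetical inputs).
    omega:       relaxation rate (illustrative parameter).
    """
    f = stored_fp16.astype(np.float32)            # load: promote to FP32
    f_eq32 = np.asarray(f_eq, dtype=np.float32)
    f_new = f + np.float32(omega) * (f_eq32 - f)  # arithmetic stays in FP32
    return f_new.astype(np.float16)               # store: round back to FP16

# Hypothetical toy populations.
stored = np.array([0.1, 0.2, 0.3], dtype=np.float16)
equilibrium = np.array([0.2, 0.2, 0.2], dtype=np.float32)
updated = relax(stored, equilibrium, omega=1.0)
```

Only the arrays held in memory are 16-bit, so the footprint is halved relative to FP32 storage while rounding error enters once per load and store rather than in every intermediate operation.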
