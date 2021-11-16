
Software

Robust 3D Scene Segmentation through Hierarchical and Learnable Part-Fusion

By Anirud Thyagharajan, Benjamin Ummenhofer, Prashant Laddha, Om J Omer, Sreenivas Subramoney
arxiv.org
 8 days ago

3D semantic segmentation is a fundamental building block for several scene understanding applications such as autonomous driving, robotics and AR/VR. Several state-of-the-art semantic segmentation models suffer from the part misclassification problem, wherein parts of the same object are labelled...

arxiv.org

3D modelling of survey scene from images enhanced with a multi-exposure fusion

In current practice, scene survey is carried out by workers using total stations. The method has high accuracy, but it incurs high costs if continuous monitoring is needed. Techniques based on photogrammetry, with relatively cheap digital cameras, have gained wide application in many fields. Besides point measurement, photogrammetry can also create a three-dimensional (3D) model of the scene. Accurate 3D model reconstruction depends on high quality images. Degraded images will result in large errors in the reconstructed 3D model. In this paper, we propose a method that can be used to improve the visibility of the images, and eventually reduce the errors of the 3D scene model. The idea is inspired by image dehazing. Each original image is first transformed into multiple exposure images by means of gamma-correction operations and adaptive histogram equalization. The transformed images are analyzed by the computation of the local binary patterns. The image is then enhanced, with each pixel generated from the set of transformed image pixels weighted by a function of the local pattern feature and image saturation. Performance evaluation has been performed on benchmark image dehazing datasets. Experiments have been carried out on outdoor and indoor surveys. Our analysis finds that the method works on different types of degradation that exist in both outdoor and indoor images. When fed into the photogrammetry software, the enhanced images can reconstruct 3D scene models with sub-millimeter mean errors.
SOFTWARE
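
The multi-exposure step described in the abstract above can be illustrated with a minimal sketch. It covers only the gamma-correction part and a per-pixel weighted blend. The function names and the `mid_tone_weight` function are assumptions made for illustration; the paper weights pixels by a local binary pattern feature and image saturation, which this sketch does not reproduce.

```python
def gamma_exposures(pixels, gammas=(0.5, 1.0, 2.0)):
    """Return one gamma-corrected copy of `pixels` per gamma value.

    `pixels` holds intensities normalised to [0, 1]; gamma < 1 brightens
    the image and gamma > 1 darkens it.
    """
    return [[p ** g for p in pixels] for g in gammas]


def mid_tone_weight(value):
    """Hypothetical stand-in weight that favours mid-range intensities."""
    return 1.0 - abs(2.0 * value - 1.0) + 1e-6


def fuse(exposures, weight=mid_tone_weight):
    """Blend the exposures pixel by pixel using normalised weights."""
    fused = []
    for values in zip(*exposures):
        weights = [weight(v) for v in values]
        total = sum(weights)
        fused.append(sum(w * v for w, v in zip(weights, values)) / total)
    return fused


image = [0.04, 0.25, 0.81]       # a tiny "image" of three pixels
stack = gamma_exposures(image)   # three synthetic exposures
enhanced = fuse(stack)           # one fused value per pixel
```

Because the weights are normalised, every fused pixel stays between the darkest and brightest version of that pixel in the exposure stack.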
arxiv.org

Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from an unknown distribution that reflects task similarities. In this work, we provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit. We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class. Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel; and capture the structure of the problems, such that the regret decreases with the width of the task prior. Our proofs rely on novel total variance decompositions, which can be applied to other graphical model structures. Finally, our theory is complemented by experiments, which show that the hierarchical structure helps with knowledge sharing among the tasks. This confirms that hierarchical Bayesian bandits are a universal and statistically-efficient tool for learning to act with similar bandit tasks.
SCIENCE
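
To make the two-level sampling idea in the abstract above concrete, here is a minimal sketch for a single arm in a Gaussian model. The model is an assumption chosen for illustration: a shared hyper-parameter `mu ~ N(mu_q, var_q)`, task parameters `theta_s ~ N(mu, var_0)`, and rewards with noise variance `var_noise`. All function and parameter names are hypothetical, and the sketch is not claimed to match the paper's hierTS in detail.

```python
import math
import random


def hyper_posterior(mu_q, var_q, var_0, var_noise, task_stats):
    """Posterior mean and variance of the shared hyper-parameter mu.

    `task_stats` is a list of (n, total) pairs: the number of observations
    and the sum of rewards seen so far in each task. With theta_s
    marginalised out, a task's sample mean is N(mu, var_0 + var_noise / n).
    """
    precision = 1.0 / var_q
    weighted = mu_q / var_q
    for n, total in task_stats:
        if n == 0:
            continue  # a task without data carries no information about mu
        var_mean = var_0 + var_noise / n
        precision += 1.0 / var_mean
        weighted += (total / n) / var_mean
    return weighted / precision, 1.0 / precision


def task_posterior(mu, var_0, var_noise, n, total):
    """Posterior of theta_s given a value of mu and this task's data."""
    precision = 1.0 / var_0 + n / var_noise
    mean = (mu / var_0 + total / var_noise) / precision
    return mean, 1.0 / precision


def hier_ts_sample(rng, mu_q, var_q, var_0, var_noise, task_stats, task):
    """Two-stage sample: first mu, then theta for the chosen task."""
    m, v = hyper_posterior(mu_q, var_q, var_0, var_noise, task_stats)
    mu = rng.gauss(m, math.sqrt(v))
    n, total = task_stats[task]
    mean, var = task_posterior(mu, var_0, var_noise, n, total)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
stats = [(1, 1.0), (0, 0.0)]  # task 0 has one reward of 1.0, task 1 has none
draw = hier_ts_sample(rng, 0.0, 1.0, 1.0, 1.0, stats, task=1)
```

With several arms, one would draw such a sample per arm and play the arm with the largest draw. Task 1 has no data of its own here, so its draw is informed only through the shared hyper-posterior, which is the knowledge-sharing effect the abstract refers to.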
arxiv.org

Learnable Locality-Sensitive Hashing for Video Anomaly Detection

Video anomaly detection (VAD) mainly refers to identifying anomalous events that have not occurred in the training set where only normal samples are available. Existing works usually formulate VAD as a reconstruction or prediction problem. However, the adaptability and scalability of these methods are limited. In this paper, we propose a novel distance-based VAD method to take advantage of all the available normal data efficiently and flexibly. In our method, the smaller the distance between a testing sample and normal samples, the higher the probability that the testing sample is normal. Specifically, we propose to use locality-sensitive hashing (LSH) to map samples whose similarity exceeds a certain threshold into the same bucket in advance. In this manner, the complexity of near neighbor search is cut down significantly. To bring semantically similar samples closer together and push dissimilar samples further apart, we propose a novel learnable version of LSH that embeds LSH into a neural network and optimizes the hash functions with a contrastive learning strategy. The proposed method is robust to data imbalance and can handle the large intra-class variations in normal data flexibly. It also scales well. Extensive experiments demonstrate the superiority of our method, which achieves new state-of-the-art results on VAD benchmarks.
SOFTWARE
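
The bucketing idea in the abstract above can be sketched with classic random-hyperplane LSH. This is a fixed, non-learned hash used purely for illustration; the paper's contribution is a learnable variant trained with contrastive learning, which is not shown here. All names are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw `n_bits` random hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0.0 else 0
        for plane in planes
    )


def build_index(samples, planes):
    """Group the normal samples into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[signature(sample, planes)].append(sample)
    return buckets


def anomaly_score(query, buckets, planes):
    """Distance to the nearest normal sample in the query's bucket.

    Returns infinity when the bucket is empty, i.e. no similar normal
    sample was seen during indexing.
    """
    candidates = buckets.get(signature(query, planes), [])
    if not candidates:
        return math.inf
    return min(math.dist(query, c) for c in candidates)


planes = make_hyperplanes(dim=3, n_bits=8)
normal = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
index = build_index(normal, planes)
score = anomaly_score([1.0, 0.0, 0.0], index, planes)
```

Only the samples sharing the query's bucket are compared, which is where the reduction in near-neighbor search cost comes from. A larger score indicates a sample further from the normal data.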
arxiv.org

Improving the robustness and accuracy of biomedical language models through adversarial training

Deep transformer neural network models have improved the predictive accuracy of intelligent text processing systems in the biomedical domain. They have obtained state-of-the-art performance scores on a wide variety of biomedical and clinical Natural Language Processing (NLP) benchmarks. However, the robustness and reliability of these models have been less explored so far. Neural NLP models can be easily fooled by adversarial samples, i.e. minor changes to input that preserve the meaning and understandability of the text but force the NLP system to make erroneous decisions. This raises serious concerns about the security and trustworthiness of biomedical NLP systems, especially when they are intended to be deployed in real-world use cases. We investigated the robustness of several transformer neural language models, i.e. BioBERT, SciBERT, BioMed-RoBERTa, and Bio-ClinicalBERT, on a wide range of biomedical and clinical text processing tasks. We implemented various adversarial attack methods to test the NLP systems in different attack scenarios. Experimental results showed that the biomedical NLP models are sensitive to adversarial samples; their performance dropped on average by 21 and 18.9 absolute percent on character-level and word-level adversarial noise, respectively. Conducting extensive adversarial training experiments, we fine-tuned the NLP models on a mixture of clean samples and adversarial inputs. Results showed that adversarial training is an effective defense mechanism against adversarial noise; the models' robustness improved on average by 11.3 absolute percent. In addition, the models' performance on clean data increased on average by 2.4 absolute percent, demonstrating that adversarial training can boost generalization abilities of biomedical NLP systems.
ENGINEERING
arxiv.org

Hierarchical Topometric Representation of 3D Robotic Maps

In this paper, we propose a method for generating a hierarchical, volumetric topological map from 3D point clouds. There are three basic hierarchical levels in our map: $storey - region - volume$. The advantages of our method are reflected in both input and output. In terms of input, we accept multi-storey point clouds and building structures with sloping roofs or ceilings. In terms of output, we can generate results with metric information of different dimensionality, that are suitable for different robotics applications. The algorithm generates the volumetric representation by generating $volumes$ from a 3D voxel occupancy map. We then add $passage$s (connections between $volumes$), combine small $volumes$ into a big $region$ and use a 2D segmentation method for better topological representation. We evaluate our method on several freely available datasets. The experiments highlight the advantages of our approach.
COMPUTERS
arxiv.org

The Hierarchical Subspace Iteration Method for Laplace--Beltrami Eigenproblems

Sparse eigenproblems are important for various applications in computer graphics. The spectrum and eigenfunctions of the Laplace--Beltrami operator, for example, are fundamental for methods in shape analysis and mesh processing. The Subspace Iteration Method is a robust solver for these problems. In practice, however, Lanczos schemes are often faster. In this paper, we introduce the Hierarchical Subspace Iteration Method (HSIM), a novel solver for sparse eigenproblems that operates on a hierarchy of nested vector spaces. The hierarchy is constructed such that on the coarsest space all eigenpairs can be computed with a dense eigensolver. HSIM uses these eigenpairs as initialization and iterates from coarse to fine over the hierarchy. On each level, subspace iterations, initialized with the solution from the previous level, are used to approximate the eigenpairs. This approach substantially reduces the number of iterations needed on the finest grid compared to the non-hierarchical Subspace Iteration Method. Our experiments show that HSIM can solve Laplace--Beltrami eigenproblems on meshes faster than state-of-the-art methods based on Lanczos iterations, preconditioned conjugate gradients and subspace iterations.
CODING & PROGRAMMING
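
For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sketch of plain (non-hierarchical) subspace iteration on a small dense symmetric matrix. It finds the dominant eigenpairs; for the low end of a spectrum, as needed for Laplace--Beltrami problems, one would typically iterate with a shifted inverse of the operator instead. The hierarchy of nested spaces that defines HSIM is not shown, and all names are illustrative.

```python
import random


def matvec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthonormalise(vectors):
    """Modified Gram-Schmidt: make the vectors mutually orthonormal."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


def subspace_iteration(a, k, iters=200, seed=0):
    """Approximate the k dominant eigenpairs of a symmetric matrix `a`."""
    rng = random.Random(seed)
    n = len(a)
    start = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    basis = orthonormalise(start)
    for _ in range(iters):
        # Apply the operator to every basis vector, then re-orthonormalise.
        basis = orthonormalise([matvec(a, v) for v in basis])
    # Rayleigh quotients give the eigenvalue estimates.
    values = [dot(v, matvec(a, v)) for v in basis]
    return values, basis


a = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
values, vectors = subspace_iteration(a, k=2)
```

The starting basis here is random. The abstract's point is that a good starting basis, taken from a coarser level of the hierarchy, reduces the number of iterations needed on the finest level.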
The Associated Press

ExOne Announces Schunk Has Purchased an X1 25Pro® for the Production of Binder Jet 3D Printed Metal Parts as a Service

NORTH HUNTINGDON, Pa.--(BUSINESS WIRE)--Nov 11, 2021-- The ExOne Company (Nasdaq: XONE), the global leader in industrial sand and metal 3D printers using binder jetting technology, today announced that the Schunk Group, an international technology company featuring products made of high-tech materials, including sintered metals, has purchased an X1 25Pro large metal binder jetting system.
BUSINESS
arxiv.org

CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation

In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline which splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches suffer from a dilemma of failing to fully utilize the characteristic of each modality or to maximize the inter-modality complementarity. To address the problem, we propose a novel end-to-end framework, called CamLiFlow. It consists of 2D and 3D branches with multiple bidirectional connections between them in specific layers. Different from previous work, we apply a point-based 3D branch to better extract the geometric features and design a symmetric learnable operator to fuse dense image features and sparse point features. We also propose a transformation for point clouds to solve the non-linear issue of 3D-2D projection. Experiments show that CamLiFlow achieves better performance with fewer parameters. Our method ranks 1st on the KITTI Scene Flow benchmark, outperforming the previous art with 1/7 of the parameters. Code will be made available.
COMPUTERS
arxiv.org

Identity-Preserving Pose-Robust Face Hallucination Through Face Subspace Prior

Over the past few decades, numerous attempts have been made to address the problem of recovering a high-resolution (HR) facial image from its corresponding low-resolution (LR) counterpart, a task commonly referred to as face hallucination. Despite the impressive performance achieved by position-patch and deep learning-based methods, most of these techniques are still unable to recover identity-specific features of faces. The former group of algorithms often produces blurry and oversmoothed outputs particularly in the presence of higher levels of degradation, whereas the latter generates faces which sometimes by no means resemble the individuals in the input images. In this paper, a novel face super-resolution approach will be introduced, in which the hallucinated face is forced to lie in a subspace spanned by the available training faces. Therefore, in contrast to the majority of existing face hallucination techniques and thanks to this face subspace prior, the reconstruction is performed in favor of recovering person-specific facial features, rather than merely increasing image quantitative scores. Furthermore, inspired by recent advances in the area of 3D face reconstruction, an efficient 3D dictionary alignment scheme is also presented, through which the algorithm becomes capable of dealing with low-resolution faces taken in uncontrolled conditions. In extensive experiments carried out on several well-known face datasets, the proposed algorithm shows remarkable performance by generating detailed and close to ground truth results which outperform the state-of-the-art face hallucination algorithms by significant margins both in quantitative and qualitative evaluations.
SCIENCE
arxiv.org

Improving Semantic Image Segmentation via Label Fusion in Semantically Textured Meshes

Florian Fervers, Timo Breuer, Gregor Stachowiak, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens. Models for semantic segmentation require a large amount of hand-labeled training data which is costly and time-consuming to produce. For this purpose, we present a label fusion framework that is capable of improving semantic pixel labels of video sequences in an unsupervised manner. We make use of a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures. Rendering the semantic mesh using the original intrinsic and extrinsic camera parameters yields a set of improved semantic segmentation images. Due to our optimized CUDA implementation, we are able to exploit the entire $c$-dimensional probability distribution of annotations over $c$ classes in an uncertainty-aware manner. We evaluate our method on the Scannet dataset where we improve annotations produced by the state-of-the-art segmentation network ESANet from $52.05 \%$ to $58.25 \%$ pixel accuracy. We publish the source code of our framework online to foster future research in this area (\url{this https URL}). To the best of our knowledge, this is the first publicly available label fusion framework for semantic image segmentation based on meshes with semantic textures.
SOFTWARE
arxiv.org

One-Pot 3D Printing of Robust Multimaterial Devices

Sijia Huang, Steven Adelmund, Pradip S. Pichumani, Johanna J. Schwartz, Yigit Menguc, Maxim Shusteff, Thomas J. Wallin. Polymer 3D printing is a broad set of manufacturing methods that permit the fabrication of complex architectures, and, as a result, numerous efforts focus on formulating processible chemistries that produce desirable material behavior in printed parts. However, current resin chemistries typically result in a single fixed set of properties once fully polymerized, a fact that poses significant engineering challenges to obtaining multimaterial devices. As an alternative to single-property materials, we introduce a ternary sequential reaction scheme that exhibits diverse multimaterial properties by profoundly altering the polymer microstructure from within a single resin composition. In this system, the photodosage during 3D printing sets both the shape and extent of conversion for each subsequent reaction. The different polymerization mechanisms of the subsequent stages yield disparate crosslink densities and viscoelastic properties. As a result, our materials possess Young's moduli spanning over three orders of magnitude (400 kPa < E < 1.6 GPa) with smooth transitions between soft and stiff regions. We successfully pattern a 500x change in modulus in under a millimeter while the sequential assembly of our polymer networks ensures robust interfaces and enhances toughness by 10x compared to the single-property materials. Most importantly, the final objects remain stable to UV and thermal aging, a key limitation to applications of previous multimaterial chemistries. We demonstrate the ability to 3D print intricate multimaterial architectures by fabricating a soft, wearable braille display.
TECHNOLOGY
arxiv.org

4D Segmentation Algorithm with application to 3D+time Image Segmentation

In this paper, we introduce and study a novel segmentation method for 4D images based on surface evolution governed by a nonlinear partial differential equation, the generalized subjective surface equation. The new method uses 4D digital image information and information from a thresholded 4D image in a local neighborhood. Thus, the 4D image segmentation is accomplished by defining the edge detector function's input as the weighted sum of the norm of gradients of the presmoothed 4D image and the norm of the presmoothed thresholded 4D image in a local neighborhood. Additionally, we design and study a numerical method based on the finite volume approach for solving the new model. The reduced diamond cell approach is used for approximating the gradient of the solution. We use a semi-implicit finite volume scheme for the numerical discretization and show that our numerical scheme is unconditionally stable. The new 4D method was tested on artificial data and applied to real data representing 3D+time microscopy images of cell nuclei within the zebrafish pectoral fin and hind-brain. In a real application, processing 3D+time microscopy images amounts to solving a linear system with several billion unknowns and requires over $1000$ GB of memory; thus, it may not be possible to process these images on a serial machine without a parallel implementation using MPI. Consequently, we develop and present OpenMP and MPI parallel implementations of the designed algorithms. Finally, we include cell tracking results to show how our new method serves as a basis for finding trajectories of cells during embryogenesis.
COMPUTERS
arxiv.org

Hierarchical Graph Networks for 3D Human Pose Estimation

Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton. However, we argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem. To overcome these weaknesses, we propose a novel graph convolution network architecture, Hierarchical Graph Networks (HGN). It is based on a denser graph topology generated by our multi-scale graph structure building strategy, thus providing more delicate geometric information. The proposed architecture contains three sparse-to-fine representation subnetworks organized in parallel, in which multi-scale graph-structured features are processed and exchange information through a novel feature fusion strategy, leading to rich hierarchical representations. We also introduce a 3D coarse mesh constraint to further boost detail-related feature learning. Extensive experiments demonstrate that our HGN achieves state-of-the-art performance with reduced network parameters.
SCIENCE
arxiv.org

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura. This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation. Accurate labeling is often realized by a hierarchically-structured large model consisting of utterance-level and dialogue-level networks that capture the contexts within an utterance and between utterances, respectively. However, due to its large model size, such a model cannot be deployed on resource-constrained devices. To overcome this difficulty, we focus on knowledge distillation which trains a small model by distilling the knowledge of a large and high performance teacher model. Our key idea is to distill the knowledge while keeping the complex contexts captured by the teacher model. To this end, the proposed method, hierarchical knowledge distillation, trains the small model by distilling not only the probability distribution of the label classification, but also the knowledge of utterance-level and dialogue-level contexts trained in the teacher model by training the model to mimic the teacher model's output in each level. Experiments on dialogue act estimation and call scene segmentation demonstrate the effectiveness of the proposed method.
SCIENCE
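
The multi-level distillation objective described in the abstract above can be sketched as a sum of three terms: a soft-label term plus one mimic term per level of the hierarchy. The use of KL divergence with a temperature and of mean squared error for the mimic terms are assumptions made for this sketch, as are all names and the requirement that student and teacher vectors have equal length. The abstract only states that the student is trained to mimic the teacher's output at each level.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def hierarchical_kd_loss(student, teacher, temperature=2.0, alpha=1.0, beta=1.0):
    """Label distillation plus utterance-level and dialogue-level mimic terms.

    `student` and `teacher` are dicts with keys "logits", "utterance" and
    "dialogue"; alpha and beta are hypothetical weights for the mimic terms.
    """
    label_term = kl_divergence(
        softmax(teacher["logits"], temperature),
        softmax(student["logits"], temperature),
    )
    utterance_term = mse(student["utterance"], teacher["utterance"])
    dialogue_term = mse(student["dialogue"], teacher["dialogue"])
    return label_term + alpha * utterance_term + beta * dialogue_term


teacher_out = {"logits": [2.0, 0.5, -1.0], "utterance": [0.2, 0.4], "dialogue": [1.0, -1.0]}
student_out = {"logits": [1.0, 1.0, 0.0], "utterance": [0.0, 0.5], "dialogue": [0.5, -0.5]}
loss = hierarchical_kd_loss(student_out, teacher_out)
```

The loss is zero when the student reproduces the teacher exactly at all three levels and grows as any level drifts away.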
arxiv.org

Learnable Structural Semantic Readout for Graph Classification

With the great success of deep learning in various domains, graph neural networks (GNNs) have also become a dominant approach to graph classification. With the help of a global readout operation that simply aggregates all node (or node-cluster) representations, existing GNN classifiers obtain a graph-level representation of an input graph and predict its class label using the representation. However, such global aggregation does not consider the structural information of each node, which results in information loss on the global structure. In particular, it limits the discrimination power by enforcing the same weight parameters of the classifier for all the node representations; in practice, each of them contributes to target classes differently depending on its structural semantic. In this work, we propose structural semantic readout (SSRead) to summarize the node representations at the position-level, which makes it possible to model position-specific weight parameters for classification as well as to effectively capture the graph semantic relevant to the global structure. Given an input graph, SSRead aims to identify structurally-meaningful positions by using the semantic alignment between its nodes and structural prototypes, which encode the prototypical features of each position. The structural prototypes are optimized to minimize the alignment cost for all training graphs, while the other GNN parameters are trained to predict the class labels. Our experimental results demonstrate that SSRead significantly improves the classification performance and interpretability of GNN classifiers while being compatible with a variety of aggregation functions, GNN architectures, and learning frameworks.
CODING & PROGRAMMING
arxiv.org

Nucleation in Sessile Saline Microdroplets: Induction Time Measurement via Deliquescence-Recrystallization Cycling

Ruel Cedeno, Romain Grossier, Mehdi Lagaize (CINaM), David Nerini (MIO), Nadine Candoni (AMU), A. E. Flood, Stéphane Veesler (CINaM). Induction time, a measure of how long one will wait for nucleation to occur, is an important parameter in quantifying nucleation kinetics and its underlying mechanisms. Due to the stochastic nature of nucleation, efficient methods for measuring a large number of independent induction times are needed to ensure statistical reproducibility. In this work, we present a novel approach for measuring and analyzing induction times in sessile arrays of microdroplets via deliquescence/recrystallization cycling. With the help of a recently developed image analysis protocol, we show that the interfering diffusion-mediated interactions between microdroplets can be eliminated by controlling the relative humidity, thereby ensuring independent nucleation events. Moreover, the possible influence of heterogeneities, impurities, and memory effect appears negligible, as suggested by our 2-cycle experiment. Further statistical analysis (k-sample Anderson-Darling test) reveals that upon identifying possible outliers, the dimensionless induction times obtained from different datasets (microdroplet lines) obey the same distribution and thus can be pooled together to form a much larger dataset. The pooled dataset showed an excellent fit with the Weibull function, giving a mean supersaturation at nucleation of 1.61 and 1.85 for the 60pL and 4pL microdroplets, respectively. This confirms the effect of confinement, where smaller systems require higher supersaturations to nucleate. Both the experimental method and the data-treatment procedure presented herein offer promising routes in the study of fundamental aspects of nucleation kinetics, particularly confinement effects, and are adaptable to other salts, pharmaceuticals, or biological crystals of interest.
SCIENCE
arxiv.org

Quantum process tomography of adiabatic and superadiabatic stimulated Raman passage

Quantum control methods for three-level systems have recently become an important direction of research in quantum information science and technology. Here we present numerical simulations using realistic experimental parameters for quantum process tomography in STIRAP (stimulated Raman adiabatic passage) and saSTIRAP (superadiabatic STIRAP). Specifically, we identify a suitable basis in the operator space as the identity operator together with the 8 Gell-Mann operators, and we calculate the corresponding process matrices, which have $9\times 9=81$ elements. We discuss these results for the ideal decoherence-free case, as well as for the experimentally-relevant case with decoherence included.
SCIENCE
arxiv.org

Trimming Stability Selection increases variable selection robustness

Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue and has been addressed in Robust Statistics; however, the relation between contamination and distorted variable selection has rarely been considered in the literature. As for variable selection, many methods for sparse model selection have been proposed, including Stability Selection, which is a meta-algorithm based on some variable selection algorithm in order to immunize against particular data configurations. We introduce the variable selection breakdown point that quantifies the number of cases or cells, respectively, that have to be contaminated so that no relevant variable is detected. We show that particular outlier configurations can completely mislead model selection and argue why even cell-wise robust methods cannot fix this problem. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point that quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection which only aggregates the models with the lowest in-sample losses so that, heuristically, models computed on heavily contaminated resamples should be trimmed away. We provide a short simulation study that reveals both the potential of our approach as well as the fragility of variable selection, even for an extremely small cell-wise contamination rate.
SCIENCE
arxiv.org

Modeling ultrafast demagnetization and spin transport: the interplay of spin-polarized electrons and thermal magnons

We theoretically investigate laser-induced spin transport in metallic magnetic heterostructures using an effective spin transport description that treats itinerant electrons and thermal magnons on an equal footing. Electron-magnon scattering is included and taken as the driving force for ultrafast demagnetization. We assume that in the low-fluence limit the magnon system remains in a quasi-equilibrium, allowing a transient nonzero magnon chemical potential. In combination with the diffusive transport equations for the itinerant electrons, the description is used to chart the full spin dynamics within the heterostructure. In agreement with recent experiments, we find that in case the spin-current-receiving material includes an efficient spin dissipation channel, the interfacial spin current becomes directly proportional to the temporal derivative of the magnetization. Based on an analytical calculation, we discuss that other relations between the spin current and magnetization may arise in case the spin-current-receiving material displays inefficient spin-flip scattering. Finally, we discuss the role of (interfacial) magnon transport and show that, a priori, it cannot be neglected. However, its significance strongly depends on the system parameters.
PHYSICS

