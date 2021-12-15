ContributorsPublishersAdvertisers
Science

From Noise to Feature: Exploiting Intensity Distribution as a Novel Soft Biometric Trait for Finger Vein Recognition

By Wenxiong Kang, Yuting Lu, Dejian Li, Wei Jia
arxiv.org
 4 days ago

Most finger vein feature extraction algorithms achieve satisfactory performance due to their texture representation abilities, despite simultaneously ignoring the intensity distribution that is formed by the finger tissue, and in some cases, processing it as background noise. In this paper, we...

Related
arxiv.org

Learning to Prompt for Continual Learning

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister. The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters, which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments under popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results against rehearsal-based methods even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at this https URL.
arxiv.org

Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms

Kai Wang, Xianghao Xu, Leon Lei, Selena Ling, Natalie Lindsay, Angel X. Chang, Manolis Savva, Daniel Ritchie. Realistic 3D indoor scene datasets have enabled significant recent progress in computer vision, scene understanding, autonomous navigation, and 3D reconstruction. But the scale, diversity, and customizability of existing datasets are limited, and it is time-consuming and expensive to scan and annotate more. Fortunately, combinatorics is on our side: there are enough individual rooms in existing 3D scene datasets, if only there were a way to recombine them into new layouts. In this paper, we propose the task of generating novel 3D floor plans from existing 3D rooms. We identify three sub-tasks of this problem: generation of 2D layout, retrieval of compatible 3D rooms, and deformation of 3D rooms to fit the layout. We then discuss different strategies for solving the problem, and design two representative pipelines: one uses available 2D floor plans to guide selection and deformation of 3D rooms; the other learns to retrieve a set of compatible 3D rooms and combine them into novel layouts. We design a set of metrics that evaluate the generated results with respect to each of the three sub-tasks and show that different methods trade off performance on these sub-tasks. Finally, we survey downstream tasks that benefit from generated 3D scenes and discuss strategies for selecting the methods most appropriate for the demands of these tasks.
DANIEL RITCHIE
arxiv.org

Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching

Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in the modern deep learning era, leading to a distribution-independent self-supervised image denoising formula without clean reference images. Specifically, by combining the recent Noise2Score self-supervised image denoising approach with the saddle point approximation of the Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed-form image denoising formula via the distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provides state-of-the-art self-supervised image denoising performance on both benchmark and real-world datasets.
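For orientation, the classical Gaussian special case of Tweedie's formula is written out below. This is the textbook result and not the general saddle-point formula the abstract refers to. The notation is an assumption made here: $y = x + n$ is the noisy image, $n \sim \mathcal{N}(0, \sigma^2 I)$, and $p(y)$ is the marginal density of noisy images.

```latex
\hat{x}(y) \;=\; \mathbb{E}[x \mid y] \;=\; y + \sigma^{2}\,\nabla_{y} \log p(y)
```

In a two-step scheme of the kind described, the score $\nabla_{y} \log p(y)$ would be the quantity approximated by score matching, and the formula then turns that estimate into a denoised image in closed form.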
COMPUTERS
arxiv.org

Regularisation by fractional noise for one-dimensional differential equations with nonnegative distributional drift

We study existence and uniqueness of solutions to the equation $dX_t=b(X_t)dt + dB_t$, where $b$ is a distribution in some Besov space and $B$ is a fractional Brownian motion with Hurst parameter $H\leqslant 1/2$. First, the equation is understood as a nonlinear Young integral equation. The integral is constructed in a $p$-variation space, which is well suited when $b$ is a nonnegative (or nonpositive) distribution. Based on the Besov regularity of $b$, a condition on $H$ is given so that solutions to the equation exist. The construction is deterministic, and $B$ can be replaced by a deterministic path $w$ which has a sufficiently smooth local time.
MATHEMATICS
IN THIS ARTICLE
#Finger Vein Recognition#Trait#Vein#The Finger#Feature Extraction#Ieee
arxiv.org

Towards Full-to-Empty Room Generation with Structure-Aware Feature Encoding and Soft Semantic Region-Adaptive Normalization

Vasileios Gkitsas, Nikolaos Zioulis, Vladimiros Sterzentsenko, Alexandros Doumanoglou, Dimitrios Zarpalas. The task of transforming a furnished room image into a background-only image is extremely challenging, since it requires making large changes to the scene context while still preserving the overall layout and style. In order to acquire a photo-realistic and structurally consistent background, existing deep learning methods either employ image inpainting approaches or incorporate the learning of the scene layout as an individual task and leverage it later in a not fully differentiable semantic region-adaptive normalization module. To tackle these drawbacks, we treat scene layout generation as a feature linear transformation problem and propose a simple yet effective adjusted fully differentiable soft semantic region-adaptive normalization module (softSEAN) block. We showcase the applicability in diminished reality and depth estimation tasks, where our approach, besides the advantages of mitigating training complexity and non-differentiability issues, surpasses the compared methods both quantitatively and qualitatively. Our softSEAN block can be used as a drop-in module for existing discriminative and generative models. Implementation is available on this http URL.
TECHNOLOGY
gadgetsandwearables.com

The best smart rings: health tracking from your finger

As technology shrinks in size and increases in power, making smart rings is becoming less and less of a challenge. They’ve not hit the mainstream just yet but it is only a matter of time. In this article we outline our pick of the best options that are out there – devices to get you off to a good wearable start in 2022.
CELL PHONES
arxiv.org

Decoupling Object Detection from Human-Object Interaction Recognition

We propose DEFR, a DEtection-FRee method to recognize Human-Object Interactions (HOI) at image level without using object location or human pose. This is challenging as the detector is an integral part of existing methods. In this paper, we present two findings to boost the performance of the detection-free approach, which significantly outperforms the detection-assisted state of the art. Firstly, we find it crucial to effectively leverage the semantic correlations among HOI classes. A remarkable gain can be achieved by using language embeddings of HOI labels to initialize the linear classifier, which encodes the structure of HOIs to guide training. Further, we propose Log-Sum-Exp Sign (LSE-Sign) loss to facilitate multi-label learning on a long-tailed dataset by balancing gradients over all classes in a softmax format. Our detection-free approach achieves 65.6 mAP in HOI classification on HICO, outperforming the detection-assisted state of the art (SOTA) by 18.5 mAP, and 52.7 mAP in one-shot classes, surpassing the SOTA by 27.3 mAP. Unlike previous work, our classification model (DEFR) can be directly used in HOI detection without any additional training, by connecting to an off-the-shelf object detector whose bounding box output is converted to binary masks for DEFR. Surprisingly, such a simple connection of two decoupled models achieves SOTA performance (32.35 mAP).
COMPUTERS
arxiv.org

Robustifying automatic speech recognition by extracting slowly varying features

In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks consisting in removing fast-changing features from the audio signals, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in such a way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial attacks: Our final, proposed model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
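The idea of removing fast-changing components with a low-pass filter can be illustrated with a minimal sketch. The abstract does not specify the filter design, so the first-order filter, the function name `low_pass`, and the smoothing factor `alpha` below are all assumptions chosen for illustration only.

```python
def low_pass(signal, alpha=0.2):
    """First-order low-pass filter: y[t] = y[t-1] + alpha * (x[t] - y[t-1])."""
    filtered = []
    state = signal[0]  # start at the first sample to avoid a start-up transient
    for x in signal:
        state += alpha * (x - state)  # move only a fraction of the way to x
        filtered.append(state)
    return filtered


slow = [1.0] * 8          # slowly varying (constant) input
fast = [1.0, -1.0] * 4    # rapidly alternating input

slow_out = low_pass(slow)
fast_out = low_pass(fast)
```

A constant input passes through unchanged, while the alternating input is strongly attenuated after the first sample. A smaller `alpha` suppresses fast changes more aggressively at the cost of a slower response.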
COMPUTERS
towardsdatascience.com

How to get reports from audio files using speech recognition and NLP

Transform speech into knowledge with Hugging Face/Facebook AI and expert.ai. Over the years I’ve saved tons of audio/video files, telling myself I would soon listen to them. This folder has now become an enormous messy heap of audios, and I often don’t even remember what each particular file is about. That’s why I wanted to create a program to analyze audio files and produce a report on their content. I needed something that with a simple click would show me topics, main words, main sentences, etc. To achieve this, I used Facebook AI/Hugging Face Wav2Vec 2.0 model in combination with expert.ai’s NL API. I uploaded the code here, hoping that it would be helpful to others as well.
SOFTWARE
arxiv.org

IID Sampling from Doubly Intractable Distributions

Intractable posterior distributions of parameters with intractable normalizing constants depending upon the parameters are known as doubly intractable posterior distributions. The terminology itself indicates that obtaining Bayesian inference from such posteriors is doubly difficult compared to traditional intractable posteriors where the normalizing constants are tractable and admit traditional Markov Chain Monte Carlo (MCMC) solutions.
SCIENCE
biometricupdate.com

Developers attempt to keep edge from compounding biometrics bias, accuracy issues

From advances in chips and sensors to algorithms, biometric authentication systems have undergone a period of rapid development. This is very evident in the video surveillance market, where speed and accuracy have made gains on the back of developments in edge AI. Still, the problem of bias in AI, which can lead to inaccurate and even life-threatening decisions, has not been completely solved.
SOFTWARE
arxiv.org

A learning-based approach to feature recognition of Engineering shapes

Lakshmi Priya Muraleedharan, Ramanathan Muthuganapathy. In this paper, we propose a machine learning approach to recognise engineering shape features such as holes, slots, etc. in a CAD mesh model. With the advent of digital archiving, newer manufacturing techniques such as 3D printing, scanning of components and reverse engineering, CAD data has proliferated in the form of mesh model representation. As the number of nodes and edges in a mesh model grows, along with the possibility of noise, direct application of graph-based approaches would be not only expensive but also difficult to tune for noisy data. Hence, this calls for newer approaches to be devised for feature recognition in CAD models represented in the form of a mesh. Here, we show that a discrete version of the Gauss map can be used as a signature for feature learning. We show that this approach not only requires less memory but also trains considerably faster. As no network architecture is involved, the number of hyperparameters is much smaller, and they can be tuned much faster. The recognition accuracy is also very similar to that obtained using 3D convolutional neural networks (CNNs), but with much lower running time and storage requirements. A comparison with other non-network-based machine learning approaches shows that our approach has the highest accuracy. We also show the recognition results for CAD models having multiple features as well as complex/interacting features obtained from public benchmarks. The ability to handle noisy data has also been demonstrated.
COMPUTERS
arxiv.org

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching

Graph neural networks (GNNs) and message passing neural networks (MPNNs) have been proven to be expressive for subgraph structures in many applications. Some applications in heterogeneous graphs require explicit edge modeling, such as subgraph isomorphism counting and matching. However, existing message passing mechanisms are not designed well in theory. In this paper, we start from a particular edge-to-vertex transform and exploit the isomorphism property in the edge-to-vertex dual graphs. We prove that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation, we propose dual message passing neural networks (DMPNNs) to enhance the substructure representation learning in an asynchronous way for subgraph isomorphism counting and matching as well as unsupervised node classification. Extensive experiments demonstrate the robust performance of DMPNNs by combining both node and edge representation learning in synthetic and real heterogeneous graphs. Code is available at this https URL.
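An edge-to-vertex dual can be sketched with the classical line graph of an undirected graph, where every original edge becomes a node and two such nodes are linked when the edges share an endpoint. This is a simplification: the abstract mentions "a particular edge-to-vertex transform" on heterogeneous graphs, which may differ, and the function name `line_graph` is hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, links) of the line graph of an undirected simple graph."""
    # Each original edge becomes one node of the dual graph.
    dual_nodes = [tuple(sorted(e)) for e in edges]
    # Two dual nodes are linked when the original edges share an endpoint.
    dual_links = {
        (a, b) for a, b in combinations(dual_nodes, 2) if set(a) & set(b)
    }
    return dual_nodes, dual_links


path = [(1, 2), (2, 3), (3, 4)]          # a path with three edges
nodes, links = line_graph(path)

triangle = [(1, 2), (2, 3), (1, 3)]
tri_nodes, tri_links = line_graph(triangle)
```

The three-edge path maps to a two-edge path, and the triangle maps to another triangle, which shows how edge adjacency in the original graph turns into ordinary node adjacency in the dual.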
arxiv.org

Adaptation and Attention for Neural Video Coding

Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu. Neural image coding now represents the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of our proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to the top performing end-to-end learned approach in the 2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and compares favorably to VVC and RLVC in some settings.
CODING & PROGRAMMING
arxiv.org

IS-COUNT: Large-scale Object Counting from Satellite Images with Covariate-based Importance Sampling

Object detection in high-resolution satellite imagery is emerging as a scalable alternative to on-the-ground survey data collection in many environmental and socioeconomic monitoring applications. However, performing object detection over large geographies can still be prohibitively expensive due to the high cost of purchasing imagery and compute. Inspired by traditional survey data collection strategies, we propose an approach to estimate object count statistics over large geographies through sampling. Given a cost budget, our method selects a small number of representative areas by sampling from a learnable proposal distribution. Using importance sampling, we are able to accurately estimate object counts after processing only a small fraction of the images compared to an exhaustive approach. We show empirically that the proposed framework achieves strong performance on estimating the number of buildings in the United States and Africa, cars in Kenya, brick kilns in Bangladesh, and swimming pools in the U.S., while requiring as few as 0.01% of satellite images compared to an exhaustive approach.
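The sampling idea can be sketched with a plain importance-sampling estimator over a handful of regions. The function name `estimate_total`, the toy counts, and the fixed proposals are assumptions for illustration; in particular, the proposal here is hand-specified rather than learned from covariates.

```python
import random


def estimate_total(counts, proposal, n_samples, rng):
    """Importance-sampling estimate of sum(counts) from n_samples sampled regions."""
    regions = range(len(counts))
    # Draw regions with replacement according to the proposal distribution.
    sampled = rng.choices(regions, weights=proposal, k=n_samples)
    # Reweight each observed count by the probability of sampling its region.
    return sum(counts[i] / proposal[i] for i in sampled) / n_samples


counts = [0, 5, 40, 15]                    # toy per-region object counts
total = sum(counts)
ideal = [c / total for c in counts]        # proposal proportional to the counts
uniform = [0.25, 0.25, 0.25, 0.25]         # uninformed proposal

rng = random.Random(0)
exact = estimate_total(counts, ideal, 3, rng)
rough = estimate_total(counts, uniform, 3, rng)
```

When the proposal is exactly proportional to the true counts, every sample yields the true total, so the estimator has zero variance. This is why a proposal that tracks the counts well lets a small sample stand in for an exhaustive pass. With the uniform proposal the estimate is still unbiased but varies from run to run.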
SCIENCE
arxiv.org

The exploitation of Multiple Feature Extraction Techniques for Speaker Identification in Emotional States under Disguised Voices

Due to improvements in artificial intelligence, speaker identification (SI) technologies have advanced considerably and are now widely used in a variety of sectors. One of the most important components of SI is feature extraction, which has a substantial impact on the SI process and performance. As a result, numerous feature extraction strategies are thoroughly investigated, contrasted, and analyzed. This article exploits five distinct feature extraction methods for speaker identification in disguised voices under emotional environments. To evaluate this work thoroughly, three voice-disguise effects are used: high-pitched, low-pitched, and Electronic Voice Conversion (EVC). Experimental results show that the concatenation of Mel-Frequency Cepstral Coefficients (MFCCs), MFCCs-delta, and MFCCs-delta-delta is the best feature extraction method.
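Concatenating MFCCs with their delta and delta-delta features can be sketched as follows. This assumes the MFCC frames are already computed and uses a simple central difference with edge padding; speech toolkits commonly use a regression window instead, so this is an illustrative simplification. The function names and toy values are hypothetical.

```python
def deltas(frames):
    """Central differences along time, repeating the first and last frame at the edges."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [
        [(nxt - prv) / 2 for prv, nxt in zip(padded[t], padded[t + 2])]
        for t in range(len(frames))
    ]


def stack_features(mfcc):
    """Concatenate static, delta and delta-delta coefficients frame by frame."""
    d1 = deltas(mfcc)   # first-order dynamics
    d2 = deltas(d1)     # second-order dynamics
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]


# Three frames with two coefficients each (toy values, not real MFCCs).
mfcc = [[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
features = stack_features(mfcc)
```

Each output frame is three times as wide as the input frame, so a 13-coefficient MFCC vector would become a 39-dimensional feature vector under this scheme.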
ARTIFICIAL INTELLIGENCE
arxiv.org

Multiple Instance Learning for Brain Tumor Detection from Magnetic Resonance Spectroscopy Data

We apply deep learning (DL) on Magnetic resonance spectroscopy (MRS) data for the task of brain tumor detection. Medical applications often suffer from data scarcity and corruption by noise. Both of these problems are prominent in our data set. Furthermore, a varying number of spectra are available for the different patients. We address these issues by considering the task as a multiple instance learning (MIL) problem. Specifically, we aggregate multiple spectra from the same patient into a "bag" for classification and apply data augmentation techniques. To achieve the permutation invariance during the process of bagging, we proposed two approaches: (1) to apply min-, max-, and average-pooling on the features of all samples in one bag and (2) to apply an attention mechanism. We tested these two approaches on multiple neural network architectures. We demonstrate that classification performance is significantly improved when training on multiple instances rather than single spectra. We propose a simple oversampling data augmentation method and show that it could further improve the performance. Finally, we demonstrate that our proposed model outperforms manual classification by neuroradiologists according to most performance metrics.
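The first bagging approach, min-, max-, and average-pooling over the features of all samples in a bag, can be sketched in a few lines. The function name `pool_bag` and the toy feature values are assumptions; the per-spectrum features would come from a network that is not shown here.

```python
def pool_bag(bag):
    """Concatenate per-feature min, max and mean over all instances in a bag."""
    columns = list(zip(*bag))  # one tuple of values per feature dimension
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    means = [sum(c) / len(c) for c in columns]
    return mins + maxs + means


# One patient's bag: three spectra, two features each (toy values).
bag = [[0.25, 2.0], [0.5, 1.0], [0.75, 3.0]]
shuffled = [bag[2], bag[0], bag[1]]

pooled = pool_bag(bag)
pooled_shuffled = pool_bag(shuffled)
```

Because min, max, and mean ignore the order of their inputs, the pooled vector is the same for any ordering of the spectra, and its length does not depend on how many spectra a patient has.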
HEALTH
arxiv.org

A Globally Convergent Distributed Jacobi Scheme for Block-Structured Nonconvex Constrained Optimization Problems

Motivated by the increasing availability of high-performance parallel computing, we design a distributed parallel algorithm for linearly-coupled block-structured nonconvex constrained optimization problems. Our algorithm performs Jacobi-type proximal updates of the augmented Lagrangian function, requiring only local solutions of separable block nonlinear programming (NLP) problems. We provide a cheap and explicitly computable Lyapunov function that allows us to establish global and local sublinear convergence of our algorithm, its iteration complexity, as well as simple, practical and theoretically convergent rules for automatically tuning its parameters. This is in contrast to existing algorithms for nonconvex constrained optimization based on the alternating direction method of multipliers, which rely on at least one of the following: Gauss-Seidel or sequential updates, global solutions of NLP problems, non-computable Lyapunov functions, and hand-tuning of parameters. Numerical experiments showcase its advantages for large-scale problems, including the multi-period optimization of a 9000-bus AC optimal power flow test case over 168 time periods, solved on the Summit supercomputer using an open-source Julia code.
CODING & PROGRAMMING
arxiv.org

End-to-End Multi-Task Deep Learning and Model Based Control Algorithm for Autonomous Driving

End-to-end driving with a deep learning neural network (DNN) has become a rapidly growing paradigm of autonomous driving in industry and academia. Yet safety measures and interpretability still pose challenges to this paradigm. We propose an end-to-end driving algorithm that integrates multi-task DNN, path prediction, and control models in a pipeline of data flow from sensory devices through these models to driving decisions. It provides quantitative measures to evaluate the holistic, dynamic, and real-time performance of end-to-end driving systems, and thus allows their safety and interpretability to be quantified. The DNN is a modified UNet, a well-known encoder-decoder neural network for semantic segmentation. It consists of one segmentation, one regression, and two classification tasks for lane segmentation, path prediction, and vehicle controls. We present three variants of the modified UNet architecture having different complexities, compare them on different tasks in four static measures for both single and multi-task (MT) architectures, and then identify the best one by two additional dynamic measures in real-time simulation. We also propose a learning- and model-based longitudinal controller using the model predictive control method. With the Stanley lateral controller, our results show that MTUNet outperforms an earlier modified UNet in terms of curvature and lateral offset estimation on curvy roads at normal speed, which has been tested in a real car driving on real roads.
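The Stanley lateral controller mentioned above is commonly written as the heading error plus an arctangent term in the cross-track error. The sketch below follows that common textbook form and is not taken from the paper. The function name, the gain, the steering limit, and the sign conventions are assumptions.

```python
import math


def stanley_steering(heading_error, cross_track_error, speed,
                     gain=1.0, max_steer=math.radians(30)):
    """Steering angle (rad) from heading error (rad), lateral offset (m), speed (m/s)."""
    # atan2 matches atan(gain * e / v) for v > 0 and stays finite at v == 0.
    cross_track_term = math.atan2(gain * cross_track_error, speed)
    steer = heading_error + cross_track_term
    # Clip to the assumed mechanical steering limit.
    return max(-max_steer, min(max_steer, steer))


on_path = stanley_steering(0.0, 0.0, 5.0)
offset = stanley_steering(0.0, 1.0, 1.0, max_steer=1.0)
clipped = stanley_steering(0.0, 1.0, 1.0)
```

The cross-track term shrinks as speed grows, so the same lateral offset produces a gentler correction at higher speed. This is why the curvature and lateral offset estimates feeding the controller matter most on curvy roads.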
