date 2021-11-16
MoRe-Fi: Motion-robust and Fine-grained Respiration Monitoring via Deep-Learning UWB Radar

By Tianyue Zheng, Zhe Chen, Shujie Zhang, Chao Cai, Jun Luo
 8 days ago

Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with human bodies. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, existing contact-free designs mostly require human subjects to remain static,...

A Robust Deep Learning-Based Beamforming Design for RIS-assisted Multiuser MISO Communications with Practical Constraints

Reconfigurable intelligent surface (RIS) has become a promising technology for improving wireless communication in recent years. It steers the incident signals to create a favorable propagation environment by controlling reconfigurable passive elements, with lower hardware cost and power consumption. In this paper, we consider an RIS-aided multiuser multiple-input single-output downlink communication system. We aim to maximize the weighted sum-rate of all users by jointly optimizing the active beamforming at the access point and the passive beamforming vector of the RIS elements. Unlike most existing works, we consider the more practical situation with discrete phase shifts and imperfect channel state information (CSI). Specifically, for the case of discrete phase shifts and perfect CSI, we first develop a deep quantization neural network (DQNN) that designs the active and passive beamforming simultaneously, whereas most reported works design them alternately. Then, we propose an improved structure (I-DQNN) based on DQNN to simplify the parameter decision process when each RIS element has more than 1 control bit. Finally, we extend the two proposed DQNN-based algorithms to the case where discrete phase shifts and imperfect CSI are considered simultaneously. Our simulation results show that the two DQNN-based algorithms perform better than traditional algorithms in the perfect CSI case, and are also more robust in the imperfect CSI case.
Learning Robust Scheduling with Search and Attention

Allocating physical layer resources to users based on channel quality, buffer size, requirements and constraints represents one of the central optimization problems in the management of radio resources. The solution space grows combinatorially with the cardinality of each dimension making it hard to find optimal solutions using an exhaustive search or even classical optimization algorithms given the stringent time requirements. This problem is even more pronounced in MU-MIMO scheduling where the scheduler can assign multiple users to the same time-frequency physical resources. Traditional approaches thus resort to designing heuristics that trade optimality in favor of feasibility of execution. In this work we treat the MU-MIMO scheduling problem as a tree-structured combinatorial problem and, borrowing from the recent successes of AlphaGo Zero, we investigate the feasibility of searching for the best performing solutions using a combination of Monte Carlo Tree Search and Reinforcement Learning. To cater to the nature of the problem at hand, like the lack of an intrinsic ordering of the users as well as the importance of dependencies between combinations of users, we make fundamental modifications to the neural network architecture by introducing the self-attention mechanism. We then demonstrate that the resulting approach is not only feasible but vastly outperforms state-of-the-art heuristic-based scheduling approaches in the presence of measurement uncertainties and finite buffers.
Distribution-based loss functions for deep learning models

Information is made of data. During the training step, an artificial neural network learns to map (predict) a set of inputs to a set of outputs from a labeled dataset. Computing the optimal weights is an optimization problem, usually solved by stochastic gradient descent: the weights are updated by backpropagating the prediction error. The gradient descent algorithm moves the weights down the gradient (or slope) of the error, so that the error of the next prediction is reduced. This is, in essence, how neural networks work.
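
The weight update described above can be illustrated with a minimal sketch. It is not taken from the article: it assumes a one-feature linear model, a squared-error loss, and hypothetical names (`train_linear_sgd`, `lr`, `epochs`) chosen only for illustration.

```python
import numpy as np

def train_linear_sgd(x, y, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error.

    Illustrative assumption: a single-feature linear model stands in
    for a neural network; the update rule is the same idea.
    """
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(len(x)):
            error = (w * x[i] + b) - y[i]  # prediction error for one sample
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x[i]
            b -= lr * error
    return w, b

# Noise-free toy data generated from y = 3x + 0.5.
x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 0.5
w, b = train_linear_sgd(x, y)
```

Each update moves the weights a small step against the gradient of the current sample's error, so on this toy data the fitted `w` and `b` approach the generating values.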
Towards Robust Knowledge Graph Embedding via Multi-task Reinforcement Learning

Nowadays, knowledge graphs (KGs) play a pivotal role in AI-related applications. Despite their large sizes, existing KGs are far from complete and comprehensive. In order to continuously enrich KGs, automatic knowledge construction and update mechanisms are usually utilized, which inevitably bring in plenty of noise. However, most existing knowledge graph embedding (KGE) methods assume that all the triple facts in KGs are correct, and project both entities and relations into a low-dimensional space without considering noise and knowledge conflicts. This leads to low-quality and unreliable representations of KGs. To this end, in this paper, we propose a general multi-task reinforcement learning framework, which can greatly alleviate the noisy data problem. In our framework, we exploit reinforcement learning for choosing high-quality knowledge triples while filtering out the noisy ones. Also, in order to take full advantage of the correlations among semantically similar relations, the triple selection processes of similar relations are trained collectively with multi-task learning. Moreover, we extend the popular KGE models TransE, DistMult, ConvE and RotatE with the proposed framework. Finally, the experimental validation shows that our approach is able to enhance existing KGE models and can provide more robust representations of KGs in noisy scenarios.
Radar Aided 6G Beam Prediction: Deep Learning Algorithms and Real-World Demonstration

This paper presents the first machine learning based real-world demonstration for radar-aided beam prediction in a practical vehicular communication scenario. Leveraging radar sensory data at the communication terminals provides important awareness about the transmitter/receiver locations and the surrounding environment. This awareness could be utilized to reduce or even eliminate the beam training overhead in millimeter wave (mmWave) and sub-terahertz (THz) MIMO communication systems, which enables a wide range of highly-mobile low-latency applications. In this paper, we develop deep learning based radar-aided beam prediction approaches for mmWave/sub-THz systems. The developed solutions leverage domain knowledge for radar signal processing to extract the relevant features fed to the learning models. This optimizes their performance, complexity, and inference time. The proposed radar-aided beam prediction solutions are evaluated using the large-scale real-world dataset DeepSense 6G, which comprises co-existing mmWave beam training and radar measurements. In addition to completely eliminating the radar/communication calibration overhead, the experimental results showed that the proposed algorithms are able to achieve around 90% top-5 beam prediction accuracy while saving 93% of the beam training overhead. This highlights a promising direction for addressing the beam management overhead challenges in mmWave/THz communication systems.
FedGreen: Federated Learning with Fine-Grained Gradient Compression for Green Mobile Edge Computing

Federated learning (FL) enables devices in mobile edge computing (MEC) to collaboratively train a shared model without uploading their local data. Gradient compression may be applied to FL to alleviate the communication overhead, but current FL with gradient compression still faces great challenges. To deploy green MEC, we propose FedGreen, which enhances the original FL with fine-grained gradient compression to efficiently control the total energy consumption of the devices. Specifically, we introduce the relevant operations, including device-side gradient reduction and server-side element-wise aggregation, to facilitate gradient compression in FL. Using a public dataset, we investigate the contributions of the compressed local gradients with respect to different compression ratios. After that, we formulate and tackle a learning accuracy-energy efficiency tradeoff problem where the optimal compression ratio and computing frequency are derived for each device. Experimental results demonstrate that, given an 80% test accuracy requirement, FedGreen reduces the total energy consumption of the devices by at least 32% compared with the baseline schemes.
Stacked U-Nets with Self-Assisted Priors Towards Robust Correction of Rigid Motion Artifact in Brain MRI

In this paper, we develop an efficient retrospective deep learning method called stacked U-Nets with self-assisted priors to address the problem of rigid motion artifacts in MRI. The proposed work exploits additional knowledge priors from the corrupted images themselves without the need for additional contrast data. The proposed network learns missed structural details by sharing auxiliary information from the contiguous slices of the same distorted subject. We further design refinement stacked U-Nets that help preserve the spatial details of the image and hence improve the pixel-to-pixel dependency. To perform network training, simulation of MRI motion artifacts is inevitable. We present an intensive analysis using various types of image priors: the proposed self-assisted priors and priors from other image contrasts of the same subject. The experimental analysis proves the effectiveness and feasibility of our self-assisted priors, since they do not require any further data scans.
Adversarially Robust Learning for Security-Constrained Optimal Power Flow

In recent years, the ML community has seen surges of interest in both adversarially robust learning and implicit layers, but connections between these two areas have seldom been explored. In this work, we combine innovations from these areas to tackle the problem of N-k security-constrained optimal power flow (SCOPF). N-k SCOPF is a core problem for the operation of electrical grids, and aims to schedule power generation in a manner that is robust to potentially k simultaneous equipment outages. Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem - viewing power generation settings as adjustable parameters and equipment outages as (adversarial) attacks - and solve this problem via gradient-based techniques. The loss function of this minimax problem involves resolving implicit equations representing grid physics and operational decisions, which we differentiate through via the implicit function theorem. We demonstrate the efficacy of our framework in solving N-3 SCOPF, which has traditionally been considered as prohibitively expensive to solve given that the problem size depends combinatorially on the number of potential outages.
Image-based monitoring of bolt loosening through deep-learning-based integrated detection and tracking

Structural bolts are critical components used in different structural elements, such as beam-column connections and friction damping devices. The clamping force in structural bolts is highly influenced by the bolt rotation. Much of the existing vision-based research about bolt rotation estimation relies on traditional computer vision algorithms such as Hough Transform to assess static images of bolts. This requires careful image preprocessing, and it may not perform well in the situation of complicated bolt assemblies, or in the presence of surrounding objects and background noise, thus hindering their real-world applications. In this study, an integrated real-time detect-track method, namely RTDT-Bolt, is proposed to monitor the bolt rotation angle. First, a real-time convolutional-neural-networks-based object detector, named YOLOv3-tiny, is established and trained to localize structural bolts. Then, the target-free object tracking algorithm based on optical flow is implemented, to continuously monitor and quantify the rotation of structural bolts. In order to enhance the tracking performance against background noise and potential illumination changes during tracking, the YOLOv3-tiny is integrated with the optical flow tracking algorithm to re-detect the bolts when the tracking gets lost. Extensive parameter studies were conducted to identify optimal tracking performance and examine the potential limitations. The results indicate the RTDT-Bolt method can greatly enhance the tracking performance of bolt rotation, which can achieve over 90% accuracy using the recommended range for the parameters.
Learning a Shared Model for Motorized Prosthetic Joints to Predict Ankle-Joint Motion

Control strategies for active prostheses or orthoses use sensor inputs to recognize the user's locomotive intention and generate corresponding control commands for producing the desired locomotion. In this paper, we propose a learning-based shared model for predicting ankle-joint motion for different locomotion modes like level-ground walking, stair ascent, stair descent, slope ascent, and slope descent without the need to classify between them. Features extracted from hip and knee joint angular motion are used to continuously predict the ankle angles and moments using a Feed-Forward Neural Network-based shared model. We show that the shared model is adequate for predicting the ankle angles and moments for different locomotion modes without explicitly classifying between the modes. The proposed strategy shows the potential for devising a high-level controller for an intelligent prosthetic ankle that can adapt to different locomotion modes.
Deep Learning based Urban Vehicle Trajectory Analytics

A 'trajectory' refers to a trace generated by a moving object in geographical spaces, usually represented by a series of chronologically ordered points, where each point consists of a geo-spatial coordinate set and a timestamp. Rapid advancements in location sensing and wireless communication technology have enabled us to collect and store a massive amount of trajectory data. As a result, many researchers use trajectory data to analyze the mobility of various moving objects. In this dissertation, we focus on the 'urban vehicle trajectory,' which refers to trajectories of vehicles in urban traffic networks, and on 'urban vehicle trajectory analytics.' Urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks, including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and consequently, many previous researchers used various methods to understand this structure. In particular, deep-learning models are attracting the attention of many researchers due to their powerful function approximation and feature representation abilities. The objective of this dissertation is therefore to develop deep-learning based models for urban vehicle trajectory analytics to better understand the mobility patterns of urban traffic networks. Specifically, this dissertation focuses on two research topics that are highly necessary, important and applicable: Next Location Prediction and Synthetic Trajectory Generation. In this study, we propose various novel models for urban vehicle trajectory analytics using deep learning.
Towards Privacy-Preserving Affect Recognition: A Two-Level Deep Learning Architecture

Automatically understanding and recognising human affective states using images and computer vision can improve human-computer and human-robot interaction. However, privacy has become an issue of great concern, as the identities of people used to train affective models can be exposed in the process. For instance, malicious individuals could exploit images from users and assume their identities. In addition, affect recognition using images can lead to discriminatory and algorithmic bias, as certain information such as race, gender, and age could be assumed based on facial features. Possible solutions to protect the privacy of users and avoid misuse of their identities are to: (1) extract anonymised facial features, namely action units (AU), from a database of images, discard the images and use the AUs for processing and training, and (2) use federated learning (FL), i.e., process raw images on users' local machines (local processing) and send the locally trained models to the main processing machine for aggregation (central processing). In this paper, we propose a two-level deep learning architecture for affect recognition that uses AUs in level 1 and FL in level 2 to protect users' identities. The architecture consists of recurrent neural networks to capture the temporal relationships amongst the features and predict valence and arousal affective states. In our experiments, we evaluate the performance of our privacy-preserving architecture using different variations of recurrent neural networks on RECOLA, a comprehensive multimodal affective database. Our results show state-of-the-art performance of 0.426 for valence and 0.401 for arousal using the Concordance Correlation Coefficient evaluation metric, demonstrating the feasibility of developing models for affect recognition that are both accurate and ensure privacy.
Phase function estimation from a diffuse optical image via deep learning

The phase function is a key element of a light propagation model for Monte Carlo (MC) simulation, which is usually fitted with an analytic function with associated parameters. In recent years, machine learning methods were reported to estimate the parameters of the phase function of a particular form such as the Henyey-Greenstein phase function but, to our knowledge, no studies have been performed to determine the form of the phase function. Here we design a convolutional neural network to estimate the phase function from a diffuse optical image without any explicit assumption on the form of the phase function. Specifically, we use a Gaussian mixture model as an example to represent the phase function generally and learn the model parameters accurately. The Gaussian mixture model is selected because it provides the analytic expression of phase function to facilitate deflection angle sampling in MC simulation, and does not significantly increase the number of free parameters. Our proposed method is validated on MC-simulated reflectance images of typical biological tissues using the Henyey-Greenstein phase function with different anisotropy factors. The effects of field of view (FOV) and spatial resolution on the errors are analyzed to optimize the estimation method. The mean squared error of the phase function is 0.01 and the relative error of the anisotropy factor is 3.28%.
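
For readers unfamiliar with the Henyey-Greenstein phase function and its anisotropy factor, the following sketch evaluates the standard closed form and checks its two defining properties numerically. It is not part of the paper: the function name `henyey_greenstein`, the helper `trapezoid`, the grid size, and the value g = 0.9 are illustrative assumptions.

```python
import numpy as np

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function, normalized over the full solid angle.

    cos_theta: cosine of the deflection angle; g: anisotropy factor in (-1, 1).
    """
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 + g**2 - 2.0 * g * cos_theta) ** 1.5)

def trapezoid(f, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

g = 0.9                                 # illustrative forward-scattering value
mu = np.linspace(-1.0, 1.0, 200001)     # mu = cos(theta)
p = henyey_greenstein(mu, g)

# Integrating over solid angle reduces to 2*pi times an integral over mu.
norm = 2.0 * np.pi * trapezoid(p, mu)           # should be close to 1
mean_cos = 2.0 * np.pi * trapezoid(p * mu, mu)  # should be close to g
```

The mean cosine of the deflection angle equals the anisotropy factor, which is why a single parameter g fully determines this particular form of the phase function.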
Semantic-aware Representation Learning Via Probability Contrastive Loss

Recent feature contrastive learning (FCL) has shown promising performance in unsupervised representation learning. For close-set representation learning, where labeled and unlabeled data belong to the same semantic space, however, FCL cannot show overwhelming gains because it does not involve the class semantics during optimization. Consequently, the produced features, although information-rich, are not guaranteed to be easily classified by the class weights learned from labeled data. To tackle this issue, we propose a novel probability contrastive learning (PCL) method in this paper, which not only produces rich features but also forces them to be distributed around the class prototypes. Specifically, we propose to use the output probabilities after softmax to perform contrastive learning instead of the extracted features used in FCL. Evidently, this exploits the class semantics during optimization. Moreover, we propose to remove the $\ell_{2}$ normalization in the traditional FCL and directly use the $\ell_{1}$-normalized probability for contrastive learning. Our proposed PCL is simple and effective. We conduct extensive experiments on three close-set image classification tasks, i.e., unsupervised domain adaptation, semi-supervised learning, and semi-supervised domain adaptation. The results on multiple datasets demonstrate that our PCL consistently achieves considerable gains and state-of-the-art performance on all three tasks.
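
The idea of contrasting softmax probabilities rather than normalized features can be sketched as follows. This is a generic InfoNCE-style loss applied to probabilities, not the authors' exact formulation; the function names, the `temperature` value, and the pairing of two views per sample are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def probability_contrastive_loss(logits_a, logits_b, temperature=0.1):
    """InfoNCE-style loss on softmax outputs (hypothetical sketch).

    Row i of logits_a and row i of logits_b are assumed to be two views of
    the same sample. Probabilities already sum to 1 (l1-normalized), so no
    l2 normalization is applied before taking dot products.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    sim = p_a @ p_b.T / temperature               # (N, N) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)    # stabilize the log-sum-exp
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))     # positives lie on the diagonal

# Confident, matching predictions give a small loss; mismatched ones a large loss.
confident = 10.0 * np.eye(3)
loss_matched = probability_contrastive_loss(confident, confident)
loss_mismatched = probability_contrastive_loss(confident, np.roll(confident, 1, axis=0))
```

Because the dot product of two probability vectors is largest when both are close to the same one-hot vector, minimizing a loss of this kind pushes paired samples toward the same class, which is the intuition behind using class semantics during optimization.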
Fine-Grained Vehicle Classification in Urban Traffic Scenes using Deep Learning

Increasingly dense traffic is becoming a challenge in our local settings, creating the need for a better traffic monitoring and management system. Fine-grained vehicle classification is a challenging task compared with coarse vehicle classification. A robust approach for vehicle detection and classification into fine-grained categories is therefore required. Existing Vehicle Make and Model Recognition (VMMR) systems have been developed for synchronized and controlled traffic conditions. The need for robust VMMR in complex, urban, heterogeneous, and unsynchronized traffic conditions remains an open research area. In this paper, vehicle detection and fine-grained classification are addressed using deep learning. To perform fine-grained classification with the related complexities, a local dataset, THS-10, with high intra-class and low inter-class variation was prepared exclusively for this purpose. The dataset consists of 4250 vehicle images of 10 vehicle models, i.e., Honda City, Honda Civic, Suzuki Alto, Suzuki Bolan, Suzuki Cultus, Suzuki Mehran, Suzuki Ravi, Suzuki Swift, Suzuki Wagon R and Toyota Corolla. This dataset is available online. Two approaches have been explored and analyzed for the classification of vehicles, i.e., fine-tuning and feature extraction from deep neural networks. A comparative study is performed, and it is demonstrated that simpler approaches can produce good results in the local environment when dealing with complex issues such as dense occlusion and lane departures, hence reducing computational load and time. For example, fine-tuning Inception-v3 produced the highest accuracy of 97.4% with the lowest misclassification rate of 2.08%. Fine-tuning MobileNet-v2 and ResNet-18 produced accuracies of 96.8% and 95.7%, respectively. Extracting features from the fc6 layer of AlexNet produced an accuracy of 93.5% with a misclassification rate of 6.5%.
DFC: Deep Feature Consistency for Robust Point Cloud Registration

How to extract significant point cloud features and estimate the pose between them remains a challenging question, due to the inherent lack of structure and ambiguous order permutation of point clouds. Despite significant improvements in applying deep learning-based methods for most 3D computer vision tasks, such as object classification, object segmentation and point cloud registration, the consistency between features is still not attractive in existing learning-based pipelines. In this paper, we present a novel learning-based alignment network for complex alignment scenes, titled deep feature consistency and consisting of three main modules: a multiscale graph feature merging network for converting the geometric correspondence set into high-dimensional features, a correspondence weighting module for constructing multiple candidate inlier subsets, and a Procrustes approach named deep feature matching for giving a closed-form solution to estimate the relative pose. As the most important step of the deep feature matching module, the feature consistency matrix for each inlier subset is constructed to obtain its principal vectors as the inlier likelihoods of the corresponding subset. We comprehensively validate the robustness and effectiveness of our approach on both the 3DMatch dataset and the KITTI odometry dataset. For large indoor scenes, registration results on the 3DMatch dataset demonstrate that our method outperforms both the state-of-the-art traditional and learning-based methods. For KITTI outdoor scenes, our approach remains quite capable of lowering the transformation errors. We also explore its strong generalization capability over cross-datasets.
Deep Distilling: automated code generation using explainable deep learning

Human reasoning can distill principles from observed patterns and generalize them to explain and solve novel problems. The most powerful artificial intelligence systems lack explainability and symbolic reasoning ability, and have therefore not achieved supremacy in domains requiring human understanding, such as science or common sense reasoning. Here we introduce deep distilling, a machine learning method that learns patterns from data using explainable deep learning and then condenses it into concise, executable computer code. The code, which can contain loops, nested logical statements, and useful intermediate variables, is equivalent to the neural network but is generally orders of magnitude more compact and human-comprehensible. On a diverse set of problems involving arithmetic, computer vision, and optimization, we show that deep distilling generates concise code that generalizes out-of-distribution to solve problems orders-of-magnitude larger and more complex than the training data. For problems with a known ground-truth rule set, deep distilling discovers the rule set exactly with scalable guarantees. For problems that are ambiguous or computationally intractable, the distilled rules are similar to existing human-derived algorithms and perform at par or better. Our approach demonstrates that unassisted machine intelligence can build generalizable and intuitive rules explaining patterns in large datasets that would otherwise overwhelm human reasoning.
Deep Neural Networks for Automatic Grain-matrix Segmentation in Plane and Cross-polarized Sandstone Photomicrographs

Grain segmentation of sandstone, that is, partitioning each grain from its surrounding matrix/cement in the thin section, is the primary step for computer-aided mineral identification and sandstone classification. The microscopic images of sandstone contain many mineral grains and their surrounding matrix/cement. The distinction between adjacent grains and the matrix is often ambiguous, making grain segmentation difficult. Various solutions exist in the literature to handle these problems; however, they are not robust against the varied patterns of sandstone petrography. In this paper, we formulate grain segmentation as a pixel-wise two-class (i.e., grain and background) semantic segmentation task. We develop a deep learning-based, end-to-end trainable framework named Deep Semantic Grain Segmentation network (DSGSN), a data-driven method, and provide a generic solution. To the best of the authors' knowledge, this is the first work in which a deep neural network is explored to solve the grain segmentation problem. Extensive experiments on microscopic images show that our method obtains better segmentation accuracy than various segmentation architectures with more parameters.
Versatile Inverse Reinforcement Learning via Cumulative Rewards

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting, where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Current metrics for video captioning are mostly based on text-level comparison between reference and candidate captions. However, they have some insuperable drawbacks, e.g., they cannot handle videos without references, and they may result in biased evaluation due to the one-to-many nature of video-to-text and the neglect of visual relevance. From the human evaluator's viewpoint, a high-quality caption should be consistent with the provided video, but need not be literally or semantically similar to the reference. Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures the similarity between the video and candidate captions. Benefiting from the recent development of large-scale pre-training models, we exploit a well pre-trained vision-language model to extract visual and linguistic embeddings for computing EMScore. Specifically, EMScore combines matching scores at both the coarse-grained (video and caption) and fine-grained (frames and words) levels, which takes both the overall understanding and the detailed characteristics of the video into account. Furthermore, considering the potential information gain, EMScore can be flexibly extended to settings where human-labeled references are available. Last but not least, we collect the VATEX-EVAL and ActivityNet-FOIL datasets to systematically evaluate the existing metrics. VATEX-EVAL experiments demonstrate that EMScore has higher human correlation and lower reference dependency. The ActivityNet-FOIL experiment verifies that EMScore can effectively identify "hallucinating" captions. The datasets will be released to facilitate the development of video captioning metrics. The code is available at: this https URL.
