Software

Online Meta Adaptation for Variable-Rate Learned Image Compression

By Wei Jiang, Wei Wang, Songnan Li, Shan Liu
arxiv.org
 8 days ago

This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks: variable-rate learning where separate networks are required to generate compressed images with varying qualities, and the train-test mismatch between differentiable approximate quantization and true hard quantization. We introduce...
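The train-test mismatch named in the abstract can be illustrated with a minimal sketch. It assumes additive uniform noise as the differentiable training-time proxy, which is one common choice in learned image compression; the abstract does not say which approximation the authors use, and all function names here are hypothetical.

```python
import random

def soft_quantize(latents, rng):
    """Training-time proxy (an assumption): add uniform noise of width 1."""
    return [y + rng.uniform(-0.5, 0.5) for y in latents]

def hard_quantize(latents):
    """Test-time quantization: round to the nearest integer (ties to even)."""
    return [float(round(y)) for y in latents]

def mean_abs_gap(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
latents = [rng.gauss(0.0, 3.0) for _ in range(1000)]
train_view = soft_quantize(latents, rng)   # what the decoder sees in training
test_view = hard_quantize(latents)         # what the decoder sees at test time
gap = mean_abs_gap(train_view, test_view)  # nonzero: the two views disagree
```

The decoder is optimized on `train_view` but deployed on `test_view`, so a nonzero `gap` is the mismatch in miniature.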


Related
arxiv.org

Fine-Grained Image Analysis with Deep Learning: A Survey

Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas -- fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.
SOFTWARE
arxiv.org

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network takes an enrolled speaker embedding extracted using ECAPA-TDNN to adjust the input noisy feature through masking. To evaluate OSSEM, we designed a modified Voice Bank-DEMAND dataset, in which one utterance from the testing set was used for model adaptation, and the remaining utterances were used for testing the performance. Moreover, we set restrictions allowing the enhancement process to be conducted in real time, and thus designed OSSEM to be a causal SE system. Experimental results first show that OSSEM can effectively adapt a pretrained SE model to a particular speaker with only one utterance, thus yielding improved SE results. Meanwhile, OSSEM exhibits a competitive performance compared to state-of-the-art causal SE systems.
COMPUTERS
arxiv.org

Transformer-based Image Compression

A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders comprise a sequence of neural transformation units (NTUs) that analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to generate a pixel-domain image reconstruction from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to embed both long-range and short-range information; meanwhile, a causal attention module (CAM) is devised for adaptive context modeling of latent features to utilize both hyper and autoregressive priors. TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN) based learnt image coding (LIC) methods and the handcrafted, rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard, while requiring far fewer model parameters, e.g., up to a 45% reduction relative to the leading-performance LIC.
COMPUTERS
arxiv.org

Learning Online for Unified Segmentation and Tracking Models

Tracking requires building a discriminative model for the target in the inference stage. An effective way to achieve this is online learning, which can comfortably outperform models that are only trained offline. Recent research shows that visual tracking benefits significantly from the unification of visual tracking and segmentation due to its pixel-level discrimination. However, performing online learning for such a unified model is a great challenge: a segmentation model cannot easily learn from the prior information given in the visual tracking scenario. In this paper, we propose TrackMLP, a novel meta-learning method optimized to learn from only partial information to resolve this challenge. Our model is capable of extensively exploiting limited prior information and hence possesses much stronger target-background discriminability than other online learning methods. Empirically, we show that our model achieves state-of-the-art performance and tangible improvement over competing models. Our model achieves improved average overlaps of 66.0%, 67.1%, and 68.5% on the VOT2019, VOT2018, and VOT2016 datasets, which are 6.4%, 7.3%, and 6.4% higher than our baseline. Code will be made publicly available.
COMPUTERS
arxiv.org

Meta-learning and data augmentation for mass-generalised jet taggers

Deep neural networks trained for jet tagging are typically specific to a narrow range of transverse momenta or jet masses. Given the large phase space that the LHC is able to probe, the potential benefit of classifiers that are effective over a wide range of masses or transverse momenta is significant. In this work we benchmark the performance of a number of methods for achieving accurate classification at masses distant from those used in training, with a focus on algorithms that leverage meta-learning. We study the discrimination of jets from boosted $Z'$ bosons against a QCD background. We find that a simple data augmentation strategy that standardises the angular scale of jets with different masses is sufficient to produce strong generalisation. The meta-learning algorithms provide only a small improvement in generalisation when combined with this augmentation. We also comment on the relationship between mass generalisation and mass decorrelation, demonstrating that those models which generalise better than the baseline also sculpt the background to a smaller degree.
SCIENCE
arxiv.org

FedGreen: Federated Learning with Fine-Grained Gradient Compression for Green Mobile Edge Computing

Federated learning (FL) enables devices in mobile edge computing (MEC) to collaboratively train a shared model without uploading the local data. Gradient compression may be applied to FL to alleviate the communication overheads, but current FL with gradient compression still faces great challenges. To deploy green MEC, we propose FedGreen, which enhances the original FL with fine-grained gradient compression to efficiently control the total energy consumption of the devices. Specifically, we introduce the relevant operations, including device-side gradient reduction and server-side element-wise aggregation, to facilitate the gradient compression in FL. Using a public dataset, we investigate the contributions of the compressed local gradients with respect to different compression ratios. After that, we formulate and tackle a learning accuracy-energy efficiency tradeoff problem where the optimal compression ratio and computing frequency are derived for each device. Experimental results demonstrate that, given an 80% test accuracy requirement, FedGreen reduces the total energy consumption of the devices by at least 32% compared with the baseline schemes.
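The two operations named in the abstract, device-side gradient reduction and server-side element-wise aggregation, might look roughly like the sketch below. It swaps in top-k magnitude sparsification as the reduction step, which is an assumption: the abstract does not specify FedGreen's actual rule, and the function names and the `ratio` parameter are hypothetical.

```python
def top_k_sparsify(grad, ratio):
    """Keep the largest-magnitude fraction `ratio` of entries as {index: value}."""
    k = max(1, int(len(grad) * ratio))
    keep = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in keep}

def aggregate_elementwise(sparse_grads, dim):
    """Average each coordinate over only the devices that reported it."""
    total = [0.0] * dim
    count = [0] * dim
    for sparse in sparse_grads:
        for i, value in sparse.items():
            total[i] += value
            count[i] += 1
    # Coordinates that no device reported are left at zero.
    return [t / c if c else 0.0 for t, c in zip(total, count)]

device_grads = [
    [0.9, -0.1, 0.05, 0.4],
    [0.7, 0.2, -0.6, 0.0],
]
# Each device uploads only half of its gradient entries (assumed ratio).
uploads = [top_k_sparsify(g, ratio=0.5) for g in device_grads]
global_grad = aggregate_elementwise(uploads, dim=4)
```

A smaller `ratio` cuts the upload size per device at the cost of a coarser aggregated gradient, which is the kind of accuracy versus energy tradeoff the abstract describes.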
SOFTWARE
arxiv.org

Learning Graphs from Smooth and Graph-Stationary Signals with Hidden Variables

Network-topology inference from (vertex) signal observations is a prominent problem across data-science and engineering disciplines. Most existing schemes assume that observations from all nodes are available, but in many practical environments, only a subset of nodes is accessible. A natural (and sometimes effective) approach is to disregard the role of unobserved nodes, but this ignores latent network effects, deteriorating the quality of the estimated graph. Differently, this paper investigates the problem of inferring the topology of a network from nodal observations while taking into account the presence of hidden (latent) variables. Our schemes assume the number of observed nodes is considerably larger than the number of hidden variables and build on recent graph signal processing models to relate the signals and the underlying graph. Specifically, we go beyond classical correlation and partial correlation approaches and assume that the signals are smooth and/or stationary in the sought graph. The assumptions are codified into different constrained optimization problems, with the presence of hidden variables being explicitly taken into account. Since the resulting problems are ill-conditioned and non-convex, the block matrix structure of the proposed formulations is leveraged and suitable convex-regularized relaxations are presented. Numerical experiments over synthetic and real-world datasets showcase the performance of the developed methods and compare them with existing alternatives.
SCIENCE
arxiv.org

Metric-based multimodal meta-learning for human movement identification via footstep recognition

We describe a novel metric-based learning approach that introduces a multimodal framework and uses deep audio and geophone encoders in siamese configuration to design an adaptable and lightweight supervised model. This framework eliminates the need for expensive data labeling procedures and learns general-purpose representations from low multisensory data obtained from omnipresent sensing systems. These sensing systems provide numerous applications and various use cases in activity recognition tasks. Here, we intend to explore human footstep movements in indoor environments and analyze representations from a small self-collected dataset of acoustic and vibration-based sensors. The core idea is to learn plausible similarities between two sensory traits and combine representations from audio and geophone signals. We present a generalized framework to learn embeddings from temporal and spatial features extracted from audio and geophone signals. We then extract the representations in a shared space to maximize the learning of a compatibility function between acoustic and geophone features. This, in turn, can be used effectively to carry out a classification task from the learned model, as demonstrated by assigning high similarity to the pairs with a human footstep movement and lower similarity to pairs containing no footstep movement. Performance analyses show that our proposed multimodal framework achieves a 19.99% accuracy increase (in absolute terms) and avoids overfitting on the evaluation set when the training samples are increased from 200 pairs to just 500 pairs, while satisfactorily learning the audio and geophone representations. Our results employ a metric-based contrastive learning approach for multi-sensor data to mitigate the impact of data scarcity and perform human movement identification with limited data size.
COMPUTERS
YOU MAY ALSO LIKE
arxiv.org

Auxiliary Loss Adaption for Image Inpainting

Auxiliary losses commonly used in image inpainting lead to better reconstruction performance by incorporating prior knowledge of missing regions. However, it usually takes a lot of effort to fully exploit the potential of auxiliary losses, since improperly weighted auxiliary losses would distract the model from the inpainting task, and the effectiveness of an auxiliary loss might vary during the training process. Furthermore, the design of auxiliary losses takes domain expertise. In this work, we introduce the Auxiliary Loss Adaption (ALA) algorithm to dynamically adjust the parameters of the auxiliary loss, to better assist the primary task. Our algorithm is based on the principle that a better auxiliary loss is one that helps increase the performance of the main loss through several steps of gradient descent. We then examine two commonly used auxiliary losses in inpainting and use ALA to adapt their parameters. Experimental results show that ALA induces more competitive inpainting results than fixed auxiliary losses. In particular, by simply combining auxiliary losses with ALA, existing inpainting methods can achieve increased performance without explicitly incorporating delicate network design or structure knowledge priors.
TECHNOLOGY
arxiv.org

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring speaking styles of an arbitrary source speaker to a target speaker's voice using a very limited amount of neutral data. This is a very challenging task since the learning algorithm needs to deal with few-shot voice cloning and speaker-prosody disentanglement at the same time. Accelerating the adaptation process for a new target speaker is important in real-world applications, but even more challenging. In this paper, we approach the hard task of fast few-shot style transfer for voice cloning using meta learning. We investigate the model-agnostic meta-learning (MAML) algorithm and meta-transfer a pre-trained multi-speaker and multi-prosody base TTS model to be highly sensitive for adaptation with few samples. A domain adversarial training mechanism and an orthogonal constraint are adopted to disentangle speaker and prosody representations for effective cross-speaker style transfer. Experimental results show that the proposed approach is able to conduct fast voice cloning using only 5 samples (around 12 seconds of speech data) from a target speaker, with only 100 adaptation steps. Audio samples are available online.
COMPUTERS
arxiv.org

A Performance Bound for Model Based Online Reinforcement Learning

Model based reinforcement learning (RL) refers to an approximate optimal control design for infinite-horizon (IH) problems that aims at approximating the optimal IH controller and associated cost parametrically. In online RL, the training process of the respective approximators is performed along the de facto system trajectory (potentially in addition to offline data). While there exist stability results for online RL, the IH controller performance has been addressed only in a fragmentary manner, rarely considering the parametric and error-prone nature of the approximation explicitly, even in the model based case. To assess the performance for such a case, this work utilizes a model predictive control framework to mimic an online RL controller. More precisely, the optimization based controller is associated with an online adapted approximate cost which serves as a terminal cost function. The results include a stability and performance estimate statement for the control and training scheme and demonstrate the dependence of the controller's performance bound on the error resulting from parameterized cost approximation.
CODING & PROGRAMMING
arxiv.org

Compressive Features in Offline Reinforcement Learning for Recommender Systems

In this paper, we develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider. Our approach is built on a reinforcement learning-based technique and is trained on an offline data set that is publicly available from an IEEE Big Data Cup challenge. The limitation of the offline data set and the curse of high dimensionality pose significant obstacles to solving this problem. Our proposed method focuses on improving the total rewards and performance by tackling these main difficulties. More specifically, we utilize sparse PCA to extract important features of user behaviors. Our Q-learning-based system is then trained on the processed offline data set. To exploit all possible information from the provided data set, we cluster user features into different groups and build an independent Q-table for each group. Furthermore, to tackle the challenge of the unknown formula for the evaluation metrics, we design a metric to self-evaluate our system's performance based on the potential value the game provider might achieve and a small collection of actual evaluation metrics that we obtain from the live scoring environment. Our experiments show that our proposed metric is consistent with the results published by the challenge organizers. We have implemented the proposed training pipeline, and the results show that our method outperforms current state-of-the-art methods in terms of both total rewards and training speed. By addressing the main challenges and leveraging state-of-the-art techniques, we have achieved the best public leaderboard result in the challenge. Furthermore, our proposed method achieves an estimated score approximately 20% better than, and can be trained 30 times faster than, the best of the current state-of-the-art methods.
VIDEO GAMES
arxiv.org

Phase function estimation from a diffuse optical image via deep learning

The phase function is a key element of a light propagation model for Monte Carlo (MC) simulation, which is usually fitted with an analytic function with associated parameters. In recent years, machine learning methods were reported to estimate the parameters of the phase function of a particular form such as the Henyey-Greenstein phase function but, to our knowledge, no studies have been performed to determine the form of the phase function. Here we design a convolutional neural network to estimate the phase function from a diffuse optical image without any explicit assumption on the form of the phase function. Specifically, we use a Gaussian mixture model as an example to represent the phase function generally and learn the model parameters accurately. The Gaussian mixture model is selected because it provides the analytic expression of phase function to facilitate deflection angle sampling in MC simulation, and does not significantly increase the number of free parameters. Our proposed method is validated on MC-simulated reflectance images of typical biological tissues using the Henyey-Greenstein phase function with different anisotropy factors. The effects of field of view (FOV) and spatial resolution on the errors are analyzed to optimize the estimation method. The mean squared error of the phase function is 0.01 and the relative error of the anisotropy factor is 3.28%.
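For readers unfamiliar with the Henyey-Greenstein phase function used for validation above, the following sketch evaluates it and draws deflection angles by inverse-CDF sampling, as is typically done in MC simulation. The formulas are the standard textbook ones; the function names and the chosen anisotropy factor `g = 0.9` are illustrative assumptions, not taken from the paper.

```python
import math
import random

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function, normalized over the unit sphere."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)

def sample_hg_cos_theta(g, xi):
    """Inverse-CDF sample of cos(theta) from a uniform variate xi in [0, 1)."""
    if abs(g) < 1e-6:
        return 2.0 * xi - 1.0  # isotropic limit as g approaches 0
    frac = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi)
    return (1.0 + g * g - frac * frac) / (2.0 * g)

rng = random.Random(0)
g = 0.9  # strongly forward-scattering; value chosen only for illustration
samples = [sample_hg_cos_theta(g, rng.random()) for _ in range(100_000)]
mean_cos = sum(samples) / len(samples)  # estimates the anisotropy factor g
```

The mean of the sampled cosines is an estimate of the anisotropy factor, which is why a relative error on that factor is a natural accuracy measure for an estimated phase function.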
COMPUTERS
towardsdatascience.com

Image Classification Transfer Learning and Fine Tuning using TensorFlow

A simple CNN (convolutional neural network) transfer learning application with fine tuning is done here using the EfficientNetB0 model on the food101 dataset from TensorFlow Datasets. A Python 3 kernel is used on the Jupyter notebook interface to perform the experiment. EfficientNet, first introduced in Tan and Le, 2019, is among the...
CODING & PROGRAMMING
arxiv.org

Single Image Object Counting and Localizing using Active-Learning

The need to count and localize repeating objects in an image arises in different scenarios, such as biological microscopy studies, production line inspection, and surveillance recording analysis. The use of supervised Convolutional Neural Networks (CNNs) achieves accurate object detection when trained over large class-specific datasets. The labeling effort in this approach does not pay off when counting is required over only a few images of a unique object class.
SOFTWARE
Canyon News

Distance Learning: Top Tools To Ace Online Learning Effectively In 2021

UNITED STATES—Students today are fortunate to be living in the era of Google and the Internet where information is readily and easily accessible more than ever before. To put things into perspective for you, the biggest search engine giant – Google processes more than 40,000 queries every second on average. This is a whole lot of learning on average.
EDUCATION
arxiv.org

An Investigation into Keystroke Dynamics and Heart Rate Variability as Indicators of Stress

Lifelogging has become a prominent research topic in recent years. Wearable sensors like Fitbits and smart watches are now increasingly popular for recording one's activities. Some researchers are also exploring keystroke dynamics for lifelogging. Keystroke dynamics refers to the process of measuring and assessing a person's typing rhythm on digital devices. A digital footprint is created when a user interacts with devices like keyboards, mobile phones or touch screen panels, and the timing of the keystrokes is unique to each individual, though likely to be affected by factors such as fatigue, distraction or emotional stress. In this work we explore the relationship between keystroke dynamics, as measured by the timing for the top-10 most frequently occurring bigrams in English, and the emotional state and stress of an individual, as measured by heart rate variability (HRV). We collected keystroke data using the Loggerman application while HRV was simultaneously gathered. With this data we performed an analysis to determine the relationship between variations in keystroke dynamics and variations in HRV. Our conclusion is that we need to use a more detailed representation of keystroke timing than the top-10 bigrams, probably personalised to each user.
VIETNAM
arxiv.org

Deep learning based on mixed-variable physics informed neural network for solving fluid dynamics without simulation data

Deep learning methods have attracted tremendous attention for handling fluid dynamics in recent years. However, deep learning requires a large amount of data to guarantee generalization ability, and fluid dynamics data are scarce. Recently, the physics informed neural network (PINN) has become popular for solving fluid flow problems. Its basic concept is to embed the governing equation and continuity equation into the loss function, so that less data is required to obtain a reliable neural network. In this paper, the mixed-variable PINN method, which converts the governing equation into continuum and constitutive formulations, is proposed to solve fluid dynamics (flow past a cylinder) without any labeled data. The initial/boundary conditions with penalty factors are also embedded into the loss function so that the problem is well-posed. The results show that the mixed-variable PINN has better predictive ability in constructing the flow field than the traditional PINN scheme. Furthermore, transfer learning is adopted to solve for fluid solutions at different Reynolds numbers with less computational cost. The results also demonstrate that the transfer learning method can simulate different Reynolds numbers well in a short time.
COMPUTERS
arxiv.org

End-to-end optimized image compression with competition of prior distributions

Convolutional autoencoders are now at the forefront of image compression research. To improve their entropy coding, encoder output is typically analyzed with a second autoencoder to generate per-variable parametrized prior probability distributions. We instead propose a compression scheme that uses a single convolutional autoencoder and multiple learned prior distributions working as a competition of experts. Trained prior distributions are stored in a static table of cumulative distribution functions. During inference, this table is used by an entropy coder as a look-up-table to determine the best prior for each spatial location. Our method offers rate-distortion performance comparable to that obtained with a predicted parametrized prior with only a fraction of its entropy coding and decoding complexity.
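The look-up step described above can be sketched as follows. The table values, the symbol range, and the selection rule (pick the prior with the shortest ideal code length at each location) are assumptions made for illustration; the abstract only states that a static CDF table is used to determine the best prior per spatial location.

```python
import math

# Hypothetical static table: each prior is a CDF over the integer symbols -2..2,
# stored as the cumulative probability at the upper edge of each symbol bin.
SYMBOLS = [-2, -1, 0, 1, 2]
CDF_TABLE = [
    [0.02, 0.10, 0.90, 0.98, 1.00],  # peaked prior: most mass on 0
    [0.20, 0.40, 0.60, 0.80, 1.00],  # flat prior: uniform over the symbols
]

def symbol_prob(cdf, symbol):
    """Probability mass of one symbol, recovered from adjacent CDF entries."""
    i = SYMBOLS.index(symbol)
    lower = cdf[i - 1] if i > 0 else 0.0
    return cdf[i] - lower

def code_length_bits(cdf, symbols):
    """Ideal entropy-coded length of the symbols under one prior."""
    return sum(-math.log2(symbol_prob(cdf, s)) for s in symbols)

def best_prior(symbols):
    """Index of the prior giving the shortest code (assumed selection rule)."""
    costs = [code_length_bits(cdf, symbols) for cdf in CDF_TABLE]
    return min(range(len(costs)), key=costs.__getitem__)

# Two spatial locations, each holding a few quantized channel values.
locations = [[0, 0, 0, 1], [-2, 2, 1, -1]]
choices = [best_prior(loc) for loc in locations]
```

Here the near-zero location selects the peaked prior and the spread-out location selects the flat one. In a real codec the chosen index would also have to be signalled to the decoder, a cost this sketch ignores.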
SOFTWARE
