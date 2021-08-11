

Curse of Dimensionality — A “Curse” to Machine Learning

By Editors' Picks
towardsdatascience.com
 8 days ago

The curse of dimensionality describes how data grows explosively as dimensions are added, and the resulting exponential increase in the computational effort required to process and analyze it. The term was first introduced by Richard E. Bellman, in the area of dynamic programming, to explain the increase in the volume of Euclidean space associated with adding extra dimensions. Today, the phenomenon is observed in fields such as machine learning, data analysis, and data mining, to name a few. An increase in dimensions can, in theory, add more information to the data and thereby improve its quality, but in practice it increases noise and redundancy during analysis.
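The volume growth described above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not code from the original article: the helper names `grid_cells` and `inscribed_ball_fraction` are hypothetical, and the choice of 10 bins per axis is arbitrary. The first function counts how many cells a fixed-resolution grid needs as dimensions are added; the second measures how little of a unit hypercube is covered by the ball inscribed in it.

```python
import math


def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Cells in a grid that splits each of `dims` axes into `bins_per_axis` bins."""
    # Hypothetical helper: the count grows exponentially with the dimension.
    return bins_per_axis ** dims


def inscribed_ball_fraction(dims: int) -> float:
    """Fraction of the unit hypercube's volume inside its inscribed ball (radius 0.5)."""
    # Volume of a d-dimensional ball of radius r: pi^(d/2) / Gamma(d/2 + 1) * r^d.
    # The unit hypercube has volume 1, so the ball's volume is already the fraction.
    return math.pi ** (dims / 2) / math.gamma(dims / 2 + 1) * 0.5 ** dims


# Print dimension, grid size at 10 bins per axis, and inscribed-ball fraction.
for d in (1, 2, 3, 10, 20):
    print(d, grid_cells(10, d), f"{inscribed_ball_fraction(d):.2e}")
```

Both columns move in the direction the summary describes: the number of cells to fill multiplies by ten with every added dimension, while the share of the cube that lies near its center shrinks toward zero, so a fixed number of samples covers the space ever more sparsely.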

Related
Science · Nature.com

Author Correction: High-dimensional hepatopath data analysis by machine learning for predicting HBV-related fibrosis

Correction to: Scientific Reports https://doi.org/10.1038/s41598-021-84556-4, published online 03 March 2021. The original version of this Article contained an error in Affiliation 2 and Affiliation 3, where the city was incorrectly given as ‘Zhenjiang’. The correct affiliations are listed below. School of Computer Science and Engineering, Jiangsu University of Technology, Changzhou...
Software · arxiv.org

High-dimensional encryption in optical fibers using machine learning

Michelle L. J. Lollie, Fatemeh Mostafavi, Narayan Bhusal, Mingyuan Hong, Chenglong You, Roberto de J. León-Montiel, Omar S. Magaña-Loaiza, Mario A. Quiroz-Juárez. The ability to engineer the spatial wavefunction of photons has enabled a variety of quantum protocols for communication, sensing, and information processing. These protocols exploit the high dimensionality of structured light, enabling the encoding of multiple bits of information in a single photon, the measurement of small physical parameters, and the achievement of unprecedented levels of security in schemes for cryptography. Unfortunately, the potential of structured light has been restrained to free-space platforms in which the spatial profile of photons is preserved. Here, we make an important step forward to using structured light for fiber optical communication. We introduce a smart high-dimensional encryption protocol in which the propagation of spatial modes in multimode fibers is used as a natural mechanism for encryption. This provides a secure communication channel for data transmission. The information encoded in spatial modes is retrieved using artificial neural networks, which are trained from the intensity distributions of experimentally detected spatial modes. Our on-fiber communication platform allows us to use spatial modes of light for high-dimensional bit-by-bit and byte-by-byte encoding. This protocol enables one to recover messages and images with almost perfect accuracy. Our smart protocol for high-dimensional optical encryption in optical fibers has key implications for quantum technologies relying on structured fields of light, particularly those that are challenged by free-space propagation.
Computers · arxiv.org

Spatially and color consistent environment lighting estimation using deep neural networks for mixed reality

Bruno Augusto Dorta Marques, Esteban Walter Gonzalez Clua, Anselmo Antunes Montenegro, Cristina Nader Vasconcelos. The representation of consistent mixed reality (XR) environments requires adequate real and virtual illumination composition in real-time. Estimating the lighting of a real scenario is still a challenge. Due to the ill-posed nature of the problem, classical inverse-rendering techniques tackle the problem for simple lighting setups. However, those assumptions do not satisfy the current state of the art in computer graphics and XR applications. While many recent works solve the problem using machine learning techniques to estimate the environment light and scene's materials, most of them are limited to geometry or previous knowledge. This paper presents a CNN-based model to estimate complex lighting for mixed reality environments with no previous information about the scene. We model the environment illumination using a set of spherical harmonics (SH) environment lighting, capable of efficiently representing area lighting. We propose a new CNN architecture that inputs an RGB image and recognizes, in real-time, the environment lighting. Unlike previous CNN-based lighting estimation methods, we propose using a highly optimized deep neural network architecture, with a reduced number of parameters, that can learn highly complex lighting scenarios from real-world high-dynamic-range (HDR) environment images. We show in the experiments that the CNN architecture can predict the environment lighting with an average mean squared error (MSE) of 7.85e-04 when comparing SH lighting coefficients. We validate our model in a variety of mixed reality scenarios. Furthermore, we present qualitative results comparing relights of real-world scenes.
Mathematics · arxiv.org

On Einstein equations with cosmological constant in braneworld models

In this paper, we investigate the Einstein equations with cosmological constant for Randall-Sundrum (RS) and Dvali-Gabadadze-Porrati (DGP) models to determine the warp functions in the context of warp product spacetimes. In RS model, it is shown that Einstein's equation in the bulk is reduced into the brane as a vacuum equation, having vacuum solution, which is not affected by the cosmological constant in the bulk. In DGP model, it is shown that the Einstein's equation in the bulk is reduced into the brane and along the extra dimension, where both equations are affected by the cosmological constant in the bulk. We have solved these equations in DGP model, subject to vanishing cosmological constants on the brane and along extra dimension, and obtained exact solutions for the warp functions. The solutions depend on the typical values of cosmological constant in the bulk as well as the dimension of the brane. So, corresponding to the typical values, some solutions have exponential behaviours which may be set to represent warp inflation on the brane, and some other solutions have oscillating behaviours which may be set to represent warp waves or branes waves along the extra dimension.
Coding & Programming · arxiv.org

DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer

The paper proposes a Dynamic ResBlock Generative Adversarial Network (DRB-GAN) for artistic style transfer. The style code is modeled as the shared parameters for Dynamic ResBlocks connecting both the style encoding network and the style transfer network. In the style encoding network, a style class-aware attention mechanism is used to attend the style feature representation for generating the style codes. In the style transfer network, multiple Dynamic ResBlocks are designed to integrate the style code and the extracted CNN semantic feature and then feed into the spatial window Layer-Instance Normalization (SW-LIN) decoder, which enables high-quality synthetic images with artistic style transfer. Moreover, the style collection conditional discriminator is designed to equip our DRB-GAN model with abilities for both arbitrary style transfer and collection style transfer during the training stage. For both arbitrary style transfer and collection style transfer, extensive experiments demonstrate that our proposed DRB-GAN outperforms state-of-the-art methods and exhibits superior performance in terms of visual quality and efficiency. Our source code is available at this https URL.
Computers · arxiv.org

Deep neural network methods for solving forward and inverse problems of time fractional diffusion equations with conformable derivative

Physics-informed neural networks (PINNs) show great advantages in solving partial differential equations. In this paper, we propose, for the first time, to study conformable time fractional diffusion equations by using PINNs. By solving the supervised learning task, we design a new spatio-temporal function approximator with high data efficiency. The L-BFGS algorithm is used to optimize our loss function, and the back propagation algorithm is used to update our parameters to give our numerical solutions. For the forward problem, we can take IC/BCs as the data, and use PINN to solve the corresponding partial differential equation. Three numerical examples are carried out to demonstrate the effectiveness of our methods. In particular, when the order of the conformable fractional derivative $\alpha$ tends to $1$, a class of weighted PINNs is introduced to overcome the accuracy degradation caused by the singularity of solutions. For the inverse problem, we use the data obtained to train the neural network, and the estimation of parameter $\lambda$ in the equation is elaborated. Similarly, we give three numerical examples to show that our method can accurately identify the parameters, even if the training data is corrupted with 1% uncorrelated noise.
Science · arxiv.org

Hybrid dynamical type theories for navigation

We present a hybrid dynamical type theory equipped with useful primitives for organizing and proving safety of navigational control algorithms. This type theory combines the framework of Fu–Kishida–Selinger for constructing linear dependent type theories from state-parameter fibrations with previous work on categories of hybrid systems under sequential composition. We also define a conjectural embedding of a fragment of linear-time temporal logic within our type theory, with the goal of obtaining interoperability with existing state-of-the-art tools for automatic controller synthesis from formal task specifications. As a case study, we use the type theory to organize and prove safety properties for an obstacle-avoiding navigation algorithm of Arslan–Koditschek as implemented by Vasilopoulos. Finally, we speculate on extensions of the type theory to deal with conjugacies between model and physical spaces, as well as hierarchical template-anchor relationships.
Beauty & Fashion · arxiv.org

KCNet: An Insect-Inspired Single-Hidden-Layer Neural Network with Randomized Binary Weights for Prediction and Classification Tasks

Fruit flies are established model systems for studying olfactory learning as they will readily learn to associate odors with either electric shock or sugar rewards. The mechanisms of the insect brain apparently responsible for odor learning form a relatively shallow neuronal architecture. Olfactory inputs are received by the antennal lobe (AL) of the brain, which produces an encoding of each odor mixture across ~50 sub-units known as glomeruli. Each of these glomeruli then projects its component of this feature vector to several of ~2000 so-called Kenyon Cells (KCs) in a region of the brain known as the mushroom body (MB). Fly responses to odors are generated by small downstream neuropils that decode the higher-order representation from the MB. Research has shown that there is no recognizable pattern in the glomeruli–KC connections (and thus the particular higher-order representations); they are akin to fingerprints: even isogenic flies have different projections. Leveraging insights from this architecture, we propose KCNet, a single-hidden-layer neural network that contains sparse, randomized, binary weights between the input layer and the hidden layer and analytically learned weights between the hidden layer and the output layer. Furthermore, we also propose a dynamic optimization algorithm that enables the KCNet to increase performance beyond its structural limits by searching for a more efficient set of inputs. For odorant-perception tasks that predict perceptual properties of an odorant, we show that KCNet outperforms existing data-driven approaches, such as XGBoost. For image-classification tasks, KCNet achieves reasonable performance on benchmark datasets (MNIST, Fashion-MNIST, and EMNIST) without any data-augmentation methods or convolutional layers and shows particularly fast running time. Thus, neural networks inspired by the insect brain can be both economical and perform well.
Mathematics · arxiv.org

Bayesian and Algebraic Strategies to Design in Synthetic Biology

Innovation in synthetic biology often still depends on large-scale experimental trial-and-error, domain expertise, and ingenuity. The application of rational design engineering methods promises to make this more efficient, faster, cheaper and safer. But this requires mathematical models of cellular systems. And for these models we then have to determine if they can meet our intended target behaviour. Here we develop two complementary approaches that allow us to determine whether a given molecular circuit, represented by a mathematical model, is capable of fulfilling our design objectives. We discuss algebraic methods that are capable of identifying general principles guaranteeing desired behaviour; and we provide an overview of Bayesian design approaches that allow us to choose, from a set of models, the model which has the highest probability of fulfilling our design objectives. We discuss their uses in the context of biochemical adaptation, and then consider how robustness can and should affect our design approach.
Software · arxiv.org

GCCAD: Graph Contrastive Coding for Anomaly Detection

Bo Chen, Jing Zhang, Xiaokang Zhang, Yuxiao Dong, Jian Song, Peng Zhang, Kaibo Xu, Evgeny Kharlamov, Jie Tang. Graph-based anomaly detection has been widely used for detecting malicious activities in real-world applications. Existing attempts to address this problem have thus far focused on structural feature engineering or learning in the binary classification regime. In this work, we propose to leverage graph contrastive coding and present the supervised GCCAD model for contrasting abnormal nodes with normal ones in terms of their distances to the global context (e.g., the average of all nodes). To handle scenarios with scarce labels, we further enable GCCAD as a self-supervised framework by designing a graph corrupting strategy for generating synthetic node labels. To achieve the contrastive objective, we design a graph neural network encoder that can infer and further remove suspicious links during message passing, as well as learn the global context of the input graph. We conduct extensive experiments on four public datasets, demonstrating that 1) GCCAD significantly and consistently outperforms various advanced baselines and 2) its self-supervised version without fine-tuning can achieve comparable performance with its fully supervised version.
arxiv.org

Understanding Data Visualization Design Practice

Professional roles for data visualization designers are growing in popularity, and interest in relationships between the academic research and professional practice communities is gaining traction. However, despite the potential for knowledge sharing between these communities, we have little understanding of the ways in which practitioners design in real-world, professional settings. Inquiry in numerous design disciplines indicates that practitioners approach complex situations in ways that are fundamentally different from those of researchers. In this work, I take a practice-led approach to understanding visualization design practice on its own terms. Twenty data visualization practitioners were interviewed and asked about their design process, including the steps they take, how they make decisions, and the methods they use. Findings suggest that practitioners do not follow highly systematic processes, but instead rely on situated forms of knowing and acting in which they draw from precedent and use methods and principles that are determined appropriate in the moment. These findings have implications for how visualization researchers understand and engage with practitioners, and how educators approach the training of future data visualization designers.
Technology · arxiv.org

Deep MRI Reconstruction with Radial Subsampling

In spite of its extensive adaptation in almost every medical diagnostic and examinatorial application, Magnetic Resonance Imaging (MRI) is still a slow imaging modality, which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate MRI acquisition. In clinical settings, subsampling the k-space measurements during scanning time using Cartesian trajectories, such as rectilinear sampling, is currently the most conventional CS approach applied; however, it is prone to producing aliased reconstructions. With the advent of Deep Learning (DL) in accelerating MRI, reconstructing faithful images from subsampled data became increasingly promising. Retrospectively applying a subsampling mask onto the k-space data is a way of simulating the accelerated acquisition of k-space data in a real clinical setting. In this paper we compare and review the effect of applying either rectilinear or radial retrospective subsampling on the quality of the reconstructions outputted by trained deep neural networks. With the same choice of hyper-parameters, we train and evaluate two distinct Recurrent Inference Machines (RIMs), one for each type of subsampling. The qualitative and quantitative results of our experiments indicate that the model trained on data with radial subsampling attains higher performance and learns to estimate reconstructions with higher fidelity, paving the way for other DL approaches to involve radial subsampling.
Science · towardsdatascience.com

Use these tips to pass Technical Data Science Interviews!

I am writing this blog post series to share what I learned when applying and interviewing for Data Scientist/Machine Learning Engineer positions at different companies. It had been almost 5 years since I had applied for a job, and things have changed dramatically since I gave my last interview. There have been massive changes in the industry. It felt a little like uncharted territory but eventually, I got the hang of the overall process. I learned a lot of new things and had a lot of fun talking with experts in different industry domains where Machine Learning is applied.
Physics · arxiv.org

Realization of a transition between type-I and type-II Dirac semimetals in monolayers

The phase transition between type-I and type-II Dirac semimetals will reveal a series of significant physical properties because of their completely distinct electronic, optical and magnetic properties. However, no mechanism or materials have been proposed to realize the transition to date. Here, we propose that the transition can be realized in two-dimensional (2D) materials consisting of zigzag chains, by tuning external strains. The origin of the transition is that some orbital interactions in zigzag chains vary drastically with structural deformation, which changes the dispersions of the corresponding bands. Two 2D nanosheets, monolayer PN and AsN, are identified to confirm the mechanism by using first-principles calculations. They are intrinsic type-I or type-II Dirac materials, and transition to the other type of Dirac material under external strains. In addition, a possible route is proposed to synthesize the new 2D structures.
Computers · arxiv.org

Contextual Convolutional Neural Networks

We propose contextual convolution (CoConv) for visual recognition. CoConv is a direct replacement of the standard convolution, which is the core component of convolutional neural networks. CoConv is implicitly equipped with the capability of incorporating contextual information while maintaining a similar number of parameters and computational cost compared to the standard convolution. CoConv is inspired by neuroscience studies indicating that (i) neurons, even from the primary visual cortex (V1 area), are involved in detection of contextual cues and that (ii) the activity of a visual neuron can be influenced by the stimuli placed entirely outside of its theoretical receptive field. On the one hand, we integrate CoConv in the widely-used residual networks and show improved recognition performance over baselines on the core tasks and benchmarks for visual recognition, namely image classification on the ImageNet data set and object detection on the MS COCO data set. On the other hand, we introduce CoConv in the generator of a state-of-the-art Generative Adversarial Network, showing improved generative results on CIFAR-10 and CelebA. Our code is available at this https URL.
Software · towardsdatascience.com

Learn SQL Server Management Studio — Part 11 Intro to SQL Database & Server in the Cloud (Azure)

The skillset that will make you fun at parties! Step by step. Hi there! Welcome to the 11th instalment of a series of tutorials on SQL and SQL Server Studio. There’s a simple goal: To make you familiar and comfortable with the tool, and the language. “Why does it even matter?” I see you asking. It turns out that curiosity and side projects are often instrumental in getting picked for new projects or even getting hired for a new job. The mere fact you’ve already used an important tool such as SQL Server Studio and written some SQL queries can and will give you a clear head start.
Software · makeuseof.com

GitHub Copilot: The Coding AI

If you're a programmer, there's a good chance you've become exhausted from writing lengthy programs (or you will!) And you've probably wondered to yourself, "What if I had someone sitting with me to help me create these programs?" Now you have GitHub Copilot, an Artificial Intelligence tool that helps you...
Computers · arxiv.org

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

In crowd counting, due to the problem of laborious labelling, collecting a new large-scale dataset with plentiful images and large diversity in density, scene, etc. is perceived as intractable. Thus, for learning a general model, training with data from multiple different datasets might be a remedy and be of great value. In this paper, we resort to multi-domain joint learning and propose a simple but effective Domain-specific Knowledge Propagating Network (DKPNet) for unbiasedly learning the knowledge from multiple diverse data domains at the same time. It is mainly achieved by proposing the novel Variational Attention (VA) technique for explicitly modeling the attention distributions for different domains. And as an extension to VA, Intrinsic Variational Attention (InVA) is proposed to handle the problems of overlapped domains and sub-domains. Extensive experiments have been conducted to validate the superiority of our DKPNet over several popular datasets, including ShanghaiTech A/B, UCF-QNRF and NWPU.
Coding & Programming · towardsdatascience.com

How to Convert Your Python Project into a Package Installable through pip

A tutorial with a ready-to-run template, describing how to transform a Python Project into a Package available in the Python Package Index. It may happen that when you create a new Python project, you have to reuse some previous code, which has been already well organised. Thus, you could convert this previous well-organised code into a package.
