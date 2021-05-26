
Exploratory Data Analysis in R: Data Summarising, Visualization, and Predictive Model

Exploratory data analysis is indispensable for understanding any dataset. It includes data summarization, visualization, some statistical analysis, and predictive analysis. This article focuses on data storytelling, or exploratory data analysis, using R and several R packages. It covers the summarization and visualization of some key points.

Software | arxiv.org

On Data-centric Forwarding in Mobile Ad-hoc Networks: Baseline Design and Simulation Analysis

IP networking deals with end-to-end communication where the network layer routing protocols maintain the reachability from one address to another. However, challenging environments, such as mobile ad-hoc networks (MANETs), lead to frequent path failures and changes between the sender and receiver, incurring higher packet loss. The obligatory setup and maintenance of a stable device-to-device path in MANETs incur significant data retrieval delay and transmission overhead. Such overhead increases the packet loss many times over.
Entertainment | arxiv.org

Visual representation of negation: Real world data analysis on comic image design

There has been a widely held view that visual representations (e.g., photographs and illustrations) do not depict negation, for example, one that can be expressed by a sentence "the train is not coming". This view is empirically challenged by analyzing the real-world visual representations of comic (manga) illustrations. In the experiment using image captioning tasks, we gave people comic illustrations and asked them to explain what they could read from them. The collected data showed that some comic illustrations could depict negation without any aid of sequences (multiple panels) or conventional devices (special symbols). This type of comic illustrations was subjected to further experiments, classifying images into those containing negation and those not containing negation. While this image classification was easy for humans, it was difficult for data-driven machines, i.e., deep learning models (CNN), to achieve the same high performance. Given the findings, we argue that some comic illustrations evoke background knowledge and thus can depict negation with purely visual elements.
Computers | arxiv.org

Model-based and Data-driven Approaches for Downlink Massive MIMO Channel Estimation

We study downlink channel estimation in a multi-cell Massive multiple-input multiple-output (MIMO) system operating in time-division duplex. The users must know their effective channel gains to decode their received downlink data. Previous works have used the mean value as the estimate, motivated by channel hardening. However, this is associated with a performance loss in non-isotropic scattering environments. We propose two novel estimation methods that can be applied without downlink pilots. The first method is model-based: asymptotic arguments are used to identify a connection between the effective channel gain and the average received power during a coherence block. The second method is data-driven and trains a neural network to identify a mapping between the available information and the effective channel gain. Both methods can be utilized for any channel distribution and precoding. For the model-aided method, we derive closed-form expressions when using maximum ratio or zero-forcing precoding. We compare the proposed methods with the state-of-the-art using the normalized mean-squared error and spectral efficiency (SE). The results suggest that the two proposed methods provide better SE than the state-of-the-art when there is a low level of channel hardening, while the performance difference is relatively small with the uncorrelated channel model.
Coding & Programming | arxiv.org

Agilepy: A Python framework for scientific analysis of AGILE data

A. Bulgarelli, L. Baroncelli, A. Addis, N. Parmiggiani, A. Aboudan, A. Di Piano, V. Fioretti, M. Tavani, C. Pittori, F. Lucarelli, F. Verrecchia. The Italian AGILE space mission, with its Gamma-Ray Imaging Detector (GRID) instrument sensitive in the 30 MeV-50 GeV gamma-ray energy band, has been operating since 2007. Agilepy is an open-source Python package to analyse AGILE/GRID data. The package is built on top of the command-line version of the AGILE Science Tools, developed by the AGILE Team, publicly available and released by ASI/SSDC. The primary purpose of the package is to provide an easy-to-use, high-level interface to analyse AGILE/GRID data by simplifying the configuration of the tasks and ensuring straightforward access to the data. The current features are the generation and display of sky maps and light curves, access to gamma-ray source catalogues, spectral model and position fitting, and wavelet analysis. Agilepy also includes an interface tool providing the time evolution of the AGILE off-axis viewing angle for a chosen sky region. The Flare Advocate team also uses the tool to analyse the data during the daily monitoring of the gamma-ray sky. Agilepy (and its dependencies) can be easily installed using Anaconda.
Environment | arxiv.org

Wildfires vegetation recovery through satellite remote sensing and Functional Data Analysis

In recent years wildfires have caused havoc across the world, especially aggravated in certain regions, due to climate change. Remote sensing has become a powerful tool for monitoring fires, as well as for measuring their effects on vegetation over the following years. We aim to explain the dynamics of wildfires' effects on a vegetation index (previously estimated by causal inference through synthetic controls) from pre-wildfire available information (mainly proceeding from satellites). For this purpose, we use regression models from Functional Data Analysis, where wildfire effects are considered functional responses, depending on elapsed time after each wildfire, while pre-wildfire information acts as scalar covariates. Our main findings show that vegetation recovery after wildfires is a slow process, affected by many pre-wildfire conditions, among which the richness and diversity of vegetation is one of the best predictors for the recovery.
Science | arxiv.org

A flexible Bayesian non-confounding spatial model for analysis of dispersed count data in clinical studies

Mahsa Nadifar (1), Hossein Baghishani (1), Afshin Fallah (2) ((1) Shahrood University of Technology, (2) IKIU). In employing spatial regression models for counts, we usually meet two issues. First, ignoring the inherent collinearity between covariates and the spatial effect would lead to causal inferences. Second, real count data usually reveal over- or under-dispersion, for which the classical Poisson model is not appropriate. We propose a flexible Bayesian hierarchical modeling approach that joins non-confounding spatial methodology with a newly reconsidered dispersed count model from renewal theory to control these issues. Specifically, we extend the methodology for analyzing spatial count data based on the gamma distribution assumption for waiting times. The model can be formulated as a latent Gaussian model, and consequently, we can carry out fast computation using the integrated nested Laplace approximation method. We also examine different popular approaches for handling spatial confounding and compare their performances in the presence of dispersion. We use the proposed methodology to analyze a clinical dataset related to stomach cancer incidence in Slovenia and perform a simulation study to better understand the proposed approach's merits.
Coding & Programming | r-bloggers.com

Filtering Data in R 10 Tips -tidyverse package

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers.] Filtering data in R, In this...
Software | arxiv.org

rta-dq-lib: a software library to perform online data quality analysis of scientific data

Leonardo Baroncelli, Andrea Bulgarelli, Nicolo Parmiggiani, Valentina Fioretti, Antonio Addis, Giovanni De Cesare, Ambra Di Piano, Vito Conforti, Fulvio Gianotti, Federico Russo, Gilles Maurin, Thomas Vuillaume, Pierre Aubert, Emilio Garcia, Antonio Zoccoli. The Cherenkov Telescope Array (CTA) is an initiative that is currently building the largest gamma-ray ground Observatory that...
Computers | arxiv.org

High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling

This paper presents a novel high-fidelity and low-latency universal neural vocoder framework based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (MWDLP). MWDLP employs a coarse-fine bit WaveRNN architecture for 10-bit mu-law waveform modeling. A sparse gated recurrent unit with a relatively large number of hidden units is utilized, while multiband modeling is deployed to achieve real-time low-latency usage. A novel technique for data-driven linear prediction (LP) with discrete waveform modeling is proposed, where the LP coefficients are estimated in a data-driven manner. Moreover, a novel loss function using the short-time Fourier transform (STFT) for discrete waveform modeling with Gumbel approximation is also proposed. The experimental results demonstrate that the proposed MWDLP framework generates high-fidelity synthetic speech for seen and unseen speakers and/or languages when trained on data from 300 speakers, including clean and noisy/reverberant conditions, with the number of training utterances limited to 60 per speaker. It also allows real-time low-latency processing on a single core of a ~2.1-2.7 GHz CPU, with a real-time factor of ~0.57-0.64 including input/output and feature extraction.
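
The "10-bit mu-law" representation mentioned in the abstract can be illustrated with the standard mu-law companding formula. The sketch below is not the MWDLP implementation: it assumes mu = 1023 (that is, 2^10 - 1), and the function names `mulaw_encode` and `mulaw_decode` are hypothetical. It only shows how a sample in [-1, 1] could map to one of 1024 discrete levels and back.

```python
import math

MU = 1023  # assumed: 10-bit mu-law, i.e. 2**10 - 1 quantization steps


def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] and quantize it to an integer in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    # Standard mu-law companding: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map the companded value from [-1, 1] to the integer range [0, mu]
    return int(round((y + 1.0) / 2.0 * mu))


def mulaw_decode(q: int, mu: int = MU) -> float:
    """Invert mulaw_encode up to quantization error."""
    y = 2.0 * q / mu - 1.0  # back to the companded range [-1, 1]
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Because the companding is logarithmic, quiet samples get finer quantization steps than loud ones, which is the usual reason for choosing mu-law over uniform quantization for speech.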
Software | gim-international.com

IDS GeoRadar Upgrades GPR Data Analysis Software

IDS GeoRadar, part of Hexagon, has enhanced IQMaps, its post-processing software application for advanced GPR data analysis. The updated version includes new functionalities that improve the visualization of radar data and extend the application fields to void detection and archaeology. Thanks to these new functionalities, IQMaps now allows to multi-shape...
Computers | towardsdatascience.com

Model Tree: handle Data Shifts mixing Linear Model and Decision Tree

All trained models are prone to becoming old and useless. It's well known that, after some time, models lose accuracy. This is normal and is due to a temporal shift that may occur in the data flow. In particular, most applications that involve modeling human activity must be monitored and continually updated. For example, changes in needs or market trends may influence the purchasing power of customers. If we cannot take changing customer habits into account, our predictions become untrustworthy over time.
Computers | arxiv.org

Model selection of chaotic systems from data with hidden variables using sparse data assimilation

Many natural systems exhibit chaotic behaviour such as the weather, hydrology, neuroscience and population dynamics. Although many chaotic systems can be described by relatively simple dynamical equations, characterizing these systems can be challenging, due to sensitivity to initial conditions and difficulties in differentiating chaotic behavior from noise. Ideally, one wishes to find a parsimonious set of equations that describe a dynamical system. However, model selection is more challenging when only a subset of the variables are experimentally accessible. Manifold learning methods using time-delay embeddings can successfully reconstruct the underlying structure of the system from data with hidden variables, but not the equations. Recent work in sparse-optimization based model selection has enabled model discovery given a library of possible terms, but regression-based methods require measurements of all state variables. We present a method combining variational annealing -- a technique previously used for parameter estimation in chaotic systems with hidden variables -- with sparse optimization methods to perform model identification for chaotic systems with unmeasured variables. We applied the method to experimental data from an electrical circuit with Lorenz-system like behavior to successfully recover the circuit equations with two measured and one hidden variable. We discuss the robustness of our method to varying noise and manifold sampling using ground-truth time-series simulated from the classic Lorenz system.
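
The sensitivity to initial conditions and the hidden-variable setting described above can be sketched with the classic Lorenz system. This is only an illustration, not the paper's method: the parameter values (sigma = 10, rho = 28, beta = 8/3) are the commonly used ones and are an assumption here, the fixed-step Runge-Kutta integrator is a generic choice, and treating x and y as measured with z hidden is a hypothetical split.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classic Lorenz equations (assumed parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(k1, dt / 2))
    k3 = lorenz_rhs(shifted(k2, dt / 2))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, steps=4000, dt=0.01):
    """Return the list of states visited, including the initial one."""
    states = [tuple(initial)]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt))
    return states


# Two trajectories whose starting points differ by 1e-8 in x.
reference = simulate((1.0, 1.0, 1.0))
perturbed = simulate((1.0 + 1e-8, 1.0, 1.0))
max_separation = max(math.dist(a, b) for a, b in zip(reference, perturbed))

# Hypothetical measurement setup: x and y are observed, z stays hidden.
observed = [(x, y) for x, y, _ in reference]
```

Comparing `max_separation` with the initial offset of 1e-8 shows how quickly nearby trajectories diverge, which is why fitting equations to such data is hard when one state variable is never observed.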
Science | arxiv.org

RTApipe, a framework to develop astronomical pipelines for the real-time analysis of scientific data

In the multi-messenger era, astronomical projects share information about transient phenomena by issuing science alerts to the scientific community through different communication networks. This coordination is mandatory to understand the nature of these physical phenomena. For this reason, astrophysical projects rely on real-time analysis software pipelines to identify transients (e.g. GRBs) as soon as possible and to speed up the reaction time to external alerts. These pipelines can share and receive the science alerts through the Gamma-ray Coordinates Network. This work presents a framework designed to simplify the development of real-time scientific analysis pipelines. The framework provides the architecture and the required automatisms to develop a real-time analysis pipeline, allowing the researchers to focus more on the scientific aspects. The framework has been successfully used to develop real-time pipelines for the scientific analysis of the AGILE space mission data. It is planned to reuse this framework for the Super-GRAWITA and AFISS projects. A possible future use for the Cherenkov Telescope Array (CTA) project is under evaluation.
Markets | towardsai.net

Data Science Job Market Trend Analysis for 2021

Know what employers expect from a data scientist role in 2021. Data analysis of more than 3,000 data scientist job postings, extracted from several career portals using web scraping. Author(s): Sujan Shirol, Roberto Iriondo. Disclaimer: This article is only for educational purposes. We do not encourage anyone to scrape...
Coding & Programming | arxiv.org

Contrastive Model Inversion for Data-Free Knowledge Distillation

Model inversion, whose goal is to recover training data from a pre-trained model, has recently been proved feasible. However, existing inversion methods usually suffer from the mode collapse problem, where the synthesized instances are highly similar to each other and thus show limited effectiveness for downstream tasks, such as knowledge distillation. In this paper, we propose Contrastive Model Inversion (CMI), where the data diversity is explicitly modeled as an optimizable objective, to alleviate the mode collapse issue. Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination. To this end, we introduce in CMI a contrastive learning objective that encourages the synthesized instances to be distinguishable from the ones already synthesized in previous batches. Experiments with pre-trained models on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI not only generates more visually plausible instances than the state of the art, but also achieves significantly superior performance when the generated data are used for knowledge distillation. Code is available at this https URL.
Coding & Programming | towardsdatascience.com

TextPlot: R Library for Visualizing Text Data

Data visualization is an essential task in data science. The insights we gain from visualizing data can support our decisions. Text is one of the most commonly analyzed kinds of data. It is also one of the most complex, because we have to spend a lot of time preprocessing it: tokenizing the text, deleting terms that are not meaningful, creating a document-term matrix, and so on.
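
The preprocessing steps named above (tokenizing, dropping uninformative terms, building a document-term matrix) can be sketched in a few lines. The article itself concerns an R library; the Python below is only an illustrative stand-in, and the tiny `STOPWORDS` set and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # one column per distinct term
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat on the mat", "The dog sat"]
vocab, matrix = document_term_matrix(docs)
# vocab  -> ['cat', 'dog', 'mat', 'sat']
# matrix -> [[1, 0, 1, 1], [0, 1, 0, 1]]
```

Each row of the matrix describes one document and each column one term, which is the shape most text visualizations start from.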
Science | arxiv.org

A Bayesian change point model for spatio-temporal data

Urbanization of an area is known to increase the temperature of the surrounding area. This phenomenon -- a so-called urban heat island (UHI) -- occurs at a local level over a period of time and has lasting impacts for historical data analysis. We propose a methodology to examine whether, and to what extent, long-term increases and decreases in temperature exist at the local level for a given set of temperature readings at various locations. Specifically, we propose a Bayesian change point model for spatio-temporally dependent data where we select the number of change points at each location through a "forwards" selection process based on the deviance information criterion (DIC). We then fit the selected model and examine the linear slopes across time to quantify changes in long-term temperature behavior. We show the utility of this model and method using a synthetic data set and temperature measurements from eight stations in Utah consisting of daily temperature data for 60 years.
Science | towardsdatascience.com

Data Types in Data Science

A quick guide to the differences between quantitative and qualitative data. Many engineers have never been involved in statistics or data science. Yet when they build data science pipelines, or rewrite code produced by data scientists into adequate, easily maintained code, many nuances and misunderstandings arise on the engineering side. I am writing this series of posts for those Data/ML engineers and novice data scientists. I'll try to explain some basic approaches in plain English and, building on them, explain some basic Data Science concepts.
Markets | Sentinel

[PDF] Data Science Platform Market Futuristic Opportunity Analysis

Data Science Platform Market In-Depth Information; Competitive Landscape by Geographical Analysis 2021-2027. The Data Science Platform Market study covers the existing growth rate and international opportunities through current trends and future scope. The report highlights critical factors such as attractive business growth, pricing strategy, and the development landscape. The research gives readers and customers an easily understood graphical representation of the competitive landscape.