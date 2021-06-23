Cancel

The 4-Dimensions Forecasting Framework

By Editors' Picks
towardsdatascience.com
 7 days ago

When it comes to demand forecasting, most supply chains rely on populating 18-month forecasts with monthly buckets. Should this be considered a best practice, or is it merely a default, overlooked choice? I have seen countless supply chains forecasting demand at an irrelevant aggregation level (whether material, geographical, or temporal). In this article, I propose an original 4-dimensions forecasting framework that will enable you to set up a tailor-made forecasting process for your supply chain. I like to use this framework to kick off any forecasting project.

IN THIS ARTICLE
#Forecasting #Data Science #A Black Box #France #Temporal Aggregation
Related
Coding & Programming | gitconnected.com

How to Build a Decision Tree Model in Python?

We will discuss Decision Trees because they are a pervasive topic. What we’re going to do here is group a series of Decision Trees into a single predictive model; that is, we’ll create Ensemble Method models. Decision Trees are among the predictive models that offer the highest level...
Mathematics | towardsdatascience.com

Unit 3) Genetic Algorithm: Benchmark Test Functions

Applying the Concepts in the Course on Genetic Algorithms to a range of Real Optimization Problems! Hello and Welcome back to this full course on Evolutionary Computation! In this post we will cover a genetic algorithm for evaluating benchmark test functions, using the material learned thus far in Unit 3, Genetic Algorithms. As this is a continuation of the series, if you have not checked out that article, please do so, so that you are not left in the dark! You can check it out here:
Coding & Programming | EurekAlert

New data science platform speeds up Python queries

PROVIDENCE, R.I. [Brown University] -- Researchers from Brown University and MIT have developed a new data science framework that allows users to process data with the programming language Python -- without paying the "performance tax" normally associated with a user-friendly language. The new framework, called Tuplex, is able to process...
Technology | towardsdatascience.com

Smartphone for Activity Recognition (Part 2)

In the previous article, we performed classification on the Human Activity Recognition dataset. We know that this dataset has many features (561 to be exact) and that some of them correlate strongly with each other. A Random Forest model can classify human activities with up to 94% accuracy using this dataset. However, it takes forever to do so. The second candidate is the k-NN model with 89% accuracy, followed by the Decision Tree model with 86% accuracy and the Naive Bayes model with 73% accuracy. This makes us think,
Animals | towardsdatascience.com

Python Pandas vs. R Dplyr

Pandas for Python and Dplyr for R are the two most popular libraries for working with tabular/structured data for many Data Scientists. There is always this big and partly heated discussion on which framework is better. Honestly, does it really matter? In the end, it’s about getting the job done and both pandas and dplyr offer great tools for data wrangling. No worries, this article is not yet another comparison that tries to prove a point for either library! The purpose of this article therefore is:
Computers | towardsdatascience.com

Unit 3 Application) Evolving Neural Network for Time Series Analysis

Hello and Welcome back to this full course on Evolutionary Computation! In this post we will wrap up Unit 3 with the much-anticipated application of evolving the weights of a Neural Network for Time Series Analysis! The concepts and material you need to know to best understand this material...
Coding & Programming | towardsdatascience.com

Significance or Hypothesis Tests with Python

In a series of weekly articles, I will cover some important statistics topics with a twist. The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.
Cancer | arxiv.org

Hierarchical Phenotyping and Graph Modeling of Spatial Architecture in Lymphoid Neoplasms

The cells and their spatial patterns in the tumor microenvironment (TME) play a key role in tumor evolution, and yet remain an understudied topic in computational pathology. This study, to the best of our knowledge, is among the first to combine local and global graph methods to profile the orchestration and interaction of cellular components. To address the challenge in hematolymphoid cancers where the cell classes in the TME are unclear, we first implemented cell-level unsupervised learning and identified two new cell subtypes. Local cell graphs, or supercells, were built for each image by considering each individual cell's geospatial location and class. Then, we applied supercell-level clustering and identified two new cell communities. In the end, we built global graphs to abstract spatial interaction patterns and extract features for disease diagnosis. We evaluate the proposed algorithm on H&E slides of 60 hematolymphoid neoplasm patients and further compare it with three cell-level graph-based algorithms, including the global cell graph, cluster cell graph, and FLocK. The proposed algorithm achieves a mean diagnosis accuracy of 0.703 with the repeated 5-fold cross-validation scheme. In conclusion, our algorithm shows superior performance over the existing methods and can potentially be applied to other cancer types.
Computers | Nature.com

The ENIGMA Toolbox: multiscale neural contextualization of multisite neuroimaging datasets

To the Editor: Among big-data neuroscience initiatives, the ENIGMA (Enhancing NeuroImaging Genetics through Meta-Analysis) Consortium, a worldwide alliance of over 2,000 scientists diversified into over 50 working groups, has yielded some of the largest studies of the healthy and diseased human brain. Through harmonized procedures and by sharing site-specific brain metrics (for example, cortical thickness) or aggregated statistical maps, ENIGMA has set the stage for large-scale analyses comparing findings across different topics or disorders [1,2]. In parallel, increasingly available resources offer opportunities to contextualize findings across multiscale brain organization. Examples include the Allen Human Brain Atlas [3] (microarray-derived postmortem gene expression), the BigBrain Project [4] (three-dimensional postmortem human brain histology) and the Human Connectome Project [5] (high-definition in vivo functional and structural connectomics). Here we introduce the ENIGMA Toolbox, an open ecosystem for integration and visualization of multisite ENIGMA results and their multiscale neural contextualization. Our Toolbox relies on an efficient codebase for exploring and analyzing big data, aiming to facilitate and homogenize follow-up analyses of ENIGMA or other magnetic resonance imaging (MRI) datasets around the globe.
Coding & Programming | arxiv.org

SOLO: A Simple Framework for Instance Segmentation

Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the 'detect-then-segment' strategy (e.g., Mask R-CNN), or predict embedding vectors first then cluster pixels into individual instances. In this paper, we view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location. With this notion, we propose segmenting objects by locations (SOLO), a simple, direct, and fast framework for instance segmentation with strong performance. We derive a few SOLO variants (e.g., Vanilla SOLO, Decoupled SOLO, Dynamic SOLO) following the basic principle. Our method directly maps a raw input image to the desired object categories and instance masks, eliminating the need for the grouping post-processing or the bounding box detection. Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy, while being considerably simpler than the existing methods. Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation. We further demonstrate the flexibility and high-quality segmentation of SOLO by extending it to perform one-stage instance-level image matting. Code is available at: this https URL.
Coding & Programming | arxiv.org

A Domain-Theoretic Approach to Statistical Programming Languages

We give a domain-theoretic semantics to a statistical programming language, using the plain old category of dcpos, in contrast to some more sophisticated recent proposals. Remarkably, our monad of minimal valuations is commutative, which allows for program transformations that permute the order of independent random draws, as one would expect. A similar property is not known for Jones and Plotkin's monad of continuous valuations. Instead of working with true real numbers, we work with exact real arithmetic, providing a bridge towards possible implementations. (Implementations by themselves are not addressed here.) Rather remarkably, we show that restricting ourselves to minimal valuations does not restrict us much: all measures on the real line can be modeled by minimal valuations on the domain $\mathbf{I}\mathbb{R}_\bot$ of exact real arithmetic. We give three operational semantics for our language, and we show that they are all adequate with respect to the denotational semantics. We also explore quite a few examples in order to demonstrate that our semantics computes exactly as one would expect, and in order to debunk the myth that a semantics based on continuous maps would not be expressive enough to encode measures with non-compact support using only measures with compact support, or to encode measures via non-continuous density functions, for instance.
Coding & Programming | arxiv.org

S2C2 - An orthogonal method for Semi-Supervised Learning on fuzzy labels

Lars Schmarje, Monty Santarossa, Simon-Martin Schröder, Claudius Zelenka, Rainer Kiko, Jenny Stracke, Nina Volkmann, Reinhard Koch. Semi-Supervised Learning (SSL) can decrease the amount of required labeled image data and thus the cost for deep learning. Most SSL methods only consider a clear distinction between classes but in many real-world datasets, this clear distinction is not given due to intra- or interobserver variability. This variability can lead to different annotations per image. Thus many images have ambiguous annotations and their label needs to be considered "fuzzy". This fuzziness of labels must be addressed as it will limit the performance of Semi-Supervised Learning (SSL) and deep learning in general. We propose Semi-Supervised Classification & Clustering (S2C2) which can extend many deep SSL algorithms. S2C2 can estimate the fuzziness of a label and applies SSL as a classification to certainly labeled data while creating distinct clusters for images with similar but fuzzy labels. We show that S2C2 results in median 7.4% better F1-score for classifications and 5.4% lower inner distance of clusters across multiple SSL algorithms and datasets while being more interpretable due to the fuzziness estimation of our method. Overall, a combination of Semi-Supervised Learning with our method S2C2 leads to better handling of the fuzziness of labels and thus real-world datasets.
Science | arxiv.org

The MultiBERTs: BERT Reproductions for Robustness Analysis

Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick. Experiments with pretrained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact (i.e., the particular instance...
Science | arxiv.org

Bayesian Spanning Tree: Estimating the Backbone of the Dependence Graph

In multivariate data analysis, it is often important to estimate a graph characterizing dependence among $p$ variables. A popular strategy uses the non-zero entries in a $p \times p$ covariance or precision matrix, typically requiring restrictive modeling assumptions for accurate graph recovery. To improve model robustness, we instead focus on estimating the backbone of the dependence graph. We use a spanning tree likelihood, based on a minimalist graphical model that is purposely overly-simplified. Taking a Bayesian approach, we place a prior on the space of trees and quantify uncertainty in the graphical model. In both theory and experiments, we show that this model does not require the population graph to be a spanning tree or the covariance to satisfy assumptions beyond positive-definiteness. The model accurately recovers the backbone of the population graph at a rate competitive with existing approaches but with better robustness. We show combinatorial properties of the spanning tree, which may be of independent interest, and develop an efficient Gibbs sampler for Bayesian inference. Analyzing electroencephalography data using a Hidden Markov Model with each latent state modeled by a spanning tree, we show that results are much more interpretable compared with popular alternatives.
Science | arxiv.org

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In the paper, we propose a class of faster adaptive gradient descent ascent methods for solving nonconvex-strongly-concave minimax problems by using the unified adaptive matrices used in SUPER-ADAM \citep{huang2021super}. Specifically, we propose a fast adaptive gradient descent ascent (AdaGDA) method based on the basic momentum technique, which reaches a low sample complexity of $O(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without large batches, which improves the existing result of the adaptive minimax optimization method by a factor of $O(\sqrt{\kappa})$. Moreover, we present an accelerated version of the AdaGDA method (VR-AdaGDA) based on the momentum-based variance-reduced technique, which achieves the best known sample complexity of $O(\kappa^3\epsilon^{-3})$ for finding an $\epsilon$-stationary point without large batches. Further assuming a bounded Lipschitz parameter of the objective function, we prove that our VR-AdaGDA method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$. In particular, we provide an effective convergence analysis framework for our adaptive methods based on unified adaptive matrices, which include almost all existing adaptive learning rates.
Computers | arxiv.org

Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring

Real-time video deblurring remains a challenging task due to the complexity of spatially and temporally varying blur itself and the requirement of low computational cost. To improve the network efficiency, we adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame. Furthermore, a global spatio-temporal attention module is proposed to fuse the effective hierarchical features from past and future frames to help better deblur the current frame. Another issue that needs to be addressed urgently is the lack of a real-world benchmark dataset. Thus, we contribute a novel dataset (BSD) to the community, by collecting paired blurry/sharp video clips using a co-axis beam splitter acquisition system. Experimental results show that the proposed method (ESTRNN) can achieve better deblurring performance both quantitatively and qualitatively with less computational cost than state-of-the-art video deblurring methods. In addition, cross-validation experiments between datasets illustrate the high generality of BSD over the synthetic datasets. The code and dataset are released at this https URL.
Computers | arxiv.org

Content-Aware Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers. Specifically, the standard convolution traverses the input images/features using a sliding window scheme to extract features. However, not all the windows contribute equally to the prediction results of CNNs. In practice, the convolutional operation on some of the windows (e.g., smooth windows that contain very similar pixels) can be very redundant and may introduce noise into the computation. Such redundancy may not only deteriorate the performance but also incur unnecessary computational cost. Thus, it is important to reduce the computational redundancy of convolution to improve the performance. To this end, we propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel. In this sense, we are able to effectively avoid the redundant computation on similar pixels. By replacing the standard convolution in CNNs with our CAC, the resultant models yield significantly better performance and lower computational cost than the baseline models with the standard convolution. More critically, we are able to dynamically allocate suitable computation resources according to the data smoothness of different images, making content-aware computation possible. Extensive experiments on various computer vision tasks demonstrate the superiority of our method over existing methods.
Career Development & Advice | towardsdatascience.com

5 Non-Obvious Ways to Make Data Engineers Love Working for You

And how to become a better data leader in the process. “Hiring data engineers is a piece of cake!” said no one ever. While it’s never been easy to recruit engineers — period, data engineers are a whole new ballgame. Despite the hypergrowth of the data engineering profession, hiring the backbone of your data team has never been more challenging.
Computers | towardsdatascience.com

6 Research Papers about Machine Learning Deployment Phase

A beginner's mistake is to ignore research. Reading research is daunting, especially when you’re not from an academic background, like me. Nonetheless, it ought to be done. Ignoring research can easily lead to you falling behind with your skill set because research paints the scope of the current problems being grappled with. Therefore, remaining relevant as a machine learning practitioner involves adopting the academic mindset and habits [to some degree].
towardsdatascience.com

Data Strata

Uncovering neutrality and transparency in data visualisation. This essay looks into the matters of neutrality and transparency in data visualisation design. More specifically, it disaggregates the several “data strata” involved in the production and consumption of data visualisation, of which, amongst the numeric and visual, the designer is also one; and subsequently proposes how each stratum should contribute to seeing data and its visualisations as subjective rather than objective practices. Finally, it touches upon the accountability that the designer holds, and the responsibility that the reader holds, when engaging with a piece of data visualisation design.