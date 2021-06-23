

To retrain, or not to retrain?

By Editors' Picks
towardsdatascience.com
 9 days ago

Is it time to retrain your machine learning model? Even though data science is all about… data, the answer to this question is surprisingly often based on a gut feeling. Some retrain their models overnight because it is convenient. Others do it every month: it seemed about right, and someone had to pick the schedule. Or even “when users come complaining”. Ouch!

Related
Computer Science · towardsdatascience.com

Why You Failed Your Machine Learning Interview

My experience in the ML interview process. 11 exemplary questions and ways to answer them. Not so long ago I left university with a master's degree in computer science. And I absolutely knew I had to find a job in the realm of Machine Learning (ML) and Data Science. Some work experience in data science and data engineering and a master's degree from a top university were on my resume. But this is only a door opener; to get the job, you need to convince them in the interview.
Internet · techxplore.com

Facebook AI guru says regulate its use, not the tech

Artificial intelligence itself should not be targeted by regulators, but how it is used, Facebook's top executive developing the technology told AFP. "I am generally in favour of regulating a particular application rather than a technology" in general, Yann LeCun said in an interview. Facebook's chief AI scientist and one...
Ford · towardsdatascience.com

Data’s big whiff

How to escape our dashboard rat race, learn from data, and love the job again. Fool me once, shame on you. Fool me twice, shame on me. Fool me a thousand times, and I might be a data scientist, answering the same ad hoc questions I answered a month ago, wondering why I’m still not working on more interesting projects despite building more dashboards than a Ford factory and writing enough documentation to land a golf cart on the moon.
Computer Science · techxplore.com

AI and marshmallows: Developing human-AI collaboration

Despite unprecedented advancements in technology and countless depictions of complex human-AI interactions in sci-fi movies, we have yet to fully achieve AI bots that can engage in conversation as naturally as humans can. Kushal Chawla, researcher at the USC Institute for Creative Technologies (ICT) and a doctoral student in computer science, and collaborators at both the USC Information Sciences Institute (ISI) and ICT are taking us one step closer to this reality by teaching AI how to negotiate with humans.
Coding & Programming · gitconnected.com

How to Build a Decision Tree Model in Python?

We will discuss Decision Trees because they are a pervasive topic. What we’re going to do here is group a series of Decision Trees into a single predictive model; that is, we’ll create Ensemble Method models. Decision Trees are among the predictive models that offer the highest level...
Software · helpnetsecurity.com

RtBrick Management API simplifies integration with existing OSS and BSS systems

RtBrick has announced a new Management API for its disaggregated routing software that simplifies the integration with existing OSS and BSS systems. It dramatically reduces the amount of time and effort required to make disaggregated networks operational by using widely adopted industry tools and programming languages. Many of the world’s...
United Nations · towardsdatascience.com

A Discourse on Reinforcement Learning

The narrative presents an expansive setting, with multiple paradigms, organized around the theme of Reinforcement Learning (RL). We believe that such a setting may help the reader to perceive a broader view of RL, realizing its underlying assumptions, and foreseeing unexplored links for further research. This is the first of the...
Science · towardsdatascience.com

How to Break Down Silos and Find Community as a Data Scientist

How did you decide to go into data science and, more specifically, into the area of data science you’re currently focused on? My journey into data science has been a pretty interesting one. I am an electrical engineer; post-graduation, I worked in a power utility company in the capital city of India. My daily work revolved around power transformers, grids, and substations. However, as luck would have it, I got a chance to work in a department of electronic meters called Advanced Metering Infrastructure (AMI), where we would sift through the humongous electronic-meter data to identify potential fraud or fault in meters. This was a life-changing moment for me, as I started seeing the immense value that data could bring to any business. I started researching and literally opened a Pandora’s box, albeit in a positive way.
Mathematics · towardsdatascience.com

Unit 3) Genetic Algorithm: Benchmark Test Functions

Applying the Concepts in the Course on Genetic Algorithms to a range of Real Optimization Problems! Hello and Welcome back to this full course on Evolutionary Computation! In this post we will cover a genetic algorithm for evaluating benchmark test functions, using the material learned thus far in Unit 3, Genetic Algorithms. As this is a continuation of the series, if you have not checked out that article, please do so, so that you are not left in the dark! You can check it out here:
Science · arxiv.org

A Survey on Graph-Based Deep Learning for Computational Histopathology

With the remarkable success of representation learning for prediction problems, we have witnessed a rapid expansion of the use of machine learning and deep learning for the analysis of digital pathology and biopsy image patches. However, traditional learning over patch-wise features using convolutional neural networks limits the model when attempting to capture global contextual information. The phenotypical and topological distribution of constituent histological entities plays a critical role in tissue diagnosis. As such, graph data representations and deep learning have attracted significant attention for encoding tissue representations, and capturing intra- and inter-entity level interactions. In this review, we provide a conceptual grounding of graph-based deep learning and discuss its current success for tumor localization and classification, tumor invasion and staging, image retrieval, and survival prediction. We provide an overview of these methods in a systematic manner organized by the graph representation of the input image including whole slide images and tissue microarrays. We also outline the limitations of existing techniques, and suggest potential future advances in this domain.
Computers · arxiv.org

Towards Measuring Bias in Image Classification

Authors: Nina Schaaf, Omar de Mitri, Hang Beom Kim, Alexander Windberger, Marco F. Huber. Abstract: Convolutional Neural Networks (CNNs) have become the de facto state of the art for the main computer vision tasks. However, due to their complex underlying structure, their decisions are hard to understand, which limits their use in some contexts of the industrial world. A common and hard-to-detect challenge in machine learning (ML) tasks is data bias. In this work, we present a systematic approach to uncover data bias by means of attribution maps. For this purpose, first an artificial dataset with a known bias is created and used to train intentionally biased CNNs. The networks' decisions are then inspected using attribution maps. Finally, meaningful metrics are used to measure the attribution maps' representativeness with respect to the known bias. The proposed study shows that some attribution map techniques highlight the presence of bias in the data better than others and metrics can support the identification of bias.
Coding & Programming · towardsdatascience.com

Python and Data Science Snippets on the Command Line

Data Science Simplified is a website I created that sends daily Python and data science tips to your mailbox. The tip is designed so that you can gain useful knowledge in 1 minute and go on with your day. However, sometimes you might want to search for certain tips when...
Computers · arxiv.org

Unsupervised Model Drift Estimation with Batch Normalization Statistics for Dataset Shift Detection and Model Selection

While many real-world data streams change frequently in a nonstationary way, most deep learning methods optimize neural networks on training data, and this leads to severe performance degradation when dataset shift happens. However, it is rarely possible for humans to annotate or inspect newly streamed data, and thus it is desirable to measure model drift at inference time in an unsupervised manner. In this paper, we propose a novel method of model drift estimation by exploiting statistics of batch normalization layers on unlabeled test data. To remedy possible sampling error of streamed input data, we adopt low-rank approximation to each representational layer. We show the effectiveness of our method not only on dataset shift detection but also on model selection when there are multiple candidate models among a model zoo or training trajectories in an unsupervised way. We further demonstrate the consistency of our method by comparing model drift scores between different network architectures.
Coding & Programming · towardsdatascience.com

Reinforcement Learning vs Bayesian Optimization: when to use what

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details. Optimization is the key...
Coding & Programming · arxiv.org

Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning

Gaussian processes remain popular as a flexible and expressive model class, but the computational cost of kernel hyperparameter optimization stands as a major limiting factor to their scaling and broader adoption. Recent work has made great strides combining stochastic estimation with iterative numerical techniques, essentially boiling down GP inference to the cost of (many) matrix-vector multiplies. Preconditioning -- a highly effective step for any iterative method involving matrix-vector multiplication -- can be used to accelerate convergence and thus reduce bias in hyperparameter optimization. Here, we prove that preconditioning has an additional benefit that has been previously unexplored. It not only reduces the bias of the $\log$-marginal likelihood estimator and its derivatives, but it also simultaneously can reduce variance at essentially negligible cost. We leverage this result to derive sample-efficient algorithms for GP hyperparameter optimization requiring as few as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of $\mathcal{O}(\varepsilon^{-2})$ samples to achieve error $\varepsilon$. Our theoretical results enable provably efficient and scalable optimization of kernel hyperparameters, which we validate empirically on a set of large-scale benchmark problems. There, variance reduction via preconditioning results in an order of magnitude speedup in hyperparameter optimization of exact GPs.
Coding & Programming · arxiv.org

AdaXpert: Adapting Neural Architecture for Growing Data

In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically. This will bring a critical challenge for learning: given the increasing data volume or the number of classes, one has to instantaneously adjust the neural model capacity to obtain promising performance. Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset, and thus are incapable of promptly adjusting the architectures for the changed data. To address this, we present a neural architecture adaptation method, namely Adaptation eXpert (AdaXpert), to efficiently adjust previous architectures on the growing data. Specifically, we introduce an architecture adjuster to generate a suitable architecture for each data snapshot, based on the previous architecture and the different extent between current and previous data distributions. Furthermore, we propose an adaptation condition to determine the necessity of adjustment, thereby avoiding unnecessary and time-consuming adjustments. Extensive experiments on two growth scenarios (increasing data volume and number of classes) demonstrate the effectiveness of the proposed method.
Coding & Programming · Science Daily

New data science platform speeds up Python queries

Researchers from Brown University and MIT have developed a new data science framework that allows users to process data with the programming language Python -- without paying the "performance tax" normally associated with a user-friendly language. The new framework, called Tuplex, is able to process data queries written in Python...
Software · towardsdatascience.com

Word, Subword, and Character-Based Tokenization: Know the Difference

The differences that anyone working on an NLP project should know. Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that gives machines (computers) the ability to understand written and spoken human language in the same way as human beings. NLP is almost everywhere and helps people in their daily tasks. 😍 It is such a common technology now that we often take it for granted. A few examples are spell check, autocomplete, spam detection, Alexa, or Google Assistant. NLP can be taken for granted, but one can never forget that machines work with numbers and not letters/words/sentences. So to work with the large amount of text data readily available on the internet, we need to manipulate and clean the text, which in NLP is commonly called text pre-processing.
Animals · towardsdatascience.com

Python Pandas vs. R Dplyr

Pandas for Python and Dplyr for R are the two most popular libraries for working with tabular/structured data for many Data Scientists. There is always this big and partly heated discussion on which framework is better. Honestly, does it really matter? In the end, it’s about getting the job done and both pandas and dplyr offer great tools for data wrangling. No worries, this article is not yet another comparison that tries to prove a point for either library! The purpose of this article therefore is: