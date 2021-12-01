
“My data drifted. What’s next?” How to handle ML model drift in production.

An introductory overview of the possible steps. “I have a model in production, and the data is drifting. How to react?” That is a question we often get. This data drift might be the only signal. You are predicting something, but don’t know the facts yet. Statistical change in model inputs...
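
One way to make "statistical change in model inputs" concrete is to compare a reference sample of a feature with a recent production sample. The sketch below is an illustration, not the article's method: it computes the two-sample Kolmogorov-Smirnov statistic in plain Python. The names `ks_statistic` and `has_drifted` and the `threshold` value of 0.2 are assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for value in ref + cur:
        ref_cdf = bisect_right(ref, value) / len(ref)
        cur_cdf = bisect_right(cur, value) / len(cur)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds an illustrative (hypothetical) cut-off."""
    return ks_statistic(reference, current) > threshold
```

A statistic of 0 means the two samples have identical empirical distributions, and 1 means they do not overlap at all. A sensible threshold depends on sample size and on how much change the model can tolerate.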

InfoQ.com

QCon Plus ML Panel Discussion: ML in Production - What's Next?

The recent QCon Plus online conference featured a panel discussion titled "ML in Production - What's Next?" Some key takeaways were that many ML projects fail in production because of poor engineering infrastructure and a lack of intra-disciplinary communication, and that both model explainability and ML for edge computing are important technologies that are still not mature.
COMPUTERS
cell.com

The non-linear impact of data handling on network diffusion models

An overview of basic data handling and common pitfalls. A framework for analyzing the impact of data handling on network diffusion models. Results showing the impact of data handling on various standard network models. The bigger picture. Many computational models rely on real-world data, and the steps required in moving...
COMPUTERS
towardsdatascience.com

How to track statistics on all queries in your Postgres database to prevent slow queries or bottlenecks

Tweak your database performance to perfection with the crucial statistics that this extension offers you. Have you ever wondered why some parts of your application are suddenly very slow? Can it be your database? How would you find out? Wouldn’t it be nice to have an extension that tracks statistics over all queries that it executes so that you can analyze your database performance and clear up bottlenecks?
SOFTWARE
towardsdatascience.com

What’s in Store for the Future of the Modern Data Stack?

Bob Muglia, the former CEO of Snowflake, discusses what’s next for the tooling and technologies powering data analytics and engineering. A few weeks ago, I had the opportunity to chat with Bob Muglia, former CEO of Snowflake and one of the pioneers of the modern data stack, to learn about his predictions for the future of our industry.
MARKETS
towardsdatascience.com

Serving ML Models with gRPC

Most people who are looking to put their newly trained ML model into production turn to REST APIs. Here’s why I think you should consider using gRPC instead. What’s wrong with REST? Nothing! The main benefit of REST APIs is their ubiquity. Every major programming language has a way of making HTTP clients and servers. And there are several existing frameworks for wrapping ML models with REST APIs (e.g. BentoML, TF Serving). But if your use case doesn’t fit one of those tools (and even if it does), you may find yourself wanting to write something a little more custom. And the same thing that makes REST APIs versatile can also make them difficult to work with.
CODING & PROGRAMMING
towardsdatascience.com

Setting up a Text Summarisation Project (Part 1)

How to establish a baseline with a no-ML “model”. This is the first part of a tutorial on setting up a text summarisation project. For more context and an overview of this tutorial, please refer back to the introduction. In this part we will establish a baseline using a very...
CODING & PROGRAMMING
towardsdatascience.com

Lessons Learned from Integrating the Human for Data Analytics

Reflections from 6 years of developing human-in-the-loop (HITL) tools. For most technical folks, coding things up is easy. If there is a tool you are not happy with, you can hack one up yourself without much of a hassle. If you want to extract data, you can quickly write up some regular expressions. If you want to combine some CSV files together, you can quickly create the Python script for that. If you need to debug a program, you know the tools and the ins and outs of debugging tools to be able to diagnose the fault of your programs.
SOFTWARE
towardsdatascience.com

How to Build a Poisson Hidden Markov Model Using Python and Statsmodels

A Poisson Hidden Markov Model is a mixture of two regression models: A Poisson regression model which is visible and a Markov model which is ‘hidden’. In a Poisson HMM, the mean value predicted by the Poisson model depends on not only the regression variables of the Poisson model, but also on the current state or regime that the hidden Markov process is in.
CODING & PROGRAMMING
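
To illustrate the Poisson HMM teaser above, here is a minimal simulation sketch in plain Python. It assumes a log link and one coefficient vector per hidden state, which is one common way to let the Poisson mean depend on both the regression variables and the current regime; the article's statsmodels implementation may be set up differently. All names (`poisson_mean`, `next_state`, `sample_poisson`, `simulate`) are hypothetical.

```python
import math
import random


def poisson_mean(x, beta_by_state, state):
    """Log-link Poisson mean using the coefficient vector of the current hidden state."""
    beta = beta_by_state[state]
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def next_state(state, transition, rng):
    """Draw the next hidden state from the transition-matrix row of the current state."""
    return rng.choices(range(len(transition)), weights=transition[state])[0]


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means in this sketch."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate(xs, beta_by_state, transition, seed=0):
    """Simulate hidden states and observed counts, starting in state 0 (an assumption)."""
    rng = random.Random(seed)
    state, states, counts = 0, [], []
    for x in xs:
        states.append(state)
        counts.append(sample_poisson(poisson_mean(x, beta_by_state, state), rng))
        state = next_state(state, transition, rng)
    return states, counts
```

For example, `simulate([[1.0, 0.5]] * 20, [[0.1, 0.2], [1.0, 0.4]], [[0.9, 0.1], [0.2, 0.8]])` returns 20 hidden states and 20 counts, with higher counts expected while the chain is in state 1. Only the counts and regression variables would be visible in real data; the states are what the model has to infer.
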
towardsdatascience.com

Examples of Multi-Cursor for working with Data

How to save time and nerves when coding for data analysis in VS Code using Multi-Cursor and selection features. Working with data can be very dynamic, with repeated forward and backward motions through your code to adjust and copy snippets, introducing new assumptions, filters or steps that have an impact much further down. This happens notably often when doing EDA or developing processing pipelines, when you have to find your path to the solution somewhat experimentally, asking yourself: How would this look for a different variable? How does it change if trimmed to the 99th percentile? Is it faster, and does it still work, if I filter two steps earlier? This process is called discovery for a reason, but if we’re honest with ourselves, that’s what we like about it. New ideas can come very fast in this process. However, changing the code to accommodate these ideas can be a grind. This is where I see the benefit of Multi-Cursor and advanced selection features.
CODING & PROGRAMMING
towardsdatascience.com

How to add gradient background to Python plots

This article discusses a simple way to add a gradient background to plots in Python. We may at times need to add a multi-color background to plots to highlight a few sections of the plot. For example, we may highlight the periods of recession and recovery in red and green respectively. In this article, we’ll discuss a simple yet effective way to do this in Python. For Tableau users, this is similar to adding reference bands.
MARKETS
towardsdatascience.com

Speed up Hugging Face Training Jobs on AWS by Up to 50% with SageMaker Training Compiler

Deep neural networks have been steadily becoming larger every year, as hardware and algorithmic advancements have allowed for neural networks composed of hundreds of billions of parameters. Transformer models in particular, often used for Natural Language Processing (NLP) tasks, have seen the number of parameters used soar in recent years. For example, Bidirectional Encoder Representations from Transformers (BERT) large proposed in 2018 has over 340 million parameters, and the Switch Transformer proposed in 2021 has 1.6 trillion parameters.
COMPUTERS
High Point Enterprise

How HPE Ezmeral ML Ops addresses trends in data science and machine learning

In a recent report (Data Science and Machine Learning Trends You Can’t Ignore, 27 Sept 2021), Gartner® discussed key trends influencing the future of machine learning (ML) and data science (DS). This blog shows you how HPE Ezmeral ML Ops is already providing capabilities to allow you to stay ahead of these trends.
SOFTWARE
towardsdatascience.com

How to Easily Cluster Textual Data in Python

With this method, you’ll never have to manually cluster survey answers ever again. Text data is notoriously annoying, and I really don’t enjoy working with it. Especially survey data: whose bright idea was it to let people type whatever they want? In most research companies, some poor person will have...
CODING & PROGRAMMING
towardsdatascience.com

You’re Likely Learning Data Science Wrong, Here’s How to Learn Instead

Most young, aspiring data scientists I’ve talked to are really enticed by the accolades, just like I was. I’ve spent a lot of time researching the perfect courses, programs, and achievements so that I can have a suite of badges that look impressive to others. The problem? You impress the...
COMPUTERS
towardsdatascience.com

Getting Started with Trino Query Engine

A step-by-step tutorial on how to install Trino, connect it to a SQL server, and write a simple Python client. Trino is a distributed, open-source SQL query engine for big data analytics. It runs queries in a distributed, parallel fashion, which makes it fast. Trino can run both on-premises and in cloud environments such as Google, Azure, and Amazon.
SOFTWARE
towardsdatascience.com

A Complete 26 Week Course to Learn Python for Data Science in 2022

Learn most of the Python stuff you need for data science in 26 weeks. Python is a programming language used by many data scientists to clean data, make visualizations and build models. Learning Python for data science has never been easier — there are tons of free guides and tutorials out there that you can use to your advantage.
CODING & PROGRAMMING
towardsdatascience.com

Self-Training Classifier: How to Make Any Algorithm Behave Like a Semi-Supervised One

An easy Python implementation of Self-Training using standard classification algorithms from the Sklearn library. Semi-Supervised Learning combines labeled and unlabeled examples to expand the available data pool for model training. As a result, we can improve model performance and save a lot of time and money by not having to label thousands of examples manually.
CODING & PROGRAMMING
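
The self-training idea in the teaser above can be sketched without any library. The loop below is an illustration under stated assumptions, not the article's Sklearn code: it swaps in a toy nearest-centroid classifier on one-dimensional features, and the confidence score, the `threshold` of 0.8, and all function names are hypothetical choices for the example. Each round, predictions on unlabeled points that are confident enough are added to the training set as pseudo-labels.

```python
def centroids(points, labels):
    """Mean of the labeled points per class (1-D features for brevity)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(x, cents):
    """Return (label, confidence) from the two nearest centroids; needs at least two classes."""
    ranked = sorted((abs(x - c), label) for label, c in cents.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    total = d1 + d2
    # Confidence is 0.5 when x is equidistant and approaches 1.0 near a centroid.
    return label, (d2 / total if total else 0.5)


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label confident points and refit until nothing new qualifies."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(x, cents)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = [x for x, _ in remaining]
    return centroids(xs, ys), pool
```

The function returns the final class centroids and the points that never became confident enough to label. Ambiguous points stay in the pool rather than being forced into a class, which limits how far wrong pseudo-labels can propagate.
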
towardsdatascience.com

AI Explainability Requires Robustness

How robustness to adversarial input perturbations affects model interpretability. Due to their opaqueness, a great deal of mystique surrounds the apparent power of deep neural networks. Consequently, we often want to gain better insight into our models through explanations of their behavior. Meanwhile, as we will see, the existence of adversarial examples — known to plague typical neural networks — implies that explanations will often be unintelligible. Luckily, recent effort seeking to find ways to train so-called robust models reveals a pathway to more interpretable models; namely, models that are trained to be robust to adversarial input perturbations exhibit higher-quality explanations.
COMPUTERS
towardsdatascience.com

Defining the Moving Average Model for Time Series Forecasting in Python

Explore the moving average model and discover how we can use the ACF plot to identify the right MA(q) model for our time series. One of the foundational models for time series forecasting is the moving average model, denoted as MA(q). This is one of the basic statistical models that is a building block of more complex models such as the ARMA, ARIMA, SARIMA and SARIMAX models. A deep understanding of the MA(q) is thus a key step prior to using more complex models to forecast intricate time series.
SOFTWARE
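
As a rough illustration of the moving average teaser above, the sketch below simulates an MA(q) process and computes its sample autocorrelation function in plain Python. It rests on the standard result that the autocorrelation of an MA(q) process is zero beyond lag q, which is what makes the ACF plot useful for choosing q. The function names and the standard normal noise are assumptions for the example, not taken from the article.

```python
import random


def simulate_ma(thetas, n, seed=0):
    """Simulate y_t = e_t + theta_1 * e_(t-1) + ... + theta_q * e_(t-q) with N(0, 1) noise."""
    rng = random.Random(seed)
    q = len(thetas)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + q)]
    return [
        noise[t] + sum(theta * noise[t - lag] for lag, theta in enumerate(thetas, start=1))
        for t in range(q, n + q)
    ]


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    variance = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / variance
        for lag in range(max_lag + 1)
    ]
```

For an MA(1) series such as `simulate_ma([0.8], 5000)`, the lag-1 autocorrelation should sit near 0.8 / (1 + 0.8²) ≈ 0.49, while higher lags should hover around zero. A common rule of thumb treats values inside ±1.96/√n as indistinguishable from zero.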

