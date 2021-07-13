

Explainable AI (XAI) with SHAP - Multi-Class Classification Problem

By Editors' Picks
towardsdatascience.com
 13 days ago

Practical guide for XAI analysis with SHAP for a multi-class classification problem. Model explainability has become a basic part of the machine learning pipeline. Keeping a machine learning model as a “black box” is not an option anymore. Luckily, analytical tools such as LIME, ExplainerDashboard, Shapash, Dalex, and more are evolving rapidly and becoming more popular. In a previous post we explained how to use SHAP for a regression problem. This guide provides a practical example of how to use and interpret the open-source Python package SHAP for XAI analysis in a multi-class classification problem, and how to use it to improve the model.
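
To make the idea of per-class SHAP values concrete, here is a minimal sketch that does not use the shap package at all. It computes exact Shapley values by brute force for a hypothetical three-class softmax model; the weights, biases, input, and all-zero baseline are made-up assumptions for illustration, not anything taken from the article. The point it shows is that in a multi-class problem each class gets its own set of feature attributions.

```python
import itertools
import math

# Hypothetical 3-class softmax model over 3 features (weights are made up).
WEIGHTS = [
    [1.0, -0.5, 0.2],   # class 0
    [-0.3, 0.8, 0.1],   # class 1
    [0.0, 0.1, -0.9],   # class 2
]
BIASES = [0.1, 0.0, -0.1]


def predict_proba(x):
    """Return softmax class probabilities for one feature vector."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(WEIGHTS, BIASES)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shapley_values(x, baseline):
    """Exact Shapley values: one row per class, one column per feature.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    n_classes = len(WEIGHTS)
    phi = [[0.0] * n for _ in range(n_classes)]

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict_proba(mixed)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                without = value(set(subset))
                with_i = value(set(subset) | {i})
                for c in range(n_classes):
                    phi[c][i] += weight * (with_i[c] - without[c])
    return phi


x = [1.5, -0.2, 0.7]          # hypothetical instance to explain
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point
phi = shapley_values(x, baseline)
for c, row in enumerate(phi):
    print(f"class {c}:", [round(v, 4) for v in row])
```

For each class, the attributions sum to the difference between the predicted probability at `x` and at the baseline. Because the class probabilities always sum to 1, each feature's attributions sum to zero across the classes. Brute-force enumeration grows exponentially with the number of features, which is why practical tools rely on faster, model-specific algorithms.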

Related
Software
arxiv.org

Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining

Many real-world applications involve the use of Optical Character Recognition (OCR) engines to transform handwritten images into transcripts on which downstream Natural Language Processing (NLP) models are applied. In this process, OCR engines may introduce errors, and inputs to downstream NLP models become noisy. Although pre-trained models achieve state-of-the-art performance in many NLP benchmarks, we prove that they are not robust to noisy texts generated by real OCR engines. This greatly limits the application of NLP models in real-world scenarios. In order to improve model performance on noisy OCR transcripts, it is natural to train the NLP model on labelled noisy texts. However, in most cases there are only labelled clean texts. Since there are no handwritten pictures corresponding to the text, it is impossible to directly use the recognition model to obtain noisy labelled data. Human resources can be employed to copy texts and take pictures, but this is extremely expensive considering the size of data for model training. Consequently, we are interested in making NLP models intrinsically robust to OCR errors in a low-resource manner. We propose a novel robust training framework which 1) employs simple but effective methods to directly simulate natural OCR noises from clean texts, 2) iteratively mines the hard examples from a large number of simulated samples for optimal performance, and 3) employs a stability loss to make our model learn noise-invariant representations. Experiments on three real-world datasets show that the proposed framework boosts the robustness of pre-trained models by a large margin. We believe that this work can greatly promote the application of NLP models in actual scenarios, although the algorithm we use is simple and straightforward. We make our code and three datasets publicly available.
Software
aithority.com

Top Applications of Artificial Intelligence (AI) in Business

More businesses are now embracing Artificial Intelligence, causing the technology to show signs of acceleration. According to IBM’s 2021 Global AI Adoption Index, one-third of companies currently use AI in some way, while 43% are exploring it. Experts believe that the accelerated AI rollout can partly be attributed to the pandemic. In addition, access to AI development companies, consultation and development services, and advances in AI tools have made the technology more accessible to most companies.
Python
towardsdatascience.com

If you thought predictive analytics in HR would be smooth sailing after mastering machine learning, think again!

Some time back, I wouldn’t have imagined writing on topics like predictive analytics and machine learning. It all used to seem too magical, far too difficult, and nearly impossible to get anywhere close to, especially in the world of HR! This write-up aims to uncover some golden lessons, shed light in some dark places and hopefully help catapult your success if you’re a first-timer in this area of work within HR analytics. The wisdom here comes from learning the hard way after being thrown into the fire pit, actually having to make predictions for a business problem. If you’re in such a position, or just curious but procrastinating until such a task becomes important and urgent, with no prior knowledge of where to start, how to make progress, what is needed, and how everything can come together at the finish line, rest assured that it is definitely doable. Just watch out for the hidden pitfalls and prepare to make your move.
Data Privacy
towardsdatascience.com

Ethical Data Work: Lessons on Technical Data Protection

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we can’t validate every author’s contribution. The author of this post reiterates that he is not giving legal advice. See our Reader Terms for details. Introduction. Being data scientists...
Podcast
towardsdatascience.com

What Is Your Next Goal as a Data Scientist?

No matter where you are in the great universe of data science, you’ve probably set yourself a goal (or a few) for the next couple of months. Here at the Variable, we want to do more than just present our favorite recent posts—we’d like to also help you on your journey. So this week, choose your own adventure. We’re sure you’ll find at least one pick—or, who knows? Seven?—that fit your current needs.
Software
towardsdatascience.com

A Step-by-Step Guide to Speech Recognition and Audio Signal Processing in Python

Speech is the primary form of human communication and is also a vital part of understanding behavior and cognition. Speech Recognition in Artificial Intelligence is a technique deployed in computer programs that enables them to understand spoken words. Like images and videos, sound is also an analog signal that humans...
Software
towardsdatascience.com

How to explain Machine Learning to a lay person

Here’s an explanation I wrote in 2014, and still use today. Advancements in computer technology over the past decades have meant that the collection of electronic data has become more commonplace in most fields of human endeavor. Many organizations now find themselves holding large amounts of data spanning many prior years. This data can relate to people, financial transactions, biological information, and much, much more.
Software
towardsdatascience.com

Deploying a Machine Learning Model as an API on Red Hat OpenShift Container Platform

From Source Code in a GitHub repository with Flask, Scikit-Learn and Docker. Machine and Deep Learning applications have become more popular than ever. As we have seen previously, the enterprise Kubernetes platform, Red Hat OpenShift Container Platform, helps data scientists and developers really focus on the value, using their preferred tools, by putting additional security controls in place and making environments much easier to manage. It provides the ability to deploy, serve, secure and optimize machine learning models on enterprise-scale, highly available clusters, allowing data scientists to focus on the value of data. We can install Red Hat OpenShift clusters in the cloud using managed services (Red Hat OpenShift on IBM Cloud, Red Hat OpenShift Service on AWS, Azure Red Hat OpenShift) or we can run them on our own by installing from another cloud provider (AWS, Azure, Google Cloud, Platform agnostic). We also have the possibility to create clusters on supported infrastructure (Bare Metal, IBM Z, Power, Red Hat OpenStack, Red Hat Virtualization, vSphere, Platform agnostic) or a minimal cluster on our laptop, which is useful for local development and testing (MacOS, Linux, Windows). Lots of freedom here.
Software
arxiv.org

One-Class Classification for Wafer Map using Adversarial Autoencoder with DSVDD Prior

Recently, demand for semiconductors has exploded in virtual reality, smartphones, wearable devices, the internet of things, robotics, and automobiles. Semiconductor manufacturers want to make semiconductors with high yields. To do this, manufacturers conduct many quality assurance activities. Wafer map pattern classification is a typical way of quality assurance. The defect pattern on the wafer map can tell us which process has a problem. Most of the existing wafer map classification methods are based on supervised methods. The supervised methods tend to have high performance, but they require extensive labor and expert knowledge to produce labeled datasets with a balanced distribution in mind. In the semiconductor manufacturing process, it is challenging to get defect data with balanced distribution. In this paper, we propose a one-class classification method using an Adversarial Autoencoder (AAE) with Deep Support Vector Data Description (DSVDD) prior, which generates random vectors within the hypersphere of DSVDD. We use the WM-811k dataset, which consists of real-world wafer maps. We compare the F1 score performance of our model with DSVDD and AAE.
Coding & Programming
techgig.com

Apply Python in AI: swipe right (or not?)

Python's popularity in the development of AI-based applications has skyrocketed. Its features and accessibility could be the reason for this. There are many algorithms and a lot of coding behind AI and machine learning, but this problem appears to be quite simple to solve with Python. When it comes to AI and...
Coding & Programming
towardsdatascience.com

A Beginner’s Guide to Python for Data Science

11 Python packages you should learn as a data scientist. Data scientists perform a large variety of tasks on a daily basis — data collection, pre-processing, analysis, machine learning, and visualization. If you are a beginner in the data science industry, you might have taken a course in Python or...
Animals
towardsdatascience.com

10x times faster Pandas Apply in a single line change of code

Speed-up Pandas processing workflow with Swifter Package. Pandas is one of the popular Python packages among the data science community, as it offers a vast API and flexible data structures for data explorations and visualization. When it comes to handling and processing large-size datasets, it fails. One can load and...
Coding & Programming
towardsdatascience.com

Event-driven architecture and semantic coupling

Event-driven architecture (EDA) is key to building loosely coupled applications (microservices or not). It is an architectural style (see here and here) where components communicate asynchronously by emitting and reacting to events. Def. 1: An event is something that has happened in the past. An event notification (or say event...
Coding & Programming
towardsdatascience.com

14 Must-Know pip Commands For Data Scientists and Engineers

Exploring some of the most useful pip commands for everyday programming. Even though the Python standard library offers a wide range of tools and functionality, developers still need to use external packages (outside the standard library) that are widely available. Most programming languages have their own standard package managers that allow users to manage the dependencies of their projects.
Coding & Programming
towardsdatascience.com

Removing Duplicate or Similar Images in Python

One of the most naive approaches to detecting duplicate images would be to compare pixel by pixel by checking that all values are the same. However, this becomes very inefficient when testing a large number of images. A second very common approach would be to extract the cryptographic hash of...
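
As a rough illustration of the hash-based idea this teaser mentions, the sketch below groups byte-identical files by their SHA-256 digest using only the standard library. The `find_exact_duplicates` helper and the in-memory "images" are hypothetical stand-ins, not code from the linked article, and this approach only catches exact copies, not merely similar images.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images):
    """Group image names whose raw bytes share the same SHA-256 digest."""
    by_digest = defaultdict(list)
    for name, data in images.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests that more than one image maps to.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for image file contents.
images = {
    "a.png": b"\x89PNG fake pixels 1",
    "b.png": b"\x89PNG fake pixels 2",
    "copy_of_a.png": b"\x89PNG fake pixels 1",
}
duplicates = find_exact_duplicates(images)
print(duplicates)
```

Hashing each file once avoids comparing every pair of images pixel by pixel, but changing a single byte (for example by re-encoding or resizing) produces a completely different digest.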
Software
netapp.com

Spectrum Scale File system (GPFS) hang Problem

Spectrum Scale 5.1.1 Replicated File System on 2 NetApp Clusters (FAS8200) When One of the NetApp SANs goes offline (either from NetApp classic view or removed mapping from SAN switch), the GPFS cluster goes to HANG state and only recovers when the SAN becomes online again. When the SAN is...
Technology
towardsdatascience.com

The Top Five Machine Learning Methods to Forecast Demand for New Products

And why XGBoost performed so well in a recent study. Forecasting future fashion demands is valuable and complicated. It’s valuable because of the opportunity cost to a retailer being prepared, or not, to sell the next high-demand item. Moreover, if they mis-predict demand, they must pay for the merchandise and probably pay in some form to liquidate unsold items.
Science
towardsdatascience.com

How the Data Scientist Role Could Evolve

And What a Data Scientist Can Learn to Face Changing Technology. As computers get faster and data science tools get better, less of a data scientist's job will be focused on optimizing traditional ML models (non-neural-network models). Many companies are pursuing AutoML frameworks that can perform a lot of feature engineering and model optimization for all sorts of problems. All the major cloud providers (and a bunch of startups) offer out-of-the-box transfer learning models for computer vision, and many offer AutoML services for both tabular and NLP models. These are services where you upload your data and the best model gets spit out after you click train. Ironically enough, data scientists who are supposed to create ML/AI are finding that these new tools are automating portions of their own jobs.
Data Privacy
towardsdatascience.com

The One Piece of Advice That Changed My Life as a Data Scientist

Programming is worthless if you can’t create value. It took me three years to learn the core high-end skills needed for the progression of my data science career — Machine learning, SQL, Python, and data visualization. The importance of getting the necessary skillsets in data science cannot be over-emphasized as professionals need to keep up with the latest improvements around programming and computing.