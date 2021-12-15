
Science

Interpreting AUROC in Hypothesis Testing

By Editors' Picks
towardsdatascience.com
 6 days ago

Binary decisions show up in many domains, from machine learning to hypothesis testing. In binary classification, the ROC (Receiver Operating Characteristic) curve shows the trade-off between the two kinds of errors we can make. To summarize this curve as a single metric, the area under it is used....
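The snippet only says that the area under the ROC curve is used as a single-number summary. As a minimal sketch (our own illustration, not code from the article), the Python below computes AUROC two ways from hypothetical scores and labels: by tracing the ROC curve and integrating it with the trapezoidal rule, and by counting how often a randomly chosen positive outscores a randomly chosen negative. The two agree; the second form is the Mann-Whitney U statistic divided by n_pos × n_neg, which is one common way AUROC is linked to hypothesis testing (an AUROC of 0.5 means the scores do not separate the classes). The function names and example data are hypothetical.

```python
from typing import Sequence

# Hypothetical helper names for illustration; not taken from the article.


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping the threshold down through every
    distinct score. Label 1 is positive; any other label is negative."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Highest score first, so lowering the threshold admits examples in order.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Admit all examples tied at this score at once (one diagonal step).
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc_trapezoid(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def auroc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores higher than random negative), ties counted as 1/2.
    Equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores and true labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(round(auroc_trapezoid(scores, labels), 4))  # 0.8889
    print(round(auroc_pairwise(scores, labels), 4))   # 0.8889
```

Tied scores are admitted together, so they produce a diagonal ROC segment in the first function and count as half a win in the second; that is why the two still agree when ties are present.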


Related
Nature.com

Chemical hardness-driven interpretable machine learning approach for rapid search of photocatalysts

Strategies combining high-throughput (HT) methods and machine learning (ML) to accelerate the discovery of promising new materials have garnered immense attention in recent years. However, such studies rarely yield new guiding principles, largely because of the 'black-box' nature of ML models. We therefore devised an intuitive method for interpreting such opaque ML models through SHapley Additive exPlanations (SHAP) values and coupled it with the HT approach to find efficient 2D water-splitting photocatalysts. We developed a new database of 3099 2D materials consisting of metals connected to six ligands in an octahedral geometry, termed the 2DO (octahedral 2D materials) database. The ML models were constructed using a combination of composition-based and chemical hardness-based features to gain insight into thermodynamic and overall stability. Most importantly, by combining the advantages of elemental and structural features, the models distinguished the target properties of isocompositional 2DO materials that differ in bond connectivity. The interpretable ML regression, classification, and data analysis led to a new hypothesis: highly stable 2DO materials follow the HSAB (hard and soft acids and bases) principle. The most stable 2DO materials were further screened for suitable band gaps within the visible region and for band alignments with respect to standard redox potentials using the GW method, yielding 21 potential candidates. Moreover, HfSe2 and ZrSe2 were found to have high solar-to-hydrogen efficiencies that reach their theoretical limits. The proposed methodology will enable materials scientists and engineers to formulate predictive models that are accurate, physically interpretable, transferable, and computationally tractable.
CHEMISTRY
towardsdatascience.com

Classification on Hyperspectral Data

The goal of this tutorial is to apply PCA to hyperspectral data. (To learn about PCA, read the article “PCA on Hyperspectral Data”.) After reducing the dimensionality of the data with PCA, we apply a Support Vector Machine (SVM) to classify the different materials in the image.
CODING & PROGRAMMING
towardsdatascience.com

Fish Weight Prediction (Regression Analysis for beginners) — Part 1

Part 1.1: Building the ML model pipeline. Part 1.2: Analyzing the algorithms and methods. Today we will predict (estimate) the weight of a fish from its species name, vertical length, diagonal length, cross length, height, and diagonal width using linear models. I will use the top-down approach to solving the problem that I explained in the previous article. First, in Part 1.1, I will build a model; then, in Part 1.2, I will explain how each algorithm and method works. This is a regression analysis problem for beginners. Understanding the main principles and methods for building this kind of model will help you build your own ML regression models (for example, for house price prediction).
SCIENCE
towardsdatascience.com

Building a Cell Data Format For My Custom Notebook Server

A cool project I have been working on recently, and one that has me very excited, is the new Jockey notebook editor. This editor can be used to work with notebooks across a whole range of tech stacks, and it is planned to have functionality that goes well beyond the standard cell environment. Several back-end components still need to be built out, and today I am going to contribute to that by writing the back-end for our cell editor. This will also tie into the Jockey session and configuration settings I programmed earlier, so if you would like to read the article where I program all of that, you may do so here:
SOFTWARE
towardsdatascience.com

How Well Does Self-Supervised Learning Perform In The Real World?

If you have been reading recent publications on self-supervised pre-training, you might have noticed that the novel methods and techniques were mostly evaluated on ImageNet. The ImageNet dataset is large and highly diverse, and it contains an enormous number of classes. It was curated specifically to evaluate the performance of image processing models, so it is unquestionably well suited to this task. But relatively little emphasis has been placed on how these self-supervised techniques perform on other image datasets, namely datasets that are uncurated and contain large amounts of random images. In their paper “Self-supervised Pretraining of Visual Features in the Wild”, Goyal et al. set out to investigate whether the reported performance of self-supervised pre-training techniques holds when models are trained on a set of random, uncurated images.
COMPUTERS
towardsdatascience.com

Product Case Interviews Dos and Don’ts for Data Scientists

How to make a great impression in product case interviews. In this blog post, we will look at one of the interviews specific to data scientists, especially product data scientists at tech companies: the product case interview. If you prefer watching to reading, you can also check out my video on the same subject on my channel.
ECONOMY
towardsdatascience.com

Chipotle Site Selection Using CBG Clustering — Part 1

Can geospatial data help predict where Chipotle will open new locations? All analysis in this article uses SafeGraph CBG data and Patterns data. Please cite SafeGraph for any reference to this article. This article is the first in a series of two articles that will revolve around...
RESTAURANTS
