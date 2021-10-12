

Predicting the Stereoselectivity of Chemical Transformations by Machine Learning

By Justin Li, Dakang Zhang, Yifei Wang, Christopher Ye, Hao Xu, Pengyu Hong
arxiv.org
 10 days ago

Stereoselective reactions (both chemical and enzymatic) have been essential to the origin of life, evolution, human biology, and medicine. Since the late 1960s, there have been numerous successes in the exciting new frontier of asymmetric catalysis. However, most industrial and academic asymmetric catalysis nowadays do follow...



Related
Gamespot

Become An AI And Machine Learning Expert With This Training Package

With machine learning, programs and code can learn from repeated iterations and improve themselves in a supervised or unsupervised context. This allows for an accelerated progression so that your projects will end up better without having to spend as much time working with brute-force coding techniques. The problem most people end up facing is that integrating AI and machine learning aspects into your projects and code can seem difficult and opaque without prior experience. However, with the proper teacher, the prospect becomes a lot less intimidating.
COMPUTERS
ScienceAlert

A Physicist Quantified The Amount of Information in The Entire Observable Universe

In attempts to understand the very nature of our reality, physicists sure have some mind-bending theories. Like what if information is a tangible and fundamental aspect of physical reality itself – alongside matter and energy? Or, alternatively, what if information is the fifth state of matter? Information is, after all, something all matter and energy measurably possess. The rules that govern their existence, like their mass, speed, or charge, are all bits of information they contain. So to allow experimental probing of such ideas, physicist Melvin Vopson from the University of Portsmouth in the UK estimated how much information a single elementary...
ASTRONOMY
arxiv.org

Implementation of machine learning techniques to predict impact parameter and transverse spherocity in heavy-ion collisions at the LHC

Machine learning techniques have been quite popular recently in the high-energy physics community and have led to numerous developments in this field. In heavy-ion collisions, one of the crucial observables, the impact parameter, plays an important role in the final-state particle production. Because it is extremely small (i.e., of the order of a few fermi), it is almost impossible to measure the impact parameter in experiments. In this work, we implement the ML-based regression technique via Gradient Boosting Decision Trees (GBDT) to obtain a prediction of the impact parameter in Pb-Pb collisions at $\sqrt{s_{NN}}$ = 5.02 TeV using A Multi-Phase Transport (AMPT) model. After its successful implementation in small collision systems, transverse spherocity, an event shape observable, holds an opportunity to reveal more about the particle production in heavy-ion collisions as well. In the absence of any experimental exploration in this direction at the LHC yet, we suggest an ML-based regression method to estimate centrality-wise transverse spherocity distributions in Pb-Pb collisions at $\sqrt{s_{NN}}$ = 5.02 TeV by training the model with minimum bias collision data. Throughout this work, we have used a few final-state observables as the input to the ML model, which could be easily made available from collision data. Our method seems to work quite well, as we see good agreement between the simulated true values and the predicted values from the ML model.
SCIENCE
arxiv.org

Data-Driven Modeling of S0 -> S1 Excitation Energy in the BODIPY Chemical Space: High-Throughput Computation, Quantum Machine Learning, and Inverse Design

Derivatives of BODIPY are popular fluorophores due to their synthetic feasibility, structural rigidity, high quantum yield, and tunable spectroscopic properties. While the characteristic absorption maximum of BODIPY is at 2.5 eV, combinations of functional groups and substitution sites can shift the peak position by +/- 1 eV. Time-dependent long-range corrected hybrid density functional methods can model the lowest excitation energies offering a semi-quantitative precision of +/- 0.3 eV. Alas, the chemical space of BODIPYs stemming from combinatorial introduction of -- even a few dozen -- substituents is too large for brute-force high-throughput modeling. To navigate this vast space, we select 77,412 molecules and train a kernel-based quantum machine learning model providing < 2% hold-out error. Further reuse of the results presented here to navigate the entire BODIPY universe comprising over 253 giga (253 x 10^9) molecules is demonstrated by inverse-designing candidates with desired target excitation energies.
CHEMISTRY
IN THIS ARTICLE
#Stereoselectivity#Chemical Physics#Chemical Reactions#Chemical Transformations#Random Forest#Lg
arxiv.org

Modeling of Pan Evaporation Based on the Development of Machine Learning Methods

For effective planning and management of water resources and implementation of the related strategies, it is important to ensure proper estimation of evaporation losses, especially in regions that are prone to drought. Changes in climatic factors, such as temperature, wind speed, sunshine hours, humidity, and solar radiation, can have a significant impact on the evaporation process. As such, evaporation is a highly non-linear, non-stationary process, and can be difficult to model based on climatic factors, especially in different agro-climatic conditions. The aim of this study, therefore, is to investigate the feasibility of several machine learning (ML) models (conditional random forest regression, Multivariate Adaptive Regression Splines, Bagged Multivariate Adaptive Regression Splines, Model Tree M5, K-nearest neighbor, and the weighted K-nearest neighbor) for modeling the monthly pan evaporation estimation. This study proposes the development of newly explored ML models for modeling evaporation losses in three different locations over the Iraq region based on the available climatic data in such areas. The evaluation of the performance of the proposed model based on various evaluation criteria showed the capability of the proposed weighted K-nearest neighbor model in modeling the monthly evaporation losses in the studied areas with better accuracy when compared with the other existing models used as a benchmark in this study.
SCIENCE
arxiv.org

Graph-Based Machine Learning Improves Just-in-Time Defect Prediction

The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 86.25$\%$. This represents an increase of as much as 55.4$\%$ over the state-of-the-art in JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.
CODING & PROGRAMMING
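
For readers unfamiliar with the metric quoted in the defect-prediction abstract above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that definition only; the function name `f1_score` and the counts are hypothetical illustrations and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        # With no true positives, precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
score = f1_score(true_positives=80, false_positives=20, false_negatives=10)
print(f"F1 = {score:.4f}")
```

Because it combines precision and recall, F1 is often preferred over plain accuracy when defect-prone changes are much rarer than clean ones.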
arxiv.org

Machine-learning based determination of the stacking order of bilayer graphene

With the increasing interest in twisted bilayer graphene (TBLG) in recent years, fast, reliable, and non-destructive methods to precisely determine the stacking angle are required. Raman spectroscopy potentially provides such a method, given the large amount of information about the state of the graphene that is stored in its Raman spectrum. However, changes in the Raman spectra induced by the stacking order can be very subtle, thus making the angle identification tedious. In this work, we propose the use of machine learning (ML) analysis techniques for the automated classification of the Raman spectrum of TBLG into a selected range of stacking angles. The ML classification proposed here has low computational demands, providing fast and accurate results with over 99% agreement with the manual labelling of the spectra. The flexibility and non-invasive nature of the Raman measurements, paired with the predictive accuracy of the ML, is expected to facilitate the exploration of the nascent research of TBLG. Moreover, the present work showcases how the currently available open-source tools facilitate the study and integration of ML-based techniques requiring only minimal programming knowledge.
CHEMISTRY
arxiv.org

Machine learning assisted GaAsN circular polarimeter

We demonstrate the application of a two-stage machine learning algorithm that correlates the electrical signals from a GaAs$_x$N$_{1-x}$ circular polarimeter with the intensity, degree of circular polarization and handedness of an incident light beam. Specifically, we employ a multimodal logistic regression to discriminate the handedness of light and a 6-layer neural network to establish the relationship between the input voltages, the intensity and degree of circular polarization. We have developed a particular neural network training strategy that substantially improves the accuracy of the device. The algorithm was trained and tested on theoretically generated photoconductivity and on photoluminescence experimental results. Even for a small training experimental dataset (70 instances), it is shown that the proposed algorithm correctly predicts linear, right and left circularly polarized light, misclassifying less than $1.5\%$ of the cases, and attains an accuracy larger than $97\%$ in the vast majority of the predictions ($92\%$) for intensity and degree of circular polarization. These numbers are significantly improved for the larger theoretically generated datasets (4851 instances). The algorithm is versatile enough that it can be easily adjusted to other device configurations where a map needs to be established between the input parameters and the device response. Training and testing data files as well as the algorithm are provided as supplementary material.
COMPUTERS
YOU MAY ALSO LIKE
National Science Foundation (press release)

Machine learning uncovers 'genes of importance' in agriculture

Approach using evolutionary principles identifies genes that enable plants to grow with less fertilizer. Machine learning, a type of artificial intelligence used to detect patterns in data, can pinpoint "genes of importance" that help crops grow with less fertilizer, according to a U.S. National Science Foundation-funded study published in Nature Communications. It can also predict additional traits in plants and disease outcomes in animals, illustrating its applications beyond agriculture.
AGRICULTURE
arxiv.org

Opportunities for Machine Learning to Accelerate Halide Perovskite Commercialization and Scale-Up

While halide perovskites attract significant academic attention, examples of at-scale industrial production are still sparse. In this perspective, we review practical challenges hindering the commercialization of halide perovskites, and discuss how machine-learning (ML) tools could help: (1) active-learning algorithms that blend institutional knowledge and human expertise could help stabilize and rapidly update baseline manufacturing processes; (2) ML-powered metrology, including computer imaging, could help narrow the performance gap between large- and small-area devices; and (3) inference methods could help accelerate root-cause analysis by reconciling multiple data streams and simulations, focusing research effort on areas with highest probability for improvement. We conclude that to satisfy many of these challenges, incremental -- not radical -- adaptations of existing ML and statistical methods are needed. We identify resources to help develop in-house data-science talent, and propose how industry-academic partnerships could help adapt "ready-now" ML tools to specific industry needs, further improve process control by revealing underlying mechanisms, and develop "gamechanger" discovery-oriented algorithms to better navigate vast materials combination spaces and the literature.
ENGINEERING
The Motley Fool

What Is Machine Learning?

Terms like artificial intelligence and machine learning are often used interchangeably, but they refer to slightly different technologies. Specifically, machine learning is a subtype of artificial intelligence. In this Backstage Pass video, which aired Sept. 27, 2021, Motley Fool contributors Toby Bordelon, John Bromels, Jose Najarro, and Trevor Jennewine share...
COMPUTERS
arxiv.org

Machine learning for percolation utilizing auxiliary Ising variables

Machine learning for phase transitions has received intensive research interest in recent years. However, its application in percolation still remains challenging. We propose an auxiliary Ising mapping method for machine learning study of the standard percolation as well as a variety of statistical mechanical systems in correlated percolation representations. We demonstrate that unsupervised machine learning is able to accurately locate the percolation threshold, independent of the spatial dimension of the system or the type of phase transition, which can be first order or continuous. Moreover, we show that, by neural network machine learning, auxiliary Ising configurations for different universalities can be classified with a high confidence level. Our results indicate that the auxiliary Ising mapping method, despite its simplicity, can advance the application of machine learning in statistical and condensed-matter physics.
COMPUTERS
towardsdatascience.com

A Machine Learning Algorithm for Predicting Outcomes of MLB Games

Raw data is impervious to cognitive bias — devoid of human emotion and predisposition. Guided by this precept, Billy Beane’s 2001 Oakland A’s designed a new statistical blueprint for championship-aspiring ballclubs, and precipitated a data analytics revolution that spread like wildfire throughout all facets of professional sports. As Beane discovered, baseball’s wealth of data makes it conducive to predictive analytics. The problem I have chosen to explore is employing machine learning to predict outcomes of individual games. My final logistic regression and random forest models achieved test accuracies among the higher levels found in existing scientific literature, and outperformed the Vegas betting odds on a two-year test period. My results confirm that the betting odds are highly efficient; however, my findings also suggest that machine learning methods may provide an incremental informational edge over the wisdom of the masses, which could translate to meaningful insights in the long run. My model also shed light on the importance of team pitching, namely that the strength of a team’s bullpen is much more indicative of its ability to win than the quality of its offense. Overall, the underlying methodologies and insights drawn from my algorithm may be useful to the strategic decision making of Major League Baseball front offices, or various other sports analytics entities.
MLB
arxiv.org

Predicting the Efficiency of CO$_2$ Sequestering by Metal Organic Frameworks Through Machine Learning Analysis of Structural and Electronic Properties

Due to the alarming rate of climate change, the implementation of efficient CO$_2$ capture has become crucial. This project aims to create an algorithm that predicts the uptake of CO$_2$ adsorbing Metal-Organic Frameworks (MOFs) by using Machine Learning. These values will in turn gauge the efficiency of these MOFs and provide scientists who are looking to maximize the uptake a way to know whether or not the MOF is worth synthesizing. This algorithm will save resources such as time and equipment, as scientists will be able to disregard hypothetical MOFs with low efficiencies. In addition, this paper will also highlight the most important features within the data set. This research will help enable the rapid synthesis of CO$_2$ adsorbing MOFs.
CHEMISTRY
arxiv.org

Machine Learning the Higgs-Top CP Phase

We explore the direct Higgs-top CP measurement via the $pp\to t\bar{t}h$ channel at the high-luminosity LHC. We show that a combination of machine learning techniques and efficient kinematic reconstruction methods can boost new physics sensitivity, effectively probing the complex $t\bar{t}h$ multi-particle phase space. Special attention is devoted to top quark polarization observables, uplifting the analysis from a raw rate to a polarization study. Through a combination of hadronic, semi-leptonic, and di-leptonic top pair final states in association with $h\to \gamma\gamma$, we obtain that the HL-LHC can probe the Higgs-top coupling modifier and CP-phase, respectively, up to $|\kappa_t|\lesssim 8\%$ and $|\alpha|\lesssim 15^{\circ}$ at $68\%$~CL.
COMPUTERS
ciodive.com

How to train, test and maintain AI and machine learning models

Editor's note: The following is a guest article from Steven Kursh, president of Software Analysis Group, and Art Schnure, senior consultant at Software Analysis Group. To get insight into the skill sets required to create AI and machine learning (ML) models, it's useful to get a sense of the model creation process, which is the gradual learning done by ML software, and the challenges faced to produce a model that meets predefined success criteria.
CODING & PROGRAMMING
Neuroscience News

Does the Brain Learn in the Same Way That Machines Learn?

Summary: Relating machine learning to biological learning, researchers say while the two approaches aren’t interchangeable, they can be harnessed to offer insights into how the human brain works. Source: Carnegie Mellon University. Pinpointing how neural activity changes with learning is anything but black and white. Recently, some have posited that...
SCIENCE
arxiv.org

A Survey on Machine Learning Techniques for Source Code Analysis

Context: The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis such as testing and vulnerabilities detection. A large number of studies poses challenges to the community to understand the current landscape. Objective: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. Method: We investigate studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021. We summarize our observations and findings with the help of the identified studies. Results: Our findings suggest that the usage of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task, and summarize the employed machine learning techniques. Additionally, we collate a comprehensive list of available datasets and tools useable in this context. Finally, we summarize the perceived challenges in this area that include availability of standard datasets, reproducibility and replicability, and hardware resources.
COMPUTERS
towardsdatascience.com

Stacking Machine Learning Models for Multivariate Time Series

Time series analysis is all too often seen as an esoteric sub-field of data science. It is not. Other data science sub-fields have their idiosyncrasies (e.g. NLP, recommender systems, graph theory etc.), and it is the same with time series. Time series is idiosyncratic, not distinct. If your goal is...
COMPUTERS
towardsdatascience.com

Turning Machine Learning Models into Products with Flask

Build an application for your model using REST APIs. In 1971, at just 21 years old, Steve Wozniak was already an incredible computer programmer and electronics engineer. He really enjoyed spending his time creating weird devices and searching for improvements in his area. It was in 1971 that he met a 15-year-old guy who would soon realize the enormous potential of what Wozniak was doing. This guy thought that the weird devices Wozniak created could become useful for many people. They worked together over the following years, and 6 years later, they started to commercialize one of the first and most successful personal computers of its time: the Apple II.
SOFTWARE
