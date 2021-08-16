
Metadynamics sampling in atomic environment space for collecting training data for machine learning potentials

By Dongsun Yoo ORCID: orcid.org/0000-0002-4889-8396
The universal mathematical form of machine-learning potentials (MLPs) shifts the core of development of interatomic potentials to collecting proper training data. Ideally, the training set should encompass diverse local atomic environments, but conventional approaches are prone to sampling similar configurations repeatedly, mainly due to the Boltzmann statistics. As such, practitioners handpick a large pool of distinct configurations manually, stretching the development period significantly. To overcome this hurdle, methods are being proposed that automatically generate training data. Herein, we suggest a sampling method optimized for gathering diverse yet relevant configurations semi-automatically. This is achieved by applying metadynamics with the descriptor for the local atomic environment as a collective variable. As a result, the simulation is automatically steered toward unvisited local environment space such that each atom experiences diverse chemical environments without redundancy. We apply the proposed metadynamics sampling to H:Pt(111), GeTe, and Si systems. Throughout these examples, a small number of metadynamics trajectories can provide reference structures necessary for training high-fidelity MLPs. By proposing a semi-automatic sampling method tuned for MLPs, the present work paves the way to wider applications of MLPs to many challenging applications.
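To make the idea of biasing a simulation away from already-visited regions concrete, here is a minimal toy sketch of metadynamics in Python. It is not the authors' implementation: it assumes a single particle in a one-dimensional double-well potential, uses the coordinate x itself as the collective variable (the paper uses a local atomic environment descriptor instead), and all function names and parameter values are hypothetical choices for illustration.

```python
import math
import random


def potential_force(x):
    """Force from the toy double-well U(x) = (x^2 - 1)^2, i.e. F = -dU/dx."""
    return -4.0 * x * (x * x - 1.0)


def bias_energy(x, centers, height, width):
    """Sum of Gaussian hills deposited at previously visited CV values."""
    return sum(
        height * math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers
    )


def bias_force(x, centers, height, width):
    """Negative gradient of the bias: each hill pushes x away from its center."""
    return sum(
        height * (x - c) / width ** 2
        * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    )


def run_metadynamics(steps=20000, dt=1e-3, kT=0.1, height=0.05,
                     width=0.2, stride=100, seed=0):
    """Overdamped Langevin dynamics with a history-dependent Gaussian bias.

    All parameter values are illustrative assumptions, not tuned settings.
    """
    rng = random.Random(seed)
    x = -1.0              # start in the left well
    centers = []          # CV values where hills have been deposited
    trajectory = []
    for step in range(steps):
        force = potential_force(x) + bias_force(x, centers, height, width)
        noise = math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        x += force * dt + noise
        if step % stride == 0:
            centers.append(x)   # deposit a hill at the current CV value
        trajectory.append(x)
    return trajectory, centers
```

Because every deposited hill raises the energy of a region that has already been visited, the walker is gradually pushed toward unexplored values of the collective variable, which is the mechanism the abstract relies on to avoid redundant configurations.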

Diseases & Treatments | Nature.com

Distribution of diabetic retinopathy in diabetes mellitus patients and its association rules with other eye diseases

The study aims to explore the distribution characteristics and influencing factors of diabetic retinopathy (DR) in diabetes mellitus (DM) patients and association rules of eye diseases in these patients. Data were obtained from 1284 DM patients at Henan Provincial People’s Hospital. Association rules were employed to calculate the probability of the common occurrence of eye-related diseases in DM patients. A web visualization network diagram was used to display the association rules of the eye-related diseases in DM patients. DR prevalence in people aged < 40 years (≥ 58.5%) was higher than that in those aged 50–60 years (≤ 43.7%). Patients with DM in rural areas were more likely to have DR than those in urban areas (56.2% vs. 35.6%, P < 0.001). DR prevalence in Pingdingshan City (68.4%) was significantly higher than in other cities. The prevalence of DR in patients who had DM for ≥ 5 years was higher than in other groups (P < 0.001). About 33.07% of DM patients had both diabetic maculopathy and DR, and 36.02% had both diabetic maculopathy and cataracts. The number of strong rules in patients ≥ 60 years old was more than those in people under 60 in age, and those in rural areas had more strong rules than those in urban areas. DM patients with one or more eye diseases are at higher risks of other eye diseases than general DM patients. These association rules are affected by factors such as age, region, disease duration, and DR severity.
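The two quantities that association-rule mining usually reports, support and confidence, can be sketched in a few lines of Python. This is only an illustration of the general technique, not the study's analysis pipeline; the function names and the toy patient records in the usage note are made up for the example.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for record in records if items <= set(record)) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    joint = support(records, set(antecedent) | set(consequent))
    return joint / base
```

For instance, with four synthetic records `[{"DR", "maculopathy"}, {"DR"}, {"cataract", "maculopathy"}, {"DR", "maculopathy", "cataract"}]`, the support of `{"DR", "maculopathy"}` is 0.5 and the confidence of the rule maculopathy -> DR is 2/3.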
Science | Nature.com

Quasiadiabatic electron transport in room temperature nanoelectronic devices induced by hot-phonon bottleneck

Since the invention of transistors, the flow of electrons has become controllable in solid-state electronics. The flow of energy, however, remains elusive, and energy is readily dissipated to the lattice via electron-phonon interactions. Hence, minimizing the energy dissipation has long been sought by eliminating the phonon-emission process. Here, we report a different scenario for facilitating energy transmission at room temperature, in which electrons undergo diffusive but quasiadiabatic transport, free from substantial energy loss. Direct nanothermometric mapping of electrons and lattice in current-carrying GaAs/AlGaAs devices exhibits remarkable discrepancies, indicating unexpected thermal isolation between the two subsystems. This surprising effect arises from the overpopulated hot longitudinal-optical (LO) phonons generated through frequent emission by hot electrons, which induce equally frequent LO-phonon reabsorption (“hot-phonon bottleneck”) cancelling the net energy loss. Our work sheds light on energy manipulation in nanoelectronics and power-electronics and provides important hints to energy-harvesting in optoelectronics (such as hot-carrier solar-cells).
Cancer | Nature.com

Human small intestinal infection by SARS-CoV-2 is characterized by a mucosal infiltration with activated CD8 T cells

The SARS-CoV-2 pandemic has so far claimed over three and a half million lives worldwide. Though the SARS-CoV-2 mediated disease COVID-19 was first characterized by an infection of the upper airways and the lung, recent evidence suggests a complex disease including gastrointestinal symptoms. Even if a direct viral tropism of intestinal cells has recently been demonstrated, it remains unclear whether gastrointestinal symptoms are caused by direct infection of the gastrointestinal tract by SARS-CoV-2 or whether they are a consequence of a systemic immune activation and subsequent modulation of the mucosal immune system. To better understand the cause of intestinal symptoms, we analyzed biopsies of the small intestine from SARS-CoV-2 infected individuals. Applying qRT-PCR and immunohistochemistry, we detected SARS-CoV-2 RNA and nucleocapsid protein in duodenal mucosa. In addition, applying imaging mass cytometry and immunohistochemistry, we identified histomorphological changes of the epithelium, which were characterized by an accumulation of activated intraepithelial CD8+ T cells as well as epithelial apoptosis and subsequent regenerative proliferation in the small intestine of COVID-19 patients. In summary, our findings indicate that intraepithelial CD8+ T cells are activated upon infection of intestinal epithelial cells with SARS-CoV-2, providing one possible explanation for gastrointestinal symptoms associated with COVID-19.
Engineering | Science Now

Self-contained soft electrofluidic actuators

Soft robotics revolutionized human-robot interactions, yet there exist persistent challenges for developing high-performance soft actuators that are powerful, rapid, controllable, safe, and portable. Here, we introduce a class of self-contained soft electrofluidic actuators (SEFAs), which can directly convert electrical energy into the mechanical energy of the actuators through electrically responsive fluids that drive the outside elastomer deformation. The use of special dielectric liquid enhances fluid flow capabilities, improving the actuation performance of the SEFAs. SEFAs are easily manufactured by using widely available materials and common fabrication techniques, and display excellent comprehensive performances in portability, controllability, rapid response, versatility, safety, and actuation. An artificial muscle stretching a joint and a soft bionic ray swimming in a tank demonstrate their effective performance. Hence, SEFAs offer a platform for developing soft actuators with potential applications in wearable assistant devices and soft robots.
Computers | arxiv.org

Meta-Reinforcement Learning in Broad and Non-Parametric Environments

Recent state-of-the-art artificial agents lack the ability to adapt rapidly to new tasks, as they are trained exclusively for specific objectives and require massive amounts of interaction to learn new skills. Meta-reinforcement learning (meta-RL) addresses this challenge by leveraging knowledge learned from training tasks to perform well in previously unseen tasks. However, current meta-RL approaches limit themselves to narrow parametric task distributions, ignoring qualitative differences between tasks that occur in the real world. In this paper, we introduce TIGR, a Task-Inference-based meta-RL algorithm using Gaussian mixture models (GMM) and gated Recurrent units, designed for tasks in non-parametric environments. We employ a generative model involving a GMM to capture the multi-modality of the tasks. We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective. We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches in terms of sample efficiency (3-10 times faster), asymptotic performance, and applicability in non-parametric environments with zero-shot adaptation.
Computers | arxiv.org

Unified Regularity Measures for Sample-wise Learning and Generalization

Fundamental machine learning theory shows that different samples contribute unequally both in learning and testing processes. Contemporary studies on DNN imply that such sample difference is rooted in the distribution of intrinsic pattern information, namely sample regularity. Motivated by the recent discovery on network memorization and generalization, we proposed a pair of sample regularity measures for both processes with a formulation-consistent representation. Specifically, cumulative binary training/generalizing loss (CBTL/CBGL), the cumulative number of correct classifications of the training/testing sample within training stage, is proposed to quantize the stability in memorization-generalization process; while forgetting/mal-generalizing events, i.e., the mis-classification of previously learned or generalized sample, are utilized to represent the uncertainty of sample regularity with respect to optimization dynamics. Experiments validated the effectiveness and robustness of the proposed approaches for mini-batch SGD optimization. Further applications on training/testing sample selection show the proposed measures sharing the unified computing procedure could benefit both tasks.
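As a rough reading of the two per-sample statistics named in this abstract, the Python sketch below counts correct classifications and forgetting events from a per-epoch correctness record. It follows only the wording of the abstract; the paper's exact definitions (for example any normalization or loss formulation) may differ, and the function names and input format are assumptions.

```python
def cumulative_correct(history):
    """Number of epochs in which the sample was classified correctly.

    `history` is an assumed input format: one boolean per training epoch.
    """
    return sum(1 for correct in history if correct)


def forgetting_events(history):
    """Number of correct -> incorrect transitions between consecutive epochs."""
    return sum(
        1 for previous, current in zip(history, history[1:])
        if previous and not current
    )
```

A sample with the record `[False, True, True, False, True]` would have a cumulative count of 3 and one forgetting event, since it was learned at the second epoch and misclassified again at the fourth.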
Youtube | timebusinessnews.com

Understand Best Data Collection Techniques

Why do top data science courses in Noida put so much emphasis on data collection? Data collection, by definition, is the process of acquiring relevant information to answer pertinent questions associated with an event. To improve any data science project, analysts put immense focus on the data collection process. We can say that the most important aspect of working in the data science industry involves data collection.
Computers | arxiv.org

Sampling-Based Minimum Bayes Risk Decoding for Neural Machine Translation

In neural machine translation (NMT), we search for the mode of the model distribution to form predictions. The mode as well as other high probability translations found by beam search have been shown to often be inadequate in a number of ways. This prevents practitioners from improving translation quality through better search, as these idiosyncratic translations end up being selected by the decoding algorithm, a problem known as the beam search curse. Recently, a sampling-based approximation to minimum Bayes risk (MBR) decoding has been proposed as an alternative decision rule for NMT that would likely not suffer from the same problems. We analyse this approximation and establish that it has no equivalent to the beam search curse, i.e. better search always leads to better translations. We also design different approximations aimed at decoupling the cost of exploration from the cost of robust estimation of expected utility. This allows for exploration of much larger hypothesis spaces, which we show to be beneficial. We also show that it can be beneficial to make use of strategies like beam search and nucleus sampling to construct hypothesis spaces efficiently. We show on three language pairs (English into and from German, Romanian, and Nepali) that MBR can improve upon beam search with moderate computation.
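The decision rule described here can be sketched in Python as follows: score each candidate translation by its average utility against a set of model samples and return the best-scoring one. This is a minimal illustration, not the paper's system. The token-overlap utility is a stand-in assumption (the abstract does not name a utility function), and `mbr_decode` and its arguments are hypothetical names.

```python
def token_overlap(hypothesis, reference):
    """Toy utility: Jaccard overlap between the two token sets."""
    hyp_tokens = set(hypothesis.split())
    ref_tokens = set(reference.split())
    if not hyp_tokens and not ref_tokens:
        return 1.0
    return len(hyp_tokens & ref_tokens) / len(hyp_tokens | ref_tokens)


def mbr_decode(candidates, samples, utility=token_overlap):
    """Return the candidate with the highest Monte Carlo estimate of expected utility.

    `samples` stand in for translations drawn from the model distribution;
    `candidates` is the hypothesis space being searched.
    """
    def expected_utility(candidate):
        return sum(utility(candidate, sample) for sample in samples) / len(samples)

    return max(candidates, key=expected_utility)
```

Keeping `candidates` and `samples` as separate arguments mirrors the decoupling mentioned in the abstract: the hypothesis space can be built by any method (beam search, nucleus sampling), while the samples are used only to estimate expected utility.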
Computers | arxiv.org

Effects of sampling and horizon in predictive reinforcement learning

Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation, unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality. This is in contrast to classical control algorithms, which are typically model-based. One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually concerned with Markov decision processes, but most real environments are not time-discrete. The factual physical setting of RL consists of a digital agent and a time-continuous dynamical system. There is thus, in fact, yet another hyper-parameter: the agent sampling time. In this paper, we investigate the effects of prediction horizon and sampling of two hybrid RL-MPC-agents in a case study with a mobile robot parking, which is in turn a canonical control problem. We benchmark the agents with a simple variant of MPC. The sampling showed a kind of "sweet spot" behavior, whereas the RL agents demonstrated merits at shorter horizons.
Coding & Programming | towardsdatascience.com

Optimization Algorithms for Machine Learning

The link to the previous chapter, Chapter-5: Pre-requisites to Solve Optimization Problems is here. Chapter 6 is the part in the series from where we start looking into real optimization problems and understand what optimization is all about. In the earlier chapters, we only looked into concepts that would assist us and help us in understanding optimization better. I feel this is where the fun part of optimization starts. In this chapter, we will look into:
Medical & Biotech | APS physics

Machine-learning integrated glassy defect from an intricate configurational-thermodynamic-dynamic space

Optimizing materials' properties and functions by controlling defects in the crystalline phase has been a cornerstone of materials science and condensed matter physics. However, this paradigm has yet to be established in the broadly defined amorphous materials, which implies the identification of very subtle structural features in an otherwise uniformly disordered medium. Here we propose and define a new integrated glassy defect (IGD), based on machine learning strategy informed by atomistic physics, and also by an extremely wide configurational, thermodynamic, and dynamic variables space of the disordered state. The IGD simultaneously includes positional topology and vibrational features, as well as the local morphology of the potential energy landscape. This unprecedented combination gives rise to a much more comprehensive and more effective definition of the “glassy defect,” much beyond the conventional, purely structural input. IGD can be used not only as an efficient predictor of athermal plasticity but is also transferable to detect both short-time vibrational anomalies (the boson peak), and long-time relaxation and diffusion dynamics in glasses. The integrated strategy is instrumental to build the long-sought structure-property relationship in complex media.
Science | Nature.com

Author Correction: High-dimensional hepatopath data analysis by machine learning for predicting HBV-related fibrosis

Correction to: Scientific Reports https://doi.org/10.1038/s41598-021-84556-4, published online 03 March 2021. The original version of this Article contained an error in Affiliation 2 and Affiliation 3, where the city was incorrectly given as ‘Zhenjiang’. The correct affiliations are listed below. School of Computer Science and Engineering, Jiangsu University of Technology, Changzhou...
Computers | towardsdatascience.com

Unsupervised Machine Learning Explained

Unsupervised learning is a great solution when we want to discover the underlying structure of data. In contrast to supervised learning, we cannot apply unsupervised methods to classification or regression style problems. This is because unsupervised ML algorithms learn patterns from unlabeled data, whereas we need to know the input-output mappings to perform classification or regression (in most cases, I’ll touch on this later). Essentially, our unsupervised learning algorithm will find the hidden patterns or groupings within the data without the need for a human (or anybody) to label the data or intervene in any other way.
Computers | Science Now

Making machine learning trustworthy

Machine learning (ML) has advanced dramatically during the past decade and continues to achieve impressive human-level performance on nontrivial tasks in image, speech, and text recognition. It is increasingly powering many high-stake application domains such as autonomous vehicles, self–mission-fulfilling drones, intrusion detection, medical image classification, and financial predictions (1). However, ML must make several advances before it can be deployed with confidence in domains where it directly affects humans at training and operation, in which cases security, privacy, safety, and fairness are all essential considerations (1, 2).
Computers | towardsdatascience.com

Finding your niche in Machine Learning

Machine learning can be overwhelming. I was talking to a Research Scientist (a.k.a. someone with a Ph.D.) in my company and she mentioned how she feels insecure about her ML skills. There is so much to do. So much to learn every day. Feeling this way is independent of which career stage you are at, whether you are starting out, in grad school, or already working in the industry. In any phase of your career, you need to have your niche or ‘your field’.
Health | Nature.com

Machine learning using clinical data at baseline predicts the efficacy of vedolizumab at week 22 in patients with ulcerative colitis

Predicting the response of patients with ulcerative colitis (UC) to a biologic such as vedolizumab (VDZ) before administration is an unmet need for optimizing individual patient treatment. We hypothesized that the machine-learning approach with daily clinical information can be a new, promising strategy for developing a drug-efficacy prediction tool. Random forest with grid search and cross-validation was employed in Cohort 1 to determine the contribution of clinical features at baseline (week 0) to steroid-free clinical remission (SFCR) with VDZ at week 22. Among 49 clinical features including sex, age, height, body weight, BMI, disease duration/phenotype, treatment history, clinical activity, endoscopic activity, and blood test items, the top eight features (partial Mayo score, MCH, BMI, BUN, concomitant use of AZA, lymphocyte fraction, height, and CRP) were selected for logistic regression to develop a prediction model for SFCR at week 22. In the validation using the external Cohort 2, the positive and negative predictive values of the prediction model were 54.5% and 92.3%, respectively. The prediction tool appeared useful for identifying patients with UC who would not achieve SFCR at week 22 during VDZ therapy. This study provides a proof-of-concept that machine learning using real-world data could permit personalized treatment for UC.
Technology | arxiv.org

The Sharpe predictor for fairness in machine learning

In machine learning (ML) applications, unfair predictions may discriminate against a minority group. Most existing approaches for fair machine learning (FML) treat fairness as a constraint or a penalization term in the optimization of an ML model, which does not lead to the discovery of the complete landscape of the trade-offs among learning accuracy and fairness metrics, and does not integrate fairness in a meaningful way.
Computers | towardsdatascience.com

The Revolving Door For Machine Learning Models

Academics have been known to borrow ideas from nature and other fields while applying them in a slightly different way to new problems. These days, in data science, we see many ideas, technologies, and scientific advancements that are applied across the big three (NLP, Vision, Audio). I would like to...
Science | Nature.com

Comparing the effect of zinc oxide and titanium dioxide nanoparticles on the ability of moderately halophilic bacteria to treat wastewater

This study evaluates the ability of moderately halophilic bacterial isolates (Serratia sp., Bacillus sp., Morganella sp., Citrobacter freundii and Lysinibacillus sp.) to treat polluted wastewater in the presence of nZnO and nTiO2 nanoparticles. In this study, bacterial isolates were able to take up nZnO and nTiO2 at concentrations ranging from 1 to 50 mg/L in the presence of higher DO uptake at up to 100% and 99%, respectively, while higher concentrations triggered a significant decrease. Individual halophilic bacteria exhibited a low COD removal efficiency in the presence of both metal oxide nanoparticles at concentrations between 1 and 10 mg/L. At higher concentrations, they triggered COD release of up to −60%. Lastly, the test isolates also demonstrated significant nutrient removal efficiency in the following ranges: 23–65% for NO₃⁻ and 28–78% for PO₄³⁻. This study suggests that moderately halophilic bacteria are good candidates for the bioremediation of highly polluted wastewater containing low concentrations of metal oxide nanoparticles.
Health | towardsdatascience.com

Machine Learning in Medicine — Part I

A hands-on introductory course on machine learning techniques for physicians and healthcare professionals. Machine Learning, deep learning, and artificial intelligence have become the latest buzzwords across all industries, including healthcare, over the past few years. There is a lot of optimism that machine learning can help physicians establish earlier and more accurate diagnoses and deliver more effective and personalized treatments for complex diseases, such as cancers. There is also hope that machine learning can be leveraged to increase the efficiency of healthcare delivery and reduce healthcare costs.

