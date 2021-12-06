
Heterogeneous Treatment Effects with Instrumental Variables: A Causal Machine Learning Approach

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. Problem Setting. In our forthcoming paper on...

Nature.com

Gut microbiota modulates weight gain in mice after discontinued smoke exposure

Cigarette smoking constitutes a leading global cause of morbidity and preventable death1, and most active smokers report a desire or recent attempt to quit2. Smoking-cessation-induced weight gain (SCWG; 4.5 kg reported to be gained on average per 6–12 months, >10 kg year⁻¹ in 13% of those who stopped smoking3) constitutes a major obstacle to smoking abstinence4, even under stable5,6 or restricted7 caloric intake. Here we use a mouse model to demonstrate that smoking and cessation induce a dysbiotic state that is driven by an intestinal influx of cigarette-smoke-related metabolites. Microbiome depletion induced by treatment with antibiotics prevents SCWG. Conversely, fecal microbiome transplantation from mice previously exposed to cigarette smoke into germ-free mice naive to smoke exposure induces excessive weight gain across diets and mouse strains. Metabolically, microbiome-induced SCWG involves a concerted host and microbiome shunting of dietary choline to dimethylglycine driving increased gut energy harvest, coupled with the depletion of a cross-regulated weight-lowering metabolite, N-acetylglycine, and possibly by the effects of other differentially abundant cigarette-smoke-related metabolites. Dimethylglycine and N-acetylglycine may also modulate weight and associated adipose-tissue immunity under non-smoking conditions. Preliminary observations in a small cross-sectional human cohort support these findings, which calls for larger human trials to establish the relevance of this mechanism in active smokers. Collectively, we uncover a microbiome-dependent orchestration of SCWG that may be exploitable to improve smoking-cessation success and to correct metabolic perturbations even in non-smoking settings.
SCIENCE
cfainstitute.org

Machine Learning: Explain It or Bust

“If you can’t explain it simply, you don’t understand it.” And so it is with complex machine learning (ML). ML now measures environmental, social, and governance (ESG) risk, executes...
COMPUTERS
Nature.com

A machine learning pipeline revealing heterogeneous responses to drug perturbations on vascular smooth muscle cell spheroid morphology and formation

Machine learning approaches have shown great promise in biology and medicine discovering hidden information to further understand complex biological and pathological processes. In this study, we developed a deep learning-based machine learning algorithm to meaningfully process image data and facilitate studies in vascular biology and pathology. Vascular injury and atherosclerosis are characterized by neointima formation caused by the aberrant accumulation and proliferation of vascular smooth muscle cells (VSMCs) within the vessel wall. Understanding how to control VSMC behaviors would promote the development of therapeutic targets to treat vascular diseases. However, the response to drug treatments among VSMCs with the same diseased vascular condition is often heterogeneous. Here, to identify the heterogeneous responses of drug treatments, we created an in vitro experimental model system using VSMC spheroids and developed a machine learning-based computational method called HETEROID (heterogeneous spheroid). First, we established a VSMC spheroid model that mimics neointima-like formation and the structure of arteries. Then, to identify the morphological subpopulations of drug-treated VSMC spheroids, we used a machine learning framework that combines deep learning-based spheroid segmentation and morphological clustering analysis. Our machine learning approach successfully showed that FAK, Rac, Rho, and Cdc42 inhibitors differentially affect spheroid morphology, suggesting that multiple drug responses of VSMC spheroid formation exist. Overall, our HETEROID pipeline enables detailed quantitative drug characterization of morphological changes in neointima formation, that occurs in vivo, by single-spheroid analysis.
SCIENCE
towardsdatascience.com

Solutions against overfitting for machine learning on tabular data

In this article, I will present an overview of solutions against overfitting that apply to tabular data and classical machine learning. When comparing classical machine learning with deep learning, the latter is generally considered more complex. For the problem of overfitting, the opposite is true: many tools and tricks are readily available for avoiding overfitting in a deep neural network, whereas you will find yourself in the dark when it comes to classical machine learning.
CODING & PROGRAMMING
towardsdatascience.com

Data Visualization Before Machine Learning

Do you ever ask yourself why your machine learning model isn’t used? Why do so few people really believe in the power of machine learning rather than these old dashboards? When I was working in a football club, I made a data visualization showing player performances during the season. It was a really simple tile plot. But when football people saw it on my screen they engaged quite quickly. They asked specific questions, such as whether it would be possible to get this chart regularly, etc…
COMPUTERS
NIH Director's Blog

Navigating the pitfalls of applying machine learning in genomics

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.
COMPUTERS
HackerNoon

Exploring the Limitations of Machine Learning

Machine Learning is a subset of artificial intelligence that uses algorithms to accomplish tasks and result in the desired output. Machine learning is a model that uses data sets within machines to learn and categorize them. It does not necessarily need to be constantly programmed by a human and often uses an algorithm that can detect patterns within a computer and database. Machine Learning systems are known to be opaque and difficult to debug, which in application causes many problems and contributes to the time that it takes for an algorithm to work in the desired way.
CODING & PROGRAMMING
Genetic Engineering News

Machine Learning Algorithm Hallucinates Novel Protein Structures

Proteins spontaneously fold into intricate three-dimensional shapes which are key to nearly every biological process. But the complexity of protein shapes makes them difficult to study. Recently, progress has been made in protein structure prediction using deep neural networks. Now, a team of researchers investigates whether the information captured by such networks can generate new folded proteins with novel sequences—unrelated to those of the naturally occurring proteins used in training the models. The work describes the development of a neural network that “hallucinates” proteins with new, stable structures.
SCIENCE
uky.edu

Researchers Look at Issue of Data Redundancy in Machine Learning

Work by a group of researchers at the University of Kentucky’s Sanders-Brown Center on Aging was recently published in Genes. The article looks at the use of data mining and machine learning in research. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements (magnetic resonance imaging (MRI), biometrics, RNA...
SCIENCE
towardsdatascience.com

Introducing the Machine Learning Reproducibility Scale

The reproducibility of machine learning projects is a recurring topic, brought up in many different contexts — both in academia and industry. There are a lot of opinions, mainly focused on tooling, which is great but can lead to a focus on features instead of solving concrete problems. Meanwhile, it seems there hasn’t been a lot of work done on providing a way to quantify a given project’s reproducibility, which means a lot of these discussions remain abstract, and perhaps less useful to practitioners looking for a way to gauge their work and decide how to improve it on the reproducibility front.
CODING & PROGRAMMING
towardsdatascience.com

A Beginners Guide to End to End Machine Learning

Supervised machine learning is a technique that maps a series of inputs (X) to some known outputs (y) without being explicitly programmed. Training a machine learning model refers to the process where a machine learns a mapping between X and y. Once trained the model can be used to make predictions on new inputs where the output is unknown.
CODING & PROGRAMMING
Nature.com

Quantifying the effect of genetic, environmental and individual demographic stochastic variability for population dynamics in Plantago lanceolata

Simple demographic events, the survival and reproduction of individuals, drive population dynamics. These demographic events are influenced by genetic and environmental parameters, and are the focus of many evolutionary and ecological investigations that aim to predict and understand population change. However, such a focus often neglects the stochastic events that individuals experience throughout their lives. These stochastic events also influence survival and reproduction and thereby evolutionary and ecological dynamics. Here, we illustrate the influence of such non-selective demographic variability on population dynamics using population projection models of an experimental population of Plantago lanceolata. Our analysis shows that the variability in survival and reproduction among individuals is largely due to demographic stochastic variation with only modest effects of differences in environment, genes, and their interaction. Common expectations of population growth, based on expected lifetime reproduction and generation time, can be misleading when demographic stochastic variation is large. Large demographic stochastic variation exhibited within genotypes can lower population growth and slow evolutionary adaptive dynamics. Our results accompany recent investigations that call for more focus on stochastic variation in fitness components, such as survival, reproduction, and functional traits, rather than dismissal of this variation as uninformative noise.
SCIENCE
Nature.com

Survival impact of treatment for chronic obstructive pulmonary disease in patients with advanced non-small-cell lung cancer

Chronic obstructive pulmonary disease (COPD) may coexist with lung cancer, but the impact on prognosis is uncertain. Moreover, it is unclear whether pharmacological treatment for COPD improves the patient's prognosis. We retrospectively investigated patients with advanced non-small-cell lung cancer (NSCLC) who had received chemotherapy at Kyoto University Hospital. Coexisting COPD was diagnosed by spirometry, and the association between pharmacological treatment for COPD and overall survival (OS) was assessed. Of the 550 patients who underwent chemotherapy for advanced NSCLC between 2007 and 2014, 347 patients who underwent spirometry were analyzed. Coexisting COPD was revealed in 103 patients (COPD group). The median OS was shorter in the COPD group than the non-COPD group (10.6 vs. 16.8 months). Thirty-seven patients had received COPD treatment, and they had a significantly longer median OS than those without treatment (16.7 vs. 8.2 months). Multivariate Cox regression analysis confirmed the positive prognostic impact of COPD treatment. Additional validation analysis revealed similar results in patients treated with immune checkpoint inhibitors (ICIs). Coexisting COPD had a significant association with poor prognosis in advanced NSCLC patients if they did not have pharmacological treatment for COPD. Treatment for coexisting COPD has the potential to salvage the prognosis.
DISEASES & TREATMENTS
r-bloggers.com

Exploring design effects of stepped wedge designs with baseline measurements

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. In the previous post, I described...
IBM - United States

10 tips for machine learning experiment tracking and reproducibility: Do it yourself approach without additional tooling

As machine learning practitioners, we invest significant time and effort to improve our models. You usually do it iteratively and experimentally by repeatedly changing your model, running an experiment, and examining the results, then deciding whether the recent model change was positive and should be kept or discarded. Changes in...
COMPUTERS
Nature.com

Characterization of a recently synthesized microtubule-targeting compound that disrupts mitotic spindle poles in human cells

We reveal the effects of a new microtubule-destabilizing compound in human cells. C75 has a core thienoisoquinoline scaffold with several functional groups amenable to modification. Previously we found that sub micromolar concentrations of C75 caused cytotoxicity. We also found that C75 inhibited microtubule polymerization and competed with colchicine for tubulin-binding in vitro. However, here we found that the two compounds synergized suggesting differences in their mechanism of action. Indeed, live imaging revealed that C75 causes different spindle phenotypes compared to colchicine. Spindles remained bipolar and collapsed after colchicine treatment, while C75 caused bipolar spindles to become multipolar. Importantly, microtubules rapidly disappeared after C75-treatment, but then grew back unevenly and from multiple poles. The C75 spindle phenotype is reminiscent of phenotypes caused by depletion of ch-TOG, a microtubule polymerase, suggesting that C75 blocks microtubule polymerization in metaphase cells. C75 also caused an increase in the number of spindle poles in paclitaxel-treated cells, and combining low amounts of C75 and paclitaxel caused greater regression of multicellular tumour spheroids compared to each compound on their own. These findings warrant further exploration of C75's anti-cancer potential.
CANCER
Nature.com

Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization

Among the many kinds of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still needed, as it could capture the complex interactions in the networks that genes participate in. In this study, we propose a new method based on non-negative matrix tri-factorization clustering. We outline two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the PPI network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were used for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance compared with the two-source algorithm, our method is not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when the predictor annotates variants on the basis of the genes they are located on. Second, our predictor outperforms other ensemble-based methods for variants located in genes for which little pathogenicity information is available.
SCIENCE
HackerNoon

Understanding The Importance Of Data For Machine Learning

Data is crucial for machine learning, and without data, machine learning is not possible. Machine learning without data is nothing but a bare machine with no soul and no mind. Data lets machines perform amazing tasks that we could not have imagined a few years ago. Despite this importance, machines do not understand what data represents; they only find the relations between different data. Data is in the form of numbers and only numbers, and all machine learning models work with data. When dealing with categorical data, it is important to keep this point in mind.
COMPUTERS
Nature.com

Chemical hardness-driven interpretable machine learning approach for rapid search of photocatalysts

Strategies combining high-throughput (HT) and machine learning (ML) to accelerate the discovery of promising new materials have garnered immense attention in recent years. The knowledge of new guiding principles is usually scarce in such studies, essentially due to the 'black-box' nature of the ML models. Therefore, we devised an intuitive method of interpreting such opaque ML models through SHapley Additive exPlanations (SHAP) values and coupling them with the HT approach for finding efficient 2D water-splitting photocatalysts. We developed a new database of 3099 2D materials consisting of metals connected to six ligands in an octahedral geometry, termed as 2DO (octahedral 2D materials) database. The ML models were constructed using a combination of composition and chemical hardness-based features to gain insights into the thermodynamic and overall stabilities. Most importantly, it distinguished the target properties of the isocompositional 2DO materials differing in bond connectivities by combining the advantages of both elemental and structural features. The interpretable ML regression, classification, and data analysis lead to a new hypothesis that the highly stable 2DO materials follow the HSAB principle. The most stable 2DO materials were further screened based on suitable band gaps within the visible region and band alignments with respect to standard redox potentials using the GW method, resulting in 21 potential candidates. Moreover, HfSe2 and ZrSe2 were found to have high solar-to-hydrogen efficiencies reaching their theoretical limits. The proposed methodology will enable materials scientists and engineers to formulate predictive models, which will be accurate, physically interpretable, transferable, and computationally tractable.
CHEMISTRY

