Health

HiRID-ICU-Benchmark -- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data

By Hugo Yèche, Rita Kuznetsova, Marc Zimmermann, Matthias Hüser, Xinrui Lyu, Martin Faltys, Gunnar Rätsch
arxiv.org
 8 days ago

The recent success of machine learning methods applied to time series collected from Intensive Care Units (ICU) exposes the lack of standardized machine learning benchmarks for developing and comparing such methods. While raw datasets, such...

arxiv.org

A framework for randomized benchmarking over compact groups

Characterization of experimental systems is an essential step in developing and improving quantum hardware. A collection of protocols known as Randomized Benchmarking (RB) was developed in the past decade, which provides an efficient way to measure error rates in quantum systems. In a recent paper (arXiv:2010.07974), a general framework for RB was proposed, which encompassed most of the known RB protocols and overcame the limitation on error models in previous works. However, even this general framework has a restriction: it can only be applied to a finite group of gates. This does not meet the need posed by experiments, in particular the demand for benchmarking non-Clifford gates and continuous gate sets on quantum devices. In this work we generalize the RB framework to continuous groups of gates and show that as long as the noise level is reasonably small, the output can be approximated as a linear combination of matrix exponential decays. As an application, we numerically study the fully randomized benchmarking protocol (i.e. RB with the entire unitary group as the gate set) enabled by our proof. This provides a unified way to estimate the gate fidelity for any quantum gate in an experiment.
COMPUTERS
arxiv.org

GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation

In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way. GRecX consists of core libraries for building GNN-based recommendation benchmarks, as well as the implementations of popular GNN-based recommendation models. The core libraries provide essential components for building efficient and unified benchmarks, including FastMetrics (efficient metrics computation libraries), VectorSearch (efficient similarity search libraries for dense vectors), BatchEval (efficient mini-batch evaluation libraries), and DataManager (unified dataset management libraries). In particular, to provide a unified benchmark for the fair comparison of different complex GNN-based recommendation models, we design a new metric GRMF-X and integrate it into the FastMetrics component. Based on a TensorFlow GNN library tf_geometric, GRecX carefully implements a variety of popular GNN-based recommendation models. We carefully implement these baseline models to reproduce the performance reported in the literature, and our implementations are usually more efficient and friendly. In conclusion, GRecX enables users to train and benchmark GNN-based recommendation baselines in an efficient and unified way. We conduct experiments with GRecX, and the experimental results show that GRecX allows us to train and benchmark GNN-based recommendation baselines in an efficient and unified way. The source code of GRecX is available at this https URL.
CODING & PROGRAMMING
arxiv.org

DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at this https URL.
COMPUTERS
arxiv.org

Machine Learning for Genomic Data

This report explores the application of machine learning techniques to short time-series gene expression data. Although standard machine learning algorithms work well on longer time series, they often fail to find meaningful insights from fewer timepoints. In this report, we explore model-based clustering techniques. We combine popular unsupervised learning techniques like K-Means, Gaussian Mixture Models, Bayesian Networks, and Hidden Markov Models with the well-known Expectation Maximization algorithm. K-Means and Gaussian Mixture Models are fairly standard, while Hidden Markov Model and Bayesian Network clustering are more novel ideas that suit time-series gene expression data.
COMPUTERS
arxiv.org

Predicting High-Flow Nasal Cannula Failure in an ICU Using a Recurrent Neural Network with Transfer Learning and Input Data Perseveration: A Retrospective Analysis

High Flow Nasal Cannula (HFNC) provides non-invasive respiratory support for critically ill children who may tolerate it more readily than other Non-Invasive (NIV) techniques. Timely prediction of HFNC failure can provide an indication for increasing respiratory support. This work developed and compared machine learning models to predict HFNC failure. A retrospective study was conducted using EMR of patients admitted to a tertiary pediatric ICU from January 2010 to February 2020. A Long Short-Term Memory (LSTM) model was trained to generate a continuous prediction of HFNC failure. Performance was assessed using the area under the receiver operating characteristic curve (AUROC) at various times following HFNC initiation. The sensitivity, specificity, positive and negative predictive values (PPV, NPV) of predictions at two hours after HFNC initiation were also evaluated. These metrics were also computed in a cohort with primarily respiratory diagnoses. 834 HFNC trials [455 training, 173 validation, 206 test] met the inclusion criteria, of which 175 [103, 30, 42] (21.0%) escalated to NIV or intubation. The LSTM models trained with transfer learning generally performed better than the logistic regression (LR) models, with the best LSTM model achieving an AUROC of 0.78, vs 0.66 for the LR, two hours after initiation. Machine learning models trained using EMR data were able to identify children at risk for failing HFNC within 24 hours of initiation. LSTM models that incorporated transfer learning, input data perseveration and ensembling performed better than the LR and standard LSTM models.
HEALTH
arxiv.org

The RETA Benchmark for Retinal Vascular Tree Analysis

Topological and geometrical analysis of retinal blood vessels is a cost-effective way for early detection of many common diseases. Meanwhile, automated vessel segmentation and vascular tree analysis are still lacking in terms of generalization capability. In this work, we construct a novel benchmark RETA with 81 labeled vessel masks aiming to facilitate retinal vessel analysis. A semi-automated coarse-to-fine workflow is proposed for annotating vessel pixels. During dataset construction, we strived to control inter-annotator variability and intra-annotator variability by performing multi-stage annotation and label disambiguation on self-developed dedicated software. In addition to binary vessel masks, we obtained vessel annotations containing artery/vein masks, vascular skeletons, bifurcations, trees and abnormalities during vessel labelling. Both subjective and objective quality validation of labeled vessel masks have demonstrated significantly improved quality over other publicly available datasets. The annotation software is also made publicly available for vessel annotation visualization. Users could develop vessel segmentation algorithms or evaluate vessel segmentation performance with our dataset. Moreover, our dataset might be a good research source for cross-modality tubular structure segmentation.
SCIENCE
Dice Insights

How Data Scientists, Machine Learning Devs Specialize Their Workflows

Data scientists and machine-learning specialists are playing an increasingly integral role in many organizations’ strategy and product development. Are most of these technologists involved in every part of their employers’ data analysis and model training, or do they just specialize in specific areas? According to SlashData’s Q3 2021 analysis, the...
COMPUTERS
arxiv.org

Reynolds Stress Modeling Using Data Driven Machine Learning Algorithms

Fluid turbulence is an important problem for physics and engineering. Turbulence modeling deals with the development of simplified models that can act as surrogates for representing the effects of turbulence on flow evolution. Such models correspond to a range of different fidelities, from simple eddy-viscosity-based closures to Reynolds Stress Models. Until now, data-driven turbulence modeling efforts have focused on machine-learning-augmented eddy-viscosity models. In this communication, we illustrate the manner in which the eddy-viscosity framework delimits the efficacy and performance of machine learning algorithms. Based on this foundation we carry out the first application of machine learning algorithms for developing improved Reynolds Stress Modeling-based closures for turbulence. Different machine learning approaches are assessed for modeling the pressure strain correlation in turbulence, a longstanding problem of singular importance. We evaluate the performance of these algorithms in the learning dataset, as well as their ability to generalize to different flow cases where the inherent physical processes may vary. This explores the assertion that ML-based data-driven turbulence models can overcome the modeling limitations associated with the traditional turbulence models, and that ML models trained with large amounts of data from different classes of flows can predict the flow field with reasonable accuracy for unknown flows with similar flow physics.
SCIENCE
arxiv.org

Assessing Social Determinants-Related Performance Bias of Machine Learning Models: A case of Hyperchloremia Prediction in ICU Population

Machine learning in medicine leverages the wealth of healthcare data to extract knowledge, facilitate clinical decision-making, and ultimately improve care delivery. However, ML models trained on datasets that lack demographic diversity could yield suboptimal performance when applied to underrepresented populations (e.g. ethnic minorities, lower socioeconomic status), thus perpetuating health disparity. In this study, we evaluated four classifiers built to predict Hyperchloremia - a condition that often results from aggressive fluids administration in the ICU population - and compared their performance in racial, gender, and insurance subgroups. We observed that adding social determinants features in addition to the lab-based ones improved model performance on all patients. The subgroup testing yielded significantly different AUC scores in 40 out of the 44 model-subgroup combinations, suggesting disparities when applying ML models to social determinants subgroups. We urge future researchers to design models that proactively adjust for potential biases and include subgroup reporting in their studies.
HEALTH
arxiv.org

Types for Tables: A Language Design Benchmark

Kuang-Chen Lu (Brown University, USA), Ben Greenman (Brown University, USA), Shriram Krishnamurthi (Brown University, USA) Context: Tables are ubiquitous formats for data. Therefore, techniques for writing correct programs over tables, and debugging incorrect ones, are vital. Our specific focus in this paper is on rich types that articulate the properties of tabular operations. We wish to study both their expressive power and _diagnostic quality_.
CODING & PROGRAMMING
arxiv.org

A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems

Owing to their asymptotic oracle property, non-convex penalties represented by the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally challenging. Almost all existing algorithms converge locally, and the proper selection of initial values is crucial. Therefore, in actual operation, they are often combined with a warm-starting technique to meet the rigid requirement that the initial value must be sufficiently close to the optimal solution of the corresponding problem. In this paper, based on the DC (difference of convex functions) property of MCP and SCAD penalties, we aim to design a global two-stage algorithm for high-dimensional least squares linear regression problems. A key idea for making the proposed algorithm efficient is to use the primal dual active set with continuation (PDASC) method, which is equivalent to the semi-smooth Newton (SSN) method, to solve the corresponding sub-problems. Theoretically, we not only prove the global convergence of the proposed algorithm, but also verify that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive experiments on simulated and real data show that the algorithm in this paper is superior to the latest SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
CODING & PROGRAMMING
arxiv.org

Quantum process tomography of adiabatic and superadiabatic stimulated Raman passage

Quantum control methods for three-level systems have recently become an important direction of research in quantum information science and technology. Here we present numerical simulations using realistic experimental parameters for quantum process tomography in STIRAP (stimulated Raman adiabatic passage) and saSTIRAP (superadiabatic STIRAP). Specifically, we identify a suitable basis in the operator space as the identity operator together with the 8 Gell-Mann operators, and we calculate the corresponding process matrices, which have $9\times 9=81$ elements. We discuss these results for the ideal decoherence-free case, as well as for the experimentally relevant case with decoherence included.
SCIENCE
arxiv.org

The Pareto-Optimal Temporal Aggregation of Energy System Models

The growing share of intermittent renewable energy sources, storage technologies, and the increasing degree of so-called sector coupling necessitate optimization-based energy system models with high temporal and spatial resolutions, which significantly increases their runtimes and limits their maximum sizes. In order to maintain the computational viability of these models for large-scale application cases, temporal aggregation has emerged as a technique for reducing the number of considered time steps by reducing the original time horizon down to fewer, more representative ones. This study presents advanced but generally applicable clustering techniques that allow for ad-hoc improvements of state-of-the-art approaches without requiring profound knowledge of the individual energy system model. These improvements comprise the optimal tradeoff between the number of typical days and inner-daily temporal resolutions, as well as constituting a representation method that can reproduce the value distribution of the original time series. We prove the superiority of these approaches by applying them to two fundamentally different model types, namely a single-node building energy system and a European carbon-neutral energy scenario, and benchmark these against state-of-the-art approaches. This is performed for a variety of temporal resolutions, which leads to many hundreds of model runs. The results show that the proposed improvements on current methods strictly dominate the status quo with respect to Pareto-optimality in terms of runtime and accuracy. Although a speedup of one order of magnitude could be achieved using traditional aggregation methods within a cost deviation range of two percent, the algorithms proposed herein achieve this accuracy with a runtime speedup of two orders of magnitude.
ENERGY INDUSTRY
arxiv.org

QuantumCircuitOpt: An Open-source Framework for Provably Optimal Quantum Circuit Design

In recent years, the quantum computing community has seen an explosion of novel methods to implement non-trivial quantum computations on near-term hardware. An important direction of research has been to decompose an arbitrary entangled state, represented as a unitary, into a quantum circuit, that is, a sequence of gates supported by a quantum processor. It has been well known that circuits with longer decompositions and more entangling multi-qubit gates are error-prone for the current noisy, intermediate-scale quantum devices. To this end, there has been significant interest in developing heuristic-based methods to discover compact circuits. We contribute to this effort by proposing QuantumCircuitOpt (QCOpt), a novel open-source framework which implements mathematical optimization formulations and algorithms for decomposing arbitrary unitary gates into a sequence of hardware-native gates. A core innovation of QCOpt is that it provides optimality guarantees on the quantum circuits that it produces. In particular, we show that QCOpt can find up to 57% reduction in the number of necessary gates on circuits with up to four qubits, with run times of less than a few minutes on commodity computing hardware. We also validate the efficacy of QCOpt as a tool for quantum circuit design in comparison with a naive brute-force enumeration algorithm. We also show how the QCOpt package can be adapted to various built-in types of native gate sets, based on different hardware platforms like those produced by IBM, Rigetti and Google. We hope this package will facilitate further algorithmic exploration for quantum processor designers, as well as quantum physicists.
CODING & PROGRAMMING
arxiv.org

Trimming Stability Selection increases variable selection robustness

Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue and has been addressed in Robust Statistics; however, the relation between contamination and distorted variable selection has rarely been considered in the literature. As for variable selection, many methods for sparse model selection have been proposed, including Stability Selection, which is a meta-algorithm based on some variable selection algorithm in order to immunize against particular data configurations. We introduce the variable selection breakdown point that quantifies the number of cases or cells that have to be contaminated in order to let no relevant variable be detected. We show that particular outlier configurations can completely mislead model selection and argue why even cell-wise robust methods cannot fix this problem. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point that quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection which only aggregates the models with the lowest in-sample losses so that, heuristically, models computed on heavily contaminated resamples should be trimmed away. We provide a short simulation study that reveals both the potential of our approach as well as the fragility of variable selection, even for an extremely small cell-wise contamination rate.
SCIENCE
datasciencecentral.com

Improving Machine Learning: How Knowledge Graphs Bring Deeper Meaning to Data

Enterprise machine learning deployments are limited by two consequences of outdated data management practices widely used today. The first is the protracted time-to-insight that stems from antiquated data replication approaches. The second is the lack of unified, contextualized data that spans the organization horizontally. Excessive data replication and the resulting...
SOFTWARE
arxiv.org

Detecting triplet states in opto-electronic and photovoltaic materials and devices by transient optically detected magnetic resonance

Triplet excited states in organic semiconductor materials and devices are notoriously difficult to detect and study with established spectroscopic methods. Yet, they are a crucial intermediate step in next-generation organic light emitting diodes (OLED) that employ thermally activated delayed fluorescence (TADF) to upconvert non-emissive triplets to emissive singlet states. In organic photovoltaic (OPV) devices, however, triplets are an efficiency-limiting exciton loss channel and are also involved in device degradation. Here, we introduce an innovative spin-sensitive method to study triplet states in both optically excited organic semiconductor films and electrically driven devices. The method of transient optically detected magnetic resonance (trODMR) can be applied to all light-emitting materials whose luminescence depends on paramagnetic spin states. It is thus an ideal spectroscopic tool to distinguish different states involved and determine their corresponding time scales. We unravel the role of intermediate excited spin states in opto-electronic and photovoltaic materials and devices and reveal fundamental differences in electrically and optically induced triplet states.
CHEMISTRY
EatThis

Doctors Warn You Not to Take Too Much of This Vitamin Right Now

Since the start of the pandemic, there have been multiple studies examining how vitamins can impact your chances of contracting COVID-19 and how amping up on them may influence severity of infection and even death. However, a notable study warns that one in particular may not be as effective as previously believed. Read on to find out what it is, and to ensure your health and the health of others, don't miss these Sure Signs You May Have Already Had COVID.
HEALTH
EatThis

The #1 Worst Drink for Your Liver, New Study Says

Maybe you intuitively understand that because the liver filters toxins from your body, it's essential to keep this important organ healthy. And often, when we talk about liver damage, the first thing that comes to mind is alcohol. However, new research reveals an entirely different category of drink is what commonly harms the liver… and unlike alcohol, there's no minimum required age for this type of beverage.
HEALTH
deseret.com

What are the side effects for the Pfizer and Moderna booster shots?

All American adults became eligible for the COVID-19 booster shots last week after both the Centers for Disease Control and Prevention and the Food and Drug Administration approved the shots. But now there are lingering questions about side effects for people who get the booster shots, especially if they’ve decided...
INDUSTRY

