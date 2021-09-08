A myriad of publications in the scientific and lay literature can now be found under the heading of ‘machine learning’ (ML) or ‘artificial intelligence’ (AI) in healthcare. These ML and AI techniques mine massive clinical databases to find higher-order correlations and relationships, with the goal of prediction or prognostication. In recent years, the huge discrepancy between the volume of AI/ML publications and the relative scarcity of successful implementation studies has been highlighted in a number of commentaries and publications [1-3]. The field of clinical AI/ML is in desperate need of rigorous clinical evidence from multicenter randomized clinical trials. Among the barriers to executing such trials are the difficulty of integrating with different electronic health records (EHRs) and of demonstrating generalizability. Factors contributing to this problem include: 1) precise mapping of the features/input elements of an algorithm across different EHR vendors is challenging, and even for a given EHR vendor, different system builds/customizations complicate standardization (see more on syntactic and semantic interoperability [9]); 2) clinical constructs and inclusion/exclusion criteria used to establish the gold-standard diagnosis/outcomes are often inconsistently implemented across sites (i.e., label noise); 3) the frequency of measurement of clinical variables (e.g., labs) is often healthcare system-specific and tied to factors such as severity of illness, workflow design, staffing levels, and utilization of point-of-care technologies; 4) distributions of patient characteristics (such as demographics and care level/care unit type) vary widely, and generalizability has to be assessed across a geographically diverse patient population; 5) often, data recorded in EHRs (such as those from monitors, ventilators, IV pumps, etc.) 
are biased by vendor-specific data downsampling methods and human verification, and as a consequence, real-time data (e.g., via direct HL7 feeds from devices) and retrospective/archived data may not match exactly; 6) temporal data drift may occur due to a number of factors, including changes in processes of care or the introduction of new measurement devices (e.g., point-of-care lactate measurement); and 7) implementation of AI/ML algorithms can induce changes in clinical workflow and practice patterns that alter the distribution of data. Therefore, successful real-time implementation of AI/ML models developed using retrospective data requires continuous monitoring (see phase 4 in Figure 1) and the establishment of effective algorithm change protocols (ACPs).
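
As an illustration of what continuous monitoring for temporal data drift (factor 6) might involve, the following minimal sketch compares the distribution of a single numeric input feature in a recent window against a retrospective reference sample using the population stability index (PSI). The choice of PSI, the function name `population_stability_index`, the bin count, and the smoothing constant `eps` are all assumptions made for this example; the text above does not prescribe a particular drift statistic, and the values used are synthetic.

```python
import math
from bisect import bisect_right


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Return the PSI of one numeric feature between two samples.

    Hypothetical helper for illustration only: `reference` stands in for
    retrospective training data, `current` for a recent monitoring window.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")

    # Interior bin edges are taken at quantiles of the reference sample.
    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    edges = sorted({ref_sorted[min(n - 1, (i * n) // n_bins)]
                    for i in range(1, n_bins)})

    def proportions(values):
        # Count values per bin, then convert to proportions.
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


# Synthetic example: a feature whose values shift upward over time,
# e.g. after a new measurement device is introduced.
reference_window = [float(x) for x in range(100)]
shifted_window = [x + 50.0 for x in reference_window]

psi_same = population_stability_index(reference_window, reference_window)
psi_shifted = population_stability_index(reference_window, shifted_window)
```

In practice, a statistic of this kind would be computed per feature on a rolling schedule, with an alerting threshold chosen and validated locally; a PSI above roughly 0.25 is a commonly quoted rule of thumb for a substantial shift, but that cutoff is a convention rather than something established here.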