#Data Mining


The Indicator from Planet Money

SYLVIE DOUGLIS, BYLINE: NPR. (SOUNDBITE OF DROP ELECTRIC'S "WAKING UP TO THE FIRE") Five-ish years ago, Katherine Collins was sitting down with a health care executive to talk about his company's business. This is a normal part of her job managing money at Putnam Investments. She's supposed to find out if companies have good plans or bad and invest accordingly.

Bitcoin (BTC) Price Dump Incoming? On-Chain Data Reveals Bottom

Bitcoin (BTC) price failed to hold above $17k and fell back to the support near $16,500. The BTC price remains under pressure as miner capitulation risk continues to haunt traders looking to open long positions. On-chain data reveals miners are indeed liquidating their Bitcoin holdings because of financial constraints. The impact can be seen clearly in the falling share prices of mining firms.

Autonomous schema markups based on intelligent computing for search engine optimization

Data Mining and Machine Learning, Data Science, World Wide Web and Web Science, Text Mining. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
Harvard Health

Behind the data, a teacher who left his students transformed

On a clear November morning, Chris Winship, the Diker-Tishman Professor of Sociology, was getting ready to begin the last class of the course on quantitative research methods he has been teaching for more than 40 years, first at Northwestern and for the past three decades at Harvard. Meanwhile, students sat...
IEEE Spectrum

A World-Class Tech MBA on Your Own Schedule

This sponsored article is brought to you by Purdue University’s Krannert School of Management. A Master of Business Administration degree can be a passport to broader career horizons—especially for ambitious young (and young-ish) engineers. While 80 percent of applicants for graduate management education do say that earning more...

An LBS and agent-based simulator for Covid-19 research

The mobility data of citizens provide important information on the epidemic spread including Covid-19. However, the privacy versus security dilemma hinders the utilization of such data. This paper proposed a method to generate pseudo mobility data on a per-agent basis, utilizing the actual geographical environment data provided by LBS to generate the agent-specific mobility trajectories and export them as GPS-like data. Demographic characteristics such as behavior patterns, gender, age, vaccination, and mask-wearing status are also assigned to the agents. A web-based data generator was implemented, enabling users to make detailed settings to meet different research needs. The simulated data indicated the usability of the proposed methods.

Most Popular Machine Learning Models in 2023

Machines today can learn in highly advanced ways. Computers churn through billions of data points to rapidly detect complex patterns and solve real-world problems. How? By using machine learning models. Machine learning is a branch of computer science that achieves one of the primary objectives of artificial intelligence (AI). This...

Crash severity analysis and risk factors identification based on an alternate data source: a case study of developing country

Road traffic injuries are one of the primary reasons for death, especially in developing countries like Bangladesh. Safety in land transport is one of the major concerns for road safety authorities and other policymakers. For this reason, identifying the contributory factors associated with crashes is necessary for reducing road crashes and ensuring transportation safety. This paper presents an analytical approach to identifying significant contributing factors of Bangladesh road crashes by evaluating the road crash data, considering three different severity levels (non-fatal, severe, and extremely severe). Generally, official crash databases are compiled from police-reported crash records. Though the official datasets aim to compile a wide array of attributes, a number of unreported issues can be observed, which calls for an alternative source of crash data. Therefore, this proposed approach considers compiling crash data from newspapers in Bangladesh, which could be complementary to the official crash database. To conduct the analysis, first, we filtered the useful features from compiled crash data using three popular feature selection techniques: chi-square, Two-way ANOVA, and Regression analysis. Then, we employed three machine learning classifiers: Decision Tree, Random Forest, and Naïve Bayes over the extracted features. A confusion matrix was considered to evaluate the proposed model, including classification accuracy, sensitivity, and specificity. The predictive machine learning model, namely, Random Forest using Label Encoder with chi-square and Two-way ANOVA feature selection process, seems the best option for crash severity prediction that provides high prediction accuracy. The resulting model highlights nine out of fourteen independent features as responsible factors.
Significant features associated with crash severities include driver characteristics (gender, license type, seat belts), vehicle characteristics (vehicle type), road characteristics (road surface type, road classification), environmental conditions (day of crash occurred, time of crash), and injury localization. This outcome may contribute to improving traffic safety of Bangladesh.
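
The evaluation step mentioned in the abstract (accuracy, sensitivity, and specificity read off a confusion matrix) can be illustrated with a minimal sketch. This is not the paper's code: the function name `one_vs_rest_metrics` and the six labels below are hypothetical and invented for illustration, and one severity class is simply treated as "positive" against the other two.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels, invented for illustration only (not data from the paper).
y_true = ["non-fatal", "severe", "severe", "extremely severe", "non-fatal", "severe"]
y_pred = ["non-fatal", "severe", "non-fatal", "extremely severe", "non-fatal", "severe"]

metrics = one_vs_rest_metrics(y_true, y_pred, positive="severe")
print(metrics)
```

In a three-class setting such as this one, the same calculation would be repeated once per severity level.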

Frontiers: Polarized America: From Political Polarization to Preference Polarization

In light of the widely discussed political divide and increasing societal polarization, we investigate in this paper whether the polarization of political ideology extends to consumers’ preferences, intentions, and purchases. Using three different data sets—the publicly available social media data of over three million brand followerships of Twitter users, a YouGov brand-preference survey data set, and Nielsen scanner panel data—we assess the evolution of brand-preference polarization. We find that the apparent polarization in political ideologies after the election of Donald Trump in 2016 stretches further to the daily lives of consumers. We observe increased polarization in preferences, behavioral intentions, and actual purchase decisions for consumer brands. Consistent with compensatory consumption theory, we find that the increase in polarization following the election of Donald Trump was stronger for liberals relative to conservatives, and that this asymmetric polarization is driven by consumers’ demand for “Democratic brands” rather than the supply of such brands. From a brand perspective, there is evidence that brands that took a political stance observed a shift in their customer base in terms of their customers’ political affiliation. We provide publicly available access to the unique Twitter-based brand political affiliation scores.
Traders Magazine

Women in Finance Awards Q&A: Yiyang Yang, Virtu Financial

Yiyang Yang, Quantitative Strategist, Virtu Financial, won STEM Champion at Markets Media Group’s 2022 Women in Finance (U.S.) Awards. It was one of excitement and gratitude. I am honored to be recognized in the field of quantitative research and super grateful to have had the opportunity to work at Virtu Financial and develop my skills in this area. I have been fortunate to have the support of mentors and colleagues who have encouraged and guided me along the way. And I am thrilled to become a role model and inspire other women to pursue careers in STEM fields.

Data-driven discovery of dimensionless numbers and governing laws from scarce measurements

Dimensionless numbers and scaling laws provide elegant insights into the characteristic properties of physical systems. Classical dimensional analysis and similitude theory fail to identify a set of unique dimensionless numbers for a highly multi-variable system with incomplete governing equations. This paper introduces a mechanistic data-driven approach that embeds the principle of dimensional invariance into a two-level machine learning scheme to automatically discover dominant dimensionless numbers and governing laws (including scaling laws and differential equations) from scarce measurement data. The proposed methodology, called dimensionless learning, is a physics-based dimension reduction technique. It can reduce high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless parameters, greatly simplifying complex process design and system optimization. We demonstrate the algorithm by solving several challenging engineering problems with noisy experimental measurements (not synthetic data) collected from the literature. Examples include turbulent Rayleigh-Bénard convection, vapor depression dynamics in laser melting of metals, and porosity formation in 3D printing. Lastly, we show that the proposed approach can identify dimensionally homogeneous differential equations with dimensionless number(s) by leveraging sparsity-promoting techniques.
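
The constraint of dimensional invariance that the abstract refers to can be shown with a small sketch. This is not the paper's algorithm: it only checks whether a candidate product of variables is dimensionless, using the Reynolds number as a textbook example. The variable names, the (mass, length, time) encoding, and the helper functions are assumptions made for illustration.

```python
# Dimensions as exponents of (mass, length, time); an illustrative encoding.
DIMENSIONS = {
    "rho": (1, -3, 0),   # density: kg m^-3
    "v": (0, 1, -1),     # velocity: m s^-1
    "L": (0, 1, 0),      # length: m
    "mu": (1, -1, -1),   # dynamic viscosity: kg m^-1 s^-1
}


def net_dimension(exponents):
    """Sum the dimension vectors of each variable, weighted by its power."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for i, d in enumerate(DIMENSIONS[name]):
            total[i] += power * d
    return tuple(total)


def is_dimensionless(exponents):
    """A product of powers is dimensionless when every net exponent is zero."""
    return all(d == 0 for d in net_dimension(exponents))


reynolds = {"rho": 1, "v": 1, "L": 1, "mu": -1}   # rho * v * L / mu
momentum_flux = {"rho": 1, "v": 1}                # rho * v, not dimensionless

print(is_dimensionless(reynolds), net_dimension(momentum_flux))
```

A search over candidate exponents would only keep combinations that pass a check like `is_dimensionless`, which is what restricts the parameter space to physically meaningful groups.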

An Intuitive Comparison of MCMC and Variational Inference

I recently started working my way through “Probabilistic Programming and Bayesian Methods for Hackers”, which has been on my to-do list for a long time. Speaking as someone who’s taken my fair share of statistics and machine learning classes (including Bayesian stats), I find that I’m understanding things through this coding-first approach that were never clear before. I highly recommend this read!

Data-driven methods for discovery of next-generation electrostrictive materials

All dielectrics exhibit electrostriction, i.e., display a quadratic strain response to an electric field compared to the linear strain dependence of piezoelectrics. As such, there is significant interest in discovering new electrostrictors with enhanced electrostrictive coefficients, especially as electrostrictors can exhibit effective piezoelectricity when a bias electric field is applied. We present the results of a study combining data mining and first-principles computations that indicate that there exists a group of iodides, bromides, and chlorides that have electrostrictive coefficients exceeding 10 m⁴ C⁻², which are substantially higher than typical oxide electrostrictive ceramics and polymers. The corresponding effective piezoelectric voltage coefficients are three orders of magnitude larger than lead zirconate titanate.

Global Employment Screening Services Market Regional Growth Analysis With Industry Players Data By 2030 – PRIZM News

New Jersey, United States – Verified Market Analysis has recently published a research report titled, “Global Employment Screening Services Market Insight, Forecast To 2028” assessing various factors impacting its trajectory. The Global Employment Screening Services market report presents a high-quality, accurate, and comprehensive research study to equip players with valuable insights for making strategic business decisions. The research analysts have provided deep segmental analysis of the Global Employment Screening Services market on the basis of type, application, and geography. The report also sheds light on the vendor landscape to inform readers about future changes in market competition. As part of the competitive analysis, the report includes detailed company profiling of top players in the Global Employment Screening Services market. Players can also use the value chain analysis and Porter’s Five Forces analysis offered in the report to strengthen their position in the Global Employment Screening Services market.
Yale Daily News

Activist and whistleblower Chelsea Manning talks privacy at YPU event

In 2013, former United States Army intelligence analyst Chelsea Manning was convicted of violations of the Espionage Act after disclosing over 750,000 classified or sensitive military and diplomatic documents to the media organization WikiLeaks. The incident sparked years of national debate over privacy and data collection — a conversation she continued at a Dec. 6 event on campus.

What Is A Helium Miner And How Does It Operate?

The Helium miner, a wireless device known as a hotspot, uses radio technologies for HNT minting. It earns HNT tokens as a reward for offering coverage. Mining assists in validating the legitimacy of transactions executed through a blockchain network like the Bitcoin blockchain. Miners can begin mining cryptos using hardware like a central processing unit (CPU) or application-specific integrated circuits (ASICs). Alternatively, they can use smartphones powered by iOS and Android systems to mine the cryptos of their choice.

JPMorgan Invests in Sustainability-Focused Mining Tech Startup MineSense

Mining-focused data solutions company MineSense Technologies announced today that it has raised $42 million, with proceeds from the financing aimed at accelerating the commercial deployment of its solutions to address growing demand for more sustainable supply chains and for energy transition-related materials. The Series E financing round was led by...

Successful in the data economy

Unlike other resources, data grows at an unstoppable pace. To put it in numbers: The total volume of data is expected to exceed 175 zettabytes by 2025. But that means little unless the data is harnessed and used. The data economy provides an environment for organizing, transforming, analyzing, and sharing data to derive value from it. Organizations participating in the data economy gain insights into current and future trends to enable innovation, customer acquisition, product development, problem solving and more.