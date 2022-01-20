
Statistical detection of format dialects using the weighted Dowker complex

By Michael Robinson, Letitia W. Li, Cory Anderson, Steve Huntsman
 4 days ago

This paper provides an experimentally validated, probabilistic model of file behavior when consumed by a set of pre-existing parsers. File behavior is measured by way of a standardized set of Boolean "messages" produced as the files are read. By thresholding the posterior...

Small Object Detection using Deep Learning

Nowadays, UAVs such as drones are widely used for purposes such as image capture and target detection from aerial imagery. Easy public access to these small aerial vehicles can cause serious security threats. For instance, critical places may be monitored by spies blending into the public using drones. This study proposes an improved and efficient deep-learning-based autonomous system that can detect and track very small drones with great precision. The proposed system is built around a custom Tiny YOLOv3 model, one of the variants of the very fast You Only Look Once (YOLO) object detection model, which is used for detection. The object detection algorithm efficiently detects the drones. The proposed architecture has shown significantly better performance than the previous YOLO version. The improvement is observed in terms of resource usage and time complexity. Performance is measured using recall and precision, which are 93% and 91% respectively.
Drone Object Detection Using RGB/IR Fusion

Object detection using aerial drone imagery has received a great deal of attention in recent years. While visible light images are adequate for detecting objects in most scenarios, thermal cameras can extend the capabilities of object detection to night-time or occluded objects. As such, RGB and Infrared (IR) fusion methods for object detection are useful and important. One of the biggest challenges in applying deep learning methods to RGB/IR object detection is the lack of available training data for drone IR imagery, especially at night. In this paper, we develop several strategies for creating synthetic IR images using the AIRSim simulation engine and CycleGAN. Furthermore, we utilize an illumination-aware fusion framework to fuse RGB and IR images for object detection on the ground. We characterize and test our methods for both simulated and actual data. Our solution is implemented on an NVIDIA Jetson Xavier running on an actual drone, requiring about 28 milliseconds of processing per RGB/IR image pair.
Detecting log4j using ShiftLeft CORE

Over the last few weeks, log4j has been the focus in most organizations. It continues to dominate tech media as the FTC threatens action against unpatched systems and Microsoft warns of continued exploits of the vulnerability. We have covered it in detail here, here, and here. In this blog, we will focus on how you can easily detect vulnerable versions of log4j in your Java applications using ShiftLeft CORE.
Roadside Lidar Vehicle Detection and Tracking Using Range And Intensity Background Subtraction

In this paper, we present a solution for roadside LiDAR object detection using a combination of two unsupervised learning algorithms. The 3D point cloud data are first converted into spherical coordinates and filled into the azimuth grid matrix using a hash function. After that, the raw LiDAR data are rearranged into spatial-temporal data structures to store the information of range, azimuth, and intensity. The Dynamic Mode Decomposition method is applied to decompose the point cloud data into low-rank backgrounds and sparse foregrounds based on intensity channel pattern recognition. The Triangle Algorithm automatically finds the dividing value to separate the moving targets from the static background according to range information. After intensity and range background subtraction, the foreground moving objects are detected using a density-based detector and encoded into the state-space model for tracking. The output of the proposed model includes vehicle trajectories that can enable many mobility and safety applications. The method was validated against a commercial traffic data collection platform and demonstrated to be an efficient and reliable solution for infrastructure LiDAR object detection. In contrast to previous methods that operate directly on the scattered and discrete point clouds, the proposed method can establish the less sophisticated linear relationship of the 3D measurement data, which captures the spatial-temporal structure that we often desire.
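The first step described above, converting points to spherical coordinates and placing them in an azimuth grid, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 0.5-degree azimuth resolution, and the plain integer binning (standing in for the hash function the abstract mentions) are all assumptions.

```python
import math


def to_spherical(x, y, z):
    """Convert a Cartesian point to (range, azimuth_deg, elevation_deg).

    Azimuth is measured in the x-y plane and wrapped into [0, 360).
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    if r > 0.0:
        # Clamp to guard against tiny floating-point overshoot before asin.
        ratio = max(-1.0, min(1.0, z / r))
        elevation = math.degrees(math.asin(ratio))
    else:
        elevation = 0.0
    return r, azimuth, elevation


def azimuth_bin(azimuth_deg, resolution_deg=0.5):
    """Map an azimuth angle to a column index of the azimuth grid.

    resolution_deg is a hypothetical sensor resolution, not a value
    taken from the paper.
    """
    n_bins = int(round(360.0 / resolution_deg))
    return int(azimuth_deg / resolution_deg) % n_bins


if __name__ == "__main__":
    r, az, el = to_spherical(3.0, 4.0, 0.0)
    print(r, az, el, azimuth_bin(az))
```

Each laser return then lands in one grid cell, so successive frames can be stacked into the range, azimuth, and intensity structures that the background subtraction works on.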
Detecting danger in gridworlds using Gromov's Link Condition

Gridworlds have been long-utilised in AI research, particularly in reinforcement learning, as they provide simple yet scalable models for many real-world applications such as robot navigation, emergent behaviour, and operations research. We initiate a study of gridworlds using the mathematical framework of reconfigurable systems and state complexes due to Abrams, Ghrist & Peterson. State complexes represent all possible configurations of a system as a single geometric space, thus making them conducive to study using geometric, topological, or combinatorial methods. The main contribution of this work is a modification to the original Abrams, Ghrist & Peterson setup which we believe is more naturally-suited to the context of gridworlds. With this modification, the state complexes may exhibit geometric defects (failure of Gromov's Link Condition), however, we argue that these failures can indicate undesirable or dangerous states in the gridworld. Our results provide a novel method for seeking guaranteed safety limitations in discrete task environments with single or multiple agents, and offer potentially useful geometric and topological information for incorporation in or analysis of machine learning systems.
Raspberry Pi-based device uses electromagnetic waves to detect malware

In brief: Antivirus software typically relies on a combination of machine learning algorithms and frequently-updated malware definitions to protect our computers from outside threats. However, no antivirus software is perfect, and they will occasionally miss newer or heavily-disguised threats. That's why researchers from the Institute of Computer Science and Random Systems have sought to explore new methods of detecting hostile programs that don't rely on software solutions at all.
Tracking Adversaries in AWS using Anomaly Detection, Part 2

The first part of this series explored minimizing the impact of a breach by identifying malicious actors’ anomalous behavior and taking action. In part two, we will go through the cyber “kill chain” with Pacu and explain how to use automated analysis to detect anomalous behavior. While...
Collision Detection: An Improved Deep Learning Approach Using SENet and ResNext

In recent years, with increased population and traffic on roadways, vehicle collisions have become one of the leading causes of death worldwide. The automotive industry is motivated to develop techniques that use sensors and advances in computer vision to build collision detection and collision prevention systems to assist drivers. In this article, a deep-learning-based model comprising a ResNext architecture with SENet blocks is proposed. The performance of the model is compared to popular deep learning models such as VGG16, VGG19, Resnet50, and stand-alone ResNext. The proposed model outperforms the existing baseline models, achieving a ROC-AUC of 0.91 using a significantly smaller proportion of the GTACrash synthetic data for training, thus reducing the computational overhead.
Remote photonic detection of human senses using secondary speckle patterns

Neural activity research has recently gained significant attention due to its association with sensory information and behavior control. However, the current methods of brain activity sensing require expensive equipment and physical contact with the tested subject. We propose a novel photonic-based method for remote detection of human senses. Physiological processes associated with hemodynamic activity due to activation of the cerebral cortex affected by different senses have been detected by remote monitoring of nano-vibrations generated by the transient blood flow to the specific regions of the human brain. We have found that a combination of defocused, self-interference random speckle patterns with a spatiotemporal analysis, using Deep Neural Network, allows associating between the activated sense and the seemingly random speckle patterns.
Targeted Optimal Treatment Regime Learning Using Summary Statistics

Personalized decision-making, aiming to derive optimal individualized treatment rules (ITRs) based on individual characteristics, has recently attracted increasing attention in many fields, such as medicine, social services, and economics. Current literature mainly focuses on estimating ITRs from a single source population. In real-world applications, the distribution of a target population can be different from that of the source population. Therefore, ITRs learned by existing methods may not generalize well to the target population. Due to privacy concerns and other practical issues, individual-level data from the target population is often not available, which makes ITR learning more challenging. We consider an ITR estimation problem where the source and target populations may be heterogeneous, individual data is available from the source population, and only the summary information of covariates, such as moments, is accessible from the target population. We develop a weighting framework that tailors an ITR for a given target population by leveraging the available summary statistics. Specifically, we propose a calibrated augmented inverse probability weighted estimator of the value function for the target population and estimate an optimal ITR by maximizing this estimator within a class of pre-specified ITRs. We show that the proposed calibrated estimator is consistent and asymptotically normal even with flexible semi/nonparametric models for nuisance function approximation, and the variance of the value estimator can be consistently estimated. We demonstrate the empirical performance of the proposed method using simulation studies and a real application to an eICU dataset as the source sample and a MIMIC-III dataset as the target sample.
DNA Base Detection Using Two-Dimensional Materials Beyond Graphene

The success of graphene for nanopore DNA sequencing has shown that it is possible to explore other potential single- and few-atom thick layers of 2D materials beyond graphene, and also that these materials can exhibit fascinating and technologically useful properties for DNA base detection that are superior to those of graphene. In this article, we review the state of the art of DNA base detection using 2D materials beyond graphene. Initially, we present an overview of nanopore-based DNA sequencing methods using biological and solid-state nanopores, and discuss several challenges that limit their use for single-base resolution. Then we outline the progress, challenges, and opportunities using graphene. Additionally, we discuss several potential 2D materials beyond graphene such as hexagonal boron nitride, elemental 2D materials beyond graphene, and 2D transition metal dichalcogenides. Finally, we highlight the potential of using van der Waals materials for advanced DNA base detection technologies.
Understanding and Detecting Hateful Content using Contrastive Learning

The spread of hate speech and hateful imagery on the Web is a significant problem that needs to be mitigated to improve our Web experience. This work contributes to research efforts to detect and understand hateful content on the Web by undertaking a multimodal analysis of Antisemitism and Islamophobia on 4chan's /pol/ using OpenAI's CLIP. This large pre-trained model uses the Contrastive Learning paradigm. We devise a methodology to identify a set of Antisemitic and Islamophobic hateful textual phrases using Google's Perspective API and manual annotations. Then, we use OpenAI's CLIP to identify images that are highly similar to our Antisemitic/Islamophobic textual phrases. By running our methodology on a dataset that includes 66M posts and 5.8M images shared on 4chan's /pol/ for 18 months, we detect 573,513 posts containing 92K Antisemitic/Islamophobic images and 246K posts that include 420 hateful phrases. Among other things, we find that we can use OpenAI's CLIP model to detect hateful content with an accuracy score of 0.84 (F1 score = 0.58). Also, we find that Antisemitic/Islamophobic imagery is shared in 2x more posts on 4chan's /pol/ compared to Antisemitic/Islamophobic textual phrases, highlighting the need to design more tools for detecting hateful imagery. Finally, we make publicly available a dataset of 420 Antisemitic/Islamophobic phrases and 92K images that can assist researchers in further understanding Antisemitism/Islamophobia and developing more accurate hate speech detection models.
Automatic detection of multilevel communities in complex networks with a scalable community fitness function

Community detection in complex networks has been held back by two obstacles: the resolution limit problem, which restrains simultaneous detection for communities of heterogeneous sizes, and the divergent outputs of heuristic algorithms, which are unfavorable for differentiating more relevant and significant results. In this paper, we propose a renewed method for community detection with a multiresolution and rescalable community fitness function. The scalability of the community fitness function on the one hand exempts our method from the resolution limit problem in heterogeneous networks, and on the other hand enables our method to detect multilevel community structures in deep hierarchical networks. Furthermore, we suggest a strict definition of "plateau," with which we evaluate the stability of the outputs, and remove unstable and irrelevant ones automatically, without any artificial or arbitrary selection. As a result, our method outperforms most previous methods; it reproduces the expected community structures accurately for various classes of synthetic networks, as well as the ground truths for real-world networks.
The Coupled Rejection Sampler

We propose a novel coupled rejection-sampling method for sampling from couplings of arbitrary distributions. The method relies on accepting or rejecting coupled samples coming from dominating marginals. Contrary to existing acceptance-rejection methods, the variance of the execution time of the proposed method is limited and stays finite as the two target marginals approach each other in the sense of the total variation norm. In the important special case of coupling multivariate Gaussians with different means and covariances, we derive positive lower bounds for the resulting coupling probability of our algorithm, and we then show how the coupling method can be optimised using convex optimisation. Finally, we show how we can modify the coupled-rejection method to propose from a coupled ensemble of proposals, so as to asymptotically recover a maximal coupling. We then apply the method to derive a novel parallel coupled particle filter resampling algorithm, and show how it can be used to speed up unbiased MCMC methods based on couplings.
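For context, the kind of existing acceptance-rejection scheme the abstract contrasts itself with can be sketched as below. This is the classical maximal-coupling sampler for two one-dimensional Gaussians, not the coupled rejection sampler proposed in the paper. The function names and the one-dimensional setting are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def maximal_coupling(mean_p, sd_p, mean_q, sd_q, rng):
    """Draw (X, Y) with X ~ N(mean_p, sd_p^2) and Y ~ N(mean_q, sd_q^2)
    so that P(X == Y) is as large as possible (classical scheme)."""
    x = rng.gauss(mean_p, sd_p)
    w = rng.uniform(0.0, normal_pdf(x, mean_p, sd_p))
    if w <= normal_pdf(x, mean_q, sd_q):
        # The point lies under both densities: use it for both marginals.
        return x, x
    # Otherwise draw Y from the part of q that is not covered by p.
    # The number of iterations of this loop is random.
    while True:
        y = rng.gauss(mean_q, sd_q)
        w_star = rng.uniform(0.0, normal_pdf(y, mean_q, sd_q))
        if w_star > normal_pdf(y, mean_p, sd_p):
            return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [maximal_coupling(0.0, 1.0, 1.0, 1.0, rng) for _ in range(10000)]
    coupled = sum(1 for x, y in pairs if x == y) / len(pairs)
    print("fraction of coupled pairs:", coupled)
```

A maximal coupling makes the two draws equal with probability one minus the total variation distance between the marginals. The `while` loop is the part whose running time is random, which is the cost the abstract says the proposed method keeps under control as the marginals approach each other.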
Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise

We give tight statistical query (SQ) lower bounds for learning halfspaces in the presence of Massart noise. In particular, suppose that all labels are corrupted with probability at most $\eta$. We show that for arbitrary $\eta \in [0,1/2]$ every SQ algorithm achieving misclassification error better than $\eta$ requires queries of superpolynomial accuracy or at least a superpolynomial number of queries. Further, this continues to hold even if the information-theoretically optimal error $\mathrm{OPT}$ is as small as $\exp\left(-\log^c(d)\right)$, where $d$ is the dimension and $0 < c < 1$ is an arbitrary absolute constant, and an overwhelming fraction of examples are noiseless. Our lower bound matches known polynomial time algorithms, which are also implementable in the SQ framework. Previously, such lower bounds only ruled out algorithms achieving error $\mathrm{OPT} + \epsilon$ or error better than $\Omega(\eta)$ or, if $\eta$ is close to $1/2$, error $\eta - o_\eta(1)$, where the term $o_\eta(1)$ is constant in $d$ but going to 0 for $\eta$ approaching $1/2$.
Probability Distribution on Rooted Trees

The hierarchical and recursive expressive capability of rooted trees makes them suitable for representing statistical models in various areas, such as data compression, image processing, and machine learning. On the other hand, this hierarchical expressive capability makes tree selection difficult if overfitting is to be avoided. One unified way to address this is a Bayesian approach, in which the rooted tree is regarded as a random variable and a direct loss function can be assumed on the selected model or the predicted value for a new data point. However, to the best of our knowledge, all previous studies on this approach are based on a probability distribution on full trees. In this paper, we propose a generalized probability distribution for any rooted tree in which only the maximum number of child nodes and the maximum depth are fixed. Furthermore, we derive recursive methods to evaluate the characteristics of the probability distribution without any approximations.
Multi-multifractality and dynamic scaling in stochastic porous lattice

In this article, we extend the idea of the stochastic dyadic Cantor set to the weighted planar stochastic lattice, which leads to a stochastic porous lattice. The process starts with an initiator, which we choose to be a square of unit area for convenience. We then define a generator that picks the initiator or one of the blocks, preferentially with respect to their areas, and divides it either horizontally or vertically into two rectangles, one of which is removed with probability $q=1-p$. We find that the remaining number of blocks and their mass vary with time as $t^{p}$ and $t^{-q}$ respectively. The analytical solution shows that the dynamics of this process is governed by infinitely many hidden conserved quantities, each of which is a multifractal measure with porous structure, as it contains missing blocks of various sizes. The support where these measures are distributed is fractal with fractal dimension $2p$ provided $0<p<1$. We find that if the remaining blocks are characterized by their respective areas, then the corresponding block size distribution function obeys dynamic scaling.
Imputing Missing Values in the Occupational Requirements Survey

The U.S. Bureau of Labor Statistics allows public access to much of the data acquired through its Occupational Requirements Survey (ORS). This data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains a multitude of missing observations and estimates, which somewhat limits its utility. Here, we propose a method by which to impute these missing values that leverages many of the inherent features present in the survey data, such as known population limit and correlations between occupations and tasks. An iterative regression fit, implemented with a recent version of XGBoost and executed across a set of simulated values drawn from the distribution described by the known values and their standard deviations reported in the survey, is the approach used to arrive at a distribution of predicted values for each missing estimate. This allows us to calculate a mean prediction and bound said estimate with a 95% confidence interval. We discuss the use of our method and how the resulting imputations can be utilized to inform and pursue future areas of study stemming from the data collected in the ORS. Finally, we conclude with an outline of WIGEM, a generalized version of our weighted, iterative imputation algorithm that could be applied to other contexts.
A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT

IoT devices generating enormous amounts of data, together with state-of-the-art machine learning techniques, will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions that lack simple forms, such as obstacle detection and object recognition. Traditional cloud-based methods that focus on transferring data to a central location, either for training or inference, place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Due to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any arbitrary function for the desired functional compression task in IoT. Importantly, the raw sensory data are never transferred to a central node for training or inference, thus reducing communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches and are robust to channel condition changes and sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.
CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

Multi-agent path planning (MAPP) in continuous spaces is a challenging problem with significant practical importance. One promising approach is to first construct graphs approximating the spaces, called roadmaps, and then apply multi-agent pathfinding (MAPF) algorithms to derive a set of conflict-free paths. While conventional studies have utilized roadmap construction methods developed for single-agent planning, it remains largely unexplored how we can construct roadmaps that work effectively for multiple agents. To this end, we propose a novel concept of roadmaps called cooperative timed roadmaps (CTRMs). CTRMs enable each agent to focus on its important locations around potential solution paths in a way that considers the behavior of other agents to avoid inter-agent collisions (i.e., "cooperative"), while being augmented in the time direction to make it easy to derive a "timed" solution path. To construct CTRMs, we developed a machine-learning approach that learns a generative model from a collection of relevant problem instances and plausible solutions and then uses the learned model to sample the vertices of CTRMs for new, previously unseen problem instances. Our empirical evaluation revealed that the use of CTRMs significantly reduced the planning effort with acceptable overheads while maintaining a success rate and solution quality comparable to conventional roadmap construction approaches.
