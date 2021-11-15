
Attention Mechanisms in Computer Vision: A Survey

By Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation,...

VentureBeat

Video-level computer vision advances business insights

This article was contributed by Can Kocagil, data scientist at OREDATA. Instance-based classification, segmentation, and object detection in images are fundamental problems in computer vision. Unlike image-level information retrieval, video-level problems aim at detecting, segmenting, and tracking object instances in the spatiotemporal domain, which has both space and time dimensions.
SOFTWARE
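For the video-level article above, the tracking part (linking per-frame detections into object instances that persist over time) can be sketched with greedy IoU matching between consecutive frames. This is a minimal sketch assuming axis-aligned boxes in (x1, y1, x2, y2) format and a hypothetical IoU threshold of 0.3; it is a common baseline, not the method described in the article, and it simply drops tracks that go unmatched.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box],
              threshold: float = 0.3) -> Dict[int, Box]:
    """Link detections in a new frame to existing tracks by greedy IoU matching.

    Highest-IoU pairs are matched first; unmatched detections start new
    tracks and unmatched tracks are dropped (no re-identification)."""
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used: Set[int] = set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # pairs are sorted, so every remaining score is lower
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated


# Usage: two tracks from the previous frame, three detections in the new one.
tracks = {0: (0, 0, 10, 10), 1: (20, 20, 30, 30)}
tracks = associate(tracks, [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)])
print(tracks)  # track 0 -> (1, 1, 11, 11), track 1 -> (21, 21, 31, 31), new track 2
```

Calling `associate` once per frame carries identities forward; a full tracker would add motion prediction and a grace period before discarding unmatched tracks.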
arxiv.org

A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part I: Models and Data Transformations

This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to combine the advantages of structured symbolic representations and distributed vector representations. Notable models in the HDC/VSA family are Tensor Product Representations, Holographic Reduced Representations, Multiply-Add-Permute, Binary Spatter Codes, and Sparse Binary Distributed Representations, but there are others too. HDC/VSA is a highly interdisciplinary area with connections to computer science, electrical engineering, artificial intelligence, mathematics, and cognitive science, which makes a thorough overview of the area challenging to create. However, because many new researchers have joined the area in recent years, a comprehensive survey has become necessary. Therefore, among other topics, this Part I surveys the known computational models of HDC/VSA and the transformations of various input data types into high-dimensional distributed representations. Part II of this survey is devoted to applications, cognitive computing and architectures, and directions for future work. The survey is written to be useful for both newcomers and practitioners.
CODING & PROGRAMMING
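To make the HDC/VSA survey's "algebraic properties of their key operations" concrete, here is a minimal sketch of Binary Spatter Codes, one of the models it lists: random dense binary hypervectors, element-wise XOR for binding, and bitwise majority for bundling. The dimensionality (10,000), the role and filler names, and the seed are illustrative assumptions.

```python
import random
from typing import List

Hypervector = List[int]  # dense binary vector with entries in {0, 1}

D = 10_000  # dimensionality; HDC/VSA typically uses thousands of dimensions


def random_hv(rng: random.Random, dim: int = D) -> Hypervector:
    """Random hypervector; two of these are nearly orthogonal (about 50% bits differ)."""
    return [rng.getrandbits(1) for _ in range(dim)]


def bind(a: Hypervector, b: Hypervector) -> Hypervector:
    """BSC binding: element-wise XOR. XOR is its own inverse, so bind(bind(a, b), b) == a."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(*vs: Hypervector) -> Hypervector:
    """BSC bundling (superposition): element-wise majority vote.
    Use an odd number of inputs so that there are no ties."""
    half = len(vs) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*vs)]


def similarity(a: Hypervector, b: Hypervector) -> float:
    """1 - normalized Hamming distance: 1.0 if identical, about 0.5 if unrelated."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)


# Usage: encode the record {color: red, shape: circle, size: small}.
rng = random.Random(0)
color, shape, size = (random_hv(rng) for _ in range(3))  # roles
red, circle, small = (random_hv(rng) for _ in range(3))  # fillers
record = bundle(bind(color, red), bind(shape, circle), bind(size, small))
answer = bind(record, color)       # unbind the "color" role
print(similarity(answer, red))     # clearly above 0.5 (0.75 in expectation)
print(similarity(answer, circle))  # close to 0.5: unrelated
```

Unbinding returns only a noisy copy of the filler, so in practice the result is "cleaned up" by comparing it against a memory of known vectors and taking the most similar one.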
arxiv.org

Computing Simple Mechanisms: Lift-and-Round over Marginal Reduced Forms

We study revenue maximization in multi-item multi-bidder auctions under the natural item-independence assumption - a classical problem in Multi-Dimensional Bayesian Mechanism Design. One of the biggest challenges in this area is developing algorithms to compute (approximately) optimal mechanisms that are not brute-force in the size of the bidder type space, which is usually exponential in the number of items in multi-item auctions. Unfortunately, such algorithms were only known for basic settings of our problem when bidders have unit-demand [CHMS10,CMS15] or additive valuations [Yao15].
COMPUTERS
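For intuition about the auction abstract's point that the type space is exponential in the number of items, and about what a "simple mechanism" can look like, the sketch below prices each item separately for a single additive bidder whose item values are independent and discretely distributed. It is a textbook baseline, not the paper's Lift-and-Round algorithm; the distributions in the usage lines are hypothetical.

```python
from typing import Dict, List, Tuple

Dist = Dict[float, float]  # one item's value distribution: value -> probability


def optimal_posted_price(dist: Dist) -> Tuple[float, float]:
    """Revenue-maximizing take-it-or-leave-it price for one item and one bidder.

    Revenue at price p is p * Pr[v >= p]. Between two support points that
    probability is constant, so an optimal price is always a support point."""
    best_price, best_rev = 0.0, 0.0
    for p in sorted(dist):
        rev = p * sum(q for v, q in dist.items() if v >= p)
        if rev > best_rev:
            best_price, best_rev = p, rev
    return best_price, best_rev


def separate_selling_revenue(items: List[Dist]) -> float:
    """Expected revenue from pricing each item on its own for an additive
    bidder: with additive values, purchases of different items do not interact."""
    return sum(optimal_posted_price(d)[1] for d in items)


def type_space_size(items: List[Dist]) -> int:
    """Number of joint bidder types: the product of per-item support sizes,
    which grows exponentially with the number of items."""
    size = 1
    for d in items:
        size *= len(d)
    return size


# Hypothetical distributions: item 1 is worth 1 or 3, item 2 is always worth 2.
items = [{1.0: 0.6, 3.0: 0.4}, {2.0: 1.0}]
print(optimal_posted_price(items[0]))  # price 3.0 beats price 1.0
print(separate_selling_revenue(items))
print(type_space_size([{1.0: 0.5, 2.0: 0.5}] * 20))  # 2**20 joint types for 20 items
```

Separate selling only ever looks at one item's distribution at a time, which is why its cost stays linear in the number of items, whereas an algorithm that enumerates joint types pays for the full product.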
towardsdatascience.com

How to Unlock Powerful Computer Vision Applications by Adding a Flavor of NLP

What is CLIP? Self-supervised learning in computer vision has shown great potential in learning different...
CODING & PROGRAMMING
#Computer Vision
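The CLIP snippet breaks off before explaining the model. In short, CLIP trains an image encoder and a text encoder so that matching image/caption pairs land close together in a shared embedding space, which enables zero-shot classification by comparing an image embedding with embeddings of prompts such as "a photo of a dog". The sketch below shows only that final comparison step; the 3-dimensional embeddings are hypothetical stand-ins for real encoder outputs, and the temperature of 0.01 is an assumption (CLIP learns this scale during training).

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb: Sequence[float],
                    text_embs: Dict[str, Sequence[float]],
                    temperature: float = 0.01) -> Dict[str, float]:
    """CLIP-style zero-shot classification: cosine similarity between the image
    embedding and each prompt embedding, divided by the temperature, then a
    softmax over the candidate classes."""
    img = normalize(image_emb)
    logits = {label: sum(a * b for a, b in zip(img, normalize(t))) / temperature
              for label, t in text_embs.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


text_embs = {  # hypothetical prompt embeddings
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
probs = zero_shot_probs([0.8, 0.2, 0.1], text_embs)  # hypothetical image embedding
print(max(probs, key=probs.get))  # a photo of a dog
```

Adding a new class only requires embedding a new prompt; no retraining of the image side is involved, which is what makes the approach attractive for computer vision applications.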
arxiv.org

Computer Vision for Supporting Image Search

Computer vision and multimedia information processing have made enormous progress within the last decade, and many tasks can now be done with accuracy equal to or better than that of humans. This is because we can leverage huge amounts of data for training, we have enormous computer processing power available, and machine learning has evolved into a suite of techniques for processing data and delivering accurate vision-based systems. What kinds of applications do we use this processing for? We use it in autonomous vehicle navigation, in security applications such as searching CCTV footage, and in medical image analysis for healthcare diagnostics. One application that is not widespread is image or video search directly by users. In this paper we present the need for such image finding or re-finding by examining human memory and when it fails. This motivates a different approach to image search, which we outline along with the requirements it places on computer vision.
SOFTWARE
Supply & Demand Chain Executive

How Computer Vision Technology is Enabling Micro-Fulfillment

The supply chain industry grew during the coronavirus disease (COVID-19) crisis, and so did the need for faster operational processes and automation of human tasks. As part of this, the logistics sector is struggling to meet growing consumer demand amid high labor costs, regulatory measures, and siloed data, while adapting to a dynamic environment. The complexities woven into the industry are not just occasional; they tend to create a ripple effect across the infrastructure. Ultimately, the warehouse workforce strives to meet customers’ requirements by managing incoming orders through multiple layers, regardless of inventory processes.
SOFTWARE
towardsdatascience.com

Computer Vision Sensors & Systems

From the hardware to the systems enabling computer vision, this article is an overview that favors breadth over depth. To balance this approach, the article directs the reader towards instructive references and provides ready-to-run source code. We start with the mechanics of image formation. We cover pinholes, lenses, sensors (CCD and CMOS), Bayer filters, and color reconstruction.
SOFTWARE
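The sensors article's Bayer filter and color reconstruction steps lend themselves to a small sketch: simulate an RGGB color filter array, so that each photosite records a single channel, then rebuild full RGB by averaging nearby samples of each missing channel. For interior pixels this reproduces plain bilinear demosaicing. The RGGB layout and the nested-list image format are assumptions, and production pipelines typically use more sophisticated, edge-aware interpolation.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def bayer_channel(y: int, x: int) -> int:
    """Channel sampled at (y, x) in an RGGB Bayer pattern: 0=R, 1=G, 2=B."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def mosaic(image: List[List[RGB]]) -> List[List[float]]:
    """Simulate a sensor behind an RGGB color filter array: each photosite
    keeps only one of the three color values."""
    return [[px[bayer_channel(y, x)] for x, px in enumerate(row)]
            for y, row in enumerate(image)]


def demosaic(raw: List[List[float]]) -> List[List[RGB]]:
    """Keep each photosite's measured channel and fill each missing channel
    with the mean of that channel's samples in the surrounding 3x3 window
    (clipped at the border). Assumes the image is at least 2x2, so every
    window contains all three channels."""
    h, w = len(raw), len(raw[0])
    out: List[List[RGB]] = []
    for y in range(h):
        row: List[RGB] = []
        for x in range(w):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for yy in range(max(0, y - 1), min(h, y + 2)):
                for xx in range(max(0, x - 1), min(w, x + 2)):
                    c = bayer_channel(yy, xx)
                    sums[c] += raw[yy][xx]
                    counts[c] += 1
            own = bayer_channel(y, x)
            px = [raw[y][x] if c == own else sums[c] / counts[c] for c in range(3)]
            row.append((px[0], px[1], px[2]))
        out.append(row)
    return out


# A flat-colored 4x4 image survives the round trip unchanged.
img = [[(10, 20, 30)] * 4 for _ in range(4)]
print(demosaic(mosaic(img))[1][2])
```

Half the photosites are green because the pattern repeats two green samples per 2x2 tile, so green is interpolated from four direct neighbors, while red and blue at the opposite color's site come from the four diagonal neighbors.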
VentureBeat

MLOps platform Landing AI raises $57M to help manufacturers adopt computer vision

Palo Alto, California-based Landing AI, the AI startup led by Andrew Ng (cofounder of Google Brain, one of Google’s AI research divisions), today announced that it raised $57 million in a series A funding round led by McRock Capital. In addition, Insight Partners, Taiwania Capital, Canada Pension Plan Investment Board, Intel Capital, Samsung Catalyst Fund, Far Eastern Group’s DRIVE Catalyst, Walsin Lihwa, and AI Fund participated, bringing Landing AI’s total raised to around $100 million.
ENGINEERING
ZDNet

SambaNova is enabling disruption in the enterprise with AI language models, computer vision, recommendations, and graphs

Like artificial intelligence itself, the AI startup SambaNova is interesting across the stack. From software to hardware, from technology to business model, and from vision to execution. SambaNova has made the news for a number of reasons: high-profile founders, a series of funding rounds propelling it into unicorn territory, impressive...
SOFTWARE
abilenetx.gov

Computer Basics

If you have an interest in learning how to use a computer, we have the class for you. Computer Basics is a hands-on class giving an introduction to the parts of the computer and how to use them, focusing on its primary navigational tool, the mouse. This 1-session class is designed for people with little to no computer experience, so don’t hesitate to sign up if you fall into one of these categories. Class size is limited to five and registration is required. Register online at www.abilenetx.gov/apl/EventsCal or call to register over the phone at 325-437-7323.
COMPUTERS
towardsdatascience.com

Attending to Attention

I have been a Machine Learning Engineer for almost 4 years now. I started with what are now called the “Classical Models” (Logistic, Tree-based, Bayesian, etc.), and since last year I have moved into Neural Networks and Deep Learning. I would say I did pretty well, until my attention landed on “Attention” (pun intended). I tried reading tutorials, lectures, and guides, but nothing ever fully helped me grasp the core idea.
CODING & PROGRAMMING
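Since the "Attending to Attention" post is about grasping the core idea, here is that idea in its most common form, scaled dot-product attention as used in Transformers: each query scores every key, the scores are divided by the square root of the key dimension and softmax-normalized, and the output is the resulting weighted average of the value rows. This is a minimal sketch on plain lists, without batching, masking, or multiple heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Turn scores into positive weights that sum to 1."""
    m = max(row)  # subtract the max score for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Each output row is a weighted average of the rows of V, weighted by how
    well the corresponding query matches each key."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # leans toward the first value row: the query matches the first key
```

In a Transformer, Q, K, and V are all linear projections of the same token embeddings (self-attention), so every token builds its output as a mixture of the tokens it finds most relevant.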
arxiv.org

Automatic Neural Network Pruning that Efficiently Preserves the Model Accuracy

Neural network performance has improved significantly in the last few years, at the cost of an increasing number of floating-point operations (FLOPs). However, more FLOPs can be an issue when computational resources are limited. Pruning filters is a common solution to this problem, but most existing pruning methods do not preserve model accuracy efficiently and therefore require a large number of fine-tuning epochs. In this paper, we propose an automatic pruning method that learns which neurons to preserve in order to maintain model accuracy while reducing the FLOPs to a predefined target. To accomplish this, we introduce a trainable bottleneck that requires only a single epoch on 25.6% (CIFAR-10) or 7.49% (ILSVRC2012) of the dataset to learn which filters to prune. Experiments on various architectures and datasets show that the proposed method not only preserves accuracy after pruning but also outperforms existing methods after fine-tuning. We achieve a 52.00% FLOPs reduction on ResNet-50, with a Top-1 accuracy of 47.51% after pruning and a state-of-the-art (SOTA) accuracy of 76.63% after fine-tuning on ILSVRC2012. Code is available at (link anonymized for review).
CODING & PROGRAMMING
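For the pruning abstract above, the link between removing filters and reducing FLOPs can be shown with a short calculation. The sketch ranks filters by L1 norm, a classic heuristic swapped in here purely for illustration (the paper instead learns which filters to keep with a trainable bottleneck), and counts multiply-accumulates for two stacked convolutions, assuming stride 1, "same" padding, and no bias. Some authors count one multiply-accumulate as two FLOPs.

```python
from typing import List, Tuple


def conv_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulate count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def l1_keep_indices(filters: List[List[float]], keep: int) -> List[int]:
    """Indices of the `keep` filters with the largest L1 norm (weights flattened)."""
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]),
                    reverse=True)
    return sorted(ranked[:keep])


def pruned_flops(c_in: int, c_mid: int, c_out: int, k: int,
                 hw: Tuple[int, int], keep: int) -> Tuple[int, int]:
    """FLOPs of two stacked convolutions before and after keeping `keep` of the
    c_mid filters in the first one. Removing a filter deletes an output channel
    of layer 1 and the matching input channel of layer 2, so both get cheaper."""
    h, w = hw
    before = conv_flops(c_in, c_mid, k, h, w) + conv_flops(c_mid, c_out, k, h, w)
    after = conv_flops(c_in, keep, k, h, w) + conv_flops(keep, c_out, k, h, w)
    return before, after


before, after = pruned_flops(64, 128, 64, 3, (32, 32), keep=64)
print(before, after)  # keeping half of the 128 middle filters halves the total
print(l1_keep_indices([[1.0, -1.0], [0.1, 0.1], [-3.0, 0.0]], keep=2))  # [0, 2]
```

Hitting a predefined FLOPs target then amounts to choosing how many filters to keep per layer so that the summed count falls below the budget; the paper's contribution is learning that choice rather than using a fixed heuristic.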
TrendHunter.com

Customization-Focused Mechanical Keyboards

Customization continues to be of key importance to many technology users when it comes to their everyday desktop solutions, which is exactly what the Sharkoon SKILLER SGK50 S4 keyboard is designed to help with. The mechanical keyboard maintains a 60% layout that combines all essential functions into a compact format...
TECHNOLOGY
arxiv.org

Self-Attending Task Generative Adversarial Network for Realistic Satellite Image Creation

We introduce a self-attending task generative adversarial network (SATGAN) and apply it to the problem of augmenting synthetic high-contrast scientific imagery of resident space objects with realistic noise patterns and sensor characteristics learned from collected data. Augmenting these synthetic data is challenging due to the highly localized nature of semantic content in the data that must be preserved. Real collected images are used to teach a network what a given class of sensor's images should look like. The trained network then acts as a filter on noiseless context images and outputs realistic-looking fakes with semantic content unaltered. The architecture is inspired by conditional GANs but is modified to include a task network that preserves semantic information through augmentation. Additionally, the architecture is shown to reduce instances of hallucinatory objects or obfuscation of semantic content in context images representing space observation scenes.
ARCHITECTURE
arxiv.org

Information-theoretic formulation of dynamical systems: causality, modeling, and control

The problems of causality, modeling, and control for chaotic, high-dimensional dynamical systems are formulated in the language of information theory. The central quantity of interest is the Shannon entropy, which measures the amount of information in the states of the system. Within this framework, causality in a dynamical system is quantified by the information flux among the variables of interest. Reduced-order modeling is posed as a problem on the conservation of information, in which models aim at preserving the maximum amount of relevant information from the original system. Similarly, control theory is cast in information-theoretic terms by envisioning the tandem sensor-actuator as a device reducing the unknown information of the state to be controlled. The new formulation is applied to address three problems in the causality, modeling, and control of turbulence, which stands as a primary example of a chaotic, high-dimensional dynamical system. The applications include the causality of the energy transfer in the turbulent cascade, subgrid-scale modeling for large-eddy simulation, and flow control for drag reduction in wall-bounded turbulence.
MATHEMATICS
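The information-theoretic abstract's central quantities can be estimated from discrete samples with plug-in frequency estimates, as sketched below. The lagged mutual information is only a simple stand-in to show how information "flowing" from the present of one variable to the future of another can be measured; it is not the paper's definition of information flux, and plug-in estimates are biased for small samples.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits, with p(x)
    estimated from sample frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y): the bits of uncertainty about Y that
    are removed by knowing X (and vice versa)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def lagged_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable],
                              lag: int = 1) -> float:
    """I(X_t; Y_{t+lag}) for lag >= 1: how much the present of X says about
    the future of Y."""
    return mutual_information(xs[:-lag], ys[lag:])


xs = [0, 1, 1, 0, 1, 0, 0, 1]
ys = [0] + xs[:-1]  # y copies x with a one-step delay
print(entropy(xs))                        # 1.0 bit: four 0s and four 1s
print(lagged_mutual_information(xs, ys))  # all of X's uncertainty reaches Y's future
```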
softwareengineeringdaily.com

Deploying Computer Vision to the Edge at Anduril Industries with Forrest Iandola

Neural networks, in particular deep neural networks, have revolutionized machine learning. Researchers and companies have pushed on the efficiency of every aspect of the machine learning lifecycle. The impact of trained models is particularly significant for computer vision, and in turn for autonomous driving and security systems. In this...
SOFTWARE
arxiv.org

Large-scale Building Height Retrieval from Single SAR Imagery based on Bounding Box Regression Networks

Building height retrieval from synthetic aperture radar (SAR) imagery is of great importance for urban applications, yet highly challenging owing to the complexity of SAR data. This paper addresses the issue of building height retrieval in large-scale urban areas from a single TerraSAR-X spotlight or stripmap image. Based on the radar viewing geometry, we propose that this problem can be formulated as a bounding box regression problem, which allows height data from multiple data sources to be integrated when generating ground truth on a larger scale. We introduce building footprints from geographic information system (GIS) data as complementary information and propose a bounding box regression network that exploits the location relationship between a building's footprint and its bounding box, allowing for fast computation, which is important for large-scale applications. The method is validated on four urban data sets using TerraSAR-X images in both high-resolution spotlight and stripmap modes. Experimental results show that, compared with a Faster R-CNN based method, the proposed network significantly reduces the computation cost while maintaining the height accuracy of individual buildings. Moreover, we investigate the impact of inaccurate GIS data on our proposed network, and this study shows that the bounding box regression network is robust against positioning errors in GIS data. The proposed method has great potential to be applied at regional or even global scales.
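The radar viewing geometry behind this bounding-box formulation can be illustrated with the textbook layover relation. Because a building's roof is closer to the sensor than its base, its echo arrives earlier, so the building's image extends toward the sensor by an amount that grows with height. The sketch assumes flat terrain, a vertical facade, and a local incidence angle θ measured from vertical; it is not the paper's regression network.

```python
import math


def height_from_layover(layover_m: float, incidence_deg: float,
                        geometry: str = "slant") -> float:
    """Building height from the measured layover extent (flat ground,
    vertical facade, incidence angle theta measured from vertical).

    slant-range image:  layover = h * cos(theta)  ->  h = layover / cos(theta)
    ground-range image: layover = h / tan(theta)  ->  h = layover * tan(theta)
    """
    theta = math.radians(incidence_deg)
    if geometry == "slant":
        return layover_m / math.cos(theta)
    if geometry == "ground":
        return layover_m * math.tan(theta)
    raise ValueError("geometry must be 'slant' or 'ground'")


print(height_from_layover(5.0, 60.0))  # hypothetical 5 m slant-range layover at 60 degrees
```

In the bounding-box view, the footprint from GIS data fixes the base of the box, and the extra extent of the box along the range direction encodes this layover, which is why regressing the box is enough to recover height.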
arxiv.org

Automated Approach for Computer Vision-based Vehicle Movement Classification at Traffic Intersections

Udita Jana, Jyoti Prakash Das Karmakar, Pranamesh Chakraborty, Tingting Huang, Dave Ness, Duane Ritcher, Anuj Sharma. Movement-specific vehicle classification and counting at traffic intersections is a crucial component of various traffic management activities. In this context, with recent advances in computer-vision-based techniques, cameras have emerged as a reliable data source for extracting vehicle trajectories from traffic scenes. However, classifying these trajectories by movement type is quite challenging, because the characteristics of motion trajectories obtained this way vary with camera calibration. Although some existing methods have addressed such classification tasks with decent accuracy, their performance relied heavily on the manual specification of several regions of interest. In this study, we propose an automated method for movement-specific classification (such as right-turn, left-turn, and through movements) of vision-based vehicle trajectories. Our classification framework identifies the different movement patterns observed in a traffic scene using an unsupervised hierarchical clustering technique. Thereafter, a similarity-based assignment strategy is adopted to assign incoming vehicle trajectories to the identified movement groups. A new similarity measure was designed to overcome the inherent shortcomings of vision-based trajectories. Experimental results demonstrated the effectiveness of the proposed classification approach and its ability to adapt to different traffic scenarios without any manual intervention.
TECHNOLOGY
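The two-stage idea in the intersection paper (find movement groups, then assign new trajectories by similarity) can be sketched as follows. This assumes the movement prototypes already exist, for example as representatives of clusters found by hierarchical clustering, and uses mean point-wise distance after arc-length resampling as a simple, swapped-in similarity measure; the paper designs its own measure, which this does not reproduce. The number of resampled points (20) and the rejection threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # at least two points, in image coordinates


def resample(traj: Trajectory, n: int = 20) -> Trajectory:
    """Resample a polyline to n points spaced evenly by arc length."""
    seg = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    total = sum(seg)
    if total == 0:
        return [traj[0]] * n
    out: Trajectory = []
    i, acc = 0, 0.0  # current segment index and arc length before it
    for k in range(n):
        target = total * k / (n - 1)
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(1.0, (target - acc) / seg[i])
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def mean_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean point-wise Euclidean distance between equally resampled trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def assign(traj: Trajectory, prototypes: Dict[str, Trajectory],
           max_dist: float) -> Optional[str]:
    """Label of the closest movement prototype, or None if none is within max_dist."""
    r = resample(traj)
    best, best_d = None, float("inf")
    for label, proto in prototypes.items():
        d = mean_distance(r, resample(proto))
        if d < best_d:
            best, best_d = label, d
    return best if best_d <= max_dist else None


prototypes = {  # hypothetical cluster representatives
    "through": [(0.0, 0.0), (0.0, 100.0)],
    "right": [(0.0, 0.0), (0.0, 50.0), (50.0, 50.0)],
}
print(assign([(2.0, 0.0), (2.0, 50.0), (2.0, 100.0)], prototypes, max_dist=10.0))  # through
```

Resampling by arc length makes trajectories with different numbers of detections comparable, and because point order is preserved, the measure also distinguishes opposite directions of travel.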
arxiv.org

TransMix: Attend to Mix for Vision Transformers

Mixup-based augmentation has been found to be effective for improving model generalization during training, especially for Vision Transformers (ViTs), since they can easily overfit. However, previous mixup-based methods rely on an underlying prior that the linear interpolation ratio of the targets should be the same as the ratio used for input interpolation. This can lead to a strange phenomenon: because of the randomness in augmentation, sometimes there is no valid object in the mixed image, yet there is still a response in the label space. To bridge this gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers. The confidence of a label is larger if the corresponding input image is weighted higher by the attention map. TransMix is embarrassingly simple and can be implemented in just a few lines of code without introducing any extra parameters or FLOPs to ViT-based models. Experimental results show that our method consistently improves various ViT-based models at different scales on ImageNet classification. After pre-training with TransMix on ImageNet, the ViT-based models also demonstrate better transferability to semantic segmentation, object detection, and instance segmentation. TransMix also proves more robust when evaluated on 4 different benchmarks. Code will be made publicly available at this https URL.
COMPUTERS
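A minimal sketch of TransMix's label-mixing step, based on our reading of the abstract: in CutMix the weight of the pasted image's label equals its area fraction, while TransMix instead uses the fraction of the class token's attention that falls on the pasted patches. Which layer's attention is used and how heads are combined are assumptions left to the caller here.

```python
from typing import List, Sequence


def transmix_lambda(cls_attention: Sequence[float], patch_mask: Sequence[int]) -> float:
    """Label weight for the pasted image B: the share of class-token attention
    that lands on patches taken from B.

    cls_attention: non-negative attention from the class token to each patch
        (for example averaged over heads; an assumption).
    patch_mask: 1 where the patch comes from image B, else 0.
    """
    total = sum(cls_attention)
    return sum(a * m for a, m in zip(cls_attention, patch_mask)) / total


def mix_labels(y_a: Sequence[float], y_b: Sequence[float], lam: float) -> List[float]:
    """Mixed target (1 - lam) * y_a + lam * y_b."""
    return [(1 - lam) * a + lam * b for a, b in zip(y_a, y_b)]


attn = [0.1, 0.2, 0.3, 0.4]  # hypothetical class-token attention over 4 patches
mask = [0, 0, 1, 1]          # the last two patches were pasted from image B
lam = transmix_lambda(attn, mask)  # about 0.7, versus an area ratio of 0.5
print(mix_labels([1.0, 0.0], [0.0, 1.0], lam))
```

Here half the patches come from image B, so CutMix would give its label a weight of 0.5; because the model attends more to those patches, the attention-based weight rises to about 0.7, and a pasted region that the model ignores would likewise receive little label credit.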
arxiv.org

Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning

Operating in the real world often requires agents to learn about a complex environment and apply this understanding to achieve a breadth of goals. This problem, known as goal-conditioned reinforcement learning (GCRL), becomes especially challenging for long-horizon goals. Current methods have tackled this problem by augmenting goal-conditioned policies with graph-based planning algorithms. However, they struggle to scale to large, high-dimensional state spaces and assume access to exploration mechanisms for efficiently collecting training data. In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a policy that is proficient for any goal. SFL leverages the ability of successor features (SF) to capture transition dynamics, using them to drive exploration by estimating state novelty and to enable high-level planning by abstracting the state space as a non-parametric landmark-based graph. We further exploit SF to directly compute a goal-conditioned policy for inter-landmark traversal, which we use to execute plans to "frontier" landmarks at the edge of the explored state space. We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks.
COMPUTERS
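Successor features satisfy a Bellman-style recursion, ψ(s) = φ(s) + γ E[ψ(s′)]. With one-hot state features φ they reduce to the tabular successor representation, sketched below for a known transition matrix under a fixed policy. Multiplying it by a reward vector gives state values, which is the property that lets the same successor features be reused across goals. This is a minimal tabular sketch, not SFL itself; the fixed iteration count is an assumption, since the error shrinks like γ^iters and discounts close to 1 need more iterations (or a direct linear solve).

```python
from typing import List

Matrix = List[List[float]]


def successor_representation(P: Matrix, gamma: float, iters: int = 1000) -> Matrix:
    """Tabular successor representation M = (I - gamma * P)^-1.

    P[s][t] is the probability of moving from s to t under a fixed policy.
    M[s][t] is the expected discounted number of visits to t starting from s,
    i.e. the successor features of s for one-hot state features. Computed by
    iterating M <- I + gamma * P @ M."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        M = [[(1.0 if i == j else 0.0)
              + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def values(M: Matrix, reward: List[float]) -> List[float]:
    """State values for any reward vector: V = M @ r. Changing the goal only
    changes r, so the same M is reused across goals."""
    return [sum(m * r for m, r in zip(row, reward)) for row in M]


# Two-state chain: state 0 moves to state 1, which is absorbing.
M = successor_representation([[0.0, 1.0], [0.0, 1.0]], gamma=0.5)
print(values(M, [0.0, 1.0]))  # reward only at state 1, the "goal"
```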

