TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance

By Yue Tao, Zhiwei Jia, Runze Ma, Shugong Xu
arxiv.org
 8 days ago

Scene text recognition (STR) is an important bridge between images and text, attracting abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most of the existing works need an extra module (context modeling module) to help CNN to capture...

arxiv.org

A Novel Approach for Deterioration and Damage Identification in Building Structures Based on Stockwell-Transform and Deep Convolutional Neural Network

Vahid Reza Gharehbaghi, Hashem Kalbkhani, Ehsan Noroozinejad Farsangi, T.Y. Yang, Andy Nguyene, Seyedali Mirjalili, C. Málaga-Chuquitaype. In this paper, a novel deterioration and damage identification procedure (DIP) is presented and applied to building models. The challenge associated with applications on these types of structures is related to the strong correlation of responses, which gets further complicated when coping with real ambient vibrations with high levels of noise. Thus, a DIP is designed utilizing low-cost ambient vibrations to analyze the acceleration responses using the Stockwell transform (ST) to generate spectrograms. Subsequently, the ST outputs become the input of two series of Convolutional Neural Networks (CNNs) established for identifying deterioration and damage to the building models. To the best of our knowledge, this is the first time that both damage and deterioration are evaluated on building models through a combination of ST and CNN with high accuracy.
SCIENCE
arxiv.org

Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding

Vision transformers (ViTs) have recently demonstrated state-of-the-art performance in a variety of vision tasks, replacing convolutional neural networks (CNNs). Meanwhile, since ViT has a different architecture than CNN, it may behave differently. To investigate the reliability of ViT, this paper studies the behavior and robustness of ViT. We compared the robustness of CNN and ViT under various image corruptions that may appear in practical vision tasks. We confirmed that for most image transformations, ViT showed robustness comparable to or better than that of CNN. However, for contrast enhancement, severe performance degradations were consistently observed in ViT. From a detailed analysis, we identified a potential problem: positional embedding in ViT's patch embedding could work improperly when the color scale changes. We therefore propose PreLayerNorm, a modified patch embedding structure that ensures scale-invariant behavior of ViT. ViT with PreLayerNorm showed improved robustness under various corruptions, including contrast-varying environments.
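
The scale-invariance argument in this abstract can be illustrated with a small sketch. The abstract does not give the exact PreLayerNorm structure, so the code below only assumes that a layer normalization is applied to each flattened patch before the linear projection. The function names, the 4-pixel patch, and the projection weights are hypothetical, and the contrast change is modeled as a uniform scaling of pixel values.

```python
import math

def layer_norm(values, eps=1e-6):
    """Normalize a flat list of numbers to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def embed_patch(patch, weights, pre_norm):
    """Project a flattened patch with a linear map, optionally normalizing first."""
    x = layer_norm(patch) if pre_norm else patch
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Hypothetical 4-pixel patch and 2x4 projection matrix (illustrative values only).
patch = [0.1, 0.4, 0.2, 0.9]
weights = [[0.5, -0.2, 0.1, 0.3], [-0.1, 0.4, 0.2, -0.3]]

# Assumption: a contrast change is modeled as multiplying every pixel by 3.
scaled = [3.0 * v for v in patch]

# Largest change in any embedding coordinate, without and with normalization.
plain_shift = max(abs(a - b) for a, b in zip(
    embed_patch(patch, weights, pre_norm=False),
    embed_patch(scaled, weights, pre_norm=False)))
normed_shift = max(abs(a - b) for a, b in zip(
    embed_patch(patch, weights, pre_norm=True),
    embed_patch(scaled, weights, pre_norm=True)))
```

Without normalization the embedding grows with the pixel scale, so a fixed positional embedding added afterwards carries relatively less weight. With normalization first, the embedding is nearly unchanged, apart from the small effect of `eps`.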
COMPUTERS
semiengineering.com

Debugging Embedded Applications

Debugging embedded designs is becoming increasingly difficult as the number of observed and possible interactions between hardware and software continues to grow, and as more features are crammed into chips, packages, and systems. But there also appear to be some advances on this front, involving a mix of techniques, including hardware trace, scan chain-based debug, along with better simulation models.
SOFTWARE
arxiv.org

Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation

Semisupervised text classification has become a major focus of research over the past few years. Hitherto, most of the research has been based on supervised learning, but its main drawback is the unavailability of labeled data samples in practical applications. It is still a key challenge to train deep generative models and learn comprehensive representations without supervision. Even though continuous latent variables are employed primarily in deep latent variable models, discrete latent variables, with their enhanced understandability and better compressed representations, are effectively used by researchers. In this paper, we propose a semisupervised discrete latent variable model for multi-class text classification and text generation. The proposed model employs the concept of transfer learning for training a quantized transformer model, which is able to learn competently using fewer labeled instances. The model applies a decomposed vector quantization technique to overcome problems like posterior collapse and index collapse. Shannon entropy is used for the decomposed sub-encoders, on which a variable DropConnect is applied, to retain maximum information. Moreover, gradients of the loss function are adaptively modified during backpropagation from decoder to encoder to enhance the performance of the model. Three conventional datasets of diversified range have been used for validating the proposed model on a variable number of labeled instances. Experimental results indicate that the proposed model has surpassed the state-of-the-art models remarkably.
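
As a rough illustration of why Shannon entropy relates to index collapse, the sketch below computes the entropy of how often each codebook index is used. The abstract does not say how the entropy term enters the model, so this is only the underlying quantity. The function name and the two index lists are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(indices):
    """Entropy in bits of the empirical distribution over code indices."""
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical assignments of eight inputs to a four-entry codebook.
balanced = [0, 1, 2, 3, 0, 1, 2, 3]   # all four codes used equally often
collapsed = [2, 2, 2, 2, 2, 2, 2, 2]  # every input maps to the same code
```

The balanced assignment reaches the maximum of 2 bits for four codes, while the collapsed assignment has zero entropy. Keeping this quantity high is one way to read "retain maximum information".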
COMPUTERS
arxiv.org

Generalizable Cross-Graph Embedding for GNN-based Congestion Prediction

Presently with technology node scaling, an accurate prediction model at early design stages can significantly reduce the design cycle. Especially during logic synthesis, predicting cell congestion due to improper logic combination can reduce the burden of subsequent physical implementations. There have been attempts using Graph Neural Network (GNN) techniques to tackle congestion prediction during the logic synthesis stage. However, they require informative cell features to achieve reasonable performance since the core idea of GNNs is built on the message passing framework, which would be impractical at the early logic synthesis stage. To address this limitation, we propose a framework that can directly learn embeddings for the given netlist to enhance the quality of our node features. Popular random-walk based embedding methods such as Node2vec, LINE, and DeepWalk suffer from the issue of cross-graph alignment and poor generalization to unseen netlist graphs, yielding inferior performance and costing significant runtime. In our framework, we introduce a superior alternative to obtain node embeddings that can generalize across netlist graphs using matrix factorization methods. We propose an efficient mini-batch training method at the sub-graph level that can guarantee parallel training and satisfy the memory restriction for large-scale netlists. We present results utilizing open-source EDA tools such as DREAMPLACE and OPENROAD frameworks on a variety of openly available circuits. By combining the learned embedding on top of the netlist with the GNNs, our method improves prediction performance, generalizes to new circuit lines, and is efficient in training, potentially saving over 90% of runtime.
CODING & PROGRAMMING
arxiv.org

The channel-spatial attention-based vision transformer network for automated, accurate prediction of crop nitrogen status from UAV imagery

Nitrogen (N) fertiliser is routinely applied by farmers to increase crop yields. At present, farmers often over-apply N fertiliser in some locations or timepoints because they do not have high-resolution crop N status data. N-use efficiency can be low, with the remaining N lost to the environment, resulting in high production costs and environmental pollution. Accurate and timely estimation of N status in crops is crucial to improving cropping systems' economic and environmental sustainability. The conventional approaches based on tissue analysis in the laboratory for estimating N status in plants are time consuming and destructive. Recent advances in remote sensing and machine learning have shown promise in addressing the aforementioned challenges in a non-destructive way. We propose a novel deep learning framework: a channel-spatial attention-based vision transformer (CSVT) for estimating crop N status from large images collected from a UAV in a wheat field. Unlike the existing works, the proposed CSVT introduces a Channel Attention Block (CAB) and a Spatial Interaction Block (SIB), which allow it to capture nonlinear characteristics of spatial-wise and channel-wise features from UAV digital aerial imagery, for accurate N status prediction in wheat crops. Moreover, since acquiring labeled data is time consuming and costly, local-to-global self-supervised learning is introduced to pre-train the CSVT with extensive unlabelled data. The proposed CSVT has been compared with the state-of-the-art models, tested and validated on both testing and independent datasets. The proposed approach achieved high accuracy (0.96) with good generalizability and reproducibility for wheat N status estimation.
AGRICULTURE
lpgasmagazine.com

Cargas Energy incorporates embedded text messaging

Cargas Energy’s new texting feature embeds texting capabilities directly in Cargas Energy. By automatically sending personalized messages about upcoming deliveries and service appointments, Cargas Energy texting improves customer communication while reducing call volume, says the company. Cargas Energy texting also allows propane marketers to hold text conversations right in the software, with a message center to track all ongoing and historical conversations. The texting feature allows propane marketers to give customers personalized communication without adding to employees’ workload, according to Cargas.
ENERGY INDUSTRY
arxiv.org

A Convolutional Neural Network Based Approach to Recognize Bangla Spoken Digits from Speech Signal

Speech recognition is a technique that converts human speech signals into text or words or in any form that can be easily understood by computers or other machines. There have been a few studies on Bangla digit recognition systems, the majority of which used small datasets with few variations in genders, ages, dialects, and other variables. Audio recordings of Bangladeshi people of various genders, ages, and dialects were used to create a large speech dataset of spoken '0-9' Bangla digits in this study. Here, 400 noisy and noise-free samples per digit have been recorded for creating the dataset. Mel Frequency Cepstrum Coefficients (MFCCs) have been utilized for extracting meaningful features from the raw speech data. Then, to detect Bangla numeral digits, Convolutional Neural Networks (CNNs) were utilized. The suggested technique recognizes '0-9' Bangla spoken digits with 97.1% accuracy across the whole dataset. The efficiency of the model was also assessed using 10-fold cross-validation, which yielded a 96.7% accuracy.
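
The 10-fold cross-validation mentioned in this abstract can be sketched as an index split. The code assumes 400 samples for each of the 10 digits (4,000 in total), which is one reading of the abstract. The function name is hypothetical, and the sketch splits contiguous ranges without shuffling, which a real evaluation would normally add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Assumption: 10 digits x 400 samples per digit.
folds = list(k_fold_indices(4000, k=10))
```

Each of the ten folds holds out 400 samples for testing and trains on the remaining 3,600. The reported cross-validation accuracy would then be the average over the ten held-out folds.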
COMPUTERS
Beta News

Embedded analytics is the future of analytics

Digital transformations have taken over corporate America. Over the last few years, businesses of all sizes have discovered much of their success now relies upon the ability to quickly interpret incredible amounts of data. While the business intelligence space grows exponentially, traditional BI tools still struggle to keep pace with the need for quick, decisive interpretation of this information surge.
COMPUTERS
arxiv.org

Pre-trained Transformer-Based Approach for Arabic Question Answering : A Comparative Study

Question answering (QA) is one of the most challenging yet widely investigated problems in Natural Language Processing (NLP). QA systems try to produce answers for given questions. These answers can be generated from unstructured or structured text. Hence, QA is considered an important research area that can be used in evaluating text understanding systems. A large volume of QA studies was devoted to the English language, investigating the most advanced techniques and achieving state-of-the-art results. However, Arabic question answering progresses at a considerably slower pace due to the scarcity of research efforts in Arabic QA and the lack of large benchmark datasets. Recently, many pre-trained language models have provided high performance in many Arabic NLP problems. In this work, we evaluate the state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. We fine-tuned and compared the performance of the AraBERTv2-base model, AraBERTv0.2-large model, and AraELECTRA model. Finally, we provide an analysis to understand and interpret the low-performance results obtained by some models.
SCIENCE
arxiv.org

Automated question generation and question answering from Turkish texts using text-to-text transformers

While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. To reduce the expenses associated with the manual construction of questions and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be utilized. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using a Turkish QA dataset. To the best of our knowledge, this is the first academic work that attempts to perform automated text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance over TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and pre-trained models are available at this https URL.
COMPUTERS
dataversity.net

How Transformer-Based Machine Learning Can Power Fintech Data Processing

Machine learning (ML) has enabled a whole host of innovations and new business models in fintech, driving breakthroughs in areas such as personalized wealth management, automated fraud detection, and real-time small business accounting tools. For a long time, one of the most significant challenges of machine learning has been the amount and quality of data that is required to train machine learning models. Recent developments of Transformer architectures, however, have started to change this equation.
SOFTWARE
arxiv.org

Transformer-based Image Compression

A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders are composed of a sequence of neural transformation units (NTUs) to analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to generate the pixel-domain image reconstruction from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information. Meanwhile, a causal attention module (CAM) is devised for adaptive context modeling of latent features to utilize both hyper and autoregressive priors. The TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN) based learnt image coding (LIC) methods and the handcrafted rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard, and requires far fewer model parameters, e.g., up to a 45% reduction relative to the leading-performance LIC.
COMPUTERS
arxiv.org

Soft-Sensing ConFormer: A Curriculum Learning-based Convolutional Transformer

Over the last few decades, modern industrial processes have investigated several cost-effective methodologies to improve the productivity and yield of semiconductor manufacturing. While playing an essential role in facilitating real-time monitoring and control, the data-driven soft-sensors in industries have provided a competitive edge when augmented with deep learning approaches for wafer fault-diagnostics. Despite the success of deep learning methods across various domains, they tend to suffer from bad performance on multi-variate soft-sensing data domains. To mitigate this, we propose a soft-sensing ConFormer (CONvolutional transFORMER) for the wafer fault-diagnostic classification task, which primarily consists of multi-head convolution modules that reap the benefits of fast and light-weight convolution operations, as well as the ability to learn robust representations through a multi-head design like that of transformers. Another key issue is that traditional learning paradigms tend to suffer from low performance on noisy and highly-imbalanced soft-sensing data. To address this, we augment our soft-sensing ConFormer model with a curriculum learning-based loss function, which effectively learns easy samples in the early phase of training and difficult ones later. To further demonstrate the utility of our proposed architecture, we performed extensive experiments on various toolsets of Seagate Technology's wafer manufacturing process which are shared openly along with this work. To the best of our knowledge, this is the first time that a curriculum learning-based soft-sensing ConFormer architecture has been proposed for soft-sensing data, and our results show strong promise for future use in the soft-sensing research domain.
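
The idea of learning easy samples first and difficult ones later can be sketched with a simple weighting scheme. The abstract does not specify the paper's loss function, so the code below swaps in a self-paced-style hard threshold purely as an illustration. The function names, threshold values, and per-sample losses are all hypothetical.

```python
def curriculum_weights(losses, progress, start_threshold=0.5, end_threshold=5.0):
    """Give weight 1 to samples whose loss is under a threshold that grows with progress."""
    # progress runs from 0.0 (start of training) to 1.0 (end of training).
    threshold = start_threshold + progress * (end_threshold - start_threshold)
    return [1.0 if loss <= threshold else 0.0 for loss in losses]

def curriculum_loss(losses, progress):
    """Mean loss over the samples currently admitted by the curriculum."""
    weights = curriculum_weights(losses, progress)
    admitted = sum(weights)
    if admitted == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / admitted

# Hypothetical per-sample losses; a larger loss is treated as a harder sample.
per_sample_losses = [0.2, 0.4, 1.5, 3.0]
early = curriculum_loss(per_sample_losses, progress=0.0)  # only the two easy samples count
late = curriculum_loss(per_sample_losses, progress=1.0)   # all four samples count
```

Early in training only the low-loss samples contribute, which limits the influence of noisy or rare examples. By the end, every sample is included.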
ENGINEERING
TechRepublic

Linux users: These text-based file managers are overlooked gems

Terminal-based file managers may seem a little old school, but these two options for Linux will come in handy for a variety of tasks. Terminal-based file managers may seem like relics of ancient times, but even in this age of touchscreens, nothing can handle hundreds of files more efficiently. Besides, a terminal may still be your only option to work on remote servers or recover your files after a system crash.
COMPUTERS
arxiv.org

Reference-based Magnetic Resonance Image Reconstruction Using Texture Transformer

Deep Learning (DL) based methods for magnetic resonance (MR) image reconstruction have been shown to produce superior performance in recent years. However, these methods either only leverage under-sampled data or require a paired fully-sampled auxiliary modality to perform multi-modal reconstruction. Consequently, existing approaches neglect to explore attention mechanisms that can transfer textures from reference fully-sampled data to under-sampled data within a single modality, which limits these approaches in challenging cases. In this paper, we propose a novel Texture Transformer Module (TTM) for accelerated MRI reconstruction, in which we formulate the under-sampled data and reference data as queries and keys in a transformer. The TTM facilitates joint feature learning across under-sampled and reference data, so the feature correspondences can be discovered by attention and accurate texture features can be leveraged during reconstruction. Notably, the proposed TTM can be stacked on prior MRI reconstruction approaches to further improve their performance. Extensive experiments show that TTM can significantly improve the performance of several popular DL-based MRI reconstruction methods.
SCIENCE
arxiv.org

Transformation of Node to Knowledge Graph Embeddings for Faster Link Prediction in Social Networks

Recent advances in neural networks have solved common graph problems such as link prediction, node classification, node clustering, and node recommendation by developing embeddings of entities and relations into vector spaces. Graph embeddings encode the structural information present in a graph. The encoded embeddings can then be used to predict the missing links in a graph. However, obtaining the optimal embeddings for a graph can be a computationally challenging task, especially in an embedded system. Two techniques which we focus on in this work are 1) node embeddings from random walk based methods and 2) knowledge graph embeddings. Random walk based embeddings are computationally inexpensive to obtain but are sub-optimal, whereas knowledge graph embeddings perform better but are computationally expensive. In this work, we investigate a transformation model which converts node embeddings obtained from random walk based methods to embeddings obtained from knowledge graph methods directly without an increase in the computational cost. Extensive experimentation shows that the proposed transformation model can be used for solving link prediction in real-time.
COMPUTERS
arxiv.org

A transformer-based model for default prediction in mid-cap corporate markets

In this paper, we study mid-cap companies, i.e. publicly traded companies with less than US $10 billion in market capitalisation. Using a large dataset of US mid-cap companies observed over 30 years, we look to predict the default probability term structure over the medium term and understand which data sources (i.e. fundamental, market or pricing data) contribute most to the default risk. Whereas existing methods typically require that data from different time periods are first aggregated and turned into cross-sectional features, we frame the problem as a multi-label time-series classification problem. We adapt transformer models, a state-of-the-art deep learning model emanating from the natural language processing domain, to the credit risk modelling setting. We also interpret the predictions of these models using attention heat maps. To optimise the model further, we present a custom loss function for multi-label classification and a novel multi-channel architecture with differential training that gives the model the ability to use all input data efficiently. Our results show the proposed deep learning architecture's superior performance, resulting in a 13% improvement in AUC (Area Under the receiver operating characteristic Curve) over traditional models. We also demonstrate how to produce an importance ranking for the different data sources and the temporal relationships using a Shapley approach specific to these models.
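
For readers unfamiliar with the AUC metric cited in this abstract, a minimal sketch of its pairwise definition follows. It is not the authors' evaluation code. The function name, labels, and scores are hypothetical, with label 1 standing for a default and higher scores meaning higher predicted default risk.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical data: two defaulting companies and three non-defaulting ones.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

Here five of the six default/non-default pairs are ordered correctly, so the AUC is 5/6. A value of 0.5 corresponds to random ranking and 1.0 to a perfect one.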
MARKETS

