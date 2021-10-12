

Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

By Christoph Wick, Jochen Zöllner, Tobias Grüning
arxiv.org
 10 days ago

In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of


Related
Business Insider

Facebook is working on AI tech that will monitor your every move

Facebook envisions a future where smartglasses "become as useful in everyday life as smartphones," the company said in a new blog post. In order to achieve that future, such devices will require powerful AI software that can read and respond to the world around the headset's user. And the only way to train AI to see and hear the world like humans do is for it to experience the world like we do: from a first-person perspective.
INTERNET
psychologytoday.com

Word Sequence Puzzles as Experiments in Associative Thinking

Word sequence puzzles constitute fascinating and fun experiments in associative thinking. The associative system in the brain assigns meaning to information by connecting it to previous knowledge and experiences. Association is seen as the process guiding metaphor, analogical constructs, and memory.
ARISTOTLE
ScienceAlert

A Physicist Quantified The Amount of Information in The Entire Observable Universe

In attempts to understand the very nature of our reality, physicists sure have some mind-bending theories. Like what if information is a tangible and fundamental aspect of physical reality itself – alongside matter and energy? Or, alternatively, what if information is the fifth state of matter? Information is, after all, something all matter and energy measurably possess. The rules that govern their existence, like their mass, speed, or charge, are all bits of information they contain. So to allow experimental probing of such ideas, physicist Melvin Vopson from the University of Portsmouth in the UK estimated how much information a single elementary...
ASTRONOMY
arxiv.org

Relationship between low-discrepancy sequence and static solution to multi-bodies problem

The main interest of this paper is to study the relationship between the low-discrepancy sequence and the static solution to the multi-bodies problem in high-dimensional space. An assumption that the static solution to the multi-bodies problem is a low-discrepancy sequence is proposed. Considering the static solution to the multi-bodies problem corresponds to the minimum potential energy principle, we further assume that the distribution of the bodies is the most uniform when the potential energy is the smallest. To verify the proposed assumptions, a dynamical evolutionary model (DEM) based on the minimum potential energy is established to find out the static solution. The central difference algorithm is adopted to solve the DEM and an evolutionary iterative scheme is developed. The selection of the mass and the damping coefficient to ensure the convergence of the evolutionary iteration is discussed in detail. Based on the DEM, the relationship between the potential energy and the discrepancy during the evolutionary iteration process is studied. It is found that there is a significant positive correlation between them, which confirms the proposed assumptions. We also combine the DEM with the restarting technique to generate a series of low-discrepancy sequences. These sequences are unbiased and perform better than other low-discrepancy sequences in terms of the discrepancy, the potential energy, integrating eight test functions and computing the statistical moments for two practical stochastic problems. Numerical examples also show that the DEM can not only generate uniformly distributed sequences in cubes, but also in non-cubes.
MATHEMATICS
IN THIS ARTICLE
#Prefix #Ctc #Ctc Prefixes #Htr #Ctc Prefix Score #Cnn #Iam
arxiv.org

Planning Sensing Sequences for Subsurface 3D Tumor Mapping

Surgical automation has the potential to enable increased precision and reduce the per-patient workload of overburdened human surgeons. An effective automation system must be able to sense and map subsurface anatomy, such as tumors, efficiently and accurately. In this work, we present a method that plans a sequence of sensing actions to map the 3D geometry of subsurface tumors. We leverage a sequential Bayesian Hilbert map to create a 3D probabilistic occupancy model that represents the likelihood that any given point in the anatomy is occupied by a tumor, conditioned on sensor readings. We iteratively update the map, utilizing Bayesian optimization to determine sensing poses that explore unsensed regions of anatomy and exploit the knowledge gained by previous sensing actions. We demonstrate our method's efficiency and accuracy in three anatomical scenarios including a liver tumor scenario generated from a real patient's CT scan. The results show that our proposed method significantly outperforms comparison methods in terms of efficiency while detecting subsurface tumors with high accuracy.
HEALTH
golfmonthly.com

Golf Downswing Sequence Explained

For the majority of golfers, taking to the course with one or more swing thoughts is a recipe for disaster. This is especially true in the downswing. Yet, it’s often approaching impact where technical thoughts will creep in. And not only will that result in tension and a loss of accuracy, but it can also cause the natural flow of the swing that is so crucial for generating power to be lost.
GOLF
arxiv.org

Permutation Designs and Sequencing Highly Transitive Group Actions

We consider an experimental design problem for permutations: given a fixed set $X$, and an integer $t$, construct a list $L$ of permutations of $X$ such that every ordered $t$-tuple of distinct elements of $X$ occurs as a consecutive subsequence of exactly one permutation in $L$. In this paper we focus on solutions based on sharply transitive group actions, in effect generalizing Gordon's notion of group sequencing. We give an explicit construction when $|X|$ is prime for the case $t=3$, and analyze a branching algorithm for the general case which produces, for example, a rare design with $t=6$ based on the Mathieu group $M_{12}$, and suggests that every sharply transitive group action leads to a solution, apart from an explicit list of counterexamples. We state a number of conjectures and indicate directions for future work.
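The defining property stated in the abstract above can be checked mechanically. The following is a minimal sketch, not the authors' construction or their branching algorithm: the function names are hypothetical, and the example list is a small illustrative case for $t=2$ built from translates of the sequence 0, 1, 3, 2 in the cyclic group of order 4 (an assumption chosen for illustration, not taken from the paper).

```python
from collections import Counter
from itertools import permutations


def consecutive_tuples(perm, t):
    """Yield every window of t consecutive entries of perm."""
    for i in range(len(perm) - t + 1):
        yield tuple(perm[i:i + t])


def is_permutation_design(X, L, t):
    """Return True if every ordered t-tuple of distinct elements of X occurs
    as a consecutive subsequence of exactly one permutation in L."""
    counts = Counter()
    for perm in L:
        # Each entry of L must itself be a permutation of X.
        if sorted(perm) != sorted(X):
            return False
        counts.update(consecutive_tuples(perm, t))
    # Every ordered t-tuple must be covered exactly once overall.
    return all(counts[tup] == 1 for tup in permutations(X, t))


# Illustrative example (assumption, not from the paper):
# the four translates of 0, 1, 3, 2 modulo 4.
X = [0, 1, 2, 3]
base = [0, 1, 3, 2]
L = [[(x + g) % 4 for x in base] for g in range(4)]
print(is_permutation_design(X, L, 2))
```

The cyclic shifts of 0, 1, 2 fail the same check for $t=2$, because the pair (1, 2) is then covered twice.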
MATHEMATICS
arxiv.org

A New Census of the 0.2< z <3.0 Universe, Part II: The Star-Forming Sequence

Joel Leja, Joshua S. Speagle, Yuan-Sen Ting, Benjamin D. Johnson, Charlie Conroy, Katherine E. Whitaker, Erica J. Nelson, Pieter van Dokkum, Marijn Franx. We use the panchromatic SED-fitting code Prospector to measure the galaxy logM$^*$-logSFR relationship (the `star-forming sequence') across $0.2 < z < 3.0$ using the COSMOS-2015 and 3D-HST UV-IR photometric catalogs. We demonstrate that the chosen method of identifying star-forming galaxies introduces a systematic uncertainty in the inferred normalization and width of the star-forming sequence, peaking for massive galaxies at $\sim 0.5$ dex and $\sim0.2$ dex respectively. To avoid this systematic, we instead parameterize the density of the full galaxy population in the logM$^*$-logSFR-redshift plane using a flexible neural network known as a normalizing flow. The resulting star-forming sequence has a low-mass slope near unity and a much flatter slope at higher masses, with a normalization $0.2-0.5$ dex lower than typical inferences in the literature. We show this difference is due to the sophistication of the Prospector stellar populations modeling: the nonparametric star formation histories naturally produce higher masses while the combination of individualized metallicity, dust, and star formation history constraints produce lower star formation rates than typical UV+IR formulae. We introduce a simple formalism to understand the difference between SFRs inferred from spectral energy distribution fitting and standard template-based approaches such as UV+IR SFRs. Finally, we demonstrate the inferred star-forming sequence is consistent with predictions from theoretical models of galaxy formation, resolving a long-standing $\sim0.2-0.5$ dex offset with observations at $0.5<z<3$. The fully trained normalizing flow including a nonparametric description of $\rho(\log{\rm M}^*,\log{\rm SFR},z)$ is made available online to facilitate straightforward comparisons with future work.
ASTRONOMY
arxiv.org

CasSeqGCN: Combining Network Structure and Temporal Sequence to Predict Information Cascades

One important task in the study of information cascades is to predict the future recipients of a message given its past spreading trajectory. While the network structure serves as the backbone of the spreading, an accurate prediction can hardly be made without knowledge of the dynamics on the network. The temporal information in the spreading sequence captures many hidden features, but predictions based on sequence alone have their limitations. Recent efforts have started to explore the possibility of combining both the network structure and the temporal feature for a more accurate prediction. Nevertheless, it is still a challenge to efficiently and optimally associate these two interdependent factors. Here, we propose a new end-to-end prediction method, CasSeqGCN, in which the structure and temporal feature are simultaneously taken into account. A cascade is divided into multiple snapshots which record the network topology and the state of nodes. The graph convolutional network (GCN) is used to learn the representation of a snapshot. The dynamic routing and the long short-term memory (LSTM) model are used to aggregate node representation and extract temporal information. CasSeqGCN predicts the future cascade size more accurately compared with other state-of-the-art baseline methods. The ablation study demonstrates that the improvement mainly comes from the design of the input and the GCN layer. Taken together, our method confirms the benefit of combining the structural and temporal features in cascade prediction, which not only brings new insights but can also serve as a useful baseline method for future studies.
SCIENCE
Synthtopia

Conductive Labs NDLR Generative Sequencer & Arpeggiator In-Depth Review

In this video, synthesist Karl Clarke takes an in-depth look at the Conductive Labs NDLR, a polyphonic generative sequencer and arpeggiator. The NDLR is a unique device, tailored to controlling multiple synths via MIDI. It lets you select or play a chord and translate that input into four unique channels of MIDI output – drone, pad and two arpeggiators.
ELECTRONICS
arxiv.org

Plug-Tagger: A Pluggable Sequence Labeling Framework Using Language Models

Plug-and-play functionality allows deep learning models to adapt well to different tasks without requiring any parameters to be modified. Recently, prefix-tuning was shown to be a plug-and-play method on various text generation tasks by simply inserting corresponding continuous vectors into the inputs. However, sequence labeling tasks invalidate existing plug-and-play methods since different label sets demand changes to the architecture of the model classifier. In this work, we propose the use of label word prediction instead of classification to fully reuse the architecture of pre-trained models for sequence labeling tasks. Specifically, for each task, a label word set is first constructed by selecting a high-frequency word for each class, and then task-specific vectors are inserted into the inputs and optimized to steer the model predictions towards the corresponding label words. As a result, by simply switching the plugin vectors on the input, a frozen pre-trained language model can perform different tasks. Experimental results on three sequence labeling tasks show that the proposed method achieves performance comparable to standard fine-tuning with only 0.1% task-specific parameters. In addition, our method is up to 70 times faster than non-plug-and-play methods when switching between tasks in a resource-constrained scenario.
CODING & PROGRAMMING
arxiv.org

Training Dynamics for Text Summarization Models

Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training models or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on news summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that properties such as copy behavior are learnt earlier in the training process and these observations are robust across domains. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly. This simple training modification allows us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
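The two training modifications named in the abstract above (disregarding high-loss tokens, or disregarding low-loss tokens) can be illustrated with a small sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `filtered_mean_loss` and the fixed thresholds `low` and `high` are hypothetical, and the abstract does not say how the cutoffs are chosen.

```python
def filtered_mean_loss(token_losses, low=None, high=None):
    """Average per-token losses while ignoring tokens outside [low, high].

    high: tokens with loss above this value are disregarded (hard-to-learn tokens).
    low:  tokens with loss below this value are disregarded (quickly learnt tokens).
    Both thresholds are hypothetical parameters used only for illustration.
    """
    kept = [
        loss for loss in token_losses
        if (low is None or loss >= low) and (high is None or loss <= high)
    ]
    # If every token was filtered out, contribute no loss for this example.
    return sum(kept) / len(kept) if kept else 0.0


losses = [0.1, 0.5, 2.0, 4.0]                   # made-up per-token losses
print(filtered_mean_loss(losses, high=3.0))     # drops the 4.0 token
print(filtered_mean_loss(losses, low=0.3))      # drops the 0.1 token
```

In a real training loop the same mask would be applied to the per-token loss tensor before averaging, so that filtered tokens contribute no gradient.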
COMPUTERS
arxiv.org

Minimax-robust estimation problems for sequences with periodically stationary increments observed with noise

The problem of optimal estimation of linear functionals constructed from the unobserved values of a stochastic sequence with periodically stationary increments based on observations of the sequence with stationary noise is considered. For sequences with known spectral densities, we obtain formulas for calculating values of the mean square errors and the spectral characteristics of the optimal estimates of the functionals. Formulas that determine the least favorable spectral densities and the minimax-robust spectral characteristics of the optimal linear estimates of functionals are proposed in the case where spectral densities of the sequence are not exactly known while some sets of admissible spectral densities are specified.
MATHEMATICS
arxiv.org

Proactive Mobility Management of UEs using Sequence-to-Sequence Modeling

Beyond 5G networks will operate at high frequencies with wide bandwidths. This brings both opportunities and challenges. Opportunities include high throughput connectivity with low latency. However, one of the main challenges in these networks is the high path loss at operating frequencies, which requires the network to be deployed densely to provide coverage. Since these cells have a small inter-site distance (ISD), the dwell-time of the UEs in these cells is small; supporting mobility in these types of dense networks is therefore a challenge and requires frequent beam or cell reassignments. A proactive mobility management scheme which exploits the trajectory can provide better prediction of cells and beams as UEs move in the coverage area. We propose an AI-based method using sequence-to-sequence modeling for the estimation of handover cells/beams along with dwell-time using the trajectory information of the UE. Results indicate that for a dense deployment an accuracy of more than 90 percent can be achieved for handover cell estimation with very low mean absolute error (MAE) for dwell-time.
COMPUTERS
arxiv.org

Crisis Domain Adaptation Using Sequence-to-sequence Transformers

User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using annotated content from previous crises. In this paper, we investigate how this prior knowledge can be best leveraged for new crises by examining the extent to which crisis events of a similar type are more suitable for adaptation to new events (cross-domain adaptation). Given the recent successes of transformers in various language processing tasks, we propose CAST: an approach for Crisis domain Adaptation leveraging Sequence-to-sequence Transformers. We evaluate CAST using two major crisis-related message classification datasets. Our experiments show that our best CAST-based run, without using any target data, achieves state-of-the-art performance in both in-domain and cross-domain contexts. Moreover, CAST is particularly effective in one-to-one cross-domain adaptation when trained with a larger language model. In many-to-one adaptation where multiple crises are jointly used as the source domain, CAST further improves its performance. In addition, we find that more similar events are more likely to bring better adaptation performance whereas fine-tuning using dissimilar events does not help adaptation. To aid reproducibility, we open source our code to the community.
TECHNOLOGY
arxiv.org

Understanding Procedural Knowledge by Sequencing Multimodal Instructional Manuals

The ability to sequence unordered events is an essential skill for comprehending and reasoning about real-world task procedures, which often requires a thorough understanding of temporal common sense and multimodal information, as these procedures are often communicated through a combination of texts and images. Such capability is essential for applications such as sequential task planning and multi-source instruction summarization. While humans are capable of reasoning about and sequencing unordered multimodal procedural instructions, whether current machine learning models have such essential capability is still an open question. In this work, we benchmark models' capability of reasoning over and sequencing unordered multimodal instructions by curating datasets from popular online instructional manuals and collecting comprehensive human annotations. We find models not only perform significantly worse than humans but also seem incapable of efficiently utilizing the multimodal information. To improve machines' performance on multimodal event sequencing, we propose sequentiality-aware pretraining techniques that exploit the sequential alignment properties of both texts and images, resulting in significant improvements of more than 5%.
COMPUTERS
arxiv.org

Spatial-Temporal Transformer for 3D Point Cloud Sequences

Effective learning of spatial-temporal information within a point cloud sequence is highly important for many downstream tasks such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations from dynamic 3D point cloud sequences. Our PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. Our STSA module is introduced to capture the spatial-temporal context information across adjacent frames, while the RE module is proposed to aggregate features across neighbors to enhance the resolution of feature maps. We test the effectiveness of our PST2 with two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that our PST2 outperforms existing methods on all datasets. The effectiveness of our STSA and RE modules has also been justified with ablation experiments.
CODING & PROGRAMMING
arxiv.org

BGaitR-Net: Occluded Gait Sequence reconstruction with temporally constrained model for gait recognition

Recent advancements in computational resources and Deep Learning methodologies have significantly benefited the development of intelligent vision-based surveillance applications. Gait recognition in the presence of occlusion is one of the challenging research topics in this area, and the solutions proposed by researchers to date lack robustness and also depend on several unrealistic constraints, which limits their practical applicability. We improve the state-of-the-art by developing novel deep learning-based algorithms to identify the occluded frames in an input sequence and then reconstruct these occluded frames by exploiting the spatio-temporal information present in the gait sequence. The multi-stage pipeline adopted in this work consists of key pose mapping, occlusion detection and reconstruction, and finally gait recognition. While the key pose mapping and occlusion detection phases are done via a graph sorting algorithm, reconstruction of occluded frames is done by fusing the key pose-specific information derived in the previous step with the spatio-temporal information contained in a gait sequence using a Bi-Directional Long Short-Term Memory. This occlusion reconstruction model has been trained using synthetically occluded CASIA-B and OU-ISIR data, and the trained model is termed the Bidirectional Gait Reconstruction Network (BGait-R-Net). Our LSTM-based model reconstructs occlusion and generates frames that are temporally consistent with the periodic pattern of a gait cycle, while simultaneously preserving the body structure.
COMPUTERS
TheConversationAU

Facebook wants AI to find your keys and understand your conversations

Facebook has announced a research project that aims to push the “frontier of first-person perception”, and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of “smart glasses” called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of...
SOFTWARE
arxiv.org

Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction

Synthesis planning and reaction outcome prediction are two fundamental problems in computer-aided organic chemistry for which a variety of data-driven approaches have emerged. Natural language approaches that model each problem as a SMILES-to-SMILES translation lead to a simple end-to-end formulation, reduce the need for data preprocessing, and enable the use of well-optimized machine translation model architectures. However, SMILES representations are not an efficient representation for capturing information about molecular structures, as evidenced by the success of SMILES augmentation to boost empirical performance. Here, we describe a novel Graph2SMILES model that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders that mitigates the need for input data augmentation. As an end-to-end architecture, Graph2SMILES can be used as a drop-in replacement for the Transformer in any task involving molecule(s)-to-molecule(s) transformations. In our encoder, an attention-augmented directed message passing neural network (D-MPNN) captures local chemical environments, and the global attention encoder allows for long-range and intermolecular interactions, enhanced by graph-aware positional embedding. Graph2SMILES improves the top-1 accuracy of the Transformer baselines by $1.7\%$ and $1.9\%$ for reaction outcome prediction on USPTO_480k and USPTO_STEREO datasets respectively, and by $9.8\%$ for one-step retrosynthesis on the USPTO_50k dataset.
CHEMISTRY
