Date: 2021-11-16

Meeting Summarization with Pre-training and Clustering Methods

By Andras Huebner, Wei Ji, Xiang Xiao
arxiv.org
 8 days ago

Automatic meeting summarization is becoming increasingly popular these days. The ability to automatically summarize meetings and to extract key information could greatly increase the efficiency of our work and life. In this paper, we experiment with different approaches...


Related
arxiv.org

Pre-trained Transformer-Based Approach for Arabic Question Answering : A Comparative Study

Question answering (QA) is one of the most challenging yet widely investigated problems in Natural Language Processing (NLP). QA systems try to produce answers for given questions. These answers can be generated from unstructured or structured text. Hence, QA is considered an important research area that can be used in evaluating text understanding systems. A large volume of QA studies has been devoted to the English language, investigating the most advanced techniques and achieving state-of-the-art results. However, research on Arabic question answering progresses at a considerably slower pace due to the scarcity of research efforts in Arabic QA and the lack of large benchmark datasets. Recently, many pre-trained language models have delivered high performance on many Arabic NLP problems. In this work, we evaluate state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. We fine-tuned and compared the performance of the AraBERTv2-base model, the AraBERTv0.2-large model, and the AraELECTRA model. Finally, we provide an analysis to understand and interpret the low-performance results obtained by some models.
SCIENCE
arxiv.org

DEEP: DEnoising Entity Pre-training for Neural Machine Translation

It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus. Earlier named entity translation methods mainly focus on phonetic transliteration, which ignores the sentence context for translation and is limited in domain and language coverage. To address this limitation, we propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences. In addition, we investigate a multi-task learning strategy that finetunes a pre-trained neural machine translation model on both entity-augmented monolingual data and parallel data to further improve entity translation. Experimental results on three language pairs demonstrate that DEEP results in significant improvements over strong denoising auto-encoding baselines, with a gain of up to 1.3 BLEU and up to 9.2 entity accuracy points for English-Russian translation.
CODING & PROGRAMMING
arxiv.org

SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing

Recently, pre-training models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to improve the contextual representation ability of the neural network. Large pre-trained language models have also been applied in the area of table semantic parsing. However, existing pre-training approaches have not carefully explored explicit interaction relationships between a question and the corresponding database schema, which is a key ingredient for uncovering their semantic and structural correspondence. Furthermore, question-aware representation learning in the schema grounding context has received less attention in pre-training. To alleviate these issues, this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. We evaluate our pre-trained framework by fine-tuning it on two benchmarks, Spider and SQUALL. The results demonstrate the effectiveness of our pre-training objective and curriculum compared to a variety of baselines.
EDUCATION
arxiv.org

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Transformer-based supervised pre-training achieves great performance in person re-identification (ReID). However, due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset (e.g. ImageNet-21K) to boost the performance because of the strong data fitting ability of the transformer. To address this challenge, this work targets to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure, respectively. We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate the pre-training, the Catastrophic Forgetting Score (CFS) is proposed to evaluate the gap between pre-training and fine-tuning data. Based on CFS, a subset is selected via sampling relevant data close to the down-stream ReID data and filtering irrelevant data from the pre-training dataset. For the model structure, a ReID-specific module named IBN-based convolution stem (ICS) is proposed to bridge the domain gap by learning more invariant features. Extensive experiments have been conducted to fine-tune the pre-training models under supervised learning, unsupervised domain adaptation (UDA), and unsupervised learning (USL) settings. We successfully downscale the LUPerson dataset to 50% with no performance degradation. Finally, we achieve state-of-the-art performance on Market-1501 and MSMT17. For example, our ViT-S/16 achieves 91.3%/89.9%/89.6% mAP accuracy on Market1501 for supervised/UDA/USL ReID. Codes and models will be released to this https URL.
SOFTWARE
IN THIS ARTICLE
#Clustering #Qmsum #Bart
arxiv.org

GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization

As opposed to natural languages, source code understanding is influenced by grammatical relationships between tokens regardless of their identifier name. Graph representations of source code such as Abstract Syntax Tree (AST) can capture relationships between tokens that are not obvious from the source code. We propose a novel method, GN-Transformer to learn end-to-end on a fused sequence and graph modality we call Syntax-Code-Graph (SCG). GN-Transformer expands on Graph Networks (GN) framework using a self-attention mechanism. SCG is the result of the early fusion between a source code snippet and the AST representation. We perform experiments on the structure of SCG, an ablation study on the model design, and the hyper-parameters to conclude that the performance advantage is from the fused representation. The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics (BLEU, METEOR, ROUGE-L). We further evaluate the human perceived quality of our model and previous work with an expert-user study. Our model outperforms the state-of-the-art in human perceived quality and accuracy.
CODING & PROGRAMMING
arxiv.org

Clustering based method for finding spikes in insect neurons

Spikes can be easily detected in most intracellular recordings as sharp peaks. However, in some experimental preparations, because of unipolar morphology or other characteristics of the recorded neurons, the sizes of the spikes recorded from the soma can be much smaller. The experimental settings and the quality of the recording can also affect the observed amplitudes of the spikes. Whole-cell patch-clamp recordings from the somata of projection neurons of the antennal lobe in Drosophila or mosquitoes can show spikes with amplitudes as small as 2 mV. Moreover, the observed spikes often ride on relatively large depolarizations, which makes it difficult for the standard thresholding-based approaches to distinguish them from noise or sharp EPSPs present in the signal. For spike detection in such neuronal recordings, we propose a clustering-based algorithm that separates peaks corresponding to action potentials from those corresponding to noise. Candidate peaks, including many noise peaks, are first selected according to their sharpness, and then a feature vector is extracted for each peak. The 3-dimensional feature vector contains the absolute value of the peak voltage, the height of the spike, and the magnitude of the second derivative minima attained during the spike. In most recordings, this 3D space reveals two natural clusters, separating the noise peaks from the true action potentials. Some parameters of the algorithm can optionally be altered by the user to improve detection, which comes in handy in the few recordings where the default parameters do not work well. In summary, the algorithm facilitates accurate spike detection to enable the interpretation and analysis of patch-clamp data from neuronal recordings in invertebrates. The algorithm is implemented as a freely available open-source tool.
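The pipeline in this abstract (candidate peaks, a 3-dimensional feature vector per peak, then two clusters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' tool: the abstract does not name the clustering algorithm, so plain 2-means is swapped in; candidates are simple local maxima rather than a sharpness-based preselection; and every function name, the `half_window` parameter, and the exact feature definitions are hypothetical.

```python
import math

def candidate_peaks(v):
    """Indices of local maxima in the voltage trace (assumed candidate rule)."""
    return [i for i in range(2, len(v) - 2) if v[i - 1] < v[i] >= v[i + 1]]

def peak_features(v, i, half_window=2):
    """(peak voltage, height above local minimum, |min second difference|)."""
    lo, hi = max(1, i - half_window), min(len(v) - 2, i + half_window)
    # Discrete second derivative around the peak
    second = [v[j - 1] - 2 * v[j] + v[j + 1] for j in range(lo, hi + 1)]
    height = v[i] - min(v[lo - 1:hi + 2])
    return (v[i], height, abs(min(second)))

def rescale(rows):
    """Min-max scale each feature so no single unit dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((x - l) / s for x, l, s in zip(r, lows, spans)) for r in rows]

def two_means(rows, n_iter=50):
    """2-means seeded from the least sharp (cluster 0) and sharpest (cluster 1) peak."""
    centers = [min(rows, key=lambda r: r[2]), max(rows, key=lambda r: r[2])]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [0 if math.dist(r, centers[0]) <= math.dist(r, centers[1]) else 1
                  for r in rows]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def detect_spikes(v):
    """Return indices of peaks assigned to the cluster seeded by the sharpest peak."""
    peaks = candidate_peaks(v)
    if len(peaks) < 2:
        return []
    rows = rescale([peak_features(v, i) for i in peaks])
    labels = two_means(rows)
    return [i for i, lab in zip(peaks, labels) if lab == 1]
```

On a real recording the two clusters will not be this clean, which is presumably why the abstract mentions user-adjustable parameters.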
WILDLIFE
arxiv.org

Pre-training Graph Neural Network for Cross Domain Recommendation

A recommender system predicts users' potential interests in items, where the core is to learn user/item embeddings. Nevertheless, it suffers from the data-sparsity issue, which cross-domain recommendation can alleviate. However, most prior works either jointly learn the source-domain and target-domain models or require side features. Both joint training and side features can affect prediction on the target domain, because the learned embedding is dominated by the source domain, which contains biased information. Inspired by recent advances in pre-training from graph representation learning, we propose a pre-training and fine-tuning paradigm for cross-domain recommendation. We devise a novel Pre-training Graph Neural Network for Cross-Domain Recommendation (PCRec), which adopts the contrastive self-supervised pre-training of a graph encoder. Then, we transfer the pre-trained graph encoder to initialize the node embeddings on the target domain, which benefits the fine-tuning of the single-domain recommender system on the target domain. The experimental results demonstrate the superiority of PCRec. Detailed analyses verify the superiority of PCRec in transferring information while avoiding biases from source domains.
COMPUTERS
arxiv.org

Prune Once for All: Sparse Pre-Trained Language Models

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase the implementation efficiency of large Transformer-based models on target hardware. In this work we present a new method for training sparse pre-trained Transformer language models by integrating weight pruning and model distillation. These sparse pre-trained models can be used for transfer learning on a wide range of tasks while maintaining their sparsity pattern. We demonstrate our method with three known architectures to create sparse pre-trained BERT-Base, BERT-Large and DistilBERT. We show how the compressed sparse pre-trained models we trained transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss. Moreover, we show how to further compress the sparse models' weights to 8-bit precision using quantization-aware training. For example, with our sparse pre-trained BERT-Large fine-tuned on SQuADv1.1 and quantized to 8 bits, we achieve a compression ratio of 40X for the encoder with less than 1% accuracy loss. To the best of our knowledge, our results show the best compression-to-accuracy ratio for BERT-Base, BERT-Large, and DistilBERT.
COMPUTERS
arxiv.org

On the validation of pansharpening methods

Validation of the quality of pansharpening methods is a difficult task because the reference is not directly available. Two main approaches have been established: validation at reduced resolution and validation at the original resolution. In the former approach, it is still not clear how the data should be processed to a lower resolution. Other open issues concern which resolution and which measures should be used. In the latter approach, the main problem is how to select the appropriate measure. In most comparison studies the results of the two approaches do not correspond; that is, each approach selects different methods as the best ones. Thus, the developers of new pansharpening methods still face a dilemma: how to perform a correct or appropriate comparison/evaluation/validation. It should be noted that a third approach is possible, namely comparing the methods in a particular application using its ground truth. But this is not always possible, because developers usually do not work with applications. Moreover, it can be an additional computational load for a researcher in a particular application. In this paper some of the questions/problems raised above are approached/discussed. The following component substitution (CS) and high pass filtering (HPF) pansharpening methods with additive and multiplicative models, and their enhancements such as haze correction, histogram matching, usage of spectral response functions (SRF), and modulation transfer function (MTF) based lowpass filtering, are investigated on remote sensing data from the WorldView-2 and WorldView-4 sensors.
SCIENCE
ScienceAlert

Researchers Are Figuring Out Why Some People Can 'Hear' The Voices of The Dead

Scientists have identified the traits that may make a person more likely to claim they hear the voices of the dead. According to research published earlier this year, a predisposition to high levels of absorption in tasks, unusual auditory experiences in childhood, and a high susceptibility to auditory hallucinations all occur more strongly in self-described clairaudient mediums than the general population. The finding could help us to better understand the upsetting auditory hallucinations that accompany mental illnesses such as schizophrenia, the researchers say. The Spiritualist experiences of clairvoyance and clairaudience – the experience of seeing or hearing something in the absence of an...
SCIENCE
ZDNet

Over half of millennials are responsible for executing their parents' wills, but hardly any have access to their parents' online passwords

As COVID-19 spread, many American millennials finally began their estate planning. Yet many of them do not have the correct digital information if their parents pass on, according to new research from Toronto, Canada-based security and privacy company 1Password. In partnership with digital estate planning companies Trust & Will...
INTERNET
towardsdatascience.com

Fuzzy C-Means Clustering with Python

Fuzzy C-means clustering algorithm is an unsupervised learning method. Before learning the details, let me first decipher its fancy name. So, “fuzzy” here means “not sure”, which indicates that it’s a soft clustering method. “C-means” means c cluster centers, which only replaces the “K” in “K-means” with a “C” to make it look different.
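To make the "soft clustering" idea concrete, here is a minimal pure-Python sketch of the standard fuzzy C-means updates: each point gets a membership in every cluster (the memberships of one point sum to 1), and each center is a membership-weighted mean. The function name, parameter names, and defaults (`c`, `m`, `n_iter`, `tol`, `seed`) are illustrative assumptions and are not taken from the linked article.

```python
import math
import random

def fuzzy_c_means(points, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: inversely related to distance from each center
        shift = 0.0
        for i in range(n):
            dists = [math.dist(points[i], ctr) for ctr in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: give it full membership there
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [x / total for x in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u
```

The fuzzifier `m` controls how soft the assignment is: values close to 1 approach hard K-means-style assignments, while larger values spread membership more evenly across clusters.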
CODING & PROGRAMMING
arxiv.org

Matryoshka and Disjoint Cluster Synchronization of Networks

The main motivation for this paper is to present a definition of network synchronizability for the case of cluster synchronization (CS), similar to the definition introduced by Barahona and Pecora for the case of complete synchronization. We find this problem to be substantially more complex than the original one. We distinguish between the cases in which the master stability function is negative in either a bounded or an unbounded range of its argument. For CS, each cluster may be stable independently of the others, which indicates that the range of a given parameter that synchronizes the cluster may be different for different clusters (isolated CS). For each pair of clusters, we distinguish between three different cases: Matryoshka Cluster Synchronization (when the range of stability for one cluster is included in that of the other cluster), Partially Disjoint Cluster Synchronization (when the ranges of stability partially overlap), and Complete Disjoint Cluster Synchronization (when the ranges of stability do not overlap). Among these cases, only the case of Matryoshka synchronization had been previously reported. However, a study of several real networks from the literature shows that Partially Disjoint Cluster Synchronization is prevalent in these networks.
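The three pairwise cases amount to comparing two stability ranges. A small sketch, assuming each cluster's range is already known as an open interval `(low, high)` of the coupling parameter, with `math.inf` standing in for an unbounded range; the function name and the treatment of touching endpoints as non-overlapping are assumptions, not definitions from the paper.

```python
import math

def classify_pair(range_a, range_b):
    """Classify two stability ranges, each given as an open interval (low, high)."""
    (a_lo, a_hi), (b_lo, b_hi) = range_a, range_b
    # One range lies entirely inside the other
    if (a_lo <= b_lo and b_hi <= a_hi) or (b_lo <= a_lo and a_hi <= b_hi):
        return "Matryoshka"
    # The ranges share no interior point
    if a_hi <= b_lo or b_hi <= a_lo:
        return "Complete Disjoint"
    # Otherwise they overlap only in part
    return "Partially Disjoint"

print(classify_pair((1.0, 5.0), (2.0, 3.0)))            # Matryoshka
print(classify_pair((1.0, 3.0), (2.0, 5.0)))            # Partially Disjoint
print(classify_pair((1.0, 2.0), (3.0, 4.0)))            # Complete Disjoint
print(classify_pair((1.0, math.inf), (2.0, math.inf)))  # Matryoshka
```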
COMPUTERS
atsu.edu

Sage Research Methods for writing

Sage Research Methods is an online platform containing numerous eBooks, videos, encyclopedias, and other useful tools on the entire research life cycle. The Methods Map introduces people to research terms, shows how terms are related, provides definitions of key concepts, and allows you to discover content relevant to your research methods.
KIRKSVILLE, MO
arxiv.org

Attention-based Multi-hypothesis Fusion for Speech Summarization

Speech summarization, which generates a text summary from speech, can be achieved by combining automatic speech recognition (ASR) and text summarization (TS). With this cascade approach, we can exploit state-of-the-art models and large training datasets for both subtasks, i.e., Transformer for ASR and Bidirectional Encoder Representations from Transformers (BERT) for TS. However, ASR errors directly affect the quality of the output summary in the cascade approach. We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary. We investigate several schemes to combine ASR hypotheses. First, we propose using the sum of sub-word embedding vectors weighted by their posterior values provided by an ASR system as an input to a BERT-based TS system. Then, we introduce a more general scheme that uses an attention-based fusion module added to a pre-trained BERT module to align and combine several ASR hypotheses. Finally, we perform speech summarization experiments on the How2 dataset and a newly assembled TED-based dataset that we will release with this paper. These experiments show that retraining the BERT-based TS system with these schemes can improve summarization performance and that the attention-based fusion module is particularly effective.
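The first fusion scheme, a posterior-weighted sum of sub-word embeddings, can be illustrated for a single input position as below. This is a toy sketch, not the paper's system: it assumes the competing ASR hypotheses are already aligned position by position, renormalises the posteriors so they sum to 1, and uses a hypothetical two-dimensional embedding table; `fuse_position` and its arguments are illustrative names.

```python
def fuse_position(candidates, embedding_table):
    """Blend competing sub-words at one position into a single input vector.

    candidates: list of (subword, posterior) pairs from the ASR hypotheses.
    embedding_table: dict mapping each subword to its embedding (list of floats).
    """
    dim = len(next(iter(embedding_table.values())))
    total = sum(p for _, p in candidates)  # renormalise (an assumption)
    fused = [0.0] * dim
    for subword, posterior in candidates:
        vec = embedding_table[subword]
        for d in range(dim):
            fused[d] += (posterior / total) * vec[d]
    return fused

# Toy example: the recogniser is 75% sure of "meet" and 25% sure of "meat"
table = {"meet": [1.0, 0.0], "meat": [0.0, 1.0]}
print(fuse_position([("meet", 0.75), ("meat", 0.25)], table))  # [0.75, 0.25]
```

The fused vector would take the place of a single token embedding at the input of the summarizer, so uncertain positions carry a blend of the plausible sub-words rather than one hard choice.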
arxiv.org

Hierarchical Clustering: New Bounds and Objective

Hierarchical Clustering has been studied and used extensively as a method for analysis of data. More recently, Dasgupta [2016] defined a precise objective function. Given a set of $n$ data points with a weight function $w_{i,j}$ for each two items $i$ and $j$ denoting their similarity/dis-similarity, the goal is to build a recursive (tree like) partitioning of the data points (items) into successively smaller clusters. He defined a cost function for a tree $T$ to be $Cost(T) = \sum_{i,j \in [n]} \big(w_{i,j} \times |T_{i,j}| \big)$ where $T_{i,j}$ is the subtree rooted at the least common ancestor of $i$ and $j$ and presented the first approximation algorithm for such clustering. Then Moseley and Wang [2017] considered the dual of Dasgupta's objective function for similarity-based weights and showed that both random partitioning and average linkage have approximation ratio $1/3$ which has been improved in a series of works to $0.585$ [Alon et al. 2020]. Later Cohen-Addad et al. [2019] considered the same objective function as Dasgupta's but for dissimilarity-based metrics, called $Rev(T)$. It is shown that both random partitioning and average linkage have ratio $2/3$ which has been only slightly improved to $0.667078$ [Charikar et al. SODA2020]. Our first main result is to consider $Rev(T)$ and present a more delicate algorithm and careful analysis that achieves approximation $0.71604$. We also introduce a new objective function for dissimilarity-based clustering. For any tree $T$, let $H_{i,j}$ be the number of $i$ and $j$'s common ancestors. Intuitively, items that are similar are expected to remain within the same cluster as deep as possible. So, for dissimilarity-based metrics, we suggest the cost of each tree $T$, which we want to minimize, to be $Cost_H(T) = \sum_{i,j \in [n]} \big(w_{i,j} \times H_{i,j} \big)$. We present a $1.3977$-approximation for this objective.
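The two objectives can be evaluated directly on a small tree. The sketch below assumes a tree written as nested tuples with integer leaves, a symmetric weight matrix `w`, sums taken over unordered pairs, and $H_{i,j}$ counted as the number of tree nodes that are ancestors of both $i$ and $j$ (so the root contributes 1); these conventions and all names are illustrative assumptions.

```python
def leaves(tree):
    """List the leaf ids under a node; a leaf is a bare int."""
    if isinstance(tree, int):
        return [tree]
    return [x for child in tree for x in leaves(child)]

def costs(tree, w, depth=1):
    """Return (Cost(T), Cost_H(T)) for a tree of nested tuples.

    Pairs split at this node have this node as least common ancestor, so they
    contribute w[i][j] * (leaves under this node) to Cost and w[i][j] * depth
    to Cost_H, where depth counts their common ancestors.
    """
    if isinstance(tree, int):
        return 0.0, 0.0
    size = len(leaves(tree))
    child_leaves = [leaves(child) for child in tree]
    cost = cost_h = 0.0
    for a in range(len(tree)):
        for b in range(a + 1, len(tree)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    cost += w[i][j] * size
                    cost_h += w[i][j] * depth
    for child in tree:
        sub_cost, sub_h = costs(child, w, depth + 1)
        cost += sub_cost
        cost_h += sub_h
    return cost, cost_h

# Three items; 0 and 1 are merged first, then joined with 2
w = [[0, 3, 1],
     [3, 0, 1],
     [1, 1, 0]]
print(costs(((0, 1), 2), w))  # (12.0, 8.0)
```

In the example, the pair (0, 1) has a two-leaf subtree at its least common ancestor and two common ancestors, while the pairs involving item 2 meet at the root, giving 3·2 + 1·3 + 1·3 = 12 and 3·2 + 1·1 + 1·1 = 8.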
MARKETS
golfmonthly.com

Wet lie chipping method

In this video, GM Top Coach Andrew Jones offers some excellent advice on how to chip from muddy lies. His wet lie chipping method is a must for every golfer! Facing a chip from a muddy or wet lie is a scenario no golfer enjoys. The margin for error on the strike is much smaller than usual and using your normal technique leaves this shot fraught with danger. That's why you need to employ a wet lie chipping method.
GOLF
TechRepublic

Gain comprehensive JavaScript training for an extra 15% off in this pre-Black Friday sale

Build your coding expertise from the ground up with 121 hours of content on JavaScript, from the basics to building your own project. When it comes to programming languages, aspiring coders seem to always want to pick up the new ones. But there's no point in learning a fancy new language if you don't have a handle on the basics. If you're a newbie coder, it would benefit your career to learn languages that have undergone significant development, are widely used and are beginner-friendly. If you don't know where to begin, an excellent place to start is JavaScript.
COMPUTERS
