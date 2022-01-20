
AstBERT: Enabling Language Model for Code Understanding with Abstract Syntax Tree

By Rong Liang, Yujie Lu, Zhen Huang, Tiehua Zhang, Yuze Liu
arxiv.org
 4 days ago

Using a pre-trained language model (e.g., BERT) to comprehend source code has attracted increasing attention in the natural language processing community. However, there are several challenges when it comes to applying these language models to solve programming language...

towardsdatascience.com

GPT-3, RNNs and All That: A Deep Dive into Language Modelling

As I’ve been working on Chai I’ve been exposed to large language models (LLMs), something I didn’t really know anything about previously. In this article I’ll summarise everything I have since learned on the subject. We’ll go from the very simple (what researchers were doing 40-ish years ago) to the state of the art, staying at a big picture level. The idea is not to get the details of the math right, but rather to be able to give a good “by and large” explanation of what is going on under the hood in these language models.
COMPUTERS
arxiv.org

A Study on Mitigating Hard Boundaries of Decision-Tree-based Uncertainty Estimates for AI Models

Outcomes of data-driven AI models cannot be assumed to be always correct. To estimate the uncertainty in these outcomes, the uncertainty wrapper framework has been proposed, which considers uncertainties related to model fit, input quality, and scope compliance. Uncertainty wrappers use a decision tree approach to cluster input-quality-related uncertainties, assigning inputs strictly to distinct uncertainty clusters. Hence, a slight variation in only one feature may lead to a cluster assignment with a significantly different uncertainty. Our objective is to replace this with an approach that mitigates the hard decision boundaries of these assignments while preserving interpretability, runtime complexity, and prediction performance. Five approaches were selected as candidates and integrated into the uncertainty wrapper framework. For the evaluation based on the Brier score, datasets for a pedestrian detection use case were generated using the CARLA simulator and YOLOv3. All integrated approaches achieved a softening, i.e., smoothing, of the uncertainty estimation. Yet, compared to decision trees, they are harder to interpret and have higher runtime complexity. Moreover, some components of the Brier score worsened while others improved. The most promising approach regarding the Brier score was random forests. In conclusion, softening hard decision tree boundaries appears to be a trade-off decision.
COMPUTERS
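
The Brier score used for evaluation in the abstract above can be illustrated with a minimal sketch. It assumes the standard binary definition (the mean squared difference between a predicted probability and the observed 0/1 outcome); the function name and the sample numbers are hypothetical and do not come from the paper.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 means every prediction was certain and correct.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions and the outcomes that were actually observed.
predicted = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 1, 0]
score = brier_score(predicted, observed)
print(f"Brier score: {score:.4f}")
```

A hard decision boundary shows up in this metric when two nearly identical inputs receive very different probabilities, so that one of them contributes a large squared error.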
arxiv.org

Sentiment Analysis with Deep Learning Models: A Comparative Study on a Decade of Sinhala Language Facebook Data

The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To achieve this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade's worth of Sinhala posts with millions of reactions. For the purpose of establishing benchmarks and with the goal of identifying the best model for Sinhala sentiment analysis, we also test, on the same data set configuration, other deep learning models designed for sentiment analysis. In this study we report that the 3-layer Bidirectional LSTM model achieves an F1 score of 84.58% for Sinhala sentiment analysis, surpassing the current state-of-the-art model, Capsule B, which only manages an F1 score of 82.04%. Further, since all the deep learning models show F1 scores above 75%, we conclude that it is safe to claim that Facebook reactions are suitable for predicting the sentiment of a text.
SOFTWARE
arxiv.org

Assemble Foundation Models for Automatic Code Summarization

Automatic code summarization is beneficial to software development and maintenance since it reduces the burden of manual tasks. Currently, artificial intelligence is undergoing a paradigm shift. Foundation models pretrained on massive data and fine-tuned on downstream tasks surpass specially customized models. This trend inspired us to consider reusing foundation models instead of learning from scratch. Based on this, we propose a flexible and robust approach for automatic code summarization based on neural networks. We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo. Moreover, we utilize Gaussian noise as a simulation of contextual information to optimize the latent representation. Furthermore, we introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate fine-tuning, and design intermediate stage tasks for general sequence-to-sequence learning. Finally, we evaluate AdaMo on a benchmark dataset for code summarization, comparing it with state-of-the-art models.
CODING & PROGRAMMING
arxiv.org

Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer

Recent advances in prompt-tuning cast few-shot classification tasks as a masked language modeling problem. By wrapping input into a template and using a verbalizer which constructs a mapping between label space and label word space, prompt-tuning can achieve excellent results in zero-shot and few-shot scenarios. However, typical prompt-tuning needs a manually designed verbalizer, which requires domain expertise and human effort, and an insufficient label space may introduce considerable bias into the results. In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning. Labels are represented by prototypical embeddings in the feature space rather than by discrete words. The distances between the embedding at the masked position of the input and the prototypical embeddings are used as the classification criterion. For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings. For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings. Our method optimizes models by contrastive learning. Extensive experimental results on several many-class text classification datasets with low-resource settings demonstrate the effectiveness of our approach compared with other verbalizer construction methods. Our implementation is available at this https URL.
COMPUTERS
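
The classification criterion described in the abstract above (comparing the embedding at the masked position with one prototypical embedding per label) can be sketched as a nearest-prototype lookup. This is only an illustration under stated assumptions: cosine distance is assumed as the distance measure, the two-dimensional vectors are made up, and the function names are hypothetical rather than taken from the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def classify(mask_embedding, prototypes):
    """Pick the label whose prototype is closest to the masked-position embedding."""
    return min(
        prototypes,
        key=lambda label: cosine_distance(mask_embedding, prototypes[label]),
    )


# Hypothetical prototypes; in practice these would be learned embeddings.
prototypes = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
print(classify([0.9, 0.1], prototypes))
```

Because labels live in the embedding space, adding a class in this sketch means adding one more prototype vector rather than choosing new label words by hand.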
towardsdatascience.com

LM!=KM: The Five Reasons Why Language Models Fall Short of Supporting Knowledge Model Requirements of Next-Gen AI

Large language models (LMs) have demonstrated that they can serve as relatively good knowledge models (KMs). But do they excel at performing all the functions required for enabling truly intelligent, cognitive AI systems? The answer, I believe, is no. This post will discuss the five capabilities that make up an advanced KM and how these areas cannot be easily addressed by LMs in their present form. These capabilities are Scalability, Fidelity, Adaptability, Richness and Explainability.
COMPUTERS
arxiv.org

Does Interaction Help Users Better Understand the Structure of Probabilistic Models?

By Evdoxia Taka, Sebastian Stein, John H. Williamson (School of Computing Science, University of Glasgow, Scotland, United Kingdom)

Probabilistic modelling needs specialized tools to support modelers, decision-makers or researchers in the design, checking, refinement and communication of models. Users' comprehension of probabilistic models is vital in all of these cases, and interactive visualisations could enhance it. Although there are various studies evaluating interactivity in Bayesian reasoning and available tools for visualizing the inference-related distributions, we focus specifically on evaluating the effect of interaction on users' comprehension of probabilistic models' structure. We conducted a user study based on our Interactive Pair Plot for visualizing models' distribution and conditioning the sample space graphically. Our results suggest that the interactive group's improvements in understanding over the static group are most pronounced for more exotic structures, such as hierarchical models or unfamiliar parameterisations. As the detail of the inferred information increases, interaction does not lead to considerably longer response times. Finally, interaction improves users' confidence.
COMPUTERS
arxiv.org

A Survey of Pretrained Language Models Based Text Generation

Text generation aims to produce plausible and readable text in human language from input data. The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). Grounding text generation on PLMs is seen as a promising direction in both academia and industry. In this survey, we present the recent advances achieved in the topic of PLMs for text generation. In detail, we begin by introducing three key points of applying PLMs to text generation: 1) how to encode the input data as representations that preserve input semantics and can be fused into PLMs; 2) how to design a universal and performant architecture of PLMs serving as generation models; and 3) how to optimize PLMs given the reference text and ensure that the generated text satisfies special text properties. Then, we identify several challenges and future directions within each key point. Next, we present a summary of various useful resources and typical text generation applications to work with PLMs. Finally, we conclude and summarize the contribution of this survey.
COMPUTERS
arxiv.org

Modelling of propagation of very-high-energy gamma rays with CRbeam code. Comparison with CRPropa and ELMAG codes

Very-high-energy gamma rays produce electron-positron pairs in interactions with low-energy photons of the extragalactic background light during propagation through the intergalactic medium. The electron-positron pairs generate secondary gamma rays detectable by gamma-ray telescopes. This secondary emission can be used to detect Inter-Galactic Magnetic Fields (IGMF) in the voids of Large Scale Structure. A new gamma-ray observatory, the Cherenkov Telescope Array (CTA), will provide an increase in sensitivity for detection of this secondary gamma-ray emission and enable measurement of its properties for sources at cosmological distances. Interpretation of the CTA data, including detection of IGMF and study of its properties and origin, will require precision modelling of the primary and secondary gamma-ray fluxes. We assess the precision of the modelling of the secondary gamma-ray emission using model calculations with the publicly available Monte-Carlo codes CRPropa and ELMAG and compare their predictions with theoretical expectations and with model calculations of a newly developed CRbeam code. We find that model predictions of different codes differ by up to 50% for low-redshift sources, with discrepancies increasing up to order-of-magnitude level with increasing source redshift. We identify the origin of these discrepancies and argue that the new CRbeam code provides reliable predictions for spectral, timing and imaging properties of the secondary gamma-ray signal and can be used to study gamma-ray sources and IGMF with precision relevant for the prospective CTA study of the effects of gamma-ray propagation through the intergalactic medium.
SCIENCE
aithority.com

Mydecine Achieves Innovative Supercomputing Artificial Intelligence Modeling in Psychedelic Drug Development Enabling Quick Screening of Billions of Compounds

Mydecine Innovations Group, a biotechnology and digital technology company aiming to transform the treatment of mental health and addiction disorders, announced they have completed a target-based model of the classic psychedelic serotonin receptor 5-HT2A for use in their AI-driven drug discovery program. The new model will allow them to expeditiously screen billions of structures to determine which novel compounds are most likely to increase binding affinity, enabling them to continue creating improved second and third generation psychedelic molecules for medical use.
HEALTH
Nature.com

Mini-batch optimization enables training of ODE models on large-scale datasets

Quantitative dynamic models are widely used to study cellular signal processing. A critical step in modelling is the estimation of unknown model parameters from experimental data. As model sizes and datasets are steadily growing, established parameter optimization approaches for mechanistic models become computationally extremely challenging. Mini-batch optimization methods, as employed in deep learning, have better scaling properties. In this work, we adapt, apply, and benchmark mini-batch optimization for ordinary differential equation (ODE) models, thereby establishing a direct link between dynamic modelling and machine learning. On our main application example, a large-scale model of cancer signaling, we benchmark mini-batch optimization against established methods, achieving better optimization results and reducing computation by more than an order of magnitude. We expect that our work will serve as a first step towards mini-batch optimization tailored to ODE models and enable modelling of even larger and more complex systems than what is currently possible.
SCIENCE
VentureBeat

Inside BigScience, the quest to build a powerful open language model

Roughly a year ago, Hugging Face, a Brooklyn, New York-based natural language processing startup, launched BigScience, an international project with more than 900 researchers that is designed to better understand and improve the quality of large natural language models. Large language models (LLMs) — algorithms that can recognize, predict, and generate language on the basis of text-based datasets — have captured the attention of entrepreneurs and tech enthusiasts alike. But the costly hardware required to develop LLMs has kept them largely out of reach of researchers without the resources of companies like OpenAI and DeepMind behind them.
BROOKLYN, NY
devops.com

How Low-Code Enables the Composable Enterprise

Enterprises know that their customers and partners expect a superior personalized experience. After all, customer experience has been a top priority for organizations over the last decade. However, despite all the technological advancements and transformation initiatives, the ability to deliver innovation with speed continues to elude. Without this ability, enterprises...
COMPUTERS
arxiv.org

Pedestrians in static crowds are not grains, but game players

By Thibault Bonnemain, Matteo Butano (LPTMS), Théophile Bonnet (IJCLab, LPTMS, CEA), Iñaki Echeverría-Huarte (UPNA), Antoine Seguin (FAST), Alexandre Nicolas (ILM), Cécile Appert-Rolland (IJCLab), Denis Ullmo (LPTMS)

The short-term ('operational') dynamics of pedestrian crowds are generally thought to involve no anticipation, except perhaps the avoidance of the most...
TrendHunter.com

AR-Enabled Heart Modelling Solutions

Health tech company Magic Leap has announced that it will be providing a select group of healthcare companies with early access to its second-generation augmented reality headset, the Magic Leap 2. For example, SentiAI will adopt the technology to improve upon its 3D heart-rendering software, which has been designed to...
ELECTRONICS
arxiv.org

Unveiling Project-Specific Bias in Neural Code Models

Neural code models have introduced significant improvements on many software analysis tasks such as type inference and vulnerability detection. Despite the good performance of such models under the common intra-project independent and identically distributed (IID) training and validation setting, we observe that they usually fail to generalize to the real-world inter-project out-of-distribution (OOD) setting. In this work, we show that this phenomenon is caused by the model relying heavily on project-specific, ungeneralizable tokens, such as self-defined variable and function names, for downstream prediction, and we formulate it as the project-specific bias learning behavior. We propose a measurement to interpret such behavior, termed Cond-Idf, which combines co-occurrence probability and inverse document frequency to measure how strongly a token is related to a label and how project-specific it is. The approximation indicates that without proper regularization with prior knowledge, the model tends to leverage spurious statistical cues for prediction. Equipped with these observations, we propose a bias mitigation mechanism, Batch Partition Regularization (BPR), that regularizes the model to infer based on proper behavior by leveraging latent logic relations among samples. Experimental results on two deep code benchmarks indicate that BPR can improve both inter-project OOD generalization and adversarial robustness while not sacrificing accuracy on IID data.
CODING & PROGRAMMING
arxiv.org

High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem wherein each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model, $O(s\log d/N)$. When $s\log d/N=o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $O(\kappa \log (1/\varepsilon))$ gradient computations and $O(\kappa/(1-\rho) \log (1/\varepsilon))$ communication rounds, where $\kappa$ is the restricted condition number of the loss function and $\rho$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm despite the data being distributed, whereas the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity and topology, and convergence rate in high dimensions.
COMPUTERS
arxiv.org

Tridiagonal real symmetric matrices with a connection to Pascal's triangle and the Fibonacci sequence

We explore a certain family $\{A_n\}_{n=1}^{\infty}$ of $n \times n$ tridiagonal real symmetric matrices. After deriving a three-term recurrence relation for the characteristic polynomials of this family, we find a closed form solution. The coefficients of these characteristic polynomials turn out to involve the diagonal entries of Pascal's triangle in a tantalizingly predictive manner. Lastly, we explore a relation between the eigenvalues of various members of the family. More specifically, we give a sufficient condition on the values $m,n \in \mathbb{N}$ for when $\texttt{spec}(A_m)$ is contained in $\texttt{spec}(A_n)$. We end the paper with a number of open questions, one of which intertwines our characteristic polynomials with the Fibonacci sequence in an intriguing manner involving ellipses.
MATHEMATICS
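
The three-term recurrence mentioned in the abstract above can be illustrated for a generic symmetric tridiagonal matrix. The sketch below uses the standard recurrence $p_k(x) = (x - a_k)\,p_{k-1}(x) - b_{k-1}^2\,p_{k-2}(x)$ with $p_0(x) = 1$, where $a_k$ are the diagonal entries and $b_k$ the off-diagonal entries. The abstract does not specify the entries of the family $A_n$, so the matrices used here are hypothetical stand-ins and the function names are illustrative.

```python
def times_x_minus(poly, a):
    """Return the coefficients of (x - a) * poly(x), lowest degree first."""
    result = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        result[i + 1] += c      # contribution of x * c * x^i
        result[i] -= a * c      # contribution of -a * c * x^i
    return result


def char_poly(diag, off):
    """Characteristic polynomial det(xI - A) of a symmetric tridiagonal matrix.

    diag holds the n diagonal entries, off the n - 1 off-diagonal entries.
    Coefficients are returned lowest degree first.
    """
    if not diag or len(off) != len(diag) - 1:
        raise ValueError("need n >= 1 diagonal and n - 1 off-diagonal entries")
    prev, cur = [1], times_x_minus([1], diag[0])   # p_0 and p_1
    for k in range(1, len(diag)):
        nxt = times_x_minus(cur, diag[k])
        for i, c in enumerate(prev):
            nxt[i] -= off[k - 1] ** 2 * c
        prev, cur = cur, nxt
    return cur


# Hypothetical 3 x 3 example: 2 on the diagonal, 1 on the off-diagonals.
print(char_poly([2, 2, 2], [1, 1]))
```

For the 2 x 2 matrix with diagonal entries 2 and off-diagonal entry 1, the sketch yields $x^2 - 4x + 3$, whose roots 1 and 3 are the eigenvalues of that matrix.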

