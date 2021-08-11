

AI21 Labs trains a massive language model to rival OpenAI’s GPT-3

By Kyle Wiggers
VentureBeat
 8 days ago

For the better part of a year, OpenAI’s GPT-3 has remained among the largest AI language models ever created, if not the largest of its kind. Through an API, people have used it to automatically write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate code for deep learning in Python. But AI21 Labs, an AI lab based in Tel Aviv, Israel, says it plans to release a larger model and make it available as a service, aiming to challenge OpenAI’s dominance in the “natural language processing-as-a-service” field.

venturebeat.com

Related
VentureBeat

Prompt-based learning can make language models more capable

Supervised learning, where AI models are trained on input data annotated for a particular output until they can detect the underlying relationships between the inputs and outputs, plays a major role in natural language processing (NLP). Early NLP models relied heavily on feature engineering — researchers used domain knowledge to extract key information from training datasets and provide models with the guidance needed to learn from the data. But with the advent of neural network models for NLP, the focus pivoted from feature engineering to model architecture engineering. Neural networks enabled features to be learned jointly with the training of the models themselves.
arxiv.org

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Motivated by the success of masked language modeling (MLM) in pre-training natural language processing models, we propose w2v-BERT, which explores MLM for self-supervised speech representation learning. w2v-BERT is a framework that combines contrastive learning and MLM: the former trains the model to discretize continuous input speech signals into a finite set of discriminative speech tokens, and the latter trains the model to learn contextualized speech representations by solving a masked prediction task that consumes the discretized tokens. In contrast to existing MLM-based speech pre-training frameworks such as HuBERT, which relies on an iterative re-clustering and re-training process, or vq-wav2vec, which concatenates two separately trained modules, w2v-BERT can be optimized end to end by solving the two self-supervised tasks (the contrastive task and MLM) simultaneously. Our experiments show that w2v-BERT achieves competitive results compared to current state-of-the-art pre-trained models on the LibriSpeech benchmarks when using the Libri-Light 60k corpus as the unsupervised data. In particular, compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows a 5% to 10% relative WER reduction on the test-clean and test-other subsets. When applied to Google's Voice Search traffic dataset, w2v-BERT outperforms our internal conformer-based wav2vec 2.0 by more than 30% (relative).
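The masked prediction half of this setup can be sketched in a few lines. This is a minimal, hypothetical illustration, not w2v-BERT's implementation: it assumes frame features are plain Python lists, that masking replaces a frame with a zero vector, and that the contrastive module has already assigned each frame a discrete token id. The names `mask_frames` and `masked_prediction_loss` are invented for this sketch.

```python
import math
import random

def mask_frames(features, mask_prob=0.3, seed=0):
    """Replace a random subset of frames with a zero vector (hypothetical mask).

    Returns a masked copy of the features and the sorted masked positions;
    the input list is left untouched.
    """
    rng = random.Random(seed)
    n = len(features)
    k = max(1, round(mask_prob * n))
    positions = sorted(rng.sample(range(n), k))
    masked = [list(f) for f in features]
    for p in positions:
        masked[p] = [0.0] * len(features[p])
    return masked, positions

def masked_prediction_loss(probs, token_ids, positions):
    """Mean cross-entropy of the discretized token ids, at masked positions only.

    probs[p] is the model's predicted distribution over the token vocabulary
    for frame p; token_ids[p] is the target id from the contrastive module.
    """
    return -sum(math.log(probs[p][token_ids[p]]) for p in positions) / len(positions)
```

Averaging only over masked frames is what turns this into a masked prediction task: the model gets no credit for frames it could see, so it has to infer the hidden tokens from context.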
arxiv.org

Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models like BERT has become the dominant approach for most NLP tasks. Simply fine-tuning these large language models on downstream tasks, or combining fine-tuning with task-specific pretraining, is often not robust. In particular, performance varies considerably as the random seed or the number of pretraining and/or fine-tuning iterations changes, and the fine-tuned model is vulnerable to adversarial attacks. We propose a simple yet effective adapter-based approach to mitigate these issues. Specifically, we insert small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then freeze the pretrained layers and train the adapter layers on the downstream task data, with (1) task-specific unsupervised pretraining followed by (2) task-specific supervised training (e.g., classification, sequence labeling). Our experiments demonstrate that such a training scheme improves stability and adversarial robustness in transfer learning to various downstream tasks.
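A bottleneck adapter of the kind described above can be sketched with plain lists. The residual connection and ReLU nonlinearity are assumptions borrowed from common adapter designs rather than details given in the abstract, and `adapter` and `adapter_param_count` are hypothetical names. Only these weights would be trained while the pretrained layer stays frozen.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def adapter(h, W_down, b_down, W_up, b_up):
    """Compute h + W_up @ relu(W_down @ h + b_down) + b_up.

    W_down has shape (m, d) and W_up has shape (d, m), with bottleneck m << d.
    """
    z = relu([a + b for a, b in zip(matvec(W_down, h), b_down)])   # down-project
    delta = [a + b for a, b in zip(matvec(W_up, z), b_up)]          # up-project
    return [hi + di for hi, di in zip(h, delta)]                    # residual (skip) connection

def adapter_param_count(d, m):
    """Trainable parameters in one adapter: down-projection + bias, up-projection + bias."""
    return d * m + m + m * d + d
```

With a zero up-projection the adapter reduces to the identity, so inserting it does not change the frozen model's outputs before training begins. For BERT-base's hidden width of 768 and an illustrative bottleneck of 64, `adapter_param_count(768, 64)` gives 99,136 trainable parameters per adapter.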
arxiv.org

Noisy Channel Language Model Prompting for Few-Shot Text Classification

We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive models (e.g., direct head tuning): channel prompt tuning is preferred when the number of training examples is small, labels in the training data are imbalanced, or generalization to unseen labels is required.
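The direct/channel distinction comes down to Bayes' rule: a channel classifier picks the label y that maximizes log P(x | y) + log P(y), so it must assign probability to every token of x. The sketch below is a toy stand-in, assuming an add-one-smoothed unigram model per label in place of the prompted language model the paper uses (which makes it equivalent to naive Bayes); all function names are hypothetical.

```python
import math
from collections import Counter

def train_channel_model(examples):
    """Count tokens per label; examples is a list of (text, label) pairs."""
    counts, vocab = {}, set()
    for text, label in examples:
        tokens = text.lower().split()
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, vocab

def channel_log_prob(text, label, counts, vocab):
    """log P(x | y) under an add-one-smoothed unigram model for label y.

    Every token of the input contributes a term, so the label must "explain" all of x.
    """
    c = counts[label]
    total = sum(c.values())
    v = len(vocab) + 1  # one extra slot reserves probability mass for unseen tokens
    return sum(math.log((c[t] + 1) / (total + v)) for t in text.lower().split())

def channel_classify(text, counts, vocab, prior=None):
    """Return argmax_y [log P(x | y) + log P(y)], plus the per-label scores."""
    labels = sorted(counts)
    prior = prior or {y: 1.0 / len(labels) for y in labels}
    scores = {y: channel_log_prob(text, y, counts, vocab) + math.log(prior[y]) for y in labels}
    return max(scores, key=scores.get), scores
```

Replacing the unigram model with a language model that scores the input conditioned on a verbalized label would move this closer to the prompting setup in the abstract; the decision rule itself stays the same.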
Law.com

Former Parsons Behle Lab CEO's Next Act? Rethinking the Firm Business Model

There’s no time like the present, unless of course that present happens to be unfolding against the backdrop of a global pandemic. Nevertheless, Tomu Johnson decided to open his new two-attorney law firm, The Broad Axe, in June, confident that more than a year spent working from home had adequately prepared clients for a virtual, subscription-based business model.
arxiv.org

Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

We extend the task of composed image retrieval, where an input query consists of an image and a short textual description of how to modify the image. Existing methods have only been applied to non-complex images within narrow domains, such as fashion products, thereby limiting the scope of study on in-depth visual reasoning in rich image and language contexts. To address this issue, we collect the Compose Image Retrieval on Real-life images (CIRR) dataset, which consists of over 36,000 pairs of crowd-sourced, open-domain images with human-generated modifying text. To extend current methods to the open domain, we propose CIRPLANT, a transformer-based model that leverages rich pre-trained vision-and-language (V&L) knowledge for modifying visual features conditioned on natural language. Retrieval is then done by nearest neighbor lookup on the modified features. We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion. Together with the release of CIRR, we believe this work will inspire further research on composed image retrieval.
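The final retrieval step, nearest neighbor lookup on the modified features, can be sketched as follows. This assumes the composed query feature (reference image features already modified by the text) has been computed by the model, and that gallery features are plain vectors keyed by hypothetical image ids. Cosine similarity is an assumed choice of metric; the abstract does not specify one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbors(query, gallery, k=1):
    """Rank gallery image ids by similarity to the modified query feature; return the top k."""
    ranked = sorted(gallery, key=lambda gid: cosine(query, gallery[gid]), reverse=True)
    return ranked[:k]
```

A brute-force sort is fine for a sketch; a real gallery of many thousands of images would typically use an approximate nearest neighbor index instead.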
arxiv.org

IntenT5: Search Result Diversification using Causal Language Models

Search result diversification is a beneficial approach for handling under-specified queries, such as those that are ambiguous or multi-faceted. Existing approaches often rely on massive query logs and interaction data to generate a variety of possible query intents, which can then be used to re-rank documents. However, relying on user interaction data is problematic because one first needs a massive user base to build a sufficient log; public query logs are insufficient on their own. Given the recent success of causal language models (such as the Text-to-Text Transfer Transformer (T5) model) at text generation tasks, we explore the capacity of these models to generate potential query intents. We find that to encourage diversity in the generated queries, it is beneficial to adapt the model by including a new Distributional Causal Language Modeling (DCLM) objective during fine-tuning and a representation replacement during inference. Across six standard evaluation benchmarks, we find that our method (which we call IntenT5) improves search result diversity and attains (and sometimes exceeds) the diversity obtained when using query suggestions based on a proprietary query log. Our analysis shows that our approach is most effective for multi-faceted queries and is able to generalize effectively to queries that were unseen in training data.
arxiv.org

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

Language-guided robots performing home and office tasks must navigate in and interact with the world. Grounding language instructions against visual observations and actions to take in an environment is an open challenge. We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion. Additionally, we bridge the gap between successful object-centric navigation models used for non-interactive agents and the language-guided visual task completion benchmark, ALFRED, by introducing object navigation targets for EmBERT training. We achieve competitive performance on the ALFRED benchmark, and EmBERT marks the first transformer-based model to successfully handle the long-horizon, dense, multi-modal histories of ALFRED, and the first ALFRED model to utilize object-centric navigation targets.
gamewatcher.com

Victoria 3's Dev Diary 9 Tackles National Markets and Its Open Economic Model

Victoria 3 focuses on the economy of your nation on top of its society management-heavy approach to grand strategy. Making the most out of your goods and ensuring that your national markets run as smoothly as possible are key aspects of the more open economic model it proposes and which Paradox delved into in today's developer diary 9.
arxiv.org

A Transformer-based Math Language Model for Handwritten Math Expression Recognition

Handwritten mathematical expressions (HMEs) can be ambiguous to interpret, sometimes even for humans. Several math symbols look very similar when handwritten, such as a dot and a comma, or 0, O, and o, which makes them hard for HME recognition systems to distinguish without contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TMLM). Based on the self-attention mechanism, the high-level representation of an input token in a sequence is computed from how it relates to the previous tokens. Thus, TMLM can capture long dependencies and correlations among symbols and relations in a mathematical expression (ME). We trained the proposed language model using a corpus of approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved a perplexity of 4.42, outperforming the previous math language models, i.e., the N-gram and recurrent neural network-based language models. In addition, we integrate TMLM into a stochastic context-free grammar-based HME recognition system, using a weighting parameter to re-rank the top-10 best candidates. The expression rates on the testing sets of CROHME 2016 and CROHME 2019 improved by 2.97 and 0.83 percentage points, respectively.
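Perplexity, the metric behind the 4.42 figure, is the exponential of the average negative log-probability a language model assigns to each token; lower is better, and a model that is uniformly unsure among k symbols has perplexity k. The sketch below also shows one plausible form of the re-ranking step. The linear combination of recognizer and language model log scores with a single `weight` is an assumption, since the abstract does not give the exact formula, and the candidate tuples are hypothetical.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability the LM assigns to each token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def rerank(candidates, weight=0.5):
    """Re-rank (latex, recognizer_log_score, lm_log_prob) tuples, best first.

    Assumed scoring: recognizer score + weight * language-model log-probability.
    """
    return sorted(candidates, key=lambda c: c[1] + weight * c[2], reverse=True)

# Two readings of an ambiguous symbol: the recognizer slightly prefers "1",
# but a (hypothetical) LM score that favors "l" flips the order once weighted in.
cands = [("x_{1}", -1.0, -5.0), ("x_{l}", -1.2, -2.0)]
best = rerank(cands, weight=0.5)[0][0]
```

Setting `weight=0.0` recovers the recognizer's original ranking, which makes the weight easy to tune on a validation set.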
arxiv.org

BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents

Understanding documents from their visual snapshots is an emerging problem that requires both advanced computer vision and NLP methods. Recent advances in OCR enable accurate recognition of text blocks, yet it is still challenging to extract key information from documents due to the diversity of their layouts. Although recent studies on pre-trained language models show the importance of incorporating layout information for this task, the combination of texts and their layouts still follows the style of BERT, which is optimized for understanding 1D text. This implies there is room for further improvement given the 2D nature of text layouts. This paper introduces a pre-trained language model, BERT Relying On Spatiality (BROS), which effectively utilizes the information included in individual text blocks and their layouts. Specifically, BROS encodes spatial information by utilizing relative positions and learns spatial dependencies between OCR blocks with a novel area-masking strategy. These two novel approaches lead to an efficient encoding of spatial layout information, highlighted by the robust performance of BROS in low-resource environments. We also introduce a general-purpose parser that can be combined with BROS to extract key information even when there is no order information between text blocks. BROS shows its superiority on four public benchmarks (FUNSD, SROIE*, CORD, and SciTSR) and its robustness in practical cases where order information of text blocks is not available. Further experiments with a varying number of training examples demonstrate the high training efficiency of our approach. Our code will be made publicly available.
arxiv.org

Channel Modeling and Channel Estimation for Holographic Massive MIMO with Planar Arrays

In a realistic wireless environment, the multi-antenna channel usually exhibits spatially correlated fading. This effect is more pronounced when a large number of antennas are densely deployed, a configuration known as holographic massive MIMO (multiple-input multiple-output). In the first part of this letter, we develop a channel model for holographic massive MIMO by considering both non-isotropic scattering and directive antennas. With a large number of antennas, it is difficult to obtain full knowledge of the spatial correlation matrix. In this case, channel estimation is conventionally done using the least-squares (LS) estimator, which requires no prior information about the channel statistics or array geometry. In the second part of this letter, we propose novel channel estimation schemes that exploit the array geometry to identify a subspace of reduced rank that covers the eigenspace of any spatial correlation matrix. The proposed estimators outperform the LS estimator without using channel statistics, and provide different performance/complexity tradeoffs.
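The LS baseline mentioned above can be sketched for a simple case. This assumes a single-user, narrowband model in which antenna m receives y_{m,t} = h_m s_t + n_{m,t} over a known pilot sequence s_t. The per-antenna LS estimate is then ĥ_m = (Σ_t y_{m,t} s_t^*) / (Σ_t |s_t|^2). The letter's own reduced-rank, geometry-aware estimators are not reproduced here, and `ls_estimate` is a hypothetical name.

```python
def ls_estimate(received, pilots):
    """Per-antenna least-squares channel estimate.

    received: list over antennas, each a list of complex samples y_{m,t}.
    pilots:   known pilot symbols s_t (complex), same length as each row.
    Returns [sum_t y_{m,t} * conj(s_t) / sum_t |s_t|^2 for each antenna m].
    """
    energy = sum(abs(s) ** 2 for s in pilots)
    return [sum(y * s.conjugate() for y, s in zip(row, pilots)) / energy
            for row in received]

# Noise-free check: the estimate recovers the true channel exactly.
h_true = [1 + 1j, 0.5 - 2j, -1j]
pilots = [1 + 0j, -1 + 0j, 1j, -1j]
received = [[hm * s for s in pilots] for hm in h_true]
h_hat = ls_estimate(received, pilots)
```

Because each antenna is estimated independently, LS uses neither the channel statistics nor the array geometry. The estimators proposed in the letter add geometry-based subspace information while still avoiding channel statistics.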
arxiv.org

AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing

Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on top of transformers, self-supervised learning, and transfer learning. Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch. In this comprehensive survey paper, we first give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings, and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various benchmarks, both intrinsic and extrinsic. We present a summary of various useful libraries to work with T-PTLMs. Finally, we highlight some future research directions that will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with recent developments in T-PTLMs.
aithority.com

AI21 Labs Makes Language AI Applications Accessible to Broader Audience

AI21 Labs, an AI company specializing in Natural Language Processing (NLP), has released Jurassic-1 Jumbo, the world’s largest and most sophisticated language model, to anyone interested in prototyping custom text-based AI applications. “You shouldn’t have to be an AI researcher working at a big tech company to do this stuff....
siliconangle.com

OpenAI’s new Codex neural network writes software in response to text prompts

OpenAI LLC today launched the private beta program for an artificial intelligence system dubbed Codex that can write software and perform data science tasks in response to natural-language prompts. Codex will be available to participants of the private beta program through an application programming interface. The API will enable participants...
