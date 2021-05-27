

Lead Semantics Selects Fluree for TextDistil Natural Language Processing Pipeline

enterpriseai.news
 18 days ago

WINSTON-SALEM, N.C., May 27, 2021 -- Fluree, provider of an immutable semantic graph data platform, today announced a technical partnership with Lead Semantics to provide a fully integrated solution, TextDistil, for enterprise data management teams looking to build semantic-capable, secure data fabrics. A key focus of the integrated solution is highly regulated industries, including fintech, banking, insurance and the public sector, among others, where proving compliance involves a greater magnitude and scope of requirements.

IN THIS ARTICLE
#Language Technology#Data Processing#Information Processing#Data Integration#Textdistil#Fintech#Co Ceo#Idc#Permissioned#Fluree Ledger#Fluree Database#Fluree Pbc#Rdf#Neural Language Pipeline#Lead Semantics Nlp#Semantic Graph Startup#W3c Semantic Standards#Semantic Underpinnings#Unstructured Data Assets#End To End Extraction
Related
Technology | aithority.com

Enterprise Automation Announces Results Of Innovative Artificial Intelligence And Data Visualization Research And Development Projects

Enterprise Automation’s internal R&D group has completed projects for the drinking water industry using AVEVA artificial intelligence and data visualization software solutions and seeks partners for developing technology-based proofs of concept. Enterprise Automation, North America’s premier control systems integration firm serving the water and wastewater and life sciences industries announced...
Beauty & Fashion | arxiv.org

Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation

A well-known limitation in pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary. This potentially weakens the effect when applying pretrained models into natural language generation (NLG) tasks, especially for the subword distributions between upstream and downstream tasks with significant discrepancy. Towards approaching this problem, we extend the vanilla pretrain-finetune pipeline with an extra embedding transfer step. Specifically, a plug-and-play embedding generator is introduced to produce the representation of any input token, according to pre-trained embeddings of its morphologically similar ones. Thus, embeddings of mismatch tokens in downstream tasks can also be efficiently initialized. We conduct experiments on a variety of NLG tasks under the pretrain-finetune fashion. Experimental results and extensive analyses show that the proposed strategy offers us opportunities to feel free to transfer the vocabulary, leading to more efficient and better performed downstream NLG models.
Coding & Programming | gnu.org

Reproducible data processing pipelines

Last week, we at Guix-HPC published videos of a workshop on reproducible software environments we organized on-line. The videos are well worth watching—especially if you’re into reproducible research, and especially if you speak French or want to practice. This post, though, is more of a meta-post: it’s about how we processed these videos. “A workshop on reproducibility ought to have a reproducible video pipeline”, we thought. So this is what we did!
Computers | arxiv.org

Graph Neural Networks for Natural Language Processing: A Survey

Deep learning has become the dominant approach in coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best expressed with a graph structure. As a result, there is a surge of interests in developing new deep learning techniques on graphs for a large number of NLP tasks. In this survey, we present a comprehensive overview on Graph Neural Networks (GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, which systematically organizes existing research of GNNs for NLP along three axes: graph construction, graph representation learning, and graph based encoder-decoder models. We further introduce a large number of NLP applications that are exploiting the power of GNNs and summarize the corresponding benchmark datasets, evaluation metrics, and open-source codes. Finally, we discuss various outstanding challenges for making the full use of GNNs for NLP as well as future research directions. To the best of our knowledge, this is the first comprehensive overview of Graph Neural Networks for Natural Language Processing.
Computers | arxiv.org

AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation

Natural Language Generation (NLG) is a key component in a task-oriented dialogue system, which converts the structured meaning representation (MR) to the natural language. For large-scale conversational systems, where it is common to have over hundreds of intents and thousands of slots, neither template-based approaches nor model-based approaches are scalable. Recently, neural NLGs started leveraging transfer learning and showed promising results in few-shot settings. This paper proposes AUGNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model, to automatically create MR-to-Text data from open-domain texts. The proposed system mostly outperforms the state-of-the-art methods on the FewShotWOZ data in both BLEU and Slot Error Rate. We further confirm improved results on the FewShotSGD data and provide comprehensive analysis results on key components of our system. Our code and data are available at this https URL.
Software | arxiv.org

Case Studies on using Natural Language Processing Techniques in Customer Relationship Management Software

How can a text corpus stored in a customer relationship management (CRM) database be used for data mining and segmentation? In order to answer this question we inherited the state of the art methods commonly used in natural language processing (NLP) literature, such as word embeddings, and deep learning literature, such as recurrent neural networks (RNN). We used the text notes from a CRM system which are taken by customer representatives of an internet ads consultancy agency between years 2009 and 2020. We trained word embeddings by using the corresponding text corpus and showed that these word embeddings can not only be used directly for data mining but also be used in RNN architectures, which are deep learning frameworks built with long short term memory (LSTM) units, for more comprehensive segmentation objectives. The results prove that structured text data in a CRM can be used to mine out very valuable information and any CRM can be equipped with useful NLP features once the problem definitions are properly built and the solution methods are conveniently implemented.
Science | arxiv.org

Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods

In this study, we aim to find a method to auto-tag sentences specific to a domain. Our training data comprises short conversational sentences extracted from chat conversations between company's customer representatives and web site visitors. We manually tagged approximately 14 thousand visitor inputs into ten basic categories, which will later be used in a transformer-based language model with attention mechanisms for the ultimate goal of developing a chatbot application that can produce meaningful dialogue. We considered three different state-of-the-art models and reported their auto-tagging capabilities. We achieved the best performance with the bidirectional encoder representation from transformers (BERT) model. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.
Coding & Programming | arxiv.org

Turing: an Accurate and Interpretable Multi-Hypothesis Cross-Domain Natural Language Database Interface

Peng Xu, Wenjie Zi, Hamidreza Shahidi, Ákos Kádár, Keyi Tang, Wei Yang, Jawad Ateeq, Harsh Barot, Meidan Alon, Yanshuai Cao. A natural language database interface (NLDB) can democratize data-driven insights for non-technical users. However, existing Text-to-SQL semantic parsers cannot achieve high enough accuracy in the cross-database setting to allow good usability in practice. This work presents Turing, an NLDB system toward bridging this gap. The cross-domain semantic parser of Turing with our novel value prediction method achieves 75.1% execution accuracy, and 78.3% top-5 beam execution accuracy on the Spider validation set. To benefit from the higher beam accuracy, we design an interactive system where the SQL hypotheses in the beam are explained step-by-step in natural language, with their differences highlighted. The user can then compare and judge the hypotheses to select which one reflects their intention if any. The English explanations of SQL queries in Turing are produced by our high-precision natural language generation system based on synchronous grammars.
Coding & Programming | arxiv.org

Adversarial Semantic Hallucination for Domain Generalized Semantic Segmentation

Convolutional neural networks may perform poorly when the test and train data are from different domains. While this problem can be mitigated by using the target domain data to align the source and target domain feature representations, the target domain data may be unavailable due to privacy concerns. Consequently, there is a need for methods that generalize well without access to target domain data during training. In this work, we propose an adversarial hallucination approach, which combines a class-wise hallucination module and a semantic segmentation module. Since the segmentation performance varies across different classes, we design a semantic-conditioned style hallucination layer to adaptively stylize each class. The classwise stylization parameters are generated from the semantic knowledge in the segmentation probability maps of the source domain image. Both modules compete adversarially, with the hallucination module generating increasingly 'difficult' style images to challenge the segmentation module. In response, the segmentation module improves its performance as it is trained with generated samples at an appropriate class-wise difficulty level. Experiments on state of the art domain adaptation work demonstrate the efficacy of our proposed method when no target domain data are available for training.
Science | arxiv.org

Embodied negation and levels of concreteness: A TMS Study on German and Italian language processing

According to the embodied cognition perspective, linguistic negation may block the motor simulations induced by language processing. Transcranial magnetic stimulation (TMS) was applied to the left primary motor cortex (hand area) of monolingual Italian and German healthy participants during a rapid serial visual presentation of sentences from their own language. In these languages, the negative particle is located at the beginning and at the end of the sentence, respectively. The study investigated whether the interruption of the motor simulation processes, accounted for by reduced motor evoked potentials (MEPs), takes place similarly in two languages differing on the position of the negative marker. Different levels of sentence concreteness were also manipulated to investigate if negation exerts generalized effects or if it is affected by the semantic features of the sentence. Our findings indicate that negation acts as a block on motor representations, but independently from the language and words concreteness level.
Economy | dweb.news

Natural Language Generation – Beyond Business Intelligence By Mitul Makadia

It goes without saying that in order for companies to make better decisions, optimize processes, increase productivity, and drive revenue, they must invest in data. This explains the tremendous success and adoption of latest business intelligence tools and services across all verticals of businesses. Despite this, only 20% of employees...
Technology | arxiv.org

Generate, Prune, Select: A Pipeline for Counterspeech Generation against Online Hate Speech

Countermeasures to effectively fight the ever increasing hate speech online without blocking freedom of speech are of great social interest. Natural Language Generation (NLG) is uniquely capable of developing scalable solutions. However, off-the-shelf NLG methods are primarily sequence-to-sequence neural models and they are limited in that they generate commonplace, repetitive and safe responses regardless of the hate speech (e.g., "Please refrain from using such language.") or irrelevant responses, making them ineffective for de-escalating hateful conversations. In this paper, we design a three-module pipeline approach to effectively improve the diversity and relevance. Our proposed pipeline first generates various counterspeech candidates by a generative model to promote diversity, then filters the ungrammatical ones using a BERT model, and finally selects the most relevant counterspeech response using a novel retrieval-based method. Extensive experiments on three representative datasets demonstrate the efficacy of our approach in generating diverse and relevant counterspeech.
Computers | arxiv.org

Attention-Guided Supervised Contrastive Learning for Semantic Segmentation

Contrastive learning has shown superior performance in embedding global and spatial invariant features in computer vision (e.g., image classification). However, its overall success of embedding local and spatial variant features is still limited, especially for semantic segmentation. In a per-pixel prediction task, more than one label can exist in a single image for segmentation (e.g., an image contains both cat, dog, and grass), thereby it is difficult to define 'positive' or 'negative' pairs in a canonical contrastive learning setting. In this paper, we propose an attention-guided supervised contrastive learning approach to highlight a single semantic object every time as the target. With our design, the same image can be embedded to different semantic clusters with semantic attention (i.e., coerce semantic masks) as an additional input channel. To achieve such attention, a novel two-stage training strategy is presented. We evaluate the proposed method on multi-organ medical image segmentation task, as our major task, with both in-house data and BTCV 2015 datasets. Comparing with the supervised and semi-supervised training state-of-the-art in the backbone of ResNet-50, our proposed pipeline yields substantial improvement of 5.53% and 6.09% in Dice score for both medical image segmentation cohorts respectively. The performance of the proposed method on natural images is assessed via PASCAL VOC 2012 dataset, and achieves 2.75% substantial improvement.
Computers | arxiv.org

SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics

Recently, deep neural networks (DNNs) have achieved great success in semantically challenging NLP tasks, yet it remains unclear whether DNN models can capture compositional meanings, those aspects of meaning that have been long studied in formal semantics. To investigate this issue, we propose a Systematic Generalization testbed based on Natural language Semantics (SyGNS), whose challenge is to map natural language sentences to multiple forms of scoped meaning representations, designed to account for various semantic phenomena. Using SyGNS, we test whether neural networks can systematically parse sentences involving novel combinations of logical expressions such as quantifiers and negation. Experiments show that Transformer and GRU models can generalize to unseen combinations of quantifiers, negations, and modifiers that are similar to given training instances in form, but not to the others. We also find that the generalization performance to unseen combinations is better when the form of meaning representations is simpler. The data and code for SyGNS are publicly available at this https URL.
Coding & Programming | arxiv.org

Bottom-Up and Top-Down Neural Processing Systems Design: Neuromorphic Intelligence as the Convergence of Natural and Artificial Intelligence

While Moore's law has driven exponential computing power expectations, its nearing end calls for new avenues for improving the overall system performance. One of these avenues is the exploration of new alternative brain-inspired computing architectures that promise to achieve the flexibility and computational efficiency of biological neural processing systems. Within this context, neuromorphic intelligence represents a paradigm shift in computing based on the implementation of spiking neural network architectures tightly co-locating processing and memory. In this paper, we provide a comprehensive overview of the field, highlighting the different levels of granularity present in existing silicon implementations, comparing approaches that aim at replicating natural intelligence (bottom-up) versus those that aim at solving practical artificial intelligence applications (top-down), and assessing the benefits of the different circuit design styles used to achieve these goals. First, we present the analog, mixed-signal and digital circuit design styles, identifying the boundary between processing and memory through time multiplexing, in-memory computation and novel devices. Next, we highlight the key tradeoffs for each of the bottom-up and top-down approaches, survey their silicon implementations, and carry out detailed comparative analyses to extract design guidelines. Finally, we identify both necessary synergies and missing elements required to achieve a competitive advantage for neuromorphic edge computing over conventional machine-learning accelerators, and outline the key elements for a framework toward neuromorphic intelligence.
Health | Nature.com

A deep database of medical abbreviations and acronyms for natural language processing

The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically-dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
Winston-Salem, NC | wraltechwire.com

Fluree and Lead Semantics, two NC tech startups, partner on new product

WINSTON-SALEM & CHAPEL HILL – Two North Carolina companies are entering a technical partnership to bring a fully integrated solution to enterprise data management teams seeking to build semantic-capable, secure data fabrics. Winston-Salem-based Fluree, which built what it calls an immutable semantic graph data platform, will partner with Chapel Hill’s...