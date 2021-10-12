
Fast Forward Indexes for Efficient Document Ranking

By Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand
arxiv.org
 10 days ago

Neural approaches, specifically transformer models, for ranking documents have delivered impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. Consequently, to keep query processing costs manageable, trade-offs are

arxiv.org

Fast Data Series Indexing for In-Memory Data

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and Dynamic Time Warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction, and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in ~50msec (30-75msec across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
CODING & PROGRAMMING
arxiv.org

Cascaded Fast and Slow Models for Efficient Semantic Code Search

The goal of natural language semantic code search is to retrieve a semantically relevant code snippet from a fixed set of candidates using a natural language query. Existing approaches are neither effective nor efficient enough for a practical semantic code search system. In this paper, we propose an efficient and accurate semantic code search framework with cascaded fast and slow models: a fast transformer encoder model is learned to optimize a scalable index for fast retrieval, and a slow classification-based re-ranking model is then learned to improve the performance of the top K results from the fast retrieval. To further reduce the high memory cost of deploying two separate models in practice, we propose to jointly train the fast and slow models based on a single transformer encoder with shared parameters. The proposed cascaded approach is not only efficient and scalable, but also achieves state-of-the-art results with an average mean reciprocal rank (MRR) score of 0.7795 (across 6 programming languages), compared with the previous state-of-the-art result of 0.713 MRR on the CodeSearchNet benchmark.
CODING & PROGRAMMING
arxiv.org

Using RDMA for Efficient Index Replication in LSM Key-Value Stores

Log-Structured Merge tree (LSM tree) Key-Value (KV) stores have become a foundational layer in the storage stacks of datacenter and cloud services. Current approaches for achieving reliability and availability avoid replication at the KV store level and instead perform these operations at higher layers, e.g., the DB layer that runs on top of the KV store. The main reason is that past designs for replicated KV stores favor reducing network traffic and increasing I/O size. Therefore, they perform costly compactions to reorganize data in both the primary and backup nodes, which hurts overall system performance.
SOFTWARE
arxiv.org

Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory

Schrödinger Bridge (SB) is an optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing parameterized log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory -- a mathematical methodology that appeared in stochastic optimal control and transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10.
CODING & PROGRAMMING
aithority.com

AppsFlyer Launches SKAdNetwork Ranking For The New Privacy Era In Latest Performance Index

The first index following Apple’s ATT enforcement shows that mobile media sources are still adapting to the new privacy reality. AppsFlyer, the marketing measurement and experience platform, released the 13th edition of its Performance Index, ranking the top media sources in mobile advertising. In this edition, AppsFlyer is pioneering the SKAN Index, a SKAdNetwork ranking that reflects the new privacy reality created by the enforcement of Apple’s App Tracking Transparency (ATT) framework.
INTERNET
Search Engine Journal

What Google’s Indexing Looks like From Discovery to Ranking

How a website renders is part of the indexing queue and, like the other parts, has an impact on SEO and ranking. Google’s Martin Splitt explained in a webinar hosted by Duda how rendering impacts SEO and what the crawl process looks like from initial discovery to ranking. The...
INTERNET
arxiv.org

Mean-field theory of vector spin models on networks with arbitrary degree distributions

Understanding the relationship between the heterogeneous structure of complex networks and cooperative phenomena occurring on them remains a key problem in network science. Mean-field theories of spin models on networks constitute a fundamental tool to tackle this problem and a cornerstone of statistical physics, with an impressive number of applications in condensed matter, biology, and computer science. In this work we derive the mean-field equations for the equilibrium behavior of vector spin models on high-connectivity random networks with an arbitrary degree distribution and with randomly weighted links. We demonstrate that the high-connectivity limit of spin models on networks is not universal in that it depends on the full degree distribution. Such nonuniversal behavior is akin to a remarkable mechanism that leads to the breakdown of the central limit theorem when applied to the distribution of effective local fields. Traditional mean-field theories on fully-connected models, such as the Curie-Weiss, the Kuramoto, and the Sherrington-Kirkpatrick model, are only valid if the network degree distribution is highly concentrated around its mean degree. We obtain a series of results that highlight the importance of degree fluctuations to the phase diagram of mean-field spin models by focusing on the Kuramoto model of synchronization and on the Sherrington-Kirkpatrick model of spin-glasses. Numerical simulations corroborate our theoretical findings and provide compelling evidence that the present mean-field theory describes an intermediate regime of connectivity, in which the average degree $c$ scales as a power $c \propto N^{b}$ ($b < 1$) of the total number $N \gg 1$ of spins. Our findings put forward a novel class of spin models that incorporate the effects of degree fluctuations and, at the same time, are amenable to exact analytic solutions.
SCIENCE
The Motley Fool

2 Stocks That Can Turn $500 Into $7,500 (Or More)

The stock market continues to chug along, opening up opportunities to cash in. This duo of stocks provides more unique growth opportunities than most. There are few better roads to building wealth over your lifetime than investing in the stock market. The long-term average return for the S&P 500 has been about 11% per year, and that's through depressions and recessions, war and civil unrest.
STOCKS
