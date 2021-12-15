
Computers

Design Challenges for a Multi-Perspective Search Engine

By Sihao Chen, Siyi Liu, Xander Uyttendaele, Yi Zhang, William Bruno, Dan Roth
arxiv.org
 4 days ago

Many users turn to document retrieval systems (e.g. search engines) to seek answers to controversial questions. Answering such user queries usually requires identifying responses within web documents, and aggregating the responses based on their different...

arxiv.org

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

Software 2.0 is a fundamental shift in software engineering where machine learning becomes the new software, powered by big data and computing infrastructure. As a result, software engineering needs to be rethought so that data becomes a first-class citizen on par with code. One striking observation is that 80-90% of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. As a result, data-centric AI practices are now becoming mainstream. Unfortunately, many datasets in the real world are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications. Data collection is important because recent deep learning approaches have less need for feature engineering, but instead more need for large amounts of data. For data quality, we study data validation and data cleaning techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training by using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues have become essential topics in modern machine learning applications. We thus study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe that the data management community is well poised to solve problems in these directions.
SOFTWARE
arxiv.org

AI Ethics Principles in Practice: Perspectives of Designers and Developers

By Conrad Sanderson, David Douglas, Qinghua Lu, Emma Schleiger, Jon Whittle, Justine Lacey, Glenn Newnham, Stefan Hajkowicz, Cathy Robinson, David Hansen

As consensus across the various published AI ethics principles is approached, a gap remains between high-level principles and practical techniques that can be readily adopted to design and develop responsible AI systems. We examine the practices and experiences of researchers and engineers from Australia's national scientific research agency (CSIRO), who are involved in designing and developing AI systems for a range of purposes. Semi-structured interviews were used to examine how the practices of the participants relate to and align with a set of high-level AI ethics principles proposed by the Australian Government. The principles comprise: Privacy Protection & Security, Reliability & Safety, Transparency & Explainability, Fairness, Contestability, Accountability, Human-centred Values, and Human, Social & Environmental Wellbeing. The insights of the researchers and engineers, as well as the challenges that arose for them in the practical application of the principles, are examined. Finally, a set of organisational responses is provided to support the implementation of high-level AI ethics principles in practice.
ENGINEERING
utoledo.edu

Engineering Students to Present Senior Design Projects Dec. 10

Making it safer for pedestrians to cross Monroe Street in front of the Toledo Museum of Art. Reducing recovery time for a broken bone by stimulating muscle movement. Programming an autonomous drone to identify and record security concerns in a building. These are just a few examples of projects engineering...
TOLEDO, OH
builtinchicago.org

Avoiding an Us vs. Them Mentality Between Design and Engineering

Designers and engineers have a lot more in common than they think. The two disciplines are often positioned at odds with each other, operating in different silos unless completely necessary. But it doesn’t have to be that way. “The biggest challenge I’ve seen is isolated streams of work,” said...
arxiv.org

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

In this paper, we propose self-supervised speaker representation learning strategies, which comprise bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with a uniformity regularization term. In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between the speech samples belonging to the same speaker, which provides not only speaker representations but also data uncertainty. Experimental results show that the proposed bootstrap equilibrium training strategy can effectively help learn the speaker representations and outperforms the conventional methods based on contrastive learning. We also demonstrate that the integrated two-stage framework further improves the speaker verification performance on the VoxCeleb1 test set in terms of EER and MinDCF.
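For readers unfamiliar with the EER metric mentioned above, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy score lists are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Estimate the equal error rate (EER) by sweeping a decision threshold.

    genuine:  similarity scores for same-speaker trials
    impostor: similarity scores for different-speaker trials
    Returns (eer, threshold). A trial is accepted when score >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer, threshold)
    # Only thresholds equal to an observed score can change the error rates.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, not taken from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2f} at threshold {threshold}")
```

The EER is the operating point where false acceptances and false rejections are equally frequent, so a lower value indicates better verification performance.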
COMPUTERS
arxiv.org

Uplink Transceiver Design and Optimization for Transmissive RMS Multi-Antenna Systems

In this paper, a novel uplink communication for the transmissive reconfigurable metasurface (RMS) multi-antenna system with orthogonal frequency division multiple access (OFDMA) is investigated. Specifically, a transmissive RMS-based receiver equipped with a single receiving antenna is first proposed, and a far-near field channel model based on planar waves and spherical waves is given. Then, in order to maximize the system sum-rate of uplink communications, we formulate a joint optimization problem over subcarrier allocation, power allocation and RMS transmissive coefficient design. Due to the coupling of optimization variables, the optimization problem is non-convex, so it is challenging to solve it directly. In order to tackle this problem, the alternating optimization (AO) algorithm is used to decouple the optimization variables and divide the problem into two sub-problems to solve. First, the problem of joint subcarrier allocation and power allocation is solved via the Lagrangian dual decomposition method. Then, the RMS transmissive coefficient design can be obtained by applying difference-of-convex (DC) programming, successive convex approximation (SCA) and penalty function methods. Finally, the two sub-problems are iterated alternately until convergence is achieved. Numerical simulation results verify that the proposed algorithm has good convergence performance and can improve system sum-rate compared with other benchmark algorithms.
TECHNOLOGY
arxiv.org

DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems

We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources on devices in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of convolutional layers of the model. Our main contribution is the introduction of a design space exploration (DSE) technique, which finds Pareto-optimal per-layer dropout vectors with respect to resource requirements and convergence speed of the training. Applying this technique, each device is able to dynamically select the dropout vector that fits its available resource without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we are able to significantly increase the convergence speed over the state of the art without compromising on the final accuracy.
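The selection step described above can be illustrated with a small sketch. This is not the DISTREAL implementation: the `DropoutConfig` class, its `cost` and `speed` fields, and the numbers in the usage example are hypothetical, and the sketch assumes that each candidate dropout vector has already been given a resource cost and a convergence-speed score by some offline exploration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DropoutConfig:
    rates: Tuple[float, ...]  # hypothetical per-layer filter dropout rates
    cost: float               # hypothetical resource requirement (lower is cheaper)
    speed: float              # hypothetical convergence-speed score (higher is better)


def pareto_front(configs: List[DropoutConfig]) -> List[DropoutConfig]:
    """Keep only configurations that no other configuration dominates."""
    front = []
    for c in configs:
        dominated = any(
            o.cost <= c.cost and o.speed >= c.speed
            and (o.cost < c.cost or o.speed > c.speed)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


def select_for_budget(
    front: List[DropoutConfig], budget: float
) -> Optional[DropoutConfig]:
    """Pick the fastest Pareto-optimal configuration that fits the budget."""
    feasible = [c for c in front if c.cost <= budget]
    return max(feasible, key=lambda c: c.speed) if feasible else None


# Hypothetical candidates; a device would run the selection locally.
candidates = [
    DropoutConfig((0.8, 0.8), cost=1.0, speed=0.5),
    DropoutConfig((0.5, 0.4), cost=2.0, speed=0.8),
    DropoutConfig((0.4, 0.6), cost=2.5, speed=0.7),  # dominated by the previous one
    DropoutConfig((0.0, 0.0), cost=4.0, speed=1.0),
]
front = pareto_front(candidates)
print(select_for_budget(front, budget=3.0))
```

Because the Pareto front can be computed once and shipped to every device, the per-device choice reduces to a lookup against the currently available budget, which is consistent with the abstract's claim that no server assistance is needed at selection time.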
COMPUTERS
arxiv.org

IsometricMT: Neural Machine Translation for Automatic Dubbing

Automatic dubbing (AD) is among the use cases where translations should fit a given length template in order to achieve synchronicity between source and target speech. For neural machine translation (MT), generating translations of length close to the source length (e.g. within ±10% in character count) while preserving quality is a challenging task. Controlling NMT output length comes at a cost to translation quality, which is usually mitigated with a two-step approach of generating n-best hypotheses and then re-ranking them based on length and quality. This work introduces a self-learning approach that allows a transformer model to directly learn to generate outputs that closely match the source length, in short, isometric MT. In particular, our approach for isometric MT requires neither the generation of multiple hypotheses nor any auxiliary scoring function. We report results on four language pairs (English - French, Italian, German, Spanish) with a publicly available benchmark based on TED Talk data. Both automatic and manual evaluations show that our self-learning approach performs on par with more complex isometric MT approaches.
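The length constraint and the two-step baseline described above can be sketched as follows. This illustrates the re-ranking baseline that the abstract contrasts against, not the proposed self-learning method. The function names, the way hypotheses are paired with a quality score, and the toy strings are all assumptions.

```python
from typing import List, Tuple


def is_isometric(source: str, target: str, tolerance: float = 0.10) -> bool:
    """True when the target's character count is within +/- tolerance of the source's."""
    if not source:
        raise ValueError("source must be non-empty")
    return abs(len(target) - len(source)) <= tolerance * len(source)


def rerank(
    source: str, hypotheses: List[Tuple[str, float]], tolerance: float = 0.10
) -> List[Tuple[str, float]]:
    """Order n-best hypotheses: length-compliant ones first, then by quality score.

    Each hypothesis is a (text, quality_score) pair; the score is assumed to
    come from some external model, with higher meaning better.
    """
    return sorted(
        hypotheses,
        key=lambda h: (not is_isometric(source, h[0], tolerance), -h[1]),
    )


# Toy example with placeholder strings instead of real translations.
source = "a" * 20
n_best = [("b" * 30, 0.9), ("c" * 21, 0.6), ("d" * 19, 0.7)]
print(rerank(source, n_best)[0])
```

In this sketch the highest-scoring hypothesis is passed over because it is 50% longer than the source, which is the quality cost of length control that the abstract refers to.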
COMPUTERS
arxiv.org

Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search

Search is one of the key functionalities in digital platforms and applications such as electronic dictionaries, search engines, and e-commerce platforms. While the search function in some languages is trivial, Khmer word search is challenging given the complex writing system. Multiple possible orders of characters and different spelling realizations of words constrain Khmer word search functionality. Additionally, spelling mistakes are common since robust spellcheckers are not commonly available across input device platforms. These challenges hinder the use of the Khmer language in search-embedded applications. Moreover, due to the absence of WordNet-like lexical databases for the Khmer language, it is impossible to establish the semantic relations between words that would enable semantic search. In this paper, we propose a set of robust solutions to the above challenges associated with Khmer word search. The proposed solutions include character order normalization, grapheme and phoneme-based spellcheckers, and a Khmer word semantic model. The semantic model is based on a word embedding model that is trained on a 30-million-word corpus and is used to capture the semantic similarities between words.
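To make the last point concrete, here is a minimal sketch of how word embeddings are commonly used to rank semantically similar words. It assumes cosine similarity as the measure, which the abstract does not specify, and it uses placeholder keys and two-dimensional toy vectors rather than real Khmer embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def most_similar(
    query: str, embeddings: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other word in the vocabulary by similarity to the query word."""
    query_vec = embeddings[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Placeholder vocabulary; real embeddings would have hundreds of dimensions.
toy_embeddings = {
    "word_a": [1.0, 0.0],
    "word_b": [0.9, 0.1],
    "word_c": [0.0, 1.0],
}
print(most_similar("word_a", toy_embeddings, top_k=2))
```

A search function built this way can return related words even when no lexical database links them, since similarity is read directly from the learned vectors.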
TECHNOLOGY
arxiv.org

Neural Style Transfer and Unpaired Image-to-Image Translation to deal with the Domain Shift Problem on Spheroid Segmentation

Background and objectives. Domain shift is a generalisation problem of machine learning models that occurs when the data distribution of the training set is different to the data distribution encountered by the model when it is deployed. This is common in the context of biomedical image segmentation due to the variance of experimental conditions, equipment, and capturing settings. In this work, we address this challenge by studying both neural style transfer algorithms and unpaired image-to-image translation methods in the context of the segmentation of tumour spheroids.
COMPUTERS
arxiv.org

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at this https URL.
COMPUTERS
arxiv.org

Inexact Newton combined approximations in the topology optimization of geometrically nonlinear elastic structures and compliant mechanisms

This work blends the inexact Newton method with iterative combined approximations (ICA) for solving topology optimization problems under the assumption of geometric nonlinearity. The density-based problem formulation is solved using a sequential piecewise linear programming (SPLP) algorithm. Five distinct strategies have been proposed to control the frequency of the factorizations of the Jacobian matrices of the nonlinear equilibrium equations. Aiming at speeding up the overall iterative scheme while keeping the accuracy of the approximate solutions, three of the strategies also use an ICA scheme for the adjoint linear system associated with the sensitivity analysis. The robustness of the proposed reanalysis strategies is corroborated by means of numerical experiments with four benchmark problems -- two structures and two compliant mechanisms. Besides assessing the performance of the strategies considering a fixed budget of iterations, the impact of a theoretically supported stopping criterion for the SPLP algorithm was analyzed as well.
MATHEMATICS

