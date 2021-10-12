
Privacy-Preserving Phishing Email Detection Based on Federated Learning and LSTM

By Yuwei Sun, Ng Chong, Hideya Ochiai
Phishing emails that appear legitimate lure people into clicking on attached malicious links or documents. The increasingly sophisticated phishing campaigns of recent years call for a more adaptive detection system than traditional signature-based methods. In this regard, natural language processing (NLP) with deep neural...

WTVCFOX

How to find and delete old accounts online

Ever get frustrated when you try to close an account online, but you can’t figure out how, so you just forget about it? Keeping all that unused personal data lying around the internet could put your digital privacy and security at risk. Now, Consumer Reports reveals some tips and tricks to help you say goodbye to unwanted accounts once and for all.
arxiv.org

GaitPrivacyON: Privacy-Preserving Mobile Gait Biometrics using Unsupervised Learning

Numerous studies in the literature have already shown the potential of biometrics on mobile devices for authentication purposes. However, it has been shown that the learning processes associated with biometric systems might expose sensitive personal information about the subjects. This study proposes GaitPrivacyON, a novel mobile gait biometrics verification approach that provides accurate authentication results while preserving the sensitive information of the subject. It comprises two modules: i) a convolutional Autoencoder that transforms attributes of the raw biometric data, such as the subject's gender or the activity being performed, into a new privacy-preserving representation; and ii) a mobile gait verification system based on the combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with a Siamese architecture. The main advantage of GaitPrivacyON is that the first module (convolutional Autoencoder) is trained in an unsupervised way, without specifying which sensitive attributes of the subject to protect. The experimental results achieved using two popular databases (MotionSense and MobiAct) suggest the potential of GaitPrivacyON to significantly improve the privacy of the subject while keeping user authentication performance above 99% Area Under the Curve (AUC). To the best of our knowledge, this is the first mobile gait verification approach that considers privacy-preserving methods trained in an unsupervised way.
notebookcheck.net

Google alerts 14,000 users to targeted phishing emails by Russian threat group

Google disclosed security updates on October 8 in response to a month of targeted phishing attempts purportedly by the Russian-linked group APT28. The company estimates that 14,000 Gmail accounts were warned and protected. Google's Threat Analysis Group (TAG) posited that the hacking group APT28, which is also referred to as...
arxiv.org

FSL: Federated Supermask Learning

Federated learning (FL) allows multiple clients with (private) data to collaboratively train a common machine learning model without sharing their private training data. In-the-wild deployment of FL faces two major hurdles: robustness to poisoning attacks and communication efficiency. To address both concurrently, we propose Federated Supermask Learning (FSL). The FSL server trains a global subnetwork within a randomly initialized neural network by aggregating the local subnetworks of all collaborating clients. FSL clients share local subnetworks in the form of rankings of network edges; more useful edges have higher ranks. By sharing integer rankings instead of float weights, FSL restricts the space available for crafting effective poisoning updates, and by sharing subnetworks, FSL reduces the communication cost of training. We show theoretically and empirically that FSL is robust by design and significantly more communication-efficient, all without compromising clients' privacy. Our experiments demonstrate the superiority of FSL in real-world FL settings; in particular, (1) FSL achieves performance similar to state-of-the-art FedAvg with significantly lower communication costs: for CIFAR10, FSL matches the performance of Federated Averaging while reducing communication cost by ~35%; and (2) FSL is substantially more robust to poisoning attacks than state-of-the-art robust aggregation algorithms. We have released the code for reproducibility.
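The ranking exchange described above can be illustrated with a minimal sketch. It assumes each client already holds a score per edge of the shared random network, and it assumes a Borda-style server rule (sum each edge's ranks across clients, then re-rank); the abstract does not specify the aggregation rule, and `scores_to_ranking`, `aggregate_rankings`, `supermask` and `keep_fraction` are hypothetical names. The point is that only integer ranks travel to the server, never float weights.

```python
from typing import List, Sequence


def scores_to_ranking(scores: Sequence[float]) -> List[int]:
    """Map per-edge scores to integer ranks: 0 = least useful, len(scores) - 1 = most useful."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, edge in enumerate(order):
        ranks[edge] = rank
    return ranks


def aggregate_rankings(client_rankings: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical server rule: sum each edge's ranks across clients, then re-rank the totals."""
    num_edges = len(client_rankings[0])
    totals = [sum(ranking[e] for ranking in client_rankings) for e in range(num_edges)]
    return scores_to_ranking(totals)


def supermask(global_ranking: Sequence[int], keep_fraction: float) -> List[int]:
    """Keep (mask = 1) the top keep_fraction of edges by global rank; drop (mask = 0) the rest."""
    num_edges = len(global_ranking)
    num_kept = int(num_edges * keep_fraction)
    cutoff = num_edges - num_kept
    return [1 if rank >= cutoff else 0 for rank in global_ranking]


# Three clients score the same four edges of a shared, randomly initialized network.
client_scores = [
    [0.10, 0.90, 0.50, 0.30],
    [0.20, 0.80, 0.10, 0.70],
    [0.05, 0.60, 0.40, 0.90],
]
rankings = [scores_to_ranking(s) for s in client_scores]  # only these integers are sent
global_ranking = aggregate_rankings(rankings)
print(global_ranking)                  # [0, 3, 1, 2]
print(supermask(global_ranking, 0.5))  # [0, 1, 0, 1]
```

Because a client can only submit a permutation of ranks, a single malicious update is bounded in how far it can shift any edge's total, which is the intuition behind the robustness claim.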
VentureBeat

Gretel.ai, a platform for generating synthetic and privacy-preserving data, raises $50M

Gretel.ai, a platform for generating synthetic and privacy-preserving data, today announced that it raised $50 million in a series B led by Anthos Capital with participation from Section 32, Greylock, and Moonshots Capital. The funds bring the company’s total raised to $65.5 million and will be used to support product development, according to CEO Ali Golshan, with a particular focus on expansion into new use cases.
arxiv.org

The Skellam Mechanism for Differentially Private Federated Learning

We introduce the multi-dimensional Skellam mechanism, a discrete differential privacy mechanism based on the difference of two independent Poisson random variables. To quantify its privacy guarantees, we analyze the privacy loss distribution via a numerical evaluation and provide a sharp bound on the Rényi divergence between two shifted Skellam distributions. While the mechanism is useful in both centralized and distributed privacy applications, we investigate how it can be applied to federated learning with secure aggregation under communication constraints. Our theoretical findings and extensive experimental evaluations demonstrate that the Skellam mechanism provides the same privacy-accuracy trade-offs as the continuous Gaussian mechanism, even when the precision is low. More importantly, Skellam is closed under summation, and sampling from it only requires sampling from a Poisson distribution, an efficient routine that ships with all machine learning and data analysis software packages. These features, along with its discrete nature and competitive privacy-accuracy trade-offs, make it an attractive alternative to the newly introduced discrete Gaussian mechanism.
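The core construction, integer noise formed as the difference of two independent Poisson draws, can be sketched in plain Python. This is a minimal illustration assuming a symmetric Skellam (both Poisson rates equal to a hypothetical parameter `mu`) and a hypothetical quantization `scale`; it does not relate `mu` to a privacy budget, which requires the paper's privacy analysis. Knuth's multiplication method stands in for a library Poisson sampler and is only suitable for small rates.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for small lam (exp(-lam) must not underflow)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_skellam(mu: float, rng: random.Random) -> int:
    """Symmetric Skellam noise: difference of two independent Poisson(mu) draws (variance 2 * mu)."""
    return sample_poisson(mu, rng) - sample_poisson(mu, rng)


def privatize(update: List[float], scale: float, mu: float, rng: random.Random) -> List[int]:
    """Quantize each coordinate to an integer, then add integer-valued Skellam noise."""
    return [round(x * scale) + sample_skellam(mu, rng) for x in update]


rng = random.Random(0)
noisy = privatize([0.12, -0.40, 0.03], scale=100.0, mu=5.0, rng=rng)
print(noisy)  # three integers: quantized values plus Skellam noise
```

Closure under summation is what makes this attractive with secure aggregation: if each of n clients adds independent Skellam noise with rate `mu` on both sides, the server's summed integer vector carries Skellam noise with rate `n * mu` on both sides, so the aggregate noise stays in the same family.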
arxiv.org

A Survey of Machine Learning Algorithms for Detecting Ransomware Encryption Activity

A survey of machine learning techniques trained to detect ransomware is presented. This work builds upon the efforts of Taylor et al. in using sensor-based methods that rely on data collected from built-in instruments, such as CPU power and temperature monitors, to identify encryption activity. Exploratory data analysis (EDA) shows that the most useful features in this simulated data are clock speed, temperature, and CPU load. These features are used to train multiple algorithms to determine an optimal detection approach. Performance is evaluated with accuracy, F1 score, and false-negative rate metrics. The Multilayer Perceptron with three hidden layers achieves 97% in both accuracy and F1 with robust data preparation. A random forest model produces scores of 93% accuracy and 92% F1, showing that sensor-based detection is currently a viable option for detecting even zero-day ransomware attacks before the code fully executes.
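The three evaluation metrics named above can be computed from a binary confusion matrix. The sketch below is a generic illustration, not the survey's evaluation code; the label convention (1 = encryption activity) and the eight example windows are hypothetical.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F1 and false-negative rate for binary labels (1 = encryption activity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        # Share of real encryption activity the detector missed: FN / (FN + TP).
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical labels for eight monitoring windows.
print(detection_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]))
# {'accuracy': 0.75, 'f1': 0.75, 'false_negative_rate': 0.25}
```

Reporting the false-negative rate separately matters in this setting because a missed encryption window is far costlier than a false alarm, and accuracy alone can hide it.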
arxiv.org

Covariance-Based Joint Device Activity and Delay Detection in Asynchronous mMTC

In this letter, we study the joint device activity and delay detection problem in asynchronous massive machine-type communications (mMTC), where all active devices asynchronously transmit their preassigned preamble sequences to the base station (BS) for device identification and delay detection. We first formulate this joint detection problem as a maximum likelihood estimation problem, which depends on the received signal only through its sample covariance, and then propose efficient coordinate-descent-type algorithms to solve it. Our covariance-based approach is fundamentally different from the existing compressed sensing (CS) approach to the same problem. Numerical results show that the covariance-based approach significantly outperforms the CS approach in detection performance, because it makes better use of the BS antennas.
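For orientation, the standard covariance-based formulation for synchronous activity detection shows why the likelihood depends on the received signal only through its sample covariance. The notation is ours, not the letter's: $\mathbf{S}\in\mathbb{C}^{L\times N}$ stacks the length-$L$ preambles of $N$ devices, $\boldsymbol{\Gamma}=\operatorname{diag}(\boldsymbol{\gamma})$ holds each device's activity times large-scale fading, $\mathbf{H}\in\mathbb{C}^{N\times M}$ is i.i.d. $\mathcal{CN}(0,1)$ small-scale fading over $M$ BS antennas, and $\mathbf{W}$ is noise with variance $\sigma^{2}$. The letter's asynchronous model additionally estimates each device's delay; that extension is not reproduced here.

```latex
\mathbf{Y} = \mathbf{S}\,\boldsymbol{\Gamma}^{1/2}\mathbf{H} + \mathbf{W},
\qquad
\boldsymbol{\Sigma}(\boldsymbol{\gamma}) = \mathbf{S}\,\boldsymbol{\Gamma}\,\mathbf{S}^{H} + \sigma^{2}\mathbf{I}_{L},
\qquad
\hat{\boldsymbol{\Sigma}} = \tfrac{1}{M}\,\mathbf{Y}\mathbf{Y}^{H}

% Each column of Y is i.i.d. CN(0, Sigma(gamma)), so the negative log-likelihood,
% divided by M and with constants dropped, is
\hat{\boldsymbol{\gamma}}
= \arg\min_{\boldsymbol{\gamma}\ge \mathbf{0}}\;
\log\det\boldsymbol{\Sigma}(\boldsymbol{\gamma})
+ \operatorname{tr}\!\big(\boldsymbol{\Sigma}(\boldsymbol{\gamma})^{-1}\hat{\boldsymbol{\Sigma}}\big)
```

Since $\mathbf{Y}$ enters only through $\hat{\boldsymbol{\Sigma}}$, adding antennas sharpens the covariance estimate without enlarging the problem, which is consistent with the abstract's remark that the covariance-based approach makes better use of the BS antennas.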
arxiv.org

Deep learning-based detection of intravenous contrast in computed tomography scans

Zezhong Ye, Jack M. Qian, Ahmed Hosny, Roman Zeleznik, Deborah Plana, Jirapat Likitlersuang, Zhongyi Zhang, Raymond H. Mak, Hugo J. W. L. Aerts, Benjamin H. Kann. Purpose: Identifying intravenous (IV) contrast use within CT scans is a key component of data curation for model development and testing. Currently, IV contrast is poorly documented in imaging metadata and requires manual correction and annotation by clinician experts, a major barrier to imaging analyses and algorithm deployment. We sought to develop and validate a convolutional neural network (CNN)-based deep learning (DL) platform to identify IV contrast within CT scans. Methods: For model development and evaluation, we used independent datasets of CT scans of head and neck (HN) and lung cancer patients, totaling 133,480 axial 2D scan slices from 1,979 CT scans manually annotated for contrast presence by clinical experts. Five different DL models were adopted and trained on HN training datasets for slice-level contrast detection. Model performance was evaluated on a hold-out set and on an independent validation set from another institution. The DL models were then fine-tuned on chest CT data and externally validated on a separate chest CT dataset. Results: Initial DICOM metadata tags for IV contrast were missing or erroneous in 1,496 scans (75.6%). The EfficientNetB4-based model showed the best overall detection performance. For HN scans, the AUC was 0.996 on the internal validation set (n = 216) and 1.0 on the external validation set (n = 595). The model fine-tuned on chest CTs yielded an AUC of 1.0 on the internal validation set (n = 53) and 0.980 on the external validation set (n = 402). Conclusion: The DL model accurately detected IV contrast in both HN and chest CT scans with near-perfect performance.
pharmaceutical-technology.com

FiVerity Unveils Machine Learning-Powered Solution to Detect and Prevent Cyber Fraud

Concept: US-based tech startup FiVerity has launched a new machine learning solution, the Collaborative AI platform, to detect and prevent cybercrime. It helps banks, credit unions, credit card issuers, and other financial service providers combat synthetic identity fraud (SIF). The platform identifies fraudsters using a pattern matching technique without needing consumers' personally identifiable information (PII).
arxiv.org

Fake News Detection in Spanish Using Deep Learning Techniques

This paper addresses the problem of fake news detection in Spanish using Machine Learning techniques. It is fundamentally the same problem tackled for the English language; however, there is not a significant amount of publicly available, adequately labeled fake news in Spanish to effectively train a Machine Learning model like those proposed for English. Therefore, this work explores different training strategies and architectures to establish a baseline for further research in this area. Four datasets were used, two in English and two in Spanish, and four experimental schemes were tested, including a baseline with classical Machine Learning models trained and validated on a small Spanish dataset. The remaining schemes include state-of-the-art Deep Learning models trained (or fine-tuned) and validated in English, trained and validated in Spanish, and fitted in English and validated on automatically translated Spanish sentences. The Deep Learning architectures were built on top of different pre-trained Word Embedding representations, including GloVe, ELMo, BERT, and BETO (a BERT version trained on a large Spanish corpus). According to the results, the best strategy was a combination of a pre-trained BETO model and a Recurrent Neural Network based on LSTM layers, yielding an accuracy of up to 80%; nonetheless, a baseline model using a Random Forest estimator obtained similar outcomes. Additionally, the translation strategy did not yield acceptable results because of error propagation; a significant difference in model performance was also observed between training in English and in Spanish, mainly attributable to the number of samples available for each language.
arxiv.org

Efficient Representations for Privacy-Preserving Inference

Deep neural networks have a wide range of applications across multiple domains such as computer vision and medicine. In many cases, the input of a model at inference time can consist of sensitive user data, which raises questions concerning the levels of privacy and trust guaranteed by such services. Much existing work has leveraged homomorphic encryption (HE) schemes, which enable computation on encrypted data, to achieve private inference for multi-layer perceptrons and CNNs. An early work in this direction was CryptoNets, which takes 250 seconds for one MNIST inference. The main limitation of such approaches is computational cost, which stems from the expensive NTT (number theoretic transform) operations that underlie HE operations. Others have proposed the use of model pruning and efficient data representations to reduce the number of HE operations required. In this paper, we focus on improving upon existing work by proposing changes to the representations of intermediate tensors during CNN inference. We construct and evaluate private CNNs on the MNIST and CIFAR-10 datasets, and achieve over a two-fold reduction in the number of operations used for inferences of the CryptoNets architecture.
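To make the NTT cost concrete, the sketch below shows what the transform computes: it turns polynomial multiplication (a convolution of coefficient vectors) into pointwise products modulo a prime. The toy parameters (`P = 17`, `N = 4`, root of unity `OMEGA = 4`) are illustrative assumptions, the transform is the naive O(n²) form rather than the O(n log n) butterfly used in practice, and it computes a cyclic convolution. Lattice-based HE schemes such as BFV and CKKS work in rings modulo xⁿ + 1 with far larger n and moduli, so production libraries use a negacyclic variant.

```python
from typing import List

P = 17     # toy prime modulus (illustrative; real HE moduli are much larger)
N = 4      # transform length; N must divide P - 1
OMEGA = 4  # primitive N-th root of unity mod P: 4**2 = 16, 4**4 = 256 = 1 (mod 17)


def ntt(a: List[int], omega: int = OMEGA, p: int = P) -> List[int]:
    """Naive forward NTT: A[k] = sum_j a[j] * omega**(j*k) mod p."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, p) for j in range(n)) % p for k in range(n)]


def intt(a_hat: List[int], omega: int = OMEGA, p: int = P) -> List[int]:
    """Inverse NTT: uses omega**-1 and n**-1 modulo p (three-argument pow, Python 3.8+)."""
    n = len(a_hat)
    omega_inv = pow(omega, -1, p)
    n_inv = pow(n, -1, p)
    return [(n_inv * sum(a_hat[k] * pow(omega_inv, j * k, p) for k in range(n))) % p
            for j in range(n)]


def cyclic_convolution_naive(a: List[int], b: List[int], p: int = P) -> List[int]:
    """Schoolbook cyclic convolution, i.e. polynomial product modulo (x**n - 1, p)."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) % p for k in range(n)]


def cyclic_convolution_ntt(a: List[int], b: List[int], p: int = P) -> List[int]:
    """Same product via NTT: transform, multiply pointwise, transform back."""
    return intt([(x * y) % p for x, y in zip(ntt(a), ntt(b))])


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(cyclic_convolution_naive(a, b))  # [15, 0, 15, 9]
print(cyclic_convolution_ntt(a, b))    # [15, 0, 15, 9]
```

Every ciphertext multiplication in such schemes involves transforms like this on large polynomials, which is why reducing the number of HE operations per inference, as the paper does, translates directly into runtime savings.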
arxiv.org

Graph-Fraudster: Adversarial Attacks on Graph Neural Network Based Vertical Federated Learning

Graph neural network (GNN) models have achieved great success in graph representation learning. However, when private data are collected at scale on the user side, GNN models may fail to reach this level of performance without rich features and complete adjacency relationships. To address this problem, vertical federated learning (VFL) has been proposed to protect local data while training a global model collaboratively. For graph-structured data, it is therefore natural to construct a VFL framework with GNN models. However, GNN models have been shown to be vulnerable to adversarial attacks, and whether this vulnerability carries over into VFL has not been studied. In this paper, we study the security of GNN-based VFL (GVFL), i.e., its robustness against adversarial attacks. Further, we propose an adversarial attack method named Graph-Fraudster. It generates adversarial perturbations based on noise-added global node embeddings, obtained via GVFL's privacy leakage, and the gradients of pairwise nodes. First, it steals the global node embeddings and sets up a shadow server model for the attack generator. Second, noise is added to the node embeddings to confuse the shadow server model. Finally, the gradients of pairwise nodes are used to generate attacks under the guidance of the noise-added node embeddings. To the best of our knowledge, this is the first study of adversarial attacks on GVFL. Extensive experiments on five benchmark datasets demonstrate that Graph-Fraudster outperforms three possible baselines in GVFL. Furthermore, Graph-Fraudster remains a threat to GVFL even when two possible defense mechanisms are applied. This paper reveals that GVFL is vulnerable to adversarial attacks, just as centralized GNN models are.
arxiv.org

Fast Hand Detection in Collaborative Learning Environments

Long-term object detection requires the integration of frame-based results over several seconds. For non-deformable objects, long-term detection is often addressed using object detection followed by video tracking. Unfortunately, tracking is inapplicable to objects that undergo dramatic changes in appearance from frame to frame. As a related example, we study hand detection over long video recordings in collaborative learning environments. More specifically, we develop long-term hand detection methods that can deal with partial occlusions and dramatic changes in appearance.
arxiv.org

WAFFLE: Weighted Averaging for Personalized Federated Learning

In collaborative or federated learning, model personalization can be a very effective strategy for dealing with heterogeneous training data across clients. We introduce WAFFLE (Weighted Averaging For Federated LEarning), a personalized collaborative machine learning algorithm based on SCAFFOLD. SCAFFOLD uses stochastic control variates to converge towards a model close to the globally optimal model even in tasks where the distribution of data and labels across clients is highly skewed. In contrast, WAFFLE uses the Euclidean distance between clients' updates to weigh their individual contributions and thus minimize the loss of the trained personalized model on the specific agent of interest. Through a series of experiments, we compare our proposed method to two recent personalized federated learning methods, Weight Erosion and APFL, as well as two global learning methods, federated averaging and SCAFFOLD. We evaluate our method using two categories of non-identical client data distributions (concept shift and label skew) on two benchmark image data sets, MNIST and CIFAR10. Our experiments demonstrate the effectiveness of WAFFLE compared with other methods: it matches or improves accuracy while converging faster.
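The idea of weighing client contributions by the Euclidean distance between updates can be sketched as follows. The abstract does not give WAFFLE's weighting function, and this sketch omits the SCAFFOLD control variates entirely; the inverse-distance rule `1 / (1 + d)` and the names `distance_weights` and `personalized_average` are hypothetical, chosen only to show how updates close to the agent of interest come to dominate its personalized model.

```python
import math
from typing import List, Sequence


def distance_weights(target: Sequence[float], updates: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical rule: weight each client by 1 / (1 + Euclidean distance to the target's update)."""
    raw = [1.0 / (1.0 + math.dist(target, u)) for u in updates]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def personalized_average(target: Sequence[float], updates: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of all client updates, favoring those close to the target's update."""
    weights = distance_weights(target, updates)
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(len(target))]


target_update = [1.0, 0.0]                      # agent of interest (also client 0)
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]  # client 1's data distribution differs strongly
weights = distance_weights(target_update, updates)
print([round(w, 3) for w in weights])
print(personalized_average(target_update, updates))
```

Plain federated averaging would give all three clients fixed weights regardless of how their updates compare, whereas here client 1, whose update points in a very different direction, receives the smallest share.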
arxiv.org

Privacy-Preserving Mutual Authentication and Key Agreement Scheme for Multi-Server Healthcare System

The use of various technologies and smart devices helps people obtain medical services remotely, with multiple benefits. As a result, critical and sensitive data are exchanged between users and doctors. When health data are transmitted over a common channel, it becomes essential to preserve various privacy and security properties in the system. Further, the number of users of remote services is growing exponentially, so a single server cannot adequately handle all users because of verification overhead, server failure, and scalability issues. Researchers have therefore proposed various authentication protocols for multi-server architectures, but most of them are vulnerable to different security attacks and require high computational resources during implementation. To tackle these privacy and security issues using fewer computational resources, we propose a privacy-preserving mutual authentication and key agreement protocol for a multi-server healthcare system. We discuss the proposed scheme's security analysis and performance results to show its security strengths and its computational resource requirements, respectively. Further, we compare its security and performance with those of recent relevant authentication protocols.
TheConversationAU

Facebook wants AI to find your keys and understand your conversations

Facebook has announced a research project that aims to push the “frontier of first-person perception”, and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of “smart glasses” called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of...
arxiv.org

Sharing FANCI Features: A Privacy Analysis of Feature Extraction for DGA Detection

The goal of Domain Generation Algorithm (DGA) detection is to recognize infections with bot malware; it is often done with the help of Machine Learning approaches that classify non-resolving Domain Name System (DNS) traffic and are trained on possibly sensitive data. In parallel, the rise of privacy research in Machine Learning has produced privacy-preserving measures that are tightly coupled with a deep learning model's architecture or training routine, while non-deep-learning approaches are commonly better suited to applying privacy-enhancing methods outside the actual classification module. In this work, we aim to measure the privacy capability of the feature extractor of the feature-based DGA detector FANCI (Feature-based Automated Nxdomain Classification and Intelligence). Our goal is to assess whether a data-rich adversary can learn an inverse mapping of FANCI's feature extractor and thereby reconstruct domain names from feature vectors. Attack success would pose a privacy threat to sharing FANCI's feature representation, while failure would allow this representation to be shared without privacy concerns. Using three real-world data sets, we train a recurrent Machine Learning model on the reconstruction task. Our approaches yield poor reconstruction performance, and we attempt to back our findings with a mathematical review of the feature extraction process. We therefore conclude that sharing FANCI's feature representation does not constitute a considerable privacy leakage.
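One reason an inverse mapping can fail is that feature extraction is many-to-one. The sketch below is hypothetical and covers only a few count-based lexical statistics (length, digit ratio, vowel ratio, character entropy), not FANCI's actual feature set, which includes further features, some possibly order-sensitive. Because each feature here depends only on character counts, any rearrangement of a domain's characters yields the same vector, so no reconstruction model could tell those domains apart from these features alone.

```python
import math
from collections import Counter
from typing import Tuple


def lexical_features(domain: str) -> Tuple[float, ...]:
    """Hypothetical count-based features of a non-empty domain name; NOT FANCI's real feature set."""
    name = domain.lower()
    n = len(name)
    counts = Counter(name)
    # Sort the counts so that equal character multisets give bit-identical entropy values.
    entropy = -sum((c / n) * math.log2(c / n) for c in sorted(counts.values()))
    digit_ratio = sum(ch.isdigit() for ch in name) / n
    vowel_ratio = sum(ch in "aeiou" for ch in name) / n
    return (float(n), digit_ratio, vowel_ratio, entropy)


# Two different non-resolving domains produce exactly the same feature vector.
print(lexical_features("xk3qz.net") == lexical_features("zq3kx.net"))  # True
```

The paper's negative reconstruction result is measured empirically on FANCI's real features; this toy example only illustrates the underlying information loss.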
nunewsindustry.com

Phishing email survey targeting Gmail and Outlook users

Fake marketing surveys impersonating ASDA, Morrisons, and Tesco promise gift cards worth up to £100 in exchange for completing them. No gifts are ever provided, but users do lose money and data. Check out the short list for tips on how Gmail and Outlook users can stay safe and secure online.
arxiv.org

Distribution-Free Federated Learning with Conformal Predictions

Federated learning has attracted considerable interest for collaborative machine learning in healthcare to leverage separate institutional datasets while maintaining patient privacy. However, additional challenges such as poor calibration and lack of interpretability may also hamper widespread deployment of federated models into clinical practice and lead to user distrust or misuse...
