HyObscure: Hybrid Obscuring for Privacy-Preserving Data Publishing
By Xiao Han, Yuncong Yang, Junjie Wu
arxiv.org
4 days ago
Minimizing privacy leakage while ensuring data utility is a critical problem for data holders in a privacy-preserving data publishing task. Most prior research is concerned with only one type of data and resorts to a single obscuring method, e.g., obfuscation or generalization, to achieve a...
The endless cookie settings that pop up for every website feel a bit like prank compliance by a surveillance internet hell-bent on not changing. It is very annoying. And as it turns out, it doesn’t even matter what you click. Because “Real-Time Bidding,” the primary tracking-based ad system, nevertheless “broadcasts internet users’ behavior and real-world locations to thousands of companies, billions of times a day.” And the main European provider of these pestering pop-ups to Google and 80% of all websites in Europe knew it and is now in trouble.
Privitar is introducing NOVLT, a new way to do tokenization, in the latest version of the Privitar Data Privacy Platform. NOVLT, which supports broader use cases and data sharing across borders, is designed to provide the benefits of tokenization while removing the dependency of a vault. Customers will have full flexibility to customize their approach to data protection for individual use cases, with Privitar offering two, complementary methods of tokenization.
Privacy-preserving data analysis (PPDA) has received increasing attention due to a great variety of applications. Local differential privacy (LDP), an emerging standard that is suitable for PPDA, has been widely deployed in various real-world scenarios to analyze massive data while protecting against many forms of privacy breach. In this study, we are mainly concerned with the piecewise transformation technique (PTT) for analyzing numerical data under local differential privacy. We provide a principled framework for PTT in the context of LDP, based on which PTT is studied systematically. As a result, we show that (1) many PTTs are asymptotically optimal when used to obtain an unbiased estimator for the mean of numerical data, and (2) for a given privacy budget, there is a PTT that reaches the theoretical lower bound with respect to variance. Next, by studying two classes of PTTs in detail, we prove that (1) there exist no PTTs that are optimal relative to the widely used technique, i.e., Duchi's scheme, in terms of the consistency noisy variance, and (2) on the other hand, a great number of PTTs can be found that are consistently better than the latter with regard to the worst-case noisy variance, which has not been reported so far. When only the high privacy level is considered, many PTTs turn out to be better than the well-known Laplace mechanism. Lastly, we prove that for a family of PTTs, the corresponding theoretical lower bound of the noisy variance follows $O(\epsilon^{-2})$ at the high privacy level.
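For orientation, the baseline named in this abstract, Duchi's scheme, can be sketched for a single value in [-1, 1] as follows. This is a minimal sketch of the commonly cited one-dimensional form, not the paper's PTT framework; the function names `duchi_perturb` and `estimate_mean` are illustrative assumptions.

```python
import math
import random


def duchi_perturb(t, epsilon, rng=random):
    """Perturb one value t in [-1, 1] into one of two outputs, -C or +C."""
    if not -1.0 <= t <= 1.0:
        raise ValueError("t must lie in [-1, 1]")
    e = math.exp(epsilon)
    c = (e + 1.0) / (e - 1.0)                       # output magnitude C
    p_plus = 0.5 + t * (e - 1.0) / (2.0 * e + 2.0)  # Pr[output = +C]
    return c if rng.random() < p_plus else -c


def estimate_mean(reports):
    """Each report has expectation t, so the plain average estimates the mean."""
    return sum(reports) / len(reports)
```

Each report has variance C^2 - t^2 with C = (e^epsilon + 1)/(e^epsilon - 1), which is the kind of noisy variance the abstract compares against.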
The three slices of the data pie -- data governance, data privacy and data security -- are often lumped together -- but although they naturally overlap, there are crucial differences that are important to understand. Let’s slice up the pie. First, there’s data governance. You can think of it as...
VEON launched its MobileID initiative, an authentication, credentials management and permission control system that will safeguard consumers and protect retail companies. VEON intends to roll out MobileID to its 212 million customers across nine countries as well as offering it as the standard for mobile operators worldwide to provide digital identity validation that is fully compliant with local data privacy laws.
Officials from the USA and UK have signaled an intention to together shape a new world order for data sharing across borders. International Trade Secretary Anne-Marie Trevelyan and Nadine Dorries, Secretary of State for Digital, Culture, Media and Sport, met with US Secretary of Commerce Gina Raimondo to hold discussions on cross border data flows, supply chains and tariffs.
Technology is rapidly reshaping automotive property and casualty. The introduction of advanced driver assistance systems such as lane-keeping, distraction warning, automated braking, and collision warning brought the promise of reduced accidents, including injuries and fatalities. It also brought challenges, such as appropriately pricing risk, rising repair costs, operational and legal challenges with the recalibration of these systems, and an increased share of vehicles deemed a total loss. The rise of telematics and driver monitoring offered ways to coach drivers to be safer behind the wheel, to introduce new pricing models like usage-based insurance, and to create new touchpoints and loyalty programs. It also brought challenges, including how to deal with an adverse selection of customers, the ethical and legal implications of profiling and automated decision making, and the security and privacy implications of the data collected and shared via these systems.
Guardum’s data intelligence platform helps to reduce risk, decrease costs, and enable corporations to identify and protect enterprise data quickly and efficiently. Donnelley Financial Solutions, a leading risk and compliance company, announced it has acquired Guardum, a leading data security and privacy software provider that helps companies locate, secure, and control data. The acquisition strengthens DFIN’s software solutions portfolio by making data security a competitive differentiator, enhancing regulatory compliance, safeguarding privacy, and improving data accuracy.
Randomized response, as a basic building block for differentially private mechanisms, has attracted great interest and found various potential applications in scientific communities. In this work, we are concerned with three-element randomized response (RR$_{3}$) along with relevant applications to the analysis of weighted bipartite graphs under a differential privacy guarantee. We develop a principled framework for estimating statistics produced by RR$_{3}$-based mechanisms, and then prove the corresponding estimators to be unbiased. At the same time, we study in detail several fundamental and significant members of the RR$_{3}$ family, and derive closed-form solutions for the unbiased estimators. Next, we show potential applications of several RR$_{3}$-based mechanisms to the estimation of average degree and average weighted value on weighted bipartite graphs under a local differential privacy guarantee. In the meantime, we determine lower bounds for the choice of relevant parameters by minimizing the variance of the statistics in order to design optimal RR$_{3}$-based locally differentially private mechanisms, with which we optimize previous protocols in the literature and put forward a version that achieves the tight bound. Last but most importantly, we observe that in the analysis of relational data such as weighted bipartite graphs, a portion of the privacy budget in a locally differentially private mechanism is sometimes "consumed" accidentally by the mechanism itself, resulting in a stronger privacy guarantee than we would get by simple sequential composition.
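As a rough illustration of the idea, the sketch below implements the generic k-ary randomized response with k = 3 and its standard unbiased frequency estimator. It is one simple member of the family under stated assumptions, not necessarily any of the RR$_{3}$ mechanisms studied in the paper; all function names are hypothetical.

```python
import math
import random


def rr3_probs(epsilon):
    """Return (p, q): keep-probability and per-alternative flip-probability."""
    e = math.exp(epsilon)
    return e / (e + 2.0), 1.0 / (e + 2.0)


def rr3_perturb(value, epsilon, rng=random):
    """Report value in {0, 1, 2} truthfully with prob. p, else a uniform other."""
    p, _ = rr3_probs(epsilon)
    if rng.random() < p:
        return value
    return rng.choice([v for v in (0, 1, 2) if v != value])


def rr3_estimate(reports, epsilon):
    """Unbiased frequency estimates: (observed share - q) / (p - q)."""
    p, q = rr3_probs(epsilon)
    n = len(reports)
    return [(reports.count(v) / n - q) / (p - q) for v in (0, 1, 2)]
```

The estimator is unbiased because the expected observed share of a value with true frequency f is q + f(p - q); the ratio p/q = e^epsilon gives the local differential privacy guarantee.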
Known for its Encryption Technology, Optable Upgrades Privacy Feature to Offer Privacy Preserving Collaboration for Advertisers and Publishers. Optable, a SaaS data connectivity platform and clean room solution designed for the advertising ecosystem, is leading the way on differential privacy, making several enhancements to its platform and enabling clients to protect the privacy of their users when sharing statistical and aggregated data with their partners.
Spirion, a pioneer in data protection and privacy, today announced its top predictions for the cyber-related risks enterprises will face in 2022. Ransomware, data breaches and new privacy regulations, combined with a surge in data consumption, will increase the data protection and privacy risks around an organization’s sensitive data.
Qonsent, a New York-based data privacy and consent engagement platform provider, today announced it raised $5 million. Qonsent uses an encrypted and auditable ledger-based system to help brands maintain a record of customers’ consent to meet CCPA, GDPR, CPRA compliance requirements, and other applicable privacy laws. Qonsent says it...
Graph neural networks (GNNs) and message passing neural networks (MPNNs) have been proven to be expressive for subgraph structures in many applications. Some applications in heterogeneous graphs require explicit edge modeling, such as subgraph isomorphism counting and matching. However, existing message passing mechanisms are not designed well in theory. In this paper, we start from a particular edge-to-vertex transform and exploit the isomorphism property in the edge-to-vertex dual graphs. We prove that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation, we propose dual message passing neural networks (DMPNNs) to enhance the substructure representation learning in an asynchronous way for subgraph isomorphism counting and matching as well as unsupervised node classification. Extensive experiments demonstrate the robust performance of DMPNNs by combining both node and edge representation learning in synthetic and real heterogeneous graphs. Code is available at this https URL.
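To make the edge-to-vertex idea concrete, here is a minimal sketch of the textbook directed line graph, in which every original edge becomes a dual vertex and two dual vertices are linked when one edge ends where the other begins. The paper's particular transform may differ (for example in how labels are handled), and `edge_to_vertex_dual` is an illustrative name.

```python
def edge_to_vertex_dual(edges):
    """Return the directed line graph: one dual vertex per original edge."""
    dual_vertices = list(edges)
    dual_edges = [
        (e1, e2)
        for e1 in dual_vertices
        for e2 in dual_vertices
        if e1 != e2 and e1[1] == e2[0]  # head of e1 is the tail of e2
    ]
    return dual_vertices, dual_edges
```

For a directed triangle a→b→c→a, the dual is again a directed triangle over the three edges.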
We present two novel coded federated learning (FL) schemes for linear regression that mitigate the effect of straggling devices. The first scheme, CodedPaddedFL, mitigates the effect of straggling devices while retaining the privacy level of conventional FL. In particular, it combines one-time padding for user data privacy with gradient codes to yield resiliency against straggling devices. To apply one-time padding to real data, our scheme exploits a fixed-point arithmetic representation of the data. For a scenario with 25 devices, CodedPaddedFL achieves a speed-up factor of 6.6 and 9.2 for an accuracy of 95% and 85% on the MNIST and Fashion-MNIST datasets, respectively, compared to conventional FL. Furthermore, it yields similar performance in terms of latency compared to a recently proposed scheme by Prakash et al. without the shortcoming of additional leakage of private data. The second scheme, CodedSecAgg, provides straggler resiliency and robustness against model inversion attacks and is based on Shamir's secret sharing. CodedSecAgg outperforms state-of-the-art secure aggregation schemes such as LightSecAgg by a speed-up factor of 6.6--14.6, depending on the number of colluding devices, on the MNIST dataset for a scenario with 120 devices, at the expense of a 30% increase in latency compared to CodedPaddedFL.
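The reason a fixed-point representation is needed for one-time padding can be sketched as follows: a pad is only perfectly hiding when it is uniform over a finite ring, so real values are first mapped to integers modulo a power of two. The bit widths and function names below are assumptions for illustration and are not taken from the paper.

```python
import secrets

FRAC_BITS = 16     # assumed number of fractional bits
MODULUS = 1 << 32  # assumed ring size: arithmetic is done modulo 2**32


def encode(x):
    """Map a real number to a fixed-point residue modulo MODULUS."""
    return round(x * (1 << FRAC_BITS)) % MODULUS


def decode(r):
    """Map a residue back to a real number (two's-complement interpretation)."""
    if r >= MODULUS // 2:
        r -= MODULUS
    return r / (1 << FRAC_BITS)


def new_key():
    """Draw a uniformly random one-time pad from the ring."""
    return secrets.randbelow(MODULUS)


def pad(r, key):
    """Hide residue r by adding the pad modulo MODULUS."""
    return (r + key) % MODULUS


def unpad(r, key):
    """Remove the pad again."""
    return (r - key) % MODULUS
```

Because padding is additive, the sum of several padded values minus the sum of their pads decodes to the sum of the original values, which is what makes linear operations on padded data possible.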
Dominik Thalmeier, Gregor Miller, Elida Schneltzer, Anja Hurt, Martin Hrabě de Angelis, Lore Becker, Christian L. Müller, Holger Maier. Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a possibility to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using the auditory brainstem response (ABR) procedure, the German Mouse Clinic and similar facilities worldwide have produced large, uniform data sets of averaged ABR raw data of mutant and wildtype mice. In the course of standard ABR analysis, hearing thresholds are assessed visually by trained staff from series of signal curves of increasing sound pressure level. This is time-consuming and prone to be biased by the reader as well as the graphical display quality and scale. In an attempt to reduce workload and improve quality and reproducibility, we developed and compared two methods for automated hearing threshold identification from averaged ABR raw data: a supervised approach involving two combined neural networks trained on human-generated labels and a self-supervised approach, which exploits the signal power spectrum and combines random forest sound level estimation with a piece-wise curve fitting algorithm for threshold finding. We show that both models work well, outperform human threshold detection, and are suitable for fast, reliable, and unbiased hearing threshold detection and quality control. In a high-throughput mouse phenotyping environment, both methods perform well as part of an automated end-to-end screening pipeline to detect candidate genes for hearing involvement. Code for both models as well as data used for this work are freely available.
Researchers in the UK and China have developed an artificial intelligence (AI) model that can diagnose COVID-19 as well as a panel of professional radiologists, while preserving the privacy of patient data. The international team, led by the University of Cambridge and the Huazhong University of Science and Technology, used...
Motivated by the increasing availability of high-performance parallel computing, we design a distributed parallel algorithm for linearly-coupled block-structured nonconvex constrained optimization problems. Our algorithm performs Jacobi-type proximal updates of the augmented Lagrangian function, requiring only local solutions of separable block nonlinear programming (NLP) problems. We provide a cheap and explicitly computable Lyapunov function that allows us to establish global and local sublinear convergence of our algorithm, its iteration complexity, as well as simple, practical and theoretically convergent rules for automatically tuning its parameters. This is in contrast to existing algorithms for nonconvex constrained optimization based on the alternating direction method of multipliers that rely on at least one of the following: Gauss-Seidel or sequential updates, global solutions of NLP problems, non-computable Lyapunov functions, and hand-tuning of parameters. Numerical experiments showcase its advantages for large-scale problems, including the multi-period optimization of a 9000-bus AC optimal power flow test case over 168 time periods, solved on the Summit supercomputer using an open-source Julia code.
Large-scale cyber-physical systems require that control policies are distributed, that is, that they only rely on local real-time measurements and communication with neighboring agents. Optimal Distributed Control (ODC) problems are, however, highly intractable even in seemingly simple cases. Recent work has thus proposed training Neural Network (NN) distributed controllers. A main challenge of NN controllers is that they are not dependable during and after training, that is, the closed-loop system may be unstable, and the training may fail due to vanishing and exploding gradients. In this paper, we address these issues for networks of nonlinear port-Hamiltonian (pH) systems, whose modeling power ranges from energy systems to non-holonomic vehicles and chemical reactions. Specifically, we embrace the compositional properties of pH systems to characterize deep Hamiltonian control policies with built-in closed-loop stability guarantees, irrespective of the interconnection topology and the chosen NN parameters. Furthermore, our setup enables leveraging recent results on well-behaved neural ODEs to prevent the phenomenon of vanishing gradients by design. Numerical experiments corroborate the dependability of the proposed architecture, while matching the performance of general neural network policies.
Background and objectives. Domain shift is a generalisation problem of machine learning models that occurs when the data distribution of the training set is different to the data distribution encountered by the model when it is deployed. This is common in the context of biomedical image segmentation due to the variance of experimental conditions, equipment, and capturing settings. In this work, we address this challenge by studying both neural style transfer algorithms and unpaired image-to-image translation methods in the context of the segmentation of tumour spheroids.
We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources on devices in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of convolutional layers of the model. Our main contribution is the introduction of a design space exploration (DSE) technique, which finds Pareto-optimal per-layer dropout vectors with respect to resource requirements and convergence speed of the training. Applying this technique, each device is able to dynamically select the dropout vector that fits its available resource without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we are able to significantly increase the convergence speed over the state of the art without compromising on the final accuracy.
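The filter-dropping step described above can be sketched roughly as follows: given a per-layer dropout vector, a device samples which convolutional filters stay active for a training step. This is only an illustration of random per-layer filter selection; how DISTREAL finds its Pareto-optimal vectors is not shown, and `sample_kept_filters` is a hypothetical helper.

```python
import random


def sample_kept_filters(filters_per_layer, dropout_vector, rng=random):
    """For each layer, pick the filter indices that stay active this step."""
    kept = []
    for n_filters, rate in zip(filters_per_layer, dropout_vector):
        n_keep = max(1, round(n_filters * (1.0 - rate)))  # keep at least one
        kept.append(sorted(rng.sample(range(n_filters), n_keep)))
    return kept
```

A device with fewer available resources would use a vector with higher rates, so fewer filters are computed per step.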
