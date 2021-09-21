
How to use Permutation Tests

A walkthrough of permutation tests and how they can be applied to time series data. Permutation tests are non-parametric tests that require very few assumptions. So, when you don’t know much about your data generating mechanism (the population), permutation tests are an effective way to determine statistical significance. A recent...
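As a rough illustration of the idea in this teaser, here is a minimal sketch of a two-sample permutation test on the difference in means. The function name `permutation_test` and its parameters are hypothetical and not taken from the article. The sketch assumes the observations are exchangeable under the null hypothesis, which plain shuffling may not respect for autocorrelated time series.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are arbitrary,
        # so reshuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    group_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 10.2, 9.7]
    group_b = [0.2, -0.1, 0.4, 0.0, 0.1, -0.3, 0.3, -0.2]
    print(permutation_test(group_a, group_b, n_permutations=2000))
```

The returned p-value is the share of shuffles whose difference in means is at least as large as the observed one, so no distributional form has to be assumed for the data.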

The FP Growth algorithm

Using the FP Growth algorithm in Python to do frequent itemset mining for basket analysis. In this article, you will discover the FP Growth algorithm. It is one of the state-of-the-art algorithms for frequent itemset mining (also called Association Rule Mining) and basket analysis. Frequent Itemset Mining and Basket Analysis.
How to use the BIOS

Deep within your computer, there is a system called BIOS (Basic Input-Output System). It waits on the motherboard and is responsible for waking everything up, running basic diagnostics, and booting up your operating system when you turn on your computer. Typically, the BIOS is happy to do its work behind...
Linux X86 Assembly – How To Test Custom Shellcode Using a C Payload Tester

In the last blog post in this series, we created a tool to make it easy to build our custom payloads and extract them. However, what if we want to test them before trying to use them? It seems like a good idea to make sure it works before you include it in an exploit. Testing it first would at least let you know that it works and reduce troubleshooting surface if the exploit fails. Today we are going to focus on building a payload tester stub in the C programming language. This will make it easy for us to copy and paste our C-style formatted payload from our build-and-extract tool. Once it’s pasted in the tester stub, just compile and run it and you will be able to see your payload in action. The code for payload tester stub and Makefile can be found in the /utils/ folder of the Secure Ideas Professionally Evil x86_asm GitHub repository.
How to use the PJ app

Launched in 2013, The Pharmaceutical Journal app has been a key resource for many members of the Royal Pharmaceutical Society (RPS) for the past eight years. Immediately after the relaunch of the Pharmaceutical Journal website in February 2021, we started work on the apps (Android and iOS) to bring them up to date.
Distributed Deep Learning with BigDL and PySpark

What I chose among BigDL, Horovod and others for Deep Learning on Spark. As I said, this post was only written to help anybody get hands-on with this interesting package. Of course, this is not an exhaustive search. For example, the following tests may be performed: Run this...
Mini Guide to Supervised Learning for Time Series Forecasting

Model training, feature engineering and error calculation techniques for Time Series Forecasting via Supervised Learning. One of the first techniques I learnt for Time Series Forecasting was ARIMA. But as I started building forecasting models, I came across research papers and blogs about using supervised learning models to forecast time series data. These models provide benefit over ARIMA especially if the forecast needs to be at multiple granularities. In this blog post, I am going to capture the learnings I have had as I have built these models and some do(s)/don’t(s). I will be covering a few learnings around cross validation and model training, feature engineering, target variable engineering and error calculation.
Multi-Armed Bandits: Thompson Sampling Algorithm

In this series of posts, we experiment with different bandit algorithms to optimise our movie nights, more specifically how we select movies and restaurants for food delivery! For newcomers, the name bandit comes from slot machines (known as one-armed bandits). You can think of it as something that can...
Graph Neural Networks: A learning journey since 2008 — Part 2

The second story about Scarselli’s Graph Neural Networks. Today, let’s implement what we’ve learned: GNN in Python. We learned in the first part of this series the theoretical background of Scarselli’s Graph Neural Networks. In particular, we learned: GNN is suitable both for node-based and graph-based predictions. In the former...
Managing Python Environments Like a Pro

Still using virtualenv? Try this new tool. Python virtual environments help us manage dependencies easily and effortlessly. The most common environment creation tools are virtualenv and conda. The latter is used for environment management across multiple languages, whereas the former is made especially for Python. Why not use global python...
Automated Interactive Reports with Plotly and Python

Generating reports is a tedious task. Instead, develop in Python using Plotly and create automated interactive reports. This article will discuss the steps required to create automated interactive reports for different cryptocurrencies. Then, the final report is combined into a single HTML file which maintains the interactive functionality of Plotly without an external server.
How I built the ELTEA17 dataset

Entity-Level Tweets Emotion Analysis Dataset for emotion classification and sarcasm detection. Entity-Level Tweets Emotion Analysis Dataset (ELTEA17) is a dataset for fine-grain emotion analysis of tweets which I made publicly available here. It is a sub-product of my research in 2017 about structured emotion prediction of tweets with co-extraction of cause, holder, and target.
Parametric Bayesian Inference: Implementation of Numerical Sampling Techniques with Proofs

Acceptance/Rejection Sampling & MCMC Metropolis-Hastings Sampling with full Computational Simulation. In parametric Bayesian Inference, our objective is to recover the posterior distribution of the parameter (or parameters) of interest. “Recovering” the distribution means either obtaining a closed analytic form of the posterior distribution (PDF, CDF, MGF, etc), or obtaining a means of drawing samples from the posterior distribution, from which it can then be constructed numerically.
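To make the sampling route concrete, here is a minimal acceptance/rejection sketch. It is not the article's implementation. It assumes a hypothetical example: a Bernoulli success probability with a uniform prior, so the unnormalised posterior is θ^s (1−θ)^f, and a uniform proposal on [0, 1]. All function names are illustrative.

```python
import random


def unnormalised_posterior(theta, successes, failures):
    """Likelihood times a uniform prior; the normalising constant is not needed."""
    return theta ** successes * (1.0 - theta) ** failures


def rejection_sample(successes, failures, n_samples, seed=0):
    """Draw posterior samples with a Uniform(0, 1) proposal (illustrative sketch)."""
    rng = random.Random(seed)
    total = successes + failures
    # The unnormalised posterior peaks at the observed success rate,
    # so its value there bounds the function over the whole interval.
    mode = successes / total if total else 0.5
    bound = unnormalised_posterior(mode, successes, failures)
    samples = []
    while len(samples) < n_samples:
        theta = rng.random()  # candidate from the proposal
        # Accept with probability posterior(theta) / bound.
        if rng.random() * bound <= unnormalised_posterior(theta, successes, failures):
            samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = rejection_sample(successes=7, failures=3, n_samples=5000)
    print(sum(draws) / len(draws))  # should sit near the Beta(8, 4) mean of 2/3
```

In this toy case the posterior is known in closed form (a Beta distribution), which makes it easy to check the empirical samples against the analytic mean.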
Approaching Anomaly Detection in Transactional Data

Anomaly detection in transactional data can be hard, but it brings the benefit of discovering unknowns in vast amounts of data, which wouldn’t be possible otherwise. Usually, people mean financial transactions when they talk about transactional data. However, according to Wikipedia, “Transactional Data is data describing an event (the change as a result of a transaction) and is usually described with verbs. Transaction data always has a time dimension, a numerical value and refers to one or more objects”. In this article, we will use data on requests made to a server (internet traffic data) as an example, but the considered approaches can be applied to most of the datasets falling under the aforementioned definition of transactional data.
Colour Image Quantization using K-means

A simple tutorial on how to reduce the number of distinct colours in an image using Python and OpenCV. Colour quantization is a process that reduces the number of colours in an image while it tries to preserve the quality and the important global information. Images are composed of pixels each of which can be associated with 16,777,216 different colours in the case of RGB colour space, which is probably the most commonly used colour space. Each colour can be represented as a 3d vector, and each vector element has an 8-bit dynamic range, which means 2⁸=256 different values (i.e. 256x256x256 =16,777,216). This kind of representation is often called RGB triplet. The key factor for successful colour quantization is the appropriate selection of the colour palette that sufficiently summarizes the information of the initial image.
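The palette-selection step can be sketched with a small k-means loop. This is a pure-Python illustration and not the article's OpenCV code. The names `quantize` and `nearest` are hypothetical, pixels are assumed to be a flat list of `(r, g, b)` tuples, and `k` must not exceed the number of distinct colours in the input.

```python
import random

# Number of colours representable with three 8-bit channels.
TOTAL_RGB_COLOURS = 256 ** 3  # 16,777,216


def nearest(pixel, palette):
    """Index of the palette colour closest to `pixel` (squared Euclidean distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, palette[i])),
    )


def quantize(pixels, k, iterations=10, seed=0):
    """Reduce `pixels` to at most `k` colours with k-means (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from k distinct colours that actually occur in the image.
    palette = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for pixel in pixels:
            clusters[nearest(pixel, palette)].append(pixel)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                palette[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    # Round the centres back to valid integer channel values.
    palette = [tuple(round(ch) for ch in colour) for colour in palette]
    return [palette[nearest(pixel, palette)] for pixel in pixels], palette


if __name__ == "__main__":
    image = [(250, 0, 0), (255, 5, 5), (0, 0, 250), (5, 5, 255), (0, 250, 0)]
    quantized, palette = quantize(image, k=2)
    print(palette)
    print(quantized)
```

K-means depends on its starting centres, so different seeds can give different palettes; the result is a local optimum rather than a guaranteed best palette.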
A quick guide to understanding Vectors Norms

Many applications like information retrieval, personalization, document categorization, image processing, etc., rely on the computation of similarity or dissimilarity between items. Two items are considered similar if the distance between them is small, and dissimilar if it is large. So how do we calculate this distance? Well, each data object (item) can be thought of as an n-dimensional vector where the dimensions are the attributes (features) in the data. The vector representations thereby make it possible to compute the distance between pairs using standard vector-based measures such as the Manhattan distance and the Euclidean distance. It is during such calculations that norms come up. Vector norms occupy an important space in the context of machine learning, and hence in this article, we’ll first understand the basics of a norm, its properties and then go over some of the most common vector norms.
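The link between norms and distances mentioned above can be shown in a few lines: the distance between two items is the norm of their difference vector. The helper names `lp_norm` and `distance` are hypothetical, and the sketch assumes both vectors have the same length.

```python
import math


def lp_norm(vector, p=2):
    """L-p norm of a vector: p=1 is Manhattan, p=2 is Euclidean, p=inf is the max norm."""
    if p == math.inf:
        return max(abs(x) for x in vector)
    return sum(abs(x) ** p for x in vector) ** (1 / p)


def distance(u, v, p=2):
    """Distance between two equal-length vectors as the norm of their difference."""
    return lp_norm([a - b for a, b in zip(u, v)], p)


if __name__ == "__main__":
    u, v = [0, 0], [3, 4]
    print(distance(u, v, p=1))         # Manhattan: |3| + |4|
    print(distance(u, v, p=2))         # Euclidean: sqrt(3^2 + 4^2)
    print(distance(u, v, p=math.inf))  # max norm: max(|3|, |4|)
```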
Combining Physics and Deep Learning

With the rise in compute power over the past 10 years, we have seen a sharp increase in the number of simulations. Digital twins are one such example. They are virtual replicas of a physical object or process that can be simulated in a variety of scenarios. One problem faced...
Data analysis beginners: Python makes it easy

In this series, ‘Spreadsheets to Python,’ I explore the many joys and benefits of Python data analysis and encourage readers to try it themselves. It’s a great time to be a data analyst. With so much data available for analysis at the click of a button, the opportunities for driving knowledge and having impact are limitless.
Real Time Image Segmentation Using 5 Lines of Code

Perform accurate and fast object segmentation in images and videos with PixelLib. Image Segmentation in Computer Vision Applications. Computer Vision is the ability of computers to see and analyze what they see. Image segmentation is an aspect of computer vision that deals with segmenting the contents of objects visualized by a computer into different categories for better analysis. A good example of the process of image segmentation is foreground-background separation of objects in images, which is the technique of separating objects from their background so that both the objects and the background can be analyzed. The ability of image segmentation to achieve foreground-background separation makes it invaluable in solving many computer vision problems, such as analysis of medical images, background editing, vision in self-driving cars and analysis of satellite images.
Make Python Faster with CFFI Python Bindings

Python is one of the most user-friendly programming languages. It is easy to learn, free to use, and you can extend its functionality however you like. Moreover, Python is arguably the most-used programming language in the Data Science and Machine Learning world. Excellent numerical libraries, like NumPy and SciPy, and remarkable deep learning frameworks, like PyTorch and TensorFlow, create a vast arsenal of tools for every programmer that likes to play with data and artificial neural networks.
