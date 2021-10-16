
The Life of a Data Analyst

By Editors' Picks
 8 days ago

Medium is my creative outlet these days. As a data analyst, I get to be a little creative, like when I am designing a dashboard or solving an interesting problem, but it’s not a large percentage of my job. So, instead of getting frustrated about the harder parts of...

The Guardian

Graduate Operations Analyst 2022

Convex is an international specialty insurer and reinsurer founded by Stephen Catlin and Paul Brand in 2019. With operations in London and Bermuda, Convex occupies a unique position in the insurance industry combining unrivalled experience, reputation and a legacy free balance sheet. We have brought together some of the most talented people in the market - who have the knowledge, personality and relationships to make a difference - to create a dynamic team that is passionate about pushing boundaries. Convex is a modern organisation passionate about opportunity, diversity and inclusion.
10 Most Important SQL Commands Every Data Analyst Needs to Know

Querying data from a database doesn’t need to be complicated. As a data analyst or data scientist, it doesn’t matter how good you are at creating fancy visualizations or how skilled you are at building complicated models — at its core, you need data in order to do those things.
Intro to Data Structures

Imagine you build a wildly popular app that is quickly growing towards a million users. (Congrats!) While users love the app, they’re complaining that the app is becoming slower and slower, to the point that some users are starting to leave. You notice that the main bottleneck is how user info is retrieved during authentication: currently, your app searches through an unsorted list of Python dictionaries until it finds the requested user ID.
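The bottleneck described above can be sketched as follows. This is a minimal illustration, not the article's own code: the record fields (`id`, `name`) and the function names are assumptions made for the example. It contrasts scanning an unsorted list of dictionaries with a one-time index keyed by user ID.

```python
# Hypothetical user records; the field names are assumptions for illustration.
users = [
    {"id": 42, "name": "Ada"},
    {"id": 7, "name": "Grace"},
    {"id": 19, "name": "Linus"},
]


def find_user_linear(user_list, user_id):
    """Scan the unsorted list until the ID matches: O(n) per lookup."""
    for user in user_list:
        if user["id"] == user_id:
            return user
    return None


# Build the index once (O(n)); each later lookup is O(1) on average.
users_by_id = {user["id"]: user for user in users}


def find_user_indexed(index, user_id):
    """Look the user up by key in the prebuilt dictionary."""
    return index.get(user_id)
```

With a million users, the linear scan inspects half a million records per lookup on average, while the dictionary lookup cost stays roughly constant as the user base grows.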
C++ Basics: Array Data Structure

C++ offers different types of arrays, and understanding how they work internally will help us choose the right type for our application. One of the most important things when we code in any programming language is to choose the right data structure to represent our data. This is important because we don’t want our application to slow down to the point of becoming a bottleneck or use a lot more memory than necessary when our application scales.
3 NumPy Functions to Facilitate Data Analysis with Pandas

Pandas and NumPy are the two most popular Python libraries used for data analysis and manipulation. Pandas is equipped with a lot of practical and handy functions. Pandas also allows for using some NumPy functions, which results in more functional and efficient operations with Pandas. In this article, we will go over 3 NumPy functions that are of great help when doing data analysis with Pandas.
Dealing with Leaky, Missing Data in a Production Environment

As a consultant, I don’t always have control over the data I receive. Going back and forth with a client can only get you so far. At a certain point, you need to work with the data you have. That means I work with a lot of messy data, and have become intimately familiar with all different kinds of data leakage.
Top 6 Python Tips for Data Scientists

Coding is, and will remain, an essential part of data science! It’s said that close to 70% of data scientists’ time is spent on coding, and I’m no exception. In this article, I will share the 6 most useful and practical Python snippets that I have compiled from solving real analytics problems in my day-to-day work.
UCL Data Science Society: Python Logic

Workshop 3: conditional statements, logic statements, loops and functions. This year the UCL Data Science Society, where I am Head of Science, is presenting a series of 20 workshops covering topics such as an introduction to Python, a data scientist’s toolkit and machine learning methods, throughout the academic year. For each workshop that I present and deliver, I aim to create a series of small blog posts that outline the main points, with links to the full workshop for anyone who wishes to follow along. All of these can be found on our GitHub and will be updated throughout the year with new workshops and challenges.
Mathematical Understanding of Bias Variance Tradeoff

Understanding the bias-variance tradeoff from the equation. Many of us have read about bias and variance in various places in the AI literature, but many people still struggle to explain them in terms of the mathematical equation. People often comment on bias and variance whenever they build a model, in order to judge whether the model can be used in the real world and how well it will perform. In this article, we will focus on the mathematical equation depicting the bias-variance tradeoff and try to understand its different parts from a mathematical perspective. Let me first highlight some of the assumptions that are essential in order to understand the equation:
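For reference, the textbook form of the decomposition is given below. The teaser does not show the article's own notation, so the symbols here are assumptions: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, the noise is independent of the fitted model $\hat{f}$, and expectations are taken over training sets and noise.

```latex
\mathbb{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The cross terms vanish because the noise has zero mean and is independent of $\hat{f}$, which is why the expected squared error splits cleanly into these three parts.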
9 Tips That Helped Me Clear All HackerRank SQL Challenges in 2 Weeks

Current projects at work require more SQL skills, so I took the time to brush up my SQL knowledge using the HackerRank coding challenges platform. It took me a bit more than two weeks (on and off between work, workouts, and taking the kid to day school, etc.) to finish all 58 SQL challenges on the site, and I gained quite a few insights from the journey (and had quite some fun doing it!). This article can be read by anyone who wants to get into the data science world or prepare for their next SQL interview using a coding site like HackerRank or LeetCode, even if you don’t have any prior SQL knowledge. I did put a bit of code in there to show you straightforward examples. Most of the tips are about making full use of these SQL challenges so you can benefit the most from them, rather than teaching you how to crack the problems. So don’t worry if you don’t know squat about SQL or coding. Please sit down, relax, and let’s do this.
Customize Your Pandas Data Frame For Effective Data Analysis

Pandas is a powerful package for data manipulation and analytics, and I believe this library is not new to many of you. However, sometimes when I analyze data in tabular form, all I observe is plain numbers, making it hard to notice characteristics of the values. I think it would be nice if I could apply more visual components or conditional formatting to my Pandas data frame.
Active Learning: An Exploratory Study of its Application in Statistics and R

In this post, we aim to implement some baseline active learning strategies in R, experiment on the famous Iris dataset, highlight some insights and suggest future directions for active learning in the Statistics domain. What is active learning? In many real-world situations of machine learning, unlabeled data are available at...
Phys.org

Bringing new life to ATLAS data

The ATLAS collaboration is breathing new life into its LHC Run 2 dataset, recorded from 2015 to 2018. Physicists will be reprocessing the entire dataset—nearly 18 PB of collision data—using an updated version of the ATLAS offline analysis software (Athena). Not only will this improve ATLAS physics measurements and searches, it will also position the collaboration well for the upcoming challenges of Run 3 and beyond.
How To Change The Column Type in PySpark DataFrames

Discussing how to cast the data types of columns in PySpark DataFrames. A fairly common operation in PySpark is type casting, which is usually required when we need to change the data type of specific columns in DataFrames. For instance, it’s quite common (and a bad practice!) to have datetimes stored as strings, or even integers and doubles stored as StringType.
Quantile Encoder

Tackling high cardinality categorical features in regression tasks. In this blog, we introduce the Quantile Encoder and Summary Encoder. This is a short synthesis of a published paper, done in collaboration with David Masip, Jordi Nin, Oriol Pujol & Carlos Mougan. The project contains: a Python implementation in the category_encoders...
Yet Another Largest Neural Network — But why?

GPT-3’s success keeps paying off, but do we really need to do this anymore? It has happened again. Microsoft and Nvidia have built a new largest dense language model, three times the size of GPT-3, the former holder of the title. However, in contrast with GPT-3, this new model hasn’t caused any commotion, neither in the press nor in the AI community. And there’s a reason for that.
Top 3 Visualization Python Packages to Help Your Data Science Activities

Data visualization is the process of summarizing your data in a graphic representation to help people understand it. Can you imagine data exploration or validation without data visualization? It is hard, right? Additionally, data visualization might reveal additional information we cannot find in the statistical summary. Data visualization also helps...
PC Magazine

Protect Your iOS Data for Life With This $70 App Bundle

Every new generation of Apple device comes with more bells and whistles, but upgrading and transferring your data doesn't always go smoothly. Your data can be lost with no easy way to get it back unless you took the time to make backups. Those who've been through that routine before...
The #1 Mistake Companies Make When Creating Their Data Science Foundation

Imagine you just finished training an excellent neural network after months of hard work. It works well on the training data and test data, and it passes all your validation tests. But as you move it to production, you start to notice that it doesn’t do a great job. If this sounds...
Don’t Do These 3 Things in Your Next Data Science Interview

In the data science interviews I have taken part in, I have seen applicants do some things they should have done and some they should not have. In this article, we will discuss what to avoid in your next data science interview (some of this also applies to non-data science interviews). Below, I give the top examples of things I personally think you should avoid in your interview, as well as what to do instead.
