START GUIDE

Model optimization techniques play a crucial role in machine learning. They are also used for hyperparameter tuning, which leads to better-performing models for a given dataset. This article shows different ways of tuning the hyperparameters of machine learning models in Python.

The previous article was about finding the best-performing machine learning algorithm for the given dataset.

These techniques are often the first step after exploratory data analysis to check whether the input features in a given dataset have enough predictive power. They are also an efficient way to explore various models, after which one can choose the top 10% of high-performing models for further study. Once we have a few models in our bag that are plausible candidates to perform well on the dataset, their hyperparameters are tuned to make them even better. One should…
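As a rough illustration of what hyperparameter tuning involves, the sketch below runs a plain grid search over a single hyperparameter, the regularization strength of a one-feature ridge regression, and keeps the value with the lowest validation error. The synthetic data, the grid, and the helper names (`fit_ridge`, `mse`) are assumptions made for this example and are not taken from the article.

```python
import random

def fit_ridge(xs, ys, lam):
    """Closed-form ridge fit for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumption): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

# Hold out the second half of the points for validation.
x_train, y_train = xs[:20], ys[:20]
x_val, y_val = xs[20:], ys[20:]

# Candidate values of the hyperparameter (hypothetical grid).
grid = [0.0, 0.01, 0.1, 1.0, 10.0]

# Fit once per candidate and record the validation error.
val_errors = {lam: mse(x_val, y_val, fit_ridge(x_train, y_train, lam)) for lam in grid}
best_lam = min(val_errors, key=val_errors.get)
print(f"best lambda: {best_lam}, validation MSE: {val_errors[best_lam]:.3f}")
```

The same loop extends to several hyperparameters by iterating over every combination of candidate values; libraries usually add cross-validation on top of this idea.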


START GUIDE

Data visualization techniques are a quick way to identify patterns and understand a complex dataset. As a result, they are widely used in several industries to present data to stakeholders. This article shows various data visualization techniques that can be helpful for your next data science project.

The human brain responds well to simple diagrams and visual content, and retains more information from them than from text or numbers. Therefore, representing a complex dataset in graphical form is an effective way to derive crucial insights and learn more about the data. The popularity of data visualization techniques can be gauged from the number of visualization tools now available on the market. Many online platforms and businesses use data visualization techniques to present data as visual content (infographics), which helps them deliver crucial information quickly. …


START GUIDE

Models trained on imbalanced datasets tend to perform poorly on minority classes because most machine learning algorithms for classification assume the classes are balanced. Handling imbalanced datasets incorrectly, or evaluating models with the wrong metrics, can cause severe problems if business decisions rely on the model’s outcome. This article shows a few tricks for working with such datasets.

Classification algorithms are machine learning techniques that categorize data into classes. Classification is a kind of supervised machine learning, in which algorithms learn from labeled data. Because the algorithms learn from labeled data, the distribution of classes plays an important role. For example, training algorithms on a severely skewed dataset, also known as an imbalanced dataset, can result in models that perform poorly on minority classes. Fraud detection, churn prediction, and spam detection are real-world examples of imbalanced datasets. …
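To make the point about evaluation metrics concrete, here is a minimal sketch with made-up labels (95 negatives and 5 positives, an assumption for illustration). A "model" that always predicts the majority class scores high on accuracy while finding none of the minority cases.

```python
from collections import Counter

# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: the share of actual positives that were found.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_pos = sum(t == 1 for t in y_true)
minority_recall = true_pos / actual_pos

print(f"accuracy: {accuracy:.2f}, minority recall: {minority_recall:.2f}")
# prints "accuracy: 0.95, minority recall: 0.00"
```

Accuracy alone hides the failure here, which is why per-class metrics such as recall or precision are usually reported for imbalanced problems.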


START GUIDE

This article shows quick ways of comparing multiple machine learning algorithms for classification or regression.

Imagine a situation where you want to test if the given dataset has sufficient features to train machine learning algorithms or to test different algorithms’ performance on the given dataset. Both cases are pretty common in the field of data science.

Usually, to test the features, one can train a model with no regularization and check whether the loss gets close to zero. This test quickly tells whether the model has enough parameters to memorize the dataset.
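The check described above can be sketched with a tiny linear model fitted by gradient descent without any regularization term. The helper name `train_loss_after_fit`, the learning rate, and the toy data are assumptions for illustration only: one target is fully determined by the feature, the other is unrelated to it.

```python
def train_loss_after_fit(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ w * x + b by gradient descent (no regularization); return the training MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
informative = [3 * x + 1 for x in xs]     # target fully determined by the feature
unrelated = [1.0, -1.0, 1.0, -1.0, 1.0]   # target the feature cannot explain

loss_ok = train_loss_after_fit(xs, informative)
loss_bad = train_loss_after_fit(xs, unrelated)
print(f"informative feature: {loss_ok:.6f}, unrelated feature: {loss_bad:.6f}")
```

A training loss that stays far from zero suggests that either the features carry too little signal or the model is too small to fit the data.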

Which algorithm to use?

The answer to the question is similar to the process of Exploratory Data Analysis…


START GUIDE

Exploratory Data Analysis (EDA) is the primary building block of any data-centric project. This article focuses on graphical and numerical ways of performing EDA using Python libraries such as Pandas, Seaborn, TensorFlow Data Validation, and Lux.

Data scientists widely use EDA to understand datasets for decision-making and data cleaning. EDA reveals crucial information about the data, such as hidden patterns, outliers, variance, covariance, and correlations between features. This information is essential for designing hypotheses and building better-performing models.
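A minimal numerical EDA pass with Pandas might look like the sketch below. The tiny dataset and its column names are made up for illustration, and the 1.5 * IQR rule is just one common convention for flagging outliers, not necessarily the one used in the article.

```python
import pandas as pd

# Small made-up dataset (assumption) with one obvious outlier in "price".
df = pd.DataFrame({
    "area": [50, 60, 70, 80, 90, 100],
    "price": [150, 180, 210, 240, 270, 900],
})

summary = df.describe()    # count, mean, std, min, quartiles, max per column
correlations = df.corr()   # pairwise Pearson correlations
variance = df.var()        # sample variance per column

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

print(summary, correlations, outliers, sep="\n\n")
```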


START GUIDE

This article shows how to create fantastic art using artificial neural networks.

A convolutional neural network may contain several stacked layers. Images fed as input to the network travel through successive layers, and the final decision is made by the output layer. However, several questions remain, such as

  • How do layers communicate with one another?
  • What does each layer see?
  • What kind of information passes from one layer to another?

Visualizing the output of a layer of interest by enhancing the input image helps to understand what is happening at each layer of the neural network. A trained convolutional neural network progressively…


PYTHON DATA ANALYSIS LIBRARY: PANDAS

Pandas is a Python data analysis library that has cemented its place in the data science world. Articles on the internet about top Python libraries for data science regularly list Pandas among their favorites. The library offers several functions that speed up data wrangling and exploratory data analysis. However, the first step in any data science project is to import data, and here too Pandas has some great functions to offer. This article shows ways to import data into Pandas from different data sources.
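As a small sketch of the idea, the snippet below loads the same records from CSV text, from JSON text, and from a Python dictionary. The in-memory strings stand in for real files and are an assumption for this example; `pd.read_csv` and `pd.read_json` also accept file paths.

```python
import io
import pandas as pd

# In-memory stand-ins for files (assumption); a file path can be passed instead.
csv_text = "name,score\nalice,90\nbob,85\n"
json_text = '[{"name": "alice", "score": 90}, {"name": "bob", "score": 85}]'

df_csv = pd.read_csv(io.StringIO(csv_text))       # delimited text
df_json = pd.read_json(io.StringIO(json_text))    # list of JSON records
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})  # Python objects

print(df_csv, df_json, df_dict, sep="\n\n")
```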

In 2008, Wes McKinney started developing Pandas library to fulfill the need…


MACHINE LEARNING MODEL DEPLOYMENT

Machine Learning (ML) based applications play a crucial role in scaling, automating, and optimizing processes in this era of digitization. The ML model development lifecycle comprises data cleaning, exploratory data analysis, model development, model training, and model serving. There are many articles on model development and training but not many on model serving. This article shows simple steps to deploy a trained ML model on Heroku.

Spending a considerable amount of time optimizing the ML model is one of the most common pitfalls behind unsuccessful ML projects. Instead, teams with successful ML projects invest time in gathering data, building efficient data pipelines to avoid training-serving skew, and building reliable model serving infrastructure. The following picture shows the steps involved in the ML development phases.

This article focuses on the ML model deployment step using Flask and Heroku. Flask is a micro web framework used for web application development, and it is a perfect choice for simple web applications. …


A/B Testing

Companies run experiments to understand demand and the changes likely to help their businesses generate more revenue. However, it is not an easy task. Even changing the color of a button on a website is not random but calculated. This article shows a few tricks widely used by data analysts and data scientists to build strategies for growing businesses efficiently.

Studies conducted by big companies have shown that changing even a minor feature, such as the response time by a few milliseconds, the color of a button, the welcome image, or the fonts, can significantly affect website traffic. A relatable example is posting a picture on social media. Why does a specific picture get more likes than others? Why does posting at a particular time lead to more engagement? Why are the logos of Facebook, Samsung, PayPal, IBM, and many more blue? Is it a coincidence? Or are there plausible reasons behind it?

A/B testing is widely used in marketing industries…
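One common way to judge an A/B experiment such as a button-color change is a two-proportion z-test on conversion rates. The sketch below is an illustration under stated assumptions: the counts are hypothetical, the helper name `two_proportion_z_test` is made up, and the article itself may use a different test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: variant A (current button) vs. variant B (new color).
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, given the sample sizes.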


STOCK PRICE FORECASTING MODELS

The popularity of deep learning models in the financial industry has grown drastically over the past decade. The rise of such models in this sector comes from fundamental requirements, i.e., automation, scaling, and personalization. However, should I use the models shown in various articles on the internet for stock prediction to become rich?

Forecasting stock prices is an exciting topic, and the number of articles published on the internet shows its popularity. However, many of them suffer from a fundamental error. This article describes some common pitfalls to avoid when creating a multi-step prediction model for stock prices.

Pitfall 1: Shuffling time-series data

Time-series data is sequential data measured at consistent time intervals. Each data point in the series depends heavily on the previous data points, and together they tell a story. Shuffling time-series data should be avoided during training so that the time dependency is retained. …
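A minimal sketch of a split that respects time order is shown below. The prices and the helper name `chronological_split` are hypothetical; the point is only that the test set comes strictly after the training set and nothing is shuffled.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into train and test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Hypothetical daily closing prices, oldest first.
prices = [101.2, 102.5, 101.9, 103.4, 104.0, 103.1, 105.6, 106.2, 105.9, 107.3]
train, test = chronological_split(prices)

print("train:", train)
print("test: ", test)
```

With a shuffled split, the model would see prices from after the test dates during training, which leaks future information into the evaluation.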

Rahul Pandey

Google Certified ML Engineer | Exploring possibilities of ML in Photovoltaics
