VII. Product/SKU Segmentation Implementation with Trendify Product 

  • Trendify Segmentation Algorithm
  • Monitoring Results for Segmentation Algorithm

Trendify Segmentation Algorithm

In this article, we apply the Trendify segmentation algorithm and the product segmentation process to product data from a sample textile retailer. In the first stage, the data was prepared: missing data was filled in, outliers were detected, the data was scaled, and dimensionality reduction (PCA) was applied. In the second stage, hyperparameter optimization and training were carried out to create the clustering models. In the last stage, the resulting product groups were profiled automatically.

The demo dataset is described in detail in the Product SKU Segmentation 6 blog article: Interpreting Cluster Profiling Results with Sample Data

The dataset can also be accessed from the GitHub account.

Image1: Converting Data Types

In the first step, the data types were corrected. Values stored in a date format were converted to the datetime type. Then, all columns stored as strings but containing numeric values were converted to the float type, so that these values can be fed into the clustering model. After this, values that contain currency markers such as TL, dollar or euro were automatically converted to float values. The purpose is to let a column that holds currency strings contribute to the model.
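The conversions described above can be sketched with pandas. This is a minimal illustration, not the Trendify implementation: the function name `convert_types`, the currency pattern, and the assumption that a comma is a thousands separator are all hypothetical choices made for the example.

```python
import pandas as pd

# Assumed currency markers (hypothetical list); extend it for your own data.
CURRENCY_PATTERN = r"(?i)(TL|USD|EUR|[₺$€])"


def convert_types(df: pd.DataFrame) -> pd.DataFrame:
    """Convert string columns to float or datetime when every value parses."""
    out = df.copy()
    for col in out.columns:
        already_typed = pd.api.types.is_numeric_dtype(
            out[col]
        ) or pd.api.types.is_datetime64_any_dtype(out[col])
        present = out[col].notna()
        if already_typed or not present.any():
            continue
        text = out[col].astype(str).str.strip()
        # Strip currency markers and commas (assumes "," separates thousands).
        cleaned = (
            text.str.replace(CURRENCY_PATTERN, "", regex=True)
            .str.replace(",", "", regex=False)
            .str.strip()
        )
        numeric = pd.to_numeric(cleaned, errors="coerce")
        if numeric[present].notna().all():
            out[col] = numeric.astype(float)
            continue
        # Not numeric: try dates, and keep the column as text if that fails too.
        dates = pd.to_datetime(text.where(present), errors="coerce")
        if dates[present].notna().all():
            out[col] = dates
    return out
```

A column is only converted when every non-missing value parses, so free-text columns such as a category name are left untouched.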

In the second step of data preparation, duplicate rows were removed. In clustering problems, every data point affects the center of gravity of its cluster, so repeated rows would distort the clusters. The ID column is included in the duplicate check, which means a row is dropped only when it repeats in every column, including the ID. Since there are no repeating rows in this data, no rows were deleted in this step. After this step, columns whose values are all identical are checked. Any such column is also dropped.
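As a rough sketch of this step with pandas, the helper below drops rows that repeat across all columns (ID included) and then drops constant columns. The function name is hypothetical, and reading "columns with exactly the same value" as "columns that hold a single value" is an assumption.

```python
import pandas as pd


def drop_duplicates_and_constants(df: pd.DataFrame) -> pd.DataFrame:
    """Drop fully repeated rows, then columns that hold a single value."""
    # All columns, including the ID, take part in the duplicate check.
    deduped = df.drop_duplicates().reset_index(drop=True)
    # A column with one distinct value carries no information for clustering.
    constant_cols = [
        col for col in deduped.columns if deduped[col].nunique(dropna=False) <= 1
    ]
    return deduped.drop(columns=constant_cols)
```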

Datetime encoding is performed after the duplicate rows are removed. In this step, all datetime values were converted to numeric values by applying circular or linear transformations to the month, day and year values. The aim is to make the dates useful to the model, because values in a date format cannot be fed directly into a clustering model.
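One common way to realize such an encoding is a sine/cosine (circular) transform for month and day and a linear offset for the year. The sketch below assumes exactly that; the function name, the generated column names and the fixed period of 31 for days are illustrative choices, not details taken from the Trendify module.

```python
import numpy as np
import pandas as pd


def encode_datetime(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace a datetime column with circular month/day and linear year features."""
    out = df.drop(columns=[column]).copy()
    dates = pd.to_datetime(df[column])
    # Circular encoding: December and January end up close to each other.
    month_angle = 2 * np.pi * (dates.dt.month - 1) / 12
    day_angle = 2 * np.pi * (dates.dt.day - 1) / 31
    out[f"{column}_month_sin"] = np.sin(month_angle)
    out[f"{column}_month_cos"] = np.cos(month_angle)
    out[f"{column}_day_sin"] = np.sin(day_angle)
    out[f"{column}_day_cos"] = np.cos(day_angle)
    # Linear encoding: years since the earliest year in the data.
    out[f"{column}_year"] = dates.dt.year - dates.dt.year.min()
    return out
```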

Image2: Dropping Outliers

The next step is to detect outliers and remove them from the data. For outlier detection, the Trendify data preparation module iteratively searches for outliers at different standard deviation thresholds and chooses the appropriate threshold according to the size of the data. In this way, an appropriate number of outliers is removed from the data. The outliers that are removed are then labeled as a separate segmentation group.
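The article does not give the exact selection rule, so the following is only a sketch of the idea under stated assumptions: it tries increasing z-score thresholds and accepts the first one that flags at most a given fraction of the rows. The threshold list, the `max_fraction` parameter and the function name are hypothetical.

```python
import numpy as np


def flag_outliers(X, thresholds=(2.0, 2.5, 3.0, 3.5, 4.0), max_fraction=0.05):
    """Return (outlier_mask, threshold) for the first acceptable threshold."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = np.abs((X - X.mean(axis=0)) / std)
    max_z = z.max(axis=1)  # a row is an outlier if any feature is extreme
    mask = np.zeros(len(X), dtype=bool)
    for threshold in thresholds:
        mask = max_z > threshold
        if mask.mean() <= max_fraction:
            return mask, threshold
    # No threshold met the limit: fall back to the strictest one tried.
    return mask, thresholds[-1]
```

The flagged rows can then be kept aside and given their own label (for example `-1`) so that they appear as a separate group in the final segmentation.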

Image3: Scaling Data

In the scaling phase, Standard Scaling or Normalization is performed in order to bring the features to the same scale. Which scaling method to use is decided according to the needs of the model. The purpose of scaling is to bring the different values in the columns into a comparable range, such as [-1, 1] or [0, 1].
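For illustration, both options can be expressed with scikit-learn's `StandardScaler` and `MinMaxScaler`. The wrapper function and its `method` argument are hypothetical; the article does not say which library or rule Trendify uses to choose between the two.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def scale_features(X, method="standard"):
    """Scale every column with standard scaling or min-max normalization."""
    if method == "standard":
        scaler = StandardScaler()  # zero mean, unit variance per column
    elif method == "minmax":
        scaler = MinMaxScaler(feature_range=(0, 1))  # each column in [0, 1]
    else:
        raise ValueError(f"unknown scaling method: {method}")
    return scaler.fit_transform(np.asarray(X, dtype=float))
```

Note that standard scaling centers each column around 0 but does not bound it to a fixed interval, whereas min-max normalization does.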

Image4: Filling Missing Values

When filling in missing values, statistics such as the mean, median or mode of the variable are generally preferred. Model-based filling methods such as KNN-Imputer can also be used. Here, the Trendify data preparation module follows a filling method specific to segmentation algorithms. The purpose is to represent the missing values separately, so that the clustering algorithms can recognize the missing data.
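The article does not describe the filling method itself. One simple way to get a similar effect, shown here purely as an assumption, is to fill missing numeric values with a sentinel placed below the observed range of the column, so that rows with missing data land away from the rest. The function name and the `offset` parameter are hypothetical.

```python
import pandas as pd


def fill_missing_outside_range(df: pd.DataFrame, offset: float = 1.0) -> pd.DataFrame:
    """Fill missing numeric values with a sentinel below the observed range."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if not (out[col].isna().any() and out[col].notna().any()):
            continue
        low, high = out[col].min(), out[col].max()
        spread = (high - low) or 1.0  # constant column: use a unit spread
        # The sentinel sits `offset` spreads below the smallest observed value.
        out[col] = out[col].fillna(low - offset * spread)
    return out
```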

In the graphic below, you can see the data with filled missing values after its dimensionality has been reduced with PCA. The section enclosed in the red circle shows the rows with missing data. Thanks to the Trendify preparation module, missing data is positioned separately from normal data and is passed to the model in this form.

Image5: Data with Filled Missing Values

The last step in the data preparation phase is PCA dimensionality reduction. At this stage, the correlation between features is checked to decide how many dimensions the data should have. Thanks to PCA, redundant information in the data is removed, storage is saved and the model produces results faster. There are 8 columns in this data. With PCA, the data has been reduced to 5 columns.
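A minimal sketch of this step with scikit-learn is shown below. It chooses the number of components from a cumulative explained-variance threshold, which is a stand-in for the correlation check mentioned above and not necessarily the rule Trendify applies. The function name and the 0.95 default are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(X, variance_to_keep=0.95):
    """Project scaled data onto the fewest components that keep the variance."""
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    # A float in (0, 1) tells PCA to keep enough components to explain
    # at least that share of the total variance.
    pca = PCA(n_components=variance_to_keep)
    return pca.fit_transform(X_scaled), pca
```

When some columns are strongly correlated, as in a dataset where 8 columns carry only 5 independent signals, this kind of rule keeps fewer components than there are columns.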

Image6: Dimensionality Reduction with PCA

After the data preparation steps are completed, the model creation phase begins. The Trendify Segmentation algorithm performs hyperparameter optimization to find the most suitable model parameters, and it aims to find them as quickly as possible. Once the optimal parameters are found, one model is selected among the trained models according to the evaluation scores. If the clusters formed by the selected model are sufficiently separated, the process ends and the model is final. However, if the clusters are not sufficiently separated and a large part of the data is still in a single cluster, clustering is performed again on that cluster. The process continues iteratively until the clusters are sufficiently separated.
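The iterative idea can be sketched as follows. This is not the Trendify algorithm: it assumes KMeans as the only model, a silhouette-based search over the number of clusters as the hyperparameter optimization, and "sufficiently separated" meaning that no cluster holds more than `max_share` of the data. All names and defaults are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_kmeans_labels(X, k_values=range(2, 7), random_state=0):
    """Return the KMeans labels with the best silhouette score, or None."""
    best_score, best_labels = -1.0, None
    for k in k_values:
        if k >= len(X):
            break
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        if len(np.unique(labels)) < 2:
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels


def iterative_segmentation(X, max_share=0.5, max_rounds=5):
    """Re-cluster the largest cluster until no cluster exceeds max_share."""
    X = np.asarray(X, dtype=float)
    labels = best_kmeans_labels(X)
    if labels is None:
        return np.zeros(len(X), dtype=int)
    for _ in range(max_rounds):
        ids, counts = np.unique(labels, return_counts=True)
        if counts.max() / len(X) <= max_share:
            break  # clusters are considered sufficiently divided
        members = np.flatnonzero(labels == ids[np.argmax(counts)])
        sub_labels = best_kmeans_labels(X[members])
        if sub_labels is None:
            break
        # Sub-cluster 0 keeps the old id; the others receive fresh ids.
        offset = labels.max()
        for sub_id in np.unique(sub_labels):
            if sub_id != 0:
                labels[members[sub_labels == sub_id]] = offset + sub_id
    return labels
```

With data made of one small, distant group and several nearby groups, the first pass tends to return two clusters, and the later passes split the large one further.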

For more on finding niche segments, see the Product SKU Segmentation 5 blog article: Identifying Niche Segments

Image7: Trendify Segmentation and Other Model Results

The picture above shows the clustering results of the models. For each algorithm that was run, it lists one of its hyperparameters, the scores of the evaluation metrics and the number of clusters obtained. Compared by evaluation metrics, the model with the best Silhouette, Calinski-Harabasz and Davies-Bouldin scores is the hierarchical clustering algorithm, which divided the data into 2 clusters. The Trendify Segmentation algorithm, on the other hand, divided the data into 8 clusters. The difference is that the Trendify Segmentation algorithm splits again the clusters that were not divided enough. The graphs below show the clustering results of hierarchical clustering and the Trendify Segmentation algorithm in 3D.
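The three evaluation metrics named above are available in scikit-learn, so a comparison table of this kind can be approximated as in the sketch below. The two models and their settings are placeholders chosen for the example and do not reproduce the models or parameters in the picture.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_labels(X, labels):
    """Collect the three internal evaluation metrics for one clustering."""
    return {
        "n_clusters": int(len(np.unique(labels))),
        "silhouette": float(silhouette_score(X, labels)),  # higher is better
        "calinski_harabasz": float(calinski_harabasz_score(X, labels)),  # higher is better
        "davies_bouldin": float(davies_bouldin_score(X, labels)),  # lower is better
    }


def compare_models(X, n_clusters=2):
    """Fit a few clustering models and score each one."""
    models = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "hierarchical": AgglomerativeClustering(n_clusters=n_clusters),
    }
    return {name: score_labels(X, m.fit_predict(X)) for name, m in models.items()}
```

These metrics reward compact, well-separated clusters, which is one reason a model with only 2 clusters can score best even when one of those clusters is too large to be useful.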

Image8: Hierarchical Clustering Results (Cluster 0, Cluster 1)

The 3D graph of the hierarchical clustering results contains 2 clusters, named cluster 0 and cluster 1. Cluster 0 does not appear to be separated enough.

The second graph shows the Trendify Segmentation clustering results. It shows how the Trendify Segmentation algorithm repartitions the cluster that the hierarchical clustering algorithm defined as cluster 0. Small clusters are not included in this graph. The graph makes it clear that cluster 0, which was one large cluster, has been divided several times and has become more useful, easier to assess and easier to interpret.

Image9: Trendify Segmentation Results (Cluster 0)

The final stage is the profiling of the clustering results. All clusters and an overview of the clusters are shown in the view below.

Image10: Trendify Demo Product Segmentation Result

For a detailed evaluation of the cluster profiles, see our earlier Product SKU Segmentation 6 article: Interpreting Cluster Profiling Results with Sample Data.

In this article, we used the Trendify Segmentation algorithm to prepare the product data of a textile retailer, create and train the models, profile the resulting clusters and evaluate the results. With the Trendify Segmentation algorithm, you can extract groups of products with common features and, by mastering your data, turn them into value for your business. For more information or to contact us, you can visit the Trendify website.

The Author: Mustafa Gencer

Publishing Date : February 01, 2022
