Telco

Table of Contents

1. Init

Let's start by importing the necessary modules.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

matplotlib.use('Agg') # non-interactive backend, produce pngs instead
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
# uncomment the following line to prevent truncated output
# pd.set_option('display.large_repr', 'info')

from context import src
from src import utils, plotter

2. Analysis

In this section we analyse the telco dataset. We start by reading the accompanying data docs which is scarce.

2.1. Preliminary analysis

We start by loading the dataset and answering out initial set of questions.

telco = pd.read_csv('../data/data/telco.csv')
telco.head()
   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1           No   
1  5575-GNVDE    Male              0      No         No      34          Yes   
2  3668-QPYBK    Male              0      No         No       2          Yes   
3  7795-CFOCW    Male              0      No         No      45           No   
4  9237-HQITU  Female              0      No         No       2          Yes   

      MultipleLines InternetService OnlineSecurity OnlineBackup  \
0  No phone service             DSL             No          Yes   
1                No             DSL            Yes           No   
2                No             DSL            Yes          Yes   
3  No phone service             DSL            Yes           No   
4                No     Fiber optic             No           No   

  DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \
0               No          No          No              No  Month-to-month   
1              Yes          No          No              No        One year   
2               No          No          No              No  Month-to-month   
3              Yes         Yes          No              No        One year   
4               No          No          No              No  Month-to-month   

  PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  \
0              Yes           Electronic check           29.85        29.85   
1               No               Mailed check           56.95       1889.5   
2              Yes               Mailed check           53.85       108.15   
3               No  Bank transfer (automatic)           42.30      1840.75   
4              Yes           Electronic check           70.70       151.65   

  Churn  
0    No  
1    No  
2   Yes  
3    No  
4   Yes  
telco.shape
7043 21
telco.dtypes
customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

We have several categorical features which should be represented as category dtype. The customerID column can be dropped since it doesn't add any new information. TotalCharges should be converted to float dtype. Let's also convert SeniorCitizen to yes or no format inline with other categorical features.

2.1.1. Handling customerID

telco = telco.drop('customerID', axis='columns')
telco.shape
7043 20

2.1.2. Handling categorical features

telco.loc[telco['SeniorCitizen'].eq(0), 'SeniorCitizen'] = 'No'
telco.loc[telco['SeniorCitizen'].eq(1), 'SeniorCitizen'] = 'Yes'
telco['SeniorCitizen'] = telco['SeniorCitizen'].astype('category')
categorical_features = ['gender',
                        'Partner',
                        'Dependents',
                        'PhoneService',
                        'MultipleLines',
                        'InternetService',
                        'OnlineSecurity',
                        'OnlineBackup',
                        'DeviceProtection',
                        'TechSupport',
                        'StreamingTV',
                        'StreamingMovies',
                        'Contract',
                        'PaperlessBilling',
                        'PaymentMethod',
                        'Churn']

for feature in categorical_features:
    telco[feature] = telco[feature].str.strip().astype('category')

2.1.3. Handling TotalCharges

telco['TotalCharges'] = telco['TotalCharges'].str.strip()
telco.loc[telco['TotalCharges'].apply(len).eq(0), 'TotalCharges'] = np.nan
telco['TotalCharges'] = telco['TotalCharges'].astype('float')
telco.dtypes
gender              category
SeniorCitizen       category
Partner             category
Dependents          category
tenure                 int64
PhoneService        category
MultipleLines       category
InternetService     category
OnlineSecurity      category
OnlineBackup        category
DeviceProtection    category
TechSupport         category
StreamingTV         category
StreamingMovies     category
Contract            category
PaperlessBilling    category
PaymentMethod       category
MonthlyCharges       float64
TotalCharges         float64
Churn               category
dtype: object

2.1.4. Descriptive statistics, missing & duplicates

Let's look at the descriptive statistics, missing & duplicates next.

telco.describe(include='all')
       gender SeniorCitizen Partner Dependents       tenure PhoneService  \
count    7043          7043    7043       7043  7043.000000         7043   
unique      2             2       2          2          NaN            2   
top      Male            No      No         No          NaN          Yes   
freq     3555          5901    3641       4933          NaN         6361   
mean      NaN           NaN     NaN        NaN    32.371149          NaN   
std       NaN           NaN     NaN        NaN    24.559481          NaN   
min       NaN           NaN     NaN        NaN     0.000000          NaN   
25%       NaN           NaN     NaN        NaN     9.000000          NaN   
50%       NaN           NaN     NaN        NaN    29.000000          NaN   
75%       NaN           NaN     NaN        NaN    55.000000          NaN   
max       NaN           NaN     NaN        NaN    72.000000          NaN   

       MultipleLines InternetService OnlineSecurity OnlineBackup  \
count           7043            7043           7043         7043   
unique             3               3              3            3   
top               No     Fiber optic             No           No   
freq            3390            3096           3498         3088   
mean             NaN             NaN            NaN          NaN   
std              NaN             NaN            NaN          NaN   
min              NaN             NaN            NaN          NaN   
25%              NaN             NaN            NaN          NaN   
50%              NaN             NaN            NaN          NaN   
75%              NaN             NaN            NaN          NaN   
max              NaN             NaN            NaN          NaN   

       DeviceProtection TechSupport StreamingTV StreamingMovies  \
count              7043        7043        7043            7043   
unique                3           3           3               3   
top                  No          No          No              No   
freq               3095        3473        2810            2785   
mean                NaN         NaN         NaN             NaN   
std                 NaN         NaN         NaN             NaN   
min                 NaN         NaN         NaN             NaN   
25%                 NaN         NaN         NaN             NaN   
50%                 NaN         NaN         NaN             NaN   
75%                 NaN         NaN         NaN             NaN   
max                 NaN         NaN         NaN             NaN   

              Contract PaperlessBilling     PaymentMethod  MonthlyCharges  \
count             7043             7043              7043     7043.000000   
unique               3                2                 4             NaN   
top     Month-to-month              Yes  Electronic check             NaN   
freq              3875             4171              2365             NaN   
mean               NaN              NaN               NaN       64.761692   
std                NaN              NaN               NaN       30.090047   
min                NaN              NaN               NaN       18.250000   
25%                NaN              NaN               NaN       35.500000   
50%                NaN              NaN               NaN       70.350000   
75%                NaN              NaN               NaN       89.850000   
max                NaN              NaN               NaN      118.750000   

        TotalCharges Churn  
count    7032.000000  7043  
unique           NaN     2  
top              NaN    No  
freq             NaN  5174  
mean     2283.300441   NaN  
std      2266.771362   NaN  
min        18.800000   NaN  
25%       401.450000   NaN  
50%      1397.475000   NaN  
75%      3794.737500   NaN  
max      8684.800000   NaN  
telco.isna().any()
gender              False
SeniorCitizen       False
Partner             False
Dependents          False
tenure              False
PhoneService        False
MultipleLines       False
InternetService     False
OnlineSecurity      False
OnlineBackup        False
DeviceProtection    False
TechSupport         False
StreamingTV         False
StreamingMovies     False
Contract            False
PaperlessBilling    False
PaymentMethod       False
MonthlyCharges      False
TotalCharges         True
Churn               False
dtype: bool

Let's drop the missing values we introduced in TotalCharges.

telco = telco.dropna()
telco.shape
7032 20
telco[telco.duplicated()].shape
22 20

Let's investigate the duplicates.

telco[telco.duplicated(keep=False)]
      gender SeniorCitizen Partner Dependents  tenure PhoneService  \
22      Male            No      No         No       1          Yes   
100     Male            No      No         No       1          Yes   
542   Female            No      No         No       1          Yes   
646     Male            No      No         No       1          Yes   
662     Male            No      No         No       1          Yes   
690     Male            No      No         No       1          Yes   
964     Male            No      No         No       1          Yes   
976     Male            No      No         No       1          Yes   
1243    Male            No      No         No       1          Yes   
1338    Male            No      No         No       1          Yes   
1491  Female            No      No         No       1          Yes   
1731  Female           Yes      No         No       1          Yes   
1739    Male            No      No         No       1          Yes   
1932    Male            No      No         No       1          Yes   
2033    Male            No      No         No       1          Yes   
2121    Male            No      No         No       1          Yes   
2713    Male            No      No         No       1          Yes   
2892    Male            No      No         No       1          Yes   
3301  Female           Yes      No         No       1          Yes   
3312    Male            No      No         No       1          Yes   
3499  Female            No      No         No       1          Yes   
3679    Male            No      No         No       1          Yes   
3754    Male            No      No         No       1          Yes   
4098    Male            No      No         No       1          Yes   
4476  Female            No      No         No       1          Yes   
4495  Female            No      No         No       1          Yes   
4536    Male            No      No         No       1          Yes   
4817  Female            No      No         No       1          Yes   
5170  Female            No      No         No       1          Yes   
5506    Male            No      No         No       1          Yes   
5522  Female            No      No         No       1          Yes   
5736    Male            No      No         No       1          Yes   
5759  Female            No      No         No       1          Yes   
6267  Female            No      No         No       1          Yes   
6491  Female            No      No         No       1          Yes   
6499    Male            No      No         No       1          Yes   
6518    Male            No      No         No       1          Yes   
6609    Male            No      No         No       1          Yes   
6706  Female            No      No         No       1          Yes   
6764  Female            No      No         No       1          Yes   
6774  Female            No      No         No       1          Yes   
6924    Male            No      No         No       1          Yes   

     MultipleLines InternetService       OnlineSecurity         OnlineBackup  \
22              No              No  No internet service  No internet service   
100             No              No  No internet service  No internet service   
542             No              No  No internet service  No internet service   
646             No             DSL                   No                   No   
662             No              No  No internet service  No internet service   
690             No              No  No internet service  No internet service   
964             No             DSL                   No                   No   
976             No     Fiber optic                   No                   No   
1243            No             DSL                   No                   No   
1338            No              No  No internet service  No internet service   
1491            No              No  No internet service  No internet service   
1731            No     Fiber optic                   No                   No   
1739            No     Fiber optic                   No                   No   
1932            No              No  No internet service  No internet service   
2033            No              No  No internet service  No internet service   
2121            No              No  No internet service  No internet service   
2713            No              No  No internet service  No internet service   
2892            No              No  No internet service  No internet service   
3301            No     Fiber optic                   No                   No   
3312            No              No  No internet service  No internet service   
3499            No              No  No internet service  No internet service   
3679            No              No  No internet service  No internet service   
3754            No              No  No internet service  No internet service   
4098            No              No  No internet service  No internet service   
4476            No              No  No internet service  No internet service   
4495            No     Fiber optic                   No                   No   
4536            No     Fiber optic                   No                   No   
4817            No              No  No internet service  No internet service   
5170            No              No  No internet service  No internet service   
5506            No              No  No internet service  No internet service   
5522            No     Fiber optic                   No                   No   
5736            No              No  No internet service  No internet service   
5759            No     Fiber optic                   No                   No   
6267            No     Fiber optic                   No                   No   
6491            No     Fiber optic                   No                   No   
6499            No              No  No internet service  No internet service   
6518            No             DSL                   No                   No   
6609            No              No  No internet service  No internet service   
6706            No              No  No internet service  No internet service   
6764            No     Fiber optic                   No                   No   
6774            No              No  No internet service  No internet service   
6924            No     Fiber optic                   No                   No   

         DeviceProtection          TechSupport          StreamingTV  \
22    No internet service  No internet service  No internet service   
100   No internet service  No internet service  No internet service   
542   No internet service  No internet service  No internet service   
646                    No                   No                   No   
662   No internet service  No internet service  No internet service   
690   No internet service  No internet service  No internet service   
964                    No                   No                   No   
976                    No                   No                   No   
1243                   No                   No                   No   
1338  No internet service  No internet service  No internet service   
1491  No internet service  No internet service  No internet service   
1731                   No                   No                   No   
1739                   No                   No                   No   
1932  No internet service  No internet service  No internet service   
2033  No internet service  No internet service  No internet service   
2121  No internet service  No internet service  No internet service   
2713  No internet service  No internet service  No internet service   
2892  No internet service  No internet service  No internet service   
3301                   No                   No                   No   
3312  No internet service  No internet service  No internet service   
3499  No internet service  No internet service  No internet service   
3679  No internet service  No internet service  No internet service   
3754  No internet service  No internet service  No internet service   
4098  No internet service  No internet service  No internet service   
4476  No internet service  No internet service  No internet service   
4495                   No                   No                   No   
4536                   No                   No                   No   
4817  No internet service  No internet service  No internet service   
5170  No internet service  No internet service  No internet service   
5506  No internet service  No internet service  No internet service   
5522                   No                   No                   No   
5736  No internet service  No internet service  No internet service   
5759                   No                   No                   No   
6267                   No                   No                   No   
6491                   No                   No                   No   
6499  No internet service  No internet service  No internet service   
6518                   No                   No                   No   
6609  No internet service  No internet service  No internet service   
6706  No internet service  No internet service  No internet service   
6764                   No                   No                   No   
6774  No internet service  No internet service  No internet service   
6924                   No                   No                   No   

          StreamingMovies        Contract PaperlessBilling     PaymentMethod  \
22    No internet service  Month-to-month               No      Mailed check   
100   No internet service  Month-to-month               No      Mailed check   
542   No internet service  Month-to-month               No      Mailed check   
646                    No  Month-to-month              Yes      Mailed check   
662   No internet service  Month-to-month               No      Mailed check   
690   No internet service  Month-to-month              Yes      Mailed check   
964                    No  Month-to-month              Yes      Mailed check   
976                    No  Month-to-month              Yes  Electronic check   
1243                   No  Month-to-month               No  Electronic check   
1338  No internet service  Month-to-month               No      Mailed check   
1491  No internet service  Month-to-month               No      Mailed check   
1731                   No  Month-to-month              Yes  Electronic check   
1739                   No  Month-to-month              Yes  Electronic check   
1932  No internet service  Month-to-month               No      Mailed check   
2033  No internet service  Month-to-month              Yes      Mailed check   
2121  No internet service  Month-to-month               No      Mailed check   
2713  No internet service  Month-to-month              Yes      Mailed check   
2892  No internet service  Month-to-month               No      Mailed check   
3301                   No  Month-to-month              Yes  Electronic check   
3312  No internet service  Month-to-month               No      Mailed check   
3499  No internet service  Month-to-month               No      Mailed check   
3679  No internet service  Month-to-month              Yes      Mailed check   
3754  No internet service  Month-to-month               No      Mailed check   
4098  No internet service  Month-to-month              Yes      Mailed check   
4476  No internet service  Month-to-month               No      Mailed check   
4495                   No  Month-to-month              Yes  Electronic check   
4536                   No  Month-to-month              Yes  Electronic check   
4817  No internet service  Month-to-month               No      Mailed check   
5170  No internet service  Month-to-month               No      Mailed check   
5506  No internet service  Month-to-month               No      Mailed check   
5522                   No  Month-to-month              Yes      Mailed check   
5736  No internet service  Month-to-month               No      Mailed check   
5759                   No  Month-to-month              Yes      Mailed check   
6267                   No  Month-to-month              Yes  Electronic check   
6491                   No  Month-to-month              Yes  Electronic check   
6499  No internet service  Month-to-month               No      Mailed check   
6518                   No  Month-to-month               No  Electronic check   
6609  No internet service  Month-to-month              Yes      Mailed check   
6706  No internet service  Month-to-month               No      Mailed check   
6764                   No  Month-to-month              Yes  Electronic check   
6774  No internet service  Month-to-month               No      Mailed check   
6924                   No  Month-to-month              Yes  Electronic check   

      MonthlyCharges  TotalCharges Churn  
22             20.15         20.15   Yes  
100            20.20         20.20    No  
542            19.55         19.55    No  
646            45.70         45.70   Yes  
662            20.05         20.05    No  
690            20.45         20.45    No  
964            45.70         45.70   Yes  
976            69.90         69.90   Yes  
1243           45.30         45.30   Yes  
1338           20.15         20.15   Yes  
1491           19.55         19.55    No  
1731           69.60         69.60   Yes  
1739           69.90         69.90   Yes  
1932           20.20         20.20    No  
2033           20.20         20.20   Yes  
2121           20.45         20.45    No  
2713           20.45         20.45    No  
2892           20.45         20.45    No  
3301           69.60         69.60   Yes  
3312           20.30         20.30    No  
3499           20.90         20.90   Yes  
3679           20.10         20.10   Yes  
3754           20.05         20.05    No  
4098           20.20         20.20   Yes  
4476           20.90         20.90   Yes  
4495           70.10         70.10   Yes  
4536           69.35         69.35   Yes  
4817           19.90         19.90    No  
5170           19.65         19.65    No  
5506           20.20         20.20    No  
5522           70.15         70.15   Yes  
5736           20.05         20.05    No  
5759           70.15         70.15   Yes  
6267           70.10         70.10   Yes  
6491           69.20         69.20   Yes  
6499           20.30         20.30    No  
6518           45.30         45.30   Yes  
6609           20.10         20.10   Yes  
6706           19.90         19.90    No  
6764           69.20         69.20   Yes  
6774           19.65         19.65    No  
6924           69.35         69.35   Yes  

We see that the churn feature varies for the duplicate entries. Since we have so many categorical features, it is likely that the same values occur for several customers. The same value for MonthlyCharges & TotalCharges is also possible since they can be on a fixed cost plan/subscription. We don't have to drop the duplicates for this dataset.

2.1.5. Correlations

name = 'heatmap@telco--corr.png'
corr = telco.corr()

plotter.corr(corr, name)
name

heatmap@telco--corr.png

tenure & MonthlyCharges are positively correlated with TotalCharges which is expected.

Date: 2021-10-27 Wed 00:00

Created: 2021-10-27 Wed 15:23