YouTube


1. Init

Let's start by importing the necessary modules.

from context import src
from context import pd, np, sns, plt
from src import utils, plotter

2. Analysis

In this section we analyse the YouTube dataset. We start by reading the dataset documentation, which gives us some preliminary information about the dataset and its features.

The data is split into one file per country. For this analysis we consider only the US data, because:

  • the data smells are unlikely to differ much between countries, and
  • for most of the other countries, the content is not in English.

2.1. Preliminary analysis

For the purpose of this analysis, the USvideos.csv file was renamed to youtube.csv.
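The `utils` module is project-specific and its code is not shown. Below is a minimal sketch of what `utils.data_path` might do, together with the rename step. It assumes the CSV files live in a `data/` directory. The directory name and the `rename_us_file` helper are hypothetical.

```python
from pathlib import Path

# Hypothetical location of the raw CSV files; adjust to the project layout.
DATA_DIR = Path('data')


def data_path(filename, data_dir=DATA_DIR):
    """Return the path of `filename` inside the data directory (hypothetical helper)."""
    return Path(data_dir) / filename


def rename_us_file(data_dir=DATA_DIR):
    """Rename USvideos.csv to youtube.csv, leaving an existing youtube.csv untouched."""
    source = data_path('USvideos.csv', data_dir)
    target = data_path('youtube.csv', data_dir)
    # Only rename when the original exists and the target has not been created yet.
    if source.exists() and not target.exists():
        source.rename(target)
    return target
```

With this sketch, `pd.read_csv(data_path('youtube.csv'))` would read the renamed file. The rename is skipped when `youtube.csv` already exists, so running the step twice is harmless.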

youtube = pd.read_csv(utils.data_path('youtube.csv'))
youtube.head()
      video_id trending_date  \
0  2kyS6SvSYSE      17.14.11   
1  1ZAPwfrtAFY      17.14.11   
2  5qpjK5DgCt4      17.14.11   
3  puqaWrEC7tY      17.14.11   
4  d380meD0W0M      17.14.11   

                                                            title  \
0                              WE WANT TO TALK ABOUT OUR MARRIAGE   
1  The Trump Presidency: Last Week Tonight with John Oliver (HBO)   
2           Racist Superman | Rudy Mancuso, King Bach & Lele Pons   
3                                Nickelback Lyrics: Real or Fake?   
4                                        I Dare You: GOING BALD!?   

           channel_title  category_id              publish_time  \
0           CaseyNeistat           22  2017-11-13T17:13:01.000Z   
1        LastWeekTonight           24  2017-11-13T07:30:00.000Z   
2           Rudy Mancuso           23  2017-11-12T19:05:24.000Z   
3  Good Mythical Morning           24  2017-11-13T11:00:04.000Z   
4               nigahiga           24  2017-11-12T18:01:41.000Z   

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            tags  \
0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                SHANtell martin   
1                                                                                                                                                                                                                                                                                                                                                                                         last week tonight trump presidency|"last week tonight donald trump"|"john oliver trump"|"donald trump"   
2                                                                                                                                                                     racist superman|"rudy"|"mancuso"|"king"|"bach"|"racist"|"superman"|"love"|"rudy mancuso poo bear black white official music video"|"iphone x by pineapple"|"lelepons"|"hannahstocking"|"rudymancuso"|"inanna"|"anwar"|"sarkis"|"shots"|"shotsstudios"|"alesso"|"anitta"|"brazil"|"Getting My Driver's License | Lele Pons"   
3  rhett and link|"gmm"|"good mythical morning"|"rhett and link good mythical morning"|"good mythical morning rhett and link"|"mythical morning"|"Season 12"|"nickelback lyrics"|"nickelback lyrics real or fake"|"nickelback"|"nickelback songs"|"nickelback song"|"rhett link nickelback"|"gmm nickelback"|"lyrics (website category)"|"nickelback (musical group)"|"rock"|"music"|"lyrics"|"chad kroeger"|"canada"|"music (industry)"|"mythical"|"gmm challenge"|"comedy"|"funny"|"challenge"   
4                                                                                                                                                                                                                                                                                                                                                                       ryan|"higa"|"higatv"|"nigahiga"|"i dare you"|"idy"|"rhpc"|"dares"|"no truth"|"comments"|"comedy"|"funny"|"stupid"|"fail"   

     views   likes  dislikes  comment_count  \
0   748374   57527      2966          15954   
1  2418783   97185      6146          12703   
2  3191434  146033      5339           8181   
3   343168   10172       666           2146   
4  2095731  132235      1989          17518   

                                   thumbnail_link  comments_disabled  \
0  https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg              False   
1  https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg              False   
2  https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg              False   
3  https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg              False   
4  https://i.ytimg.com/vi/d380meD0W0M/default.jpg              False   

   ratings_disabled  video_error_or_removed  \
0             False                   False   
1             False                   False   
2             False                   False   
3             False                   False   
4             False                   False   

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          description  
0  SHANTELL'S CHANNEL - https://www.youtube.com/shantellmartin\nCANDICE - https://www.lovebilly.com\n\nfilmed this video in 4k on this -- http://amzn.to/2sTDnRZ\nwith this lens -- http://amzn.to/2rUJOmD\nbig drone - http://tinyurl.com/h4ft3oy\nOTHER GEAR ---  http://amzn.to/2o3GLX5\nSony CAMERA http://amzn.to/2nOBmnv\nOLD CAMERA; http://amzn.to/2o2cQBT\nMAIN LENS; http://amzn.to/2od5gBJ\nBIG SONY CAMERA; http://amzn.to/2nrdJRO\nBIG Canon CAMERA; http://tinyurl.com/jn4q4vz\nBENDY TRIPOD THING; http://tinyurl.com/gw3ylz2\nYOU NEED THIS FOR THE BENDY TRIPOD; http://tinyurl.com/j8mzzua\nWIDE LENS; http://tinyurl.com/jkfcm8t\nMORE EXPENSIVE WIDE LENS; http://tinyurl.com/zrdgtou\nSMALL CAMERA; http://tinyurl.com/hrrzhor\nMICROPHONE; http://tinyurl.com/zefm4jy\nOTHER MICROPHONE; http://tinyurl.com/jxgpj86\nOLD DRONE (cheaper but still great);http://tinyurl.com/zcfmnmd\n\nfollow me; on http://instagram.com/caseyneistat\non https://www.facebook.com/cneistat\non https://twitter.com/CaseyNeistat\n\namazing intro song by https://soundcloud.com/discoteeth\n\nad disclosure.  THIS IS NOT AN AD.  not selling or promoting anything.  but samsung did produce the Shantell Video as a 'GALAXY PROJECT' which is an initiative that enables creators like Shantell and me to make projects we might otherwise not have the opportunity to make.  hope that's clear.  if not ask in the comments and i'll answer any specifics.  
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              One year after the presidential election, John Oliver discusses what we've learned so far and enlists our catheter cowboy to teach Donald Trump what he hasn't.\n\nConnect with Last Week Tonight online...\n\nSubscribe to the Last Week Tonight YouTube channel for more almost news as it almost happens: www.youtube.com/user/LastWeekTonight\n\nFind Last Week Tonight on Facebook like your mom would: http://Facebook.com/LastWeekTonight\n\nFollow us on Twitter for news about jokes and jokes about news: http://Twitter.com/LastWeekTonight\n\nVisit our official site for all that other stuff at once: http://www.hbo.com/lastweektonight  
2                                                                                                                                                                                                                                           WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► https://www.youtube.com/channel/UC5jkXpfnBhlDjqh0ir5FsIQ?sub_confirmation=1\n\nTHANKS FOR WATCHING! LIKE & SUBSCRIBE FOR MORE VIDEOS!\n-----------------------------------------------------------\nFIND ME ON: \nInstagram | http://instagram.com/rudymancuso\nTwitter | http://twitter.com/rudymancuso\nFacebook | http://facebook.com/rudymancuso\n\nCAST: \nRudy Mancuso | http://youtube.com/c/rudymancuso\nLele Pons | http://youtube.com/c/lelepons\nKing Bach | https://youtube.com/user/BachelorsPadTv\n\nVideo Effects: \nCaleb Natale | https://instagram.com/calebnatale\n\nPA:\nPaulina Gregory\n\n\nShots Studios Channels:\nAlesso | https://youtube.com/c/alesso\nAnitta | http://youtube.com/c/anitta\nAnwar Jibawi | http://youtube.com/c/anwar\nAwkward Puppets | http://youtube.com/c/awkwardpuppets\nHannah Stocking | http://youtube.com/c/hannahstocking\nInanna Sarkis | http://youtube.com/c/inanna\nLele Pons | http://youtube.com/c/lelepons\nMaejor | http://youtube.com/c/maejor\nMike Tyson | http://youtube.com/c/miketyson \nRudy Mancuso | http://youtube.com/c/rudymancuso\nShots Studios | http://youtube.com/c/shots\n\n#Rudy\n#RudyMancuso  
3         Today we find out if Link is a Nickelback amateur or a secret Nickelback devotee. GMM #1218\nDon't miss an all new Ear Biscuits: https://goo.gl/xeZNQt\nWatch Part 4: https://youtu.be/MhCdiiB8CQg | Watch Part 2: https://youtu.be/7qiOrNao9fg\nWatch today's episode from the start: http://bit.ly/GMM1218\n\nPick up all of the official GMM merch only at https://mythical.store\n\nFollow Rhett & Link: \nInstagram: https://instagram.com/rhettandlink\nFacebook: https://facebook.com/rhettandlink\nTwitter: https://twitter.com/rhettandlink\nTumblr: https://rhettandlink.tumblr.com\nSnapchat: @realrhettlink\nWebsite: https://mythical.co/\n\nCheck Out Our Other Mythical Channels:\nGood Mythical MORE: https://youtube.com/goodmythicalmore\nRhett & Link: https://youtube.com/rhettandlink\nThis Is Mythical: https://youtube.com/thisismythical\nEar Biscuits: https://applepodcasts.com/earbiscuits\n\nWant to send us something? https://mythical.co/contact\nHave you made a Wheel of Mythicality intro video? Submit it here: https://bit.ly/GMMWheelIntro\n\nIntro Animation by Digital Twigs: https://www.digitaltwigs.com\nIntro & Outro Music by Jeff Zeigler & Sarah Schimeneck https://www.jeffzeigler.com\nWheel of Mythicality theme: https://www.royaltyfreemusiclibrary.com/\nAll Supplemental Music fromOpus 1 Music: https://opus1.sourceaudio.com/\nWe use ‘The Mouse’ by Blue Microphones https://www.bluemic.com/mouse/  
4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I know it's been a while since we did this show, but we're back with what might be the best episode yet!\nLeave your dares in the comment section! \n\nOrder my book how to write good \nhttp://higatv.com/ryan-higas-how-to-write-good-pre-order-links/\n\nJust Launched New Official Store\nhttps://www.gianthugs.com/collections/ryan\n\nHigaTV Channel\nhttp://www.youtube.com/higatv\n\nTwitter\nhttp://www.twitter.com/therealryanhiga\n\nFacebook\nhttp://www.facebook.com/higatv\n\nWebsite\nhttp://www.higatv.com\n\nInstagram\nhttp://www.instagram.com/notryanhiga\n\nSend us mail or whatever you want here!\nPO Box 232355\nLas Vegas, NV 89105  
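The tags column packs several tags into one string. Tags are separated by `|`, most are wrapped in double quotes, and a quoted tag can itself contain a `|`: row 2 ends with `"Getting My Driver's License | Lele Pons"`. A naive `str.split('|')` would break such tags. One option is to let Python's `csv` module treat `|` as the delimiter and `"` as the quote character, as in the sketch below. The sketch assumes the `[none]` placeholder that appears later in the data means "no tags". `parse_tags` is a hypothetical helper.

```python
import csv

import pandas as pd


def parse_tags(raw):
    """Split a raw tags string into a list of tags (hypothetical helper)."""
    # Missing values and the '[none]' placeholder are treated as "no tags".
    if pd.isna(raw) or raw == '[none]':
        return []
    # '|' separates tags; '"' quotes tags that may themselves contain '|'.
    row = next(csv.reader([raw], delimiter='|', quotechar='"'))
    return [tag.strip() for tag in row if tag.strip()]
```

Applied column-wise, e.g. `youtube['tags'].apply(parse_tags)`, this yields one Python list of tags per video.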
youtube.shape
(40949, 16)
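The frame has 40,949 rows and 16 columns. Because the dataset records trending videos, the same `video_id` may appear on several trending dates. The sketch below uses a hypothetical `rows_per_video` helper to make a quick check. It compares the row count with the number of distinct IDs.

```python
import pandas as pd


def rows_per_video(df, key='video_id'):
    """Summarise how many rows each distinct video contributes."""
    counts = df[key].value_counts()  # rows per distinct key, NaN keys excluded
    return {
        'rows': len(df),
        'distinct_videos': int(counts.size),
        'max_rows_per_video': int(counts.max()),
    }
```

If `distinct_videos` is smaller than `rows`, per-video statistics such as views or likes should be computed on a deduplicated frame. Otherwise the same video would be counted several times.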
youtube.dtypes
video_id                  object
trending_date             object
title                     object
channel_title             object
category_id                int64
publish_time              object
tags                      object
views                      int64
likes                      int64
dislikes                   int64
comment_count              int64
thumbnail_link            object
comments_disabled           bool
ratings_disabled            bool
video_error_or_removed      bool
description               object
dtype: object
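Both date columns are stored as strings (`object`). In the sample rows, `trending_date` follows a `yy.dd.mm` layout: '17.14.11' is 14 November 2017, one day after the matching `publish_time` values. `publish_time` is an ISO 8601 timestamp in UTC. The sketch below converts both columns, assuming those formats hold for every row. `parse_dates` is a hypothetical helper, and it returns a copy rather than modifying its input.

```python
import pandas as pd


def parse_dates(df):
    """Return a copy of df with trending_date and publish_time as datetimes."""
    out = df.copy()
    # '17.14.11' -> 2017-11-14: two-digit year, then day, then month.
    out['trending_date'] = pd.to_datetime(out['trending_date'], format='%y.%d.%m')
    # ISO 8601 strings ending in 'Z' become timezone-aware UTC timestamps.
    out['publish_time'] = pd.to_datetime(out['publish_time'], utc=True)
    return out
```

Passing an explicit `format` makes the day/month order unambiguous. Letting pandas infer it could silently swap day and month for dates where both are 12 or less.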

2.1.1. Handling text features

The title, channel_title, tags and description columns are text features. Let's investigate them further.
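Before reading the raw text, it helps to check how much of each text column is actually populated. The minimal sketch below assumes missing text is stored as NaN. It also treats the tags placeholder `[none]` as empty. `text_summary` is a hypothetical helper.

```python
import pandas as pd


def text_summary(df, columns):
    """Per text column: share of missing values and median length in characters."""
    rows = {}
    for col in columns:
        # Treat NaN and the '[none]' tags placeholder as missing text.
        values = df[col].where(df[col] != '[none]')
        rows[col] = {
            'missing_share': values.isna().mean(),
            'median_length': values.dropna().str.len().median(),
        }
    # One row per text column, one column per statistic.
    return pd.DataFrame(rows).T
```

For example, `text_summary(youtube, ['title', 'channel_title', 'tags', 'description'])` returns one row per text feature.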

text_features = ['title',
                 'channel_title',
                 'tags',
                 'description']
text = youtube[text_features]
text
                                                                                      title  \
0                                                        WE WANT TO TALK ABOUT OUR MARRIAGE   
1                            The Trump Presidency: Last Week Tonight with John Oliver (HBO)   
2                                     Racist Superman | Rudy Mancuso, King Bach & Lele Pons   
3                                                          Nickelback Lyrics: Real or Fake?   
4                                                                  I Dare You: GOING BALD!?   
...                                                                                     ...   
40944                                                          The Cat Who Caught the Laser   
40945                                                            True Facts : Ant Mutualism   
40946  I GAVE SAFIYA NYGAARD A PERFECT HAIR MAKEOVER BASED ON HER FEATURES: BTS! |bradmondo   
40947                                                   How Black Panther Should Have Ended   
40948                      Official Call of Duty®: Black Ops 4 — Multiplayer Reveal Trailer   

                  channel_title  \
0                  CaseyNeistat   
1               LastWeekTonight   
2                  Rudy Mancuso   
3         Good Mythical Morning   
4                      nigahiga   
...                         ...   
40944             AaronsAnimals   
40945                  zefrank1   
40946                Brad Mondo   
40947  How It Should Have Ended   
40948              Call of Duty   

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               tags  \
0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   SHANtell martin   
1                                                                                                                                                                                                                                                                                                                                                                                                            last week tonight trump presidency|"last week tonight donald trump"|"john oliver trump"|"donald trump"   
2                                                                                                                                                                                        racist superman|"rudy"|"mancuso"|"king"|"bach"|"racist"|"superman"|"love"|"rudy mancuso poo bear black white official music video"|"iphone x by pineapple"|"lelepons"|"hannahstocking"|"rudymancuso"|"inanna"|"anwar"|"sarkis"|"shots"|"shotsstudios"|"alesso"|"anitta"|"brazil"|"Getting My Driver's License | Lele Pons"   
3                     rhett and link|"gmm"|"good mythical morning"|"rhett and link good mythical morning"|"good mythical morning rhett and link"|"mythical morning"|"Season 12"|"nickelback lyrics"|"nickelback lyrics real or fake"|"nickelback"|"nickelback songs"|"nickelback song"|"rhett link nickelback"|"gmm nickelback"|"lyrics (website category)"|"nickelback (musical group)"|"rock"|"music"|"lyrics"|"chad kroeger"|"canada"|"music (industry)"|"mythical"|"gmm challenge"|"comedy"|"funny"|"challenge"   
4                                                                                                                                                                                                                                                                                                                                                                                          ryan|"higa"|"higatv"|"nigahiga"|"i dare you"|"idy"|"rhpc"|"dares"|"no truth"|"comments"|"comedy"|"funny"|"stupid"|"fail"   
...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ...   
40944                                                                                                                                                                                                                                                                                                                                                                        aarons animals|"aarons"|"animals"|"cat"|"cats"|"kitten"|"kittens"|"prince michael"|"prince"|"michael"|"laser"|"olympics"|"red"|"dream"   
40945                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [none]   
40946  I gave safiya nygaard a perfect hair makeover based on her features: bts|"brad mondo"|"safiya and tyler"|"safiya nygaard"|"hair transformation"|"makeover"|"I got a perfect makeover based on my features"|"bts"|"hairdresser reacts"|"before and after"|"hair"|"makeup"|"transformation"|"ANTM"|"what not to wear"|"the ideal haircut and color for your face"|"safiya buzzfeed"|"color for your skin tone"|"haircut for your face shape"|"tutorial"|"balayage"|"hair stylist"|"hair color"|"hair tutorial"   
40947                                                                                                                                                                                                       Black Panther|"HISHE"|"Marvel"|"Infinity War"|"How It Should Have Ended"|"parody"|"comedy"|"entertainment"|"wakanda"|"Chadwick Boseman"|"Michael B Jordan"|"movies"|"animation"|"fortnite"|"azerrz"|"movie"|"plothole"|"review"|"childish gambino"|"donald glover"|"this is america"|"ending explained"   
40948                                                                                                                                                                                                                                                                                                                                                                                                                                                                 call of duty|"cod"|"activision"|"Black Ops 4"   

   description  
0  SHANTELL'S CHANNEL - https://www.youtube.com/shantellmartin\nCANDICE - https://www.lovebilly.com\n\nfilmed this video in 4k on this -- http://amzn.to/2sTDnRZ\nwith this lens -- http://amzn.to/2rUJOmD\nbig drone - http://tinyurl.com/h4ft3oy\nOTHER GEAR ---  http://amzn.to/2o3GLX5\nSony CAMERA http://amzn.to/2nOBmnv\nOLD CAMERA; http://amzn.to/2o2cQBT\nMAIN LENS; http://amzn.to/2od5gBJ\nBIG SONY CAMERA; http://amzn.to/2nrdJRO\nBIG Canon CAMERA; http://tinyurl.com/jn4q4vz\nBENDY TRIPOD THING; http://tinyurl.com/gw3ylz2\nYOU NEED THIS FOR THE BENDY TRIPOD; http://tinyurl.com/j8mzzua\nWIDE LENS; http://tinyurl.com/jkfcm8t\nMORE EXPENSIVE WIDE LENS; http://tinyurl.com/zrdgtou\nSMALL CAMERA; http://tinyurl.com/hrrzhor\nMICROPHONE; http://tinyurl.com/zefm4jy\nOTHER MICROPHONE; http://tinyurl.com/jxgpj86\nOLD DRONE (cheaper but still great);http://tinyurl.com/zcfmnmd\n\nfollow me; on http://instagram.com/caseyneistat\non https://www.facebook.com/cneistat\non https://twitter.com/CaseyNeistat\n\namazing intro song by https://soundcloud.com/discoteeth\n\nad disclosure.  THIS IS NOT AN AD.  not selling or promoting anything.  but samsung did produce the Shantell Video as a 'GALAXY PROJECT' which is an initiative that enables creators like Shantell and me to make projects we might otherwise not have the opportunity to make.  hope that's clear.  if not ask in the comments and i'll answer any specifics.  
1  One year after the presidential election, John Oliver discusses what we've learned so far and enlists our catheter cowboy to teach Donald Trump what he hasn't.\n\nConnect with Last Week Tonight online...\n\nSubscribe to the Last Week Tonight YouTube channel for more almost news as it almost happens: www.youtube.com/user/LastWeekTonight\n\nFind Last Week Tonight on Facebook like your mom would: http://Facebook.com/LastWeekTonight\n\nFollow us on Twitter for news about jokes and jokes about news: http://Twitter.com/LastWeekTonight\n\nVisit our official site for all that other stuff at once: http://www.hbo.com/lastweektonight  
2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                  WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► https://www.youtube.com/channel/UC5jkXpfnBhlDjqh0ir5FsIQ?sub_confirmation=1\n\nTHANKS FOR WATCHING! LIKE & SUBSCRIBE FOR MORE VIDEOS!\n-----------------------------------------------------------\nFIND ME ON: \nInstagram | http://instagram.com/rudymancuso\nTwitter | http://twitter.com/rudymancuso\nFacebook | http://facebook.com/rudymancuso\n\nCAST: \nRudy Mancuso | http://youtube.com/c/rudymancuso\nLele Pons | http://youtube.com/c/lelepons\nKing Bach | https://youtube.com/user/BachelorsPadTv\n\nVideo Effects: \nCaleb Natale | https://instagram.com/calebnatale\n\nPA:\nPaulina Gregory\n\n\nShots Studios Channels:\nAlesso | https://youtube.com/c/alesso\nAnitta | http://youtube.com/c/anitta\nAnwar Jibawi | http://youtube.com/c/anwar\nAwkward Puppets | http://youtube.com/c/awkwardpuppets\nHannah Stocking | http://youtube.com/c/hannahstocking\nInanna Sarkis | http://youtube.com/c/inanna\nLele Pons | http://youtube.com/c/lelepons\nMaejor | http://youtube.com/c/maejor\nMike Tyson | http://youtube.com/c/miketyson \nRudy Mancuso | http://youtube.com/c/rudymancuso\nShots Studios | http://youtube.com/c/shots\n\n#Rudy\n#RudyMancuso  
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Today we find out if Link is a Nickelback amateur or a secret Nickelback devotee. 
GMM #1218\nDon't miss an all new Ear Biscuits: https://goo.gl/xeZNQt\nWatch Part 4: https://youtu.be/MhCdiiB8CQg | Watch Part 2: https://youtu.be/7qiOrNao9fg\nWatch today's episode from the start: http://bit.ly/GMM1218\n\nPick up all of the official GMM merch only at https://mythical.store\n\nFollow Rhett & Link: \nInstagram: https://instagram.com/rhettandlink\nFacebook: https://facebook.com/rhettandlink\nTwitter: https://twitter.com/rhettandlink\nTumblr: https://rhettandlink.tumblr.com\nSnapchat: @realrhettlink\nWebsite: https://mythical.co/\n\nCheck Out Our Other Mythical Channels:\nGood Mythical MORE: https://youtube.com/goodmythicalmore\nRhett & Link: https://youtube.com/rhettandlink\nThis Is Mythical: https://youtube.com/thisismythical\nEar Biscuits: https://applepodcasts.com/earbiscuits\n\nWant to send us something? https://mythical.co/contact\nHave you made a Wheel of Mythicality intro video? Submit it here: https://bit.ly/GMMWheelIntro\n\nIntro Animation by Digital Twigs: https://www.digitaltwigs.com\nIntro & Outro Music by Jeff Zeigler & Sarah Schimeneck https://www.jeffzeigler.com\nWheel of Mythicality theme: https://www.royaltyfreemusiclibrary.com/\nAll Supplemental Music fromOpus 1 Music: https://opus1.sourceaudio.com/\nWe use ‘The Mouse’ by Blue Microphones https://www.bluemic.com/mouse/  
4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               I know it's been a while since we did this show, but we're back with what might be the best episode yet!\nLeave your dares in the comment section! \n\nOrder my book how to write good \nhttp://higatv.com/ryan-higas-how-to-write-good-pre-order-links/\n\nJust Launched New Official Store\nhttps://www.gianthugs.com/collections/ryan\n\nHigaTV Channel\nhttp://www.youtube.com/higatv\n\nTwitter\nhttp://www.twitter.com/therealryanhiga\n\nFacebook\nhttp://www.facebook.com/higatv\n\nWebsite\nhttp://www.higatv.com\n\nInstagram\nhttp://www.instagram.com/notryanhiga\n\nSend us mail or whatever you want here!\nPO Box 232355\nLas Vegas, NV 89105  
...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ...  
40944                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             The Cat Who Caught the Laser - Aaron's Animals  
40945                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        NaN  
40946                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I had so much fun transforming Safiyas hair in this video! She was serving major lewks!SAFIYAS VIDEO▷https://goo.gl/C92AmbSHOP MY LIMITED EDITION HOODIE!▷ https://goo.gl/VN6tVD LET'S BE BFFS!INSTAGRAM ▷ https://www.instagram.com/bradmondonyc/TWITTER ▷ https://twitter.com/bradmondonycFACEBOOK ▷ https://www.facebook.com/bradmondonyc/WANNA SEE MORE OF MY FACE? ▷ https://goo.gl/QjHDAuWANNA SEE MY LAST VIDEO? ▷ https://goo.gl/exP6gWFILMING EQUIPMENT: UMBRELLA LIGHTS▷ http://amzn.to/2qNy9K4RING LIGHT▷ http://amzn.to/2Erv1p9CAMERA▷ http://amzn.to/2EsXQRYCAMERA LENS▷http://amzn.to/2DdlN0rTRIPOD▷ http://amzn.to/2mePXbDMIC▷ http://amzn.to/2Bpt9PHBACKGROUND PAPER▷http://amzn.to/2FkKKHXWANT AN INTRO LIKE MINE? CONTACT▷www.marcelsaleta.comDON'T FORGET TO LIVE YOUR EXTRA LIFE! 😁  
40947  How Black Panther Should Have EndedWatch More HISHEs: https://bit.ly/HISHEPlaylistSubscribe to HISHE: https://bit.ly/HISHEsubscribeTwitter @theHISHEdotcomhttps://twitter.com/TheHISHEdotcomInstagram @HISHEgramhttps://instagram.com/hishegram/Facebook:https://www.facebook.com/howitshouldhaveended/HISHE Swag:http://www.dftba.com/hisheSpecial Thanks to Guest Voices Azerzz https://www.youtube.com/user/HeyitzAzerrzNicholas Andrew Louie https://www.youtube.com/user/NicholasAndrewLouie--------------Previous Episodes--------------------Avengers Infinity War and Beyond (Toy Story Mashup)https://youtu.be/bvXxLp_G9w0How IT Should Have Endedhttps://youtu.be/gh0WvZtbATEVillain Pub - The Dead Poolhttps://youtu.be/3DGlk_JAm8UHow Justice League Should Have Endedhttps://youtu.be/zj_y8eAKpQUHow Star Wars The Last Jedi Should Have Endedhttps://youtu.be/rCB8DUGpYQQHow Thor Ragnarok Should Have Endedhttps://youtu.be/lPZRmkVLeOEHow Spider-Man Homecoming Should Have Endedhttps://youtu.be/hjuHNdEgN30Batman V Superman - Comedy Recaphttps://youtu.be/bNjhtHyihJ0How The Incredibles Should Have Endedhttps://youtu.be/C0VJaFN4bncVillain Pub - Penny For Your Fearshttps://youtu.be/ZLyulYMZbj8Blade Runner - Comedy Recaphttps://youtu.be/fVwe9xXBdHQRogue One LEGO HISHE - Chirrut VS Everythinghttps://youtu.be/T5jK4XnaAQQHow Wonder Woman Should Have Endedhttps://youtu.be/Lf6gl4-MPd0How Guardians of the Galaxy Vol.2 Should Have Endedhttps://youtu.be/GBxxjhnjH4YHow Jurassic World Should Have Ended:https://youtu.be/TXGCyjJh48IHulk Spoils Movieshttps://youtu.be/HAg3fuFczs0How Kong Skull Island Should Have Endedhttps://youtu.be/C3H0OWehlVI?list=PL3...How Rogue One Should Have Endedhttps://youtu.be/RjR71XpAu0I?list=PL3...How Moana Should Have Endedhttps://youtu.be/4aHGssCxMo4?list=PL3...How The LEGO Batman Movie Should Have Endedhttps://youtu.be/g7OH2OhIjJAHow Doctor Strange Should Have Endedhttps://youtu.be/9e5epVDd9h0?list=PL3...How Beauty and the Beast Should Have Endedhttps://youtu.be/8hm9ezomDhQHow 
Star Wars Should Have Ended (Special Edition)https://youtu.be/oXUJiHut7YE?list=PLi...More HISHE Reviewshttps://www.youtube.com/playlist?list...Villain Pub - The Boss Battlehttps://youtu.be/bt__1gwGZSA?list=PL3...LEGO Harry Potter in 90 Secondshttps://youtu.be/jnbBcAr7XGo?list=PL3...Suicide Squad HISHEhttps://youtu.be/Wje0SdFWrzUHow Guardians of the Galaxy Vol.1 Should Have Endedhttps://youtu.be/d0K436vUM4wStar Trek Beyond HISHEhttps://youtu.be/Fymz7yoELS4?list=PL3...Super Cafe: Batman GOhttps://youtu.be/KntOy6am7CM?list=PL3...Civil War HISHEhttps://youtu.be/fvLw021rVN0Villain Pub - The New Smilehttps://youtu.be/0oP8s4GK1BE?list=PLA...How Batman V Superman Should Have Endedhttps://youtu.be/pTuyfQ5CR4QTMNT: Out of the Shadows HISHEhttps://youtu.be/_ac8xKxeqzk?list=PL3...How Deadpool Should Have Endedhttps://youtu.be/5vbEcTIAdPs?list=PL3...Hero Swap - Gladiator Starring Iron Manhttps://youtu.be/P4mY4qmuJas?list=PL3...How X-Men: Days of Future Past Should Have Ended:https://youtu.be/uT6YOI6JcRsStar Wars - Revenge of the Sith HISHEhttps://youtu.be/K2ScVx4mRDEJungle Book HISHEhttps://youtu.be/WcfDDa5YoV8?list=PL3...BAT BLOOD - A Batman V Superman AND Bad Blood Parody ft. Batman:https://youtu.be/maX-ObiJB3oVillain Pub - The New Smile:https://youtu.be/0oP8s4GK1BE  
40948  Call of Duty: Black Ops 4 Multiplayer raises the bar for the famed multiplayer mode delivering the most thrilling grounded combat experience yet with a focus on tactical gameplay and player choice.Call of Duty®: Black Ops 4 is available October 12, 2018. Pre-order at participating retailers on disc or digital download and get Private Beta Access: https://www.callofduty.com/blackops4/buyFollow us for all the latest intel: Web: http://www.CallofDuty.com ;Facebook: http://facebook.com/CallofDuty and http://www.facebook.com/Treyarch/;Twitter: http://twitter.com/CallofDuty and http://twitter.com/Treyarch;Instagram: http://instagram.com/CallofDuty and http://www.instagram.com/treyarch/;Snapchat: callofduty

[40949 rows x 4 columns]

It's possible to extract some numerical features from these text columns (especially from tags). However, we don't consider such features in this analysis, so we drop the text columns altogether.
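The snippet below is a minimal sketch of one such feature: the number of tags on each video. It assumes that tags are stored as a single string separated by '|', with the literal value '[none]' marking videos that have no tags. Check both assumptions against the data before relying on them. count_tags is a hypothetical helper, not part of src.utils.

```python
import pandas as pd


def count_tags(tags: pd.Series) -> pd.Series:
    """Count tags per video (assumes '|'-separated tags and '[none]' for no tags)."""
    # Rows with no tags: either missing or the '[none]' placeholder (assumption).
    no_tags = tags.isna() | (tags == '[none]')
    # Split each tag string on '|' and count the resulting pieces.
    counts = tags.str.split('|').str.len()
    # Force "no tags" rows to 0 and return plain integers.
    return counts.mask(no_tags, 0).astype(int)
```

On the real data this would be called as count_tags(youtube['tags']), before the text columns are dropped.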

Another observation is that the text columns contain multiple languages (noticed while browsing the data on Kaggle). This problem would only be amplified if we also included the data from the other countries.

youtube = youtube.drop(text_features, axis='columns')
youtube.shape
(40949, 12)

2.1.2. Handling redundant columns

thumbnail_link is redundant (it only points to the video's thumbnail image), so we can drop it.
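One way to back up the redundancy claim is to check whether every thumbnail URL embeds its row's video_id. If it does, the link carries no information that the ID does not already provide. This is a minimal sketch; link_embeds_id is a hypothetical helper, and the check only shows that the ID is contained in the URL, not how the URL was built.

```python
import pandas as pd


def link_embeds_id(df: pd.DataFrame) -> bool:
    """Return True if every thumbnail_link contains the video_id of its row."""
    # Missing links (non-strings) count as a failure of the check.
    return all(
        isinstance(link, str) and vid in link
        for vid, link in zip(df['video_id'], df['thumbnail_link'])
    )
```

On the real data this would be run as link_embeds_id(youtube) before the column is dropped.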

youtube = youtube.drop('thumbnail_link', axis='columns')
youtube.shape
(40949, 11)

The video_id column would be redundant if we decided to fit an ML model on the dataset. We want the model to learn general attributes from each example, not the unique identifier of a video. Since this dataset has no ML target, we leave the column as is.

2.1.3. Handling datetime features

trending_date & publish_time contain date and time information, so they should be converted to the datetime dtype.

datetime_features = ['trending_date',
                     'publish_time']
datetime = youtube[datetime_features]
datetime
      trending_date              publish_time
0          17.14.11  2017-11-13T17:13:01.000Z
1          17.14.11  2017-11-13T07:30:00.000Z
2          17.14.11  2017-11-12T19:05:24.000Z
3          17.14.11  2017-11-13T11:00:04.000Z
4          17.14.11  2017-11-12T18:01:41.000Z
...             ...                       ...
40944      18.14.06  2018-05-18T13:00:04.000Z
40945      18.14.06  2018-05-18T01:00:06.000Z
40946      18.14.06  2018-05-18T17:34:22.000Z
40947      18.14.06  2018-05-17T17:00:04.000Z
40948      18.14.06  2018-05-17T17:09:38.000Z

[40949 rows x 2 columns]

Both columns need some wrangling before conversion. trending_date uses a non-standard YY.DD.MM layout: for example, 17.14.11 is 14 November 2017, one day after the first videos shown were published. publish_time, on the other hand, is an ISO 8601 timestamp. The 'T' separates the date from the time of day, and the trailing 'Z' marks the time as UTC.
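A minimal sketch of the conversion, assuming the two layouts described above hold for every row. parse_datetimes is a hypothetical helper. trending_date needs an explicit format string because pandas cannot infer YY.DD.MM reliably. publish_time can be parsed directly, and its 'Z' suffix makes the result UTC-aware.

```python
import pandas as pd


def parse_datetimes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with trending_date and publish_time as datetimes."""
    out = df.copy()
    # YY.DD.MM, e.g. '17.14.11' -> 2017-11-14 (timezone-naive)
    out['trending_date'] = pd.to_datetime(out['trending_date'], format='%y.%d.%m')
    # ISO 8601 with a trailing 'Z' -> timezone-aware timestamps in UTC
    out['publish_time'] = pd.to_datetime(out['publish_time'])
    return out
```

The two parsed columns differ in timezone awareness: trending_date is naive and publish_time is in UTC. To compare them, drop the timezone first, for example with out['publish_time'].dt.tz_convert(None).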

2.1.4. Handling categorical features

comments_disabled, ratings_disabled & video_error_or_removed are boolean, so they can be converted directly into numerical (0/1) features.

bool_features = ['comments_disabled',
                 'ratings_disabled',
                 'video_error_or_removed']
bools = youtube[bool_features]
bools
       comments_disabled  ratings_disabled  video_error_or_removed
0                  False             False                   False
1                  False             False                   False
2                  False             False                   False
3                  False             False                   False
4                  False             False                   False
...                  ...               ...                     ...
40944              False             False                   False
40945              False             False                   False
40946              False             False                   False
40947              False             False                   False
40948              False             False                   False

[40949 rows x 3 columns]
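A minimal sketch of casting the flags to integers, using a toy four-row stand-in for bools. The mean of a 0/1 column is the fraction of rows where the flag is set:

```python
import pandas as pd

# Toy stand-in for the bools DataFrame (values are illustrative only).
bools = pd.DataFrame({
    'comments_disabled': [False, True, False, False],
    'ratings_disabled': [False, False, False, True],
    'video_error_or_removed': [False, False, False, False],
})

# Cast each flag to 0/1 so it can be used as a numerical feature.
flags = bools.astype(int)

# Fraction of rows where each flag is True.
print(flags.mean())
```

On the full dataset the flags are heavily imbalanced: the freq row of describe() in 2.1.5 implies they are True in only 633, 169 and 23 of the 40949 rows respectively.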

Although category_id is stored as an integer, it looks like a categorical feature: its values are IDs rather than quantities. Let's investigate.

youtube['category_id'].value_counts()
24    9964
10    6472
26    4146
23    3457
22    3210
25    2487
28    2401
1     2345
17    2174
27    1656
15     920
20     817
19     402
2      384
29      57
43      57
Name: category_id, dtype: int64
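With only 16 distinct values, category_id can be stored as a pandas categorical so that its values are not treated as quantities. A minimal sketch on a toy column that reuses IDs from the counts above:

```python
import pandas as pd

# Toy column using IDs that appear in the value counts above.
youtube = pd.DataFrame({'category_id': [24, 10, 24, 26, 10, 24]})

# Store the IDs as a categorical dtype instead of int64.
youtube['category_id'] = youtube['category_id'].astype('category')

print(youtube['category_id'].cat.categories.tolist())  # [10, 24, 26]
print(youtube.describe(include='all'))
```

This matters for the summary statistics: describe() in 2.1.5 reports a mean of about 19.97 for category_id, which has no meaning for an ID, whereas for a categorical column describe() reports count, unique, top and freq instead.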

2.1.5. Descriptive statistics, missing & duplicates

Let's look at the descriptive statistics next.

youtube.describe(include='all')
           video_id trending_date   category_id              publish_time  \
count         40949         40949  40949.000000                     40949   
unique         6351           205           NaN                      6269   
top     j4KvrAUjn6c      17.14.11           NaN  2018-05-18T14:00:04.000Z   
freq             30           200           NaN                        50   
mean            NaN           NaN     19.972429                       NaN   
std             NaN           NaN      7.568327                       NaN   
min             NaN           NaN      1.000000                       NaN   
25%             NaN           NaN     17.000000                       NaN   
50%             NaN           NaN     24.000000                       NaN   
75%             NaN           NaN     25.000000                       NaN   
max             NaN           NaN     43.000000                       NaN   

               views         likes      dislikes  comment_count  \
count   4.094900e+04  4.094900e+04  4.094900e+04   4.094900e+04   
unique           NaN           NaN           NaN            NaN   
top              NaN           NaN           NaN            NaN   
freq             NaN           NaN           NaN            NaN   
mean    2.360785e+06  7.426670e+04  3.711401e+03   8.446804e+03   
std     7.394114e+06  2.288853e+05  2.902971e+04   3.743049e+04   
min     5.490000e+02  0.000000e+00  0.000000e+00   0.000000e+00   
25%     2.423290e+05  5.424000e+03  2.020000e+02   6.140000e+02   
50%     6.818610e+05  1.809100e+04  6.310000e+02   1.856000e+03   
75%     1.823157e+06  5.541700e+04  1.938000e+03   5.755000e+03   
max     2.252119e+08  5.613827e+06  1.674420e+06   1.361580e+06   

       comments_disabled ratings_disabled video_error_or_removed  
count              40949            40949                  40949  
unique                 2                2                      2  
top                False            False                  False  
freq               40316            40780                  40926  
mean                 NaN              NaN                    NaN  
std                  NaN              NaN                    NaN  
min                  NaN              NaN                    NaN  
25%                  NaN              NaN                    NaN  
50%                  NaN              NaN                    NaN  
75%                  NaN              NaN                    NaN  
max                  NaN              NaN                    NaN  

Let's look for missing values next.

youtube.isna().any()
video_id                  False
trending_date             False
category_id               False
publish_time              False
views                     False
likes                     False
dislikes                  False
comment_count             False
comments_disabled         False
ratings_disabled          False
video_error_or_removed    False
dtype: bool

Next, we check for duplicate rows.

youtube[youtube.duplicated(keep=False)]
          video_id trending_date  category_id              publish_time  \
34750  QBL8IRJ5yHU      18.15.05           26  2018-05-14T19:00:01.000Z   
34751  t4pRQ0jn23Q      18.15.05           24  2018-05-14T14:00:03.000Z   
34752  j4KvrAUjn6c      18.15.05           24  2018-05-13T18:03:56.000Z   
34753  MAjY8mCTXWk      18.15.05           10  2018-05-14T15:59:47.000Z   
34754  xhs8tf1v__w      18.15.05           24  2018-05-14T16:00:29.000Z   
...            ...           ...          ...                       ...   
34944  iILJvqrAQ_w      18.15.05           10  2018-05-11T04:00:34.000Z   
34945  zcEE8J2Bqa8      18.15.05           23  2018-05-11T18:27:01.000Z   
34946  q1jzwV_s8_Y      18.15.05           10  2018-05-11T07:00:01.000Z   
34947  mkz1zoo15zI      18.15.05           17  2018-05-11T19:21:53.000Z   
34948  2PH7dK6SLC8      18.15.05           10  2018-05-10T17:00:01.000Z   

         views   likes  dislikes  comment_count  comments_disabled  \
34750  1469627  188652      3124          33032              False   
34751  1199587   49709      2380           7261              False   
34752  3906727   77378     12160          15874              False   
34753   916128   40485      1042           4746              False   
34754   343967   16988       132           1308              False   
...        ...     ...       ...            ...                ...   
34944  2124177   81085      1321           4019              False   
34945   165617   20572       140           1407              False   
34946  1869585   64523      1891           5903              False   
34947   472999    3505       163           1511              False   
34948  1201548   51670       964           4264              False   

       ratings_disabled  video_error_or_removed  
34750             False                   False  
34751             False                   False  
34752             False                   False  
34753             False                   False  
34754             False                   False  
...                 ...                     ...  
34944             False                   False  
34945             False                   False  
34946             False                   False  
34947             False                   False  
34948             False                   False  

[96 rows x 11 columns]

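Since duplicated() compares every column, including trending_date, these rows are exact copies of one another. They are not the same video trending on different days: that case only repeats the video_id, which is expected in this dataset. Every row shown has trending_date 18.15.05, which suggests the data for that day was recorded more than once, so the extra copies can likely be dropped. Note that keep=False flags every copy, originals included, so the 96 rows are not all redundant.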
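The difference between an exact duplicate row and a repeated video can be sketched on toy data (the IDs, dates and counts below are made up). Here video 'a' trends on two dates, while the 18.15.05 record for 'b' appears twice:

```python
import pandas as pd

# Toy data: 'a' trends twice (distinct rows), 'b' is recorded twice (exact copy).
youtube = pd.DataFrame({
    'video_id': ['a', 'a', 'b', 'b'],
    'trending_date': ['18.14.05', '18.15.05', '18.15.05', '18.15.05'],
    'views': [100, 150, 300, 300],
})

# Exact duplicates: every column matches; keep=False marks all copies.
exact = youtube.duplicated(keep=False)
# Repeated videos: only video_id has to match.
repeated = youtube.duplicated(subset=['video_id'], keep=False)

print(exact.sum(), repeated.sum())  # 2 4

# Keep the first copy of each exact duplicate.
deduped = youtube.drop_duplicates()
print(len(deduped))  # 3
```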

2.1.6. Correlations

Let's check for correlations next.

name = 'heatmap@youtube--corr.png'
# numeric_only=True skips text columns such as video_id (needed in pandas >= 2.0)
corr = youtube.corr(numeric_only=True)

plotter.corr(corr, name)
name

heatmap@youtube--corr.png

Some of the numerical features are positively correlated with one another.
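To name the correlated pairs instead of reading them off the heatmap, the correlation matrix can be filtered by magnitude. A minimal sketch with a hypothetical top_correlations helper and an arbitrary threshold of 0.5, run on toy data rather than the real dataset (DataFrame.corr's numeric_only parameter needs pandas >= 1.5):

```python
import pandas as pd

# Toy engagement data; the notebook would pass the youtube DataFrame instead.
youtube = pd.DataFrame({
    'video_id': ['a', 'b', 'c', 'd', 'e'],
    'views': [100, 200, 300, 400, 500],
    'likes': [10, 22, 29, 41, 50],
    'dislikes': [5, 1, 4, 2, 3],
})

def top_correlations(df, threshold=0.5):
    """Return feature pairs whose absolute Pearson r is at least `threshold`."""
    # numeric_only=True skips text columns such as video_id.
    corr = df.corr(numeric_only=True)
    cols = corr.columns
    # Visit each unordered pair once (upper triangle, diagonal excluded).
    pairs = pd.Series({
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    })
    strong = pairs[pairs.abs() >= threshold]
    # Strongest relationships first, regardless of sign.
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)

print(top_correlations(youtube))
```

Sorting by absolute value keeps strong negative correlations visible too, even though the heatmap here suggests mostly positive ones.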

Date: 2021-11-06 Sat 00:00

Created: 2022-01-25 Tue 15:48