Exploring a YouTube spam network with Python

Like every social media platform, YouTube plays host to a variety of spam accounts, and networks of such accounts are easier to uncover with tools that download public data from the platform in bulk. The examples in this article use the pytubefix Python library, a fork of an older and currently unmaintained library, pytube. This library does not require you to have a Google developer account or your own API keys. This article outlines the process of exploring a set of inauthentic YouTube accounts that post #Accelerationism spam.

    import json
    import pandas as pd
    from pytubefix import Search, Channel

    PREFIX = "https://www.youtube.com/watch?v="

    # walk the video page's initial data to find the uploading channel's handle
    def get_channel_handle (result):
        for p in result.initial_data["engagementPanels"]:
            p = p["engagementPanelSectionListRenderer"]
            p = p["content"]
            if "structuredDescriptionContentRenderer" in p.keys ():
                p = p["structuredDescriptionContentRenderer"]
                for p in p["items"]:
                    if "videoDescriptionHeaderRenderer" in p:
                        p = p["videoDescriptionHeaderRenderer"]
                        if "channelNavigationEndpoint" in p:
                            p = p["channelNavigationEndpoint"]
                            p = p["browseEndpoint"]
                            p = p["canonicalBaseUrl"]
                            # drop the first character of the URL path
                            return p[1:]

    def youtube_search (query, max_results=1000):
        s = Search (query)
        prev = 0
        # keep fetching pages until no new results arrive or the cap is reached
        while prev < len (s.results) and len (s.results) < max_results:
            prev = len (s.results)
            print (str (prev) + " results so far")
            s.get_next_results ()
        print (str (len (s.results)) + " results found")
        items = []
        for result in s.results:
            captions = [c.json_captions for c in result.captions]
            video_id = result.vid_info["videoDetails"]["videoId"]
            items.append ({
                "author" : result.author,
                "captions" : captions,
                "channel_id" : result.channel_id,
                "description" : result.description,
                "handle" : get_channel_handle (result),
                "id" : video_id,
                "keywords" : result.keywords,
                "length" : result.length,
                "publish_date" : str (result.publish_date),
    …
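The nested dictionary walk in get_channel_handle can also be written with a small lookup helper that returns None when a key is missing. The following is a minimal sketch, not the article's own method: the names dig and handle_from_initial_data are hypothetical, and the sample dictionary is invented test data that only mirrors the key names used in the code above. It runs without pytubefix or network access.

```python
from typing import Any, Optional


def dig(data: Any, *keys: str) -> Optional[Any]:
    """Follow a chain of dictionary keys; return None if any step is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def handle_from_initial_data(initial_data: dict) -> Optional[str]:
    """Return the channel handle found in the engagement panels, or None."""
    for panel in initial_data.get("engagementPanels", []):
        items = dig(
            panel,
            "engagementPanelSectionListRenderer",
            "content",
            "structuredDescriptionContentRenderer",
            "items",
        ) or []
        for item in items:
            url = dig(
                item,
                "videoDescriptionHeaderRenderer",
                "channelNavigationEndpoint",
                "browseEndpoint",
                "canonicalBaseUrl",
            )
            if url:
                # Same step as the original code: drop the first character.
                return url[1:]
    return None


# Invented sample (an assumption, not real YouTube data) with the same key layout.
sample = {
    "engagementPanels": [
        {"engagementPanelSectionListRenderer": {"content": {}}},
        {
            "engagementPanelSectionListRenderer": {
                "content": {
                    "structuredDescriptionContentRenderer": {
                        "items": [
                            {
                                "videoDescriptionHeaderRenderer": {
                                    "channelNavigationEndpoint": {
                                        "browseEndpoint": {
                                            "canonicalBaseUrl": "/@example_channel"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        },
    ]
}

print(handle_from_initial_data(sample))
```

With a real search result, the call would presumably be handle_from_initial_data (result.initial_data). One difference from the original function: a panel that lacks the expected keys is skipped here rather than raising a KeyError, which may or may not be what you want when YouTube changes its page structure.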
