A simple Python scraper for infinite scroll websites

How should social media researchers gather data in an era when major online platforms are removing or severely restricting the public APIs (application programming interfaces) they formerly relied on to collect public data for analysis? The obvious alternative is to scrape content, which is generally accomplished by studying and replicating the internal API calls used by a given platform’s website and smartphone apps. While this technique can be quite effective, scrapers of this sort need to be tailored to each website one desires to scrape.

An alternative approach: automate a web browser using a tool such as Selenium, navigating the site as a user would, and parsing out desired text and other data from the HTML displayed in the browser. This approach takes advantage of the fact that many modern sites have more or less the same user interface: a scrollable list of items that loads additional items when the user scrolls up or down (sometimes referred to as infinite scroll). By automating the process of scrolling a large number of items into view and then parsing them out, we can write a simple albeit clunky web scraper that works with many (but not all) infinite scroll websites.

    # SCRAPER FOR INFINITE SCROLL SITES
    import bs4
    import json
    from selenium.webdriver import FirefoxOptions
    from selenium.webdriver.common.by import By
    from selenium import *
    import sys
    import time

    def matches(e1, e2):
        if e1 == e2:
            return True
        e1 =…
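The scroll-then-parse idea described above can be illustrated with a minimal sketch. This is not the post's own scraper (whose listing is truncated here), only one plausible shape for the technique. The function names `scroll_until_stable`, `extract_items`, and `scrape`, the `article` tag used to identify items, and the stopping rule (stop when the page height no longer grows) are all assumptions made for illustration. To keep the sketch free of third-party dependencies outside the browser step, it parses HTML with the standard library's `html.parser` rather than the `bs4` package the original imports.

```python
import time
from html.parser import HTMLParser


def scroll_until_stable(driver, max_scrolls=50, pause=2.0, sleep=time.sleep):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls performed. (Hypothetical helper.)
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        sleep(pause)  # give the site time to load the next batch of items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # assumption: no growth means nothing more to load
        last_height = new_height
    return scrolls


class ItemTextParser(HTMLParser):
    """Collect the whitespace-normalized text of each outermost `tag` element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0
        self.items = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.items.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._buf.append(data)


def extract_items(html, tag="article"):
    """Return the text of every `tag` element in `html` (assumed item tag)."""
    parser = ItemTextParser(tag)
    parser.feed(html)
    parser.close()
    return parser.items


def scrape(url, tag="article"):
    """Open `url` in headless Firefox, scroll, and return the item texts."""
    from selenium import webdriver  # imported lazily; needs selenium + Firefox
    from selenium.webdriver import FirefoxOptions

    options = FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        scroll_until_stable(driver)
        return extract_items(driver.page_source, tag)
    finally:
        driver.quit()
```

Calling `scrape("https://example.com/feed")` (a placeholder URL) would return a list of item texts. Real sites usually need a site-specific selector in place of the `article` tag, and some load content only on smaller scroll steps or after a button click, which is one reason this approach works with many but not all infinite scroll sites.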
