Unlock the potential of web crawling with our comprehensive 3-hour workshop! This hands-on tutorial leads you through the fundamentals of web scraping, from understanding its use cases to overcoming common challenges and deploying your own web crawlers. Our experts will showcase successfully implemented web crawling projects from a range of industries, and our experienced mentors will assist you with coding exercises in cloud-hosted Jupyter Notebook environments. Whether you're a beginner or already have some experience, this workshop caters to various levels of expertise.

Objectives

Learn to develop a production-ready web crawler for automated, large-scale data gathering

Agenda

09:00 – 09:45 Introduction
  • Examples of use cases of data collection using web scraping
  • Web crawling obstacles (JavaScript rendering, rate limits & CAPTCHAs) and how to overcome them
  • Legal considerations
09:45 – 10:30 Data Discovery
  • Analyzing webpages using Chrome Developer Tools
  • Extracting data from HTML using CSS/XPath selectors
  • Finding data hidden in API endpoints
10:30 – 11:00 Break
11:00 – 12:30 Practical Hands-on Development & Deployment
  • Building your first web crawler using the Scrapy framework for Python
  • Containerizing code using Docker and deploying it to the cloud
  • Scheduling and scaling web crawlers
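
As a small taste of the Data Discovery session, selector-based extraction can be sketched with Python's standard library alone. The HTML fragment, the class names, and the `extract_products` function below are made up for illustration. `xml.etree.ElementTree` supports only a limited XPath subset and needs well-formed markup, so treat this as a rough sketch rather than the approach taught in the workshop, which builds its crawler with Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for a real product page.
SAMPLE_HTML = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Blue Mug</h2>
      <span class="price">4.50</span>
    </div>
    <div class="product">
      <h2 class="title">Red Mug</h2>
      <span class="price">5.25</span>
    </div>
  </body>
</html>
"""


def extract_products(html: str) -> list[dict]:
    """Return one dict per product block found in the given markup."""
    root = ET.fromstring(html)
    products = []
    # XPath: every <div class="product"> anywhere below the root element.
    for node in root.findall(".//div[@class='product']"):
        # Relative paths: direct children of the current product block.
        title = node.findtext("h2[@class='title']")
        price = node.findtext("span[@class='price']")
        products.append({"title": title, "price": float(price)})
    return products


if __name__ == "__main__":
    for item in extract_products(SAMPLE_HTML):
        print(item)
```

Real pages are rarely well-formed XML, which is one reason a dedicated scraping library with tolerant HTML parsing and full CSS/XPath support is usually the better choice in practice.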