Professional Web Scraping with Java


Stone River eLearning

Summary

Price
£12 inc VAT
Study method
Online
Duration
1 hour · Self-paced
Qualification
No formal qualification

Overview

In this short and concise course you will learn everything you need to get started with web scraping using Java.

You will learn the concepts behind web scraping, which you can apply to practically any web page (static AND dynamic / AJAX).

Course structure

We start with an overview of what web scraping is and what you can do with it.

Then we explain the difference between scraping static pages and dynamic / AJAX pages. You learn how to classify a website into one of the two categories and then apply the right concept to scrape the data you want.
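To give a feel for the classification step, here is a minimal sketch of one common heuristic: if a value you can see in the browser is already present in the HTML the server returned, the page can be treated as static; if it is missing, the data is probably loaded later via AJAX. The class and method names are hypothetical, and this is an assumption about how such a check might look, not necessarily the approach taught in the lectures.

```java
/** Heuristic sketch: is the data we want already in the raw HTML response? */
final class PageClassifier {

    enum PageType { STATIC, DYNAMIC }

    /**
     * Classifies a page by checking whether a value that is visible in the
     * browser also appears in the raw HTML returned by the server.
     * A plain substring check is a simplification chosen for illustration.
     */
    static PageType classify(String rawHtml, String visibleValue) {
        return rawHtml.contains(visibleValue) ? PageType.STATIC : PageType.DYNAMIC;
    }

    public static void main(String[] args) {
        // Made-up responses: the first contains the price, the second only a script shell.
        String staticHtml = "<ul><li class=\"price\">19.99</li></ul>";
        String dynamicHtml = "<div id=\"app\"></div><script src=\"app.js\"></script>";

        System.out.println(classify(staticHtml, "19.99"));
        System.out.println(classify(dynamicHtml, "19.99"));
    }
}
```

In practice you would compare the page source (or a plain HTTP response) against what the browser's developer tools show after the page has finished loading.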

Next, you will learn how to export the scraped data as either CSV or JSON, two popular formats for further processing.
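As a rough illustration of what exporting involves, the sketch below renders a list of scraped records as CSV and as JSON using only the JDK. The class name, the record layout (flat maps of strings) and the sample data are assumptions made for this example; the course may well use a library instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: export flat string records as CSV or JSON without external libraries. */
final class ExportSketch {

    /** Quotes a CSV field if it contains a comma, a double quote, or a line break. */
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Renders records as CSV; the header row comes from the first record's keys. */
    static String toCsv(List<Map<String, String>> records) {
        if (records.isEmpty()) {
            return "";
        }
        List<String> columns = List.copyOf(records.get(0).keySet());
        StringBuilder out = new StringBuilder();
        out.append(columns.stream().map(ExportSketch::csvField)
                .collect(Collectors.joining(","))).append("\n");
        for (Map<String, String> rec : records) {
            out.append(columns.stream()
                    .map(column -> csvField(rec.getOrDefault(column, "")))
                    .collect(Collectors.joining(","))).append("\n");
        }
        return out.toString();
    }

    /** Escapes a value and wraps it in quotes so it is a valid JSON string literal. */
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : value.toCharArray()) {
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Renders records as a JSON array of flat objects with string values. */
    static String toJson(List<Map<String, String>> records) {
        return records.stream()
                .map(rec -> rec.entrySet().stream()
                        .map(e -> jsonString(e.getKey()) + ":" + jsonString(e.getValue()))
                        .collect(Collectors.joining(",", "{", "}")))
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Made-up sample record; LinkedHashMap keeps the column order stable.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Hello, \"World\"");
        row.put("url", "https://example.com");
        List<Map<String, String>> records = List.of(row);

        System.out.print(toCsv(records));
        System.out.println(toJson(records));
    }
}
```

CSV suits spreadsheets and flat tables, while JSON is the more natural choice when records are nested or are passed on to another program.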

Unfortunately, many websites try to block scrapers, and sometimes you simply do not want to be detected. In the section "Going undercover" you will learn how to stay undetected and avoid getting blocked.
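This listing does not say which techniques the "Going undercover" section covers. Two widely used ones are sending browser-like request headers and pausing for a random interval between requests, and the sketch below shows both. It uses the JDK's own java.net.http API (Java 11+) rather than Jsoup or Unirest, and the class name, User-Agent strings and delay bounds are illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch: browser-like headers and randomized pauses between requests. */
final class PoliteRequests {

    // Example User-Agent strings (assumed values); replace with current ones in practice.
    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0");

    /** Builds a GET request that carries a randomly chosen browser-like User-Agent. */
    static HttpRequest browserLikeRequest(String url) {
        String userAgent = USER_AGENTS.get(
                ThreadLocalRandom.current().nextInt(USER_AGENTS.size()));
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Accept-Language", "en-US,en;q=0.9")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    /** Picks a random delay between min and max (both inclusive, in milliseconds). */
    static Duration randomDelay(Duration min, Duration max) {
        long millis = ThreadLocalRandom.current()
                .nextLong(min.toMillis(), max.toMillis() + 1);
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = browserLikeRequest("https://example.com/");
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));

        // Wait a random 0.5 to 1.5 seconds before the next request would be sent.
        Duration pause = randomDelay(Duration.ofMillis(500), Duration.ofMillis(1500));
        Thread.sleep(pause.toMillis());
        // Sending is omitted here so the sketch runs without network access.
    }
}
```

Irregular pauses make the traffic look less like a tight loop and also reduce the load on the target site.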

At the end of the course you can download the full source code of all the lectures, and we give an outlook on some advanced topics (private proxies, cloud deployment, multithreading ...). Those advanced topics are covered in a follow-up course I am going to teach.

Why you should take this course

Stop imagining that you could scrape data from websites and use those skills in your next web project. You can do it now.

  • Stay ahead of your competition
  • Be more efficient and automate tedious, manual tasks
  • Increase your value by adding web scraping to your skill set

Requirements

  • You should already be familiar with Java and Maven at a basic to medium level (the course will not show you how to set up Java, Maven or an IDE)
  • You should be familiar with HTML/CSS and know how to use your browser's developer tools
  • You should know about CSS selectors, because we use them for scraping static web pages
  • Prior knowledge of jQuery helps you get started faster with Jsoup, though this is not required
  • You should know what a web API and AJAX are (basic level is enough)

What Will I Learn?

  • Have a solid understanding of web scraping with Java
  • Be able to scrape practically any web page (static AND dynamic / AJAX), because you learn the concepts behind web scraping
  • Download, parse and extract data from websites with Jsoup
  • Call web APIs in Java with Unirest
  • Export your data as CSV or JSON
  • Build web scrapers that stay undetected and do not get blocked or banned

Description

Class Curriculum

Course Introduction

Introduction (1:53)

Scraping static web pages

What is a static web page (0:52)

Concept how to scrape static web pages (2:00)

Jsoup - the jQuery for Java (5:45)

Example - Scraping Google (14:03)

Scraping dynamic / AJAX web pages

What is a dynamic web page (1:55)

Unirest (11:20)

Concept how to scrape dynamic web pages (2:35)

Example - Scraping peoplescrapers (14:59)

Exporting your data

Export as CSV (2:10)

Export as JSON (4:22)

Going undercover

How to stay undetected (2:22)

Conclusion

Conclusion (1:20)

Who is this course for?

  • Anyone with an interest in learning web scraping and understanding the concepts
  • Anyone who likes a short and concise course
  • This course is NOT an introduction to Java
  • This course will NOT show you how to set up your development environment
  • This course is intended to get you started with web scraping. Very advanced topics (e.g. private proxies, cloud deployment, multithreading) are discussed but not implemented in this course. I will do an advanced / enterprise level course on this separately...
  • Windows, Mac, or Linux PC

Career path

Web Scraping
