Scraping Zillow Made Easy with Python

If you're looking to analyze the property market in your area, scraping Zillow is one of the most effective methods available. With an estimated 348.4 million monthly user visits, Zillow provides valuable insights into the real estate market. And in this blog, we'll show you how to scrape Zillow using Python, making the process quick and efficient.

Is Zillow Scraping Allowed?

Zillow employs anti-scraping techniques to protect its data, and its terms of use restrict automated access, so proceed carefully. We'll discuss how to avoid getting blocked while scraping Zillow later in this post. For now, let's focus on extracting data using Python and two essential libraries: Requests and BeautifulSoup4.
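As a taste of what avoiding blocks looks like in practice, here is a minimal sketch of two common precautions: rotating the User-Agent header and pausing a random interval between requests. The helper names and the header strings are illustrative, not special values:

```python
import random
import time

# A small pool of realistic browser User-Agent strings (illustrative examples)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
]

def polite_headers():
    """Build request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep a random interval so requests don't arrive at a fixed rhythm."""
    time.sleep(random.uniform(min_s, max_s))

print(polite_headers())
```

Calling `polite_headers()` before each request and `polite_pause()` between requests makes the traffic look less mechanical, though neither is a guarantee against blocking.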

Why Use Python for Zillow Scraping?

Python is a powerful programming language with numerous libraries for web scraping. Its ease of use and extensive documentation make it an excellent choice for beginners and experienced developers alike. From scraping Google search results to collecting pricing data for business needs, Python allows for limitless possibilities. Additionally, Python has a supportive community and various forums where you can find solutions to any issues you may encounter along the way.

Some of the best Python forums for support and learning include PythonAnywhere, Stack Overflow, the r/Python subreddit, SitePoint, and Python Forum.

Let's Get Started with Zillow Scraping

To begin scraping Zillow using Python, we'll first perform a normal HTTP GET request to our target page. We'll extract the price, size, and address of each property listed. Let's take a look at the code:

import requests
from bs4 import BeautifulSoup

l = []  # will collect one dictionary per property card
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"}

resp = requests.get(target_url, headers=headers)
print(resp.status_code)  # 200 means the request succeeded
soup = BeautifulSoup(resp.text, "html.parser")

In the code above, we're using the Requests library to make the HTTP request and the BeautifulSoup library to parse the HTML response. By inspecting the website, we identify the class names where our target elements are stored. We use these class names to extract the desired data. Note that these auto-generated class names change whenever Zillow updates its front end, so verify them in your browser's developer tools before running the code.

properties = soup.find_all("div", {"class": "StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

for property in properties:
    obj = {}

    # .find() returns None when a card lacks the element, so .text raises AttributeError
    try:
        obj["pricing"] = property.find("div", {"class": "StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
    except AttributeError:
        obj["pricing"] = None

    try:
        obj["size"] = property.find("div", {"class": "StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
    except AttributeError:
        obj["size"] = None

    try:
        obj["address"] = property.find("a", {"class": "StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
    except AttributeError:
        obj["address"] = None

    l.append(obj)

print(l)

Once we have the target elements, we iterate over them and extract the pricing, size, and address information. We store each property's data in a dictionary object and append it to a list. Finally, we print the list of properties.
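Printing to the console is fine for a quick check, but for analysis you will usually want the results in a file. Since each property is a dictionary with the keys used above, the list can be written straight to CSV with the standard library. A sketch, using invented sample rows shaped like the scraped data:

```python
import csv

# Example rows shaped like the dictionaries built in the loop above
properties = [
    {"pricing": "$550,000", "size": "2 bds, 1 ba, 900 sqft", "address": "123 Example St, Brooklyn, NY"},
    {"pricing": None, "size": None, "address": "456 Sample Ave, Brooklyn, NY"},
]

with open("zillow_listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["pricing", "size", "address"])
    writer.writeheader()
    writer.writerows(properties)
```

`csv.DictWriter` maps each dictionary to a row by key, and missing values (`None`) come out as empty cells, so the try/except fallbacks above fit this format naturally.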

To scrape multiple pages and gather more data, we can modify the URL to include different page numbers. For example, we can scrape the first 10 pages using the following code:

for page in range(1, 11):
    resp = requests.get(f"https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/{page}_p/", headers=headers)
    soup = BeautifulSoup(resp.text, 'html.parser')
    properties = soup.find_all("div", {"class": "StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

    for property in properties:
        obj = {}
        # Extract property data here...
        l.append(obj)

Scraping Zillow with JS Rendering

While the normal HTTP request method works for some websites, Zillow requires JavaScript rendering to load the complete website. To achieve this, we'll use Selenium, a powerful tool for web automation, to simulate a browser and extract the necessary data. Here's how you can do it:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# Path to your local chromedriver binary
PATH = r'C:\Program Files (x86)\chromedriver.exe'

l = []
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

driver = webdriver.Chrome(service=Service(PATH))
driver.get(target_url)

# Scroll to the bottom of the page so lazily loaded listing cards are rendered
html = driver.find_element(By.TAG_NAME, 'html')
html.send_keys(Keys.END)

time.sleep(5)  # give the page time to finish rendering
resp = driver.page_source
driver.close()

soup = BeautifulSoup(resp, 'html.parser')
properties = soup.find_all("div", {"class": "StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

for property in properties:
    obj = {}
    # Extract property data here...
    l.append(obj)

print(l)

In this code snippet, we use Selenium to open the target URL in a browser and scroll down to load the complete website. We then extract the page source code and close the browser. Finally, we extract the desired property data using BeautifulSoup, similar to our previous method.
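The extraction step itself does not depend on Selenium: once you have the page source, it is plain BeautifulSoup. As a self-contained illustration of the same pattern, here is the loop run against a simplified, invented HTML snippet (real Zillow markup uses the long generated class names shown earlier):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the rendered page source (invented markup for the demo)
html = """
<div class="property-card-data">
  <div class="price">$550,000</div>
  <div class="size">900 sqft</div>
  <a class="property-card-link">123 Example St, Brooklyn, NY</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
l = []
for card in soup.find_all("div", {"class": "property-card-data"}):
    obj = {}
    for key, tag, cls in [("pricing", "div", "price"),
                          ("size", "div", "size"),
                          ("address", "a", "property-card-link")]:
        node = card.find(tag, {"class": cls})
        # .find() returns None for a missing element; fall back to None
        obj[key] = node.text.strip() if node else None
    l.append(obj)

print(l)
```

Driving the per-field lookups from a small table of (key, tag, class) tuples avoids repeating the try/except block three times; swap in the real class names when targeting the live site.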

Using Scrapingdog for Zillow Scraping

Scraping large websites like Zillow often leads to captchas and other blocks. To avoid these issues and ensure a smooth scraping process, you can use Scrapingdog's Web Scraper API. Let's take a look at how to use it:

from bs4 import BeautifulSoup
import requests

l = []
target_url = "https://api.scrapingdog.com/scrape?api_key=Your-API-Key&url=https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/&dynamic=false"

resp = requests.get(target_url)
soup = BeautifulSoup(resp.text, 'html.parser')
properties = soup.find_all("div", {"class": "StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

for property in properties:
    obj = {}
    # Extract property data here...
    l.append(obj)

print(l)

With Scrapingdog, you don't need to install Selenium or manage proxies. Simply make a GET request to the API, replacing "Your-API-Key" with your own API key. This code snippet is similar to our previous methods, and the extracted data remains the same.
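One small refinement: because the target URL itself contains slashes and commas, it is safer to let the query string be URL-encoded rather than concatenated by hand. A sketch using the same parameter names (api_key, url, dynamic) as the call above:

```python
from urllib.parse import urlencode

api_base = "https://api.scrapingdog.com/scrape"
params = {
    "api_key": "Your-API-Key",  # replace with your actual key
    "url": "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/",
    "dynamic": "false",
}

# urlencode percent-escapes the nested URL so it survives as a single parameter
request_url = f"{api_base}?{urlencode(params)}"
print(request_url)
```

Equivalently, you can pass `params=params` directly to `requests.get(api_base, ...)` and let Requests do the encoding for you.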

Conclusion

In this article, we've learned how to scrape Zillow using Python, whether through normal HTTP requests, JavaScript rendering, or with the help of Scrapingdog's Web Scraper API. Python's libraries and robust community support make it an ideal choice for web scraping tasks. By following the steps outlined here, you can collect valuable real estate data from Zillow and other websites efficiently and effectively.

Remember to respect website policies and use these techniques responsibly. Happy scraping!
