January 4, 2021

Scraping and Geocoding Real Estate Properties for GIS Analysis

How do you find and buy a house near a beach that isn't flood-prone?

Real estate portals contain comprehensive listings of homes for sale, but they don't provide additional information such as flood hazard, noise levels or proximity to fault zones. You can get all that with GIS, of course, but how would you marry the two together?

Through a scrape-extract-geocode-export-ingest workflow, of course!

Project Code: GitHub

System Design

System Diagram

The system consists of a headless Chrome browser driven by Puppeteer that scrapes real estate property data from domain.com.au's search pages. Listings from the paginated results pages are collated into a single list.
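The collation step can be sketched independently of the browser. The snippet below is a minimal illustration in Python, not the project's Puppeteer code: `collate_pages`, the injected `fetch_page` callable, the `max_pages` limit and the `address` key are all hypothetical names, and the stop-at-first-empty-page rule is an assumption about how pagination ends.

```python
from typing import Callable, Dict, List

Listing = Dict[str, str]


def collate_pages(
    fetch_page: Callable[[int], List[Listing]],
    max_pages: int = 50,
) -> List[Listing]:
    """Collect listings from page 1 onward into one list.

    fetch_page stands in for whatever actually loads and parses a
    results page (a headless browser in the real system).
    """
    collated: List[Listing] = []
    seen_addresses = set()
    for page_number in range(1, max_pages + 1):
        listings = fetch_page(page_number)
        if not listings:
            break  # assumption: an empty page means there are no more results
        for listing in listings:
            address = listing["address"]
            if address in seen_addresses:
                continue  # skip a listing that reappears on a later page
            seen_addresses.add(address)
            collated.append(listing)
    return collated
```

Passing the page loader in as a function keeps the collation logic testable without a browser, and `max_pages` guards against a site that never returns an empty page.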

Scraped property addresses are used as input for geocoding with the mappify.io APIs. To stay below the free tier's limit of 2500 requests per month (and to speed things up), the API calls are placed behind a simple in-memory / file cache. The geocoding APIs produce coordinates (latitude and longitude) that can be saved along with the scraped data as a flat CSV file.
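One way such a cache could look is sketched below in Python, as an illustration and not the project's actual implementation. The `CachedGeocoder` class, the JSON cache file and the address normalisation are assumptions, and the real mappify.io request is deliberately left out: it is represented by an injected `geocode` callable.

```python
import json
import os
from typing import Callable, Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


class CachedGeocoder:
    """Wrap a geocoding function with an in-memory cache backed by a JSON file."""

    def __init__(self, geocode: Callable[[str], Coordinates], cache_path: str):
        self._geocode = geocode  # hypothetical stand-in for the real API call
        self._cache_path = cache_path
        self._cache: Dict[str, Coordinates] = {}
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as f:
                # JSON stores tuples as lists, so convert them back on load.
                self._cache = {k: tuple(v) for k, v in json.load(f).items()}

    def lookup(self, address: str) -> Coordinates:
        # Normalise case and whitespace so trivial variations share one entry.
        key = " ".join(address.lower().split())
        if key not in self._cache:
            self._cache[key] = tuple(self._geocode(address))  # the only API hit
            self._save()
        return self._cache[key]

    def _save(self) -> None:
        with open(self._cache_path, "w", encoding="utf-8") as f:
            json.dump(self._cache, f)
```

Because the cache is written to disk after every new lookup, re-running the scraper spends requests only on addresses it has not seen before.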

The resulting CSV file can be ingested by GIS software (such as QGIS) and overlaid on top of other geospatial data (like a flood hazard map).
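For the import to work, the CSV needs one column each for latitude and longitude, and addresses containing commas must be quoted. The Python sketch below shows one way to produce such a file; the column names in `FIELDNAMES` and the `to_csv` helper are assumptions for illustration, not the project's actual schema.

```python
import csv
import io
from typing import Dict, Iterable, Sequence

# Hypothetical column layout; the real project may export different fields.
FIELDNAMES = ("address", "price", "latitude", "longitude")


def to_csv(rows: Iterable[Dict[str, object]],
           fieldnames: Sequence[str] = FIELDNAMES) -> str:
    """Serialise geocoded listings as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop scraped fields that are not exported
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # the csv module quotes values containing commas
    return buffer.getvalue()
```

In QGIS, a file like this can be loaded with Add Delimited Text Layer, choosing `longitude` as the X field and `latitude` as the Y field.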

Results

The scraped and geocoded data can be ingested into a GIS system, allowing for further analysis alongside other geospatial data (in this case, elevation data).

Geocoded Real Estate Property Ingested in GIS System