List ScrapeMaps allow the user to scrape multiple rows of similar data from a website. This is ideal for scraping data from lists and search results. In this example we’ll be targeting houses for sale in a particular area.
Let’s begin by typing “trulia” in the Navigation Bar and pressing Enter.
From the Google search results select the first entry for the Trulia Real Estate website. Once you have browsed to Trulia, click the “Buy” button to restrict the search to just properties that you can buy.
Once you are browsing homes to buy, specify a search location that you would like to target. In this example I will specify “Raleigh, NC” and press Enter to see the real estate listings.
Let’s restrict the kinds of houses that we are looking for to 2+ bedroom homes that are “House” type.
Lastly, let’s sort the houses by Newest. Sorting your search results can help you target just the data you want and save you from scraping unnecessary data.
Now that we have the data we want to target let’s create our ScrapeMap. Click the Create ScrapeMap button in the ScrapeMap toolbar.
In the Create ScrapeMap Dialog specify the ScrapeMap File Path. Choose a “List” Type Of ScrapeMap. In the Write Settings enable Add Date Updated Field. Make sure Create New Or Update Existing Table is set to “UpdateExistingTable”. Let’s choose to Write To Excel and specify an Excel Table Name and File Path. If you don’t have a version of Excel installed you can write to a CSV file. Select OK to create the ScrapeMap.
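ScrapeMate handles the writing for you, but conceptually the CSV output with Add Date Updated Field enabled amounts to stamping each row with the date it was written. A minimal sketch of that idea (the row fields and file name here are illustrative, not ScrapeMate’s actual output format):

```python
import csv
from datetime import date

# Hypothetical rows as a list scrape might produce them.
rows = [
    {"Price": "$350,000", "Address": "123 Oak St, Raleigh, NC"},
    {"Price": "$425,000", "Address": "456 Pine Ave, Raleigh, NC"},
]

with open("listings.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Price", "Address", "DateUpdated"])
    writer.writeheader()
    for row in rows:
        # The "Add Date Updated Field" setting stamps each row with the
        # date it was written, roughly like this:
        row["DateUpdated"] = date.today().isoformat()
        writer.writerow(row)
```

The date column is what later lets you tell fresh rows from stale ones when the same table is updated across multiple runs.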
Now that we’ve created our ScrapeMap let’s specify the data that we’d like to select. Let’s start by selecting the first price of the first listing. Make sure that the data you are targeting is highlighted in yellow.
In the ScrapeMap Wizard Dialog select “Capture List Data”.
In the Follow-up Wizard pane specify the DataAddress’s Name. A DataAddress name must be unique and cannot contain spaces or special characters. Let’s name this DataAddress “Price”. The Description field lets us store more information about our DataAddress. Let’s type “House Price” in the Description field.
For list data you must select multiple instances of the data in order for the Selector to be properly created. Click another price to finish creating this first DataAddress. When the Price DataAddress is created, you should notice that the data you have selected is highlighted in green and visible in the Data Preview pane.
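Why two clicks? Selecting two instances lets the tool drop whatever differs between them (typically the row index) and keep a selector that matches every row in the list. A rough sketch of the idea, not ScrapeMate’s actual algorithm:

```python
def generalize(path_a, path_b):
    """Given two element paths like 'ul > li:nth-child(1) > span.price',
    drop the parts that differ between the two examples to get a
    selector that matches every row. Purely illustrative."""
    out = []
    for a, b in zip(path_a.split(" > "), path_b.split(" > ")):
        if a == b:
            out.append(a)
        else:
            # Keep the tag name, drop the differing :nth-child() index
            out.append(a.split(":")[0])
    return " > ".join(out)

selector = generalize(
    "ul.results > li:nth-child(1) > span.price",
    "ul.results > li:nth-child(3) > span.price",
)
print(selector)  # ul.results > li > span.price
```

This is why a single click is not enough for list data: one example cannot tell the tool which part of the path is the repeating row.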
Now repeat the previous step to grab two more data fields from the house listings. Click the details of the listing, making sure all of the bedroom, bath, and square footage data is highlighted in yellow.
Let’s Name this DataAddress “Details” with a Description of “Bedroom, Bathroom, and Square Footage Info”. Select another listing’s details to finalize the selector and create this DataAddress. Now repeat this process for the address line at the bottom of the listing. Name this DataAddress “Address”. Select a second address line to create this DataAddress.
Now let’s capture the URL of each listing so we’ll be able to browse to each listing from our Excel file. Click on an image of one of the listings. Select the “Capture List URL” option from the ScrapeMap Wizard Dialog.
By choosing the “Capture List URL” option we're telling ScrapeMate to target a link's URL instead of the web element's text. Name this DataAddress “ListingURL”.
Click another listing’s image to finalize the creation of this DataAddress. At this point you should have 4 DataAddresses with their information visible in the Data Preview pane.
The web scraper ScrapeMate has the ability to drill down into listings to scrape additional data. Let’s get a little more information about these house listings. Click on the image of one of the properties and select “Browse To List Webpage” from the ScrapeMap Wizard Dialog.
This initiates the creation of a Browse Action. Let’s Name this Browse Action “BrowseToListing”.
Click another listing’s image to finalize this action. After creating a Browse Action you’ll notice the Source Browser navigates to the first URL that you clicked on and a second InstructionSet has been created in your ScrapeMap.
On the web page for the listing we clicked on, let’s get the long description for the listing. Click on the description text to create a DataAddress for this data.
Choose “Capture Data” from the ScrapeMap Wizard Dialog. Name the DataAddress “Description”. Because this data is not in a list, you don’t have to click other examples to properly create the selector. Click OK to create the DataAddress. Because the Data Preview pane only displays data from pages it has browsed to, you may not see data for DataAddresses created in Browse Action Sub InstructionSets.
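Conceptually, a Browse Action turns the flat list scrape into a two-level scrape: for every row captured from the list page, the sub InstructionSet visits that row’s captured URL and attaches extra fields from the detail page. A simplified sketch of that flow (the function names are hypothetical stand-ins, not ScrapeMate’s API):

```python
def scrape_with_drilldown(list_rows, fetch_description):
    """For each row scraped from the list page, follow its captured
    ListingURL and attach a field scraped from the detail page.
    fetch_description stands in for browsing to the page and
    capturing the "Description" DataAddress."""
    for row in list_rows:
        row["Description"] = fetch_description(row["ListingURL"])
    return list_rows
```

Because every row requires an extra page visit, drill-down scrapes take noticeably longer than scraping the list page alone.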
To browse back to the list of houses you can either press the browse back button on the Source Browser Toolbar or click on the root InstructionSet.
At this point your ScrapeMap should have 2 InstructionSets with 5 total DataAddresses.
Let’s add a Next Action to our ScrapeMap so that we can browse through all of the house listings. At the bottom of the list of houses click on the next arrow.
In the ScrapeMap Wizard Dialog expand the “Other Actions” section and select “Click Next”.
Name the Action “Next”. For Actions that can be performed an unknown number of times you can specify a Max Times Performed. This can save you time by letting you scrape only the data that you need. Since this is just an example, let’s set this to two.
Click OK to create this Action.
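A Next Action with a Max Times Performed behaves roughly like the loop below: scrape the current page, then click “next” at most the specified number of times, stopping early if there is no next page. `scrape_page` and `click_next` are hypothetical stand-ins for what ScrapeMate does internally:

```python
def run(scrape_page, click_next, max_times=2):
    """Scrape the first page, then follow "next" up to max_times times.
    click_next returns False when there is no next page."""
    all_rows = scrape_page()          # the first page is always scraped
    for _ in range(max_times):
        if not click_next():          # stop early at the end of results
            break
        all_rows += scrape_page()
    return all_rows
```

With Max Times Performed set to two, you therefore end up with three pages of results at most: the starting page plus two “next” clicks.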
Let’s make one last change to the ScrapeMap before we run it. When we created the ScrapeMap we specified that we wanted to “UpdateExistingTable”. This means the data will be added to the same table each time the ScrapeMap is run. As it stands, the whole list of house listings will be appended to the end of our table with each ScrapeMap run. In this example we don’t care about maintaining a historical snapshot of all listings at a given time. Instead we want only new listings added to the end of the table, and if the price or description of a listing changes we want the existing row updated with the latest values. To accomplish this we need to tell ScrapeMate which data to use to match the scraped data with the previously written data. To do this we create a Key DataAddress.
Open the ScrapeMap Properties Dialog.
In the ScrapeMap Properties Dialog select the Details tab. When choosing a Key DataAddress, pick something that will stay consistent for each data row. You can set multiple DataAddresses to identify matching rows. In this example we can use the Address to match listing data. Enable the Is Key DataAddress option for the “Address” DataAddress. Select OK to update the ScrapeMap.
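“UpdateExistingTable” combined with a Key DataAddress amounts to an upsert: match incoming rows to existing rows by the key, update the matches, and append the rest. A simplified sketch of that behavior using “Address” as the key (this is the concept, not ScrapeMate’s implementation):

```python
def upsert(table, new_rows, key="Address"):
    """Update existing rows that share the key; append rows whose key
    has not been seen before."""
    index = {row[key]: row for row in table}
    for row in new_rows:
        if row[key] in index:
            index[row[key]].update(row)   # existing listing: refresh price etc.
        else:
            table.append(row)             # new listing: append to the end
    return table
```

This is why the key should be something stable like the address: if the key itself changed between runs (a price, say), every run would look like all-new listings and the table would grow with duplicates.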
Now we’re ready to run the ScrapeMap to start scraping web data. Save the ScrapeMap and run it. In this example ScrapeMate has to browse to each house listing, so it may take some time. After ScrapeMate has finished running, browse to the Excel file you specified and verify that the correct data was scraped.