Using Lists: The Key to Building Successful Data Mining Agents
September 09, 2015
The Internet is full of lists. Think about the websites you visit regularly and you’ll notice that most of them are built from lists: blogs, retail stores, travel sites, and directories all present their data as lists in one form or another. That is precisely why Mozenda uses lists as the foundational principle behind building agents.
Some websites contain simple lists, while others nest multiple lists of different types within parent lists. Mozenda recognizes several kinds of lists so that users can automate repetitive actions anywhere on a website, however complex the list.
In this post, we explain the “List” paradigm and describe when to use each of Mozenda’s three main list types: Item Lists, Data Lists, and Input Lists.
Begin with Lists in Mind
When you first load a website, look at its navigation. Look for list structures such as categories and sub-categories, and check the site map to see whether it groups the data in a more useful way. The goal is to find a path that leads directly to the data you want and that can be repeated easily for every subsequent item. You usually build this path as a series of lists. Every list has actions to perform at its beginning, in its middle, and at its end, each of which is explained below.
Begin List Action
First, all lists start with a Begin List action. This action specifies what type of list it is and also defines the items that should be included in the list. (On certain types of lists you can refine your list to include or exclude items based on certain conditions.)
Inner List Actions
Next, all lists have one or more actions after the Begin List action that are repeated for every item in the list. The number of actions inside the list depends on what the user is trying to accomplish. For example, an action may instruct Mozenda to capture a particular piece of data, such as a product name, or a click action may instruct Mozenda to click a link for each item in the list and navigate to a new web page.
End List Action
Finally, all lists have an End List action marking where the list ends. This action tells Mozenda that the actions for the current list item are complete and to return to the Begin List action to check whether any additional list items remain to be processed.
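Mozenda Agents are built in the Agent Builder rather than written as code, but the Begin List, inner actions, End List pattern behaves much like a loop. The Python sketch below is a hypothetical model of that pattern, not Mozenda’s API: `ListBlock`, its `keep` condition, and the sample product data are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# An inner action takes the current list item and returns captured fields.
Action = Callable[[Any], Dict[str, Any]]


@dataclass
class ListBlock:
    """Hypothetical model of a Begin List ... End List block (not Mozenda's API)."""
    items: List[Any]                              # items the Begin List action defines
    actions: List[Action]                         # inner actions repeated for each item
    keep: Optional[Callable[[Any], bool]] = None  # optional include/exclude condition

    def run(self) -> List[Dict[str, Any]]:
        results = []
        # Begin List: decide which items belong to the list.
        selected = [item for item in self.items if self.keep is None or self.keep(item)]
        for item in selected:
            row: Dict[str, Any] = {}
            # Inner List actions: run every action against the current item.
            for action in self.actions:
                row.update(action(item))
            results.append(row)
            # End List: control returns to the top of the loop for the next item.
        return results


products = [
    {"name": "Desk Lamp", "price": 24.99},
    {"name": "Office Chair", "price": 149.00},
    {"name": "Gift Card", "price": None},  # excluded by the keep condition below
]
block = ListBlock(
    items=products,
    actions=[lambda p: {"name": p["name"]}, lambda p: {"price": p["price"]}],
    keep=lambda p: p["price"] is not None,
)
print(block.run())
```

The optional `keep` condition stands in for refining a list to include or exclude items based on conditions.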
Item Lists
An Item List is the most commonly used and most flexible list type. Most websites that contain large amounts of data organize it into Item Lists. Here are several scenarios in which an Item List is appropriate:
- Capturing and clicking into each product category on an e-commerce site.
- Capturing a list of products with their associated images and prices.
- Capturing a list of all the reviews for a given product or service.
- Capturing a list of images associated with items in the list.
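To make the Item List idea concrete, here is a minimal Python sketch of what capturing a list of products with their names and prices amounts to. It assumes hypothetical markup in which each product is an `<li class="product">` containing `name` and `price` spans; real sites differ, and Mozenda identifies list items through its point-and-click Agent Builder rather than through code like this.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup; real sites will use different structure.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Desk Lamp</span><span class="price">$24.99</span></li>
  <li class="product"><span class="name">Office Chair</span><span class="price">$149.00</span></li>
</ul>
"""


class ProductListParser(HTMLParser):
    """Collects one record per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.items = []       # the captured Item List
        self._current = None  # record for the list item being parsed
        self._field = None    # "name" or "price" while inside a field span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self._current = {}  # a new list item begins
        elif tag == "span" and self._current is not None:
            for field in ("name", "price"):
                if field in classes:
                    self._field = field

    def handle_data(self, data):
        # Only text inside a recognized field span is captured.
        if self._current is not None and self._field:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.items.append(self._current)  # item finished; move on to the next one
            self._current = None


parser = ProductListParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.items)
```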
Data Lists
Data Lists let users upload a spreadsheet of inputs into a Mozenda Collection and then use the values in the spreadsheet as inputs on a target website. Imagine you have the UPCs for 100,000 products and want to search for each one on a target website, then gather its price and availability. A Data List lets you do just that. Furthermore, once an Agent is set up to use inputs from a spreadsheet, you can change the contents of the spreadsheet without changing the Agent.
See the Help Center document on using data from a file as inputs for a step-by-step walkthrough and video.
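A Data List keeps the inputs separate from the Agent. The hypothetical Python sketch below reads a spreadsheet saved as CSV (here an in-memory stand-in holding two sample UPCs) and builds one search URL per row. The `SEARCH_URL` endpoint, the `q` query parameter, and the `UPC` column name are assumptions for illustration only.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical search endpoint; a real target site will differ.
SEARCH_URL = "https://www.example.com/search"


def build_searches(spreadsheet, column="UPC"):
    """Yield one search URL per spreadsheet row, like a Data List feeding inputs."""
    for row in csv.DictReader(spreadsheet):
        value = (row.get(column) or "").strip()
        if value:  # skip blank cells
            yield f"{SEARCH_URL}?{urlencode({'q': value})}"


# Stand-in for an uploaded spreadsheet exported as CSV (sample values only).
upcs = io.StringIO("UPC\n012345678905\n036000291452\n")
urls = list(build_searches(upcs))
for url in urls:
    print(url)
```

Because `build_searches` only reads whatever rows the file contains, swapping in a different spreadsheet changes the searches without touching the function, which mirrors why an Agent does not need to change when its spreadsheet does.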
Input Lists
Input Lists differ from Data Lists in that the website requires the user to enter or select a value manually to get a result. For example, most auto-parts sites require the user to enter the make, model, and year of the car before searching for relevant parts, usually by selecting each value from a drop-down menu. Wherever Input Lists are required, you can instruct Mozenda to iterate through specific combinations of inputs, or through all possible combinations, to capture the desired results.
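Iterating through an Input List essentially means enumerating combinations of drop-down values. This Python sketch assumes hypothetical make, model, and year options in which the available models depend on the selected make, as they typically do on auto-parts sites; it is an illustration of the idea, not how Mozenda is configured.

```python
from itertools import product

# Hypothetical drop-down options; the model menu depends on the chosen make.
MODELS_BY_MAKE = {
    "Ford": ["F-150", "Focus"],
    "Honda": ["Civic"],
}
YEARS = [2014, 2015]


def all_combinations(models_by_make, years):
    """Yield every (make, model, year) selection a user could make."""
    for make, models in models_by_make.items():
        for model, year in product(models, years):
            yield (make, model, year)


combos = list(all_combinations(MODELS_BY_MAKE, YEARS))
print(len(combos))  # 6

# Iterating through specific inputs instead of all of them:
civic_only = [c for c in combos if c[1] == "Civic"]
print(civic_only)  # [('Honda', 'Civic', 2014), ('Honda', 'Civic', 2015)]
```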
Once you become proficient with the Mozenda Agent Builder and have a few Agents under your belt, you’ll start to see how multiple types of lists can be used together in the same Agent to simplify and organize your data collection process even more.
For example, imagine you want to search several hundred zip codes to find all the locations of a popular retailer and then capture their contact information. Start with a Data List of zip codes: it enters one zip code at a time and clicks the Find button to locate nearby stores. Then create an Item List on the results page for all the locations within that zip code’s geographical area. For each location, you can capture its contact information, address, and operating hours.
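The zip code example nests an Item List inside a Data List. The Python sketch below models that structure with a hypothetical in-memory `FAKE_STORE_FINDER` in place of the retailer’s real store locator; the zip codes and store records are made-up sample data. It also skips stores it has already captured, an assumption added here because searches for neighboring zip codes often return overlapping locations.

```python
# Hypothetical stand-in for a retailer's store-locator results (sample data only).
FAKE_STORE_FINDER = {
    "84101": [
        {"name": "Downtown", "phone": "555-0100", "address": "100 Main St", "hours": "9am-9pm"},
        {"name": "Gateway", "phone": "555-0101", "address": "200 Station Ave", "hours": "10am-8pm"},
    ],
    "84111": [
        # Overlaps with 84101: the same store shows up for both searches.
        {"name": "Downtown", "phone": "555-0100", "address": "100 Main St", "hours": "9am-9pm"},
    ],
    "84604": [
        {"name": "Provo", "phone": "555-0102", "address": "300 Center St", "hours": "9am-6pm"},
    ],
}


def search(zip_code):
    """Simulates entering a zip code and clicking Find; returns the result list."""
    return FAKE_STORE_FINDER.get(zip_code, [])


def collect(zip_codes):
    rows, seen = [], set()
    for zip_code in zip_codes:             # Data List: one zip code at a time
        for location in search(zip_code):  # Item List: each location in the results
            key = (location["name"], location["address"])
            if key in seen:                # already captured from an earlier zip code
                continue
            seen.add(key)
            rows.append({"zip": zip_code, **location})
    return rows


rows = collect(["84101", "84111", "84604", "99999"])
for row in rows:
    print(row["zip"], row["name"], row["phone"], row["hours"])
```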
Our Help Center has additional information, videos, and step-by-step walkthroughs on how to use lists in Mozenda. Here are a few helpful topics to get you started: