Web Harvesting Tips: Capture Table and Capture Name-Value Pairs
March 24, 2015
Welcome to a new chapter of Mozenda harvesting technology; we have released the culmination of many months of work, and we are thrilled to put it into the hands of our customers. At first glance, you may not see much of a difference in the appearance of the Agent Builder. But under the hood, the entire application has been transformed. In future blog posts, we’ll get into the nitty-gritty of what exactly was changed and why it was necessary. In this post, I’ll give you the critical information you need to take your Mozenda Agents to a whole new level.
Opt In to the Updated Builder
The next time you log into the Mozenda Web Console, you’ll be prompted to accept the latest update of the Mozenda Agent Builder. Once you accept this update, any new or edited Agents will run on new servers dedicated to running the latest version of the software. Until you accept the update, your Agents will continue to run as they did before, and you will not have access to the latest version of the Builder. However, I’m fairly certain that after reading the rest of this post, you’ll want to accept the update as soon as possible.
This update includes two new actions that can transform the way many Agents are architected and maintained. They eliminate the time spent creating an individual Capture Action for each value you want to capture and associating it with nearby text. Instead, Mozenda creates the Capture Actions for you and automatically adds more as new values are detected. Here is a brief introduction to both of these new actions.
Capture Name-Value Pairs
This action grew out of the specification lists that many websites provide for their products and services. Imagine building an Agent for an electronics website, creating a Capture Action for each specification you want to capture, and then associating each one with nearby text. Next, imagine visiting 10 or 20 different product pages to find examples of every specification that might be listed, and adding Capture Actions for each new one.
For many of you, this process sounds very familiar, and you know it can take hours. The Capture Name-Value Pairs action solves this problem. Within a few clicks, your Agent will not only create all the Capture Actions needed and associate them with nearby text, but will also add Capture Actions automatically as new specifications are found on other pages.
This action was designed for a wide variety of situations and layouts, and it has quickly become a favorite of our Professional Services team, which maintains tens of thousands of Agents. Whenever you want to capture multiple values that share a similar format and have associated labels, you’ll want to use this feature.
Here are a few examples of places where this feature can be used:
- Technical specification lists (see the image below)
- Individual bios (name, address, phone, title, etc.)
- Thumbnail image lists
- Related category lists
- Bulleted lists
- Key-value lists
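To make the idea concrete, here is a minimal, generic sketch in Python of what capturing name-value pairs across several pages involves. It is not Mozenda’s implementation: the `parse_pairs` and `merge_name_value_pages` helpers are hypothetical, and the sketch assumes each pair has already been extracted from the page as a `Label: value` line of text. The key point is that the set of columns is the union of every label seen so far, so a page that introduces a new specification simply adds a new column, much as the action adds a new Capture Action.

```python
from typing import Dict, List, Tuple


def parse_pairs(lines: List[str]) -> Dict[str, str]:
    """Hypothetical helper: turn 'Label: value' lines into a dict.

    Splits on the first colon only, so values may contain colons.
    Lines without a colon or with an empty label are ignored.
    """
    pairs: Dict[str, str] = {}
    for line in lines:
        label, sep, value = line.partition(":")
        if sep and label.strip():
            pairs[label.strip()] = value.strip()
    return pairs


def merge_name_value_pages(
    pages: List[Dict[str, str]],
) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: combine per-page label/value dicts into one table.

    Columns appear in the order their labels were first seen, so the column
    set grows whenever a page introduces a label not seen before.
    """
    columns: List[str] = []
    seen = set()
    for page in pages:
        for label in page:
            if label not in seen:
                seen.add(label)
                columns.append(label)
    # Build rows after all columns are known; a label missing from a page
    # becomes an empty string in that page's row.
    rows = [[page.get(col, "") for col in columns] for page in pages]
    return columns, rows


# Usage: two product pages that list different specifications.
pages = [
    parse_pairs(["Screen: 15.6 in", "Weight: 2.1 kg"]),
    parse_pairs(["Screen: 13.3 in", "Battery: 10 h"]),
]
columns, rows = merge_name_value_pages(pages)
print(columns)  # ['Screen', 'Weight', 'Battery']
```

Building the rows only after the full column list is known keeps every row the same width, which is what lets a later page add a column without breaking earlier results.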
Click on the additional resource below to see step-by-step examples of using this Action.
Capture Table
This action makes capturing a table with dynamically generated columns quick and easy. To illustrate its power, imagine trying to capture a table of products from different categories on an electronics site. Now imagine that the table has a different number of columns depending on the product category, and that the site sometimes merges table cells across columns and rows. Traditionally, capturing tables like this accurately could take hours: researching every variation, accounting for each one, and creating the necessary alternate locations for many Capture Actions.
With the new Capture Table Action, these problems are solved and the whole task takes only a few clicks. As Mozenda detects additional columns in the table while you test or run the Agent, it adds the necessary Capture Actions automatically. Click on the additional resources below to see step-by-step examples of how to use this new Action.
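Merged cells are usually the part of table capture that takes the most research. As a rough illustration (again, not Mozenda’s implementation), the Python sketch below shows the common technique for normalizing an HTML-style table that uses `rowspan` and `colspan`: each merged cell is copied into every grid position it covers, and short rows are padded so every row has the same number of columns. The `Cell` class and `expand_table` function are hypothetical stand-ins for however the cells were parsed from the page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    """One parsed table cell (hypothetical representation)."""
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_table(rows: List[List[Cell]]) -> List[List[str]]:
    """Expand merged cells into a rectangular grid of strings.

    A cell with rowspan/colspan > 1 is copied into every position it covers;
    positions no cell covers are filled with "".
    """
    grid: Dict[Tuple[int, int], str] = {}
    height = len(rows)
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip columns already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(cell.rowspan):
                if r + dr >= height:
                    break  # clip rowspans that run past the last row
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = cell.text
            c += cell.colspan
    # The widest occupied column determines the width of every row.
    width = max((col for _, col in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(width)] for r in range(height)]


# Usage: a two-row header where "Model" spans two rows and
# "Display" spans two columns.
table = [
    [Cell("Model", rowspan=2), Cell("Display", colspan=2)],
    [Cell("Size"), Cell("Resolution")],
    [Cell("X1"), Cell("15.6 in"), Cell("1920x1080")],
]
for line in expand_table(table):
    print(line)
# ['Model', 'Display', 'Display']
# ['Model', 'Size', 'Resolution']
# ['X1', '15.6 in', '1920x1080']
```

Once the grid is rectangular, each column index can map to one capture field, and a wider grid on a later page simply means more columns to capture.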
Browser Isolation Mode
This release brings significant improvements to running the Agent Builder in Browser Isolation mode. In this mode, the browsers in the Agent Builder run in their own process. If you became a customer after this release, you are already running in Browser Isolation mode. If not, you’ll need to enable it in the Advanced Features area of the Agent Builder (see the image below). The main benefit of Browser Isolation mode is better performance for Agents that visit many pages and on websites with complex DOM structures. In addition, Mozenda can manage the browsers’ memory more effectively, which keeps poorly designed websites from forcing you to restart the Agent Builder after long periods of use. This is particularly helpful when testing long-running Agents that navigate to many webpages.
As mentioned in our blog post announcing this update, we have also made significant changes to the Agent Builder that improve overall site compatibility and make creating and editing Agents a better experience.
Internet Explorer 10 and 11 Compatibility
As promised, we have made the necessary changes to our application to take advantage of the latest versions of Internet Explorer, up to and including Internet Explorer 11. We discussed this change at length in our previous post, and we encourage you to read it to become familiar with the changes and how they can affect your Agents.
Ajax and Frame Request Handling
One significant change we made to the Agent Builder is in the logic that decides whether to load a navigation request into a new page or allow the website to change the current page. Before this update, the Agent Builder tried to detect whether the website initiated Ajax or frame navigations as a result of clicking a link on the webpage. Unfortunately, this sometimes prevented the website from loading additional content properly. With so much Ajax and iframe activity on most modern websites, we determined that users will be more successful in these cases if we “get out of the way” and allow the website to perform the frame and Ajax requests it needs. We now simply annotate the user-initiated action with the Wait for Ajax flag and allow the website to finish loading any new content before executing the next action.
This is just the beginning of the new features and improvements we are working hard on here at Mozenda. We are excited to transform the way professionals get data from the web, and we look forward to working with each of our customers on how they can make the most of these new features. If you have any questions about the update or about how these new Actions can be used in your Agents, please don’t hesitate to reach out to our Support team.
Chris Curtis is a Technical Lead at Mozenda and is part of our Product team. He has also served as part of our Support and Managed Services teams. Chris joined Mozenda shortly after the company was founded in 2007.