Scraping Web Data Up To 500% Faster
November 06, 2017
With these updates, we are demonstrating our continued commitment not only to bringing you the best and most effective web scraping software, but also to building software that evolves to meet the challenges and needs you face each day.
Let’s dive in.
Auto-Blocker (or Request Blocker Version 2)
The original version of Request Blocker, released in October, let you block navigation requests that slowed down an agent's data gathering, making the agent faster. The new Auto-Blocker feature within Request Blocker gets you results even faster, with less manual work, so you can scrape the data you need more effectively.
On this new release, Shane Whitlock, a Mozenda developer who worked on the new Auto-Blocker feature, said, “With Request Blocker Version 2 users can compare every navigation request against a list of more than 500,000 domains that we have identified as potentially bad or unnecessary. This will automatically block many requests from ad servers that slow down an agent and the agent building process. There is a toggle button to turn this setting on or off. With this update, you can get some of the performance gains of request blocking without having to do any of the dirty work.”
You can turn Auto-Blocker on or off with the toggle in the Navigation Request window. When requests match the Auto-Block list, the Request Blocker control lights up with a green background to let you know that Auto-Block suggests blocking them. When you click through to the Request Blocking editor, you are prompted to keep or deny each suggested block. Manual, personalized request blocking can achieve additional gains, but simply clicking a button to accept the recommended blocks will already speed up your agents significantly.
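To illustrate the kind of matching Shane describes, here is a minimal Python sketch of checking navigation requests against a domain blocklist. It is an assumption about how such matching could work, not Mozenda's implementation: AUTO_BLOCK_LIST, candidate_domains, and should_block are hypothetical names, the two sample domains are placeholders, and a request is flagged if its host or any parent domain of that host is on the list. Keeping the list in a set keeps each lookup fast even with hundreds of thousands of entries.

```python
from urllib.parse import urlsplit

# Hypothetical sample entries; the real Auto-Block list holds 500,000+ domains.
AUTO_BLOCK_LIST = frozenset({"ads.example.net", "tracker.example.org"})


def candidate_domains(host: str) -> list[str]:
    """Return the host and each of its parent domains, most specific first."""
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def should_block(url: str, blocklist=AUTO_BLOCK_LIST) -> bool:
    """True if the request's host, or any parent domain of it, is on the blocklist."""
    host = urlsplit(url).hostname  # lowercased with the port stripped; None if absent
    if host is None:
        return False
    return any(domain in blocklist for domain in candidate_domains(host))


requests = [
    "https://www.example.com/products?page=2",
    "https://cdn.ads.example.net/banner.js",
    "https://tracker.example.org/pixel.gif",
]
# The requests an Auto-Block style check would suggest blocking.
suggested = [url for url in requests if should_block(url)]
print(suggested)
```

Here the two ad and tracker requests are suggested for blocking, while the product page request is left alone.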
Here are our latest speed testing results comparing the Agent Builder with and without the Request Blocker feature:
View our Help Center’s guided tutorial on Auto-Block.
Job Sequencer Version 2
Building upon the Job Sequencer release last month, the Mozenda team is excited to give you three new tools within Job Sequencer that remove roadblocks to greater efficiency:
- Delete a View
- Update Field Value
- Run a Sequence
Kenny Nielsen, an Account Manager in Mozenda’s Professional Services department who relies on the Job Sequencer functionality to gather millions of pieces of data for Mozenda clients, said, “The Professional Services team has used API functionality inside a sequencer environment for some time, and now that functionality is available in an automated step. These new steps allow you to do even more with the Job Sequencer tool, making it more versatile and customizable for the wide variety of ways our customers run their data.”
Be sure to check out our Help Center’s step-by-step tutorial of the new Job Sequencer features.
The first new step is Delete a View. It lets you select specific data from a scrape and delete it. For example, you may be scraping data about luggage but only want to compare options priced at $99 or less. You can set up a view that includes everything priced at $100 or more and then use Delete a View to remove that data from the collection. This gives you the control to see only the data you want.
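The luggage example can be modeled in a few lines of Python. This is a hypothetical sketch of the idea, not Mozenda's API: a view is represented as a predicate function, the sample items and prices are made up, and delete_view returns the records the view does not match rather than modifying a stored collection.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price: float


def delete_view(collection: list[Item], view) -> list[Item]:
    """Return the collection with every record matched by the view removed."""
    return [item for item in collection if not view(item)]


def over_budget(item: Item) -> bool:
    """Hypothetical view: everything priced at $100 or more."""
    return item.price >= 100


luggage = [Item("Carry-on", 79.99), Item("Duffel", 99.00), Item("Hard shell", 149.50)]
remaining = delete_view(luggage, over_budget)
print([item.name for item in remaining])  # ['Carry-on', 'Duffel']
```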
The second new step is Update Field Value. It works like Delete a View, but instead of deleting data it lets you change it. This is especially helpful when scraping large amounts of data, because it lets you work through the data in batches. Choose a view, choose a field within that subset, and set the value you want that field to have.
For example, say you are building a web data extraction project that involves 100 items, each with a file that needs to be downloaded. Previously, the only way to download all of those files was to wait until the scrape had finished all 100 items. Now you can set up a view that only starts downloads for items with a “Ready” status. You can use Update Field Value to mark 10 items at a time as “Ready” and download 10 files at a time instead of all 100 at once. After each download, you can update the field to “Done” so you know which items have been published.
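The batching workflow above can be sketched as a loop. This is a hypothetical Python model rather than Mozenda's API: a view is the subset of records whose field has a given value, update_field_value sets a field on a subset, the file URLs are placeholders, and download is a stand-in that only logs the item id. Each pass marks the next 10 pending items "Ready", downloads them, and marks them "Done".

```python
BATCH_SIZE = 10

# 100 hypothetical items, each with a file to download and a status field.
items = [
    {"id": i, "file_url": f"https://example.com/files/{i}.pdf", "status": "Pending"}
    for i in range(100)
]


def view(collection, field, value):
    """A view: the subset of records whose `field` equals `value`."""
    return [record for record in collection if record[field] == value]


def update_field_value(records, field, value):
    """Set `field` to `value` on every record in the given subset."""
    for record in records:
        record[field] = value


def download(record, log):
    """Hypothetical stand-in for the real file download; it only logs the id."""
    log.append(record["id"])


downloaded = []
batches = 0
while pending := view(items, "status", "Pending"):
    # Mark the next batch of 10 as Ready.
    update_field_value(pending[:BATCH_SIZE], "status", "Ready")
    ready = view(items, "status", "Ready")
    for record in ready:
        download(record, downloaded)
    # Mark the batch Done so it drops out of the Pending and Ready views.
    update_field_value(ready, "status", "Done")
    batches += 1

print(batches, len(downloaded))  # 10 100
```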
The final new step is aptly named Run a Sequence. It lets a running sequence start another sequence. This is important for three reasons:
- It breaks up the work. One sequence can start several others, compartmentalizing the scrape into smaller, easier-to-manage components.
- It lets closely related sequences hand off to each other. For example, after a substantial amount of data has been collected, you can start another sequence that begins collecting more data right away while the large scrape publishes at the same time.
- It lets you reuse repeated steps. If every sequence needs the same five steps, you can build them once in their own sequence and run that sequence from the others instead of rebuilding the steps every time.
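The step-reuse idea can be sketched as sequences that call other sequences. This is a hypothetical Python model, not Mozenda's API: each sequence is a list of steps (plain strings stand in for real actions, and the five shared step names are made up), a ("run", name) tuple stands in for a Run a Sequence step, and child sequences run inline, one after another, for simplicity, whereas the real step may start the other sequence in parallel. Because this model is synchronous, it rejects cycles where sequences would start each other forever.

```python
# Hypothetical sequences: strings stand in for real steps, and a
# ("run", name) tuple stands in for a Run a Sequence step.
SEQUENCES = {
    "shared_setup": ["log in", "clear old data", "set filters", "check quota", "notify team"],
    "scrape_luggage": [("run", "shared_setup"), "run agent: luggage"],
    "scrape_shoes": [("run", "shared_setup"), "run agent: shoes"],
    "nightly": [("run", "scrape_luggage"), ("run", "scrape_shoes")],
}


def run_sequence(name, sequences, trace=None, _stack=()):
    """Run a sequence's steps in order, running nested sequences inline.

    Returns a trace of the plain steps executed, each prefixed with the
    sequence that owns it. Raises ValueError if sequences form a loop.
    """
    if trace is None:
        trace = []
    if name in _stack:
        raise ValueError("cycle: " + " -> ".join(_stack + (name,)))
    for step in sequences[name]:
        if isinstance(step, tuple) and step[0] == "run":
            # Run a Sequence step: execute the child sequence, then continue.
            run_sequence(step[1], sequences, trace, _stack + (name,))
        else:
            trace.append(f"{name}: {step}")
    return trace


for line in run_sequence("nightly", SEQUENCES):
    print(line)
```

The five shared steps are defined once in shared_setup and reused by both scraping sequences.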
Chris Curtis, Mozenda’s System Architect, said of these new Job Sequencer features, “If you’re doing difficult, hard projects you’ll recognize the value of these features because you understand the pain of doing these things in large projects. Each one of these options solves a major pain point.”
The possibilities with this feature are open-ended, and it can be used in very diverse and specific situations. Here are just a few examples of how you can use it:
- DOM manipulation – Simply put, this allows you to add, modify, or remove any elements on the page. For example, this could enable you to add a new column or row to a table.
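To give a sense of what adding a column involves, here is a small Python sketch using the standard library's xml.etree.ElementTree on a well-formed XHTML table fragment. On a live page this kind of change would typically be made in the browser (for example with JavaScript); the table contents and the add_column helper are hypothetical, chosen only to illustrate appending one header cell and one data cell per row.

```python
import xml.etree.ElementTree as ET

# Hypothetical table fragment; it must be well-formed for ElementTree to parse it.
TABLE = """<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Carry-on</td><td>79.99</td></tr>
  <tr><td>Hard shell</td><td>149.50</td></tr>
</table>"""


def add_column(table, header, values):
    """Append a header cell to the first row and one data cell to each other row."""
    rows = table.findall("tr")
    if len(values) != len(rows) - 1:
        raise ValueError("need exactly one value per data row")
    header_cell = ET.SubElement(rows[0], "th")
    header_cell.text = header
    for row, value in zip(rows[1:], values):
        cell = ET.SubElement(row, "td")
        cell.text = value
    return table


table = add_column(ET.fromstring(TABLE), "In stock", ["yes", "no"])
print(ET.tostring(table, encoding="unicode"))
```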
All of these features combine to make Mozenda an even more robust web scraping tool, bringing you the data you need faster than ever.
Demo Webinar Signup
We’ll also be hosting a live training webinar on Thursday, November 9 at 11 AM Mountain time. Space is limited so save your seat here right away.