A pagescrape is a project designed to collect data from a specific webpage. This is different from a “crawl”, which targets a large number of domains. Because a pagescrape is narrower in scope, it lets users collect more complete data, including contextual data.
Use Mozenda’s pagescrape software to harvest anything from any website.
To tell PageScrape what to scrape, the user provides two parameters: a target URL (along with optional HTTP GET request parameters) and a regular expression. A regular expression is a powerful and fairly standard way to express a set of textual search criteria; PageScrape uses it to search the resulting HTML stream for the required data. PageScrape connects to the web server, submits a GET request, and waits for the resulting web page (an HTML text stream). As the page arrives, PageScrape searches it with the provided regular expression. If a match occurs, the matched data is output and the page download is stopped, because the web screen scrape is complete.
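The flow described above (send a GET request, search the HTML as it streams in, stop at the first match) can be sketched in Python. This is a minimal illustration using only the standard library, not PageScrape's actual implementation; the function names `first_match` and `stream_page`, the chunk size, and the UTF-8 fallback are all assumptions made for the example.

```python
import codecs
import re
import urllib.parse
import urllib.request
from typing import Iterable, Iterator, Optional


def stream_page(url: str, params: Optional[dict] = None,
                chunk_size: int = 4096) -> Iterator[str]:
    """Submit a GET request and yield the response body as decoded text chunks.

    Hypothetical helper: assumes `url` has no query string of its own.
    """
    if params:
        url = url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        decoder = codecs.getincrementaldecoder(charset)(errors="replace")
        while True:
            data = response.read(chunk_size)
            if not data:
                break
            yield decoder.decode(data)
        # Flush any bytes still buffered inside the decoder.
        yield decoder.decode(b"", final=True)


def first_match(chunks: Iterable[str], pattern: str) -> Optional[str]:
    """Search text as it arrives; return the first match and stop reading."""
    regex = re.compile(pattern, re.DOTALL)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        match = regex.search(buffer)
        if match:
            # Returning here leaves the rest of the stream unread.
            return match.group(0)
    return None
```

A call such as `first_match(stream_page("https://example.com/"), r"<title>.*?</title>")` would read only as much of the page as is needed to find the title element. One caveat of searching a partial stream: an open-ended pattern such as `\d+` can match a value that has only partly arrived, so patterns with an explicit closing delimiter are safer with this approach.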
PageScrape utilities are great if you’re an engineer, but what if you’re not? For everyone else there is Mozenda, maker of simple page scraping software that works like a web browser. Mozenda’s software tools enable novice and expert users alike to quickly set up and build page scraping agents. These agents scrape only the data you want and then put it into a structured format such as CSV, TSV, or XML. Unlike other page scraping solutions, Mozenda is both easy to use and affordable.
Mozenda’s patented technology provides eCommerce professionals tools for mining the invisible web, bringing affordable access to remarkable quantities of information.
Mozenda Training Videos
|Input Text Into a Form|0:44|
|Click the “Next” Button to Load the Next Page of Results|1:58|
|Schedule an Agent to Run Regularly|1:08|
|Combine the Contents of Two Fields|1:16|