Mozenda Glossary of Terms

Page Scrape

See also Create RSS Feeds, HTML Scrape, Web Content Extractor, Data Harvesting


A pagescrape is a project designed to collect data from a specific webpage. This is different from a “crawl”, which targets a large number of domains. Because a pagescrape is narrower in scope, it can be more specific and allows users to collect more complete data, including contextual data.

Traditional Pagescrape

To tell PageScrape how to scrape the required data, the user provides two parameters: a target URL (along with optional HTTP GET request parameters) and a Regular Expression. A Regular Expression is a powerful and fairly standard way to express a set of textual search criteria; PageScrape uses it to search the resulting HTML stream for the required data. PageScrape connects to the Web server, submits a GET request, and waits to receive the resulting Web page (an HTML text stream). As the page arrives, PageScrape searches it using the provided Regular Expression. If a match occurs, the matched data is output and the page download is stopped, because the Web Screen Scrape is complete.
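The flow described above can be sketched in a few lines of Python. This is only an illustration built on the standard library, not the PageScrape utility itself: the function names `build_url`, `search_stream`, and `page_scrape`, the chunk size, and the UTF-8 assumption are all hypothetical choices made for the example.

```python
import codecs
import re
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(target_url, params=None):
    """Append optional HTTP GET parameters to the target URL."""
    if not params:
        return target_url
    separator = "&" if "?" in target_url else "?"
    return target_url + separator + urlencode(params)


def search_stream(chunks, pattern):
    """Search text chunks as they arrive; return the first match or None.

    Note: a greedy pattern can match a value that is cut off at a chunk
    boundary, so patterns with a clear end delimiter work best here.
    """
    regex = re.compile(pattern)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        match = regex.search(buffer)
        if match:
            # Returning here stops the loop, so no further chunks are read.
            return match.group(0)
    return None


def page_scrape(target_url, pattern, params=None, chunk_size=1024):
    """Submit a GET request and stop downloading once the pattern matches."""
    # Assumption: the page is UTF-8; the incremental decoder copes with
    # multi-byte characters that are split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    with urlopen(build_url(target_url, params)) as response:
        def chunks():
            while True:
                data = response.read(chunk_size)
                if not data:
                    return
                yield decoder.decode(data)

        return search_stream(chunks(), pattern)
```

A call such as `page_scrape("http://example.com/", r"<title>.*?</title>")` would return the first title element found, or `None` if the page ends without a match. Keeping the search in `search_stream` separate from the network code makes the early-stop behavior easy to check without a live Web server.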

Mozenda Pagescrape

PageScrape utilities are great if you’re an engineer, but what if you’re not? For everyone else there is Mozenda, maker of simple Page Scraping software that works like a web browser. Mozenda software tools enable novice and expert users alike to quickly set up and build Page Scraping agents. These agents are capable of scraping only the data you want and then putting it into a structured format such as CSV, TSV, or XML. Unlike other page scraping solutions, Mozenda is both easy to use and affordable.

