Mozenda Glossary of Terms

Data Mining Solution


See also Data Capture, Web Content Data Mining, Web Scrape


A data mining solution is any system that collects information from the web and organizes it into a format you can work with, usually a spreadsheet. In Mozenda’s case, you can customize the way the data is organized and even create different versions of any particular data set.
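
As a rough illustration of the "organize it into a spreadsheet" step, the sketch below writes scraped records to CSV text with Python's standard csv module and produces two versions of the same data set by selecting different columns. The field names, sample records, and the records_to_csv helper are hypothetical; this is not how Mozenda itself stores or exports data.

```python
import csv
import io


def records_to_csv(records, fields):
    """Write dict records as CSV text, keeping only `fields`, in that order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # silently drop keys that are not listed in `fields`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records, as a scraper might collect them from product pages.
records = [
    {"name": "Widget", "price": "9.99", "url": "https://example.com/widget"},
    {"name": "Gadget", "price": "19.99", "url": "https://example.com/gadget"},
]

# Two "versions" of the same data set: every column, and a name/price view.
full_view = records_to_csv(records, ["name", "price", "url"])
price_view = records_to_csv(records, ["name", "price"])
print(price_view)
```

Because each version is derived from the same records, changing the column list is enough to produce a new view without collecting the data again.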

Uses


Some companies use data mining to perform indexing functions on a variety of sites for a variety of reasons. These functions include counting how many pages a site has and of what types, counting how many times certain terms are referenced, and counting how many hyperlinks and backlinks exist on a website. The most popular data mining solutions are those built for search engines; their specific task is to index the web so that content is more searchable and available to users.
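
The indexing functions described above can be sketched with Python's standard html.parser module. This minimal example assumes the pages are already downloaded into a dictionary (URL to HTML); it counts outgoing links per page, counts occurrences of single-word terms in page text, and counts backlinks as the number of other pages in the set that link to a URL. The PageIndexer class, the index_pages function, and the sample pages are hypothetical, and the text handling is deliberately naive.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Collect link targets and text fragments from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Naive: this also collects text inside <script> and <style>.
        self.text_parts.append(data)


def index_pages(pages, terms):
    """pages maps URL -> HTML. Returns link counts, term counts, backlink counts."""
    link_counts = {}
    term_counts = Counter()
    backlinks = Counter()
    for url, html in pages.items():
        parser = PageIndexer()
        parser.feed(html)
        parser.close()
        link_counts[url] = len(parser.links)
        words = re.findall(r"[a-z0-9']+", " ".join(parser.text_parts).lower())
        for term in terms:
            term_counts[term] += words.count(term.lower())
        # Resolve relative links; each linking page counts once per target.
        for target in {urljoin(url, href) for href in parser.links}:
            if target != url:  # ignore self-links
                backlinks[target] += 1
    return link_counts, term_counts, backlinks


pages = {
    "https://example.com/": '<a href="/about">About</a> <p>Data mining and web data.</p>',
    "https://example.com/about": (
        '<a href="https://example.com/">Home</a> <a href="/about">Self</a>'
        " <p>About our data.</p>"
    ),
}
link_counts, term_counts, backlinks = index_pages(pages, ["data"])
print(link_counts)
print(term_counts["data"])
print(dict(backlinks))
```

Backlinks can only be counted relative to the pages an indexer has actually seen, which is one reason search engines try to crawl so much of the web.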

Policies


Web crawlers follow sets of instructions, called policies, that determine their crawling behavior. Many crawlers do not visit every page; instead, they visit enough pages to determine what is important and what is not. Common policies include:

  • Restriction policy: which kinds of resources to request, for example skipping pages whose MIME type is not HTML
  • Normalization policy: avoid crawling the same resource more than once, for example by reducing URLs to a canonical form
  • Selection policy: which pages to download and in what order
  • Revisit policy: when to return to pages already crawled to check them for changes
  • Politeness policy: when and how frequently requests can be made to a website
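
The policies above can be combined in a small crawl frontier. The Python sketch below is one possible arrangement, not a description of Mozenda or any particular crawler. It assumes an HTML-only restriction policy; normalization that lowercases the scheme and host and drops default ports and fragments; a numeric priority for selection (lower is fetched first); a fixed per-host delay for politeness; and a SHA-256 fingerprint of the page body to decide whether a revisited page changed. The Frontier class, its methods, and ALLOWED_TYPES are hypothetical names. The code does no network I/O, so times are passed in explicitly.

```python
import hashlib
import heapq
from urllib.parse import urlsplit, urlunsplit

# Restriction policy (assumed): only HTML responses are kept.
ALLOWED_TYPES = {"text/html"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def allowed_type(content_type):
    """Restriction policy: accept a response only if its MIME type is allowed."""
    return content_type.split(";")[0].strip().lower() in ALLOWED_TYPES


def normalize(url):
    """Normalization policy: canonical form so one resource maps to one URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:  # IPv6 literal: restore the brackets
        host = f"[{host}]"
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))  # fragment dropped


class Frontier:
    """Hypothetical URL frontier combining selection, politeness and revisit checks."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds  # politeness: minimum gap between hits per host
        self.heap = []  # selection: (priority, insertion order, url)
        self.seen = set()  # normalized URLs already queued
        self.last_hit = {}  # host -> time of last request
        self.fingerprints = {}  # url -> SHA-256 of last downloaded body
        self.order = 0

    def add(self, url, priority):
        """Queue a URL unless its normalized form was seen; lower priority runs first."""
        key = normalize(url)
        if key in self.seen:
            return False
        self.seen.add(key)
        heapq.heappush(self.heap, (priority, self.order, key))
        self.order += 1
        return True

    def next_url(self, now):
        """Pop the best URL whose host may be contacted at time `now`, else None."""
        deferred = []
        chosen = None
        while self.heap:
            item = heapq.heappop(self.heap)
            host = urlsplit(item[2]).netloc
            last = self.last_hit.get(host)
            if last is None or now - last >= self.delay:
                self.last_hit[host] = now
                chosen = item[2]
                break
            deferred.append(item)  # host contacted too recently; try later
        for item in deferred:
            heapq.heappush(self.heap, item)
        return chosen

    def changed(self, url, body):
        """Revisit policy: True if `body` is new or differs from the last fetch."""
        key = normalize(url)
        digest = hashlib.sha256(body).hexdigest()
        previous = self.fingerprints.get(key)
        self.fingerprints[key] = digest
        return previous != digest


frontier = Frontier(delay_seconds=2.0)
frontier.add("HTTP://Example.com:80/a#top", priority=0)
frontier.add("http://example.com/a", priority=0)  # duplicate after normalization, ignored
frontier.add("http://example.com/b", priority=1)
frontier.add("http://other.org/c", priority=2)
print(frontier.next_url(now=0.0))  # http://example.com/a
print(frontier.next_url(now=0.5))  # http://other.org/c (example.com must wait)
print(frontier.next_url(now=1.0))  # None
print(frontier.next_url(now=2.0))  # http://example.com/b
```

Passing the clock in as `now` keeps the politeness logic easy to test; a real crawler would typically read `time.monotonic()` and would also consult each site's robots.txt, which this sketch does not do.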


Mozenda Training Videos



Title                                                       Length
Capture Text                                                0:43
Input Text Into a Form                                      0:44
Click the “Next” Button to Load the Next Page of Results    1:58
Schedule an Agent to Run Regularly                          1:08
Combine the Contents of Two Fields                          1:16