Data Mining

Data mining, also referred to as data extraction or data scraping, is the process by which a program extracts data from a website, strips out the irrelevant information, and stores the remaining data for later retrieval or analysis.

Data mining software and services can aid a company in a variety of ways, including:

  • Tracking competitor inventory and prices.
  • Determining shipment tracking status and expected delivery times.
  • Determining accurate, up-to-the-minute shipping rates.
  • Minimizing data entry time and the possibility of errors by grabbing product specifications from manufacturer websites.

Data Extraction

How is data extracted from a website?

Data mining tools determine relevant data by looking for specific patterns in a webpage. When a section of the webpage is found that matches the specified layout, the data mining program looks at specific areas of that portion of the page for the relevant data.

Take eBay as an example. A data mining program would load all the webpages in a certain category and look for a certain pattern. In this example, the program is looking for a rectangular portion of the page that has an image on the left and fields such as end time, shipping cost, and item location in close proximity.

Once the data extraction program has found one or more portions of the page that match what it is looking for, it grabs data from specific areas of each matching block.

In the above example, the data mining program knows where it can find specific pieces of data based upon the data's relation to other elements on the page or sub-page.
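The block-matching idea described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how any particular site is actually laid out: the sample markup, the "listing" class, and the field class names are all assumptions invented for the example, and the sketch assumes listing blocks do not contain nested div elements.

```python
from html.parser import HTMLParser

# Hypothetical page markup: every class name here is an assumption.
SAMPLE_PAGE = """
<div class="listing">
  <img src="widget.jpg">
  <span class="end-time">2h 15m</span>
  <span class="shipping">$4.99</span>
  <span class="location">Denver, CO</span>
</div>
<div class="banner">Sign up for deals!</div>
<div class="listing">
  <img src="gadget.jpg">
  <span class="end-time">1d 3h</span>
  <span class="shipping">Free</span>
  <span class="location">Austin, TX</span>
</div>
"""

# Fields we expect to find close together inside a matching block.
FIELDS = {"end-time", "shipping", "location"}


class ListingParser(HTMLParser):
    """Collects one dict per block that matches the expected layout."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._current = None  # dict for the listing block being read
        self._field = None    # name of the field whose text we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div" and css_class == "listing":
            # Found a block that matches the pattern; start a new record.
            self._current = {}
        elif self._current is not None:
            if tag == "img":
                self._current["image"] = attrs.get("src")
            elif tag == "span" and css_class in FIELDS:
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a known field of a matching block.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.listings.append(self._current)
            self._current = None


def extract_listings(html):
    parser = ListingParser()
    parser.feed(html)
    return parser.listings
```

Calling extract_listings(SAMPLE_PAGE) yields one record per listing block and ignores the banner, because the banner does not match the expected layout. Each value is located by its position relative to the enclosing block rather than by its absolute position on the page.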

Normally, data mining software will automatically load specific webpages at set intervals and extract the data from them. How often the extraction runs depends on how often the data on the target page is likely to be updated.
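The interval-based approach might look something like the following sketch. The function name poll and its parameters are hypothetical; fetch stands in for whatever loads the page (for example, a wrapper around urllib.request.urlopen) and extract for whatever pulls the data out of it. The sleep function is passed in so the loop can be exercised without actually waiting.

```python
import time


def poll(fetch, extract, interval_seconds, max_runs, sleep=time.sleep):
    """Fetch a page and extract its data max_runs times.

    Pauses interval_seconds between runs (but not after the last one)
    and returns the list of extracted results, one entry per run.
    """
    results = []
    for run in range(max_runs):
        page = fetch()
        results.append(extract(page))
        if run < max_runs - 1:
            sleep(interval_seconds)
    return results
```

A page whose prices change hourly might be polled with interval_seconds=3600, while a shipment tracking page might justify a much shorter interval. In production, a scheduler such as cron is a common alternative to a long-running loop.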

Data Mining Software

Because the patterns a data mining program searches for are complex to define, and because data is displayed in endless ways on webpages throughout the internet, quality off-the-shelf data mining software and tools are almost non-existent. As a result, almost all data mining software is custom built with a specific target website in mind.

Because data mining relies on pattern recognition, the slightest change to a website can break the algorithm and render a data mining program ineffective. For this reason, we at Databases Done Right offer data mining services rather than standalone data mining programs. This service-based approach allows us to adapt and correct data mining algorithms if and when the target website changes.
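One simple way to notice that a target website has changed is to check each batch of extracted records for missing data. The sketch below is only an illustration of that idea, not a description of any specific service's method; the field names and function names are assumptions chosen for the example.

```python
# Assumed field names; a real extractor would list its own.
REQUIRED_FIELDS = ("end-time", "shipping", "location")


def find_broken_records(records, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for each record lacking required data."""
    problems = []
    for index, record in enumerate(records):
        missing = [name for name in required if not record.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


def layout_probably_changed(records, required=REQUIRED_FIELDS):
    """Treat an empty result or any incomplete record as a warning sign."""
    return not records or bool(find_broken_records(records, required))
```

If layout_probably_changed returns True after a run, the extraction pattern likely needs to be reviewed against the current version of the page before the data is trusted.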

In conclusion, knowledge is power, and knowing what your competitors are doing will give your company the edge and help keep you one step ahead of them. Data mining has countless applications and is a great tool for a business of any size.