Scraping

Scraping, or "web scraping," is the process of extracting large amounts of information from a website. This may involve downloading several web pages or the entire site. The downloaded content may include just the text from the pages, the full HTML, or both the HTML and images from each page.
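As a minimal sketch of the difference between keeping the full HTML and keeping just the text, the Python example below strips the markup from a downloaded page using only the standard library. The names TextExtractor and extract_text are illustrative assumptions, not part of any particular scraping tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling extract_text on the HTML of a saved page returns only its text, while the original HTML string can still be stored alongside it if the markup is needed later.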

There are many different methods of scraping a website. The most basic is manually downloading web pages. This can be done by either copying and pasting the content from each page into a text editor or using your browser's File → Save As… command to save local copies of individual pages. Scraping can also be done automatically using web scraping software. This is the most common way to download a large number of pages from a website. In some cases, bots are used to scrape a website at regular intervals.
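Automatic scraping can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the function names fetch_page and scrape_pages, the User-Agent string, and the one-second pause between requests are all hypothetical choices, not the behavior of any specific scraping program.

```python
import time
import urllib.request


def fetch_page(url, timeout=10):
    """Download one page and return its HTML as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}  # assumed name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_pages(urls, fetch=fetch_page, delay_seconds=1.0):
    """Download each URL in turn and return a dict mapping URL to HTML.

    Pages that fail to download are stored as None.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests
        try:
            pages[url] = fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            pages[url] = None
    return pages


# Example call (performs real network requests):
# pages = scrape_pages(["https://example.com/", "https://example.com/about"])
```

A bot that scrapes at regular intervals could run a function like scrape_pages from a scheduler. The fetch parameter is there so a different download function can be swapped in.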

Web scraping may be done for several different purposes. For instance, you may want to archive a section of a website for offline access. By downloading several pages to your computer, you can read them at a later time without being connected to the Internet. Web developers sometimes scrape their own websites when testing for broken links and images within each page. Scraping can also be done for unlawful purposes, such as copying a website and republishing it under a different name. This type of scraping is viewed as a copyright violation and can lead to legal action.
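The broken-link test mentioned above might look roughly like the following Python sketch. It assumes a page's HTML has already been downloaded. The names LinkCollector, collect_links, is_reachable, and find_broken_links are hypothetical, and checking with a HEAD request is one possible approach among several.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the targets of <a href="..."> and <img src="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def collect_links(html, base_url):
    """Return absolute URLs for every link and image found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, link) for link in collector.links]


def is_reachable(url, timeout=10):
    """Return True if a HEAD request succeeds (some servers reject HEAD)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):  # HTTP errors, network errors, bad URLs
        return False


def find_broken_links(html, base_url, check=is_reachable):
    """Return the links and images on a page that fail the check."""
    return [url for url in collect_links(html, base_url) if not check(url)]
```

Relative links are resolved against the page's own address before being checked, so the same function works for pages in different directories of a site.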

NOTE: Scraping a website in order to republish its content without permission is wrong, but scraping a site for other purposes may also violate the website's terms of use. Therefore, you should always read a website's terms of use before downloading content from the site.

Updated September 22, 2011 by Per C.

The Tech Terms Computer Dictionary

The definition of Scraping on this page is an original definition written by the TechTerms.com team.
