Web extraction
The extraction of data from web pages is also known as "web scraping".
Visit the web scraping topic for more information.
These samples are largely ready to use once they're installed.
The Web-* samples install with Data Splitter. They visit the websites in the Input URLs list and produce HTML summaries of the extracted data.
The Census-* samples install separately. They contain Microsoft Access databases and automatically configure the ODBC DSNs for those databases during installation. The Census-* samples require MS Access 2000 or later.
Web-watch-words
This sample periodically scans websites for keyword occurrences. The user specifies:
- the URLs to watch
- the keywords to watch for
- the HTML tags to watch
- the timer start time
- the timer interval
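Data Splitter's own matching logic is not documented here. As a rough illustration only, the keyword-matching step could be sketched in Python as follows. It assumes the page HTML has already been downloaded, uses the standard library's `html.parser`, and counts whole-word, case-insensitive matches. The class and function names are hypothetical, and the timer (start time and interval) is left out.

```python
import re
from html.parser import HTMLParser


class WatchedTagText(HTMLParser):
    """Collects the text found inside any of the watched tags (hypothetical helper)."""

    def __init__(self, watched_tags):
        super().__init__()
        self.watched = {t.lower() for t in watched_tags}
        self.depth = 0      # > 0 while inside at least one watched tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.watched:     # HTMLParser reports tag names in lowercase
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.watched and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def count_keywords(page_html, keywords, watched_tags):
    """Return {keyword: whole-word, case-insensitive hits inside the watched tags}."""
    parser = WatchedTagText(watched_tags)
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw.lower()) + r"\b", text))
        for kw in keywords
    }
```

For example, `count_keywords("<title>Rates</title><h1>Interest rates rise</h1>", ["rates"], ["h1"])` returns `{'rates': 1}` because `<title>` is not among the watched tags.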
This sample outputs a web page (HTML file). You can accumulate results in the output by checking the "append" box for the output file; each scan's results are then added to the bottom of the output file.
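The difference between overwriting and appending can be sketched as below. This is only an assumption about the behavior: each scan is taken to write a self-contained HTML table fragment, and the function name and the (url, keyword, count) row layout are hypothetical rather than the sample's actual output format.

```python
import html


def write_results(path, rows, append=False):
    """Write (url, keyword, count) rows as an HTML table fragment.

    append=False replaces the file's contents; append=True adds the new
    table to the bottom of the existing file (like the "append" box).
    """
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as out:
        out.write("<table>\n")
        for url, keyword, count in rows:
            out.write(
                "<tr><td>{}</td><td>{}</td><td>{}</td></tr>\n".format(
                    html.escape(url), html.escape(keyword), count
                )
            )
        out.write("</table>\n")
```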
Web-watch-words can be modified to produce other file formats or perform database updates.
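A database-update variant might look like the following sketch. It uses Python's built-in `sqlite3` module purely for illustration. The `keyword_hits` table and its columns are hypothetical, and a real modification of the sample would target whatever database and schema you use.

```python
import sqlite3


def record_matches(conn, rows):
    """Insert (url, keyword, hits) scan results into a hypothetical table.

    A scheduled scan would call this once per run instead of writing HTML.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_hits ("
        " scanned_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " url TEXT, keyword TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO keyword_hits (url, keyword, hits) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
```

Because each row carries a timestamp, repeated scans accumulate in the table much as the "append" option accumulates them in the HTML file.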
Installs with Data Splitter.
Web-extract-number
This sample is configured to scrape stock quotes, but can easily be modified to scrape labeled numbers from other web pages by changing the Label and the input URL list.
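One simple way to scrape a "labeled number" is to flatten the page to its visible text, find the label, and parse the first number after it. The Python sketch below follows that approach. It is an assumption, not how Data Splitter interprets its Label setting, and the function name and the example label are hypothetical. The label match is case-sensitive.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Flattens an HTML page to its text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


# Optional sign, digits with optional thousands commas, optional decimals.
NUMBER = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_labeled_number(page_html, label):
    """Return the first number that follows `label` in the page text, or None."""
    parser = _TextExtractor()
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.parts)
    pos = text.find(label)
    if pos < 0:
        return None
    match = NUMBER.search(text, pos + len(label))
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```

For example, `extract_labeled_number("<td>Last Trade:</td><td>1,234.50</td>", "Last Trade:")` returns `1234.5`.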
Installs with Data Splitter.
Web-extract-title-header
Extracts two items from each web page in the input URL list:
- the content of the web page's <title> tag
- the content of the first header tag as defined in the HeaderTag string set
The "rules" in the HeaderTag list determine which headers are extracted. These rules work well for the news headline sites specified in the sample input URL list, but will probably require modification for other sites.
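The HeaderTag rules are more flexible than a plain tag list, but the basic idea can be sketched in Python by treating the "rules" as a simple set of header tag names. That simplification, and the class and function names, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TitleHeaderParser(HTMLParser):
    """Captures the <title> text and the text of the first matching header tag."""

    def __init__(self, header_tags):
        super().__init__()
        self.header_tags = {t.lower() for t in header_tags}
        self.title = None
        self.header = None
        self._current = None    # tag whose text is currently being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._current is None:
            if tag == "title" and self.title is None:
                self._current = tag
            elif tag in self.header_tags and self.header is None:
                self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if tag == "title":
                self.title = text
            else:
                self.header = text
            self._current = None
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:   # includes text of nested tags like <a>
            self._buf.append(data)


def extract_title_and_header(page_html, header_tags=("h1", "h2")):
    parser = TitleHeaderParser(header_tags)
    parser.feed(page_html)
    parser.close()
    return parser.title, parser.header
```

Changing `header_tags` plays roughly the role of editing the HeaderTag list: the same page can yield a different "first header" depending on which tags qualify.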
Installs with Data Splitter.
Census-01
U.S. Census Bureau table scraper: state populations and areas
Extracts selected fields from a single table on the U.S. Census Bureau website and puts the extracted data in a database.
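The general technique (parse one HTML table, keep selected columns, store them in a database) can be sketched in Python. The sample itself uses MS Access, so this sketch substitutes `sqlite3` for Access. The column positions, the `states` table, and the assumption that the first row is a header row are all hypothetical and are not taken from the Census page.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()          # tolerate a missing </td>
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def load_state_table(conn, page_html, name_col=0, pop_col=1, area_col=2):
    """Store the selected columns; assumes row 0 is a header, numeric cells."""
    parser = TableParser()
    parser.feed(page_html)
    parser.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS states (name TEXT, population INTEGER, area REAL)"
    )
    needed = max(name_col, pop_col, area_col)
    for row in parser.rows[1:]:
        if len(row) <= needed:
            continue                    # skip short rows such as footnotes
        conn.execute(
            "INSERT INTO states VALUES (?, ?, ?)",
            (row[name_col],
             int(row[pop_col].replace(",", "")),
             float(row[area_col].replace(",", ""))),
        )
    conn.commit()
```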
To run this sample:
- download and install Data Splitter, if you haven't already
- download and install the Census-01 sample
- open the Census-01 sample (MS Access will start)
- press the Grab button
- wait for the scan to complete
- press the Output button to view the results
See the HTML table parser example for more information.
Installs separately. Requires MS Access 2000 or later.
Census-02
U.S. Census Bureau table scraper: zip code data
Extracts selected fields from multiple zip code tables on the U.S. Census Bureau website. Just specify the zip codes you're interested in and press the Grab button.
To run this sample:
- download and install Data Splitter, if you haven't already
- download and install the Census-02 sample
- open the Census-02 sample (MS Access will start)
- press the Zip Codes button and enter the zip codes
- press the Grab button
- wait for the scan to complete
- press the Output button to view the results
This sample demonstrates the use of SQL to convert the user-entered zip codes into a source URL table which is scanned to produce the desired output.
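The zip-code-to-URL step can be approximated with a single `INSERT ... SELECT` that concatenates a URL prefix with each zip code. The sketch below runs it in SQLite from Python. The URL prefix is a placeholder, not the Census Bureau address the sample actually uses, and the table names are hypothetical. Access SQL concatenates strings with `&`, whereas SQLite uses `||`. Zip codes are stored as TEXT so that leading zeros survive.

```python
import sqlite3

# Placeholder prefix (hypothetical); the real sample's URL format is not shown here.
URL_PREFIX = "https://example.com/zip?code="


def build_source_urls(conn, zip_codes, url_prefix=URL_PREFIX):
    """Turn user-entered zip codes into a source URL table and return the URLs."""
    conn.execute("CREATE TABLE IF NOT EXISTS zip_codes (zip TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO zip_codes (zip) VALUES (?)",   # drops duplicates
        [(z.strip(),) for z in zip_codes],
    )
    conn.execute("CREATE TABLE IF NOT EXISTS source_urls (zip TEXT, url TEXT)")
    conn.execute("DELETE FROM source_urls")
    # The SQL step: one URL per zip code, built by string concatenation.
    conn.execute(
        "INSERT INTO source_urls (zip, url) "
        "SELECT zip, ? || zip FROM zip_codes ORDER BY zip",
        (url_prefix,),
    )
    conn.commit()
    return [url for (url,) in conn.execute("SELECT url FROM source_urls ORDER BY zip")]
```

The scanner would then visit each URL in `source_urls`, which mirrors how the sample's input URL table is produced from the user's zip code list.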
Installs separately. Requires MS Access 2000 or later.