Example : web page word watcher
This sample, Web-Watch-Words.dss, monitors a list of websites for keyword occurrences. The user specifies :
- the URLs to watch
- the words to watch for
- the HTML tags to watch
- the timer start time
- the timer interval
Two HTML output files are generated :
- latest results
- all results
"Latest results" contains the results of the most recent scan: the time of the scan, the URLs of the pages containing the search words, and the text on those pages containing the search words. "All results" accumulates the results of every scan, oldest at the top, newest at the bottom. It can grow quite large over time, so you may need to archive or delete it periodically.
Sample output :
Specify the URLs to watch :
The words to watch for :
The user can also specify the start tags of the HTML elements whose content is to be watched :
In this example, HTML heading elements (levels 1 to 3) and paragraph elements will be watched.
The user can also configure a timer to control when the watch scans occur :
In this example the interval is 1 hour (3600 seconds). The first scan runs at 12:30 pm on the current day, and subsequent scans run every hour thereafter until the run is canceled or Data Splitter is exited. If the configured start time has already passed, an error dialog appears when the user attempts to run.
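The timer behavior described above can be sketched in Python. This is only an illustration of the schedule arithmetic, not Data Splitter's own timer: the function name scan_times, the ValueError standing in for the error dialog, and the fixed "current" time are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def scan_times(start, interval_seconds, now, count):
    """Return the first `count` scheduled scan times.

    Raises ValueError if the start time is already past; this stands in
    for the error dialog and is an assumption of this sketch.
    """
    if start < now:
        raise ValueError("start time is already past")
    step = timedelta(seconds=interval_seconds)
    return [start + i * step for i in range(count)]


# Hypothetical values: "now" is 9:00 am, the first scan is at 12:30 pm,
# and the interval is 3600 seconds (1 hour).
now = datetime(2024, 1, 1, 9, 0)
start = datetime(2024, 1, 1, 12, 30)
for t in scan_times(start, 3600, now, 3):
    print(t.strftime("%H:%M"))
# prints 12:30, 13:30, 14:30 (one per line)
```

Rejecting a past start time up front, rather than silently skipping ahead to the next interval, matches the behavior described for the sample.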
How it works
The top-level parser extracts the desired HTML elements :
After the top-level parser extracts an HTML element it hands it off to another parser, "CheckContent" :
CheckContent checks its input (the content of the HTML element) against the search word list. If any of the search words is present, the action group "ShowContent" sends the content to the output.
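For readers who want to see the two-stage idea outside Data Splitter, here is a rough Python equivalent using the standard html.parser module. It is a sketch under stated assumptions, not a translation of the .dss sample: the names ElementWatcher, check_content, and scan are hypothetical, matching is assumed to be a case-insensitive substring test, and watched elements are assumed to be properly closed and not nested inside one another.

```python
from html.parser import HTMLParser


class ElementWatcher(HTMLParser):
    """Stage 1: collect the text content of each watched element."""

    def __init__(self, watched_tags):
        super().__init__()
        self.watched_tags = set(watched_tags)
        self.current = None   # watched tag currently being collected, if any
        self.buffer = []      # text pieces of the element being collected
        self.elements = []    # text of each completed watched element

    def handle_starttag(self, tag, attrs):
        # Start collecting only if we are not already inside a watched element.
        if self.current is None and tag in self.watched_tags:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.elements.append("".join(self.buffer).strip())
            self.current = None


def check_content(text, words):
    """Stage 2: True if any search word occurs in the text.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


def scan(html, watched_tags, words):
    """Return the content of every watched element that contains a search word."""
    watcher = ElementWatcher(watched_tags)
    watcher.feed(html)
    watcher.close()
    return [text for text in watcher.elements if check_content(text, words)]


page = "<h1>Big News</h1><p>Nothing here</p><p>More <b>news</b> today</p>"
print(scan(page, ["h1", "h2", "h3", "p"], ["news"]))
# prints ['Big News', 'More news today']
```

Keeping extraction and word checking as separate steps mirrors the hand-off from the top-level parser to CheckContent: the first stage knows only about tags, the second only about words.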