File searchers + transformers
Data Splitter can convert Microsoft Word .doc files to text prior to scanning. Microsoft Word 97 or later is required to use this feature. The "Convert MS Word (.doc) files to text" option is turned on by default; to change it, select the menu option Input/Output | Input options.
C-functions
Extracts function definition headers from C and C++ source files. Even works on MFC source.
Contact Data Splitter support for more information.
CSV-database
Imports CSV (comma-separated value) data into a database table. Copies fields 1-13 of the input to fields A-M in the database. Ignores the first line of input, which is assumed to contain layout information. Can easily be adapted to import a different number of fields.
Contact Data Splitter support for more information.
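The mapping described above can be illustrated outside Data Splitter. The following is a minimal Python sketch, not the Data Splitter script itself. It assumes an SQLite database, a hypothetical table name "records", and untyped columns named A to M. Short rows are padded with NULL and fields beyond the 13th are dropped, which is also an assumption.

```python
import csv
import io
import sqlite3
import string

COLUMNS = list(string.ascii_uppercase[:13])  # "A" .. "M" (13 columns)


def import_csv(text, conn, table="records"):
    """Load CSV text into `table`, skipping the first line; return rows inserted.

    The table name "records" is hypothetical; only the A-M column layout
    comes from the description above.
    """
    cols = ", ".join(COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # first line is assumed to hold layout information
    count = 0
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        # Assumption: pad short rows with None, ignore fields past the 13th.
        values = (row + [None] * len(COLUMNS))[:len(COLUMNS)]
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", values)
        count += 1
    return count
```

Importing a different number of fields in this sketch only requires changing the slice that builds COLUMNS.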
File-count-keywords
Counts words and keywords in a group of HTML files. User specifies the input file(s) and "Keywords" lists.
Installs with Data Splitter.
File-count-lines
Computes the line count of a group of text files.
Installs with Data Splitter.
File-CRLF-LF
Replaces carriage return + line feed sequences (0x0D 0x0A) with line feeds (0x0A).
Installs with Data Splitter.
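For comparison, the same byte-level replacement can be sketched in a few lines of Python. This is an illustration of the transformation only, not the Data Splitter script; the function name is hypothetical.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every 0x0D 0x0A pair with a single 0x0A.

    A carriage return that is not followed by a line feed is left alone.
    """
    return data.replace(b"\r\n", b"\n")
```

Working on bytes rather than decoded text keeps the sketch independent of the file's character encoding.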
File-filter-unprintables
Extracts printable characters (ASCII 32-126) from the input and discards everything else, that is, the unprintable characters.
Installs with Data Splitter.
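A rough Python equivalent of this filter is shown below. It is a sketch, not the Data Splitter script. It assumes the conventional printable ASCII range of 32 (space) through 126 (tilde); the bounds are parameters, so a different range can be substituted.

```python
def filter_printable(data: bytes, low: int = 32, high: int = 126) -> bytes:
    """Keep only bytes whose value lies in [low, high]; drop the rest.

    The default bounds are an assumption: space (32) through tilde (126).
    """
    return bytes(b for b in data if low <= b <= high)
```

Note that with these defaults line endings are discarded too, since carriage return (13) and line feed (10) fall outside the range.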
File-generate-site-index
Generates a site index (web page) from web pages (HTML files) in a directory. Extracts the TITLE and description META HTML tags for each page. Used to generate this site's index.
Installs with Data Splitter.
File-HTML-generate-line-breaks
Replaces newline characters with HTML <BR> tags. Can be used as a post-processor to preserve newlines when converting to HTML.
Installs with Data Splitter.
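The post-processing step described above might look like this in Python. This is a sketch under stated assumptions, not the Data Splitter script: it treats CRLF, lone CR, and LF each as one newline, and it replaces the newline outright rather than keeping it after the tag.

```python
def newlines_to_br(text: str) -> str:
    """Replace each line ending with an HTML <BR> tag."""
    # Normalize CRLF and lone CR first so each line ending yields one tag.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "<BR>")
```

If readable HTML source matters, replacing with "<BR>\n" instead would keep one line per tag.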
File-LF-CRLF
Replaces line feeds (0x0A) with carriage return + line feed sequences (0x0D 0x0A).
Installs with Data Splitter.
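One detail worth illustrating is what happens to input that already contains CRLF pairs: a literal replacement would turn them into CR CR LF. The Python sketch below skips line feeds that already follow a carriage return. Whether the Data Splitter script does the same is not stated here, so treat that behavior as an assumption of the sketch.

```python
import re

# A line feed that is NOT already preceded by a carriage return.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Replace bare 0x0A bytes with 0x0D 0x0A, leaving existing CRLF pairs alone."""
    return _BARE_LF.sub(b"\r\n", data)
```

With this guard the conversion is idempotent: running it twice gives the same result as running it once.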
File-search
Powerful file searcher allowing the user to specify multiple input file specifications and multiple search items.
Produces four outputs:
- An HTML file listing the "hits", with links back to the original documents
- A text file listing the hits
- A text file listing the files with hits
- A text summary with file counts and hit counts
The user can customize these output formats.
Installs with Data Splitter.
The output format is similar to that of grep.
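To give a feel for grep-style hit listings with several search items, here is a small Python sketch. It is not the Data Splitter script and does not reproduce its HTML output. The "name:line:text" layout, the plain substring matching, and all function names are assumptions made for illustration.

```python
from pathlib import Path


def search_lines(name, text, terms):
    """Return grep-style hits ("name:line_number:line") for lines containing any term."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(term in line for term in terms):
            hits.append(f"{name}:{number}:{line}")
    return hits


def search_files(paths, terms):
    """Search several files; return (all hits, names of files with at least one hit)."""
    all_hits, files_with_hits = [], []
    for path in paths:
        # Undecodable bytes are replaced so binary-ish files do not abort the run.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        hits = search_lines(str(path), text, terms)
        if hits:
            files_with_hits.append(str(path))
        all_hits.extend(hits)
    return all_hits, files_with_hits
```

The two return values of search_files correspond loosely to the "hits" list and the "files with hits" list; the counts for a summary follow from their lengths.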
File-search-replace
Searches for and replaces one or more patterns in the input files. User specifies the input file(s) and "new text" lists.
Generates two result summaries:
- An HTML results file with a table showing which files changed and how many replacements were made in each file
- A simple text file listing the files that changed
The user can customize these output formats.
Installs with Data Splitter.
File-search-whole-words-only
Performs a "whole-word-only" search for a string.
Contact Data Splitter support for more information.
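As an illustration of what "whole-word-only" matching means, the Python sketch below accepts a match only when the characters on either side are not word characters (letters, digits, underscore). This definition of a word boundary is an assumption; the Data Splitter script may define it differently.

```python
import re


def find_whole_word(text: str, word: str) -> list:
    """Return the start offsets of `word` where it is not part of a longer word."""
    # Lookarounds are used instead of \b so that search strings beginning or
    # ending with punctuation are still handled sensibly.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [m.start() for m in pattern.finditer(text)]
```

For example, searching "cat concat cat's category" for "cat" matches the first word and the "cat" in "cat's", but not the occurrences inside "concat" or "category".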
File-search-wide
Performs wide-string (Unicode) text search.
Contact Data Splitter support for more information.
File-swap-bytes
Swaps consecutive bytes in a group of files, converting big-endian data to little-endian and vice versa.
This application of Data Splitter is trivial (and a little silly), but it works. It is inefficient because Data Splitter examines the value of every byte, which this task does not require. If the input file has an odd byte count, the leftover last byte is also processed, so a round-trip conversion returns the input files to their starting states.
Contact Data Splitter support for more information.
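The pairwise swap and the odd-length case can be sketched in Python as follows. This is not the Data Splitter script. It assumes that the leftover byte of an odd-length input is copied through unchanged, which is one way to obtain the round-trip property described above.

```python
def swap_bytes(data: bytes) -> bytes:
    """Swap each pair of consecutive bytes; a trailing odd byte is kept as-is."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)  # length of the part made of whole pairs
    # Exchange the bytes at even offsets with those at odd offsets.
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)
```

Unlike the approach described above, this sketch never inspects byte values; it only moves bytes by position.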
HTML-Unicode-maps-to-C
Converts ISO8859-to-Unicode maps to C include files. Reads the HTML files, extracts the hexadecimal ISO8859-to-Unicode mapping values, and formats them so that they can be included and compiled in a C program. The input maps can be found at: ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859.
Contact Data Splitter support for more information.
PAD-database
Parses a PAD (Portable Application Description) XML file and loads selected values into a database table.
Contact Data Splitter support for more information.
PDF-database
Data Splitter has been used to import PDF (Portable Document Format) file data into databases. Data Splitter is currently unable to read PDF directly, so the PDF files must be converted to text prior to the import.
Contact Data Splitter support for more information.
RTF-HTML
Generates web content (HTML) from RTF (rich text format) files. Does a decent job of converting the old Set Machine RTF help file to HTML. May require modification for other RTF files! Builds links and a separate index file too.
Contact Data Splitter support for more information.
RTF-lines
Replaces newline characters with RTF "\line" tokens.
Contact Data Splitter support for more information.
Syntax-check
Performs syntax checking on files. See the syntax checker example or contact Data Splitter support for more information.
XML-database
XML parsing sample. See the XML parsing topic or contact Data Splitter support for more information.
Legacy data conversion
Data Splitter can be configured to transform legacy data formats into databases, XML, or other formats.
Examples
Data Splitter has been used to:
- Generate database records from scanned PDF files
- Transform archaic data (for example: 80-column punch card-style) into database tables
- Repair incorrectly formatted data
Contact Data Splitter support for more information.