File searchers + transformers

Search: File-search, File-search-whole-words-only, File-search-wide, File-count-keywords, File-count-lines, Syntax-check, C-functions
Search + replace: File-search-replace, File-LF-CRLF, File-CRLF-LF, File-HTML-generate-line-breaks, File-filter-unprintables
Convert: CSV-database, PDF-database, XML-database, PAD-database, RTF-HTML, RTF-lines, HTML-Unicode-maps-to-C
Etcetera: File-generate-site-index, File-swap-bytes, Legacy-data

Data Splitter can convert Microsoft Word .DOC files to text before scanning them. This feature requires Microsoft Word 97 or later. The "Convert MS Word (.doc) files to text" option is turned on by default; to change it, select the menu option Input/Output | Input options.

C-functions

Extracts function definition headers from C and C++ source files. Even works on MFC source.

Contact Data Splitter support for more information.

CSV-database

Imports CSV (comma-separated value) data into a database table. Maps input fields 1-13 to database fields A-M. Ignores the first line of input, which is assumed to contain layout information. Can easily be adapted to import a different number of fields.

Contact Data Splitter support for more information.
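
As an illustration of the import logic only (not a Data Splitter configuration), here is a minimal Python sketch using the standard csv and sqlite3 modules. The SQLite target, the table name `imported`, and the TEXT column type are assumptions; the sketch skips the layout line and maps input fields 1-13 to columns A-M.

```python
import csv
import io
import sqlite3
import string

NUM_FIELDS = 13  # change this to import a different number of fields
COLUMNS = list(string.ascii_uppercase[:NUM_FIELDS])  # "A" .. "M"


def import_csv(text: str, conn: sqlite3.Connection, table: str = "imported") -> int:
    """Insert CSV rows into `table` (hypothetical name), skipping the first line; return the row count."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in COLUMNS)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in COLUMNS)
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # the first line describes the layout; it is not data
    count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Input field i goes to column i; pad short rows with NULL, drop extra fields.
        values = (row + [None] * NUM_FIELDS)[:NUM_FIELDS]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
        count += 1
    conn.commit()
    return count
```

To import a file, read it with `open(path, newline="")` as the csv module documentation recommends, and pass the text along with a connection such as `sqlite3.connect(":memory:")`.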

File-count-keywords

Counts words and keywords in a group of HTML files. The user specifies the input file(s) and the "Keywords" lists.

Installs with Data Splitter.
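
The counting step for one page can be sketched in Python as follows. This is an illustration, not the Data Splitter implementation: the regex tag stripper is deliberately naive (it does not skip scripts, comments, or entities), and treating a "word" as a run of letters, digits, and apostrophes, matched case-insensitively, is an assumption. Keywords are single words here.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]*>")          # naive tag stripper (assumption)
WORD_RE = re.compile(r"[A-Za-z0-9']+")   # what counts as a "word" (assumption)


def count_keywords(page: str, keywords: list[str]) -> tuple[int, dict[str, int]]:
    """Return (total words, {keyword: occurrences}) for one HTML page, ignoring case."""
    words = [w.lower() for w in WORD_RE.findall(TAG_RE.sub(" ", page))]
    counts = Counter(words)
    return len(words), {k: counts[k.lower()] for k in keywords}
```

For a group of files, call it once per file and add up the results.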

File-count-lines

Computes the line count of a group of text files.

Installs with Data Splitter.
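
A minimal Python sketch of the same computation, for illustration. Counting a final line that lacks a trailing newline is an assumption; Data Splitter's own rule is not described here.

```python
from pathlib import Path


def count_lines(data: bytes) -> int:
    """Count LF-terminated lines; a final unterminated line also counts."""
    if not data:
        return 0
    return data.count(b"\n") + (0 if data.endswith(b"\n") else 1)


def count_lines_in_files(pattern: str, root: str = ".") -> dict[str, int]:
    """Map each file matching the glob `pattern` under `root` to its line count."""
    return {str(p): count_lines(p.read_bytes())
            for p in sorted(Path(root).glob(pattern)) if p.is_file()}
```

Reading bytes rather than text means CR LF and LF files are counted the same way, since both end each line with 0x0A.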

File-CRLF-LF

Replaces carriage return + line feed sequences (0x0D 0x0A) with line feeds (0x0A).

Installs with Data Splitter.
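
The transformation itself is a single byte-level replacement. A minimal Python sketch, working in binary mode so that Python's own newline translation cannot interfere; leaving lone carriage returns untouched is a choice of this sketch.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair (0x0D 0x0A) with a single LF (0x0A); lone CRs are kept."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite one file in place with CR LF sequences converted to LF."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(crlf_to_lf(data))
```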

File-filter-unprintables

Extracts printable characters (ASCII 32-126) from the input and discards everything else (i.e., unprintable characters).

Installs with Data Splitter.
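
A minimal Python sketch of the filter, for illustration. It uses the standard printable ASCII range, 32 (space) through 126 (tilde). `LOW` and `HIGH` are parameters, so the bounds can be adjusted to match the tool's actual setting. As described above, everything outside the range is dropped, including tabs and line breaks.

```python
LOW, HIGH = 32, 126  # printable ASCII: space (0x20) through tilde (0x7E)

# Every byte value outside LOW..HIGH, passed to bytes.translate as the "delete" set.
UNPRINTABLE = bytes(b for b in range(256) if not LOW <= b <= HIGH)


def filter_unprintables(data: bytes) -> bytes:
    """Keep bytes in LOW..HIGH and drop all others, including tab, CR and LF."""
    return data.translate(None, UNPRINTABLE)
```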

File-generate-site-index

Generates a site index (web page) from the web pages (HTML files) in a directory. Extracts the TITLE tag and the description META tag from each page. Used to generate this site's index.

Installs with Data Splitter.
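
The extraction step can be sketched with Python's standard `html.parser` module. This is an illustration, not Data Splitter's output: the `<ul>`/`<li>` layout of the generated index and the use of the file name when a page has no title are assumptions.

```python
import html
from html.parser import HTMLParser
from pathlib import Path


class TitleDescParser(HTMLParser):
    """Collects the <title> text and the content of <meta name="description">."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased; values do not
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = a.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_entry(filename: str, page: str) -> str:
    """Return one <li> line linking to `filename` with its title and description."""
    p = TitleDescParser()
    p.feed(page)
    p.close()
    title = html.escape(p.title.strip() or filename)
    desc = html.escape(p.description.strip())
    return f'<li><a href="{html.escape(filename)}">{title}</a> {desc}</li>'


def build_index(directory: str) -> str:
    """Build a minimal index page for every *.html file in `directory`."""
    items = [index_entry(p.name, p.read_text(errors="replace"))
             for p in sorted(Path(directory).glob("*.html"))]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
```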

File-HTML-generate-line-breaks

Replaces newline characters with HTML <BR> tags. Can be used as a post-processor to preserve newlines when converting to HTML.

Installs with Data Splitter.
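
A minimal Python sketch of the same post-processing step, for illustration. It assumes the text is already HTML-safe (no escaping of `<` or `&` is done). Keeping a newline after each `<BR>`, so the HTML source stays readable, is an optional choice of this sketch.

```python
def newlines_to_br(text: str, keep_newlines: bool = True) -> str:
    """Replace each newline (LF or CR LF) with <BR>, optionally keeping the newline after it."""
    normalized = text.replace("\r\n", "\n")
    replacement = "<BR>\n" if keep_newlines else "<BR>"
    return normalized.replace("\n", replacement)
```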

File-LF-CRLF

Replaces line feeds (0x0A) with carriage return + line feed sequences (0x0D 0x0A).

Installs with Data Splitter.
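
A minimal Python sketch of the conversion, for illustration. Collapsing existing CR LF pairs first, so that a file that is already CR LF does not end up with CR CR LF, is a design choice of this sketch, not something the description above specifies.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Replace each LF (0x0A) with CR LF (0x0D 0x0A) without doubling existing CRs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```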

File-search

Powerful file searcher allowing the user to specify multiple input file specifications and multiple search items.
Produces four outputs:

An HTML file listing the "hits", with links back to the original documents
A text file listing the hits
A text file listing the files with hits
A text summary with file counts and hit counts

The user can customize these output formats.

Installs with Data Splitter.

Output format similar to grep.
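
A minimal Python sketch of a grep-style search over several file specifications and search items, for illustration only. It produces the hit listing and the summary, not the HTML output. Plain case-sensitive substring matching, one hit per matching line, and UTF-8 decoding are assumptions.

```python
import glob
import os
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    line_no: int
    text: str


def search_lines(path: str, lines: list[str], items: list[str]) -> list[Hit]:
    """One Hit per line containing any search item (case-sensitive substring match)."""
    return [Hit(path, n, line.rstrip("\r\n"))
            for n, line in enumerate(lines, start=1)
            if any(item in line for item in items)]


def search_files(file_specs: list[str], items: list[str]) -> list[Hit]:
    """Search every regular file matching any of the glob patterns, each file once."""
    paths = sorted({p for spec in file_specs for p in glob.glob(spec, recursive=True)
                    if os.path.isfile(p)})
    hits: list[Hit] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            hits.extend(search_lines(path, f.readlines(), items))
    return hits


def format_report(hits: list[Hit]) -> str:
    """grep-style 'path:line:text' lines, then a summary with file and hit counts."""
    out = [f"{h.path}:{h.line_no}:{h.text}" for h in hits]
    files = {h.path for h in hits}
    out.append(f"{len(files)} file(s) with hits, {len(hits)} hit(s)")
    return "\n".join(out)
```

For example, `print(format_report(search_files(["docs/**/*.txt", "*.htm"], ["TODO", "FIXME"])))` lists every matching line and then the totals.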

File-search-replace

Searches for and replaces one or more patterns in the input files. The user specifies the input file(s) and the "new text" lists.

Generates two result summaries:

An HTML results file with a table showing which files changed and how many replacements were made in each file
A simple text file listing the files that changed

The user can customize these output formats.

Installs with Data Splitter.
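
The core of a search-and-replace pass with per-file replacement counts can be sketched in Python as follows. This is an illustration, not Data Splitter's implementation. The (old, new) pairs are applied in order, so a later pair sees the output of earlier ones. UTF-8 and `newline=""` (which preserves the original line endings) are assumptions.

```python
def replace_all(text: str, pairs: list[tuple[str, str]]) -> tuple[str, int]:
    """Apply each (old, new) pair in order; return the new text and the total replacement count."""
    total = 0
    for old, new in pairs:
        if not old:
            continue  # an empty pattern would match everywhere
        total += text.count(old)  # non-overlapping, same as str.replace
        text = text.replace(old, new)
    return text, total


def replace_in_files(paths: list[str], pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Rewrite files that change; return {path: replacement count} for the changed files."""
    changed = {}
    for path in paths:
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        updated, n = replace_all(original, pairs)
        if n:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed[path] = n
    return changed
```

The returned dictionary holds what both summaries need: the changed files and the number of replacements in each.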

File-search-whole-words-only

Performs a "whole-word-only" search for a string.

Contact Data Splitter support for more information.
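
A minimal Python sketch of a whole-word-only search, for illustration. Defining a word boundary as "not adjacent to a regex word character" (letters, digits, underscore) is an assumption. The lookarounds are used instead of `\b` so that search strings beginning or ending with punctuation still behave sensibly.

```python
import re


def whole_word_matches(text: str, word: str, ignore_case: bool = False) -> list[int]:
    """Return the start offsets of `word` where it is not part of a longer word."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", flags)
    return [m.start() for m in pattern.finditer(text)]
```

For example, searching "cat concat cat_ cat." for `cat` finds only the first and last occurrences.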

File-search-wide

Performs wide-string (Unicode) text search.

Contact Data Splitter support for more information.
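
A minimal Python sketch of a wide-string search over raw bytes, for illustration. It assumes UTF-16LE, the usual wide-string format on Windows (big-endian data would need `encoding="utf-16-be"`), and it assumes wide text starts at an even offset, so only even offsets are reported.

```python
def find_wide(data: bytes, needle: str, encoding: str = "utf-16-le") -> list[int]:
    """Return byte offsets of `needle`, encoded as a wide (UTF-16) string, inside `data`.

    Matches at odd offsets straddle two characters and are ignored.
    """
    pattern = needle.encode(encoding)
    if not pattern:
        return []
    hits = []
    start = data.find(pattern)
    while start != -1:
        if start % 2 == 0:
            hits.append(start)
        start = data.find(pattern, start + 1)
    return hits
```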

File-swap-bytes

Swaps each pair of consecutive bytes in a group of files, converting 16-bit values from big-endian to little-endian and vice versa.

This application of Data Splitter is trivial and inefficient (and a little silly), but it works. It handles the leftover byte if the input file has an odd byte count. Processing the odd last byte allows for round-trip conversions that return the input files to their starting states.

This application of Data Splitter is inefficient because Data Splitter examines the value of every byte, which, for this task, is not necessary.

Contact Data Splitter support for more information.
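
For comparison, a minimal Python sketch of the same swap using slice assignment, which moves the bytes without examining their values. Keeping the odd last byte unchanged at the end is an assumption about how the leftover byte is handled. It does preserve the round-trip property described above: swapping twice returns the original data.

```python
def swap_bytes(data: bytes) -> bytes:
    """Swap each pair of consecutive bytes; an odd trailing byte stays where it is."""
    out = bytearray(data)
    even = len(data) - len(data) % 2  # length of the part made of complete pairs
    out[0:even:2] = data[1:even:2]    # second byte of each pair moves first
    out[1:even:2] = data[0:even:2]    # first byte of each pair moves second
    return bytes(out)
```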

HTML-Unicode-maps-to-C

Converts ISO8859-to-Unicode maps to C include files. Reads the HTML files, extracts the hexadecimal ISO8859-to-Unicode mapping values and formats them so that they can be included and compiled in a C program. The input maps can be found at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859.

Contact Data Splitter support for more information.
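
The extract-and-format step can be sketched in Python as below. This is an illustration, not Data Splitter's implementation. It assumes each mapping entry looks like "0xXX", whitespace, "0xXXXX", as in the Unicode mapping files, with any surrounding HTML tags or "# name" comments ignored. The array name, the `unsigned short` element type, and filling unmapped bytes with U+FFFD (REPLACEMENT CHARACTER) are assumptions.

```python
import re

# "0xXX <whitespace> 0xXXXX": byte value, then Unicode code point.
MAP_ENTRY = re.compile(r"\b0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})\b")


def map_to_c(text: str, name: str = "iso8859_to_unicode") -> str:
    """Format ISO 8859 to Unicode mapping pairs as a 256-entry C lookup table."""
    table = [0xFFFD] * 256  # unmapped bytes become U+FFFD (an assumption)
    for byte_hex, code_hex in MAP_ENTRY.findall(text):
        table[int(byte_hex, 16)] = int(code_hex, 16)
    rows = ["    " + ", ".join(f"0x{v:04X}" for v in table[i:i + 8]) + ","
            for i in range(0, 256, 8)]
    return f"static const unsigned short {name}[256] = {{\n" + "\n".join(rows) + "\n};\n"
```

The generated text can be written to a header file and `#include`d; the trailing comma on the last row is legal in a C initializer.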

PAD-database

Parses a PAD (Portable Application Description) XML file and loads selected values into a database table.

Contact Data Splitter support for more information.

PDF-database

Data Splitter has been used to import PDF (Portable Document Format) file data into databases. Data Splitter is currently unable to read PDF directly, so the PDF files must be converted to text prior to the import.

Contact Data Splitter support for more information.

RTF-HTML

Generates web content (HTML) from RTF (rich text format) files. Does a decent job of converting the old Set Machine RTF help file to HTML, and also builds links and a separate index file. May require modification for other RTF files.

Contact Data Splitter support for more information.

RTF-lines

Replaces newline characters with RTF "\line" tokens.

Contact Data Splitter support for more information.
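
A minimal Python sketch of the same replacement, for illustration. It assumes the text is already RTF-safe (backslashes and braces are not escaped). The space after `\line` is the standard RTF delimiter that ends a control word, so the next character is not read as part of it.

```python
def newlines_to_rtf(text: str) -> str:
    r"""Replace each newline (LF or CR LF) with the RTF \line control word plus its delimiter space."""
    return text.replace("\r\n", "\n").replace("\n", "\\line ")
```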

Syntax-check

Performs syntax checking on files. See the syntax checker example or contact Data Splitter support for more information.

XML-database

XML parsing sample. See the XML parsing topic or contact Data Splitter support for more information.

Legacy data conversion

Data Splitter can be configured to transform legacy data formats into databases, XML, or other formats.

Examples

Data Splitter has been used to:

Generate database records from scanned PDF files
Transform archaic data (for example: 80-column punch card-style) into database tables
Repair incorrectly formatted data

Contact Data Splitter support for more information.