File searchers + transformers


Data Splitter can convert Microsoft Word .DOC files to text before scanning.   This feature requires Microsoft Word 97 or later.   The "Convert MS Word (.doc) files to text" option is turned on by default; to change it, select the menu option Input/Output | Input options.


C-functions

Extracts function definition headers from C and C++ source files.   Even works on MFC source.

Contact Data Splitter support for more information.


CSV-database

Imports CSV (comma-separated value) data into a database table.   Copies fields 1-13 of the input to fields A-M of the database table.   Ignores the first line of input, which is assumed to contain layout information.   Can easily be adapted to import a different number of fields.
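
The mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Data Splitter's own configuration: it uses SQLite as the database, and the function name import_csv and the table name "imported" are hypothetical.

```python
import csv
import io
import sqlite3
import string

# Column names A..M, one per imported field (fields 1-13 of the input).
COLUMNS = list(string.ascii_uppercase[:13])


def import_csv(text, conn, table="imported"):
    """Load fields 1-13 of each CSV row into columns A-M, skipping line 1."""
    column_defs = ", ".join(f'"{c}" TEXT' for c in COLUMNS)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_defs})')
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # the first line holds layout information, not data
    placeholders = ", ".join("?" for _ in COLUMNS)
    count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows and drop extra fields so every row has 13 values.
        values = (row + [None] * len(COLUMNS))[:len(COLUMNS)]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
        count += 1
    conn.commit()
    return count
```

To import a different number of fields in this sketch, only the COLUMNS list needs to change.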

Contact Data Splitter support for more information.


File-count-keywords

Counts words and keywords in a group of HTML files.   User specifies the input file(s) and "Keywords" lists.
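
A rough Python sketch of this kind of count, for illustration only. The function name count_words and the definition of a "word" (a run of letters, digits or apostrophes) are assumptions; Data Splitter's own rules may differ.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text found between tags (script/style text included)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html, keywords):
    """Return (total word count, {keyword: occurrences}) for one HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z0-9']+", text)
    counts = Counter(words)
    # Keywords are matched case-insensitively as whole words.
    return len(words), {k: counts[k.lower()] for k in keywords}
```

Calling count_words once per file and summing the results gives totals for a group of files.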

Installs with Data Splitter.


File-count-lines

Computes the line count of a group of text files.
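
For comparison, the same task can be sketched in a few lines of Python. The function name count_lines is hypothetical, and a final line without a trailing newline is counted as a line.

```python
def count_lines(paths):
    """Return {path: number of lines} for each file in paths."""
    counts = {}
    for path in paths:
        # Binary mode avoids decoding errors; lines are split on LF.
        with open(path, "rb") as handle:
            counts[str(path)] = sum(1 for _ in handle)
    return counts
```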

Installs with Data Splitter.


File-CRLF-LF

Replaces carriage return + line feed sequences (0x0D 0x0A) with line feeds (0x0A).
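
In Python, the equivalent byte-level replacement might look like this minimal sketch (the function name crlf_to_lf is hypothetical). A lone carriage return that is not followed by a line feed is left untouched.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace each 0x0D 0x0A pair with a single 0x0A."""
    return data.replace(b"\r\n", b"\n")
```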

Installs with Data Splitter.


File-filter-unprintables

Extracts printable characters (ASCII 32-126) from the input and discards everything else, i.e. unprintable characters.
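
A minimal Python sketch of such a filter, assuming "printable" means the byte values 32 (space) through 126 (tilde); the function name keep_printable is hypothetical. Under that assumption, tabs and line endings are discarded as well.

```python
def keep_printable(data: bytes) -> bytes:
    """Keep only bytes in the assumed printable range 32..126."""
    return bytes(b for b in data if 32 <= b <= 126)
```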

Installs with Data Splitter.


File-generate-site-index

Generates a site index (web page) from web pages (HTML files) in a directory.   Extracts the TITLE and description META HTML tags for each page.   Used to generate this site's index.
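
The extraction step can be sketched with Python's standard HTML parser. This is an illustrative approximation, not the Data Splitter configuration itself: the names page_info and build_index and the layout of the generated list are assumptions.

```python
from html import escape
from html.parser import HTMLParser


class _PageInfo(HTMLParser):
    """Records the TITLE text and the description META content of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_info(html):
    """Return (title, description) for one page."""
    parser = _PageInfo()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description.strip()


def build_index(pages):
    """pages maps file name -> HTML text; returns the index page as HTML."""
    items = []
    for name in sorted(pages):
        title, description = page_info(pages[name])
        items.append(
            f'<li><a href="{escape(name)}">{escape(title or name)}</a>: '
            f"{escape(description)}</li>"
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

Reading the HTML files of a directory into the pages mapping is left to the caller.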

Installs with Data Splitter.


File-HTML-generate-line-breaks

Replaces newline characters with HTML <BR> tags.   Can be used as a post-processor to preserve newlines when converting to HTML.
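
A short Python sketch of this post-processing step (the function name newlines_to_br is hypothetical). Keeping a newline after each tag is a choice made here to keep the generated HTML readable; it is not taken from Data Splitter.

```python
def newlines_to_br(text: str) -> str:
    """Insert an HTML <BR> tag at every line ending."""
    # Normalize CRLF and bare CR first so each line ending yields one tag.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "<BR>\n")
```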

Installs with Data Splitter.


File-LF-CRLF

Replaces line feeds (0x0A) with carriage return + line feed sequences (0x0D 0x0A).
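
A Python sketch of this conversion follows; the function name lf_to_crlf is hypothetical. One assumption is added here that the description does not state: a line feed already preceded by a carriage return is left alone, so input that is partly converted does not end up with doubled carriage returns.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Replace each bare 0x0A with 0x0D 0x0A."""
    # The lookbehind skips line feeds that already follow a carriage return.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)
```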

Installs with Data Splitter.


File-search

Powerful file searcher that lets the user specify multiple input file specifications and multiple search items.
Produces 4 outputs.   Output format similar to grep.

The user can customize these output formats.

Installs with Data Splitter.


File-search-replace

Searches for and replaces one or more patterns in the input files.   User specifies the input file(s) and "new text" lists.

Generates 2 result summaries.   The user can customize these output formats.

Installs with Data Splitter.


File-search-whole-words-only

Performs a "whole-word-only" search for a string.
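
One common way to express a whole-word search is with a regular expression that rejects matches touching other word characters. The Python sketch below illustrates the idea; the function name find_whole_word is hypothetical, and "word character" follows Python's definition (letters, digits and underscore), which may differ from Data Splitter's.

```python
import re


def find_whole_word(text, word, ignore_case=False):
    """Return the start offsets of word where it stands alone in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # Reject matches that have a word character directly before or after.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", flags)
    return [match.start() for match in pattern.finditer(text)]
```

For example, searching "cat concat cat, Cat" for "cat" matches the first and third words but not "concat".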

Contact Data Splitter support for more information.


File-search-wide

Performs wide-string (Unicode) text search.

Contact Data Splitter support for more information.


File-swap-bytes

Swaps consecutive bytes in a group of files.   Big-endian to little-endian and vice-versa.

This application of Data Splitter is trivial and inefficient (and a little silly), but it works.   It handles the leftover byte if the input file has an odd byte count.   Processing the odd last byte allows for round-trip conversions that return the input files to their starting states.

This application of Data Splitter is inefficient because Data Splitter examines the value of every byte, which, for this task, is not necessary.
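
For comparison, a direct byte swap that never inspects byte values can be sketched in Python as below. The function name swap_bytes is hypothetical, and leaving an odd trailing byte in place is an assumption about how the leftover byte is handled; it is one choice that makes the round trip return the original data.

```python
def swap_bytes(data: bytes) -> bytes:
    """Swap each pair of consecutive bytes; an odd last byte stays put."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)  # length of the part made of full pairs
    # Exchange the bytes at even offsets with the bytes at odd offsets.
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)
```

Applying swap_bytes twice returns the original bytes, for both even and odd lengths.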

Contact Data Splitter support for more information.


HTML-Unicode-maps-to-C

Converts ISO8859-to-Unicode maps to C include files.   Reads the HTML files, filters out the hexadecimal ISO8859-to-Unicode mapping values and formats them so that they can be included and compiled in a C program.   The input maps can be found at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859.
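
The filtering and formatting steps can be approximated in Python as follows. This is a sketch under stated assumptions: each mapping line is taken to start with two hexadecimal values such as "0xA1 0x0104" (byte value, then Unicode code point), unmapped bytes are filled with U+FFFD, and the names map_to_c and iso8859_to_unicode are hypothetical.

```python
import re

# A mapping line is assumed to begin with "0xNN" followed by "0xNNNN".
PAIR = re.compile(r"^\s*0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})", re.MULTILINE)


def map_to_c(text, name="iso8859_to_unicode"):
    """Format the byte-to-Unicode pairs found in text as a C array."""
    table = [0xFFFD] * 256  # U+FFFD marks bytes with no mapping
    for byte_hex, code_hex in PAIR.findall(text):
        table[int(byte_hex, 16)] = int(code_hex, 16)
    rows = []
    for start in range(0, 256, 8):
        values = ", ".join(f"0x{v:04X}" for v in table[start:start + 8])
        rows.append("    " + values + ",")
    header = f"static const unsigned short {name}[256] = {{\n"
    return header + "\n".join(rows) + "\n};\n"
```

The returned text can be written to a header file and included in a C program.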

Contact Data Splitter support for more information.


PAD-database

Parses a PAD (Portable Application Description) XML file and loads selected values into a database table.

Contact Data Splitter support for more information.


PDF-database

Data Splitter has been used to import PDF (Portable Document Format) file data into databases.   Data Splitter is currently unable to read PDF directly, so the PDF files must be converted to text prior to the import.

Contact Data Splitter support for more information.


RTF-HTML

Generates web content (HTML) from RTF (rich text format) files.   Does a decent job of converting the old Set Machine RTF help file to HTML.   May require modification for other RTF files!   Builds links and a separate index file too.

Contact Data Splitter support for more information.


RTF-lines

Replaces newline characters with RTF "\line" tokens.
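
A minimal Python sketch of this substitution; the function name newlines_to_rtf_line is hypothetical. The space after "\line" is the delimiter that ends the RTF control word, so it is not rendered as text. Escaping of other RTF special characters is outside the scope of this sketch.

```python
def newlines_to_rtf_line(text: str) -> str:
    """Replace every line ending with the RTF control word \\line."""
    # Normalize CRLF and bare CR so each line ending yields one token.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "\\line ")
```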

Contact Data Splitter support for more information.


Syntax-check

Performs syntax checking on files.   See the syntax checker example or contact Data Splitter support for more information.


XML-database

XML parsing sample.   See the XML parsing topic or contact Data Splitter support for more information.


Legacy data conversion

Data Splitter can be configured to transform legacy data formats into databases, XML, or other formats.

Examples

Data Splitter has been used to :

Contact Data Splitter support for more information.