Email parsing + database generation
Many generated emails use a "label: data" or similar format that makes it easy to extract data items and transmit them to database fields corresponding to the labels.
Data Splitter string sets allow mapping of the labels to database destinations. For example :
Text | Target |
---|---|
Customer: | table.custname |
CustomerID: | table.custid |
TransactionID: | table.transid |
Transaction date: | table.transdate |
Amount: | table.amt |
.... | ... |
The "extracting data from emails" topic demonstrates how this works. The sample solution EMail-To-Database.dss is installed with Data Splitter and can be modified to work with other email formats. See the email parsing topic for more information.
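The label-to-field mapping can be sketched outside Data Splitter as well. The Python below is only an illustration of the idea, not how Data Splitter implements string sets: the function name `extract_fields`, the `LABEL_TO_FIELD` dictionary and the sample email text are all made up for this example, and matching a label at the start of a line is an assumption.

```python
# Hypothetical mapping that mirrors the example string set table.
LABEL_TO_FIELD = {
    "Customer:": "table.custname",
    "CustomerID:": "table.custid",
    "TransactionID:": "table.transid",
    "Transaction date:": "table.transdate",
    "Amount:": "table.amt",
}


def extract_fields(body):
    """Return {target field: value} for each line that starts with a known label."""
    record = {}
    for line in body.splitlines():
        stripped = line.strip()
        for label, field in LABEL_TO_FIELD.items():
            if stripped.startswith(label):
                # Everything after the label is treated as the data item.
                record[field] = stripped[len(label):].strip()
                break
    return record


# Made-up email body in "label: data" format.
sample = """Thank you for your order.
Customer: Jane Doe
CustomerID: C-1001
TransactionID: T-42
Transaction date: 2020-01-15
Amount: 19.99
"""

record = extract_fields(sample)
for field, value in record.items():
    print(field, "=", value)
```

Lines that do not start with a known label, such as the greeting in the sample, are simply ignored.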
The general approach to setting up an email parser :
- Analyze the emails to be parsed.
- Configure the database to be generated - see the database topic for more information.
- Configure Data Splitter to "map" the email content to the database.
- Run Data Splitter.
In detail :
Analyze the emails
Consider the following questions when analyzing the emails to be parsed :
- Which fields to extract
- Which emails to scan, i.e. is a filter required to restrict parsing to certain emails in the folder?
- Should only new emails, i.e. ones that Data Splitter hasn't read yet, be parsed?
The sample EMail-To-Database.dss demonstrates filtering emails by subject (SubjectFilter). Data Splitter has a "New messages only" checkbox under Message Options that restricts parsing to messages that have not yet been scanned.
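To make the two selection questions concrete, here is a minimal Python sketch of a subject filter combined with a "new messages only" switch. It does not reproduce Data Splitter's SubjectFilter: the `Message` class, the `already_scanned` flag and the case-insensitive substring match are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    already_scanned: bool  # hypothetical flag; Data Splitter tracks this itself


def select_messages(messages, subject_filter="", new_only=False):
    """Keep messages whose subject contains subject_filter; optionally skip scanned ones."""
    selected = []
    for msg in messages:
        # Assumed matching rule: case-insensitive substring of the subject.
        if subject_filter and subject_filter.lower() not in msg.subject.lower():
            continue
        if new_only and msg.already_scanned:
            continue
        selected.append(msg)
    return selected
```

Calling `select_messages(folder, "order confirmation", new_only=True)` on a list of such messages would return only the unscanned messages whose subject mentions an order confirmation.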
Configure the database
Use a DBMS (Database Management System) such as Microsoft Access to create the "target" database. Questions to consider when designing the tables :
- How many email categories, how many tables?
- The data types of the target fields. Consider that a part "number" may actually contain alphabetic characters and must therefore be defined as a "text" field.
- Will Data Splitter output to "raw" tables with relatively loose restrictions on field types, lengths, null attributes, etc.? Using memo fields or long text fields, and allowing them to be null, is often a good approach, especially early in parser development.
Visit the Database + ODBC page for information regarding database creation and configuring the ODBC (Open Database Connectivity) connection.
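A "raw" table of the kind described above might look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the real target (for example an Access database reached through ODBC). The table name `raw_transactions` and the inserted values are made up, and the column names are borrowed from the mapping example at the top of this page.

```python
import sqlite3

# In-memory SQLite stands in for the real target database.
conn = sqlite3.connect(":memory:")

# Every column is nullable text, so a partly parsed email can still be stored,
# and a part "number" or ID containing letters causes no type error.
conn.execute(
    """
    CREATE TABLE raw_transactions (
        custname  TEXT,
        custid    TEXT,
        transid   TEXT,
        transdate TEXT,
        amt       TEXT
    )
    """
)

# Made-up record in which the transaction date could not be extracted.
conn.execute(
    "INSERT INTO raw_transactions (custname, custid, transid, transdate, amt) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Jane Doe", "C-1001", "T-42", None, "19.99"),
)

row = conn.execute("SELECT custid, transdate FROM raw_transactions").fetchone()
print(row)
```

Stricter types and constraints can be applied later, once the parser reliably produces clean values.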
Download, install and start Data Splitter
Download the Data Splitter self-installing executable (.EXE) file.
Install Data Splitter by running the downloaded .EXE file on your computer. It is recommended that you accept the defaults by simply pressing "Enter" until the installation has finished. Take time to read the license agreement, though!
Start Data Splitter - the installation creates a Start menu entry and icons to run the program.
Select the appropriate email parsing template
The Data Splitter installation is accompanied by several sample email parsing configurations. The best way to create an email parser is to adapt an existing parser that closely matches the desired task.
EMail-To-Database.dss is a good starting point for many email formats. Modify the string sets and action group "NewEMail" as needed.
Contact Data Splitter support for assistance with email parser configuration.
Set the input
The input is specified as one or more message folders. Select Input/Output | Input email folders to define the folders to be scanned. When the Run | Email input (Ctrl+E) command is selected, all emails in the specified folders are scanned.
Set the output
Data Splitter can output to databases or files. Specify output file(s) using the "Input/Output" menu, option "Output files". Specify the target database using the "Input/Output" menu, option "Database". Consult Data Splitter help for information on output destinations. Visit the Database and ODBC topic for more information on databases.
Map the input to the output
Modify the Data Splitter string set(s) using the "Definitions" menu, option "String sets". String sets are also accessible via the "Quick Start" menu.
Modify other configuration data as necessary
To perform the desired task, the sample configurations may require modification beyond changing the input, output and string sets. Consult Data Splitter help and the tutorial for more information.
Run
Use the Run | Email input (Ctrl+E) command to parse the emails. See the run topic for more information.
Obtaining the desired output typically requires a few iterations of three basic steps :
- Modify the configuration,
- run,
- check the results.
MAPI email / message interface
Data Splitter uses the Messaging Application Programming Interface (MAPI) to interact with the Windows messaging system. This provides access to messages stored by MAPI-enabled email clients such as Microsoft Outlook.
In order to determine whether or not it has already seen a message, Data Splitter marks each message with a time stamp. This enables Data Splitter to run in "New messages only" mode. Apart from this time stamp, Data Splitter's access to the messaging system is read-only.
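The time stamp mechanism can be pictured with the short Python sketch below. It does not call MAPI and is not Data Splitter's implementation: messages are plain dictionaries, and the `scanned_at` key and the `scan_folder` function are hypothetical names used only to show how a per-message stamp makes a "New messages only" run possible.

```python
from datetime import datetime, timezone


def scan_folder(messages, new_only=True):
    """Return the messages to parse, stamping each one so later runs can skip it."""
    to_parse = []
    for msg in messages:
        # A message that already carries a stamp was seen in an earlier run.
        if new_only and msg.get("scanned_at") is not None:
            continue
        to_parse.append(msg)
        # The stamp is the only change made to the message.
        msg["scanned_at"] = datetime.now(timezone.utc)
    return to_parse
```

With this approach a second run over the same folder returns nothing until a new message arrives, while passing `new_only=False` rescans everything.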