Data Splitter email parsing
For each email in an input folder, Data Splitter loads the header and body as a single input stream.
The header contains one line for each of the following fields:
Field | Line starts with |
---|---|
subject | "Subject:" |
sender | "From:" |
recipient | "To:" |
date / time | "Date:" |
The header and body are separated by the first blank line, as specified by the Internet Message Format (RFC 5322).
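The split described above can be sketched in Python. This is an illustration under our own assumptions, not how Data Splitter itself works: the text is cut at the first blank line, and header lines are picked out by the prefixes in the table. `split_email` and the `HEADER_FIELDS` keys are hypothetical names. Real messages may also contain folded (multi-line) header fields and many other headers; Python's standard `email` package handles those cases properly.

```python
from typing import Dict, Tuple

# Header prefixes from the table above, mapped to hypothetical field names.
HEADER_FIELDS = {"Subject:": "subject", "From:": "sender",
                 "To:": "recipient", "Date:": "date"}

def split_email(raw: str) -> Tuple[Dict[str, str], str]:
    """Split a raw email at the first blank line into (header fields, body)."""
    text = raw.replace("\r\n", "\n")          # accept CRLF or LF line endings
    header_text, _, body = text.partition("\n\n")
    fields: Dict[str, str] = {}
    for line in header_text.split("\n"):
        for prefix, name in HEADER_FIELDS.items():
            if line.startswith(prefix):
                fields[name] = line[len(prefix):].strip()
    return fields, body

fields, body = split_email(
    "Subject: Auction item\r\nFrom: Jane Doe <jane@example.com>\r\n"
    "To: sales@example.com\r\nDate: Mon, 1 Jan 2024 10:00:00 +0000\r\n"
    "\r\nItem name: Antique lamp\r\n"
)
print(fields["subject"], "|", body)
```

Only the first blank line matters: anything after it, including further blank lines, belongs to the body.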
The techniques described in the Data Splitter tutorial can be used to parse emails.
Email parsing example
Notice that the header and the body are parsed separately. The start node is "Header" while the header is parsed and "Body" while the body is parsed: once "EndHeader" (the first blank line) is recognized, the start node is switched to "Body". The start node must be reset to "Header" in the pre-stream actions, so that parsing of each new email begins with its header. Parsing in two stages like this is advisable because email bodies may contain header-like text ("Subject:", "From:", "To:", etc.), for example in replies and forwarded messages, and that text must not be mistaken for the real header.
Note the use of a null node, "EndMessage". The null node is recognized at the end of the input stream (the end of the email body). At that point, action group "NewEMail" sends the values collected from the email to the database.
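The Header/Body switching can be modeled, loosely, as a two-state loop. This Python sketch is an analogy rather than Data Splitter's engine: `parse_stream` and the `on_end_message` callback (standing in for action group "NewEMail") are hypothetical names, and the input is assumed to be one email already split into lines.

```python
from typing import Callable, Dict, List

HEADER_PREFIXES = ("Subject:", "From:", "To:", "Date:")

def parse_stream(lines: List[str],
                 on_end_message: Callable[[Dict[str, str], List[str]], None]) -> None:
    """Parse one email (one input stream), switching start node Header -> Body."""
    start_node = "Header"        # pre-stream action: every email starts at "Header"
    header: Dict[str, str] = {}
    body: List[str] = []
    for line in lines:
        if start_node == "Header":
            if line.strip() == "":           # "EndHeader": the first blank line
                start_node = "Body"
            elif line.startswith(HEADER_PREFIXES):
                label, _, value = line.partition(":")
                header[label] = value.strip()
        else:
            # Body lines are never treated as header fields, even "Subject: ...".
            body.append(line)
    # "EndMessage": end of stream reached; hand the collected values off.
    on_end_message(header, body)

# Usage: the quoted "Subject:" line lands in the body, not the header.
collected = []
parse_stream(["Subject: Bid", "From: a@example.com", "",
              "Subject: old thread", "Item name: Lamp"],
             lambda h, b: collected.append((h, b)))
print(collected)
```

Because `start_node` is initialized inside the function, every call starts in "Header"; this plays the role of the pre-stream reset.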
See sample EMail-To-Database.dss.
In the bodies of these emails, the data of interest generally has this format:
label ... spaces ... data ... end-of-line.
A label is descriptive text, typically followed by a colon, for example: "Item name:".
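A regular expression gives a rough equivalent of this line format. The sketch below uses Python's `re` module instead of Data Splitter patterns and assumes that every label ends with a colon (the text above says labels are only *typically* followed by one) and that lines end with LF; `label_values` is a hypothetical helper.

```python
import re
from typing import Dict

# One "label ... spaces ... data ... end-of-line" line.
# Assumes the label ends with ':' and lines end with '\n' (a '\r' would stay in the data).
LABEL_LINE = re.compile(r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)

def label_values(body: str) -> Dict[str, str]:
    """Return {label: data} for each labelled line; lines without a colon are skipped."""
    return {m.group(1): m.group(2) for m in LABEL_LINE.finditer(body)}

print(label_values("Item name:   Antique lamp  \nStarting bid: 25.00\nThanks!\n"))
```

Since the label cannot contain a colon, data such as "10:30" is kept whole after the first colon.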
The sample solution uses the following patterns:
Pattern | Matches |
---|---|
* | zero or more of any character |
Num | one or more numeric digits |
WS0+ | zero or more whitespace characters (blank, tab, etc.) |
EndHeader | a blank line (two consecutive line feeds) |
EndMessage | null pattern indicating the end of the email body |
It also uses the following string sets:
String set | Contents |
---|---|
SubjectFilter | text that begins the subject field in the email header |
TextFields | a list of text field labels expected in the email body and their respective database destinations |
NumericFields | a list of numeric field labels and their respective database destinations |
CurrencyFields | a list of currency field labels and their respective database destinations |
and the following node groups:
Node group | Purpose |
---|---|
Name-Address | extracts the name and email address (RName + RMail) from the From/To fields in the email header |
Decimal-Number | gets a decimal number: digits + decimal point + digits |
Text | gets text from the input, strips leading / trailing blanks, stops at end of line |
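For readers more familiar with regular expressions, the patterns and node groups above have approximate Python counterparts. This is an analogy built on our own assumptions, not Data Splitter syntax: the regexes, `name_address`, `decimal_number`, and `text_value` are hypothetical, and the Name-Address sketch delegates to the standard library's `email.utils.parseaddr` rather than reproducing the node group's logic.

```python
import re
from email.utils import parseaddr
from typing import Optional, Tuple

# Approximate regex counterparts of the patterns (hypothetical mapping).
PATTERNS = {
    "*": r".*",            # any characters; '.' stops at a line break here
    "Num": r"[0-9]+",      # one or more numeric digits
    "WS0+": r"[ \t]*",     # zero or more blanks or tabs
    "EndHeader": r"\n\n",  # blank line: two consecutive line feeds
}

# Decimal-Number: digits + decimal point + digits
DECIMAL = re.compile(PATTERNS["Num"] + r"\." + PATTERNS["Num"])

def name_address(field_value: str) -> Tuple[str, str]:
    """Name-Address analogue: 'Jane Doe <jane@example.com>' -> (RName, RMail)."""
    return parseaddr(field_value)

def decimal_number(text: str) -> Optional[str]:
    """Decimal-Number analogue: first 'digits.digits' run in text, or None."""
    match = DECIMAL.search(text)
    return match.group(0) if match else None

def text_value(rest_of_input: str) -> str:
    """Text analogue: take input up to end of line, strip leading/trailing blanks."""
    return rest_of_input.split("\n", 1)[0].strip()

print(name_address("Jane Doe <jane@example.com>"))  # ('Jane Doe', 'jane@example.com')
print(decimal_number("Starting bid: 25.00 USD"))    # 25.00
print(text_value("   Antique lamp  \nnext line"))   # Antique lamp
```

Note that `decimal_number` returns `None` for a plain integer such as "3", matching the digits + decimal point + digits definition.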
Here is the definition of the TextFields string set:
This solution also uses a database (ODBC) connection. The file "SQL.txt", included with the installation, contains an SQL statement that creates the "eauction" table used by this sample.
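The hand-off performed by "NewEMail" amounts to mapping labels to columns and inserting one row per email. This sketch uses Python's built-in `sqlite3` module as a stand-in for the ODBC connection. The dictionaries imitate the TextFields, NumericFields, and CurrencyFields string sets, but their labels (apart from the "Item name:" example above) and all column names are hypothetical, since the real `eauction` schema is defined in SQL.txt.

```python
import sqlite3
from decimal import Decimal
from typing import Dict

# Hypothetical string sets: email-body label -> database column.
TEXT_FIELDS = {"Item name": "item_name", "Seller": "seller"}
NUMERIC_FIELDS = {"Quantity": "quantity"}
CURRENCY_FIELDS = {"Starting bid": "start_bid_cents"}

def to_row(values: Dict[str, str]) -> Dict[str, object]:
    """Map {label: raw text} to {column: typed value}; unknown labels are dropped."""
    row: Dict[str, object] = {}
    for label, raw in values.items():
        if label in TEXT_FIELDS:
            row[TEXT_FIELDS[label]] = raw
        elif label in NUMERIC_FIELDS:
            row[NUMERIC_FIELDS[label]] = int(raw)
        elif label in CURRENCY_FIELDS:
            amount = Decimal(raw.replace("$", "").replace(",", ""))
            row[CURRENCY_FIELDS[label]] = int(amount * 100)  # whole cents
    return row

def new_email(conn: sqlite3.Connection, row: Dict[str, object]) -> None:
    """Insert one email's row; column names come only from the fixed maps above."""
    if not row:
        return
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO eauction ({columns}) VALUES ({placeholders})",
                 tuple(row.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eauction (item_name TEXT, seller TEXT,"
             " quantity INTEGER, start_bid_cents INTEGER)")
new_email(conn, to_row({"Item name": "Antique lamp", "Quantity": "2",
                        "Starting bid": "$1,250.00"}))
print(conn.execute("SELECT item_name, quantity, start_bid_cents FROM eauction").fetchall())
```

Column names reach the SQL text only from the fixed dictionaries, while values always go through `?` placeholders. Storing currency as integer cents avoids floating-point rounding; that is a choice of this sketch, not something the sample prescribes.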