EmailReader

Not available in Community Designer

Short Description
Ports
Metadata
EmailReader Attributes
Details
Examples
See also

Short Description

EmailReader reads a store of email messages, either locally from a delimited flat file or from an external server.

Component | Same input metadata | Sorted inputs | Inputs | Outputs | Java | CTL | Auto-propagated metadata
EmailReader | no | no | 1 | 2 | - | - | no

Ports

How the ports are used depends on the use case. The component can read data from a local source or from an external server, and it decides which case applies based on whether an edge is connected to its single input port.

Case One: If an edge is attached to the input port, the component assumes that it will be reading data locally. In many cases, this edge will come from a FlatFileReader. In this case, a file can contain multiple email message bodies, separated by a chosen delimiter, and each message will be passed one by one into the EmailReader for parsing and processing.
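As a rough illustration of Case One, the sketch below splits the text of a flat file on a delimiter and parses each chunk as one email message. It uses Python's standard `email` package rather than CloverETL itself, and the delimiter string and the sample messages are hypothetical; in a real graph the delimiter is whatever the upstream reader is configured to use.

```python
from email import message_from_string
from email.message import Message
from typing import List

# Hypothetical delimiter; the real one is chosen in the upstream reader.
DELIMITER = "=====MESSAGE====="


def split_messages(text: str, delimiter: str = DELIMITER) -> List[Message]:
    """Split flat-file text on the delimiter and parse each chunk as an email."""
    chunks = (chunk.strip() for chunk in text.split(delimiter))
    # Empty chunks (for example after a trailing delimiter) are skipped.
    return [message_from_string(chunk) for chunk in chunks if chunk]


# Made-up sample data: two messages in one file.
sample = (
    "From: alice@example.com\n"
    "Subject: First\n"
    "\n"
    "Hello\n"
    "=====MESSAGE=====\n"
    "From: bob@example.com\n"
    "Subject: Second\n"
    "\n"
    "World\n"
)

messages = split_messages(sample)
subjects = [m["Subject"] for m in messages]
```

Each parsed message corresponds to one record arriving on the input port, which the component then parses on its own.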

Case Two: If an edge is not connected to the input port, the component assumes that messages will be read from an external server. In this case, the user must enter related attributes, such as the server host and protocol parameters, as well as any relevant username and/or password.
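To make the Case Two parameters more concrete, here is a minimal sketch in Python (not CloverETL code) of how a protocol, security mode, and optional port might be turned into a server connection. The default ports are the well-known ones for IMAP and POP3; whether EmailReader falls back to exactly these values when Server Port is blank is an assumption. The `connect` function is only defined, not called, because it needs a live mail server.

```python
import imaplib
import poplib

# Well-known default ports. That EmailReader uses the same defaults is an assumption.
DEFAULT_PORTS = {
    ("IMAP", "None"): 143,
    ("IMAP", "STARTTLS"): 143,
    ("IMAP", "SSL"): 993,
    ("POP3", "None"): 110,
    ("POP3", "STARTTLS"): 110,
    ("POP3", "SSL"): 995,
}


def resolve_port(server_type, security="None", port=None):
    """Return the explicit port if given, otherwise the well-known default."""
    if port is not None:
        return port
    return DEFAULT_PORTS[(server_type, security)]


def connect(server_type, host, user, password, security="None", port=None):
    """Open an authenticated session (requires a reachable server)."""
    port = resolve_port(server_type, security, port)
    if server_type == "IMAP":
        cls = imaplib.IMAP4_SSL if security == "SSL" else imaplib.IMAP4
        conn = cls(host, port)
        if security == "STARTTLS":
            conn.starttls()  # upgrade the plain connection to TLS
        conn.login(user, password)
    else:
        cls = poplib.POP3_SSL if security == "SSL" else poplib.POP3
        conn = cls(host, port)
        if security == "STARTTLS":
            conn.stls()  # upgrade the plain connection to TLS
        conn.user(user)
        conn.pass_(password)
    return conn
```

For example, `resolve_port("POP3", "SSL")` yields 995, while an explicitly supplied port always takes precedence.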

Port type | Number | Required | Description | Metadata
Input | 0 | no | For inputting email messages from a flat file | String field
Output | 0 | no | The content port | Any
Output | 1 | no | The attachment port | Any

Metadata

EmailReader does not propagate metadata.

EmailReader has metadata templates on its output ports.

Fields of the templates have to be mapped using the Field Mapping attribute. Otherwise, null values are sent to the output ports.

Table 48.3. EmailReader_Message - Output port 0

Field number | Field name | Data type | Description
1 | MessageID | string | Message ID
2 | From | string | Sender of the message
3 | To | string | Addressee of the message
4 | Cc | string | Copy sent to
5 | Subject | string | Email subject
6 | Date | string | Email delivery date
7 | Body | string | Email content

Table 48.4. EmailReader_Attachment - Output port 1

Field number | Field name | Data type | Description
1 | MessageID | string | Message ID
2 | ContentType | string | Content type of attachment
3 | Charset | string | Character set of attachment
4 | Disposition | string | Attachment or inline
5 | Filename | string | Attachment file name
6 | AttachmentRaw | byte | Email attachment as bytes
7 | AttachmentFile | string | Path to downloaded attachment

EmailReader Attributes

Whether an attribute is required depends on how the component is configured. See Ports: in Case Two, where no edge is connected to the input port, several attributes are required in order to connect to the external server. At a minimum, the user must choose a protocol and enter the hostname of the server. Usually a username and password are also required.

Attribute | Req | Description | Possible values
Basic
Server Type | | Protocol used to connect to the mail server. | IMAP (default), POP3
Server Name | | The hostname of the server. | e.g. imap.example.com
Server Port | | Specifies the port used to connect to an external server. If left blank, a default port is used. | Integers
Security | | Specifies the security protocol used to connect to the server. | None (default), STARTTLS, SSL
User Name | | Username to connect to the server (if authorization is required). |
Password | | Password to connect to the server (if authorization is required). |
Fetch Messages | | Filters messages based on their status. The option ALL reads every message located on the server, regardless of its status. NEW fetches only messages that have not been read. | NEW, ALL
Field Mapping | Yes | Defines how parts of the email (standard and user-defined) are mapped to Clover fields. See Mapping Fields. |
Source Folder | | Defines the source folder on the remote server. Use with IMAP only. | e.g. INBOX
Mark/Delete Messages | | Defines what to do with read messages. By default, messages are marked as read. | mark as read (default), no action, delete
Max. Number of Messages | | Defines the maximum number of messages to download. Any positive value sets the limit; 0 or a negative value means unlimited. | e.g. 50
Advanced
POP3 Cache File | | Specifies the URL of a file used to keep track of which messages have been read. POP3 servers by default have no way of tracking read and unread messages. To fetch only unread messages, all message IDs must be downloaded from the server and compared with a list of message IDs that have already been read. Only the messages that do not appear in this list are actually downloaded, which saves bandwidth. The file is a delimited text file storing the unique IDs of messages that have already been read. Even if ALL is chosen, the user should still provide a cache file, as it will be populated with the IDs of the messages read. Note: the POP3 cache file is universal; it can be shared among many inboxes, or the user can maintain a separate cache for each mailbox. |
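
The POP3 cache logic described above can be sketched as follows. This is an illustration in Python, not the component's implementation: the cache is assumed to hold one message ID per line (the documentation only says it is a delimited text file), and the function names and sample IDs are hypothetical.

```python
import tempfile
from pathlib import Path


def load_cache(path):
    """Read already-seen message IDs; a missing file means nothing was read yet."""
    p = Path(path)
    if not p.exists():
        return set()
    return {line.strip() for line in p.read_text().splitlines() if line.strip()}


def select_unread(server_ids, cache):
    """Keep only IDs that are not in the cache, preserving server order."""
    return [mid for mid in server_ids if mid not in cache]


def update_cache(path, cache, downloaded):
    """Write back the union of the old cache and the newly downloaded IDs."""
    merged = sorted(cache | set(downloaded))
    Path(path).write_text("\n".join(merged) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "pop3cache"
    cache_file.write_text("uid-1\nuid-2\n")  # IDs read in an earlier run

    # IDs as they might be listed by the server (made-up values).
    server_ids = ["uid-1", "uid-2", "uid-3", "uid-4"]

    cache = load_cache(cache_file)
    to_download = select_unread(server_ids, cache)
    update_cache(cache_file, cache, to_download)
    final_cache = load_cache(cache_file)
```

Only `uid-3` and `uid-4` would be fetched in this run; on the next run all four IDs are in the cache and nothing is downloaded again.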

Details

EmailReader reads email messages either from an online mailbox or from a local source.

This component parses email messages and writes their attributes out to two attached output ports. The first port, the content port, outputs relevant information about the email and body. The second port, the attachment port, writes information relevant to any attachments that the email contains.

The content port will write one record per email message. The attachment port can write multiple records per email message; one record for each attachment it encounters.
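
The one-record-per-message and one-record-per-attachment behavior can be illustrated with Python's standard `email` package. This is a sketch, not CloverETL code: the record keys follow the metadata templates above, but which email header feeds which field is an assumption, and the sample message is made up.

```python
from email.message import EmailMessage


def to_records(msg):
    """Return (content_record, attachment_records) for one parsed message."""
    body_part = msg.get_body(preferencelist=("plain", "html"))
    content = {
        "MessageID": msg["Message-ID"],
        "From": msg["From"],
        "To": msg["To"],
        "Cc": msg["Cc"],  # None when the header is absent
        "Subject": msg["Subject"],
        "Date": msg["Date"],
        "Body": body_part.get_content() if body_part else None,
    }
    # One record per attachment, each carrying the parent message ID.
    attachments = [
        {
            "MessageID": msg["Message-ID"],
            "ContentType": part.get_content_type(),
            "Charset": part.get_content_charset(),
            "Disposition": part.get_content_disposition(),
            "Filename": part.get_filename(),
            "AttachmentRaw": part.get_payload(decode=True),
        }
        for part in msg.iter_attachments()
    ]
    return content, attachments


# Made-up message with two attachments.
msg = EmailMessage()
msg["Message-ID"] = "<1@example.com>"
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Report"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf", filename="a.pdf")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="b.png")

content, attachments = to_records(msg)
```

Here the single message yields one content record and two attachment records, linked by the shared MessageID value.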

Mapping Fields

If you edit the Field Mapping attribute, the Email to Clover Mapping dialog opens:

Figure 48.9. Mapping to Clover fields in EmailReader


In its two tabs, Message and Attachments, you map incoming email fields to Clover fields by drag and drop. Metadata fields appear on a tab only if the corresponding edge is connected and has metadata assigned. The first output port determines the content of the Message tab; the second output port determines the content of the Attachments tab.

Buttons on the right-hand side allow you to perform Auto mapping, Clear selected mapping or Cancel all mappings. Buttons on the left-hand side add or remove user-defined fields.

User-defined Fields

User-defined fields let you handle non-standardized email headers. You manually define a list of email header fields that should be populated from the email message. For example, you can read additional email headers such as Accept-Language, DKIM-Signature, Importance, In-Reply-To, Received, or References.

See details on message headers at http://www.iana.org/assignments/message-headers/message-headers.xhtml.
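
As an illustration of what reading such extra headers amounts to, the following Python sketch looks up a hypothetical list of user-defined header names in a parsed message. It uses the standard `email` package, not CloverETL, and the sample message is made up.

```python
from email import message_from_string

# Hypothetical list of user-defined header names.
USER_DEFINED = ["In-Reply-To", "Importance", "Accept-Language"]

raw = (
    "From: alice@example.com\n"
    "Subject: Re: Report\n"
    "In-Reply-To: <1@example.com>\n"
    "Importance: high\n"
    "\n"
    "Thanks.\n"
)

msg = message_from_string(raw)

# Header lookup is case-insensitive; a missing header yields None.
extra = {name: msg.get(name) for name in USER_DEFINED}
```

A header that is absent from the message (Accept-Language here) comes back as None, which is analogous to a null value in the mapped field.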

Tips & Tricks

  • Be sure you have dedicated enough memory to your Java Virtual Machine (JVM). Depending on the size of your message attachments (if you choose to read them), you may need to allocate up to 512M to CloverETL so that it may effectively process the data.

Performance Bottlenecks

  • Quantity of messages to process from an external server: EmailReader must connect to an external server, so bandwidth can become a limiting factor. Processing a large number of messages with large attachments may bottleneck the application while it waits for the content to be downloaded. Use the NEW option whenever possible, and maintain a POP3 cache if using the POP3 protocol.

Examples

Reading E-Mails
Reading Attachments

Reading E-Mails

This example describes basic usage of the EmailReader component.

Read the mailbox of Adam Smith (email: [email protected], password: InquiryInto). Read all messages. The example.com server is accessible via the POP3 protocol.

Solution

Create a graph with an EmailReader component, connect the first output port of EmailReader to another component, and configure EmailReader as follows:

Attribute             Value
Server Type           POP3
Server Name           example.com
User Name             adam.smith
Password              InquiryInto
Fetch Messages        ALL
Field Mapping         MessageID:=MessageID; From:=From; To:=To; Cc:=Cc; Subject:=Subject; Date:=Date; Body:=BodyAsText;|
Mark/Delete Messages  no action
POP3 Cache File       ${DATATMP_DIR}/pop3cache

The POP3 Cache File must be in an existing directory.

The Field Mapping can be defined on the Message tab of the Email to Clover Mapping dialog.
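
The two Field Mapping values in the examples on this page suggest a simple textual layout: `target:=source` pairs separated by semicolons, with a `|` separating the message mappings from the attachment mappings. That reading is inferred from the examples only and is not a documented grammar. Under that assumption, a minimal Python sketch for inspecting such a value could look like this (the function name is hypothetical):

```python
def parse_field_mapping(value):
    """Split a mapping string into (message_mapping, attachment_mapping) dicts.

    Assumes 'target:=source;' pairs, with '|' between the message part
    and the attachment part, as inferred from the examples.
    """
    message_part, _, attachment_part = value.partition("|")

    def parse(section):
        pairs = {}
        for item in section.split(";"):
            item = item.strip()
            if not item:
                continue  # skip the empty piece after a trailing ';'
            target, _, source = item.partition(":=")
            pairs[target.strip()] = source.strip()
        return pairs

    return parse(message_part), parse(attachment_part)


mapping = (
    "MessageID:=MessageID; From:=From; To:=To; Cc:=Cc; "
    "Subject:=Subject; Date:=Date; Body:=BodyAsText;|"
)
message_map, attachment_map = parse_field_mapping(mapping)
```

For the value used in this example, all seven pairs land in the message mapping and the attachment mapping stays empty.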

Reading Attachments

This example describes reading attachments and saving the files under their original names.

Read attachments from the mailbox of John Doe ([email protected], password: MyKittenName123) and store the files in the data-out directory. The mailbox is accessible via the IMAP4 protocol.

Solution

Create a graph containing EmailReader and FlatFileWriter. Connect the second output port of EmailReader with FlatFileWriter.

In EmailReader, set the following attributes:

Attribute                Value
Server Type              IMAP
Server Name              example.com
User Name                john.doe
Password                 MyKittenName123
Fetch Messages           ALL
Field Mapping            |MessageID:=MessageID; ContentType:=ContentType; Charset:=Charset; Disposition:=Disposition; Filename:=Filename; AttachmentRaw:=AttachmentRaw; AttachmentFile:=AttachmentFile;
Mark/Delete Messages     no action
Max. Number of Messages  0

The Field Mapping in EmailReader can be configured on the Attachments tab of the Email to Clover Mapping dialog.

In FlatFileWriter, set the following attributes:

Attribute           Value
File URL            ${DATAOUT_DIR}/#
Create directories  true
Exclude fields      MessageID;ContentType;Charset;Disposition;AttachmentFile;Filename
Partition key       Filename
Partition file tag  Key file tag
Note

You should filter out null file names before writing. Use Filter.

You should handle duplicated file names as well.
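
The two points in this note can be sketched as follows. The snippet is a Python illustration of the logic only, not CloverETL code; in a graph, the same steps would be done with components placed before the writer. The numbering scheme for duplicates (`_1`, `_2`, ...) and the function name are arbitrary choices for this sketch.

```python
import os


def unique_filenames(records):
    """Drop records without a file name and make the remaining names unique."""
    used = set()
    result = []
    for rec in records:
        name = rec.get("Filename")
        if not name:
            continue  # null or empty file name: nothing sensible to write
        stem, ext = os.path.splitext(name)
        candidate, n = name, 0
        # Append a counter until the name has not been used before.
        while candidate in used:
            n += 1
            candidate = f"{stem}_{n}{ext}"
        used.add(candidate)
        result.append({**rec, "Filename": candidate})
    return result


# Made-up attachment records.
records = [
    {"Filename": "a.pdf"},
    {"Filename": None},
    {"Filename": "a.pdf"},
    {"Filename": "b.png"},
    {"Filename": "a.pdf"},
]
names = [r["Filename"] for r in unique_filenames(records)]
```

Without such a step, records sharing a Filename value would be written to the same partition file.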

Compatibility

In CloverETL 3.4.x and 3.5.x, Auto mapping (accessible via the Field Mapping attribute) is performed automatically when you first open the mapping dialog.

See also

EmailSender
Common Properties of Components
Specific Attribute Types
Common Properties of Readers
Readers Comparison