Products -- The AddToIt Document Engine

The AddToIt Document Engine was designed with flexibility and performance in mind. Some of the features of the AddToIt Document Engine include:
  • Retrieval of documents and data from multiple sources including websites
  • Transformation of documents from unstructured to structured formats, e.g. PDF to XML
  • Normalization of data within documents into a standard data form
  • Document and data storage for fast retrieval from a database
  • Delivery of data in multiple formats via multiple means (FTP/HTTP)

The diagram below provides an example of how the AddToIt Document Engine is often used. The engine collects data from HTML and PDF documents found on the web, within databases, on disk, or in email. It then transforms them into an XML document representation and normalizes the data. It stores the results in both document and data form in the AddToIt database. Users receive the data from the database on request (e.g. through a web interface), or it is delivered to them at regular intervals.
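
The collect, transform, normalize, store, and deliver flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the AddToIt implementation: the function names (`transform`, `normalize`, `run_pipeline`), the `Store` class, and the "Label: value" input convention are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def transform(raw_html: str) -> ET.Element:
    """Turn a loosely structured HTML fragment into a small XML document."""
    doc = ET.Element("document")
    # Assumed input convention: each "Label: value" pair sits in its own <p> tag.
    for label, value in re.findall(r"<p>\s*([^:<]+):\s*([^<]*)</p>", raw_html):
        field = ET.SubElement(doc, "field", name=label.strip().lower())
        field.text = value.strip()
    return doc


def normalize(doc: ET.Element) -> dict:
    """Convert field values into a standard data form (here: prices become floats)."""
    record = {}
    for field in doc.findall("field"):
        name = field.get("name")
        text = field.text or ""
        if name == "price":
            record[name] = float(text.replace("$", "").replace(",", ""))
        else:
            record[name] = text
    return record


class Store:
    """In-memory stand-in for a database that keeps both document and data forms."""

    def __init__(self):
        self._rows = {}

    def save(self, key: str, xml_doc: str, record: dict) -> None:
        self._rows[key] = (xml_doc, record)

    def fetch(self, key: str):
        return self._rows[key]


def run_pipeline(key: str, raw_html: str, store: Store) -> dict:
    """Transform, normalize, and store one collected document."""
    doc = transform(raw_html)
    record = normalize(doc)
    store.save(key, ET.tostring(doc, encoding="unicode"), record)
    return record
```

Delivery on request would then amount to calling `store.fetch(key)` from a web handler or a scheduled job, both of which are left out of the sketch.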

How the AddToIt Document Engine accomplishes these tasks is as important as what it is capable of. The engine is designed to be extremely configurable and as simple to configure as possible. This design goal leads to two results: the AddToIt Document Engine is extremely flexible, and it can be configured to accomplish new tasks simply and easily by subject matter experts and non-experts alike. The benefits include:

  • Faster and cheaper configuration via XML control files
  • Deep configurability so that the software can handle tasks usually only done by hand
  • Document-centric model that allows for configuration via a web interface
  • Separation of workflow so that non-technical subject matter experts can change the configuration of the engine
  • Creation of new control languages to suit the needs of the subject matter experts

The diagram below illustrates how the data extraction and normalization task is modified through the use of AddToIt software. Instead of a programmer repeatedly modifying code to extract data from new sources or types of documents, the AddToIt Document Engine puts control in the hands of the analyst. The analyst can configure the engine on the fly for each new source or type of document. This results in faster, cheaper, and more reliable data extraction implementations.
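
To make the idea of analyst-driven configuration concrete, here is a minimal Python sketch in which extraction rules live in an XML control file rather than in code. The control-file schema (`<control>`, `<extract>`, `field`, `pattern`) and the helper names are hypothetical, invented for this illustration; they are not the actual AddToIt control language.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical control file an analyst might edit; not the real AddToIt format.
CONTROL_XML = r"""
<control source="example-price-list">
  <extract field="sku" pattern="SKU:\s*(\w+)"/>
  <extract field="price" pattern="Price:\s*\$([0-9.]+)"/>
</control>
"""


def load_rules(control_xml: str):
    """Read (field name, compiled pattern) pairs from the control file."""
    root = ET.fromstring(control_xml)
    return [
        (rule.get("field"), re.compile(rule.get("pattern")))
        for rule in root.findall("extract")
    ]


def extract(text: str, rules) -> dict:
    """Apply each rule to the text and keep the first captured group per field."""
    record = {}
    for field, pattern in rules:
        match = pattern.search(text)
        if match:
            record[field] = match.group(1)
    return record
```

Under this assumed design, supporting a new document source means adding or editing `<extract>` entries in the control file, while `load_rules` and `extract` stay unchanged.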