EAI & ETL : When you should choose one over the other

Posted by Avinash Rao on Monday, 6 February, 2012

As part of our quarterly review process of all the FAQs, we have decided to touch upon the most frequently talked about subject, ETL vs EAI. In this article we look to address some of the pain points involved in the decision making process between the two, what they mean to your Business and when you should draw boundaries between these two  integration techniques.

As part of any data integration strategy, it is a golden rule of thumb to follow ‘prevent and prepare rather than repent and repair’. In most cases concerning Business Intelligence initiatives, if you know what you don’t need, you are already half way through in the quest to what you want to achieve. Business Enterprises invest millions of dollars on BI initiatives and Data Warehousing and hence rely on a consistent, accurate and reliable data. Thus following proper integration techniques becomes all the more vital. With concerned to enterprise-integration strategy, EAI is essential for maintaining a perfect harmony between business needs and IT solutions where as Data integration, a function of ETL is a constant need as inconsistent data at the heart of any BI initiative fails to provide an accurate picture of the Business. Hence the vital question EAI or ETL?

What is EAI (Enterprise Application Integration)?

EAI may be defined as a process of aligning a business’s strategic vision with its information technology. In other words EAI is mainly process integration that exists to allow client applications to operate on data from across the business through API calls without regard to its location, encapsulating technology or format. It involves integration of incompatible business applications within and beyond enterprise to allow them to talk to each other seamlessly and to share data in real time.

EAI solutions enable the automation of end-to-end business processes by coordinating sequences of tasks and resources (both systems and people) that perform them. EAI solutions support sophisticated exception management and the dynamic modification of processes even when processes are underway. EAI involves developing a ‘unified view’ of an enterprise’s business and its applications, seeing how existing applications fit into the scheme of things, and then devising ways to efficiently reuse what already exists while adding new applications and data.

EAI integration is mostly message-based, transaction-oriented, point-to-hub, brokering and transformation for application-to-application integration. Some of the core benefits offered by Enterprise Application Integration are.

  1. A focus on integrating both business-level processes and data
  2. A focus on reuse and distribution of business processes and data
  3. A focus on simplifying application integration by reducing the amount of detailed, application specific knowledge required by the users

What is ETL (Extraction, Transformation & Loading)?

As the acronym stands, ETL is a process of extraction / reading a database, transforming or validating the data, and lastly loading / writing to a database. Extract, Tranform & Load are three database functions that are combined into one tool (like DBSync) to pull data out of source databases and place it into target databases. ETL is used to migrate data from databases to others, to form data marts and data warehouses and also to convert databases from one format or type to another.

Extract - The process of reading data from source systems. Data can be extracted in schedule-driven pull mode or event–driven push mode. Pull mode operation supports data consolidation and is typically done in batch. Push mode operation is one online by propagating data changes to target data stores.

Transform - the process of converting the extracted data from its existing form into the format it needs to be in so that it can be placed into other systems or databases. Transformation occurs by using rules or lookup tables or by combining the data with other data.

Load - The process of creation and execution of workflows to write data into the target systems. Data loading may cause a complete refresh of a target data store or may be done by updating the target destination. Interfaces here include de facto standards like ODBC, JBDC, JMS, or application interfaces.

ETL is mainly designed to process very large amounts of data, ETL provides a suitable platform for

  1. Improved productivity by reuse of objects and transformations
  2. Strict Methodology
  3. Better Metadata Support, including impact analysis.

Drawing Boundaries for ETL & EAI

As mentioned in the above sections, both techniques rely on the concept of a unified view and the definition of a mapping that allows data from many disparate sources to be “projected” onto that view. But what differ are the purpose, speed and direction and amount of data that are transformed and placed within the unified view from the external sources.

Process-level integration mainly deals with building enterprise-wide business workflows and processes and incorporating existing applications into those processes. EAI middleware acts as the workflow engine integrating applications in near real time, passing small amounts of data through message queues and a series of stages. EAI tools provide much more complete workflow capabilities than ETL tools, which provide simple workflow.  EAI tools, and especially their workflow components, provide very sophisticated GUI development environments that enable design and management of very complex business processes. Here transformations are focused on ensuring a common understanding of the context and meaning (semantics) of the data involved within the message, the more likely a proven EAI is more appropriate.

With regards to data integration, ETL tools and also next generation ETL clearly holds an advantage whether in batch or real time. Synchronizing data between two applications involves a lot more data manipulation than simply moving data from point Source to Target. It involves data intense tasks that depend upon either RDBMS efficiencies/scalability or in-memory data caching to achieve the necessary throughput. Typically, enterprise data warehousing projects require you to move large amounts of data within relatively small windows of time. Also since ETL tools were born out of the relational database world, and thus are adept at performing SQL oriented transformations. They deal mainly with pulling data out of multiple relational tables, understanding the meaning and relationships between the tables, combining, merging, or joining that data, and augmenting it with data from their sources. Since this is more than just moving data from Source to Target, ETL is more appropriate.

When EAI

When ETL

High number of transactions

Large amounts of data

Message transforms

Complex transformations

Transformation act on a single row of data

Transformation is data-set oriented

Little data augmentation

Large data augmentation

1 to n ; m to n

Point-to-point

Suitable for real-time data needs

Suitable for large amounts of data

High volume, low footprint data exchange

Generally used to move data between 2 or more databases / data repositories


Jumpstart your Microsoft Dynamics GP (Great Plains) Integration Project

Posted by Avinash Rao on Thursday, 10 March, 2011

If you are starting to feel that your current accounting and business solution is slowly but surely becoming inadequate, not catering to your business needs entirely and also isn’t keeping up pace with your organization growth, then you have just outgrown the entry-level accounting software. In such a scenario if you’re seeking a solution that can overcome these limitations then, Microsoft’s ERP solution,  Dynamics GP (Great Plains) is an ideal upgrade that can cater to your business needs and perhaps more.

With the advent of cloud, there has been a fundamental shift in the way businesses are being operated these days. This change is more evident than ever before as more and more organizations require both on-premise and online solutions. In most cases these processes are interconnected, leading to the integration of financial systems, supply chain management, CRM, e-selling, and other key business processes. In view of this, GP provides great flexibility for small and mid-sized businesses alike to deal with the IT challenges they face in the new interconnected economy. Being a Microsoft Certified partner, we have managed to make better use of features incorporated in GP and has enabled us to provide successful integration with host of applications.

With DBSync, Microsoft Dynamics GP users can connect to databases, on-demand CRM tools and a variety of other applications to improve productivity and data visibility. Some of the features that customers can make use of are as follows,

  • Synchronize Customer Master Databases with Microsoft Dynamics GP
  • Convert Opportunities Won to Order generated in MS Dynamics GP
  • Convert Opportunities Won to Invoice in MS Dynamics GP
  • Integrate E-commerce website orders to MS Dynamics GP
  • Integrated payments received from online payment systems for reconciliation.

DBSync for Salesforce.com & Microsoft Dynamics GP integrates Salesforce objects like Accounts, Contacts, Products and Opportunities to GP as Customers, Sales Order Processing (SOP), Purchasing Order Processing (POP ), Receivables (RM), Items, Invoicing Transactions, and many more, while preserving links between tables.


Data Quality in Data Integration

Posted by Avinash Rao on Tuesday, 1 March, 2011

Data, as much as it is a lifeblood of our business, can also be Achilles Heel for various reasons, sometimes reasons best known to none! If you are in the Business Data Integration stream, the complexity only multiplies exponentially. When you are selling an integration product, you are by default understood as flawless integrator of data regardless of any factors. But we are not living in an ideal world are we?

data quality

More often than not the data received and described, is never as accurate, never as complete or never as consistent as it is required to be. The result of this would be data flow not being as smooth as intended. When the result isn’t the best the actions and conclusions based on it will not be the best as well, eventually leading to a dissatisfied customer. Some of the common issues encountered as a result of poor quality of data,

  • Data duplication – The key reason for this to happen is that systems do not have proper keys (primary or external ids)  i.e. the keys that tie applications.
  • Application Crashes – This could happen for many reasons ranging from machine or servers to amount of data transfered.
  • Half data being moved – reasons include limitation of data transfered or system governance limits to errors in mapping or data quality or validations
  • Performance lag – reasons could be in-adequate resources applied to the integration software to network bandwidth allocated to the integration.
  • No synchronization – reasons include in-correct credentials, data mapping errors or data validation issues.

As with any product, with evolving times, comes in more automation targeting performance enhancements and more sophistication, meaning lesser room to ensure data quality. Integration process without accurate, consistent data is as good as integration without the data and eventually business without customers. So it is absolutely necessary to strike the right balance between the two.

Over a period of time we have followed best practices to ensure a smooth integration as any amount of data profiling, data management, data cleansing can ensure at best what can be described as a half solution and not a complete one. Hence we believe in incorporating Customer intelligence into the product and this is being done by gathering key facts about the customer, their nature of business and their data usage and management, a step that not only gives an edge over competition but more importantly also ensures a step ahead in having a satisfied customer.