Data driven integrations
Azure | Power Platform | Integration
This post takes a look at data driven integration: typical use cases and tools used for moving data in and out of the Common Data Service.
Data Driven integration
Data driven integrations are triggered by a different kind of action: a change in the data held in either of the connected applications. This can be a newly created record or a change to an existing record; any data change makes the record a candidate for data synchronization. The data driven integration scenario is focused on exchanging information among applications to ensure consistency across multiple applications.
Typical use cases
As mentioned, data driven integration maintains data consistency across multiple applications. Most commonly, data driven integration is triggered independently of any application specific events, running instead on a scheduled basis - batch operations. Ideally the synchronization process transfers only the modified records from one application to another.
Initial data load (migration)
These days we are rarely in a situation where no initial data load is required when deploying a new application into an ecosystem. The initial data migration involves extracting data from the existing systems, transforming the data and loading it into the new application.
Usually some cleansing of the existing data is required, as everyone likes the idea of having no garbage data in the system. The data cleansing task is almost always as important as it is tedious to perform. The process is resource demanding and, to be successful, requires cooperation between the existing application users and the IT technical roles. The effort required is also usually underestimated.
An important consideration is always which data to migrate. The decision always depends on the business functions the application should perform. For instance, if we are creating a sales automation application with lead management, opportunity management and quoting, we need the data which will enable the user to perform those actions. Data which the users will only use for reference - non actionable data - can be displayed in the UI using UI based integration, either by redirecting the user to a different application or by showing the data within our Model Driven application using one of the approaches described in UI based integrations.
The way we approach data migration consists of multiple steps. As always, some of them are specific to the situation; the common ones are as follows.
Migration Concept
- Evaluate and identify the data you need to migrate
- Extract the data into a staging database (for small datasets an Excel file will suffice)
- Cleanse the data
- Transform the data
- Load the data into the target application
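The steps above can be sketched as a small pipeline. This is only an illustration - the sample rows, field mappings such as `accountname`, and the stubbed `load` function are all hypothetical; a real migration would extract from a staging database and load through the CDS API or a migration tool:

```python
# Illustrative migration pipeline: extract -> cleanse -> transform -> load.

def extract():
    # Extract the evaluated source data into a staging structure.
    return [
        {"name": " Contoso Ltd ", "phone": "555-0100", "legacy_id": "A1"},
        {"name": "", "phone": None, "legacy_id": "A2"},  # garbage row
    ]

def cleanse(rows):
    # Drop rows that fail basic quality rules, trim whitespace.
    cleaned = []
    for row in rows:
        if not row["name"] or not row["name"].strip():
            continue  # no name -> not worth migrating
        cleaned.append({**row, "name": row["name"].strip()})
    return cleaned

def transform(rows):
    # Map source fields to the target application's (hypothetical) schema.
    return [{"accountname": r["name"],
             "telephone1": r["phone"],
             "migration_sourceid": r["legacy_id"]} for r in rows]

def load(records):
    # Push the records into the target application (stubbed here).
    print(f"Loading {len(records)} record(s)")
    return records

loaded = load(transform(cleanse(extract())))
```

Keeping the original key (`migration_sourceid` above) on the migrated records makes later reconciliation and data synchronization much easier.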
Migration Strategy
Determine how you will perform the migration: in one big bang swoop, or gradually by migrating one part of the data and then migrating the remaining data in one or more steps.
Data Migration Activities
- Analysis
- Evaluate and identify the data you need to migrate
- Prepare the execution plan
- Define tools and approach to use
- Design
- Build a data dictionary
- Prepare data mapping
- Development
- Cleanse the data or develop cleansing procedure as part of the data migration package
- Develop the migration packages/application
- Develop migration tests if possible (it almost always is)
- Test
- Execute test runs - use automated testing if possible to shorten the turnaround time
- Measure the time required to migrate the data
- Involve users in the latter test runs so they can validate the migration result
- Execution
- Migrate the data to the target application
Data Migration Deliverables
As part of the data migration you will deliver:
- Data migration plan and strategy
- Data dictionary
- Data maps
- Migration data packages/applications and tests
- Deployment documentation
- Data migration performance
- Data migration final report and statistics
Data Migration Sequence
- Extract - data is extracted from source applications and cleansed
- Stage - data is loaded into a staging database
- Transform - data is transformed into the target format
- Migrate - data is migrated into the target application
Depending on the data volume, you might want to consider splitting the above steps per entity so as not to overwhelm the end users validating the migration test runs. Always keep in mind that the end users are usually business users who already have a job, and their job description does not include validating the data migration. Delivering all the steps in one big bang usually leads to trouble and shifting timelines.
Although it might be tempting, avoid using user centric business logic in the target application to transform the data; perform all data transformations before you update the data in the target application. For CDS this means do not use plugins, workflows or Power Automate - these components should not be triggered by the data migration process.
Data synchronization
So you have done the initial migration, and it was a challenge. Now that you are warmed up, you need to keep the data in sync.
A commonly used shortcut is to skip the data migration as a separate piece of work and simply build the data synchronization, using it for the initial data load as well. This might seem like an awesome idea to shorten the delivery times and make project managers happy; it can, however, prove to be just the wrong approach. The data volume of the initial migration load is typically a few factors higher than your typical data sync volume.
The approach to solving data synchronization contains steps similar to those performed when building the initial data migration. The additional steps for building data synchronization are:
Definition of data ownership
It is important to define which application is the master for each entity or even attribute - that is, which application can change the data directly. Try to avoid multiple applications owning a specific entity or attribute, as conflict resolution in these cases becomes very complicated. Usually you can get around the conflict resolution scenarios by implementing change requests as part of an event driven or process driven integration.
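One lightweight way to make the ownership decision explicit is a simple ownership map that the synchronization code consults before writing. This is just an illustrative sketch with invented entity and application names:

```python
# Hypothetical ownership map: for each entity (or attribute), exactly one
# application is the master and may change the data directly.
OWNERSHIP = {
    "account": "CRM",
    "invoice": "ERP",
    "account.creditlimit": "ERP",  # attribute-level override
}

def master_of(entity, attribute=None):
    """Return the owning application, preferring an attribute-level rule."""
    if attribute:
        key = f"{entity}.{attribute}"
        if key in OWNERSHIP:
            return OWNERSHIP[key]
    return OWNERSHIP[entity]

def can_write(app, entity, attribute=None):
    # The sync process refuses writes from any application but the master.
    return master_of(entity, attribute) == app
```

Keeping the map in one place means the ownership rules are documented and enforced by the same artifact.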
Identify how to transfer only modified records
Identification of modified records is very important, as transferring the whole dataset is usually not an option - you do not want to synchronize all of the records every time the synchronization runs.
Identifying data changes depends on the source system capabilities. For instance, if the source is SQL Server or the Common Data Service, you can use change tracking to identify which rows have changed since the last time the synchronization ran. Typically, records in all systems have an attribute with the record's last modified date/time. For systems with no useful attribute you can determine the changed records in a staging database.
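The last-modified-date approach can be sketched as a watermark filter: keep the timestamp of the previous run and transfer only records modified after it. The field and variable names below are illustrative:

```python
# Watermark-based change detection, assuming each source record carries a
# last-modified timestamp ("modifiedon" is a stand-in attribute name).
from datetime import datetime, timezone

def changed_since(rows, watermark):
    """Return only rows modified after the previous sync run."""
    return [r for r in rows if r["modifiedon"] > watermark]

rows = [
    {"id": 1, "modifiedon": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "modifiedon": datetime(2020, 3, 1, tzinfo=timezone.utc)},
]
last_run = datetime(2020, 2, 1, tzinfo=timezone.utc)
delta = changed_since(rows, last_run)  # only record 2 is transferred

# After a successful run, persist the new watermark for the next cycle.
new_watermark = max(r["modifiedon"] for r in delta)
```

Note that the watermark should only be advanced after the run completes successfully, otherwise a failed run silently loses changes.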
As an additional filter, when only some of the record attributes are transferred, you can use hash algorithms to determine whether the attributes you care about have changed.
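A sketch of this hash-based filter, assuming a hypothetical subset of synchronized attributes: hash only the attributes you transfer, store the hash alongside the mapping keys, and skip records whose hash is unchanged:

```python
# Hash only the synchronized attributes; changes to other attributes
# (e.g. internal notes) then do not trigger a transfer.
import hashlib

SYNCED_ATTRIBUTES = ["name", "phone"]  # hypothetical attribute subset

def sync_hash(record):
    # Join the watched attribute values in a fixed order before hashing.
    payload = "|".join(str(record.get(a, "")) for a in SYNCED_ATTRIBUTES)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

old = {"name": "Contoso", "phone": "555-0100", "notes": "internal"}
new = {"name": "Contoso", "phone": "555-0100", "notes": "updated notes"}

# "notes" is not synchronized, so this record is not a sync candidate:
needs_sync = sync_hash(old) != sync_hash(new)
```

A fixed attribute order matters: hashing the values in a varying order would produce spurious differences.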
How often you need to run the synchronization
Since most data driven integrations are scheduled batches, the synchronization interval is an important parameter of the synchronization process. The interval can be derived from the business requirements. Sometimes the interval is 0 - all data must be fresh everywhere, all the time. When this is a requirement and the data really has to be fresh in all applications all the time - the users cannot be persuaded otherwise - then data driven integration (data synchronization) is not the correct approach to use; event driven integration (near real time) or a UI based integration approach would be a better one.
When defining the interval, take into account any events which might change a large portion of the data (a billing run, for example) and scale or adapt the synchronization accordingly so it does not fail during peak times.
Staging environment
The data synchronization staging environment is also an important consideration. You could connect most applications directly by using an integration tool, but in most cases you need an environment (database) in which you can perform data mapping and data transformation before the data is loaded into the target environment. The staging database is also a place where you can store mapping keys and other important values. Using a data staging environment will decrease the load on the source and target applications and allow them to function normally.
For the Common Data Service you can use the Data Export Service to export the CDS data into an Azure SQL Database and perform your data mapping/transformation tasks in that database. If you choose the Data Export Service, be aware of the delay in data transfer from CDS to the Azure SQL database (up to 20 minutes).
Business logic execution
Remember, data sync is used to keep the data synchronized, not to drive the business process; for driving the business process, better approaches can be used - event based integration or process integration. Although it might be tempting, avoid using user centric business logic in the target application to transform the data; perform all data transformations before you update the data in the target application. For CDS this means do not use plugins, workflows or Power Automate - these components should not be triggered by the data synchronization process. This can be achieved by scoping workflows, or by having plugins ignore any changes coming from the data synchronization process.
Do not let the data synchronization drive the business process, as it leads to trouble; for these scenarios use event driven or process driven integration.
Ad hoc data loads
Ad hoc data loads are not purely integration, but they are a common scenario in business application use. The Power Platform and the Common Data Service have very good support for importing data from Excel/CSV files, you can also use Power Automate to transform data from Excel/CSV/XML/JSON files to CDS records.
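As an illustration of the kind of transformation such a flow performs, here is a minimal sketch mapping CSV columns to target record fields (the column and field names are invented):

```python
# Minimal ad hoc CSV-to-record transformation, the kind of mapping a flow
# or the import wizard performs before creating records in the target.
import csv
import io

csv_text = "Company,Phone\nContoso,555-0100\nFabrikam,555-0199\n"

records = [
    {"name": row["Company"], "telephone1": row["Phone"]}
    for row in csv.DictReader(io.StringIO(csv_text))
]
```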
Tools
In this section I will focus more on the tools I have commonly used when migrating or synchronizing data with the Common Data Service.
Common Data Service Data Import Wizard
The Data Import Wizard is useful when migrating small batches of records or executing ad hoc data loads. You can define the data mapping in the wizard and utilize duplicate detection rules to avoid duplicates - I know what I have written about not using business logic, but it is a small data set.
The wizard will also allow you to update existing records, but you will first have to export them from the application into an Excel file and import the updated file back.
More information about the Data Import Wizard is available here: https://docs.microsoft.com/en-us/powerapps/developer/common-data-service/import-data
Scribe Insight or Scribe Online
An application and a service from TIBCO, used in more complex data migrations and data synchronizations. Both Scribe Insight (on-premises) and Scribe Online have connectors for the Common Data Service and for other databases and applications. It supports queuing/scheduling and can be used in hybrid scenarios.
More information about Scribe Insight and Scribe online is available here:
- Scribe Insight: https://community.tibco.com/products/tibco-scribe-insight
- Scribe Online: https://docs.tibco.com/products/tibco-scribe
SSIS
SQL Server Integration Services is similar to Scribe Insight, used for migrating and integrating large amounts of data. SSIS is a time-tested data integration platform. In order to integrate with the Common Data Service you will need to use an SSIS adapter from one of the numerous vendors.
More information
- SSIS: https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-ver15
- SSIS adapters: https://www.kingswaysoft.com/products/ssis-integration-toolkit-for-microsoft-dynamics-365
Azure Data Factory
Microsoft's Azure cloud based ETL service, which also allows you to run SSIS packages and use the KingswaySoft SSIS adapter for CDS. So if you have some SSIS packages already developed, you can reuse them.
More information: https://docs.microsoft.com/en-us/azure/data-factory/
Power Apps Data flows
This feature can be used to load/synchronize data from multiple sources to/from the Common Data Service. Data flows are authored through the Power Apps maker portal, and data transformations can be done using Power Query. By using the on-premises data gateway, your on-premises data can be synchronized online.
More information: https://docs.microsoft.com/en-us/powerapps/maker/common-data-service/self-service-data-prep-with-dataflows
Data Export Service
The Data Export Service is a service used to continuously export data from the Common Data Service to an Azure SQL Database. The service will export all data from the selected entities.
More information: https://docs.microsoft.com/en-us/power-platform/admin/replicate-data-microsoft-azure-sql-database
Export to data lake
The Export to data lake feature continuously exports data from the Common Data Service to an Azure Data Lake.
More information: https://docs.microsoft.com/en-us/powerapps/maker/common-data-service/export-to-data-lake
To be continued
Event based integration to follow