Sync types
Define the extract and load strategy
A Sync type determine how we perform data capture and loading processes. Each strategy has its pros and cons, as well as specific use cases where it brings the most value. On the Erathos platform, we support the following update types:
Full refresh
The Full Refresh strategy involves a complete read and load of the source table, overwriting all values in the destination table.
Partials
In contrast, Partial updates capture only a subset of the table, defined by records with values greater than a specific cursor (such as creation date, update date, or even an ID).
Understanding cursors
Before moving on, it’s essential to understand what a cursor is and how it works.
During the first execution, all the records are loaded into the destination. In subsequent pipeline executions, the cursor helps identify only the records that have been added or updated.
To perform a partial update, a column must be selected to help identify which records should be transferred to the destination. From the columns available, we have the following options:
Id: Suitable if it is an integer and auto-incremental.
Created_at: Suitable, but not recommended as updates in existing records would be missed.
Updated_at: Suitable as it reflects the latest changes.
Name / Gender: Not meaningful for filtering updates.
For this example, we use Updated_at. From the destination table, we select the highest Updated_at value and use it as a filter, selecting only records where Updated_at is greater than the last loaded value.
Prerequisites
In general, all jobs have the Full Refresh option available. However, for Partial Refresh to be enabled, the following prerequisites must be met:
A primary key must exist.
A column of type
date
ordatetime
must be available to serve as cursor.
For APIs, Erathos maps and defines which endpoints meet the prerequisites automatically. For databases, we perform an introspection step. Learn more here.
Types of partials
The three variations of partial updates are as follows:
Partial Overwrite
In this strategy, new records are inserted as usual, while updated records overwrite outdated records in the destination database.
This is the suggested sync type to be used across Erathos plataform.
Partial Append
All records are inserted in this approach, resulting in updated records being duplicated (with both original and new values present in the database).
Partial Versioned
This strategy is similar to Partial Append, but each record is marked with an indicator identifying whether it’s the most recent version to be used.
Advanced configuration
When performing partial updates, it may occasionally be necessary to execute Full Refresh updates to resolve discrepancies between the source and destination datasets.
To address this, you can use the Enable Full On parameter, which schedules routine and automatic full executions.
Using sync types
To choose the sync type for a job, navigate to the "Pipeline Jobs" page in your desired connection.
APIs
For APIs, we automatically map the best configurations and apply the optimal cursor for each job, ensuring efficient and accurate data capture.
Databases
In contrast, when the data source is a database, it is necessary to manually define the cursor for each job that requires partial execution capabilities.
If partial options are unavailable for a specific job, ensure that all pre requisites are met.
Last updated