BODS – ETL Framework [Part II]
BODS – ETL Framework [Part II]
Note: this is the second part of the blog. Missed the first part? Read it here.
For the extraction of data, a where clause needs to be defined. With the current logic this will be a combination of global variables ($G_Load_Mode, $G_Extraction_From_DT, $G_Extraction_To_DT, $G_Delta_Days).
- $G_Load_Mode: INIT/DELTA (All or subset of data)
- $G_Extraction_From_DT: calculated or defined
- $G_Extraction_To_DT: calculated or defined
- $G_Delta_Days: number of days to subtract from the extraction from date
START_PYTHON_SCRIPT
- Provide the workflow with the right parameters
- A substitution parameter on repository level was set for the location of the Python scripts
ETL_JOB_SUB_STEP_READY_WF
- Update logging table ETL_BODS_JOB_EXECUTION_LOG for this job sub step with status READY
Error Handling
Try/Catch
Try/Catch is used to catch the errors thrown during the execution of extracting, transforming and loading the data.
When the error is caught, the flows present in the “Catch” block will be executed.
ETL_JOB_SUB_STEP_ERROR_WF
- Send mail to the e-mail addresses defined in global variable $G_Mailing_List to notify these people job execution failed
- Call python script PowerBILogging.py to log a job failure in a streaming dataset in PowerBI
- Update the ETL_BODS_JOB_EXECUTION_LOG table and set the status for the job to ERROR
- Stop job execution
ETL_JOB_ERROR_RESTART_WF
- Restart the same job after failure
- Fill in parameter $P_Retry_Nr to set the number of retries
- BODS native web services will be called to restart the job with the right parameters
Post Job Execution
Send a mail to the addresses defined in the global variable $G_Mailing_List to let them know that job execution was successful.
Also call the python script PowerBILogging.py to log a successful job execution in a streaming dataset in PowerBI.
Logging table
Logging table is maintained in the BI_GENERAL scheme on SAP HANA. Logging table is named ETL_BODS_JOB_EXECUTION_LOG.
This table contains all execution steps of all jobs and will be updated during job execution.
Determine Extraction
ETL stands for Extraction, Transformation and Load. In this section, we will discuss various extraction methods using this framework.
Extraction is for instance loading source data from various systems into for example the staging layer of a DWH.
Basic overview:
For the extraction of data, a where clause needs to be defined. With the current logic this will be a combination of global variables ($G_Load_Mode, $G_Extraction_From_DT, $G_Extraction_To_DT, $G_Delta_Days).
- $G_Load_Mode: INIT/DELTA (All or subset of data)
- $G_Extraction_From_DT: calculated or defined
- $G_Extraction_To_DT: calculated or defined
- $G_Delta_Days: number of days to subtract from the extraction from date
INIT
Global Variable Definition:
- $G_Load_Mode = INIT
When the global variable has this value, no extraction from/to date will be calculated or determined because all data or data from a certain date should be extracted.
DELTA
Moving Window
Global Variables Definition:
- $G_Load_Mode = DELTA
- $G_Extraction_From_DT = null/blank
- $G_Extraction_To_DT = null/blank
Extraction from and to date are calculated automatically in ETL_CALC_JOB_EXTRACTION_DATE_WF based on the last successful job execution.
Extraction from and to date do not overlap each other. The extraction from date is derived from the last successful extraction to date. Extraction date will be set to sysdate().
Moving Window example flow in BODS:
Requirement:
- All data should only be extracted initially
- Delta cannot be extracted in overlap
- Should run daily
> Used Method: Moving Window
Global Variables Definition:
- $G_Load_Mode: DELTA
- $G_Extraction_From_DT: null
- $G_Extraction_To_DT: null
- $G_Delta_Days: null
Extraction dates calculation in ETL_CALC_JOB_EXTRACTION_DATE_WF:
- $G_Extraction_From_DT = extraction to date from last successful job execution
- $G_Extraction_To_DT will be set to sysdate()
Where clause:
Overlap
Global Variables Definition:
- $G_Load_Mode: DELTA
- $G_Extraction_From_DT: defined (e.g.: sysdate())
- $G_Extraction_To_DT: null or defined (e.g.: sysdate())
- $G_Delta_Days: defined (e.g.: 30)
Extraction from date will be calculated.
Extraction from date = $G_Extraction_From_DT – $G_Delta_Days
Extraction to date will be set to sysdate() when left empty.
Overlap example flow in BODS:
Requirement:
- All data should only be extracted from 2017 initially
- Delta can be extracted in overlap
- Should be run once a month
> Used Method: Overlap
Global Variables Definition:
- $G_Load_Mode: DELTA
- $G_Extraction_From_DT: sysdate()
- $G_Extraction_To_DT: null
- $G_Delta_Days: 60
Extraction dates calculation in ETL_CALC_JOB_EXTRACTION_DATE_WF:
- $G_Extraction_From_DT = $G_Extraction_From_DT – $G_Delta_Days
- ‣$G_Extraction_From_DT = sysdate() – 60
- $G_Extraction_To_DT will be set to sysdate()
Where clause:
Blog by Robby De Backer