Automatic Lineage Reporting

Do you get these types of questions a lot?

  • We’re changing something in ECC/APO/…, what’s the impact in BI?
  • How many reports do we have right now?
  • Are we already reporting on source/dataset X?
  • Where does the data in this set of reports come from?

We also come across such questions very regularly at our clients. Especially when you’re in a BI team lead role, or when big projects/migrations are on the horizon, answering such questions can start to take up a decent amount of your time.

So we decided to create an automatic lineage reporting, to be able to answer those questions easily. In this blog I’ll share the basics of how we set this up with you.

How?

At my current client we have a BW-on-HANA as DWH, and both Analysis for Excel (via BIP) and Power BI (via HANA views) as reporting tools.

In this first phase, we wanted to get lineage information from the entire DWH part. So, both BW and HANA flows, starting from the BW data sources, right up to the BW query & HANA reporting views. (We plan to add information from the reporting tools in a later phase.) We decided to build the lineage reporting with HANA views as data model and Power BI as reporting tool. We explored several BW/HANA tables and found that already a lot of information was available and it’s just about joining them together in an efficient and correct way.

Used HANA/BW tables

The following tables were used in our solution.

Table name

Description

Filter

RSDS

Table containing all the datasources

OBJVERS = ‘A’

RSDST

Table containing the descriptions of the datasources

LANGU = ‘E’ AND OBJVERS = ‘A’

RSDODSO

Table containing all the (O)DSO’s

OBJVERS = ‘A’

RSDODSOT

Table containing the descriptions of the (O)DSO’s

LANGU = ‘E’ AND OBJVERS = ‘A’

RSDCUBE

Table containing all the Cubes

OBJVERS = ‘A’

RSDCUBET

Table containing the descriptions of the Cubes

LANGU = ‘E’ AND OBJVERS = ‘A’

RSDADSO

Table containing all the ADSO’s

OBJVERS = ‘A’

RSDADSOT

Table containing the descriptions of the ADSO’s

LANGU = ‘E’ AND OBJVERS = ‘A’ AND TTYPE = ‘EUSR’

RSOHCPR

Table containing all the Composite Providers

OBJVERS = ‘A’

RSOHCPRT

Table containing the descriptions of the Composite Providers

LANGU = ‘E’ AND OBJVERS = ‘A’ AND COLNAME = ‘’

RSDIOBJ

Table containing all the InfoObjects

OBJVERS = ‘A’

RSDIOBJT

Table containing the descriptions of the InfoObjects

LANGU = ‘E’ AND OBJVERS = ‘A’

RSRREPDIR

Table containing all the Queries

OBJVERS = ‘A’ AND COMPTYPE = ‘REP’

BIMC_REPORTABLE VIEWS

Table containing all HANA views

/

BIMC_ALL_DIMENSIONS

Table containing all fields per HANA view

/

OBJECT_DEPENDENCIES

Table containing the HANA view’s dependencies

BASE_SCHEMA_NAME = ‘_SYS_BIC’

RVITRFN

Table containing all the transformations

/

RSDICMULTIIOB

Table containing all the MultiProviders

OBJVERS = ‘A’

RSOOBJXREF

Table containing all component references (e.g. DSO’s in composite)

TLOGO = ‘HCPR’ AND OBJVERS = ‘A’ AND ASC_TYPE = ‘005’

Hana views

We chose a modular/layered solution, so we can still use the different components for other use cases in the future. The following schema gives an indication of how everything was build up with different layers of calculation views.

TECH_BW_INFOPROV

  • The 1st step is to join all relevant info per info provider type
  • The 2nd step is to union everything together

TECH_BW_FLOWS

The flows view is the most complex one, as we trace back from the top level BW objects down to the bottom objects, with the help of the transformations.

We join the info provider view with the transformation table’s target field (TGT). This enables us to get the objects from the below level (SRC). We do this as many times as we have levels in our DWH flows. (+some extra levels to take into account possible longer flows in the future)

Note: An alternative to foreseeing extra dummy levels, would be writing recursive code in an sql script HANA view.

We also join in the multi provider and composite provider data.

We also filter here to make sure we don’t create a loop. (imagine the object being loaded to itself!)

In the end we have for each info provider a line with the entire flow downwards.

Then, as a last part, we add two new calculated fields, to derive the lowest level object without needing to add all the fields.

TECH_HANA_VIEWS

The HANA view is similar to the flows view, in that it has to trace back from top-level objects towards bottom level objects. However, our HANA setup has only 2 levels.

First, we define in which “subcatalog” the view is located. This subcatalog actually defines the layer of our layered HANA model. It can be either “Generated”, “Modeling” or “Reporting”.

We also join in information about the fields in the view.

Then, we join with the dependencies table to get the lower-level objects. We do this join on a calculated column which contains a concatenation of catalog & view name.

In the next step we’ll create two calculated columns to deduct the actual info provider name from the base object name

TECH_BW_DATASRC

The data source view is rather easy. We join two tables together, format the timestamps, and clean the semantics.

TECH_BW_QUERIES

The query view is also quite easy. We take data from the source table, format the timestamps, and clean the semantics.

TECH_FRONTEND

The frontend view is a full outer join of the query and the hana view.

TECH_BACKEND

The backend view is a left join of the flows and the data source view.

98_AUTO_LINEAGE

This view brings al our efforts together, and it is the one that we’ll report upon. It is a left join of the frontend and the backend view.

Power BI report

On top of the auto lineage view, we can now easily create a Power BI report that allows us to answer those typical lineage questions:

  • Starting from a BW query or HANA view, we can check which info providers and data sources are involved
  • Starting from a data source, we can check which info providers, HANA views and queries are impacted

Gunther Van Eyck

BI consultant