Getting started with SAP HANA SDI in less than a day

09 November 2018

This is part 2 of the series. Herein, we’ll cover dataflow creation within SDI.

(Check out part 1 to find out how to set up SDI)


FYI: I’m going to assume that you have the necessary rights on your HANA box to perform all described steps. For more info on the needed privileges, check out section 2.1 of the official SAP HANA EIM Administration Guide.

Twitter Setup:

I won’t go over all the details here, but you basically go through the following process:

You’ll end up with something like this:

Bring in the Twitter data:

Let’s start with an overview of the dataflow…

There are 2 objects that we have to create for our initial test case:

  1. A “remote source”, which defines the connection to our Twitter App
  2. A “replication task”, which will take care of the data retrieval and persistence in HANA.


FYI: It’s time to actually log on to your HANA box now 😊. I’ll use the “Web-based Development Workbench” interface. This interface runs entirely in the browser, so you don’t need to install HANA Studio. In fact, some parts (replication task, flowgraph) are only available in the online interface.


Typical URL for the Web-based Development Workbench: https://<hanaserver>:43<hanainstance>/sap/hana/ide/
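In the URL above, “43” is followed by the two-digit instance number, so instance 00 ends up on port 4300. A minimal Python sketch of that pattern follows; the helper name `workbench_url` and the host name are purely illustrative assumptions, not something from SAP.

```python
def workbench_url(host: str, instance: int) -> str:
    """Build the workbench URL following the pattern https://<hanaserver>:43<hanainstance>/sap/hana/ide/."""
    if not 0 <= instance <= 99:
        raise ValueError("instance number must be between 0 and 99")
    # The instance number is always written with two digits (e.g. 0 -> "00").
    return f"https://{host}:43{instance:02d}/sap/hana/ide/"


if __name__ == "__main__":
    # "hana.example.com" is a placeholder host name.
    print(workbench_url("hana.example.com", 0))
```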


We’ll use 2 parts of the workbench:

  • Catalog (to create the remote source and execute SQL commands)
  • Editor (to create the replication task)

1.    Creating the remote source


  • GOTO “Catalog”
  • GOTO Provisioning → Remote Sources

  • Name your remote source
  • Choose adapter “TwitterAdapter” and your agent
  • Choose Credentials Mode “Technical User” and then fill in the credentials from your Twitter App (API Key, API Secret, Access Token, Access Token Secret)
  • After saving you can test the connection


2.    Replicating the data


GRANT rights

You have now created your own ‘remote source’ object, which means that you are its owner and have all privileges on it. The next thing we’ll do is create a replication task. Although you are the one who creates the ‘task’, it’s the system user (_SYS_REPO) that creates the underlying objects in the background. As some of those objects depend on the remote source, we’ll need to grant some rights on it to this system user.


  • Open an SQL console and grant user “_SYS_REPO” the rights it needs to perform background tasks when a replication task is activated
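The exact statement isn’t reproduced in the text above, so here is a hedged sketch of what such a grant could look like. It is a small Python helper that only builds the SQL text for you to paste into the SQL console. The privilege list (CREATE VIRTUAL TABLE, CREATE REMOTE SUBSCRIPTION, PROCESS REMOTE SUBSCRIPTION EXCEPTION) and the WITH GRANT OPTION clause are assumptions based on the remote source privileges that HANA offers; check them against the EIM Administration Guide for your version. The remote source name “RS_TWITTER” and the helper names are hypothetical.

```python
# Assumed privilege list; verify against the SAP HANA EIM documentation.
PRIVILEGES = (
    "CREATE VIRTUAL TABLE",
    "CREATE REMOTE SUBSCRIPTION",
    "PROCESS REMOTE SUBSCRIPTION EXCEPTION",
)


def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def grant_statement(remote_source: str, grantee: str = "_SYS_REPO") -> str:
    """Build a GRANT statement on a remote source for the given user."""
    privileges = ", ".join(PRIVILEGES)
    return (
        f"GRANT {privileges} ON REMOTE SOURCE "
        f"{quote_identifier(remote_source)} TO {grantee} WITH GRANT OPTION;"
    )


if __name__ == "__main__":
    # "RS_TWITTER" is a placeholder; use the name you gave your remote source.
    print(grant_statement("RS_TWITTER"))
```

Copy the printed statement into the SQL console and run it there with the user that owns the remote source.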




Create Replication Task

  • GOTO “Editor”
  • Create a package if needed
  • Create a Replication Task

  • Name your task
  • Choose your own remote source
  • Use prefix “TB_VR_” for the generated virtual table (best practice)

  • Add Objects > Public_Stream
    • Use prefix “TB_” for the generated target table (best practice)
    • Choose “Realtime only” as the replication behavior


FYI: We’ve chosen the ‘realtime’ option above, which has some implications:

  • Our source will push new data when it becomes available. This means that, depending on how much your subject is tweeted about, it might take a while before you see any data. It’s of course perfectly possible to create your own tweet about the subject to test the functionality. (I’ve chosen ‘bitcoin’ as subject here, which typically gives results straight away 😊)

  • We’ll have to use the CDC (Change Data Capture) tab to provide filters. It’s just the ‘realtime version of filtering’.


  • Fill in the CDC (Change Data Capture) parameter “Phrases to track”


  • Save
  • Right-click the replication task and choose Execute.
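To get a feel for what “Phrases to track” will let through, here is a simplified Python sketch. It assumes the adapter hands the value to Twitter’s streaming “track” filter, where commas separate alternative phrases (OR) and the words within one phrase must all appear in the tweet (AND), ignoring case and word order. That assumption, the function name `matches_track` and the rough word splitting are mine; Twitter’s real tokenization is more involved.

```python
import re


def matches_track(text: str, track: str) -> bool:
    """Return True if the tweet text matches at least one tracked phrase."""
    # Rough tokenization: lowercase word characters only, so "#Bitcoin" -> "bitcoin".
    tokens = set(re.findall(r"\w+", text.lower()))
    for phrase in track.split(","):
        terms = phrase.lower().split()
        # A phrase matches when every one of its terms occurs in the tweet.
        if terms and all(term in tokens for term in terms):
            return True
    return False


if __name__ == "__main__":
    print(matches_track("#Bitcoin hits a new high", "bitcoin"))
    print(matches_track("Only SAP here", "sap hana"))
```

With this reading, a filter such as “bitcoin,sap hana” would capture tweets that mention bitcoin as well as tweets that contain both “sap” and “hana”.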

Check the results:

  • GOTO “Catalog”
  • Refresh the Tables directory under your schema
  • Open Content of the newly created target table

  • As mentioned above, you can tweet about your chosen subject (here ‘bitcoin’), and you will see your own tweet pop up here when you refresh.

That’s it! You now know how easy it is to start with SAP HANA SDI, and you can start exploring Twitter as a new source of data for your company/clients. Have fun and stay tuned for more…
