Getting started with SAP HANA SDI in less than a day
This is part 2 of the series. Here we’ll cover creating a dataflow in SDI.
(Check out part 1 to find out how to set up SDI.)
FYI: I’m going to assume that you have the necessary rights on your HANA box to perform all the steps described. For more info on the required privileges, check out section 2.1 of the official SAP HANA EIM Administration Guide.
Twitter Setup:
I won’t go over all the details here, but you basically go through the following process:
- Create a Twitter account (twitter.com)
- Create a Developer account associated with your user (https://developer.twitter.com/)
- Create an ‘App’ under your developer account
- Generate App credentials
You’ll end up with a set of credentials for your App: an API Key, API Secret, Access Token and Access Token Secret.
Bring in the Twitter data:
Let’s start with an overview of the dataflow…
There are two objects that we have to create for our initial test case:
- A “remote source”, which defines the connection to our Twitter App
- A “replication task”, which will take care of the data retrieval and persistence in HANA.
FYI: It’s time to actually log on to your HANA box now. I’ll use the “Web based Development Workbench” interface. This interface is fully online, so you don’t need to install HANA Studio. In fact, some parts (replication task, flowgraph) are only available in the online interface.
Typical URL for the Web based Development Workbench: https://<hanaserver>:43<hanainstance>/sap/hana/ide/
We’ll use 2 parts of the workbench:
- Catalog (to create the remote source and execute SQL commands)
- Editor (to create the replication task)
1. Creating the remote source
- GOTO “Catalog”
- GOTO Provisioning > Remote Sources
- Name your remote source
- Choose adapter “TwitterAdapter” and your agent
- Choose Credentials Mode “Technical User” and then fill in the credentials from your Twitter App (API Key, API Secret, Access Token, Access Token Secret)
- After saving you can test the connection
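As a rough cross-check alongside the connection test, the system views can be queried from the SQL console to confirm that the adapter is registered on an agent and that the remote source exists. This is a minimal sketch, assuming a remote source named MY_TWITTER_SOURCE (a hypothetical name, replace it with your own) and a user that is allowed to read these views:

```sql
-- Is the Twitter adapter registered, and on which agent?
SELECT ADAPTER_NAME, AGENT_NAME
  FROM SYS.ADAPTER_LOCATIONS
 WHERE ADAPTER_NAME = 'TwitterAdapter';

-- Does the remote source exist, and does it use that adapter?
-- MY_TWITTER_SOURCE is a hypothetical name used for illustration.
SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME
  FROM SYS.REMOTE_SOURCES
 WHERE REMOTE_SOURCE_NAME = 'MY_TWITTER_SOURCE';
```

If the second query returns no row, the remote source was probably not saved, or it was saved under a different name.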
2. Replicating the data
GRANT rights
You have now created your own ‘remote source’ object, which means that you are its owner and have all privileges on it. Next, we’ll create a replication task. Although you are the one creating the task, it’s the system user (_SYS_REPO) that creates the underlying objects in the background when the task is activated. As some of those objects depend on the remote source, we need to grant this system user some rights on it.
- Open an SQL console & Grant user “_SYS_REPO” the necessary rights to perform background tasks when activating a replication task
GRANT CREATE VIRTUAL TABLE ON REMOTE SOURCE "<your-source>" TO _SYS_REPO WITH GRANT OPTION;
GRANT CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "<your-source>" TO _SYS_REPO WITH GRANT OPTION;
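To confirm that both grants took effect, you could look them up in the GRANTED_PRIVILEGES system view. This is a minimal sketch, again assuming the hypothetical remote source name MY_TWITTER_SOURCE:

```sql
-- List the privileges _SYS_REPO holds on the remote source.
-- MY_TWITTER_SOURCE is a hypothetical name; use your own remote source name.
SELECT GRANTEE, PRIVILEGE, OBJECT_NAME, IS_GRANTABLE
  FROM SYS.GRANTED_PRIVILEGES
 WHERE GRANTEE = '_SYS_REPO'
   AND OBJECT_NAME = 'MY_TWITTER_SOURCE';
```

You should see one row for CREATE VIRTUAL TABLE and one for CREATE REMOTE SUBSCRIPTION. If either is missing, activating the replication task is likely to fail with an authorization error.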
Create Replication Task
- GOTO “Editor”
- Create a package if needed
- Create a Replication Task
- Name your task
- Choose your own remote source
- Use prefix “TB_VR_” for the generated virtual table (best practice)
- Add Objects > Public_Stream
- Use prefix “TB_” for the generated target table (best practice)
- Choose “Realtime only” as the replication behavior
FYI: We’ve chosen the ‘realtime’ option above, which has some implications:
o Our source will push new data when it becomes available. This means that, depending on how much your subject is tweeted about, it might take a while before you see any data. It’s of course perfectly possible to create your own tweet about the subject to test the functionality. (I’ve chosen ‘bitcoin’ as the subject here, which typically gives results straight away.)
o We’ll have to use the CDC (Change Data Capture) tab to provide filters. It’s simply the realtime version of filtering.
- Fill in the CDC (Change Data Capture) parameter “Phrases to track”
- Save
- Right-click on the Replication Task to execute it.
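Once the task has been executed, the generated objects can also be inspected from the SQL console. This is a minimal sketch, assuming the hypothetical names MY_TWITTER_SOURCE for the remote source and MY_SCHEMA for the target schema:

```sql
-- Virtual table(s) generated on top of the remote source.
-- MY_TWITTER_SOURCE is a hypothetical name.
SELECT SCHEMA_NAME, TABLE_NAME, REMOTE_SOURCE_NAME
  FROM SYS.VIRTUAL_TABLES
 WHERE REMOTE_SOURCE_NAME = 'MY_TWITTER_SOURCE';

-- Remote subscription(s) created by the replication task, with their state.
-- MY_SCHEMA is a hypothetical schema name.
SELECT SCHEMA_NAME, SUBSCRIPTION_NAME, STATE
  FROM SYS.M_REMOTE_SUBSCRIPTIONS
 WHERE SCHEMA_NAME = 'MY_SCHEMA';
```

For a realtime subscription, a state such as APPLY_CHANGE_DATA typically indicates that changes are being received and applied. An empty result suggests the task has not been executed successfully yet.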
Check the results:
- GOTO “Catalog”
- Refresh the Tables directory under your schema
- Open Content of the newly created target table
- As mentioned above, you can tweet about your chosen subject (here ‘bitcoin’) and you’ll see your own tweet pop up here when you refresh.
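The same check can be done from the SQL console. This is a minimal sketch, assuming the target table ended up as "TB_Public_Stream" in a schema called MY_SCHEMA. Both names are assumptions, so copy the exact table name from the Catalog:

```sql
-- How many tweets have arrived so far?
-- "MY_SCHEMA" and "TB_Public_Stream" are assumed names; adjust to your system.
SELECT COUNT(*) AS TWEET_COUNT
  FROM "MY_SCHEMA"."TB_Public_Stream";

-- Peek at a few rows to see which columns the adapter delivers.
SELECT TOP 10 *
  FROM "MY_SCHEMA"."TB_Public_Stream";
```

Running the count a few times in a row is a quick way to see whether new tweets are still streaming in.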
That’s it! You now know how easy it is to start with SAP HANA SDI, and you can start exploring Twitter as a new source of data for your company/clients. Have fun and stay tuned for more…