Data Engineer (ETL & IBM DataStage Focus)

Overview:

Our client is looking for a mid-level Data Engineer who will focus primarily on ETL development within the Data Analytics Center (DAC). The primary day-to-day job functions (70-80%) will be to support the build and maintenance of enterprise data warehouses, data marts, and internal/external data feeds. These will be built using IBM DataStage v11.3. This portion of the role requires interacting with Product Owners and assisting in the definition, development, and deployment of these products. The candidate should draw on past experience to help the team make sound development decisions while bringing new thoughts and ideas to the table.
Minor aspects of the job (20-30%) will be supporting the implementation of the Hortonworks Data Platform (Hadoop) and promoting the concept of Data Virtualization. This will require learning many components of the Hadoop platform, such as Kafka, Sqoop, HBase, Phoenix, Hive, Storm, and Spark, and others to a lesser degree.

Responsibilities:

Data Access & Self-Service

Liberating the data from IT is a major initiative for the Data Analytics Center. We will be focusing much of our effort on continuing to build enterprise-grade dashboards, as well as on current and next-generation self-service tools. This will include maximizing SAP Universe utilization and expanding our development of and support for user-centric Fact-Dimension models.

Delivering high-speed availability of information utilizing the in-memory capabilities of SAP HANA, providing quicker analysis, easier decision making, and smarter end-users.

Data Operating System & Data Virtualization
With Hortonworks Data Platform (Hadoop) as the underpinning of the data operating system, the Data Analytics Center will be responsible for setting up a next-generation data warehouse:
Capable of streaming in terabytes upon terabytes of high-velocity medical device information, patient vital signs, waveform data (such as Neurological ICU brain monitors), and much more.
Capable of storing vast amounts of high-volume textual data from clinician notes, which will be used for current and future text processing, such as Natural Language Processing (NLP) and Guided Navigation for more defined cohort searches.
Creating a layer of data abstraction for end users, allowing the data to persist in various locations (Hadoop, RDBMS, HANA) for maximum effectiveness from storage to use.

Required: