Responsible for ingesting large volumes of data into Kafka. Write Kafka producers to stream data from external REST APIs to Kafka topics.
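One such producer can be sketched as follows — a minimal Java example, assuming a hypothetical REST endpoint (`https://api.example.com/events`), topic name (`events-raw`), and a local broker; it needs the `kafka-clients` library on the classpath, and error handling is trimmed for brevity:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical endpoint and topic names, for illustration only.
public class RestToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for full replication

        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/events")) // hypothetical REST API
                .GET()
                .build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Poll the REST API and forward each response body to the topic.
                String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
                producer.send(new ProducerRecord<>("events-raw", body),
                        (metadata, e) -> { if (e != null) e.printStackTrace(); });
                Thread.sleep(5_000); // poll interval between API calls
            }
        }
    }
}
```

In practice the poll loop would track pagination or watermarks on the API side; the send callback is where failed deliveries get logged or retried.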
Develop multiple Spark applications for data cleansing, event enrichment, data aggregation, denormalization, and data preparation needed for data analysis.
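At the record level, the cleansing and enrichment rules such Spark jobs apply can be sketched as plain Java functions — shown outside Spark so the logic is visible without a cluster; field names like `userId` and `region` are illustrative assumptions, and in the jobs themselves the same logic runs inside Dataset transformations:

```java
import java.util.HashMap;
import java.util.Map;

// Record-level prep logic, shown outside Spark for clarity.
// Field names are hypothetical.
public class EventPrep {

    // Cleansing: trim, lower-case, and map blank strings to null so they
    // are dropped rather than counted in downstream aggregations.
    public static String clean(String raw) {
        if (raw == null) return null;
        String s = raw.trim().toLowerCase();
        return s.isEmpty() ? null : s;
    }

    // Enrichment/denormalization: copy a looked-up dimension attribute
    // (e.g. the region for a userId) onto the event record itself.
    public static Map<String, String> enrich(Map<String, String> event,
                                             Map<String, String> regionByUser) {
        Map<String, String> out = new HashMap<>(event);
        out.put("region", regionByUser.getOrDefault(event.get("userId"), "unknown"));
        return out;
    }
}
```

Denormalizing the dimension attribute onto each event trades storage for read speed: downstream aggregations can group by `region` without repeating the join.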
Troubleshoot Spark applications to make them more fault tolerant. Fine-tune Spark applications to improve the overall processing time of the pipelines.
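The kind of tuning this involves can be sketched as session configuration — the settings below are real Spark knobs, but the values are assumptions; the right numbers depend on the cluster and data volume:

```java
import org.apache.spark.sql.SparkSession;

// Illustrative tuning sketch; values are assumptions, not recommendations.
public class TunedSession {
    public static SparkSession build() {
        return SparkSession.builder()
                .appName("pipeline")
                // Fault tolerance: allow more task retries before failing the stage.
                .config("spark.task.maxFailures", "8")
                // Speculative execution re-launches straggler tasks on other executors.
                .config("spark.speculation", "true")
                // Right-size the shuffle: the default of 200 partitions is often
                // wrong for the actual data volume.
                .config("spark.sql.shuffle.partitions", "400")
                // Kryo is faster and more compact than Java serialization.
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate();
    }
}
```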
Build real-time data pipelines by developing Kafka producers and Spark Streaming applications.
Write Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase. Create Hive tables, load data, and analyze it using Hive scripts.
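A minimal sketch of such a consumer, using the `spark-streaming-kafka-0-10` direct stream and the HBase client API; the topic, table, and column-family names are illustrative assumptions, and the Spark, Kafka, and HBase client libraries are required:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Hypothetical topic ("events-raw"), table ("events"), and column family ("d").
public class KafkaToHBase {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hbase");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "hbase-writer");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("events-raw"), kafkaParams));

        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            // Open one HBase connection per partition, not per record.
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("events"))) {
                while (records.hasNext()) {
                    ConsumerRecord<String, String> r = records.next();
                    String rowKey = r.key() == null ? String.valueOf(r.offset()) : r.key();
                    Put put = new Put(Bytes.toBytes(rowKey));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"),
                                  Bytes.toBytes(r.value()));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Opening the HBase connection inside `foreachPartition` keeps the non-serializable client on the executors and amortizes its cost across all records in the partition.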
Implement partitioning, dynamic partitions, and bucketing in Hive. Develop services in Java.
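The Hive side can be illustrated with a short HiveQL sketch — the table, columns, and bucket count are hypothetical, chosen only to show partitioning, dynamic partitions, and bucketing together:

```sql
-- Allow Hive to derive partition values from the query output.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical table; column names and bucket count are illustrative.
CREATE TABLE curated_events (
  event_id STRING,
  user_id  STRING,
  payload  STRING
)
PARTITIONED BY (event_date STRING)        -- enables partition pruning on date
CLUSTERED BY (user_id) INTO 32 BUCKETS    -- bucketing for joins and sampling
STORED AS ORC;

-- Dynamic partitioning: event_date comes from the last SELECT column.
INSERT OVERWRITE TABLE curated_events PARTITION (event_date)
SELECT event_id, user_id, payload, to_date(ts) AS event_date
FROM staging_events;
```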