From the course: Stream Processing Design Patterns with Kafka Streams

Streaming opportunities and challenges

- [Instructor] Stream processing provides a number of new opportunities for real-time insights, but it also poses challenges. Let's review them in this video. What are some of the key opportunities provided by stream processing? First, it provides the ability to process big data in real time. Using parallel processing capabilities, significant volumes of data can be processed and delivered. Streaming also makes it possible to do data marshaling in real time. This involves analyzing incoming data and deciding where to direct it, based on specific use cases and scenarios. It provides the ability to do real-time analytics. Data can be analyzed to generate insights in real time, which in turn can drive real-time actions. Data can be checked against set thresholds in real time and alerts can be generated. This can provide critical functionality for real-time resiliency. Leaderboards can be maintained in real time to show top trending elements. This has significant uses in gaming and operational dashboards. Finally, predictions can be made on incoming data using machine learning models, and these predictions can be delivered in real time to destinations to drive actions.

But what are some of the challenges of real-time stream processing? The first challenge is the unbounded memory requirement needed to handle unbounded data. It is not easy to predict and control the memory requirements of incoming data. Horizontal scaling of the streaming pipelines, as incoming data grows and fluctuates, is also a challenge. The pipeline should be scaled up and down as the incoming data volumes change. A lot of analytics need to look back beyond the current record. Periodic summaries, like five-second summaries, need to look back at earlier records, window them based on timestamps, and then aggregate them. State management is another key challenge, especially in distributed processing. How do we maintain state by entity across a distributed processing network, and store and access it in real time? Finally, while stream processing allows for ad hoc analytics, optimizing ad hoc queries in real time is also a challenge.

Fortunately, today's stream processing frameworks, such as Apache Spark, Apache Flink, and Kafka Streams, solve these problems for us. They provide out-of-the-box capabilities that help manage these challenges and deliver stream processing.
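
To make the windowing challenge more concrete, here is a minimal sketch in plain Java of the five-second summaries mentioned above: records are assigned to fixed, non-overlapping windows based on their timestamps, and each window is then aggregated. It is an illustration only and not how any particular framework implements windowing. The `Event` record, the `sumByWindow` method, and the sample values are all hypothetical, and the sketch works on a small in-memory list, whereas a real pipeline would process an unbounded stream incrementally.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {

    // Hypothetical event type: a timestamp in milliseconds and a numeric value.
    record Event(long timestampMs, double value) {}

    // Assigns each event to a fixed, non-overlapping window based on its
    // timestamp, then sums the values per window. Keys are window start times.
    static Map<Long, Double> sumByWindow(List<Event> events, long windowSizeMs) {
        Map<Long, Double> sums = new TreeMap<>();
        for (Event e : events) {
            // Round the timestamp down to the start of its window.
            long windowStart = Math.floorDiv(e.timestampMs(), windowSizeMs) * windowSizeMs;
            // Add the value to the running sum for that window.
            sums.merge(windowStart, e.value(), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample data, purely for illustration.
        List<Event> events = List.of(
            new Event(1_000, 10.0),
            new Event(4_999, 5.0),
            new Event(5_000, 2.0),
            new Event(12_500, 7.0));

        // Five-second windows, as in the example from the video.
        sumByWindow(events, 5_000).forEach((start, sum) ->
            System.out.println("window starting at " + start + " ms: sum = " + sum));
    }
}
```

Note that the map of running sums is exactly the kind of state that grows with the data and must be stored, expired, and shared across workers in a distributed setting, which is why the memory and state management challenges above go hand in hand with windowing.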
