Spark Structured Streaming is the stream processing module in Apache Spark. It offers a high-level, declarative streaming Dataset API built on top of Spark SQL that allows for continuous, incremental execution of structured queries. As of Spark 2.2.0, Structured Streaming is marked stable and announced as ready for production use.
In this condensed one-day hands-on workshop, you will take a deep dive into Spark Structured Streaming and develop end-to-end continuous streaming applications. In particular, you will:
Develop and execute your own streaming applications
Explore available streaming sources and sinks
Use Apache Kafka as a data source and sink
Understand output modes
Learn how to monitor streaming queries
Use web UI
Use dropDuplicates operator for streaming deduplication (with state)
Explain streaming query plans
Apply groupBy and groupByKey operators for streaming aggregations
Use the window function for aggregations
Use event-time streaming watermarks to handle late events
Use flatMapGroupsWithState operator for arbitrary stateful streaming aggregation (with explicit state logic)
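As a taste of what the workshop covers, the following sketch combines several of the topics above: a Kafka source, an event-time watermark, a windowed aggregation, and a console sink. It is a minimal illustration, not workshop material; the topic name and broker address are assumptions, and it requires Spark 2.2+ with the spark-sql-kafka connector on the classpath and a Kafka broker at localhost:9092.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingSketch extends App {
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("workshop-sketch")
    .getOrCreate()
  import spark.implicits._

  // Read a stream of records from a Kafka topic (topic name is assumed)
  val events = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(
      $"timestamp",                         // Kafka record timestamp used as event time
      $"value".cast("string").as("word"))

  // Windowed count with a watermark so that late events are eventually dropped
  val counts = events
    .withWatermark("timestamp", "10 minutes")
    .groupBy(window($"timestamp", "5 minutes"), $"word")
    .count()

  // Write the running aggregates to the console in Update output mode
  counts.writeStream
    .outputMode("update")
    .format("console")
    .start()
    .awaitTermination()
}
```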
The programming language of the workshop is Scala (Python or Java are also acceptable, though they pose more of a challenge for the trainer).
The version of Apache Spark is 2.2.0 (or later when released).
Prerequisites / Recommended Background
Participants should have:
Experience with the basic concepts of the Scala language (or Java or Python)
Familiarity with Spark SQL concepts like DataFrame and Dataset
Familiarity with the command line, and spark-shell in particular
Developers attending this workshop will learn everything needed to go from doing no monitoring at all to having a solid understanding of how to set up metrics collection and tracing in any JVM application using Kamon. Throughout the workshop we will take a Play! + Akka application, study how it works under the hood, identify the key elements that should be monitored, and ensure that all our precious metrics data is properly delivered to a few of the supported metric backends.
What will you learn?
INTRODUCTION TO MONITORING WITH KAMON
Understanding what Kamon is and what it brings to the table.
Comparing Kamon with other monitoring libraries.
Simple and plain metrics collection.
Using the tracer API.
Instrumentation and reporting modules’ basics.
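To give a flavor of plain metrics collection and the tracer API, they look roughly like the following sketch. The metric and operation names are illustrative, and the exact calls depend on the Kamon version (this assumes the 1.x line of the API).

```scala
import kamon.Kamon

object MetricsSketch extends App {
  // Plain metric collection: a counter and a histogram (names are made up)
  val ordersCreated = Kamon.counter("app.orders.created")
  val orderSize     = Kamon.histogram("app.orders.size")

  ordersCreated.increment()
  orderSize.record(42)

  // Tracer API: wrap a unit of work in a span
  val span = Kamon.buildSpan("process-order").start()
  try {
    // ... the actual work being traced goes here ...
  } finally {
    span.finish()
  }
}
```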
PEEKING INSIDE THE MONSTER
Anatomy of Play applications and Actor Systems.
Getting HTTP, actors, routers and dispatcher metrics.
Analyzing JVM metrics and hiccups.
Customizing the instrumentation.
Setting up distributed tracing.
REPORTERS AND PRETTY CHARTS
Setting up open source and commercial reporting solutions (StatsD + Grafana, Datadog, and Kamino).
Advice and best practices for alerting and dashboard design.
Guided walk through the most common crash scenarios and how to detect them.
TAKE IT HOME
Open space for questions specific to your application monitoring needs and how to effectively monitor them with Kamon.
Who should come?
If you have any piece of software running in production (or soon to be), this workshop is for you! Attendees should have experience with JVM languages (Java or Scala preferred); experience with Play Framework and Akka is not required, but will definitely make things easier for you.