Wednesday, March 19, 2025


ScalaTest Hands-On: Spark Transformations, Errors, Matchers, Sharing Fixtures - Maven & IntelliJ

Watch on YouTube: ScalaTest Hands-On

Introduction

In this blog, we will dive into how to efficiently test Spark transformations using ScalaTest in an IntelliJ and Maven setup. ScalaTest is a widely used testing framework in the Scala ecosystem, providing powerful features to handle various testing scenarios, including assertions, matchers, and error handling.

We'll cover:

How to set up your environment using Maven and IntelliJ.
Implementing Spark transformations and writing tests for them.
Handling errors with proper test cases.
Using matchers to validate expected outcomes.
Sharing fixtures across tests for better efficiency.

Let's get started!

1. Setting Up Your Environment with Maven and IntelliJ

Before diving into testing Spark transformations, we need to ensure that your project is set up correctly in Maven and IntelliJ.

Maven Setup

In your pom.xml file, add the following dependencies:

xml
<dependencies>
  <!-- ScalaTest dependency -->
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_2.12</artifactId>
    <version>3.2.10</version>
    <scope>test</scope>
  </dependency>

  <!-- Spark dependency -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.1.2</version>
  </dependency>

  <!-- Spark SQL dependency -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.1.2</version>
  </dependency>
</dependencies>

IntelliJ Setup

Open IntelliJ IDEA.
Create a new Maven project or import an existing project with the correct pom.xml.
Make sure to set the Scala SDK version in IntelliJ according to your project setup.
After importing, you can sync the Maven dependencies by clicking "Reimport All Maven Projects."

2. Writing Spark Transformations

Spark transformations are operations that are applied to RDDs or DataFrames. Let's start by creating a simple Spark transformation.

scala
import org.apache.spark.sql.{DataFrame, SparkSession, functions => F}

object SparkTransformations {

  def createDataFrame(spark: SparkSession): DataFrame = {
    val data = Seq(
      ("Alice", 29),
      ("Bob", 31),
      ("Charlie", 35)
    )
    spark.createDataFrame(data).toDF("name", "age")
  }

  def filterAdults(df: DataFrame): DataFrame = {
    df.filter(F.col("age") >= 30)
  }
}

Here, we create a simple DataFrame and apply a transformation that filters out rows where the age column is less than 30.
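
If you want a quick sanity check before writing any tests, a minimal runner like the one below (the object name FilterAdultsDemo is just illustrative, not part of the original code) creates a local SparkSession, applies the transformation, and prints the result:

scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo runner, just to see the transformation in action locally
object FilterAdultsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FilterAdults Demo")
      .master("local[*]")
      .getOrCreate()

    val df = SparkTransformations.createDataFrame(spark)
    SparkTransformations.filterAdults(df).show()
    // Expected output: the rows for Bob (31) and Charlie (35)

    spark.stop()
  }
}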

3. Writing ScalaTest for Spark Transformations

Now, let’s write some tests for the filterAdults transformation using ScalaTest.

scala
import org.scalatest.funsuite.AnyFunSuite
import org.apache.spark.sql.SparkSession

class SparkTransformationTest extends AnyFunSuite {

  // Initialize SparkSession
  val spark: SparkSession = SparkSession.builder()
    .appName("ScalaTest Spark Example")
    .master("local[*]")
    .getOrCreate()

  test("filterAdults should return only people aged 30 or older") {
    val df = SparkTransformations.createDataFrame(spark)
    val filteredDf = SparkTransformations.filterAdults(df)

    // Collect the results to assert values
    val result = filteredDf.collect()
    assert(result.length == 2)
    assert(result(0).getString(0) == "Bob")
    assert(result(1).getString(0) == "Charlie")
  }
}

In this test, we validate that the filterAdults transformation filters out the correct rows. We use assert to check the length of the resulting DataFrame and ensure the values of the rows are as expected.
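
One caveat: the assertions above assume that collect() returns rows in their original order. That usually holds for a small local DataFrame like this one, but Spark does not guarantee row order in general, so a slightly more defensive variant (a sketch reusing the same suite and SparkSession) compares the set of names instead:

scala
test("filterAdults should keep exactly Bob and Charlie, regardless of row order") {
  val df = SparkTransformations.createDataFrame(spark)
  val names = SparkTransformations.filterAdults(df)
    .collect()
    .map(_.getString(0))
    .toSet

  // Comparing sets avoids depending on the order in which Spark returns rows
  assert(names == Set("Bob", "Charlie"))
}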

4. Handling Errors in ScalaTest

Error handling is a crucial part of testing. Let’s write tests for scenarios where an error might occur, like null or invalid data.

scala
test("filterAdults should throw an error
for null DataFrame") { intercept[NullPointerException] { SparkTransformations.filterAdults(null) } }

In this test, we check that the filterAdults method throws a NullPointerException when given a null input.
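
ScalaTest also provides assertThrows, which works like intercept but simply fails the test if the expected exception is not thrown, whereas intercept returns the caught exception so you can inspect it further. A sketch of the same test in the assertThrows style:

scala
test("filterAdults should throw an error for null DataFrame (assertThrows style)") {
  // Fails the test with a clear message if no NullPointerException is thrown
  assertThrows[NullPointerException] {
    SparkTransformations.filterAdults(null)
  }
}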

5. Using Matchers in ScalaTest

Matchers provide a more expressive way of writing assertions. Instead of using assert(), we can use matchers to make our assertions more readable.

scala
import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.matchers.should.Matchers
import org.apache.spark.sql.SparkSession

class SparkTransformationTest extends AnyFunSuite with Matchers {

  // SparkSession shared by the tests, as before
  val spark: SparkSession = SparkSession.builder()
    .appName("ScalaTest Spark Example")
    .master("local[*]")
    .getOrCreate()

  // Test with matchers
  test("filterAdults should return only people aged 30 or older using matchers") {
    val df = SparkTransformations.createDataFrame(spark)
    val filteredDf = SparkTransformations.filterAdults(df)
    val result = filteredDf.collect()

    result should have length 2
    result(0).getString(0) shouldBe "Bob"
    result(1).getString(0) shouldBe "Charlie"
  }
}

Here, we use should have length and shouldBe to check that the collected result contains two rows and that the first and second names are "Bob" and "Charlie", respectively.
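
Matchers also work nicely on collected results treated as plain Scala collections. As a small sketch (reusing the same suite), contain allOf checks membership without depending on row order:

scala
test("filterAdults results can be checked with collection matchers") {
  val df = SparkTransformations.createDataFrame(spark)
  val names = SparkTransformations.filterAdults(df).collect().map(_.getString(0))

  // Order-independent membership checks on the collected names
  names should have length 2
  names should contain allOf ("Bob", "Charlie")
  names should not contain "Alice"
}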

6. Sharing Fixtures Across Tests

In larger test suites, setting up and tearing down a SparkSession in each test can become inefficient. ScalaTest lets you share fixtures across tests by mixing in the BeforeAndAfterAll trait and overriding its beforeAll and afterAll methods.

scala
import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.matchers.should.Matchers
import org.scalatest.BeforeAndAfterAll
import org.apache.spark.sql.SparkSession

class SparkTransformationTest extends AnyFunSuite with Matchers with BeforeAndAfterAll {

  // Define a shared SparkSession
  val spark: SparkSession = SparkSession.builder()
    .appName("ScalaTest Spark Example")
    .master("local[*]")
    .getOrCreate()

  // Runs once before all tests in the suite
  override def beforeAll(): Unit = {
    super.beforeAll()
    println("Setting up Spark session")
  }

  // Runs once after all tests in the suite
  override def afterAll(): Unit = {
    println("Stopping Spark session")
    spark.stop()
    super.afterAll()
  }

  test("filterAdults should return only people aged 30 or older") {
    val df = SparkTransformations.createDataFrame(spark)
    val filteredDf = SparkTransformations.filterAdults(df)
    filteredDf.collect() should have length 2
  }
}

In this case, the shared Spark session is created once for the whole suite; beforeAll runs before the first test, and afterAll stops the session after the last test completes.
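
If several suites need the same setup, one common pattern (a sketch; the trait name SharedSparkSession is just illustrative) is to pull the session and the BeforeAndAfterAll override into a reusable trait that any suite can mix in:

scala
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, Suite}

// Illustrative reusable fixture trait; mix it into any suite that needs Spark
trait SharedSparkSession extends BeforeAndAfterAll { this: Suite =>

  lazy val spark: SparkSession = SparkSession.builder()
    .appName("Shared Spark session")
    .master("local[*]")
    .getOrCreate()

  override def afterAll(): Unit = {
    try spark.stop()
    finally super.afterAll()
  }
}

// Usage: class SparkTransformationTest extends AnyFunSuite with Matchers with SharedSparkSession { ... }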

Conclusion

With this setup, we’ve shown how to:

Set up a Maven project with ScalaTest and Spark dependencies.
Write unit tests for Spark transformations.
Handle errors and use matchers for assertions.
Share fixtures efficiently across tests to improve performance.

By integrating Spark and ScalaTest with Maven and IntelliJ, you can write clean, efficient, and maintainable tests for your Spark transformations. This approach will ensure that your Spark code is both robust and well-tested as you scale your data processing pipeline.
