Spark Scala + IntelliJ on Windows: Step-by-Step Guide to Writing to Hive
Apache Spark is a powerful distributed computing framework, and when combined with Scala, it provides a robust environment for big data processing. This guide will walk you through setting up Spark with Scala in IntelliJ on Windows and writing data to Apache Hive.
Watch on YouTube: Spark Scala + IntelliJ on Windows
Prerequisites
Before proceeding, ensure you have the following installed:
- Java (JDK 8 or later) – Required for Spark execution
- Scala (2.12.x, matching the Spark 3.2.x build used below) – Used for Spark programming
- Apache Spark – A powerful data processing engine
- IntelliJ IDEA (Community or Ultimate Edition) – IDE for Scala development
- SBT (Scala Build Tool) – To manage dependencies
- Hadoop & Hive – Hadoop for the underlying filesystem utilities, Hive for SQL-like querying
Step 1: Install and Configure Java & Scala
- Download and install Java JDK 8 or later from Oracle’s website.
- Set `JAVA_HOME` in your environment variables.
- Download and install Scala from Scala's official website.
- Verify the installation by running:

```
java -version
scala -version
```
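If both commands print version information, the toolchain is in place. As an optional extra check of my own, you can paste a one-liner into the Scala REPL (started with the `scala` command):

```scala
// Confirms Scala and the underlying JVM are wired together correctly.
println(s"Scala ${util.Properties.versionString} on Java ${System.getProperty("java.version")}")
```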
Step 2: Install and Set Up Apache Spark
- Download Apache Spark from Spark’s official website.
- Extract the Spark folder and set environment variables:
  - `SPARK_HOME` → path to the extracted Spark folder
  - Add `%SPARK_HOME%\bin` to the system `PATH`
- Verify the Spark installation:

```
spark-shell
```
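When spark-shell starts, it creates a `spark` session for you. A quick way to confirm that jobs actually execute is a one-liner of my own, run inside the shell:

```scala
// Run inside spark-shell: executes a real local job; the sum of 0..99 is 4950.
println(spark.range(100).selectExpr("sum(id)").first().getLong(0))
```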
Step 3: Install IntelliJ IDEA and Set Up Scala Plugin
- Download and install IntelliJ IDEA from JetBrains.
- Open IntelliJ, navigate to File → Settings → Plugins, search for Scala, and install it.
Step 4: Create a Spark Scala Project with SBT
- Open IntelliJ and select New Project.
- Choose Scala and select SBT as the build tool.
- Set Project SDK to JDK 8 or later.
- Click Finish to create the project.
- Modify `build.sbt` to include the Spark dependencies (note that `spark-hive` is needed for `enableHiveSupport()` in the code later):

```scala
name := "SparkHiveExample"
version := "1.0"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.1",
  "org.apache.spark" %% "spark-sql"  % "3.2.1",
  "org.apache.spark" %% "spark-hive" % "3.2.1", // required for enableHiveSupport()
  "org.apache.hive"  %  "hive-jdbc"  % "3.1.2"
)
```
- Click Refresh on the SBT panel to download dependencies.
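Before adding Hive to the mix, it can help to confirm that the dependencies resolved and Spark runs inside the project. The sketch below is my own smoke test, not part of the final example; the object name and `local[*]` master are arbitrary choices:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical smoke test: verifies spark-core and spark-sql resolved via SBT.
object SparkSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Smoke Test")
      .master("local[*]") // run in-process; no cluster required
      .getOrCreate()

    // Sum of 0..99; printing 4950 means the job ran end to end.
    println(spark.range(100).selectExpr("sum(id)").first().getLong(0))

    spark.stop()
  }
}
```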
Step 5: Configure Hive and Spark Integration
- Install Hadoop and Hive:
- Download Hadoop and Hive binaries.
- Set `HADOOP_HOME` and `HIVE_HOME` in your environment variables.
- Configure `hive-site.xml` inside Spark's `conf/` directory:

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```
- Start the Hive metastore in its own terminal and leave it running (the trailing `&` backgrounds the process only on Unix-style shells, not in the Windows command prompt):

```
hive --service metastore &
```
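With the metastore up, you can verify connectivity from Spark before writing any data. This is a sketch of my own, assuming the thrift URI from `hive-site.xml` above; setting it programmatically also guards against the file not being picked up:

```scala
import org.apache.spark.sql.SparkSession

// Connectivity check only: lists the databases the Hive metastore knows about.
object HiveConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Hive Connectivity Check")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083") // assumed URI from hive-site.xml
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show() // should list at least `default`
    spark.stop()
  }
}
```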
Step 6: Write Data to Hive Using Spark
Create a Scala object `SparkHiveExample` in a file named `SparkHiveExample.scala` (under `src/main/scala`) and add the following code:
```scala
import org.apache.spark.sql.SparkSession

object SparkHiveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Spark Hive Example")
      .master("local[*]") // needed when launching with `sbt run` instead of spark-submit
      .config("spark.sql.catalogImplementation", "hive")
      .enableHiveSupport()
      .getOrCreate()

    import spark.implicits._

    // Create a DataFrame
    val df = Seq((1, "Alice"), (2, "Bob"), (3, "Charlie"))
      .toDF("id", "name")

    // Write the DataFrame to a Hive table
    df.write.mode("overwrite").saveAsTable("users")

    // Read it back from Hive
    val dfRead = spark.sql("SELECT * FROM users")
    dfRead.show()

    spark.stop()
  }
}
```
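One detail worth noting: `mode("overwrite")` replaces the table on every run. If you want reruns to accumulate rows instead, a small variation works (same session and implicits as above; the extra row is just for illustration):

```scala
// Append instead of overwrite, so repeated runs add rows rather than replace the table.
val moreUsers = Seq((4, "Dana")).toDF("id", "name")
moreUsers.write.mode("append").saveAsTable("users")
```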
Step 7: Run the Spark Hive Application
- Open a terminal in IntelliJ and navigate to the project directory.
- Compile and run the project:

```
sbt package
sbt run
```
- Open Hive and verify the table:

```sql
SELECT * FROM users;
```
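If you prefer to verify from Spark rather than the Hive CLI, the same check can run through `spark.sql` in the session from Step 6:

```scala
// Equivalent verification from Spark; the example wrote three rows.
spark.sql("SELECT COUNT(*) FROM users").show()
```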
Conclusion
You’ve successfully set up Spark with Scala in IntelliJ on Windows and written data to Hive. This setup enables you to perform big data processing and SQL-based querying efficiently. 🚀