Wednesday, March 19, 2025

Spark Scala + IntelliJ on Windows: Step-by-Step Guide to Writing to Hive

Apache Spark is a powerful distributed computing framework, and when combined with Scala, it provides a robust environment for big data processing. This guide will walk you through setting up Spark with Scala in IntelliJ on Windows and writing data to Apache Hive.


Watch on YouTube

Spark Scala + IntelliJ on Windows


Prerequisites

Before proceeding, ensure you have the following installed:

  • Java (JDK 8 or later) – Required for Spark execution
  • Scala (2.12.x, matching the Spark build) – Used for Spark programming
  • Apache Spark – A powerful data processing engine
  • IntelliJ IDEA (Community or Ultimate Edition) – IDE for Scala development
  • SBT (Scala Build Tool) – To manage dependencies
  • Hadoop & Hive – Hadoop binaries (Spark on Windows also needs winutils.exe) and Hive for SQL-like querying

Step 1: Install and Configure Java & Scala

  1. Download and install Java JDK 8 or later from Oracle’s website.
  2. Set JAVA_HOME in your environment variables.
  3. Download and install Scala from Scala’s official website.
  4. Verify installation by running:
    java -version
    scala -version
    
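The same checks can also be done from inside Scala itself. A small sketch (plain Scala, no Spark required) that prints the JVM and Scala versions the runtime actually sees:

```scala
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // JVM version as reported by the running process
    println(s"Java:  ${sys.props("java.version")}")
    // Scala library version on the classpath
    println(s"Scala: ${scala.util.Properties.versionNumberString}")
  }
}
```

This is handy for confirming that IntelliJ and the command line are picking up the same JDK.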

Step 2: Install and Set Up Apache Spark

  1. Download Apache Spark from Spark’s official website.
  2. Extract the Spark folder and set environment variables:
    • SPARK_HOME → Path to extracted Spark folder
    • Add %SPARK_HOME%\bin to the system PATH
  3. Verify Spark installation:
    spark-shell
    

Step 3: Install IntelliJ IDEA and Set Up Scala Plugin

  1. Download and install IntelliJ IDEA from JetBrains.
  2. Open IntelliJ, navigate to File → Settings → Plugins, search for Scala, and install it.

Step 4: Create a Spark Scala Project with SBT

  1. Open IntelliJ and select New Project.
  2. Choose Scala and select SBT as the build tool.
  3. Set Project SDK to JDK 8 or later.
  4. Click Finish to create the project.
  5. Modify build.sbt to include Spark dependencies:
    name := "SparkHiveExample"
    version := "1.0"
    scalaVersion := "2.12.15"
    
    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % "3.2.1",
        "org.apache.spark" %% "spark-sql" % "3.2.1",
        // spark-hive is required for enableHiveSupport()
        "org.apache.spark" %% "spark-hive" % "3.2.1"
    )
    
  6. Click Refresh on the SBT panel to download dependencies.

Step 5: Configure Hive and Spark Integration

  1. Install Hadoop and Hive:
    • Download Hadoop and Hive binaries (on Windows, also place winutils.exe in %HADOOP_HOME%\bin).
    • Set HADOOP_HOME and HIVE_HOME in environment variables.
  2. Configure hive-site.xml inside Spark's conf/ directory:
    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://localhost:9083</value>
        </property>
    </configuration>
    
  3. Start the Hive metastore:
    hive --service metastore &
    
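If you'd rather not edit Spark's conf/ directory, the same metastore setting can be passed programmatically when the session is built. A configuration sketch, assuming the metastore from hive-site.xml above is running on localhost:9083:

```scala
import org.apache.spark.sql.SparkSession

// Equivalent to the hive-site.xml entry above, but set in code
val spark = SparkSession.builder()
  .appName("Metastore check")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate()

// Lists databases through the metastore; fails fast if it is unreachable
spark.sql("SHOW DATABASES").show()
```

Settings given in code take precedence over hive-site.xml, which is convenient when switching between a local and a remote metastore.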

Step 6: Write Data to Hive Using Spark

Create a Scala file SparkHiveExample.scala under src/main/scala and add the following code:

import org.apache.spark.sql.SparkSession

object SparkHiveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Spark Hive Example")
      .master("local[*]") // run locally; remove when submitting to a cluster
      .config("spark.sql.catalogImplementation", "hive")
      .enableHiveSupport()
      .getOrCreate()

    import spark.implicits._

    // Create a DataFrame
    val df = Seq((1, "Alice"), (2, "Bob"), (3, "Charlie"))
      .toDF("id", "name")
    
    // Write DataFrame to Hive table
    df.write.mode("overwrite").saveAsTable("users")
    
    // Read from Hive
    val dfRead = spark.sql("SELECT * FROM users")
    dfRead.show()

    spark.stop()
  }
}
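If you prefer typed rows over tuples, the same data can be modeled with a case class; toDF() then derives the column names from the fields. A plain-Scala sketch of the shape of the data, independent of Spark:

```scala
// Case class mirroring the Hive table's schema: users(id INT, name STRING)
case class User(id: Int, name: String)

object UserData {
  // Same rows as the tuple-based example above
  val users: Seq[User] = Seq(User(1, "Alice"), User(2, "Bob"), User(3, "Charlie"))

  def main(args: Array[String]): Unit = {
    // With a SparkSession's implicits in scope, users.toDF() yields columns "id" and "name"
    users.foreach(u => println(s"${u.id} -> ${u.name}"))
  }
}
```

Case-class rows also give you compile-time checking of field names and types, which tuple columns do not.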

Step 7: Run the Spark Hive Application

  1. Open a terminal in IntelliJ and navigate to the project directory.
  2. Compile and run the project:
    sbt package
    sbt run
    
  3. Open Hive and verify the table:
    SELECT * FROM users;
    

Conclusion

You’ve successfully set up Spark with Scala in IntelliJ on Windows and written data to Hive. This setup enables you to perform big data processing and SQL-based querying efficiently. 🚀



