Wednesday, March 19, 2025

Installing Apache Airflow on a Google Cloud VM



Installing Apache Airflow on a Google Cloud VM (No Docker)

Apache Airflow is a powerful platform for programmatically authoring, scheduling, and monitoring workflows. Installing it directly on a Google Cloud VM, without Docker, keeps the setup simple and gives you full control over the environment. This guide walks you through the installation process step by step.

Watch on YouTube: Installing Apache Airflow


Prerequisites

Before starting, ensure you have the following:

  • A Google Cloud Platform (GCP) account.
  • A Compute Engine VM instance (Ubuntu recommended).
  • SSH access to the VM.
  • Python 3.8 or later installed.

Step 1: Update and Install Dependencies

Start by updating the system and installing essential packages:

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-venv
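
Before going further, it is worth confirming that the VM's Python meets the 3.8+ requirement from the prerequisites. A quick inline check:

```shell
# Sanity check: Airflow 2.7.x requires Python 3.8 or later.
python3 - <<'EOF'
import sys
assert sys.version_info >= (3, 8), f"Python {sys.version.split()[0]} is too old for Airflow 2.7"
print("Python OK:", sys.version.split()[0])
EOF
```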

Step 2: Create a Virtual Environment

Using a virtual environment helps isolate dependencies:

python3 -m venv airflow-venv
source airflow-venv/bin/activate

Step 3: Install Apache Airflow

Set the Airflow home directory and install Airflow:

export AIRFLOW_HOME=~/airflow
pip install apache-airflow==2.7.2 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.8.txt"
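
Note that the 3.8 in the constraints URL must match the Python minor version on your VM. A small sketch that derives the URL from the running interpreter instead of hard-coding it (the commented pip line shows how it would be used):

```shell
# Build the constraints URL from the running interpreter so it always
# matches your Python minor version (pinned here to Airflow 2.7.2).
AIRFLOW_VERSION=2.7.2
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "$CONSTRAINT_URL"
# pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```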

Step 4: Initialize the Airflow Database

Airflow requires a database to store metadata. By default, this command creates a SQLite database at $AIRFLOW_HOME/airflow.db, which is fine for testing; production deployments typically configure PostgreSQL or MySQL in airflow.cfg instead:

airflow db init

Step 5: Create an Admin User

Create an admin user to access the Airflow UI (the admin/admin credentials below are placeholders; use a strong password on any reachable server):

airflow users create \
    --username admin \
    --password admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com

Step 6: Start Airflow Services

Run the scheduler and webserver in separate terminals (or pass the -D flag to each command to run it as a background daemon):

airflow scheduler

Open another terminal and run:

airflow webserver --port 8080

Step 7: Access Airflow UI

Once the webserver starts, and provided TCP port 8080 is open in your VPC firewall, access the UI via:

http://<VM_EXTERNAL_IP>:8080

To find the external IP, run:

gcloud compute instances list
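
GCP blocks port 8080 by default, so the UI is only reachable once a firewall rule allows it. A sketch of such a rule, using the hypothetical name allow-airflow-8080; tighten --source-ranges to your own IP rather than 0.0.0.0/0 where possible:

```shell
# Hypothetical rule name; 0.0.0.0/0 exposes the UI to the whole internet,
# so restrict --source-ranges (or add --target-tags) for anything real.
gcloud compute firewall-rules create allow-airflow-8080 \
    --direction=INGRESS \
    --allow=tcp:8080 \
    --source-ranges=0.0.0.0/0
```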

Step 8: Set Up Airflow to Start on Boot (Optional)

To ensure Airflow starts automatically after reboot, create systemd service files for the scheduler and webserver.

Scheduler Service

sudo nano /etc/systemd/system/airflow-scheduler.service

Add the following:

[Unit]
Description=Apache Airflow Scheduler
After=network.target

[Service]
User=<your-user>
Group=<your-group>
Environment="AIRFLOW_HOME=/home/<your-user>/airflow"
ExecStart=/home/<your-user>/airflow-venv/bin/airflow scheduler
Restart=always

[Install]
WantedBy=multi-user.target

Webserver Service

sudo nano /etc/systemd/system/airflow-webserver.service

Add the following:

[Unit]
Description=Apache Airflow Webserver
After=network.target

[Service]
User=<your-user>
Group=<your-group>
Environment="AIRFLOW_HOME=/home/<your-user>/airflow"
ExecStart=/home/<your-user>/airflow-venv/bin/airflow webserver --port 8080
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd so it picks up the new unit files, then enable the services:

sudo systemctl daemon-reload
sudo systemctl enable airflow-scheduler
sudo systemctl enable airflow-webserver

Start the services:

sudo systemctl start airflow-scheduler
sudo systemctl start airflow-webserver

Conclusion

You have now installed Apache Airflow on a Google Cloud VM without using Docker, and the systemd services keep it running across reboots. This setup is a good starting point for real workloads; for production, consider moving the metadata database to PostgreSQL or MySQL and switching from the default SequentialExecutor to LocalExecutor or CeleryExecutor. You can now begin creating and scheduling workflows.
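
To try the setup end to end, you can drop a minimal DAG into the dags folder ($AIRFLOW_HOME/dags by default); the file name and dag_id below are hypothetical:

```shell
# Write a minimal "hello world" DAG into the default dags folder; the
# scheduler picks up new files there automatically.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
mkdir -p "$AIRFLOW_HOME/dags"
cat > "$AIRFLOW_HOME/dags/hello_dag.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",   # run once per day
    catchup=False,       # don't backfill runs before today
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello from airflow")
EOF
```

After a minute or so the DAG appears in the UI, where you can unpause and trigger it.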

