Saturday, March 15, 2025

Exploring Delta Table Time Travel in Databricks



Databricks provides powerful capabilities for working with Delta tables, enabling users to manage data efficiently. One of the most exciting features is Time Travel, which allows you to access previous versions of a table, track changes, and recover lost data. In this blog, we’ll walk through a hands-on demonstration of Delta Table Time Travel in Databricks.


Watch more on YouTube

Delta Table Time Travel


Understanding Delta Tables

Delta tables enable structured data management in Apache Spark with ACID transactions, schema enforcement, and indexing. Using Spark SQL, you can perform transformations such as filtering, grouping, and ordering before saving results as a Delta table. Once saved, these tables are accessible across applications and can be queried using Databricks SQL.

Setting Up a Delta Table in Databricks

To begin, we use a Databricks notebook to create and interact with a Delta table:

  1. Open your Python notebook under the Workspace section.

  2. Attach the notebook to a running cluster.

  3. Create a DataFrame and store it in a Delta table using the write command.

ordered_df.write.format("delta").mode("overwrite").saveAsTable("Clarity_Effect_Skills")

This command saves the data in a Delta format, making it accessible via Spark SQL.
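The `mode` setting controls what happens when the table already exists: `overwrite` replaces its contents, while `append` adds new rows. As a conceptual sketch only (an in-memory toy, not how Delta Lake actually persists data), the two modes behave like this:

```python
# Toy model of Delta save modes using an in-memory dict of tables.
# Illustrative only -- real Delta writes Parquet files plus a transaction log.

tables = {}

def save_as_table(name, rows, mode="overwrite"):
    """Mimic df.write.mode(mode).saveAsTable(name) semantics."""
    if mode == "overwrite":
        tables[name] = list(rows)                   # replace existing contents
    elif mode == "append":
        tables.setdefault(name, []).extend(rows)    # add to existing contents
    else:
        raise ValueError(f"unsupported mode: {mode}")

save_as_table("Clarity_Effect_Skills", [{"clarity": "C1"}, {"clarity": "C2"}])
save_as_table("Clarity_Effect_Skills", [{"clarity": "C3"}], mode="append")
print(len(tables["Clarity_Effect_Skills"]))  # 3
```

Because the demo above uses `overwrite`, re-running the notebook replaces the table rather than duplicating its rows.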

Querying the Delta Table

Once the table is created, you can query it using Spark SQL:

SELECT * FROM Clarity_Effect_Skills;

Alternatively, you can use Python:

df = spark.sql("SELECT * FROM Clarity_Effect_Skills")
df.show()

Databricks also allows querying through a SQL notebook for a seamless experience.

Tracking Changes with Describe History

Delta tables maintain an audit trail of changes. You can retrieve this history using the following SQL command:

DESCRIBE HISTORY Clarity_Effect_Skills;

This command displays every operation performed on the table (WRITE, UPDATE, DELETE, and so on), along with its version number, timestamp, and user details. It’s a useful feature for auditing and debugging.
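Conceptually, the history is an append-only log of commits, and DESCRIBE HISTORY returns it newest first. A rough pure-Python model of that behavior (the commit values here are hypothetical, chosen purely for illustration):

```python
from datetime import datetime

# Hypothetical commit log for Clarity_Effect_Skills (illustrative values).
history = [
    {"version": 0, "timestamp": datetime(2025, 3, 1, 10, 0), "operation": "WRITE"},
    {"version": 1, "timestamp": datetime(2025, 3, 2, 14, 30), "operation": "UPDATE"},
]

def describe_history(log):
    """Return commits newest-first, as DESCRIBE HISTORY does."""
    return sorted(log, key=lambda c: c["version"], reverse=True)

for commit in describe_history(history):
    print(commit["version"], commit["operation"])
# 1 UPDATE
# 0 WRITE
```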

Performing Updates and Inserts

Let’s modify the table by updating a record:

UPDATE Clarity_Effect_Skills SET clarity = 'pH Scales' WHERE clarity = 'C2';

After performing this update, running DESCRIBE HISTORY again will show a new version of the table.




Delta Table Storage and Versioning

The table’s data is stored as Parquet files under the Hive warehouse directory in DBFS, and an accompanying _delta_log transaction log records which files make up each version. You can list the files using the following command:

%fs ls dbfs:/user/hive/warehouse/clarity_effect_skills

Each update generates a new version, enabling Time Travel functionality.
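The versioning idea can be pictured as a log of immutable snapshots: every write commits a new table state, while older states remain untouched. Here is a simplified sketch (a pure-Python toy that stores whole copies; real Delta stores Parquet data files plus the transaction log, not full snapshots):

```python
class VersionedTable:
    """Toy model of Delta versioning: each commit keeps a full snapshot.
    Real Delta tracks versions via the _delta_log, not whole copies."""

    def __init__(self, rows):
        self.versions = [list(rows)]        # version 0

    def update(self, predicate, change):
        """Commit a new version with `change` applied to matching rows."""
        new_rows = [change(dict(r)) if predicate(r) else dict(r)
                    for r in self.versions[-1]]
        self.versions.append(new_rows)

    def snapshot(self, version):
        return self.versions[version]

t = VersionedTable([{"clarity": "C1"}, {"clarity": "C2"}])
t.update(lambda r: r["clarity"] == "C2",
         lambda r: {**r, "clarity": "pH Scales"})

print(t.snapshot(0))  # original rows, unchanged
print(t.snapshot(1))  # rows after the update
```

Note how the earlier UPDATE statement maps onto this model: the update creates version 1, and version 0 still holds the pre-update rows.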

Time Travel: Querying Previous Versions

With Time Travel, you can view previous versions of a table by specifying a timestamp or version number.

Query by Timestamp

To query a past version based on a timestamp:

SELECT * FROM Clarity_Effect_Skills TIMESTAMP AS OF '2024-03-01T12:00:00Z';

Query by Version Number

To retrieve data from a specific version:

SELECT * FROM Clarity_Effect_Skills VERSION AS OF 1;

By accessing older versions, users can recover lost records, analyze historical trends, and validate data changes.
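TIMESTAMP AS OF resolves to the latest version committed at or before the given time, while VERSION AS OF selects a version directly. A small sketch of that resolution logic, using an assumed list of (version, commit time) pairs for illustration:

```python
from datetime import datetime

# Hypothetical commit times (version, commit timestamp).
commits = [
    (0, datetime(2025, 3, 1, 10, 0)),
    (1, datetime(2025, 3, 2, 14, 30)),
]

def version_as_of_timestamp(commits, ts):
    """Latest version committed at or before ts (TIMESTAMP AS OF logic)."""
    eligible = [v for v, t in commits if t <= ts]
    if not eligible:
        raise ValueError("timestamp precedes the earliest table version")
    return max(eligible)

print(version_as_of_timestamp(commits, datetime(2025, 3, 1, 12, 0)))  # 0
print(version_as_of_timestamp(commits, datetime(2025, 3, 3, 0, 0)))   # 1
```

A timestamp earlier than the first commit raises an error, which mirrors the query failure you would see when asking for a time before the table existed.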

Use Cases of Time Travel

  1. Accidental Data Deletion Recovery – Restore data lost due to unintended modifications.

  2. Historical Data Analysis – Compare different table versions to analyze trends.

  3. Auditing and Compliance – Track who made changes and when they occurred.
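For the recovery case, a common pattern is to write an older version back as the current state (Delta also provides a RESTORE TABLE ... TO VERSION AS OF ... command for this). Conceptually, a restore is just a new commit whose contents equal an old snapshot, as this toy sketch shows:

```python
def restore(versions, target_version):
    """Toy restore: commit a new version equal to an older snapshot.
    (Real Delta's RESTORE works through the transaction log, not copies.)"""
    versions.append(list(versions[target_version]))
    return len(versions) - 1  # the new current version number

# Hypothetical snapshots: version 0 original, version 1 after a bad update.
versions = [["row_v0"], ["row_v0_updated"]]
new_v = restore(versions, 0)
print(new_v, versions[new_v])  # the current state again matches version 0
```

Importantly, a restore does not delete history: the bad version remains queryable, and the restore itself shows up as another entry in the table’s history.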

Conclusion

Delta Table Time Travel is a powerful feature that enhances data reliability, making it easy to track changes and recover previous data states. With Databricks, users can leverage SQL and Spark APIs to interact with Delta tables efficiently. Whether you are debugging, auditing, or restoring lost records, Time Travel provides a seamless solution for managing historical data.




