
hours

Tags: partition functions

Description

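The hours() function is a partition transform function that partitions timestamp data into hours. It does not extract the hour of the day as a value; it is intended to be passed to the partitionedBy() method of DataFrameWriterV2 when creating a table. To extract the hour component (0-23) from a timestamp column, use hour() instead.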

Parameters

  • col: Column or str - the timestamp column (or its name) to partition on

Return Value

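Column - a partition transform expression for use with DataFrameWriterV2.partitionedBy(). It is not an integer hour-of-day column; use hour() for that.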

Example

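from pyspark.sql import SparkSession
from pyspark.sql.functions import hour, hours

# Create a Spark session
spark = SparkSession.builder.appName("hours_example").getOrCreate()

# Create a DataFrame with sample timestamps
data = [("2023-01-15 10:30:00",), ("2022-12-31 23:59:59",), ("2024-03-20 15:45:30",)]
df = spark.createDataFrame(data, ["timestamp"])
df = df.withColumn("timestamp", df.timestamp.cast("timestamp"))

# hours() is a partition transform: pass it to DataFrameWriterV2.partitionedBy().
# "catalog.db.events" is a placeholder table name. This call requires a configured
# catalog that supports partition transforms (for example, Apache Iceberg).
df.writeTo("catalog.db.events").partitionedBy(hours("timestamp")).create()

# To extract the hour of the day as an integer, use hour() instead
df.withColumn("hour", hour("timestamp")).show()

# Output of show():
# +-------------------+----+
# |          timestamp|hour|
# +-------------------+----+
# |2023-01-15 10:30:00|  10|
# |2022-12-31 23:59:59|  23|
# |2024-03-20 15:45:30|  15|
# +-------------------+----+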

Notes

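  • The function defines an hourly partition transform; it does not return the hour of the day
  • To get the hour of the day as an integer (0-23), use hour(), which returns NULL if the input is NULL
  • Valid only as an argument to DataFrameWriterV2.partitionedBy(); using it as a regular column expression, for example in withColumn(), raises an error
  • Related partition transform functions include years(), months(), days(), and bucket()
  • The target catalog and table format must support partition transforms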
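
To make the difference between an hour-of-day value and an hourly partition concrete, here is a minimal pure-Python sketch. It assumes the Apache Iceberg convention, where the hour transform is the number of whole hours since 1970-01-01 00:00 UTC; other data sources may represent the partition value differently. The helper names hour_of_day and hourly_partition_value are hypothetical and are not part of PySpark.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_of_day(ts: datetime) -> int:
    """Mirror of what hour() yields: the hour component, 0-23."""
    return ts.hour


def hourly_partition_value(ts: datetime) -> int:
    """Hypothetical helper: whole hours since the Unix epoch (Iceberg-style assumption)."""
    # Floor division of two timedeltas returns an int and floors toward negative infinity
    return (ts - EPOCH) // timedelta(hours=1)


a = datetime(2023, 1, 15, 10, 30, tzinfo=timezone.utc)
b = datetime(2023, 1, 16, 10, 5, tzinfo=timezone.utc)

# Same hour of the day on two different dates
print(hour_of_day(a), hour_of_day(b))  # prints: 10 10

# Different hourly partitions, exactly 24 hours apart
print(hourly_partition_value(b) - hourly_partition_value(a))  # prints: 24
```

Under this assumption, rows from 10:00 to 10:59 on different days share an hour() value but land in different hourly partitions, which is why hours() suits time-based table layout rather than hour-of-day analysis.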