days
Tags: partition functions
Description
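The days() function is a partition transform for timestamp and date columns that partitions data into days. It is meant to be passed to the partitionedBy() method of DataFrameWriterV2 when writing a table. It does not extract the day-of-month value; use dayofmonth() for that.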
Parameters
col: Column or str - the timestamp or date column (or its name) to partition on
Return Value
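Column - a partition transform expression that assigns each value to its calendar day, for use in partitionedBy(). It is not an integer day-of-month column.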
Example
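from pyspark.sql import SparkSession
from pyspark.sql.functions import days, dayofmonth
# Create a Spark session
spark = SparkSession.builder.appName("days_example").getOrCreate()
# Create a DataFrame with sample timestamps
data = [("2023-01-15 10:30:00",), ("2022-12-31 23:59:59",), ("2024-03-20 15:45:30",)]
df = spark.createDataFrame(data, ["timestamp"])
df = df.withColumn("timestamp", df.timestamp.cast("timestamp"))
# days() is a partition transform: pass it to DataFrameWriterV2.partitionedBy().
# "catalog.db.events" is a placeholder table name; the write needs a catalog
# that supports partition transforms (for example, an Iceberg catalog).
df.writeTo("catalog.db.events").partitionedBy(days("timestamp")).createOrReplace()
# days() cannot be evaluated in withColumn() or select().
# To extract the day of the month as a value, use dayofmonth() instead.
df = df.withColumn("day", dayofmonth("timestamp"))
df.show()
# Output:
# +-------------------+---+
# |          timestamp|day|
# +-------------------+---+
# |2023-01-15 10:30:00| 15|
# |2022-12-31 23:59:59| 31|
# |2024-03-20 15:45:30| 20|
# +-------------------+---+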
Notes
- The function is a partition transform and can only be used with the partitionedBy() method of DataFrameWriterV2
- It does not return the day of the month (1-31); use dayofmonth() to extract that value
- Useful for partitioning data by day
- Can be used in combination with other partition functions like years() and months()
- Returns NULL if the input is NULL
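To make day-granularity partitioning concrete, here is a minimal sketch in plain Python, without Spark. It groups timestamps by calendar day, which is a different thing from the day of the month. The helper day_partition() is hypothetical and exists only for illustration. It follows the convention of the Apache Iceberg day transform (days since 1970-01-01); the value actually stored for a partition depends on the table format, so treat this as an assumption rather than Spark's own behavior.

```python
from collections import defaultdict
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_partition(ts: datetime) -> int:
    """Hypothetical helper: days from 1970-01-01 to the calendar date of ts."""
    return (ts.date() - EPOCH).days


rows = [
    datetime(2023, 1, 15, 10, 30, 0),
    datetime(2023, 1, 15, 23, 59, 59),
    datetime(2023, 2, 15, 8, 0, 0),
]

# Group rows by their day-level partition key.
partitions = defaultdict(list)
for ts in rows:
    partitions[day_partition(ts)].append(ts)

# Both January 15 rows share one partition. February 15 gets its own,
# even though its day of the month (15) is the same.
for key, members in sorted(partitions.items()):
    print(key, [m.isoformat() for m in members])
```

This is why a day partition transform and dayofmonth() are not interchangeable: the first separates every calendar day, while the second would put the 15th of every month together.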