broadcast(df)
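broadcast - Marks a DataFrame as small enough for use in broadcast joins. Spark then sends a full copy of it to every executor and joins it locally with each partition of the other side, instead of shuffling both sides of the join.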
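df
: DataFrame. The DataFrame to mark for broadcast.

from pyspark.sql import SparkSession, types
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([1, 2, 3, 3, 4], types.IntegerType())  # single column "value"
df_small = spark.range(3)  # single column "id": 0, 1, 2
df_b = broadcast(df_small)  # hint: copy df_small to every executor
df.join(df_b, df.value == df_small.id).show()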
+-----+---+
|value| id|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+
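Conceptually, a broadcast join builds a hash table from the small side once, replicates it, and probes it with every partition of the large side. The plain-Python sketch below models that idea (a broadcast hash join) on the same data as the example. It is a simplified illustration, not Spark's implementation; the function name `broadcast_hash_join` and the two-partition split of `df` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def broadcast_hash_join(
    large_partitions: Iterable[Iterable[int]],
    small_keys: Iterable[int],
) -> List[Tuple[int, int]]:
    """Inner equi-join (value == id) in the style of a broadcast hash join."""
    # Build phase: hash the small side once. In Spark, this is the part
    # that gets copied ("broadcast") to every executor.
    table: Dict[int, List[int]] = defaultdict(list)
    for key in small_keys:
        table[key].append(key)

    joined: List[Tuple[int, int]] = []
    # Probe phase: each partition of the large side is scanned against its
    # own copy of the table, so the large side never has to be shuffled.
    for partition in large_partitions:
        for value in partition:
            for match in table.get(value, []):
                joined.append((value, match))
    return joined


# Mirror the DataFrame example; this partitioning of df is hypothetical.
df_partitions = [[1, 2, 3], [3, 4]]
df_small_ids = range(3)  # like spark.range(3): 0, 1, 2
result = broadcast_hash_join(df_partitions, df_small_ids)
print(result)  # [(1, 1), (2, 2)]
```

Because only the small side is hashed and replicated, this strategy pays off only when that side fits comfortably in each executor's memory, which is why `broadcast` is meant for small DataFrames.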
Added in version: 1.3.1
Updated in version: 3.2.2