broadcast

broadcast(df)

broadcast - Marks a DataFrame as small enough for use in broadcast joins.

Parameters
df: DataFrame. The DataFrame to mark as small enough to broadcast.
Returns
DataFrame marked for use in a broadcast join.

Example

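from pyspark.sql import types
from pyspark.sql.functions import broadcast

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame([1, 2, 3, 3, 4], types.IntegerType())
df_small = spark.range(3)  # single column `id` with values 0, 1, 2
df_b = broadcast(df_small)  # mark the small side for a broadcast join
df.join(df_b, df.value == df_small.id).show()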

+-----+---+
|value| id|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+


Added in version 1.3.1
Updated in version 3.2.2