pyspark.sql.functions.last

pyspark.sql.functions.last(col: ColumnOrName, ignorenulls: bool = False) → pyspark.sql.column.Column

Aggregate function: returns the last value in a group.

By default, the function returns the last value it sees, even if that value is null. When ignorenulls is set to True, it returns the last non-null value it sees. If all values are null, then null is returned.
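The rule above can be modeled in plain Python. This is only an illustrative sketch, not Spark's implementation: `last_value` is a hypothetical helper, and `None` stands in for SQL null.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def last_value(values: Iterable[Optional[T]], ignorenulls: bool = False) -> Optional[T]:
    """Hypothetical pure-Python model of last(); None plays the role of null."""
    result = None
    for value in values:
        # With ignorenulls=True, a null never overwrites an earlier non-null value.
        if ignorenulls and value is None:
            continue
        result = value
    return result


print(last_value([2, None]))                       # None
print(last_value([2, None], ignorenulls=True))     # 2
print(last_value([None, None], ignorenulls=True))  # None
```

The three calls cover the three cases described above: the default behavior, skipping nulls, and a group that contains only nulls.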

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

column to fetch last value for.

ignorenulls : bool

if True, null values are skipped and the last non-null value is returned. Defaults to False.

Returns
Column

last value of the group.

Notes

The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
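To see why row order matters, the sketch below applies a plain-Python model of the aggregate to every ordering of the same group. It does not use Spark; `last_value` is a hypothetical stand-in for the aggregate, and `None` stands in for null.

```python
from itertools import permutations


def last_value(values, ignorenulls=False):
    """Hypothetical pure-Python model of last(); None plays the role of null."""
    result = None
    for value in values:
        if ignorenulls and value is None:
            continue  # skip nulls so an earlier non-null value is kept
        result = value
    return result


group = [2, None, 7]

# Every ordering of the same rows, as could happen after a shuffle.
default_outcomes = {last_value(order) for order in permutations(group)}
skip_null_outcomes = {last_value(order, ignorenulls=True) for order in permutations(group)}

print(default_outcomes == {2, 7, None})  # True: any row can end up last
print(skip_null_outcomes == {2, 7})      # True: still order-dependent, but never null
```

In this model, setting ignorenulls only removes null from the possible results. The answer still depends on which row arrives last, which is why the examples below sort the DataFrame before aggregating.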

Examples

>>> from pyspark.sql.functions import last
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
>>> df = df.orderBy(df.age.desc())
>>> df.groupby("name").agg(last("age")).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice|     NULL|
|  Bob|        5|
+-----+---------+

To ignore nulls, set ignorenulls to True:

>>> df.groupby("name").agg(last("age", ignorenulls=True)).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice|        2|
|  Bob|        5|
+-----+---------+