nushell

6th chunk of `book/dataframes.md`
the [`polars group-by`](/commands/docs/polars_group-by.md). This command lets you perform aggregation operations
based on a grouping criterion. In Nushell, a `GroupBy` is a type of object that
can be stored and reused for multiple aggregations. This is handy, since
creating the grouped pairs is the most expensive step of a group-by, and
there is no need to repeat it if you plan to run multiple operations with
the same group condition.

To create a `GroupBy` object you only need to use the [`polars group-by`](/commands/docs/polars_group-by.md) command:

```nu
let group = $df_1 | polars group-by first
$group
# => ╭─────────────┬──────────────────────────────────────────────╮
# => │ LazyGroupBy │ apply aggregation to complete execution plan │
# => ╰─────────────┴──────────────────────────────────────────────╯
```

When printing the `GroupBy` object we can see that, in the background, it is a
lazy operation waiting to be completed by adding an aggregation. Using the
`GroupBy` we can create aggregations on a column:

```nu
$group | polars agg (polars col int_1 | polars sum)
# => ╭────────────────┬───────────────────────────────────────────────────────────────────────────────────────╮
# => │ plan           │ AGGREGATE                                                                             │
# => │                │     [col("int_1").sum()] BY [col("first")] FROM                                       │
# => │                │   DF ["int_1", "int_2", "float_1", "float_2"]; PROJECT */8 COLUMNS; SELECTION: "None" │
# => │ optimized_plan │ AGGREGATE                                                                             │
# => │                │     [col("int_1").sum()] BY [col("first")] FROM                                       │
# => │                │   DF ["int_1", "int_2", "float_1", "float_2"]; PROJECT 2/8 COLUMNS; SELECTION: "None" │
# => ╰────────────────┴───────────────────────────────────────────────────────────────────────────────────────╯
```
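The output above is only the execution plan, because the aggregation is still lazy. As a minimal sketch of how to get the aggregated values, the plan can be piped into `polars collect`. The table `df_small` and the variable `group_small` are hypothetical stand-ins for `$df_1` and `$group`, used here so the snippet runs on its own.

```nu
# Hypothetical stand-in for $df_1: a tiny table with a grouping column
# and one integer column.
let df_small = [[first int_1]; [a 1] [b 2] [a 3]] | polars into-df

# Store the lazy group-by once so it can be reused.
let group_small = $df_small | polars group-by first

# Adding an aggregation and collecting runs the plan and returns the values:
# group "a" sums to 4 and group "b" sums to 2.
$group_small
| polars agg (polars col int_1 | polars sum)
| polars sort-by first
| polars collect
```

Sorting before collecting keeps the row order predictable, since the order of groups is otherwise not guaranteed.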

or we can define multiple aggregations on the same or different columns:

```nu
$group
| polars agg [
    (polars col int_1 | polars n-unique)
    (polars col int_2 | polars min)
    (polars col float_1 | polars sum)
    (polars col float_2 | polars count)
] | polars sort-by first
# => ╭────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────╮
# => │ plan           │ SORT BY [col("first")]                                                                              │
# => │                │   AGGREGATE                                                                                         │
# => │                │       [col("int_1").n_unique(), col("int_2").min(), col("float_1")                                  │
# => │                │ .sum(), col("float_2").count()] BY [col("first")] FROM                                              │
# => │                │     DF ["int_1", "int_2", "float_1", "float_2                                                       │
# => │                │ "]; PROJECT */8 COLUMNS; SELECTION: "None"                                                          │
# => │ optimized_plan │ SORT BY [col("first")]                                                                              │
# => │                │   AGGREGATE                                                                                         │
# => │                │       [col("int_1").n_unique(), col("int_2").min(), col("float_1")                                  │
# => │                │ .sum(), col("float_2").count()] BY [col("first")] FROM                                              │
# => │                │     DF ["int_1", "int_2", "float_1", "float_2                                                       │
# => │                │ "]; PROJECT 5/8 COLUMNS; SELECTION: "None"                                                          │
# => ╰────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

Title: Using GroupBy Objects for Aggregation in Polars
Summary
This section demonstrates how to use a `GroupBy` object created with `polars group-by` to perform aggregation operations on DataFrames. It shows examples of applying single and multiple aggregations to columns, including calculating sums, minimums, unique counts, and sorting the results. The `GroupBy` object represents a lazy operation that needs an aggregation to be fully executed.