DataFrame Joins and Group-by Operations

going to join our mini dataframe with another mini dataframe. Copy these lines into another file and create the corresponding dataframe (for these examples we are going to call it `test_small_a.csv`).

```nu
"int_1,int_2,float_1,float_2,first
9,14,0.4,3.0,a
8,13,0.3,2.0,a
7,12,0.2,1.0,a
6,11,0.1,0.0,b"
| save --raw --force test_small_a.csv
```

We use the `polars open` command to create the new variable:

```nu
let df_2 = polars open --eager test_small_a.csv
```

Now, with the second dataframe loaded in memory, we can join the two using the column called `int_1` from the left dataframe and the column `int_1` from the right dataframe:

```nu
$df_1 | polars join $df_2 int_1 int_1
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────┬─────────┬───────────┬───────────┬─────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │ int_2_x │ float_1_x │ float_2_x │ first_x │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┼─────────┼───────────┼───────────┼─────────┤
# => │ 0 │     6 │    16 │    0.60 │    5.00 │ b     │ a      │ a     │ second │      11 │      0.10 │      0.00 │ b       │
# => │ 1 │     7 │    17 │    0.70 │    6.00 │ b     │ c      │ a     │ third  │      12 │      0.20 │      1.00 │ a       │
# => │ 2 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight  │      13 │      0.30 │      2.00 │ a       │
# => │ 3 │     9 │    19 │    0.90 │    8.00 │ c     │ c      │ b     │ ninth  │      14 │      0.40 │      3.00 │ a       │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────┴─────────┴───────────┴───────────┴─────────╯
```

::: tip
In `Nu`, when a command has multiple arguments that each expect multiple values, we use brackets `[]` to enclose those values. In the case of [`polars join`](/commands/docs/polars_join.md) we can join on multiple columns as long as they have the same type.
:::

For example:

```nu
$df_1 | polars join $df_2 [int_1 first] [int_1 first]
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────┬─────────┬───────────┬───────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │ int_2_x │ float_1_x │ float_2_x │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┼─────────┼───────────┼───────────┤
# => │ 0 │     6 │    16 │    0.60 │    5.00 │ b     │ a      │ a     │ second │      11 │      0.10 │      0.00 │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────┴─────────┴───────────┴───────────╯
```

By default, the join command does an inner join, meaning that it keeps only the rows where both dataframes share the same value in the join columns. You can select a left join to also keep the rows from the left dataframe that have no match in the right one. You can also save this result in order to use it for further operations.

## DataFrame group-by

One of the most powerful operations that can be performed with a DataFrame is the [`polars group-by`](/commands/docs/polars_group-by.md). This command allows you to perform aggregation operations based on a grouping criterion. In Nushell, a `GroupBy` is a type of object that can be stored and reused for multiple aggregations. This is quite handy, since creating the grouped pairs is the most expensive step of a group-by, and there is no need to repeat it if you are planning to do multiple operations with the same group condition.

To create a `GroupBy` object you only need to use the [`polars group-by`](/commands/docs/polars_group-by.md) command:

```nu
let group = $df_1 | polars group-by first
$group
# => ╭─────────────┬──────────────────────────────────────────────╮
# => │ LazyGroupBy │ apply aggregation to complete execution plan │
# => ╰─────────────┴──────────────────────────────────────────────╯
```

When printing the `GroupBy` object we can see that it is, in the background, a lazy operation waiting to be completed by adding an aggregation. Using the

This section details how to join DataFrames using specified columns and different join types (inner, left), with a demonstration of joining on multiple columns. It then introduces the `polars group-by` command for creating a `GroupBy` object, which is a lazy operation waiting for an aggregation to be applied.
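
As a minimal sketch of the two ideas above (a left join, and reusing one stored `GroupBy` for several aggregations), the lines below can be entered one at a time at the REPL. The frames `left` and `right` are hypothetical miniature tables built inline for illustration, not the `$df_1` and `$df_2` used earlier. The sketch assumes a recent version of the polars plugin in which `polars join` accepts a `--left` flag; check `help polars join` for the flags available in your version.

```nu
# Hypothetical miniature frames, built inline so the sketch does not depend on $df_1 or $df_2.
let left = [[int_1 first]; [6 "b"] [7 "b"] [10 "c"]] | polars into-df
let right = [[int_1 float_1]; [6 0.1] [7 0.2] [8 0.3]] | polars into-df

# Left join (assumes the --left flag): the row with int_1 == 10 has no match
# on the right, so it is kept with a null in the column coming from `right`.
let joined = $left | polars join --left $right int_1 int_1
$joined | polars collect

# Build the grouping once and reuse it for two different aggregations.
let group = $joined | polars group-by first
$group | polars agg (polars col int_1 | polars sum) | polars collect
$group | polars agg (polars col float_1 | polars mean) | polars collect
```

Storing the join result in `joined` is what makes the later group-by possible without repeating the join, and storing `group` avoids rebuilding the grouping for the second aggregation.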