DataFrames: Aggregations, Selection, Storage, and Joins

# => │ 7 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight  │
# => │ 8 │     9 │    19 │    0.90 │    8.00 │ c     │ c      │ b     │ ninth  │
# => │ 9 │     0 │    10 │    0.00 │    9.00 │ c     │ c      │ b     │ ninth  │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────╯
```

With the dataframe in memory we can start doing column operations on the `DataFrame`

::: tip
If you want to see all the dataframe commands that are available you can use `scope commands | where category =~ dataframe`
:::

## Basic Aggregations

Let's start with basic aggregations on the dataframe. Let's sum all the columns that exist in `df` by using the `polars sum` command

```nu
$df_1 | polars sum | polars collect
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬──────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │ word │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼──────┤
# => │ 0 │    40 │   145 │    4.50 │   46.00 │       │        │       │      │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴──────╯
```

As you can see, the aggregation computes the sum only for those columns where a sum makes sense; the text columns are left empty. If you want to filter out the text columns, you can select the columns you want by using the [`polars select`](/commands/docs/polars_select.md) command

```nu
$df_1 | polars sum | polars select int_1 int_2 float_1 float_2 | polars collect
# => ╭───┬───────┬───────┬─────────┬─────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │
# => ├───┼───────┼───────┼─────────┼─────────┤
# => │ 0 │    40 │   145 │    4.50 │   46.00 │
# => ╰───┴───────┴───────┴─────────┴─────────╯
```

You can even store the result of this aggregation as you would store any other Nushell variable

```nu
let res = $df_1 | polars sum | polars select int_1 int_2 float_1 float_2
```

::: tip
Type `let res = !!` and press enter. This will auto-complete the previously executed command. Note the space between `=` and `!!`.
:::

And now we have two dataframes stored in memory

```nu
polars store-ls | select key type columns rows estimated_size
# => ╭──────────────────────────────────────┬───────────┬─────────┬──────┬────────────────╮
# => │ key                                  │ type      │ columns │ rows │ estimated_size │
# => ├──────────────────────────────────────┼───────────┼─────────┼──────┼────────────────┤
# => │ e780af47-c106-49eb-b38d-d42d3946d66e │ DataFrame │       8 │   10 │          403 B │
# => │ 3146f4c1-f2a0-475b-a623-7375c1fdb4a7 │ DataFrame │       4 │    1 │           32 B │
# => ╰──────────────────────────────────────┴───────────┴─────────┴──────┴────────────────╯
```

Pretty neat, isn't it?

You can perform several aggregations on the dataframe to extract basic information from it and do basic data analysis on your brand new dataframe.

## Joining a DataFrame

It is also possible to join two dataframes using a column as reference. We are going to join our mini dataframe with another mini dataframe. Copy these lines into another file and create the corresponding dataframe (for these examples we are going to call it `test_small_a.csv`)

```nu
"int_1,int_2,float_1,float_2,first
9,14,0.4,3.0,a
8,13,0.3,2.0,a
7,12,0.2,1.0,a
6,11,0.1,0.0,b"
| save --raw --force test_small_a.csv
```

We use the `polars open` command to create the new variable

```nu
let df_2 = polars open --eager test_small_a.csv
```

Now, with the second dataframe loaded in memory, we can join them using the column called `int_1` from the left dataframe and the column `int_1` from the right dataframe

```nu
$df_1 | polars join $df_2 int_1 int_1
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────┬─────────┬───────────┬───────────┬─────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │ word   │ int_2_x │ float_1_x │ float_2_x │ first_x │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┼─────────┼───────────┼───────────┼─────────┤

This section elaborates on performing aggregations on DataFrames using `polars sum` and selecting specific columns with `polars select`. It demonstrates storing the results of these operations as Nushell variables and listing stored DataFrames with `polars store-ls`. Additionally, it explains how to join two DataFrames using `polars join` with a common column as a reference, creating a new DataFrame with merged data.