Working with DataFrames: Creation, Inspection, and Basic Aggregations

When we execute our next commands, we will start a new instance of the plugin.

```nu
plugin stop polars
```

## Working with Dataframes

After seeing a glimpse of the things that can be done with [`Dataframe` commands](/commands/categories/dataframe.md), it is time to start testing them. To begin, let's create a sample CSV file that will become the sample dataframe used throughout the examples. Run the following command to create our sample CSV file:

```nu
("int_1,int_2,float_1,float_2,first,second,third,word
1,11,0.1,1.0,a,b,c,first
2,12,0.2,1.0,a,b,c,second
3,13,0.3,2.0,a,b,c,third
4,14,0.4,3.0,b,a,c,second
0,15,0.5,4.0,b,a,a,third
6,16,0.6,5.0,b,a,a,second
7,17,0.7,6.0,b,c,a,third
8,18,0.8,7.0,c,c,b,eight
9,19,0.9,8.0,c,c,b,ninth
0,10,0.0,9.0,c,c,b,ninth"
| save --raw --force test_small.csv)
```

The command saves the data to a file. You can name the file however you want; for the sake of these examples it is called `test_small.csv`.

Now, to read that file as a dataframe, use the `polars open` command like this:

```nu
let df_1 = polars open --eager test_small.csv
```

This should create the value `$df_1` in memory, which holds the data we just created.

::: tip
The `polars open` command can read files in formats: **csv**, **tsv**, **parquet**, **json(l)**, **arrow**, and **avro**.
:::

To see all the dataframes that are stored in memory, you can use:

```nu
polars store-ls | select key type columns rows estimated_size
# => ╭──────────────────────────────────────┬───────────┬─────────┬──────┬────────────────╮
# => │                 key                  │   type    │ columns │ rows │ estimated_size │
# => ├──────────────────────────────────────┼───────────┼─────────┼──────┼────────────────┤
# => │ e780af47-c106-49eb-b38d-d42d3946d66e │ DataFrame │       8 │   10 │          403 B │
# => ╰──────────────────────────────────────┴───────────┴─────────┴──────┴────────────────╯
```

As you can see, the command shows the created dataframes together with basic information about them.

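
As a quick sanity check after loading, the dataframe's dimensions and column types can also be inspected directly. The sketch below assumes that the `polars shape` and `polars schema` commands are available in your installed version of the polars plugin; neither appears in the walkthrough above, so consult `help polars` if they are missing.

```nu
# Assumes $df_1 was created with `polars open` as shown earlier.

# Report the number of rows and columns.
# For the sample file this should agree with the 10 rows and 8 columns listed by `polars store-ls`.
$df_1 | polars shape

# Show the data type that was inferred for each column when the CSV was read.
$df_1 | polars schema
```

If the reported shape or types differ from what you expect, the CSV file is the first thing to check, since column types are inferred from its contents.
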
To see a preview of the loaded dataframe, you can send the dataframe variable to the stream:

```nu
$df_1
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┤
# => │ 0 │     1 │    11 │    0.10 │    1.00 │ a     │ b      │ c     │ first  │
# => │ 1 │     2 │    12 │    0.20 │    1.00 │ a     │ b      │ c     │ second │
# => │ 2 │     3 │    13 │    0.30 │    2.00 │ a     │ b      │ c     │ third  │
# => │ 3 │     4 │    14 │    0.40 │    3.00 │ b     │ a      │ c     │ second │
# => │ 4 │     0 │    15 │    0.50 │    4.00 │ b     │ a      │ a     │ third  │
# => │ 5 │     6 │    16 │    0.60 │    5.00 │ b     │ a      │ a     │ second │
# => │ 6 │     7 │    17 │    0.70 │    6.00 │ b     │ c      │ a     │ third  │
# => │ 7 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight  │
# => │ 8 │     9 │    19 │    0.90 │    8.00 │ c     │ c      │ b     │ ninth  │
# => │ 9 │     0 │    10 │    0.00 │    9.00 │ c     │ c      │ b     │ ninth  │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────╯
```

With the dataframe in memory, we can start doing column operations with the `DataFrame`.

::: tip
If you want to see all the dataframe commands that are available, you can use `scope commands | where category =~ dataframe`
:::

## Basic Aggregations

Let's start with basic aggregations on the dataframe. Let's sum all the columns that exist in `$df_1` by using the `polars sum` command:

```nu
$df_1 | polars sum | polars collect
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬──────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │ word │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼──────┤
# => │ 0 │    40 │   145 │    4.50 │   46.00 │       │        │       │      │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴──────╯
```

After stopping the Polars plugin, this section details how to create a sample CSV file and load it as a DataFrame using `polars open`. It explains how to list stored DataFrames with `polars store-ls` and preview their contents by sending the DataFrame variable to the stream. The section then demonstrates basic column aggregations using the `polars sum` and `polars collect` commands to calculate the sum of all columns in the DataFrame.