# => │ │ "]; PROJECT */8 COLUMNS; SELECTION: "None" │
# => │ optimized_plan │ SORT BY [col("first")] │
# => │ │ AGGREGATE │
# => │ │ [col("int_1").n_unique(), col("int_2").min(), col("float_1") │
# => │ │ .sum(), col("float_2").count()] BY [col("first")] FROM │
# => │ │ DF ["int_1", "int_2", "float_1", "float_2 │
# => │ │ "]; PROJECT 5/8 COLUMNS; SELECTION: "None" │
# => ╰────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
As you can see, the `GroupBy` object is a very powerful variable, and it is
worth keeping in memory while you explore your dataset.
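For instance, the same stored `GroupBy` can be reused with a different set of aggregations. This is a sketch assuming the `polars agg` expression syntax, and that a variable `$group` holds the `GroupBy` object created earlier:

```nu
# Reuse the stored GroupBy with a different aggregation
# ($group is assumed to hold a previously created GroupBy object)
$group
| polars agg (polars col int_1 | polars sum)
| polars collect
```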
## Creating Dataframes
It is also possible to construct dataframes from basic Nushell primitives, such
as integers, decimals, or strings. Let's create a small dataframe using the
command `polars into-df`.
```nu
let df_3 = [[a b]; [1 2] [3 4] [5 6]] | polars into-df
$df_3
# => ╭───┬───┬───╮
# => │ # │ a │ b │
# => ├───┼───┼───┤
# => │ 0 │ 1 │ 2 │
# => │ 1 │ 3 │ 4 │
# => │ 2 │ 5 │ 6 │
# => ╰───┴───┴───╯
```
::: tip
For the time being, not all Nushell primitives can be converted into
a dataframe. This will change in the future as the dataframe feature matures.
:::
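Even a plain list can be converted. As a minimal sketch, a list of integers piped into `polars into-df` produces a single-column dataframe:

```nu
# A simple list of integers becomes a one-column dataframe
[1 2 3 4] | polars into-df
```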
We can append columns to a dataframe to create a new one. As an
example, let's append two columns to our mini dataframe `$df_3`:
```nu
let df_4 = $df_3 | polars with-column $df_3.a --name a2 | polars with-column $df_3.a --name a3
$df_4
# => ╭───┬───┬───┬────┬────╮
# => │ # │ a │ b │ a2 │ a3 │
# => ├───┼───┼───┼────┼────┤
# => │ 0 │ 1 │ 2 │ 1 │ 1 │
# => │ 1 │ 3 │ 4 │ 3 │ 3 │
# => │ 2 │ 5 │ 6 │ 5 │ 5 │
# => ╰───┴───┴───┴────┴────╯
```
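`polars with-column` can also take an expression instead of an existing column. The following is a sketch assuming the plugin's lazy expression syntax; the column name `a_doubled` is our own choice:

```nu
# Append a computed column using an expression (lazy API)
$df_3
| polars into-lazy
| polars with-column ((polars col a) * 2 | polars as a_doubled)
| polars collect
```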
Nushell's powerful piping syntax allows us to create new dataframes by
taking data from other dataframes and appending it to them. Now, if you list your
stored dataframes, you will see five of them in total (along with the `LazyGroupBy` object):
```nu
polars store-ls | select key type columns rows estimated_size
# => ╭──────────────────────────────────────┬─────────────┬─────────┬──────┬────────────────╮
# => │ key │ type │ columns │ rows │ estimated_size │
# => ├──────────────────────────────────────┼─────────────┼─────────┼──────┼────────────────┤
# => │ e780af47-c106-49eb-b38d-d42d3946d66e │ DataFrame │ 8 │ 10 │ 403 B │
# => │ 3146f4c1-f2a0-475b-a623-7375c1fdb4a7 │ DataFrame │ 4 │ 1 │ 32 B │
# => │ 455a1483-e328-43e2-a354-35afa32803b9 │ DataFrame │ 5 │ 4 │ 132 B │
# => │ 0d8532a5-083b-4f78-8f66-b5e6b59dc449 │ LazyGroupBy │ │ │ │
# => │ 9504dfaf-4782-42d4-9110-9dae7c8fb95b │ DataFrame │ 2 │ 3 │ 48 B │
# => │ 37ab1bdc-e1fb-426d-8006-c3f974764a3d │ DataFrame │ 4 │ 3 │ 96 B │
# => ╰──────────────────────────────────────┴─────────────┴─────────┴──────┴────────────────╯
```
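Stored objects that are no longer needed can be dropped from the cache with `polars store-rm`, which takes the object's key. A sketch, using one of the keys from the listing above:

```nu
# Remove a stored object by its key to free the cached memory
# (use a key reported by `polars store-ls` in your own session)
polars store-rm 0d8532a5-083b-4f78-8f66-b5e6b59dc449
```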
It is worth mentioning how memory is optimized while working with
dataframes, thanks to **Apache Arrow** and **Polars**. In a very simple
representation, each column in a DataFrame is an Arrow Array, which uses
several memory specifications to keep the data as packed as possible
(check the [Arrow columnar
format](https://arrow.apache.org/docs/format/Columnar.html)). The other
optimization is that, whenever possible, columns are shared between
dataframes, avoiding memory duplication for the same data. This means that
dataframes `$df_3` and `$df_4` are sharing the same two