Memory Optimization, Working with Series in Polars

# => │ 0d8532a5-083b-4f78-8f66-b5e6b59dc449 │ LazyGroupBy │         │      │                │
# => │ 9504dfaf-4782-42d4-9110-9dae7c8fb95b │ DataFrame   │       2 │    3 │           48 B │
# => │ 37ab1bdc-e1fb-426d-8006-c3f974764a3d │ DataFrame   │       4 │    3 │           96 B │
# => ╰──────────────────────────────────────┴─────────────┴─────────┴──────┴────────────────╯
```

One important point is how memory is optimized when working with dataframes, thanks to **Apache Arrow** and **Polars**. In a very simple representation, each column in a DataFrame is an Arrow Array, which uses several memory layout specifications to keep the data as tightly packed as possible (see [Arrow columnar format](https://arrow.apache.org/docs/format/Columnar.html)). The other optimization trick is that, whenever possible, columns are shared between dataframes, which avoids duplicating the same data in memory. This means that dataframes `$df_3` and `$df_4` share the same two columns we created with the `polars into-df` command. For this reason, it isn't possible to change the value of a column in a dataframe. However, you can create new columns based on data from other columns or dataframes.

## Working with Series

A `Series` is the building block of a `DataFrame`. Each Series represents a column whose values all share the same data type, and we can create multiple Series of different types, such as float, int or string.

Let's start our exploration of Series by creating one with the `polars into-df` command:

```nu
let df_5 = [9 8 4] | polars into-df
$df_5
# => ╭───┬───╮
# => │ # │ 0 │
# => ├───┼───┤
# => │ 0 │ 9 │
# => │ 1 │ 8 │
# => │ 2 │ 4 │
# => ╰───┴───╯
```

We have created a new Series from a list of integers (we could have done the same with floats or strings).

Series have their own basic operations defined, and they can be used to create other Series. Let's create a new Series by doing some arithmetic on the previously created column.
```nu
let df_6 = $df_5 * 3 + 10
$df_6
# => ╭───┬────╮
# => │ # │ 0  │
# => ├───┼────┤
# => │ 0 │ 37 │
# => │ 1 │ 34 │
# => │ 2 │ 22 │
# => ╰───┴────╯
```

Now we have a new Series that was constructed by doing basic operations on the previous variable.

::: tip
If you want to see how many variables you have stored in memory, you can use `scope variables`.
:::

Let's rename our previous Series so it has a memorable name:

```nu
let df_7 = $df_6 | polars rename "0" memorable
$df_7
# => ╭───┬───────────╮
# => │ # │ memorable │
# => ├───┼───────────┤
# => │ 0 │        37 │
# => │ 1 │        34 │
# => │ 2 │        22 │
# => ╰───┴───────────╯
```

We can also do basic operations with two Series, as long as they have the same data type:

```nu
$df_5 - $df_7
# => ╭───┬─────────────────╮
# => │ # │ sub_0_memorable │
# => ├───┼─────────────────┤
# => │ 0 │             -28 │
# => │ 1 │             -26 │
# => │ 2 │             -18 │
# => ╰───┴─────────────────╯
```

We can also add them to previously defined dataframes:

```nu
let df_8 = $df_3 | polars with-column $df_5 --name new_col
$df_8
# => ╭───┬───┬───┬─────────╮
# => │ # │ a │ b │ new_col │
# => ├───┼───┼───┼─────────┤
# => │ 0 │ 1 │ 2 │       9 │
# => │ 1 │ 3 │ 4 │       8 │
# => │ 2 │ 5 │ 6 │       4 │
# => ╰───┴───┴───┴─────────╯
```

The Series stored in a DataFrame can also be used directly. For example, we can multiply columns `a` and `b` to create a new Series:

```nu
$df_8.a * $df_8.b
# => ╭───┬─────────╮
# => │ # │ mul_a_b │
# => ├───┼─────────┤
# => │ 0 │       2 │
# => │ 1 │      12 │
# => │ 2 │      30 │
# => ╰───┴─────────╯
```

We can also start piping operations together to create new columns and dataframes:

```nu
let df_9 = $df_8 | polars with-column ($df_8.a * $df_8.b / $df_8.new_col) --name my_sum
$df_9
# => ╭───┬───┬───┬─────────┬────────╮
# => │ # │ a │ b │ new_col │ my_sum │
# => ├───┼───┼───┼─────────┼────────┤
# => │ 0 │ 1 │ 2 │       9 │      0 │
# => │ 1 │ 3 │ 4 │       8 │      1 │

This section discusses memory optimization in Polars using Apache Arrow, where DataFrame columns are Arrow Arrays. It emphasizes column sharing between DataFrames. It introduces Series as DataFrame building blocks, demonstrating creation with `polars into-df`, arithmetic operations, and renaming with `polars rename`. The content showcases combining Series, adding them to DataFrames using `polars with-column`, and creating new columns through operations on existing Series.