Sorting DataFrames and Finding Unique Values in Polars

```nu
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┤
# => │ 0 │     1 │    11 │    0.10 │    1.00 │ a     │ b      │ c     │ first  │
# => │ 1 │     4 │    14 │    0.40 │    3.00 │ b     │ a      │ c     │ second │
# => │ 2 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight  │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────╯
```

Or what if we want to create a new dataframe sorted by a specific column? We can use `arg-sort` to accomplish that. In the next example we sort the dataframe by the column `word`:

::: tip
The same result could be accomplished using the command [`sort`](/commands/docs/sort.md)
:::

```nu
let indices_2 = $df_1 | polars get word | polars arg-sort
$df_1 | polars take $indices_2
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┤
# => │ 0 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight  │
# => │ 1 │     1 │    11 │    0.10 │    1.00 │ a     │ b      │ c     │ first  │
# => │ 2 │     9 │    19 │    0.90 │    8.00 │ c     │ c      │ b     │ ninth  │
# => │ 3 │     0 │    10 │    0.00 │    9.00 │ c     │ c      │ b     │ ninth  │
# => │ 4 │     2 │    12 │    0.20 │    1.00 │ a     │ b      │ c     │ second │
# => │ 5 │     4 │    14 │    0.40 │    3.00 │ b     │ a      │ c     │ second │
# => │ 6 │     6 │    16 │    0.60 │    5.00 │ b     │ a      │ a     │ second │
# => │ 7 │     3 │    13 │    0.30 │    2.00 │ a     │ b      │ c     │ third  │
# => │ 8 │     0 │    15 │    0.50 │    4.00 │ b     │ a      │ a     │ third  │
# => │ 9 │     7 │    17 │    0.70 │    6.00 │ b     │ c      │ a     │ third  │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────╯
```

And finally, we can create a new Series by setting a new value at the marked indices.
Have a look at the next command:

```nu
let indices_3 = [0 2] | polars into-df
$df_1 | polars get int_1 | polars set-with-idx 123 --indices $indices_3
# => ╭───┬───────╮
# => │ # │ int_1 │
# => ├───┼───────┤
# => │ 0 │   123 │
# => │ 1 │     2 │
# => │ 2 │   123 │
# => │ 3 │     4 │
# => │ 4 │     0 │
# => │ 5 │     6 │
# => │ 6 │     7 │
# => │ 7 │     8 │
# => │ 8 │     9 │
# => │ 9 │     0 │
# => ╰───┴───────╯
```

## Unique Values

Another operation that can be done with `Series` is to search for unique values in a list or column. Let's again use the first dataframe we created to test these operations.

The first and most common operation is `value-counts`. This command counts how many times each unique value appears in a Series. For example, we can use it to count the occurrences of each value in the column `first`:

```nu
$df_1 | polars get first | polars value-counts
# => ╭───┬───────┬───────╮
# => │ # │ first │ count │
# => ├───┼───────┼───────┤
# => │ 0 │ a     │     3 │
# => │ 1 │ b     │     4 │
# => │ 2 │ c     │     3 │
# => ╰───┴───────┴───────╯
```

As expected, the command returns a new dataframe that can be used for further queries.

Continuing with our exploration of `Series`, the next thing we can do is get only the unique values from a series, like this:

```nu
$df_1 | polars get first | polars unique
# => ╭───┬───────╮
# => │ # │ first │
# => ├───┼───────┤
# => │ 0 │ a     │
# => │ 1 │ b     │
# => │ 2 │ c     │
# => ╰───┴───────╯
```

Or we can get a mask that we can use to keep only the rows where the data is unique or duplicated. For example, we can select the rows with unique values in the column `word`:

```nu
$df_1 | polars filter-with ($in.word | polars is-unique)
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬───────╮
```

This section covers sorting DataFrames using `arg-sort` and creating new Series by setting values at specified indices. It then delves into finding unique values within Series, demonstrating the use of `value-counts` to count occurrences of unique values. It shows how to extract just the unique values from a Series and how to filter rows based on unique or duplicated values in a specified column using `is-unique`.
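
The duplicated counterpart of the `is-unique` mask can be sketched as follows. This is a minimal illustration, not output taken from `$df_1`: the dataframe `df_demo` and its columns `id` and `word` are hypothetical, and the example assumes the `polars` plugin in use provides `polars is-duplicated`, which should return a boolean mask in the same way `polars is-unique` does.

```nu
# Hypothetical four-row dataframe, used only for this illustration
let df_demo = [[id word]; [1 first] [2 second] [3 second] [4 third]] | polars into-df

# Boolean mask that is true for every row whose `word` occurs more than once
let dup_mask = $df_demo | polars get word | polars is-duplicated

# Keep only the rows flagged by the mask
let dups = $df_demo | polars filter-with $dup_mask
$dups
```

With this input, only the two rows whose `word` is `second` should remain. Replacing `polars is-duplicated` with `polars is-unique` would select the complementary rows instead.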