Filtering by Unique/Duplicate Values and Introduction to Lazy DataFrames in Polars

example, we can use it to count how many occurrences we have in the column `first`:

```nu
$df_1 | polars get first | polars value-counts
# => ╭───┬───────┬───────╮
# => │ # │ first │ count │
# => ├───┼───────┼───────┤
# => │ 0 │ a     │     3 │
# => │ 1 │ b     │     4 │
# => │ 2 │ c     │     3 │
# => ╰───┴───────┴───────╯
```

As expected, the command returns a new dataframe that can be used for further queries.

Continuing with our exploration of `Series`, the next thing we can do is get only the unique values from a series, like this:

```nu
$df_1 | polars get first | polars unique
# => ╭───┬───────╮
# => │ # │ first │
# => ├───┼───────┤
# => │ 0 │ a     │
# => │ 1 │ b     │
# => │ 2 │ c     │
# => ╰───┴───────╯
```

We can also get a mask to filter the rows where data is unique or duplicated. For example, we can select the rows for unique values in column `word`:

```nu
$df_1 | polars filter-with ($in.word | polars is-unique)
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬───────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │ word  │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼───────┤
# => │ 0 │     1 │    11 │    0.10 │    1.00 │ a     │ b      │ c     │ first │
# => │ 1 │     8 │    18 │    0.80 │    7.00 │ c     │ c      │ b     │ eight │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴───────╯
```

Or all the duplicated ones:

```nu
$df_1 | polars filter-with ($in.word | polars is-duplicated)
# => ╭───┬───────┬───────┬─────────┬─────────┬───────┬────────┬───────┬────────╮
# => │ # │ int_1 │ int_2 │ float_1 │ float_2 │ first │ second │ third │  word  │
# => ├───┼───────┼───────┼─────────┼─────────┼───────┼────────┼───────┼────────┤
# => │ 0 │     2 │    12 │    0.20 │    1.00 │ a     │ b      │ c     │ second │
# => │ 1 │     3 │    13 │    0.30 │    2.00 │ a     │ b      │ c     │ third  │
# => │ 2 │     4 │    14 │    0.40 │    3.00 │ b     │ a      │ c     │ second │
# => │ 3 │     0 │    15 │    0.50 │    4.00 │ b     │ a      │ a     │ third  │
# => │ 4 │     6 │    16 │    0.60 │    5.00 │ b     │ a      │ a     │ second │
# => │ 5 │     7 │    17 │    0.70 │    6.00 │ b     │ c      │ a     │ third  │
# => │ 6 │     9 │    19 │    0.90 │    8.00 │ c     │ c      │ b     │ ninth  │
# => │ 7 │     0 │    10 │    0.00 │    9.00 │ c     │ c      │ b     │ ninth  │
# => ╰───┴───────┴───────┴─────────┴─────────┴───────┴────────┴───────┴────────╯
```

## Lazy Dataframes

Lazy dataframes are a way to query data by creating a logical plan. The advantage of this approach is that the plan never gets evaluated until you need to extract data. This way you can chain together aggregations, joins and selections, and collect the data once you are happy with the selected operations.

Let's create a small example of a lazy dataframe:

```nu
let lf_0 = [[a b]; [1 a] [2 b] [3 c] [4 d]] | polars into-lazy
$lf_0
# => ╭────────────────┬───────────────────────────────────────────────────────╮
# => │ plan           │ DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None" │
# => │ optimized_plan │ DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None" │
# => ╰────────────────┴───────────────────────────────────────────────────────╯
```

As you can see, the resulting dataframe is not yet evaluated; it stays as a set of instructions to be applied to the data. If you were to collect that dataframe, you would get the following result:

```nu
$lf_0 | polars collect
# => ╭───┬───┬───╮
# => │ # │ a │ b │
# => ├───┼───┼───┤
# => │ 0 │ 1 │ a │
# => │ 1 │ 2 │ b │
# => │ 2 │ 3 │ c │
# => │ 3 │ 4 │ d │
# => ╰───┴───┴───╯
```

As you can see, the `polars collect` command executes the plan and creates a Nushell table for you.

All dataframe operations should work with both eager and lazy dataframes; they are converted in the background for compatibility. However, to take advantage of lazy operations it is recommended to only use lazy operations with lazy

This section continues exploring `Series` operations in Polars, focusing on identifying and filtering data based on uniqueness and duplication. It demonstrates how to use `is-unique` and `is-duplicated` to create masks for filtering rows containing unique or duplicate values in a specific column. The section then introduces lazy DataFrames, which create a logical query plan that is not evaluated until data extraction is needed. This allows chaining operations for efficiency. The section shows how to create a lazy DataFrame using `into-lazy` and how to execute the plan and create a table using `collect`.
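Because a value either occurs once or more than once, the `is-unique` and `is-duplicated` masks are complements: filtering with each one splits the frame into two disjoint sets of rows. The following is a minimal sketch of that idea, assuming the polars plugin is installed and registered with Nushell; `df_words` and its values are hypothetical and not taken from the original example.

```nu
# Assumes the polars plugin is installed and registered with Nushell.
# Hypothetical single-column frame with some repeated words.
let df_words = [[word]; ["first"] ["second"] ["second"] ["third"] ["third"] ["eight"]] | polars into-df

# `is-unique` marks values that occur exactly once in the column;
# `filter-with` keeps only the rows where the mask is true.
let unique_rows = $df_words | polars filter-with ($in.word | polars is-unique)

# `is-duplicated` marks values that occur more than once,
# so it selects exactly the rows the unique mask rejected.
let duplicated_rows = $df_words | polars filter-with ($in.word | polars is-duplicated)
```

Here `unique_rows` should keep `first` and `eight`, while `duplicated_rows` should keep both copies of `second` and `third`, so together they account for all six rows.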
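To make the deferred evaluation of lazy dataframes concrete, the sketch below extends a lazy plan with one more operation before collecting it. It assumes the polars plugin is installed and registered, and it uses `polars with-column`, `polars col` and `polars as`, which are not introduced in this section; the names `lf_plan`, `lf_doubled`, `df_doubled` and the `a_doubled` column are hypothetical.

```nu
# Assumes the polars plugin is installed and registered with Nushell.

# Creating the lazy frame only records a plan; no rows are processed yet.
let lf_plan = [[a b]; [1 "a"] [2 "b"] [3 "c"] [4 "d"]] | polars into-lazy

# Adding a derived column extends the plan instead of computing it.
let lf_doubled = $lf_plan | polars with-column [((polars col a) * 2 | polars as "a_doubled")]

# `polars collect` runs the accumulated plan and returns the resulting dataframe.
let df_doubled = $lf_doubled | polars collect
```

Displaying `$lf_doubled` before collecting should show a plan record, like `$lf_0` above, rather than rows; only `df_doubled` holds the computed `a_doubled` values.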