Nushell 0.34 Release: Dataframes and Usability Improvements

---
title: Nushell 0.34
author: The Nu Authors
author_site: https://twitter.com/nu_shell
author_image: https://www.nushell.sh/blog/images/nu_logo.png
excerpt: Today, we're releasing 0.34 of Nu. This release is the first to support dataframes and also includes a set of usability improvements.
---

# Nushell 0.34

Nushell, or Nu for short, is a new shell that takes a modern, structured approach to your commandline. It works seamlessly with the data from your filesystem, operating system, and a growing number of file formats to make it easy to build powerful commandline pipelines.

Today, we're releasing 0.34 of Nu. This release is the first to support dataframes and also includes a set of usability improvements.

# Where to get it

Nu 0.34 is available as [pre-built binaries](https://github.com/nushell/nushell/releases/tag/0.34.0) or from [crates.io](https://crates.io/crates/nu). If you have Rust installed you can install it using `cargo install nu`.

If you want all the goodies, you can install with `cargo install nu --features=extra`.

If you'd like to try the experimental paging feature in this release, you can install with `cargo install nu --features=table-pager`.

As part of this release, we also publish a set of plugins you can install and use with Nu. To install, use `cargo install nu_plugin_<plugin name>`.

# What's New

## Dataframes (elferherrera)

With 0.34, we've introduced a new family of commands to work with dataframes. Dataframes are an efficient way of working with large datasets by storing data as columns and offering a set of operations over them.

To create a dataframe, you can use the `dataframe open` command and pass it a source file to load. This command currently supports CSV and parquet files.

```
> let df = (dataframe open .\Data7602DescendingYearOrder.csv)
```

Once loaded, there are a variety of commands you can use to interact with the dataframe (you can get the full list with `dataframe --help`).
For example, to see the first few rows of the dataframe we just loaded, we can use `dataframe first`:

```
> $df | dataframe first

───┬──────────┬─────────┬──────┬───────────┬──────────
 # │ anzsic06 │  Area   │ year │ geo_count │ ec_count
───┼──────────┼─────────┼──────┼───────────┼──────────
 0 │ A        │ A100100 │ 2000 │        96 │      130
 1 │ A        │ A100200 │ 2000 │       198 │      110
 2 │ A        │ A100300 │ 2000 │        42 │       25
 3 │ A        │ A100400 │ 2000 │        66 │       40
 4 │ A        │ A100500 │ 2000 │        63 │       40
───┴──────────┴─────────┴──────┴───────────┴──────────
```

Where dataframes really shine is their performance. For example, the above dataset has 5 columns and ~5.5 million rows of data. We're able to group it by the year column, sum the results, and display it to the user in 557ms:

```
# process.nu

let df = (dataframe open Data7602DescendingYearOrder.csv)
let res = ($df | dataframe group-by year | dataframe aggregate sum | dataframe select geo_count)
$res
```

```
> benchmark {source process.nu}

───┬───────────────────
 # │     real time
───┼───────────────────
 0 │ 557ms 658us 500ns
───┴───────────────────
```

By comparison, here's the same example in pandas:

```
import pandas as pd

df = pd.read_csv("Data7602DescendingYearOrder.csv")
res = df.groupby("year")["geo_count"].sum()
print(res)
```

```
> benchmark {python .\load.py}

───┬────────────────────────
 # │       real time
───┼────────────────────────
 0 │ 1sec 966ms 954us 800ns
───┴────────────────────────
```

> System Details: The benchmarks presented in this section were run on a machine with an Intel(R) Core(TM) i7-10710U processor (CPU @1.10GHz 1.61 GHz) and 16 GB of RAM.

While these results are still early, we're excited to see what can be possible using Nushell for processing large datasets.

You can learn more about dataframes, including many examples and a much more in-depth explanation, by reading the new [dataframes chapter of the Nushell book](https://www.nushell.sh/book/dataframes.html).
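
If the group-by-and-sum step in the benchmark is unfamiliar, the following is a minimal sketch of the same idea in plain Python, using only the standard library. The toy values, the `columns` dictionary, and the `group_by_sum` helper are illustrative assumptions and are not part of Nushell or pandas. The sketch only shows what the operation computes on column-oriented data. It says nothing about how either tool implements it or how fast it runs.

```
from collections import defaultdict

# Hypothetical toy data, stored column by column (one list per column),
# loosely modelled on the `year` and `geo_count` columns shown above.
columns = {
    "year": [2000, 2000, 2001, 2001, 2001],
    "geo_count": [96, 198, 42, 66, 63],
}


def group_by_sum(table, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(int)
    # Walk the two columns in lockstep; each pair is one logical row.
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    return dict(totals)


result = group_by_sum(columns, "year", "geo_count")
print(result)  # {2000: 294, 2001: 171}
```

Because each column is its own list, the operation only touches the two columns it needs. That is the intuition behind storing data as columns, although real dataframe libraries rely on far more optimized machinery than this loop.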

Nushell 0.34 is released, introducing support for dataframes and various usability improvements. Dataframes enable efficient handling of large datasets with commands like `dataframe open`, `dataframe first`, `dataframe group-by`, and `dataframe aggregate`. In the benchmark shown, grouping and summing a dataset of ~5.5 million rows took 557ms with Nushell dataframes, compared with about 1.97s for the equivalent pandas script on the same machine. The release also includes installable plugins and a new chapter in the Nushell book dedicated to dataframes.