Read and write Parquet files in Ruby with the ‘red-parquet’ gem; store data in memory with Arrow using the ‘red-arrow’ gem.

To install, use the standard approach: “gem install red-arrow” and “gem install red-parquet”. However, the installation fails because the native extension build tries to run “apt-get install …” on packages that Ubuntu’s “apt” package manager doesn’t know about.

Directions for Ubuntu

There are also packages for Debian and Red Hat. I use Ubuntu, so that is what I could test and get working myself.

You need to add the “red-data-tools” package repository at “packages.red-data-tools.org/” so that “apt” can download the packages. Instructions for this are on the Apache Arrow home page, but they don’t include the Parquet packages.

The directions on the Red Arrow home page are slightly incorrect; you can simply follow the directions on the Arrow home page and then “apt install” the libparquet-dev and libparquet-glib-dev packages. In the future (soon, hopefully) these will be official Ubuntu and Red Hat packages.

sudo apt install -y -V apt-transport-https
sudo apt install -y -V lsb-release
cat <<APT_LINE | sudo tee /etc/apt/sources.list.d/red-data-tools.list
deb https://packages.red-data-tools.org/ubuntu/ $(lsb_release --codename --short) universe
deb-src https://packages.red-data-tools.org/ubuntu/ $(lsb_release --codename --short) universe
APT_LINE
sudo apt update --allow-insecure-repositories || sudo apt update
sudo apt install -y -V --allow-unauthenticated red-data-tools-keyring
sudo apt update
sudo apt install -y -V libarrow-dev # For C++
sudo apt install -y -V libarrow-glib-dev # For GLib (C)
sudo apt install -y -V libparquet-dev # For C++
sudo apt install -y -V libparquet-glib-dev # For GLib (C)

Now “gem install red-parquet red-arrow” should complete without error.

ccd@ascella:/mnt/c/Users/ccd$ sudo gem install red-parquet red-arrow
[sudo] password for ccd:
Building native extensions.  This could take a while...
Successfully installed red-parquet-0.0.2
Parsing documentation for red-parquet-0.0.2
Done installing documentation for red-parquet after 0 seconds
Building native extensions.  This could take a while...
Successfully installed red-arrow-0.10.0
Parsing documentation for red-arrow-0.10.0
Done installing documentation for red-arrow after 0 seconds
2 gems installed
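
To confirm that RubyGems can see both gems before writing any Arrow code, a small check along these lines may help. It is only a sketch: the helper name missing_gems is made up for illustration, and it relies on nothing beyond Gem::Specification.find_all_by_name from RubyGems itself. It tells you whether a gem is registered, not whether its native libraries actually load.

```ruby
require 'rubygems'

# Hypothetical helper: returns the names in gem_names that RubyGems cannot find.
def missing_gems(gem_names)
  gem_names.reject do |name|
    # find_all_by_name returns an empty array when no version is installed
    Gem::Specification.find_all_by_name(name).any?
  end
end

required = %w[red-arrow red-parquet]
missing = missing_gems(required)

if missing.empty?
  puts "All gems found: #{required.join(', ')}"
else
  puts "Missing gems: #{missing.join(', ')}"
end
```

If either gem shows up as missing, rerun the “gem install” step and look for errors from the native extension build.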

You can now access Parquet files in Ruby:


require 'arrow'
require 'parquet' # red-parquet adds Parquet support to Arrow::Table.load

data_table = Arrow::Table.load("/data/data.parquet")
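
Writing works much the same way. The following is a minimal round-trip sketch, assuming a reasonably recent red-arrow and red-parquet in which Arrow::Table#save is available (the versions in the install log above may behave differently). The method names parquet_available? and parquet_round_trip, the column names, and the temporary file name are all made up for illustration. The guard around the require is there so the script reports a missing gem instead of crashing.

```ruby
require 'tmpdir'

# Hypothetical helper: true when red-parquet (and with it red-arrow) can be loaded.
def parquet_available?
  require 'parquet' # also loads red-arrow
  true
rescue LoadError
  false
end

# Hypothetical helper: builds a small table, saves it as Parquet, and loads it back.
def parquet_round_trip(path)
  table = Arrow::Table.new(
    "id"   => Arrow::Int32Array.new([1, 2, 3]),
    "name" => Arrow::StringArray.new(%w[alpha beta gamma])
  )
  table.save(path) # the format should be inferred from the ".parquet" extension
  Arrow::Table.load(path)
end

if parquet_available?
  Dir.mktmpdir do |dir|
    loaded = parquet_round_trip(File.join(dir, "example.parquet"))
    puts "rows: #{loaded.n_rows}, columns: #{loaded.n_columns}"
  end
else
  warn "red-parquet could not be loaded; skipping the round trip"
end
```

Using a temporary directory keeps the example from touching real data; point the path at your own file once the round trip works.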