Read and write Parquet files in Ruby with the ‘red-parquet’ gem; store data in memory with Arrow using the ‘red-arrow’ gem.
To install, use the standard “gem install red-arrow” and “gem install red-parquet”. On a stock Ubuntu system, however, the installation fails: part of the native extension build runs “apt-get install …” for packages that Ubuntu’s “apt” package manager doesn’t know about.
Directions for Ubuntu
There are also packages for Debian and Red Hat. I use Ubuntu, so that is the only platform I could test and get working myself.
You need to add the “red-data-tools” package repository at “packages.red-data-tools.org/” so that “apt” can download the packages. Instructions for this are on the Apache Arrow home page, but they don’t include the Parquet packages.
The directions on the Red Arrow home page are slightly incorrect. Follow the directions on the Arrow home page instead; once the repository is set up, you can also “apt install” the libparquet-dev and libparquet-glib-dev packages. In the future (soon, hopefully) these will be official Ubuntu and Red Hat packages.
sudo apt install -y -V apt-transport-https
sudo apt install -y -V lsb-release
cat <<APT_LINE | sudo tee /etc/apt/sources.list.d/red-data-tools.list
deb https://packages.red-data-tools.org/ubuntu/ $(lsb_release --codename --short) universe
deb-src https://packages.red-data-tools.org/ubuntu/ $(lsb_release --codename --short) universe
APT_LINE
sudo apt update --allow-insecure-repositories || sudo apt update
sudo apt install -y -V --allow-unauthenticated red-data-tools-keyring
sudo apt update
sudo apt install -y -V libarrow-dev # For C++
sudo apt install -y -V libarrow-glib-dev # For GLib (C)
sudo apt install -y -V libparquet-dev # For C++
sudo apt install -y -V libparquet-glib-dev # For GLib (C)
Now “gem install red-parquet red-arrow” should complete without error.
ccd@ascella:/mnt/c/Users/ccd$ sudo gem install red-parquet red-arrow
[sudo] password for ccd:
Building native extensions. This could take a while...
Successfully installed red-parquet-0.0.2
Parsing documentation for red-parquet-0.0.2
Done installing documentation for red-parquet after 0 seconds
Building native extensions. This could take a while...
Successfully installed red-arrow-0.10.0
Parsing documentation for red-arrow-0.10.0
Done installing documentation for red-arrow after 0 seconds
2 gems installed
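If you want to double-check from Ruby that both gems are visible to RubyGems, a small script along these lines should do it. This is only a sketch: the helper name `installed_versions` is my own, and it reports what RubyGems can find, not whether the native libraries load correctly.

```ruby
require 'rubygems'

# Return the installed versions of a gem as strings, or an empty array
# if RubyGems cannot find the gem. (Hypothetical helper, not part of any gem.)
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map { |spec| spec.version.to_s }
end

%w[red-arrow red-parquet].each do |gem_name|
  versions = installed_versions(gem_name)
  if versions.empty?
    puts "#{gem_name}: not installed"
  else
    puts "#{gem_name}: #{versions.join(', ')}"
  end
end
```

If either gem shows up as not installed, go back over the apt steps above before retrying “gem install”.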
You can now access Parquet files in Ruby:
require 'arrow'
require 'parquet' # from red-parquet; adds Parquet support to Arrow::Table.load

data_table = Arrow::Table.load("/data/data.parquet")
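Writing works much the same way. The following is a rough sketch of a write-then-read round trip, assuming a reasonably recent release of both gems (the API in older versions such as those shown above may differ). The column names, the temporary file name, and the helper `parquet_round_trip` are made up for illustration, and the format is passed explicitly rather than relying on the file extension.

```ruby
require 'tmpdir'

# Guard the requires so the script reports a missing gem instead of crashing.
begin
  require 'arrow'
  require 'parquet'
  PARQUET_AVAILABLE = true
rescue LoadError
  PARQUET_AVAILABLE = false
end

# Write a small table to `path` as Parquet, read it back, and return
# the loaded table. (Hypothetical helper with made-up sample data.)
def parquet_round_trip(path)
  table = Arrow::Table.new(
    'id'   => Arrow::Int32Array.new([1, 2, 3]),
    'name' => Arrow::StringArray.new(%w[alpha beta gamma])
  )
  table.save(path, format: :parquet)
  Arrow::Table.load(path, format: :parquet)
end

if PARQUET_AVAILABLE
  Dir.mktmpdir do |dir|
    loaded = parquet_round_trip(File.join(dir, 'example.parquet'))
    puts "rows: #{loaded.n_rows}, columns: #{loaded.n_columns}"
  end
else
  puts 'red-arrow / red-parquet not installed; skipping the round trip'
end
```

The temporary directory is removed when the block exits, so nothing is left behind on disk.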