Add Key-Value Metadata to Parquet Files in C++

Arbitrary file-level metadata on a Parquet file can be extremely useful, but adding it in C++ isn't well documented. Here's how to do it.

Although the Parquet format allows extra metadata and the C++ libraries provide a means to read and write it, the capability isn’t well documented. I’ll show some example code to clarify how to read and write key-value Parquet metadata. This advice is specific to using the C++ libraries in the Arrow project directly. [Read More]

My Not-So-Deep Thoughts on AI

Speculation on our AI future races straight to where we fear it ends up, but we should think about what comes first (though I can't help myself in the concluding thoughts).

Will A.I. Become the New McKinsey? Yes. And it will begin when McKinsey and the other big consultants start to apply A.I. routinely. If firms like McKinsey are “capital’s willing executioners”, A.I. will at first merely sharpen their axes. [Read More]

So Many New Systems Programming Languages II

Twelve new systems languages, and one that dates to the Carter administration

Here’s a non-exhaustive rundown of newish systems languages. I’ll list some notable things about them related to safety and syntax as I discussed in the previous post. Well, here they are, in rough order of production readiness and popularity. Sorry if I put something lower than it deserved. [Read More]

Taking a Look at the Recent Batch of Systems Programming Languages

The new languages emphasize safety, convenient syntax, and high performance. There are so many of them!

The last few years have brought us an explosion in the number of new systems languages under development. They’re mostly trying to find a good balance between safety, performance, and expressivity. In this post I’ll first briefly outline what’s meant by “safety” and say a little about how today’s systems languages try to achieve it. [Read More]

Resources for Building Programming Languages

These are the most useful links I've found that focus on using Rust and C++ to develop your compiler or interpreter

Here’s a collection of resources for learning to create programming languages. I’m using Rust and some C so that’s the focus of the resources listed at the end. First, I’ll talk about Crafting Interpreters, a book that’s applicable no matter what you plan to use to build your language. [Read More]

Notes on Designing and Implementing a Small Language

For years I’ve wanted to create my own programming language. Recently I took the time to do so, and a few weeks ago the project reached a milestone: The compiler builds a non-trivial program – and it’s fast! Before that I’d built a simple interpreter for the same language. This is a collection of my thoughts on planning a personal programming language project for others who are just starting out. [Read More]

Tech Support 1: Findings from 8529 Unwanted Conversations

In the mid-1990s I worked phone tech support at a call center in the Twin Cities. Though I kind of hated it at the time, it proved to be a valuable experience. At first the stress was pretty high, then burnout set in. Eventually, like most, I moved past the burnout to calm acceptance, while always planning on eventual escape. [Read More]

Read Multi-File Parquet Data with Rust

How to iterate and yield Records over multiple files as a single dataset with a schema projection

Figuring out how to elegantly consume multi-file Parquet data may seem challenging unless you dig into the test cases and the source code for the Rust Parquet crate. The one example given in the documentation is misleadingly simple. I’ll show a couple of examples of how it’s done. Jump to “The Solution” at the end to skip the journey it took to find it. [Read More]
Tags: Parquet Rust