AI for Editing Novels

I’ve tried out most of the AI LLM products at this point. For the most part I haven’t found a use for them beyond short-term entertainment. Recent releases of Claude and Gemini have gotten good enough to handle some busy-work coding for me, but anything complicated still needs too much review. But today I stumbled on an actually valuable use case for writers. [Read More]
Tags: AI Writing

Minimum Useful PC Uptime

With the return to offices still only partial, I expect lots of people are running into an unpleasant side of modern computing. Frequent software updates, logins, and operating system patches will interrupt your work or, at best, restart your computer between work sessions, potentially leaving you some wreckage with your morning coffee. [Read More]

Save Arrow Record Batches Fast to Parquet With Custom Metadata During Incremental Writes

Adding custom metadata is easy and documented when saving an entire table, but adding it to batched output is different.

Saving custom metadata – “schema metadata” or “file metadata” – to Parquet can be really useful. You can put versions of an application’s data format, release notes, or many other things right into the data files. The documentation is somewhat lacking on how to accomplish this with PyArrow – but you totally can. Last time I reviewed the docs for Polars and DuckDB, they didn’t allow adding your own metadata to Parquet output at all. [Read More]
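
As a rough sketch of the idea (not necessarily the post’s exact recipe; the file name, column names, and metadata keys below are made up), one approach with PyArrow is to attach the key/value pairs to the schema that the ParquetWriter is opened with, so a file written from many incremental batches still ends up with the metadata in its footer:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative custom key/value metadata; it is stored as bytes in the file footer.
custom_metadata = {"app_format_version": "2.1", "release_notes": "added column b"}

schema = pa.schema([("a", pa.int64()), ("b", pa.float64())])
# Attach the metadata to the schema the writer is opened with; the individual
# batches themselves do not need to carry it.
writer_schema = schema.with_metadata(custom_metadata)

with pq.ParquetWriter("incremental.parquet", writer_schema) as writer:
    for start in (0, 3):  # stand-in for an incremental stream of record batches
        batch = pa.record_batch(
            [pa.array([start, start + 1, start + 2]),
             pa.array([0.0, 0.5, 1.0])],
            schema=schema,
        )
        writer.write_batch(batch)

# The custom metadata can be read back without touching any row groups.
print(pq.read_schema("incremental.parquet").metadata)
```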

Notes on simplifying complex Parquet data

Not all tools can read nested logical Map or List type data (often produced by Spark). Here are some tips to make the data accessible to more tools.

The Parquet columnar data format typically has columns of simple types: int32, int64, string, and a few others. However, columns can also have the logical types “List” and “Map”, and their members may themselves be further “List” or “Map” structures or primitive types. [Read More]
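
As a small illustration of one such tip (not taken from the post; the column names are invented), a nested List column can be exploded into one row per element with PyArrow so that tools limited to flat primitive columns can still read the result:

```python
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# A table with a nested List<int64> column, similar to what Spark often writes.
table = pa.table({
    "id": pa.array([1, 2]),
    "scores": pa.array([[10, 20], [30]], type=pa.list_(pa.int64())),
})

# Explode the list: one output row per list element, carrying the parent id along.
parent = pc.list_parent_indices(table["scores"])
flat = pa.table({
    "id": pc.take(table["id"], parent),
    "score": pc.list_flatten(table["scores"]),
})
pq.write_table(flat, "flat.parquet")
```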