Notes on Designing and Implementing a Small Language

For years I’ve wanted to create my own programming language. Recently I took the time to do so, and a few weeks ago the project reached a milestone: The compiler builds a non-trivial program – and it’s fast! Before that I’d built a simple interpreter for the same language. This is a collection of my thoughts on planning a personal programming language project for others who are just starting out. [Read More]

Tech Support 1: Findings from 8529 Unwanted Conversations

In the mid-1990s I worked phone tech support at a call center in the Twin Cities. Though I kind of hated it at the time, it proved to be a valuable experience. At first the stress was pretty high, then burnout set in. Eventually, like most, I moved past the burnout to calm acceptance, while always planning on eventual escape. [Read More]

NewTypes: Introduction and Using in Rust

Adding “newtypes” to your code can improve readability and type safety. A newtype communicates your intent at both a superficial and a deep level. Newtypes are user-defined types derived from common types in a language but treated as distinct types by the language’s interpreter or compiler and its type checker. What this means precisely depends on the language. However, it’s important to understand that newtypes aren’t simply type aliases like “type =” in Rust or “typedef” in C and C++. Instead, a newtype makes the type checker reject, for instance, assigning a newtype derived from an integer to a plain integer variable. The term “newtype” comes from Haskell. In most other languages, “newtype” is a programming idiom or pattern, not a specific language feature. [Read More]
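
As a minimal sketch of the distinction described above, here is how a newtype differs from an alias in Rust. The names Meters, Feet, Distance, and add_meters are hypothetical, chosen only for illustration; they are not taken from the post itself.

```rust
// Hypothetical newtypes wrapping f64; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Feet(f64);

// A plain alias, by contrast, is fully interchangeable with f64.
type Distance = f64;

impl Feet {
    // Explicit conversion is the only way to get from Feet to Meters.
    fn to_meters(self) -> Meters {
        Meters(self.0 * 0.3048)
    }
}

// Accepts only Meters; passing Feet or a bare f64 is a compile error.
fn add_meters(a: Meters, b: Meters) -> Meters {
    Meters(a.0 + b.0)
}

fn main() {
    let alias: Distance = 3.0;
    let plain: f64 = alias; // compiles: an alias is the same type as f64

    let run = Meters(100.0);
    let jump = Feet(10.0);

    // let bad: f64 = run;               // rejected: expected f64, found Meters
    // let sum = add_meters(run, jump);  // rejected: expected Meters, found Feet
    let sum = add_meters(run, jump.to_meters());

    println!("{:?} {}", sum, plain);
}
```

The two commented-out lines are the ones the type checker refuses, which is the behavior a type alias cannot give you.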

Read Multi-File Parquet Data with Rust

How to iterate and yield Records over multiple files as a single dataset with a schema projection

Figuring out how to elegantly consume multi-file Parquet data may seem challenging unless you dig into the test cases and the source code for the Rust Parquet crate. The one example given in documentation is misleadingly simple. I’ll show a couple examples of how it’s done. Jump to “The Solution” at the end to skip the journey it took to find it. [Read More]
Tags: Parquet Rust

Residential Real Estate Listing Price and Sales Price Disconnect

Asking Price Clusters Around Home Features; Sales Price Doesn't

Save your data, even if your project is half-baked and you think you're done with it. Back in 2014 I got access to an interesting dataset and used it to come up with some potentially useful conclusions about valuing homes for sale. For whatever reason, today I remembered the work and went to dig up the data and project files to refine the queries and create a model… only to discover the data is permanently gone. [Read More]

Rust Ownership

Comparisons to Other Languages

While learning Rust you encounter the “borrow checker” and the concept of ownership. The borrow checker automatically performs checks you’d probably be doing in your head anyway when programming in other non-garbage-collected languages. If you don’t think about ownership when coding in C++, C or Pascal, for instance, you may end up debugging strange behavior or segmentation faults when you run your programs. [Read More]
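
A small sketch of the kind of check the borrow checker performs, assuming two hypothetical helper functions (borrow and consume) that are not part of the original post:

```rust
// Takes ownership of the String; the caller can no longer use it afterwards.
fn consume(s: String) -> usize {
    s.len()
}

// Only borrows the data; the caller keeps ownership.
fn borrow(s: &str) -> usize {
    s.len()
}

fn main() {
    let name = String::from("ferris");

    let borrowed_len = borrow(&name); // name is still usable after this call
    let consumed_len = consume(name); // ownership moves into consume here

    // println!("{}", name); // rejected at compile time: value used after move

    println!("{} {}", borrowed_len, consumed_len);
}
```

In C or C++ the equivalent of the commented-out line would typically compile, and a use-after-free of this sort would only show up at run time, if at all.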

Optimization Part V: Applying Data Oriented Design Principles

Optimizing Data Flow on a Data Intensive C++ Application

Today I’m going to focus on data-oriented design inspired optimizations. Previously, I also did a handful of API improvements made obvious from a flame graph profile, and then replaced all uses of the STL “unordered_map” type with a more efficient implementation of a flat hash map. [Read More]

Notes for a Sequel/JRuby/SQLite Bug

A Bug in Sequel, JRuby + SQLite DATETIME type columns

Accessing SQLite3 DATETIME column data with the Sequel gem “jdbc” SQLite adapter produces different date types for the DATETIME columns than does the MRI Sequel adapter. So, you get ‘Time’ objects in the result sets when using standard Ruby, but ‘Date’ types when using JRuby. The ‘Date’ objects don’t have a time component; the ‘Time’ objects have both date and time. [Read More]