In a previous article on optimization we looked at how to read a flame graph and discover areas of a program that could benefit from optimization by rewriting the source code.
[Read More]
If You Liked Pascal
Glimmers of Pascal in Three Modern Programming Languages
Each of these relatively new languages takes after Wirth family languages like Pascal or Oberon, and particularly Turbo Pascal. In spirit, small details, or tooling you’ll find something familiar. Here are my sort of random observations on Nim, Kotlin, and Go.
[Read More]
Optimization Part III: Better Hash Tables
After you have done the obvious algorithmic complexity analysis on the code you’ve written for your C++ application, what’s next on the list of ways to optimize it? How about looking for more efficient implementations of standard data structures in libraries?
[Read More]
Optimization Part II: Targeted Optimizations Assisted by Flame Graphing
Early last year IPUMS moved production of IPUMS-International micro-data to the latest version of the core DCP and a new data editing API. In doing so we discovered a number of places where the new API – while performing better than the old one on our USA and CPS test datasets – performed worse than expected on some of the IPUMSI datasets. Not a big deal except for a few datasets that took twenty or thirty times longer to process than we would expect.
[Read More]
Optimizing a Data-Intensive C++ Application, Part I
At IPUMS we continuously enhance our data products with newly available datasets, adding new variables and improvements to existing variables. We do this with the “Data Conversion Program”, a C++ application built to transform census and survey data into “harmonized” micro-data. When you visit ipums.org and make data extracts, you’re downloading data developed with the DCP.
[Read More]
Python 3 Language Notes
Notes on the Python 3 Language
[Read More]
Reparations
Introduction
[Read More]
Save the USPS
The U.S. Postal Service is required to fund itself by charging for services like a private business. Since the beginning of the COVID-19 outbreak, mail volume has dropped by more than half, severely undercutting its budget.
[Read More]
The Parquet Data Format Landscape
As you begin to handle Parquet data with tools in more than one framework and language, you’ll probably wonder how all these related pieces fit together. Here is a summary of the data formats, libraries, and frameworks you will encounter when working with Parquet data and Spark.
[Read More]
SF Worth Reading
The book recommendations list has moved to sfworthreading.com. It’s a static site built with Jekyll and a Ruby “updater” script I wrote to do some busy work for me that Jekyll won’t, like building a custom authors index.
[Read More]