Today I’m going to focus on optimizations inspired by data-oriented design. Previously, I made a handful of API improvements that a flame graph profile made obvious, and then replaced all uses of the STL “unordered_map” type with a more efficient flat hash map implementation.
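Data-oriented design starts from how the hot loops actually read memory. A minimal sketch of the idea (not code from the DCP; the `PersonAoS` and `PeopleSoA` types and their fields are hypothetical) contrasts the classic array-of-structures layout with a structure-of-arrays layout. A loop that only sums ages touches far fewer bytes when the ages are stored contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record, array-of-structures style. With 12 bytes of integers
// plus 52 name bytes, each record is 64 bytes, so an age-only scan uses just
// 4 of every 64 bytes it pulls into cache.
struct PersonAoS {
    std::int32_t age;
    std::int32_t income;
    std::int32_t household_id;
    char name[52];
};

std::int64_t total_age_aos(const std::vector<PersonAoS>& people) {
    std::int64_t total = 0;
    for (const auto& p : people) total += p.age;  // strides over whole records
    return total;
}

// The same data, structure-of-arrays style: each field is its own column.
struct PeopleSoA {
    std::vector<std::int32_t> age;
    std::vector<std::int32_t> income;
    std::vector<std::int32_t> household_id;
    std::vector<std::string> name;

    void add(std::int32_t a, std::int32_t inc, std::int32_t hh, std::string n) {
        age.push_back(a);
        income.push_back(inc);
        household_id.push_back(hh);
        name.push_back(std::move(n));
    }

    std::size_t size() const {
        // Invariant of this layout: every column has the same length.
        assert(income.size() == age.size() && household_id.size() == age.size() &&
               name.size() == age.size());
        return age.size();
    }
};

std::int64_t total_age_soa(const PeopleSoA& people) {
    std::int64_t total = 0;
    for (std::int32_t a : people.age) total += a;  // dense, sequential reads
    return total;
}
```

When a loop reads every field of each record, the array-of-structures layout can do just as well; the payoff comes from scans that read only a few columns.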
Notes for a Sequel/JRuby/SQLite Bug
A Bug in Sequel, JRuby + SQLite DATETIME type columns
Reading SQLite3 DATETIME columns through the Sequel gem’s “jdbc” SQLite adapter produces different Ruby types than the adapter Sequel uses under MRI. Under standard Ruby the result sets contain ‘Time’ objects, but under JRuby they contain ‘Date’ objects. ‘Date’ objects have no time component, while ‘Time’ objects carry both date and time.
Exploring the Crystal Language
Type Inference
As a long-time Rubyist, I’ve been intrigued by the Crystal language for a while. Crystal is a compiled, statically typed language that uses Ruby syntax pretty much wherever it can. Now that Crystal is approaching a 1.0 release later this year, I wanted to try it out.
Developing Software is Developing Knowledge
Why Agile Methodologies Work
Software development is the art of transforming vague requirements into precise statements executable by a machine, resulting in a working software system. Almost by definition, you can’t begin a project with perfect requirements; if you could, those requirements would already be executable, or translatable into something executable, with no need for further development. “Agile” development methodologies have persisted because they are designed around this basic truth.
Optimization Part IV: Profile Guided Optimization
In a previous article on optimization, we looked at how to read a flame graph and find areas of a program that could benefit from rewriting the source code.
If You Liked Pascal
Glimmers of Pascal in Three Modern Programming Languages
Each of these relatively new languages takes after Wirth-family languages like Pascal or Oberon, and particularly Turbo Pascal. Whether in spirit, in small details, or in tooling, you’ll find something familiar. Here are my somewhat random observations on Nim, Kotlin, and Go.
Optimization Part III: Better Hash Tables
After you have done the obvious algorithmic complexity analysis on the C++ code you’ve written, what’s next on the list of ways to optimize your application? How about looking in libraries for more efficient implementations of standard data structures?
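The standard guarantees that references to `std::unordered_map` elements stay valid across rehashing, which in practice means each element lives in its own separately allocated node. A “flat” table instead stores entries directly in one contiguous array. The post doesn’t say here which library was adopted, so the following is only a minimal sketch of the general technique: a hypothetical `FlatMap` using open addressing with linear probing, no erase support, and a maximum load factor of 0.75.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal flat hash map: keys and values live inline in one
// vector of slots. K and V must be default-constructible. No erase.
template <typename K, typename V>
class FlatMap {
public:
    explicit FlatMap(std::size_t capacity = 16) : slots_(round_up_pow2(capacity)) {}

    // Inserts or overwrites. Grows first so the load factor stays <= 0.75,
    // which guarantees every probe sequence eventually hits an empty slot.
    void insert(const K& key, const V& value) {
        if ((size_ + 1) * 4 > slots_.size() * 3) grow();
        insert_no_grow(key, value);
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* find(const K& key) const {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = std::hash<K>{}(key) & mask;; i = (i + 1) & mask) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;  // empty slot ends the probe sequence
            if (s.key == key) return &s.value;
        }
    }

    std::size_t size() const { return size_; }

private:
    struct Slot {
        K key{};
        V value{};
        bool used = false;
    };

    static std::size_t round_up_pow2(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    void insert_no_grow(const K& key, const V& value) {
        // Power-of-two capacity lets us replace modulo with a bit mask.
        assert((slots_.size() & (slots_.size() - 1)) == 0);
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = std::hash<K>{}(key) & mask;; i = (i + 1) & mask) {
            Slot& s = slots_[i];
            if (!s.used) {
                s.key = key;
                s.value = value;
                s.used = true;
                ++size_;
                return;
            }
            if (s.key == key) {
                s.value = value;  // existing key: overwrite in place
                return;
            }
        }
    }

    // Doubles capacity and re-inserts every occupied slot.
    void grow() {
        std::vector<Slot> old = std::move(slots_);
        slots_ = std::vector<Slot>(old.size() * 2);
        size_ = 0;
        for (const Slot& s : old)
            if (s.used) insert_no_grow(s.key, s.value);
    }

    std::vector<Slot> slots_;
    std::size_t size_ = 0;
};
```

Production flat maps typically add deletion (via tombstones or backward-shift), stronger hash mixing, and faster probing schemes; this sketch only shows why a lookup can stay within one contiguous array instead of chasing node pointers.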
Optimization Part II: Targeted Optimizations Assisted by Flame Graphing
Early last year IPUMS moved production of IPUMS-International micro-data to the latest version of the core DCP and a new data editing API. In doing so, we discovered a number of places where the new API, while performing better than the old one on our USA and CPS test datasets, performed worse than expected on some of the IPUMSI datasets. That wasn’t a big deal, except for a few datasets that took twenty or thirty times longer to process than we expected.
Optimizing a Data-Intensive C++ Application, Part I
At IPUMS we continuously enhance our data products with newly available datasets, adding new variables and improvements to existing variables. We do this with the “Data Conversion Program”, a C++ application built to transform census and survey data into “harmonized” micro-data. When you visit ipums.org and make data extracts, you’re downloading data developed with the DCP.
Python 3 Language Notes
Notes on the Python 3 Language