Our Spark cluster setup
Running our own Spark cluster has proved to be a valuable experience for us.
BRIN indexes in Postgres 9.5
We recently had a chance to explore one of Postgres 9.5’s new features: block range indexes.
How we hire: A roundtable discussion
Hiring is hard. We assembled our engineering hiring committee — senior developers Mark, Colin, Graeme, and Steven — for an open discussion on what goes on behind the scenes when you apply for a position at Sortable.
Spark Performance: Is the DirectParquetOutputCommitter really better?
Why can the DirectParquetOutputCommitter be more efficient? How much more efficient? Are there any downsides?
Compromising with legacy code: an anecdote
Last year, while changing some of our legacy code, I learned a bit about sacrificing technical correctness for other, less technical goals.
Building a sampling profiler with 30-year-old technology
What if you need to diagnose a performance issue in a non-invasive manner? I recently found myself in this situation. Here’s my solution: a sampling profiler.
Improving Spark performance with repartitioning
One of the tricks we’ve found to improve the performance of Spark jobs is to change the partitioning of our data. We’ll illustrate how this can help with a simple example.