We have been running a similar setup (Postgres -> Foreign Data Wrappers -> LLVM) at AdRoll for over a year. We keep over 100 TB of raw data in memory, compressed.
We managed to build our solution mostly in Python(!) using Numba for JIT and a number of compression tricks. More about it here:
I saw a version of your talk at another venue. At the time I was learning about bitmap indexes (because we needed OLAP capabilities on another database engine, not Postgres). Your talk strongly reminded me of them, but in the end I never got to investigate the difference.
How does your technology differ from bitmap indexes? Have you solved the performance problem of updating random rows, for example?
Combining OLAP with OLTP (large aggregate queries + lots of real-time updates) is the holy grail, and one that Cassandra, for example, has addressed rather nicely.
Compressed bitmap indexes are awesome. As with most indexes, the random-row-update problem is best addressed by using a log-structured merge tree and amortizing your index updates: just keep an in-memory buffer of recently updated rows.
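A toy sketch of that buffering idea (class and threshold are illustrative, and plain Python sets stand in for compressed bitmaps):

```python
class BufferedBitmapIndex:
    """Toy bitmap index over one column. Random updates land in an
    in-memory buffer and are merged into the bitmaps in batches,
    LSM-style, instead of rewriting compressed bitmaps per update."""

    def __init__(self, flush_threshold=1024):
        self.bitmaps = {}        # value -> set of row ids (stand-in for a compressed bitmap)
        self.buffer = {}         # row id -> latest value, not yet merged
        self.flush_threshold = flush_threshold

    def update(self, row_id, value):
        self.buffer[row_id] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Amortized merge: one pass over the buffer touches the bitmaps.
        for row_id, value in self.buffer.items():
            for bm in self.bitmaps.values():
                bm.discard(row_id)           # clear any stale posting
            self.bitmaps.setdefault(value, set()).add(row_id)
        self.buffer.clear()

    def rows_with(self, value):
        # A read consults both the merged bitmaps and the buffer.
        merged = set(self.bitmaps.get(value, set()))
        for row_id, v in self.buffer.items():
            merged.discard(row_id)
            if v == value:
                merged.add(row_id)
        return merged
```

Reads get slightly slower (two structures to check), but random updates become buffer writes plus an occasional batched merge.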
If you are doing mostly sum/count-style work and can tolerate some inaccuracy, you can consider HyperLogLog...
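For the curious, a minimal HyperLogLog looks something like this (parameters and names are illustrative; a production version would use a better hash and finer bias correction):

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog: approximate COUNT(DISTINCT ...) using m
    small registers (a few KB) instead of storing every value, at the
    cost of a few percent of error."""

    def __init__(self, b=10):
        self.b = b                        # 2^b registers
        self.m = 1 << b
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        # 64-bit hash: low b bits pick a register, the rest feed rho.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        j = h & (self.m - 1)
        w = h >> self.b
        rho = (64 - self.b) - w.bit_length() + 1     # leading zeros + 1
        self.registers[j] = max(self.registers[j], rho)

    def count(self):
        est = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(self.m / zeros)  # small-range correction
        return est
```

With b=10 (1024 registers, ~1 KB of state) the standard error is around 3%, which is plenty for dashboard-style aggregates.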
You mentioned that there are some regularities in different data sets that can be used to increase the efficiency of their encoding. Does this mean that you need to write a different foreign data wrapper for each data set?
Did you do anything about join pushdown (which isn't supported in core PostgreSQL yet)? Apologies if this is in your talk - I looked at the slides and couldn't see anything.
Yeah, the lack of aggregate and join pushdown is annoying.
Our data model ensures that we don't have to do huge joins on the fly, which would be a bad idea anyway. We have a workaround to distribute medium-scale joins that occur frequently. Small joins are handled fine by Postgres as usual.
I would be curious to learn more about Vitesse. I couldn't find many technical details besides the mailing list post.
It's not as easy as it could be, but you can accomplish aggregate pushdown using executor hooks in your FDW. As an example, I'd point you to our (Citus Data) recent work on doing this with our columnar-store foreign data wrapper:
And what is old is new again. For reference, the System R database used this technique of compiling SQL queries to machine language in the '70s:
"The approach of compiling SQL statements into machine code was one of the most successful parts of the System R project. We were able to generate a machine-language subroutine to execute any SQL statement of arbitrary complexity..."
If you're interested in query compilation, check out the HyPer DBMS prototype at [1]. It efficiently compiles SQL (SQL92++) to super-fast LLVM code (faster than Vitesse). It's not open source, but you can download a binary [4] or try it out online using the "WebInterface" link and read about the query compiler here [2]. It also has lots of other goodies, like SIMD-powered CSV parsing (about 4x faster than Postgres) [3].
Rumor has it that the prototype will be commercialized soon ;)
EDIT: It also has NUMA-aware multi-threaded query execution.
Looks like the Vitesse product was inspired by this research, even though HyPer looks a lot more mature (though it's still a research project). Looking forward to seeing the product.
To be fair, they've made it very clear how it works:
1) They have optimized the (existing?) CSV import code to use SSE instructions for faster import.
2) For a sufficiently complicated query, they will compile it to native code using LLVM. They presumably precompile most of Postgres (or at least the execution parts) to LLVM IR, and then convert enough of the execution plan into code so that the LLVM optimizer can optimize it (inlining, branch prediction, dead code elimination etc). If they are able to persuade LLVM to optimize the per-row decoding, I think that could be a huge win.
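A toy analogue in Python of what that buys you: instead of interpreting an expression tree node-by-node for every row, generate and compile one fused filter+aggregate function per query. All names here are illustrative; an LLVM-based engine does the same thing at the machine-code level, so the optimizer can inline and specialize the per-row work.

```python
def interpret(plan, row):
    """Tree-walking evaluation: one dispatch per plan node per row."""
    op = plan[0]
    if op == "col":
        return row[plan[1]]
    if op == "const":
        return plan[1]
    if op == ">":
        return interpret(plan[1], row) > interpret(plan[2], row)
    if op == "+":
        return interpret(plan[1], row) + interpret(plan[2], row)
    raise ValueError(op)

def emit(plan):
    """Turn a plan node into a Python expression string."""
    op = plan[0]
    if op == "col":
        return f"row[{plan[1]!r}]"
    if op == "const":
        return repr(plan[1])
    return f"({emit(plan[1])} {op} {emit(plan[2])})"

def compile_query(predicate, projection):
    """Build SUM(projection) WHERE predicate as one specialized function,
    compiled once and reused for every row."""
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {emit(predicate)}:\n"
        f"            total += {emit(projection)}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<query>", "exec"), namespace)
    return namespace["q"]
```

For example, SELECT sum(a + b) WHERE a > 10 becomes one tight loop with no per-node dispatch; the native-code version of this trick is where the big wins come from.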
It's a great accomplishment; I do wish they had contributed it to Postgres, but I can't blame them for not doing so (they need to eat!).
I also thought this was somehow related to Google's Vitess. I was rather excited at the prospect of Postgres compatibility, which I know they've been working on, and which we use in our business.
I think this is a fascinating approach to improving query speed. It certainly won't be applicable to every use-case, but it seems like there's a lot of value in it.
It really depends on your workload. If your working set fits into memory and is read-mostly, your I/O is basically irrelevant. Even if you're not in-memory, if you are doing mostly sequential reads (like the OLAP queries they are benchmarking), each drive can do ~200 MB/s, so it is not too difficult to become CPU-bound if you are doing any significant processing on each row (particularly if your rows are small!)
For OLTP queries, particularly with working set > RAM or lots of writes, your disk I/O is probably the bottleneck (your IOPS, actually, which is why SSDs can be so valuable). Pretty sure they're not targeting that use case, though!
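To put rough numbers on the sequential-scan claim above (row size and clock rate are illustrative assumptions):

```python
# Back-of-envelope: at ~200 MB/s of sequential read with small rows,
# the per-row CPU budget is tiny, so per-row processing quickly
# dominates I/O.
DISK_MBPS = 200            # one spinning disk, sequential read
ROW_BYTES = 100            # a small row
CPU_HZ = 3_000_000_000     # a 3 GHz core

rows_per_sec = DISK_MBPS * 1_000_000 // ROW_BYTES   # rows streamed per second
cycles_per_row = CPU_HZ // rows_per_sec             # CPU cycles available per row

print(f"{rows_per_sec:,} rows/s -> {cycles_per_row:,} cycles/row budget")
```

A budget of ~1,500 cycles per row disappears fast under interpreted per-row dispatch, which is exactly why compiling the query pays off.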
In my experience (which isn't the sum total of all human experience), our OLAP queries bottleneck at the fiber interconnect between the storage controller and the OS; CPU still isn't the problem for our OLAP systems.
Your point is well taken, though, that great strides have been made in storage, such that we could be back to CPU as the bottleneck.
It is a reasonable assumption given he said "RAID 10". By "cheap" I mean all spinning disks. Enterprise SSDs are an order of magnitude more expensive than the best spinning platters: ~$25k for 3 TB.
RAID 10 basically just adds performance over RAID 1. You might RAID 1 SSDs, but RAID 10 of SSDs is likely to be a very expensive way to hit your I/O bus limits.
Given that, spinning disk is a reasonable assumption, and the accusation of a straw man was unnecessarily rude.
I don't find it to be unnecessarily rude. It was a legitimate critique of the poster's own assumptions. I guess that because I'm not an HN "insider" I'm not allowed to challenge someone's assumptions. Only once you hit the magical karma level are you able to do such things.
The most common bottleneck in my work experience has been disk I/O when the data set would not fit into memory, and every professional data set I've worked with exceeds 1 TB. Perhaps there are other bottlenecks, but disk seeks and (at one place) their iSCSI-over-Gbit-Ethernet nonsense dominated the performance challenges.
Usually yes, but that is why you take advantage of specialized CPU instructions for bulk loading and operating on data. From the description, that is part of the optimization these folks are taking advantage of (see the comment mentioning SSE instructions).
As someone who has used PostGIS to map every city in the US to every county in the US over the last two centuries for genealogical purposes, it would probably make things nicer. (Some of the results are already in the Library of Congress.)
http://tuulos.github.io/pydata-2014/#/
https://www.youtube.com/watch?v=xnfnv6WT1Ng