We have been running a similar setup (Postgres -> Foreign Data Wrappers -> LLVM) at AdRoll for over a year. We keep over 100 TB of raw data in memory, compressed.
We managed to build our solution mostly in Python(!) using Numba for JIT and a number of compression tricks. More about it here:
I saw a version of your talk at another venue. At the time I was learning about bitmap indexes (because we needed OLAP capabilities on another database engine, not Postgres). Your talk strongly reminded me of them, but in the end I never got to investigate the difference.
How does your technology differ from bitmap indexes? Have you solved the performance problem of updating random rows, for example?
Combining OLAP with OLTP (large aggregate queries + lots of real-time updates) is the holy grail, and one that Cassandra, for example, has addressed rather nicely.
Compressed bitmap indexes are awesome. As with most indexes, the random-row-update problem is best addressed by using a log-structured merge tree and amortizing your index updates: just keep an in-memory buffer of recently updated rows.
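A toy sketch of that buffering idea (class and threshold are illustrative, and plain Python sets stand in for compressed bitmaps):

```python
class BufferedBitmapIndex:
    """Toy bitmap index over one column. Random updates land in an
    in-memory buffer and are merged into the bitmaps in batches,
    LSM-style, instead of rewriting compressed bitmaps per update."""

    def __init__(self, flush_threshold=1024):
        self.bitmaps = {}        # value -> set of row ids (stand-in for a compressed bitmap)
        self.buffer = {}         # row id -> latest value, not yet merged
        self.flush_threshold = flush_threshold

    def update(self, row_id, value):
        self.buffer[row_id] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Amortized merge: one pass over the buffer touches the bitmaps.
        for row_id, value in self.buffer.items():
            for bm in self.bitmaps.values():
                bm.discard(row_id)           # clear any stale posting
            self.bitmaps.setdefault(value, set()).add(row_id)
        self.buffer.clear()

    def rows_with(self, value):
        # A read consults both the merged bitmaps and the buffer.
        merged = set(self.bitmaps.get(value, set()))
        for row_id, v in self.buffer.items():
            merged.discard(row_id)
            if v == value:
                merged.add(row_id)
        return merged
```

Reads get slightly slower (two structures to check), but random updates become buffer writes plus an occasional batched merge.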
If you are doing mostly sum/count-style work and can tolerate some inaccuracy, you can consider HyperLogLog...
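For the curious, a minimal HyperLogLog looks something like this (parameters and names are illustrative; a production version would use a better hash and finer bias correction):

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog: approximate COUNT(DISTINCT ...) using m
    small registers (a few KB) instead of storing every value, at the
    cost of a few percent of error."""

    def __init__(self, b=10):
        self.b = b                        # 2^b registers
        self.m = 1 << b
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        # 64-bit hash: low b bits pick a register, the rest feed rho.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        j = h & (self.m - 1)
        w = h >> self.b
        rho = (64 - self.b) - w.bit_length() + 1     # leading zeros + 1
        self.registers[j] = max(self.registers[j], rho)

    def count(self):
        est = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(self.m / zeros)  # small-range correction
        return est
```

With b=10 (1024 registers, ~1 KB of state) the standard error is around 3%, which is plenty for dashboard-style aggregates.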
You mentioned that there are some regularities in different data sets that can be used to increase the efficiency of their encoding. Does this mean that you need to write a different foreign data wrapper for each data set?
Did you do anything about join pushdown (which isn't supported in core PostgreSQL yet)? Apologies if this is in your talk - I looked at the slides and couldn't see anything.
Yeah, the lack of aggregate and join pushdown is annoying.
Our data model ensures that we don't have to do huge joins on the fly, which would be a bad idea anyway. We have a workaround to distribute medium-scale joins that occur frequently. Small joins are handled fine by Postgres as usual.
I would be curious to learn more about Vitesse. I couldn't find many technical details besides the mailing list post.
It's not as easy as it could be, but you can accomplish aggregate pushdown using executor hooks in your FDW. As an example, I'd point you to our (Citus Data) recent work on doing this with our columnar-store foreign data wrapper:
And what is old is new again. For reference, the System R database used this technique of compiling SQL queries to machine language in the '70s:
"The approach of compiling SQL statements into machine code was one of the most successful parts of the System R project. We were able to generate a machine-language subroutine to execute any SQL statement of arbitrary complexity..."
If you're interested in query compilation, check out the HyPer DBMS prototype at [1]. It efficiently compiles SQL (SQL92++) to super-fast LLVM code (faster than Vitesse). It's not open source, but you can download a binary [4] or try it out online using the "WebInterface" link and read about the query compiler here [2]. It also has lots of other goodies, like SIMD-powered CSV parsing (about 4x faster than Postgres) [3].
Rumor has it that the prototype will be commercialized soon ;)
EDIT: It also has NUMA-aware multi-threaded query execution.
Looks like the Vitesse product was inspired by this research, even though HyPer looks a lot more mature (though it's still a research project). Looking forward to seeing the product.
To be fair, they've made it very clear how it works:
1) They have optimized the (existing?) CSV import code to use SSE instructions for faster import.
2) For a sufficiently complicated query, they will compile it to native code using LLVM. They presumably precompile most of Postgres (or at least the execution parts) to LLVM IR, and then convert enough of the execution plan into code so that the LLVM optimizer can optimize it (inlining, branch prediction, dead code elimination etc). If they are able to persuade LLVM to optimize the per-row decoding, I think that could be a huge win.
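A toy analogue in Python of what that buys you: instead of interpreting an expression tree node-by-node for every row, generate and compile one fused filter+aggregate function per query. All names here are illustrative; an LLVM-based engine does the same thing at the machine-code level, so the optimizer can inline and specialize the per-row work.

```python
def interpret(plan, row):
    """Tree-walking evaluation: one dispatch per plan node per row."""
    op = plan[0]
    if op == "col":
        return row[plan[1]]
    if op == "const":
        return plan[1]
    if op == ">":
        return interpret(plan[1], row) > interpret(plan[2], row)
    if op == "+":
        return interpret(plan[1], row) + interpret(plan[2], row)
    raise ValueError(op)

def emit(plan):
    """Turn a plan node into a Python expression string."""
    op = plan[0]
    if op == "col":
        return f"row[{plan[1]!r}]"
    if op == "const":
        return repr(plan[1])
    return f"({emit(plan[1])} {op} {emit(plan[2])})"

def compile_query(predicate, projection):
    """Build SUM(projection) WHERE predicate as one specialized function,
    compiled once and reused for every row."""
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {emit(predicate)}:\n"
        f"            total += {emit(projection)}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<query>", "exec"), namespace)
    return namespace["q"]
```

For example, SELECT sum(a + b) WHERE a > 10 becomes one tight loop with no per-node dispatch; the native-code version of this trick is where the big wins come from.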
It's a great accomplishment; I do wish they had contributed it to Postgres, but I can't blame them for not doing so (they need to eat!).
I also thought this was somehow related to Google's Vitess. I was rather excited at the prospect of Postgres compatibility, which I know they've been working on, and which we use in our business.
I think this is a fascinating approach to improving query speed. It certainly won't be applicable to every use-case, but it seems like there's a lot of value in it.
It really depends on your workload. If your working set fits into memory and is read-mostly, your I/O is basically irrelevant. Even if you're not in-memory, if you are doing mostly sequential reads (like the OLAP queries they are benchmarking), each drive can do ~200 MB/s, so it is not too difficult to become CPU-bound if you are doing any significant processing on each row (particularly if your rows are small!)
For OLTP queries, particularly with working set > RAM or lots of writes, your disk I/O is probably the bottleneck (your IOPS, actually, which is why SSDs can be so valuable). Pretty sure they're not targeting that use case, though!
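To put rough numbers on the sequential-scan claim above (row size and clock rate are illustrative assumptions):

```python
# Back-of-envelope: at ~200 MB/s of sequential read with small rows,
# the per-row CPU budget is tiny, so per-row processing quickly
# dominates I/O.
DISK_MBPS = 200            # one spinning disk, sequential read
ROW_BYTES = 100            # a small row
CPU_HZ = 3_000_000_000     # a 3 GHz core

rows_per_sec = DISK_MBPS * 1_000_000 // ROW_BYTES   # rows streamed per second
cycles_per_row = CPU_HZ // rows_per_sec             # CPU cycles available per row

print(f"{rows_per_sec:,} rows/s -> {cycles_per_row:,} cycles/row budget")
```

A budget of ~1,500 cycles per row disappears fast under interpreted per-row dispatch, which is exactly why compiling the query pays off.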
In my experience (which isn't the sum total of all human experience), our OLAP queries bottleneck at the fiber interconnect between the storage controller and the OS; CPU still isn't the problem for our OLAP systems.
Your point is well taken, though, that great strides have been made in storage, such that we could be back to CPU as the bottleneck.
It is a reasonable assumption given he said "RAID 10". By "cheap" I mean all spinning disks. Enterprise SSDs are an order of magnitude more expensive than the best spinning platters: ~$25k for 3 TB.
RAID 10 basically just adds performance over RAID 1. You might RAID 1 SSDs, but RAID 10 of SSDs is likely to be a very expensive way to hit your I/O bus limits.
Given that, spinning disk is a reasonable assumption, and the accusation of a straw man was unnecessarily rude.
I don't find it to be unnecessarily rude. It was a legitimate critique of the poster's own assumptions. I guess that because I'm not an HN "insider" I'm not allowed to challenge someone's assumptions. Only once you hit the magical karma level are you able to do such things.
The most common bottleneck in my work experience has been disk I/O when the data set would not fit into memory, and every professional data set I've worked with exceeds 1 TB. Perhaps there are other bottlenecks, but disk seeks and (at one place) their iSCSI-over-Gbit-Ethernet nonsense dominated the performance challenges.
Usually yes, but that is why you take advantage of specialized CPU instructions for bulk loading and operating on data. From the description, that is part of the optimization these folks are taking advantage of (see the comment mentioning SSE instructions).
As someone who has used PostGIS to map every city in the US to every county in the US over the last two centuries for genealogical purposes, it would probably make things nicer. (Some of the results are already in the Library of Congress.)
http://tuulos.github.io/pydata-2014/#/
https://www.youtube.com/watch?v=xnfnv6WT1Ng