At 3.7GHz that's roughly 3700 clock cycles per row. With something that low, I'd be really interested to see a breakdown of how many cycles were spent doing what.
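A back-of-the-envelope version of that arithmetic (the ~1M rows/sec throughput is my assumption for illustration, not a number taken from the benchmark suite):

```python
# Rough cycles-per-row estimate. The throughput figure below is an
# assumption for illustration, not a measured benchmark number.
clock_hz = 3.7e9          # 3.7 GHz CPU
rows_per_sec = 1_000_000  # assumed driver throughput
cycles_per_row = clock_hz / rows_per_sec
print(cycles_per_row)  # 3700.0
```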
This is really awesome, I'm going to give it a try in my latest project. Have there been any apps/sites with large amounts of I/O running this yet? Curious how reliable it is at this stage.
We open-sourced asyncpg just a couple of weeks ago, and right now it has beta status. That's not because we're aware of any bugs in it, or because we plan to add a lot of new features. So I guess you should give it a try, carefully ;)
Python and Node.js applications are usually deployed in a multi-process configuration (as are Go apps run without GOMAXPROCS set). So in production, applications in all of these languages use all CPU cores.
For the benchmarks, we didn't want to complicate things with a multi-process setup; the idea was to compare the raw performance of the drivers themselves.
They don't use the multiprocessing module, but most of them are deployed with something like uWSGI or Gunicorn, which spawns the worker processes, forwards requests to them, and relays the responses back.
> asyncpg extensively uses PostgreSQL prepared statements. This is an important optimization feature, as it allows to avoid repeated parsing, analysis, and planning of queries. Additionally, asyncpg caches the data I/O pipeline for each prepared statement.
It would be great if psycopg2 did this, at least within the context of executemany(). You don't need async to use prepared statements. But even if it did, it wouldn't be an amazing speed-everywhere kind of thing, just a win for bulk INSERT/UPDATE/DELETE. A prepared statement is linked to a cursor and a statement handle, so it's not like you can have hundreds of them queued up in a cache without a great deal of complexity.
The TL;DR of the blog post you mention is that async itself usually has little to do with speed when talking to a database. The database is typically on the same network as your application, and you're typically using no more than a few dozen connections per process (PostgreSQL uses a process per connection, so using thousands of connections isn't feasible anyway), so the overhead of async, and especially Python 3's asyncio, slows things down considerably without offering any opportunity to regain that speed. The GIL doesn't block on IO, so threads work just fine, and most applications are CRUD / transactional applications, which means you can't spawn off several dozen queries within a transaction at the same time anyway; you have to wait for each one to complete serially regardless.
I'd be very wary of a new PG driver, especially one that implements the wire protocol itself. pg8000 does this, and also does prepared statements, and while it doesn't take the speed approach here, it has serious problems supporting all the features and datatypes that psycopg2 does. It's been around for years and reliably breaks on many PostgreSQL features (and PostgreSQL introduces new features like a firehose). Writing and maintaining a driver for PostgreSQL is not an easy task.
Prepared statements were used in all benchmarks where supported. Please don't dismiss the results so easily. The source of the performance increase here is not the use of prepared statements, but an efficient implementation of binary type I/O and of the protocol itself.
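As a generic sketch of what binary type I/O means (this is not asyncpg's actual codec code): a PostgreSQL int8 in the binary wire format is just eight big-endian bytes, decoded with a single struct call instead of parsing a decimal string.

```python
import struct

def encode_int8(value):
    # PostgreSQL's binary format for int8: 8 bytes, network (big-endian) order.
    return struct.pack("!q", value)

def decode_int8(buf):
    return struct.unpack("!q", buf)[0]

wire = encode_int8(123456789)
print(wire.hex())         # 00000000075bcd15
print(decode_int8(wire))  # 123456789
```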
> I'd be very wary of a new PG driver especially one that implements the wire protocol itself.
> pg8000 does this, and also does prepared statements, and while it doesn't have the speed approach here, it also has serious problems supporting all the features and datatypes that psycopg2 does.
asyncpg supports 100% of pg_catalog types (we have a test for that). Additionally, psycopg2 doesn't _really_ support all the features and datatypes; it simply returns unparsed text in many cases. You have to write lots of typecasters to convince it otherwise.
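For reference, here's roughly what writing such a typecaster for psycopg2 looks like, using the point type as an example (the parser is a simplified sketch; the registration calls need a live psycopg2 connection, so they are shown commented out):

```python
def cast_point(value, cur):
    # psycopg2 hands over the raw text representation, e.g. "(1.0,2.0)";
    # turning it into Python objects is up to us.
    if value is None:
        return None
    x, y = value.strip("()").split(",")
    return float(x), float(y)

# Registration against a live connection (requires psycopg2; 600 is the
# OID of the built-in point type):
# from psycopg2.extensions import new_type, register_type
# POINT = new_type((600,), "POINT", cast_point)
# register_type(POINT, conn)

print(cast_point("(1.0,2.0)", None))  # (1.0, 2.0)
```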
> It's been around for years and reliably breaks on many Postgresql features (and Postgresql introduces new features like a firehose). Writing and maintaining a driver for Postgresql is not an easy task.
Please. Simply because something has been around for a while is no reason to stifle new development and innovation.
I have no intention to stifle anything. I'm only raising the point that this seems like a really new driver, and it's likely that it would take a while before it approaches the stability of psycopg2. That is, the stability of psycopg2 is not to be so lightly thrown in the trash.
Does your driver seamlessly handle reading and writing of three-dimensional arrays of JSONB structures, and arrays of custom enumerated types?
edit:
> but efficient implementation of the binary type I/O and the protocol itself.
that's great too! you don't need async for that either.
I guess I am frustrated that you folks put all this work into a great driver and then locked it in the largely useless dungeon of asyncio, where all the libraries are totally useless to those of us who don't think node.js is the programming model for all cases.
> I have no intention to stifle anything. I'm only raising the point that this seems like a really new driver, and it's likely that it would take a while before it approaches the stability of psycopg2. That is, the stability of psycopg2 is not to be so lightly thrown in the trash.
We in no way question the merits and the stability of psycopg2. We created asyncpg to solve the challenges we face in the development of EdgeDB; it's not just a toy project for fun.
> Does your driver seamlessly handle reading and writing of three-dimensional arrays of JSONB structures, and arrays of custom enumerated types ?
Yes it does.
> edit:
> > but efficient implementation of the binary type I/O and the protocol itself.
> that's great too! you don't need async for that either.
> I guess I am frustrated that you folks put all this work into a great driver and then locked it in the largely useless dungeon of asyncio, where all the libraries are totally useless to those of us who don't think node.js is the programming model for all cases.
I'm sorry to hear that. Nobody forces you to use asyncio or asyncpg. But please don't try to claim that something is a "useless dungeon" simply because you disagree.
The tone used is indeed not the proper one, but while I'm a big asyncio advocate, I think it would be beneficial if you split your project in two, separating the IO from the rest (like hyper does: https://github.com/Lukasa/hyper).
The rationale being:
- Python is not just async. It is currently mostly sync, and a lot of the community would benefit from your awesome project;
- you'd get contributions from both the async and sync communities;
- async may sometimes not be worth it for databases (http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...);
- some people will prefer to use threads or gevent;
- you'd see compat plugins created for current ORMs;
That aside, I will use your project as is, since I'm deep in asyncio these days. Thank you.
> I think it would be beneficial if you would split your project in 2, separating the IO from the rest (like hyper does https://github.com/Lukasa/hyper).
I'd like to do that, but we don't have the resources. If we had a couple of capable devs volunteering on GitHub to do this, I'd be glad to review patches and guide the development.
> I guess I am frustrated that you folks put all this work into a great driver and then locked it in the largely useless dungeon of asyncio, where all the libraries are totally useless to those of us who don't think node.js is the programming model for all cases
I see your point. Nothing is really "locked away". The protocol parser and data codecs in asyncpg are independent of the I/O model and loop semantics. With effort, it's possible to build a sync interface on top. It's just not a priority, as our use cases are async.
And it's just as frustrating to those of us who do use asyncio (because it really is excellent) when someone needlessly ties a library to not using async. It's harder to maintain something that does both.
> The TL;DR of the blog post you mention is that async itself usually has little to do with speed when talking to a database. The database is typically on the same network as your application, and you're typically using no more than a few dozen connections per process (PostgreSQL uses a process per connection, so using thousands of connections isn't feasible anyway), so the overhead of async, and especially Python 3's asyncio, slows things down considerably without offering any opportunity to regain that speed.
This simply isn't true for all use cases. If you have to manage hundreds or thousands of connections (some of which can be long-lived sessions such as websockets), you have no other option but the async approach.
To minimize the number of connections to the DB you should use a decent caching layer (redis/memcached will do). Once you have that and use a connection pool, you'll be able to do a lot more with asyncio than with threads.
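A minimal sketch of that setup, with a stub coroutine standing in for a real asyncpg query: many concurrent client tasks share a small, fixed number of DB connections guarded by a semaphore.

```python
import asyncio

MAX_DB_CONNECTIONS = 5  # keep the number of PostgreSQL connections small

async def fake_query(client_id):
    # Stand-in for an actual asyncpg query over a pooled connection.
    await asyncio.sleep(0.001)
    return f"row-for-{client_id}"

async def handle_client(sem, client_id):
    # Hundreds of concurrent client tasks, only a handful of DB connections.
    async with sem:
        return await fake_query(client_id)

async def main():
    sem = asyncio.Semaphore(MAX_DB_CONNECTIONS)
    return await asyncio.gather(*(handle_client(sem, i) for i in range(200)))

results = asyncio.run(main())
print(len(results))  # 200
```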
> The GIL doesn't block on IO, so threads work just fine, and most applications are CRUD / transactional applications, which means you can't spawn off several dozen queries within a transaction at the same time anyway; you have to wait for each one to complete serially regardless.
If you have 10 threads and relatively low server load, the threads will work just fine. However, the GIL will make things really slow under load.
Every time I talk about my blog post, I am super careful to qualify, as in this case: "usually has little to do with speed" and other qualifications. The blog post does as well. But every time, I get the response, "but that's not always TRUE!" We agree. It is not always true. But it is usually true :).
> If you have to manage 100-1000s of connections
Which, as I said, you usually are not. PostgreSQL connections are expensive. You do not want hundreds or thousands of them.
> some of which can be long-lived sessions such as websockets
You use websockets to talk to your PostgreSQL database? Where did I say I was talking about HTTP web services?
> However, the GIL will make things really slow under load.
Not under IO load with a hundred threads or so. With thousands, sure. Otherwise, no.
> Every time I talk about my blog post, I am super careful to qualify, as in this case: "usually has little to do with speed" and other qualifications. The blog post does as well.
I created an asyncpg suite for your bigdata benchmark (https://bitbucket.org/zzzeek/bigdata). Here are the results obtained from running it on my laptop against a local PostgreSQL 9.5.
That's great; you've written a very fast driver. My benchmarks compare psycopg2 to itself, under threads / gevent vs. asyncio, to show that asyncio is inherently slower. It follows that if you implemented your driver and all of its protocol improvements in a traditional blocking model, it would also be much faster. You've written something so fast that it overcomes the latency of asyncio (this whole thread is a huge plug for your driver, btw).
It's not that the latency of asyncio is really that big of a deal. The point I'm trying to make is that asyncio != speed. For databases, 95% of the time it means a little less speed. That's not a big deal, but if you aren't into writing explicit context-switching directives throughout 100% of your code when the OS does it for you just fine, it isn't worth it, unless you're in the zone of thousands of arbitrarily performing connections, which is really more of an incoming-client-connection thing, not usually a database thing.
It's really not that much slower. There is also an important detail you forgot to mention: the threaded benchmark uses notably more CPU time (about 30% more).
If you write a program that has thousands of persistent websockets open, you do not want all those thousands of connections to each have their own, persistent, long lived database transaction. You will not like your process listing nor will you be happy about the amount of table and row locking going on. Transactions and connection scope should be short lived, and if you're using a connection pool, you definitely are going for that. As my blog post stated, you'd use a thread pool for the database access part, given that the http-facing side is async.
> Mike, I've never suggested to have a DB connection per HTTP request.
Right, so my suggestion is: keep the DB code in a thread pool. Then you can use async for your 1000-client frontend without forcing it into the DB layer, where it makes things much more complicated (or do what I do: just use gevent :)).
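That suggestion can be sketched like this: the async frontend stays on the event loop, while the blocking driver calls (a stub here, standing in for psycopg2) run in a thread pool via run_in_executor.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(sql):
    # Stand-in for a synchronous psycopg2 call.
    time.sleep(0.001)
    return f"result of {sql!r}"

async def handle_request(loop, pool, n):
    # The event loop stays free to serve other clients while the
    # blocking call runs in a worker thread.
    return await loop.run_in_executor(pool, blocking_query, f"SELECT {n}")

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(
            *(handle_request(loop, pool, i) for i in range(10))
        )

results = asyncio.run(main())
print(results[0])  # result of 'SELECT 0'
```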
A lot of the code in asyncpg is datatype parsers, buffer abstractions, the API facade, and other IO-independent things. So the good news is that it would be possible to reuse a lot of code.
The protocol is implemented in two layers: the base layer is IO- and framework-independent, and the higher level is designed for asyncio.
Long story short, it's possible to add a synchronous version, but it will still require a lot of work. The API will also have to be split in two.
It would be possible to make a really nice async SQLAlchemy Core too, and even some of the ORM. It's a huge amount of work; I started on just the connection pool. It would be nice, but it would require all its own test coverage and everything. If I were the CEO of SQLAlchemy Inc., it would have been done. But that's not where my salary comes from :)
SQLAlchemy is clearly one of the best ORMs (maybe there is something in Java-land that I don't know about). It is beyond me how you are not being guilted, by cash getting thrown in your direction, into making this your job.
Well, I like what bountysource.com and others are doing, maybe we will get there.
I think you should open a bounty for it. As questions like "Can I use this with Django/SQLAlchemy/…?" showed, people love drop-in replacements that speed up their code, especially when it is at a low level like this.
I don't want to sound ungrateful, though. Thank you for your work on this and your other contributions. I would definitely use this when applicable. Are more performance-related projects like this and uvloop planned?
> I think you should open a bounty for it. As questions like "Can I use this with Django/SQLAlchemy/…?" showed, people love drop-in replacements that speed up their code, especially when it is at a low level like this.
We'll consider doing that. We have to keep in mind that supporting two IO modes will add a lot of maintenance overhead. Anyway, we can certainly continue the discussion on GH, and maybe we can find a champion to do the heavy lifting.
> I don't want to sound ungrateful, though. Thank you for your work on this and your other contributions. I would definitely use this when applicable. Are more performance-related projects like this and uvloop planned?
Thank you.
Right now most of our resources are consumed by getting the EdgeDB alpha out as soon as possible. We also have plans to create a micro web framework based on httptools, and a few other exciting ideas. Stay tuned :)
Also check out PEP 525 -- another cool thing we're trying to make happen in Python 3.6.
> Right, so my suggestion is: keep the DB code in a thread pool. Then you can use async for your 1000-client frontend without forcing it into the DB layer, where it makes things much more complicated (or do what I do: just use gevent :)).
Looks like a totally valid suggestion for folks who write async code and want to continue to use SQLAlchemy.
However, if you don't use SQLAlchemy (or another ORM) and just want to work with the DB directly in an async/await code base, having a thread pool will only needlessly complicate things.
> If you write a program that has thousands of persistent websockets open, you do not want all those thousands of connections to each have their own, persistent, long lived database transaction. [..]
Mike, I've never suggested to have a DB connection per HTTP request. Please re-read my other comment in this thread.
We haven't benchmarked this specific use case. However, cursors in asyncpg do data prefetching (the exact number of prefetched rows is configurable), so I expect that asynchronous iteration should be pretty fast (and still faster than psycopg2).
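The prefetching idea, as a pure-Python stand-in (this is not asyncpg's actual implementation): the cursor pulls rows from the server in batches and serves iteration out of the local buffer, so a round-trip happens once per batch rather than once per row.

```python
def prefetching_cursor(fetch_batch, prefetch=50):
    # fetch_batch(n) stands in for one server round-trip returning up to
    # n rows; iteration only hits the "server" once per batch.
    while True:
        batch = fetch_batch(prefetch)
        if not batch:
            return
        yield from batch

# Fake server holding 120 rows, handed out in batches.
rows = list(range(120))
calls = []

def fetch_batch(n):
    calls.append(n)
    batch, rows[:] = rows[:n], rows[n:]
    return batch

result = list(prefetching_cursor(fetch_batch, prefetch=50))
print(len(result), len(calls))  # 120 rows fetched in 4 round-trips
```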
Maybe I'm reading the benchmarks wrong, but does this mean I can get a 2x speedup in my end-to-end query execution just by moving from aiopg to asyncpg, without any server-side changes or query optimizations?
We intentionally benchmarked simple queries to measure the overhead of the driver implementation. So, the 2x speedup really means a 50% reduction in overhead.
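In other words (illustrative numbers, not taken from the benchmark): for a trivial query nearly all of the per-query wall time is driver overhead, so halving the total time halves the overhead.

```python
server_time_us = 0.0     # trivial query: server-side work assumed negligible
aiopg_total_us = 100.0   # assumed per-query wall time with aiopg
asyncpg_total_us = 50.0  # a 2x speedup

reduction = 1 - (asyncpg_total_us - server_time_us) / (aiopg_total_us - server_time_us)
print(f"{reduction:.0%}")  # 50%
```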
Yeah, documenting internals sounds like a good idea. At least to make it easier for people to contribute and further improve asyncpg. We'll try to find some time to do that.
The SQLAlchemy ORM is not async, so there is currently no way to use an async dialect with it. However, it is possible to write a SQLAlchemy Core adapter for asyncpg.
It might even be easier than writing an adapter if you just use SQLAlchemy Core's DSL as the generative mechanism for a SQL statement that is sent to asyncpg.
Does this automatically escape/convert things like datetimes in a query or does that need to be done separately? The docs are a little sparse on details.
Using Cython means you don't/can't use PyPy? Was that something you considered, e.g. with psycopg2cffi or a pure-Python driver? It seems to be the de facto performant runtime for Python, and since performance is something you guys obviously care a lot about, I'm surprised it hasn't been mentioned in this thread.
Very excited to try out your code - thanks for sharing it.
A pure-Python implementation would probably never reach this level of performance, even under PyPy. Cython really makes it possible to basically drop down to C and write very efficient code.
That said, once PyPy supports 3.5, we will consider making a CFFI binding.
By using Python asyncio, you can't use PyPy, because PyPy doesn't support 3.5 yet (and probably won't for a while). So by going async, you are eliminating PyPy right there.
Primary author of node-postgres (both the JavaScript & libpq versions) here. Very good question. I'd say part of the relative speed difference is that it's been a mostly one-person (me!) project for a loooong time & I haven't had a lot of time to do perf analysis to find exactly where to make it faster. I would absolutely love some help! My time for open source is pretty limited, and I try to spend most of it focused on triaging issues. If you'd like to help out, please don't hesitate to open an issue, a pull request, or send me a gchat (contact info is in the node-postgres readme) and we can discuss!
Would be cool to also run JMH benchmarks against JDBC and async PG drivers on Java.
Actually, what I also found is that Python wasn't slow in these cases; it was slow when you do map operations on the lists and reorder the data or convert it to other structures.
We've been using Python (2.7) and Postgres where I work, and I must say I really miss async. It's really convenient to be able to parallelize and saturate I/O very quickly. Right now we're using a combination of threads and processes without proper messaging, which leads to deadlocks and other nasty things.
That's one thing I've been quite pleased with, when using Scala.
I use SublimeText with a Cython highlighter; @redcrowbar uses Atom. I started coding uvloop & asyncpg directly in Cython, basically from scratch.
Debugging & refactoring are pretty much as easy as for pure Python code (which is a huge win over C). Sometimes when writing low-level code you can get a segfault, in which case lldb (or gdb) helps you understand what's going on relatively quickly.