Postgres is going in one direction - for the better. And as a long time Postgres user I've been really happy with it for years. Now with JSONB support I am even happier, as I can build powerful RDBMS + "NoSQL" apps that outperform traditional NoSQL databases.
To my disappointment several members of my team have been championing Oracle recently though - presumably because it's a safer option and can be managed by others under an SLA.
How does one argue against this? With Postgres we're not going to get SLAs because we know it very well, but with Oracle we would. It's mostly perception management IMHO, and I'm mostly against it. What killer core features does Oracle have that Postgres doesn't?
> What killer core features does Oracle have that Postgres doesn't?
While I don't really like Oracle as a company, there are some features that Oracle has and Postgres doesn't, which can be important for production deployments:
- flashback
- partitioning that's much more mature
- merge /*+ parallel
- materialized views that refresh on commit
- and Oracle is quite a bit quicker - especially with badly written queries
But those (except for optimizer) are all Oracle Enterprise features which means you're paying ridiculous money per-core for enterprise edition itself, and then you're paying ridiculous money for the features one by one.
There's mostly no reason to use Oracle Standard Edition instead of Postgres. And I have seen Oracle XE in production, and that's just sad.
EDIT: oh, and I forgot - Postgres has that stupid "we don't want to include optimization hints in the language, so we use CTEs as an optimization boundary" policy that the developers are stuck on and refuse to change, which means you have to decide between readable code and performant code. Meh.
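For anyone who hasn't hit this: in the Postgres versions discussed here, a CTE is an optimization fence, so predicates from the outer query are never pushed down into it. A minimal illustration (table and column names are made up):

```sql
-- With a CTE, Postgres materializes the whole CTE result first,
-- ignoring any index on id when applying the outer filter:
WITH recent AS (
    SELECT * FROM big_table
)
SELECT * FROM recent WHERE id = 42;

-- The equivalent inline subquery lets the planner push the
-- predicate down and use an index scan on id:
SELECT * FROM (SELECT * FROM big_table) AS recent WHERE id = 42;
```

In real queries the CTE is usually there for readability, which is exactly the trade-off being complained about.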
Flashback is indeed a killer feature. I'd like to see something like Flashback in Postgres. PITR is possible with Postgres, but it takes a very long time to replay log files to a certain point if you need to restore data. I know this because I've had to do this quite a few times.
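For context, Postgres PITR means restoring a base backup and then replaying archived WAL up to a target point, which is why it is slow for large WAL volumes. A typical recovery.conf for the 9.x versions discussed here looks roughly like this (paths and timestamp are placeholders):

```
# recovery.conf in the restored data directory
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2015-11-12 14:30:00'
```

Oracle Flashback avoids the full replay by keeping undo data around, which is what makes it feel so much faster for "oops" recovery.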
Client or server side? (I guess the Oracle-side "hints" may have another name, but my Oracle-fu is not strong enough to remember it - stored query plans?) I have seen some really big shops where the only way they could keep their app running in production was a fair number of server-side hints. Which then really sucked, because someone fixed the casing on a table name (say "Orders" to "orders") and made it fall out of the query plan. Whereas if they had been done as hints in the queries, they would have kept working.
The new query plan stabilization features in 12cEE really help with this. Hints are a pain when upgrading Oracle major versions as they can affect query correctness, as my boss found out painfully last week ;(
I too think plan stabilisation is the important thing. What people particularly want is the ability to know that their production performance is not just going to suddenly crash and burn because of a change in plan.
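For anyone who hasn't seen them, Oracle hints are just structured comments embedded in the statement text; the table, index, and column names below are hypothetical:

```sql
SELECT /*+ INDEX(orders orders_created_idx) PARALLEL(orders, 4) */
       order_id, total
FROM   orders
WHERE  created_at > DATE '2015-01-01';
```

Because the hint travels with the query text, it survives schema renames that would invalidate a server-side stored plan matched against the old text.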
Spoken like people who have never dealt with Oracle licensing and compliance. "Safer" for values of "well, I guess we didn't need those millions in this year's budget anyway."
... but tongue out of cheek, if you can afford it, Oracle has solutions for just about every database problem in just about every vertical domain. They have really good engineers, really good testing, and they're #1 for a reason
Well, professional support with SLAs is available for Postgres through, most notably, EnterpriseDB, which is also a significant source of resources for core Postgres development. So whether or not that's really something you need, the fact that it is available for Oracle isn't a distinguishing point when compared with Postgres.
They will come back after speaking with people at very large banks (if they go to PgConf NY), very large insurance companies, and very large enterprise tech companies. All of them are moving more of their business into postgresql. Many of them are moving away from oracle.
Well, parallel execution, for one. Am I reading this right? Postgres has not had parallel query, DML, or DDL until now? This has been a feature of Oracle for as long as I've been using it.
Say what you want about enterprise software, but Oracle is a beast... if you work for a company willing to shell out for it. That cost will seem sillier, though, as Postgres closes the feature gap.
The answer would likely be to just price up Oracle. Not just buying it, but over three years or whatever. Particularly if you virtualise, and get stuck by the varying definitions of what that means for core licensing. That might help you out.
As for "killer features", the zero downtime version upgrades has always meant a lot to me.
> What killer core features does Oracle have that Postgres doesn't?
Lots of third party applications that support Oracle and not Postgres is sadly the most common reason I see people deploying Oracle these days. Also Oracle used to be a lot better with larger databases (400+ TB or so), but I have no idea if that is still true.
Unfortunately yes, the compression options and existing parallel code in Oracle means that at the moment IMHO Oracle performs better than PG at that scale.
There are multi petabyte Oracle RAC instances in existence, while on the other hand a 10TB+ PG is very large.
This is a first step in closing that gap. Some form of block-level compression would really help as well. For analytics, columnar support will be needed too, and an in-memory option would be nice.
Main problem with Oracle is cost. They are changing the licensing structure for the standard edition which is going to have some major impact on our business.
Enterprise edition's price is completely prohibitive even for a large company.
Most of the nice Oracle things require Enterprise. For example, RAC (clustering) actually works and is not bad, but to be useful you need Enterprise. The other thing that is pretty awesome is Enterprise Manager -- the web tool which lets you monitor your RAC, including looking at the executing queries, their plan and progress they are making. It is completely invaluable in diagnosing and improving slow queries. And it is, again, Enterprise only.
Amazon RDS - not exactly the same as managed Oracle, but you get a lot in the package: backups and transparent multi-AZ failover are the ones I prize the most.
Although notably the Oracle version along with MySQL has some features on RDS which PostgreSQL does not (like a HIPAA business partner agreement available for example).
Naive question here, but why wouldn't parallel scans actually be worse when reading from a spinning disk? And even for cached data I'd expect the bottleneck to be I/O still. Or is this just important because it's laying the groundwork for parallel queries, and we shouldn't expect any improvements from this feature alone?
The post shows a 3.5x performance improvement for requested 4-way parallelism, so that's a significant improvement right now!
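In the 9.6 development branch the post is based on, the degree of parallelism was controlled by a per-session GUC named max_parallel_degree (it was renamed later). Something along these lines should reproduce the test, assuming a pgbench-style table:

```sql
SET max_parallel_degree = 4;
EXPLAIN (ANALYZE)
SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
-- The plan should show a Gather node driving a Parallel Seq Scan,
-- with the LIKE filter evaluated in the workers.
```

The filter being evaluated in parallel workers is why CPU-heavy predicates show the biggest wins.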
You might be surprised how many queries are bound by memory bus bandwidth and CPU, rather than I/O ... but even I/O bound queries may benefit:
Remember that postgres is a "demand-pull" system. A sequential scan node returns a single tuple at a time, whenever its parent node asks for a tuple. So there can be considerable time between requests for an actual disk block.
Parallelizing the scan should result in more frequent requests to the disk controller, which could then have better opportunity to optimize reads - even for a single disk. Since most of the latency for spinning disks is seek time, having more outstanding block-read requests should help performance.
There are no stupid questions, only stupid answers... like this one :-)
No but seriously, if your "spinning disk" is even a semi-modern storage controller, it will queue and reorder your parallel requests for the most efficient access to the several, or dozens, or hundreds of physical disks it manages. So why not slam it with all the requests in a single sequence? Because a) if you are set up to handle parallel returns in your code, you don't care about the ordering in which blocks are returned so much and b) the way this is physically implemented is n queues of d depth and you want to fill them all with work.
SSDs contain multiple NAND chips, so a well-designed SSD controller will take advantage of that (including data striping, essentially turning your drive into a tiny RAID). Of course, it is not clear how good consumer SSDs are at that, but any server-grade drive you are using in your DB server should support parallel reads. It gets tricky, though, because an SSD reads data so quickly it might saturate the I/O bandwidth even with a single request; I've never done that math. All I know is that SSDs make my queries magical :)
You get streaming replication out of the box now, and you get WAL-based replication assuming you configure a shared drive or use something like S3. The latter isn't super fun to set up automation for. Also not much of a story for logical replication (tried Slony - meh, it feels like a relic of the olden days when every deployment was carefully hand-massaged into existence) or for automated failover.
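The shared-drive setup mentioned above is the stock continuous-archiving mechanism from the docs; the S3 route usually means a third-party tool like WAL-E. The built-in version is just a few settings on the master (the path is a placeholder):

```
# postgresql.conf on the master
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /mnt/server/archivedir/%f'
```

The fiddly part isn't these lines, it's automating archive cleanup, base backups, and restore testing around them.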
Nowadays you actually can skip a lot of setup automation with RDS, with backups, read mirrors and automated failover taken care of for you, although you frequently hear folks complain about some of those scenarios not working as nicely as you'd expect.
Improved, yes. Fixed, no. Basically, if you want proper full replication, you won't get it out of the box but will have to use various third-party tools. That being said, 9.4 added a bunch of the fundamental changes that laid the foundation for full replication; now we're just waiting for PG to finish exposing those features with userland tools.
Built-in multi-master replication is the only real missing piece there, AFAIK every other replication base is covered well natively.
Besides that, async streaming replication has been there nicely since 9.0 and synchronous since 9.1. 9.3 added timeline switches, needed for cascaded slave setups in practice. 9.4 added replication slots so the master can keep track of slave sync status, and the foundations for multi-master & logical replication. 9.5 added pg_rewind, which speeds up resyncs.
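For reference, a basic sync/async streaming setup in those versions needs only a handful of settings; the hostname, user, and slot name below are placeholders:

```
# master, postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
synchronous_standby_names = 'standby1'   # omit this line for async

# standby, recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
primary_slot_name = 'standby1_slot'      # replication slots, 9.4+
```

The replication slot is what lets the master retain WAL until the standby has consumed it, instead of guessing with wal_keep_segments.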
Yes, it has improved a lot and is continuing to improve, with new replication features in every release. The built-in replication covers the simple master-slave case and is simple to set up and stable.
I don't think any of the current open-source forks are ready for production.
The oldest one is Postgres-X2 [1] (formerly Postgres-XC), a project sponsored by NTT. It implements fully consistent multimaster replication and partitioning. After many, many years it apparently still isn't production-ready, and it seems to have a significant scalability bottleneck that's tied to its design. Another issue is that, being a fork, it has to be continually updated from the mainline. It's currently based on 9.3.
Postgres-XL [2] is apparently a merger of Postgres-XC and a different implementation called StormDB that was bought by a small company called TransLattice. (TransLattice also sponsored or acquired an earlier project that went nowhere, Postgres-R.) Unlike XC/X2, they say they aim to contribute changes back to the mainline, and they also claim their distributed query model is superior.
With the commercial support behind it (it's used as the basis of a commercial product), it's possible that this is something that will be usable. Unfortunately, it seems very quiet and not very open-sourcy; most of the development seems to be by just one guy [3], and nobody seems to be using it in production at this point.
That example benchmark he has is weird. You'd probably want to test this on something with at least a million rows, taking more than 100ms, and probably actually returning some data. A sub-second difference like this can be attributed to solar flares. Also, it is not clear that it is not just the result of the table getting cached.
Did they implement a feature like this and not run a real performance test?
Robert Haas is the real deal, and will certainly have tested this feature.
Perhaps I missed it, but I don't think the post specified how many rows were in the source. Could have been more than a million. Having a relatively expensive filter (regex) eliminating all results keeps the performance picture clearer. Anything done with a tuple that passes the filter would apply to both the regular and the parallel scan cases, so eliminating the results highlights just the difference between regular and parallel scans
I really, really doubt that it can process a million uncached rows in under a second.
But yeah, I just tested it - a million rows was not a good example, because it takes only a bit over a second on decent hardware.
So perhaps should have said 10 million.
I agree about not returning any results being the right test.