Hacker News | hyc_symas's comments

Ah yes, the ever popular "MongoDB's developers were incompetent therefore mmap is bad" paper.

Pure tripe. https://www.symas.com/post/are-you-sure-you-want-to-use-mmap...


How often does anyone care about using data on a different system than it was created on?

These days, any C struct you built on amd64 will work identically on arm64. There really aren't any other architectures that matter.

And yes, managing concurrent access to shared resources requires care and cooperation. That has always been true, and has nothing specific to do with mmap.
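A minimal sketch of the struct-layout point, using Python's ctypes to ask the native ABI how it would lay out a C struct. The Record type and its fields are hypothetical, chosen only for illustration. On amd64 and arm64 alike this should report a size of 16 with offsets 0, 4 and 8, since both ABIs align each fixed-width integer to its own size.

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical record, mirroring the C declaration:
    #   struct record { uint8_t tag; uint32_t length; uint64_t ident; };
    _fields_ = [
        ("tag", ctypes.c_uint8),
        ("length", ctypes.c_uint32),
        ("ident", ctypes.c_uint64),
    ]

def layout(struct_type):
    """Return (total size, {field name: byte offset}) under the native ABI."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ctype in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets

size, offsets = layout(Record)
print(size, offsets)
```

Running the same script on both architectures and comparing the output is a cheap way to check a particular struct before relying on it. Types whose size varies by platform, such as long, are deliberately left out of this example.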


On BSD, read() was already implemented in the kernel by page-faulting in the desired pages of the file, to then be copied into the user-supplied buffer. So from the first time mmap was ever implemented, it was always the fastest input mechanism. (First deployed implementation was in SunOS btw, 4.2BSD specified and documented it but didn't implement it.) Anyway there's no magic to get data off a device into memory faster, io_uring just lets you hide the delay in some other thread's time.
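The two input paths described above can be put side by side in a minimal sketch. The helper names and the temporary file are made up for illustration. It only confirms that both paths return the same bytes and makes no attempt to measure speed.

```python
import mmap
import os
import tempfile

def read_with_read(path):
    """Copy the file's contents into a user-space buffer via read()."""
    with open(path, "rb") as f:
        return f.read()

def read_with_mmap(path):
    """Map the file; pages are faulted in when the mapping is first touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies the mapped bytes out, done here only so the
            # two results can be compared after the mapping is closed.
            return m[:]

# Write 1 MiB of random data to a scratch file, then read it both ways.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))
    path = tmp.name
try:
    same = read_with_read(path) == read_with_mmap(path)
finally:
    os.unlink(path)

print(same)
```

In real use the point of the mapping is to work on the mapped pages directly rather than copy them out, so the slice in read_with_mmap is an artifact of the comparison, not a recommended pattern.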


mmap is slow because stalling on page faults is slow. Your process stalls and sits around doing nothing instead of processing data you've read already. You can google the benchmarks if you like. io_uring wasn't built just for kicks.

https://www.bitflux.ai/blog/memory-is-slow-part2/


Network data and most serialization formats are big endian because it's easiest to shift bits in and out of a shift register onto a serial comm channel in that order. If you used little endian, the shifter on output would have to operate in reverse direction relative to the shifter on input, which just causes stupid inconsistency headaches.


Isn't the issue with shift registers related to endianness at the bit level, while the discourse above is about endianness at the byte level? Both are pretty much entirely separate problems
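The distinction between the two levels can be shown with a small sketch using Python's struct module. The value and helper names are arbitrary. Byte order decides which byte goes first, while bit order decides how each byte is shifted onto a serial line, and the two can be chosen independently.

```python
import struct

value = 0x0A0B0C0D  # arbitrary example value

# Byte-level endianness: the order of whole bytes in memory or on the wire.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

def bits_msb_first(byte):
    """Bits of one byte in the order an MSB-first serial link would send them."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

def bits_lsb_first(byte):
    """Bits of the same byte in the order an LSB-first serial link would send them."""
    return [(byte >> i) & 1 for i in range(8)]

print(big.hex(), little.hex())
print(bits_msb_first(big[0]), bits_lsb_first(big[0]))
```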


In-memory DBs were always a dead-end waste of time. There will always be bigger slower cheaper secondary storage, no matter how cheap main memory gets. And workloads always grow to exceed the available main memory (unless a system is dying). And secondary storage will always be paged, because it's too inefficient to address it at smaller granularity. That's just reality, and anyone who ignores reality isn't going to get very far.


Fun footnote: SQLite only got on board with mmap after I demonstrated how slow their code was without it. I.e., getting a 22x speedup by replacing SQLite's btree code with LMDB https://github.com/LMDB/sqlightning


Thank you for beating the mmap drum and LMDB! It's truly an incredible piece of tech.


Andy's critiques are only valid on dedicated database servers.

https://www.symas.com/post/are-you-sure-you-want-to-use-mmap...

LMDB uses mmap and Andy recommends LMDB, in the very article this thread is about.


A lot of potential treatments are too easily available and can't be patented. If a big pharma company can't make massive profit from it, they won't bother bringing it to market. Consider that a not-good reason.

Other treatments may eventually prove to have too many serious negative side effects. That's a good reason to abandon them.


> A lot of potential treatments are too easily available and can't be patented.

This isn’t really an obstacle, at least not as much as it’s made out to be.

There are numerous examples of drugs being brought to market at high prices despite having been generic compounds. Even old drugs can be brought back at $1000/month or more at different doses or delivery mechanisms.

One example: Doxepin is an old antidepressant that is extremely cheap. It was recently re-certified for sleep and reintroduced at low doses at a much higher price, despite being “off patent”.

This happens all the time. The drug companies aren’t actually abandoning usable treatments due to patent issues as much as journalists have claimed. If they couldn’t, for some reason, find a way to charge for it they could still use it as a basis for finding an improved related compound with more targeted effects, better pharmacokinetics, etc.

They’re not just dropping promising treatments anywhere if there’s a market for them.


About doxepin: like many seniors, I suffer from an extreme inability to stay asleep at night. I have tried all the known prescription and non-prescription options; only eszopiclone and baclofen seem to show some promise. However, eszopiclone is DEA-listed, requires higher and higher doses, and if I take it for more than about 2 weeks, withdrawal has rather serious side effects: it is addictive, and trying to wean oneself off it brings serious anxiety. Doxepin is prescribed as an antidepressant in large doses and is one of the most potent H1 histamine antagonists known. The H1 system in our bodies promotes wakefulness. In very low doses, doxepin acts against H1 to promote sleep. To avoid the upcharge on low-dose doxepin, I am prescribed the high-dose version, and I break open the capsules to portion out about 5 to 10 mg into an empty gelatin capsule (it's bitter). It really works well, though you are fairly tired and useless the next day.


Have you tried Cognitive Behavioral Therapy?

I am a lifelong sufferer of insomnia (though mostly sleep onset) and tried all sorts of increasingly risky things. CBT cured me in ~2 months.


Why would a China or India care if it were a viable treatment? Unless a country wants to use their population as lab rats, it takes money and scientists to actually confirm a treatment is safe and effective.


Obviously you use the neighboring country’s population, or an ethnic minority, or prisoners, or orphans, or…


Wonder if some form of FOSS approach would work as an alternative development model for pharma?


These folks gave an interesting talk on producing pharmaceuticals at defcon a couple years ago.

IIRC it was more about production methods than developing new treatments.

https://fourthievesvinegar.org/


This made me think of the "Institute for One World Health". It came out as a non-profit pharmaceutical company in the mid-2000s. Victoria Hale was the founder; it got her a MacArthur fellowship. It focuses (focused?) on global health and populations underserved by for-profit models. I think they successfully developed a treatment for leishmaniasis. It's an adorable model and should be pushed, but as usual it seems like the philanthropy money is limiting.


Kind of like open source software.


Wrong, LMDB fully supports multiprocess concurrency as well as DBs multiple orders of magnitude larger than RAM. Wherever you got your info from is dead wrong.

Among embedded key/value stores, only LMDB and BerkeleyDB support multiprocess access. RocksDB, LevelDB, etc. are all single process.


My mistake. Doesn’t it have a global lock though?

Also, even if LMDB supports databases larger than RAM, that doesn’t mean it’s a good idea to have a working set that exceeds that size. Unless you’re claiming it’s scan resistant?


It has a single writer transaction mutex, yes. But it's a process-shared mutex, so it will serialize write transactions across an arbitrary number of processes. And of course, read transactions are completely lockfree/waitfree across arbitrarily many processes.
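As a loose in-process analogy for that arrangement (not LMDB's implementation, which uses a process-shared mutex and memory-mapped pages), here is a toy copy-on-write store. The SnapshotStore class and its method names are invented for illustration. Writers serialize on one lock, and readers take no lock because a committed version is never modified in place.

```python
import threading

class SnapshotStore:
    """Toy copy-on-write store: one writer at a time, readers never lock."""

    def __init__(self):
        self._root = {}                      # current committed version
        self._write_lock = threading.Lock()  # serializes writers only

    def read_snapshot(self):
        # A single reference read, no lock taken. The returned dict is
        # never mutated by the store, so the reader sees a stable view
        # (by convention, readers must not modify it either).
        return self._root

    def write(self, key, value):
        with self._write_lock:
            new_root = dict(self._root)  # copy, then modify the copy
            new_root[key] = value
            self._root = new_root        # publish by swapping the reference

store = SnapshotStore()
before = store.read_snapshot()
store.write("a", 1)
after = store.read_snapshot()
print(before, after)
```

A reader that took its snapshot before the write keeps seeing the old version, while new readers see the new one. Copying the whole dict on every write is a simplification of this toy, not something the comment above describes.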

As for working set size, that is always merely the height of the B+tree. Scans won't change that. It will always be far more efficient than any other DB under the same conditions.
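For a sense of scale, here is a rough sketch of how the height of an idealized, fully packed B+tree grows with record count. The capacities of 100 entries per leaf page and 100 children per branch page are assumptions picked for round numbers, not LMDB's actual figures, which depend on page size and key/value sizes.

```python
def btree_height(num_records, leaf_capacity, branch_fanout):
    """Levels from root to leaf in a fully packed B+tree (idealized)."""
    if num_records <= 0:
        return 0
    pages = -(-num_records // leaf_capacity)  # ceiling division: leaf pages
    height = 1
    while pages > 1:
        pages = -(-pages // branch_fanout)    # parent pages one level up
        height += 1
    return height

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, btree_height(n, leaf_capacity=100, branch_fanout=100))
```

Under these assumed capacities a billion records fit in a tree five levels deep, so a single lookup touches about five pages. Which pages stay resident across many lookups still depends on the access pattern.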


> As for working set size, that is always merely the height of the B+tree.

This statement makes no sense to me. Are you using a different definition of "working set" than the rest of us? A working set size is application and access pattern dependent.

> It will always be far more efficient than any other DB under the same conditions

That depends on how broadly or narrowly one defines "same conditions" :-)


Identical hardware, same RAM size, same data volume.


That’s a bold claim. Are you saying that LMDB outperforms every other database on the same hardware, regardless of access pattern? And if so, is there proof of this?



You don't have to take my word for it. Plenty of other developers know. https://www.youtube.com/watch?v=CfiQ0h4bGWM


Since the first question of my two-part inquiry was not explicitly answered in the affirmative: To be absolutely clear, you are claiming, in writing, that LMDB outperforms every other database there is, regardless of access pattern, using the same hardware?


Not every.

LMDB is optimized for read-heavy workloads. I make no particular claims about write-heavy workloads.

Because it's so efficient, it can retain more useful data in-memory than other DBs for a given RAM size. For DBs much larger than RAM it will get more useful work done with the available RAM than other DBs. You can examine the benchmark reports linked above, they provide not just the data but also the analysis of why the results are as they are.


> So much C library code is documented in ad-hoc ways - often through doxygen, which is a disaster. Eg here's the documentation for LMDB. LMDB is one of the most thoroughly documented C APIs I've seen, but I find this almost totally unusable. I often find myself reading the source instead. There's not even any links to the source from here:

> http://www.lmdb.tech/doc/group__mdb.html

How is doxygen a disaster?

Why do we need links to the source code? Doxygen is already embedded in the source, you should already be reading the source code on your local machine. It makes no sense to go searching across the web for information that's already stored on your local machine. Especially since you have no idea if the version you find on the web matches the version you're using locally.


Hah! Do you have a running search giving you alerts when lmdb is mentioned or something?

> How is doxygen a disaster?

I've just never read any good, user friendly documentation ever produced by doxygen. Even when I used it myself. It always comes out looking like a pig's breakfast.

Like, take the lmdb docs. And I'm sorry for picking on it. But it's a good example, because you've clearly put effort into using doxygen to document lmdb. I think the lmdb docs are about the best that doxygen generated documentation gets.

Looking at this page: http://www.lmdb.tech/doc/group__mdb.html

There's a bunch of concepts here:

- Environment (mdb_env)

- Database and database handle (dbi)

- Transaction

- Cursor

- Record

- Key

But none of those concepts are defined or explained. They're simply referenced without explanation in a big jumble of function names and descriptions, leaving me to figure out how they're supposed to work together. Maybe if those data types were defined up the top of the page? No. Doxygen tries to put some data structures at the top of the page - but for some reason the only documented types are MDB_val, MDB_stat and MDB_envinfo. All terrible places to start reading if you want to understand how to use lmdb.

Good documentation would lead with some front matter like:

- What does the library do

- How the library sees the world. Here, explain the above concepts and how they relate. (Eg an environment represents a set of files on disk. Each environment contains a numbered set of databases, and each database contains a set of records. You can read and write within a txn. You can use a cursor to iterate. ... Etc)

- Code examples of all of the above. Ideally a hello world, and more complex / specific examples showing each feature.

Doxygen does not help with any of that. From this documentation I don't know how to use lmdb to make a correct program. I can kinda guess how to use various features, but not what the features actually are or how to use them correctly.

For example, I can obliquely tell from reading the function descriptions that an environment can be opened multiple times at the same time. But I have no idea why I'd want to do that, or how, or what performance implications there are, or if there are any gotchas I need to be aware of if I do that. I see there's a bunch of functions for mdb_env_copy. Does that copy in memory or on disk? Does it do it atomically, like at a snapshot? Is it synchronous? Does it fsync? fdatasync? What errors can happen? The documentation isn't helpful.

So I'm not just banging on about rust, here's the equivalent - but much better - reference documentation for prisma:

https://www.prisma.io/docs/orm/reference/prisma-client-refer...

Prisma also has a separate guide, explaining the concepts involved:

https://www.prisma.io/docs/orm/prisma-client/queries/select-...

Or for another positive example, here's bun's documentation on using sqlite from javascript:

https://bun.com/docs/runtime/sqlite

They explain the concepts and present examples for how to use all the features. When I read those docs, I come away knowing how to use the library to solve my problems. I don't have that experience reading the lmdb documentation.

Maybe it's possible to produce good documentation using doxygen. But I've never seen it done. Not even once.

-----

Side points:

> you should already be reading the source code on your local machine.

I'd rather not read the source code of all my dependencies to understand how to use them. Reading the source code of your dependencies should be a last resort. Eg, I don't go reading my compiler's source code if I want to understand my programming language. I don't want to read the source code of my web browser, or postgres, or linux, or any of this stuff.

> It makes no sense to go searching across the web for information that's already stored on your local machine. Especially since you have no idea if the version you find on the web matches the version you're using locally.

I hear you, but honestly I don't really care where documentation lives. Just so long as I can find it and read it. But with rust in particular, if you want local documentation you can run cargo doc to generate & open the documentation of your project and all your dependencies, which is nice.

And re: versions, rust docs hosted online also have a little 'version' field up the top showing which version of the library you're looking at the documentation of. Eg if I open https://docs.rs/rand/ I see "rand-0.9.2". If you change versions, the URL changes. It'd be nice if doxygen had that too.

