> And let's be honest, who has those gigantic tables with more than 5M rows? Not too many I'd assume.
/Looks around all innocent ... does 57 billion count? Hate to tell ya, but there are plenty of use cases where you deal with large datasets and normal table design gets out of hand. And row overhead will bite!
We ended up writing a specialized encoder/decoder to store the information in bytea fields, which reduced it to a measly 3 billion rows (row packing is the better term), but we also lost some of the advantages of having the database's btree index on some of the date fields.
Thing is, the moment you deal with big data, things start to collapse fast as you trade off brute force vs. storage vs. index handling.
I can think of a lot more projects that can expand to crazy numbers if you follow basic database normalization.
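For readers wondering what row packing looks like in practice, here is a minimal sketch in Python. It is not the encoder described above (that one isn't shown); the record layout, `pack_rows`, `unpack_rows`, and `block_bounds` are all hypothetical. The idea is only that many small logical rows become one blob suitable for a bytea column, so the per-row overhead is paid once per block instead of once per record.

```python
import struct

# Hypothetical fixed-width record: unix timestamp (uint32) + reading (float32).
# Little-endian, no padding, so every record is exactly 8 bytes.
RECORD = struct.Struct("<If")


def pack_rows(rows):
    """Pack a list of (timestamp, value) tuples into a single bytes blob."""
    return b"".join(RECORD.pack(ts, value) for ts, value in rows)


def unpack_rows(blob):
    """Inverse of pack_rows: return the list of (timestamp, value) tuples."""
    return list(RECORD.iter_unpack(blob))


def block_bounds(rows):
    """Min and max timestamp of a block.

    Assumption: storing these next to the blob as ordinary columns lets a
    btree index still narrow a date-range query down to whole blocks.
    """
    timestamps = [ts for ts, _ in rows]
    return min(timestamps), max(timestamps)
```

Keeping the block bounds as regular columns is one possible way to recover part of the date-index benefit mentioned above, at block granularity rather than per record; whether that is good enough depends on the query patterns.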
> /Looks around all innocent ... does 57 billion count?
That surely counts as a case where brute-force search will not do :) I'm intrigued though: do you really need to search over all those vectors, or could you filter the candidates down to something <5M? As I wrote, this is one of the nice advantages of no-index brute-force search: you can use good ol' SQL WHERE clauses to limit the number of candidates in many cases, and then the brute-force search is not as expensive. Complex indices like HNSW or DiskANN don't play as nice with filters.
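To make the filter-then-scan idea concrete, here is a rough sketch in plain Python. The row shape (dicts with an `embedding` key), `filtered_knn`, and the `predicate` argument are made up for illustration; in a database the predicate would be the WHERE clause and the distance would be computed by the engine.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_knn(rows, query, k, predicate):
    """Exact k-nearest-neighbour search over the rows that pass the filter.

    rows      -- iterable of dicts, each with an "embedding" list (hypothetical shape)
    predicate -- plays the role of the SQL WHERE clause
    """
    # Filter first, so distances are only computed for surviving candidates.
    candidates = (row for row in rows if predicate(row))
    # Brute force: score every candidate and keep the k closest.
    return heapq.nsmallest(
        k, candidates, key=lambda row: l2_distance(row["embedding"], query)
    )
```

Because the scan is exact, the result is always the true top k among the filtered rows, and the cost scales with the number of candidates left after the filter rather than with the size of the table.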