Sure thing, I added them for the two cases in the README. There's a section in the thesis about the FPR for more fixed sizes if you're curious (spoiler: it's pretty much exactly in the middle, though notably higher than the CPU Cuckoo Filter, because really small buckets are bad for performance).
I haven't tested this, but I would be very surprised if the PCIe bus wasn't a severe bottleneck in that case, unless you can somehow amortize the cost of the memcpy.
That said, with such massive datasets you'll already be bottlenecked by the necessary communication between GPUs (sadly, even with NVLink), since the queried data always lives on the GPU.