
I'd imagine they mean indexing (and being able to search on) data in real time, given LinkedIn's previous open source projects around real-time search (http://javasoze.github.com/zoie/).

Lucene (which Solr uses as its index) cannot expose newly indexed data immediately after it's added.

Lucene exposes IndexReaders for searches, which offer a snapshot view of the index. To search across new documents, IndexReaders need to be re-opened, a somewhat expensive operation: expensive enough that you can't do it after each document is added, especially if documents are added frequently.
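To make the snapshot behaviour concrete, here's a toy model in Python. It is not Lucene's API: ToyIndex, ToyReader and open_reader are made-up names, and copying the document list is only a stand-in for the real cost of re-opening a reader. It's a rough sketch of the idea, nothing more.

```python
class ToyIndex:
    """Toy stand-in for an index that documents are written to (not Lucene's API)."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def open_reader(self):
        # Copying the document list stands in for the cost of re-opening a reader.
        return ToyReader(tuple(self._docs))


class ToyReader:
    """Point-in-time snapshot: sees only the documents present when it was opened."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, term):
        return [d for d in self._docs if term in d.split()]


index = ToyIndex()
index.add("zoie real time search")
reader = index.open_reader()

index.add("solr batch search")
stale_hits = reader.search("search")   # old snapshot: the new document is invisible

reader = index.open_reader()           # re-open to pick up the new document
fresh_hits = reader.search("search")
```

The old reader keeps answering from its snapshot until you pay for a new one, which is why re-opening after every single add doesn't scale.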

The latest version of Lucene supports "near real time" search, but afaik it's not widely used (with Solr).



Yeah, NRT is in 4.0; right now our content doesn't require that kind of flexibility. (We do once-a-day batch db writes that update the index in NRT via signaling.)


IndexTank is built on Lucene too. I'm not sure if it is the real time branch or not, though.


It is not exactly built ON Lucene. It reuses very specific constructs, the main one being the structure that holds the comprised index, and that is only used for the long-term index. The realtime part of the index was written exclusively for IndexTank.



