+1 to this question. Approaching this problem requires a scalable graph database.
AllegroGraph is a well-known solution to this problem, but it is out of reach for most enthusiasts like me, so if DiffBot came up with their own solution, I, among others, would love to read more about it.
We've tried Neo4J, ArangoDB, and many others to store the triples. Neo4J locked up at around 100M entities, and the loading/injection times weren't fast enough to build the KG at a regular interval. However, we are closely following developments in these projects as they improve. There is more detail in this interview: https://www.zdnet.com/article/the-web-as-a-database-the-bigg...