BTW, this product (Scalyr) is a lifesaver. We (Periscope) are able to operate roughly a dozen heterogeneous servers with no full-time DevOps, largely because of Scalyr.
Lots of attention goes to OLTP-type loads, for good reasons, but when you design to just stream fast, some fun things happen:
You can use lots of relatively cheap spindles in parallel, and think of each one as (at least) 100MB/s of sequential read speed and a couple TB of space. You have fast compression available that can increase your effective bandwidth and lower the effective cost of space.
You can draw on well-understood ways to search, sort, do hash- or sort-based joining and grouping, and so on.
Streaming doesn't need a big in-memory cache to avoid disk seeks, so you can use those gobs of RAM for other things--aggregating results or holding data to join against, say. (Of course, if you don't need the RAM, disk cache might still be useful for some access patterns.)
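To put the compression point above in numbers, here is a toy Python sketch. The log line is invented and deliberately repetitive (real logs compress well, though usually less dramatically), and the 100MB/s figure is the per-spindle number from above:

```python
import zlib

# Made-up, highly repetitive log-like data; real logs also compress well,
# though usually less dramatically than this toy input.
raw = b"2014-05-12T10:01:07Z GET /index.html 200 51ms host=web-3\n" * 20_000
compressed = zlib.compress(raw, 6)

ratio = len(raw) / len(compressed)
disk_mb_per_s = 100                    # per-spindle sequential read, from above
effective_mb_per_s = disk_mb_per_s * ratio

print(f"ratio: {ratio:.0f}x, effective: ~{effective_mb_per_s:,.0f} MB/s per spindle")
```

The effective-bandwidth multiplier is roughly the compression ratio, as long as the CPU can decompress faster than the spindle reads, which is usually the case for fast codecs.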
Besides log search, you see a stream-fast approach in analytics-focused DBs: BigQuery, Redshift, Vertica, and open-source ones--Facebook put up a good post about the work that led to their Hive ORCFile design.
Some bioinformatics tools load a big hashtable into memory and, roughly, hash-join against a ton of raw data streamed from disk, then sometimes repeat the process with another hashtable.
These are not at all original observations, but I managed to hear about these sorts of analytics and bioinformatics tools for a while before really getting how or why they did things all that differently from a typical random-access-oriented database.
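The bioinformatics pattern described above (build a hashtable in RAM, then stream the big side past it) is easy to sketch. A minimal Python illustration; the data and the tab-separated layout are invented:

```python
import io

def stream_hash_join(small_table, big_stream):
    """Hash-join: load the small side into a dict, then stream the big side
    past it. Only the small table and the current line live in memory."""
    lookup = dict(small_table)              # small side fits in RAM
    for line in big_stream:                 # big side is read sequentially
        key, value = line.rstrip("\n").split("\t")
        if key in lookup:
            yield key, lookup[key], value

# Invented example: gene annotations (small) joined against raw reads (big).
annotations = [("g1", "kinase"), ("g2", "transporter")]
reads = io.StringIO("g1\tAACCGG\ng3\tTTAGGC\ng2\tGGTTAA\n")

print(list(stream_hash_join(annotations, reads)))
# [('g1', 'kinase', 'AACCGG'), ('g2', 'transporter', 'GGTTAA')]
```

The repeated pass the comment mentions is just this again with a different hashtable over the same re-streamed data.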
In that time there were 4-5 days when I couldn't even log in because the network was "under attack". They didn't tell me; I told them.
Their management interface is an exercise in frustration.
Customer support is horrific: "We are not aware of any ongoing network issues". Yup, except that I can't ping the machine from any of my other servers anywhere in the world.
You get a fast machine for a cheap price, but the network is a joke, the support is a joke, and they are under a near-perpetual DDoS.
We started with Java because the rest of our backend is written in Java. We stuck with it because it has not turned out to be much of a bottleneck in practice. This came as a surprise; I had expected that we'd need to rewrite the substring search loop in native code. We may still do so eventually, but we're able to get very high performance in pure Java. (1.25GB / second / core, as noted in the post.)
Edit to add: "good enough is not good enough" -- very well put! Yes, it would be good to move this loop to native code. However, with our current design, the overhead of calling out from Java to native code would probably outweigh any benefits. We're planning to move our core database storage to off-Java-heap memory buffers, at which point it will become much more feasible to call out to native code.
The inner loop speedup on this particular code will be more like 2x than 3x, and the speedup of the overall system will be quite a bit less than that. But your point is still correct.
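For the curious, here is the general shape of a skip-table substring scan. This is a textbook Boyer-Moore-Horspool sketch in Python, not Scalyr's actual loop; a native or Java version would run the same logic over raw byte buffers:

```python
def horspool_search(haystack: bytes, needle: bytes) -> int:
    """Boyer-Moore-Horspool: return index of first match, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: how far the window may shift based on its last
    # byte. Bytes absent from the needle allow a full-length skip.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift[haystack[pos + m - 1]]
    return -1

print(horspool_search(b"GET /index.html 200", b"index"))  # 5
```

The shift table lets the loop skip up to len(needle) bytes per mismatch, which is why longer search strings tend to scan faster.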
Well, if you can triple your performance, you will require 1/3 of the servers, and your clients will enjoy faster responses. I thought that was the idea; good enough is not good enough.
Anyways, congrats on your success. I would love to read more about the business side - finding clients, profits, expenses, etc.
String searching is an inherently I/O- or memory-bound problem. Your CPU ends up waiting for bytes to arrive from memory, at around a 50 GB/s theoretical max, usually half that in practice. The programming language or algorithm doesn't matter that much once memory bandwidth is saturated.
A faster implementation of a string-searching algorithm could only save a few milliwatts of CPU power; it wouldn't make the search faster or require less hardware.
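One way to sanity-check this with the thread's own numbers (the 50 GB/s theoretical max and the halved practical estimate are from the comment above; the 1.25 GB/s/core figure is the one quoted from the article):

```python
GB = 1_000_000_000            # decimal GB, to match the rough figures

mem_bandwidth = 25 * GB       # ~half of the 50 GB/s theoretical max
per_core_scan = 1.25 * GB     # the pure-Java scan rate quoted in the thread

# Cores scanning at that rate before memory bandwidth saturates:
cores_to_saturate = mem_bandwidth / per_core_scan
print(f"{cores_to_saturate:.0f} cores")  # 20 cores
```

So, with these figures, it takes roughly 20 cores scanning in parallel before the memory bus becomes the bottleneck; below that point, per-core implementation speed still matters.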
That's not correct. Until the point that requests are queuing, increasing performance gets you lower latency and higher throughput simultaneously, in proportion.
Once there's a backlog of requests, faster requests on proportionally fewer machines will not help the queuing latency.
But in general it's not a good plan to be in that situation for long, because the back pressure that stops the queue of requests from expanding infinitely is people leaving your service in frustration.
Hey, can you help me understand this? I don't quite follow. Obviously for a fixed number of servers the throughput goes up and latency goes down. But it sounds like you're saying you can somehow reduce the number of servers and still have lower latency?
Let's just make up toy numbers. Suppose you have code that can process (say) 1000 lines per second per machine. You have 3 machines and you need to process 90k lines. Each one gets assigned 30k lines, and it takes 30 seconds overall, right?
Suppose you find a 3x speedup in the code. Now your 90k lines take 10 seconds. Alternatively, you can ditch two machines and process the whole 90k on one machine in 30 seconds, the same as the original time.
To me it seems like this is what tacotime was saying, you can either have faster responses or fewer machines.
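The toy numbers above, worked through in a few lines (all figures are the made-up ones from the comment, not measurements):

```python
lines = 90_000
per_machine_rate = 1_000         # lines/second/machine, the made-up baseline

baseline = lines / (3 * per_machine_rate)         # 3 machines: 30.0 seconds
faster = lines / (3 * per_machine_rate * 3)       # 3 machines, 3x speedup: 10.0 s
one_machine = lines / (1 * per_machine_rate * 3)  # 1 machine, 3x speedup: 30.0 s

print(baseline, faster, one_machine)  # 30.0 10.0 30.0
```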
The metric that is improving is the latency experienced by a user.
Imagine that you have a web app that takes one second of server CPU time to render a page, and you have three servers which process three hits a second in total. All three servers are thus on 100% CPU load, dealing with one hit a second each.
Each time somebody visits your site, they are going to experience a 1 second latency (in addition to communication latency), as they wait for one of your servers to build the page.
If you then optimize your code so that it completes in a third the time, 333ms, then your servers are going to suddenly be at 1/3 load; they execute their one query for 333ms, and then sleep for 667ms.
Not just that: the user now only has to wait 333ms for the page to render on the server, so the site gets a lot more snappy for them.
Then you can shut down two servers leaving one; it will sit on 100% load, but you still keep the shorter 333ms latency experienced by your users.
You are doing the same amount of work as before - 3 hits a second. But previously, when the tasks took 1s each, three would be running in parallel, each being completed more slowly. With the faster run time, they are running in series, being completed quickly before switching to the next one.
Now, this does not apply if your requests are queuing. Because you're not able to do any more work than before (after shutting down the other two servers), if your hits/s exceeds the capacity of your servers to deal with them, the backlog will grow just as fast as it normally would have, and the latency caused by this will skyrocket.
Your example doesn't fit because multiple machines aren't ever used to process a single front-end web request in parallel; you don't render half of a template on one machine and the other half on another, for example. If you can set up a system like that and see gains from it, then what I've written above will not apply.
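The web-server scenario above in arithmetic form (same invented numbers: 1 second of CPU per page, 3 hits per second site-wide):

```python
service_time = 1.0         # seconds of CPU to render one page
arrival_rate = 3.0         # hits per second, site-wide

# Before the speedup: three servers, each fully busy.
utilization = arrival_rate * service_time / 3      # 1.0, i.e. 100% load
latency_before = service_time                      # each user waits ~1 s

# After a 3x speedup:
service_time = service_time / 3                    # ~0.333 s per page
utilization_3 = arrival_rate * service_time / 3    # ~0.33: three servers mostly idle
utilization_1 = arrival_rate * service_time / 1    # 1.0: one server, fully busy again
latency_after = service_time                       # users wait ~0.333 s either way

print(latency_before, round(latency_after, 3), utilization_1)
```

As the comment says, this only holds while requests aren't queuing; at 100% utilization, any sustained excess in arrivals builds a backlog.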
Gotcha, thanks for clarifying. I had something like batch processing for data analysis in mind, which is what I understood to be the target use model of the original article. What you are saying in the context of synchronous request processing makes sense and is an interesting point.
Thank you! A lot of people somehow don't get this. They think only in terms of horizontal or vertical scaling. (Increasing code performance is definitely a form of vertical scaling, but it isn't what people are usually thinking/talking about.)
Sorry for the late reply—I wanted to give you a serious answer and didn't have time earlier.
The criteria for "substantive" are what the HN guidelines (along with https://news.ycombinator.com/newswelcome.html) say about which stories are on topic: intellectually interesting (as opposed to sensational) and so on. HN's culture is well-enough established by now that I think most community members can probably agree, not of course on what interests each of us personally, but on a working set of candidate stories that roughly fit the criteria. That's our hypothesis, anyhow.
HN sees perhaps a thousand new stories a day. Most aren't "intellectually interesting" in HN's sense (i.e. gratifying intellectual curiosity), and the weeds have grown too thick for the upvote system alone to reliably surface the potential gems. To comb through /newest looking for them has become too much work.
Some interesting stories (again, "interesting" in the HN sense) do fine under the upvote system. Breaking news, stories about fashionable technologies, and anything controversial reliably attract upvotes. But the quieter and deeper submissions are often not immediately recognizable. Those deserve closer attention than they were getting, so we've been experimenting with different approaches to finding them.
In the spirit of "do things that don't scale", it's mostly us trying different things manually for now, but what we're looking for is a mechanism that can be opened to anyone willing to put in the effort. The upvote system will work exactly as it always has, but we're hoping to add a new one that complements it and compensates for its weaknesses.
Since it requires significant effort to look through the story stream hunting for out-of-the-way pieces, one idea (my favorite) is to make this a new way of earning karma on the site. But first we have to find something repeatable that will work.
Downthread, someone mentioned that they couldn't find the HN discussion from when this was originally posted; it's here:
https://news.ycombinator.com/item?id=7715025