Just in time! This will be a nice reference. I recently started working on a new Scala driver that uses the v0_4 asynchronous protocol, built on top of Akka's IO module and Play's JSON module. I think I have the performance where I want it, but now I need to flesh out the DSL for proper ReQL support.
Exponential backoff is a good idea, but to make it even better you'll want to add some random jitter to the delay period. Especially in the case of deadlocks, this helps avoid a repeat collision when two clients would otherwise retry at exactly the same time.
Absolutely. I made some graphs of different kinds of jitter approaches at http://www.awsarchitectureblog.com/2015/03/backoff.html. Simply backing off without jitter does very little to spread out spikes of work, and can lead to longer MTTRs if the cause of the issue was overload.
Would it be possible for the server to simply return a queue position to the contending clients? For instance, "There are 37 clients ahead of you. Try again in 37ms." (assuming each write is averaging 1ms)
I suspect if the client and server worked together in this fashion you could get much closer to the linear completion time seen with no backoff and also closer to linear work (since each client would try exactly twice in an ideal world where the server accurately predicted when it should try again).
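The cooperative scheme suggested above could be sketched roughly as follows. Everything here is hypothetical: no real server protocol returns these fields, and the field names (`queue_position`, `avg_write_ms`) are invented for illustration.

```python
import time

def retry_delay_ms(rejection: dict) -> float:
    # Hypothetical: estimated wait is the number of clients ahead of us
    # times the server's reported average write time.
    return rejection["queue_position"] * rejection["avg_write_ms"]

def submit_with_queue_hint(send, request):
    # Retry loop: if the server rejects us with a queue hint, sleep for
    # exactly the suggested time instead of guessing with backoff.
    while True:
        response = send(request)
        if response.get("status") == "ok":
            return response
        time.sleep(retry_delay_ms(response) / 1000.0)
```

In the ideal case described above, each client sends at most one extra request, so total work grows linearly with the number of contenders.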
I think most exponential backoff schemes I've looked at treat the 'backoff delay' as a range and do randrange(0, current_backoff) or similar which does what you say.
Truncated exponential (where you cap the upper limit of the retry delay to some maximum) is also often a good idea, to prevent a short service outage from spiking the retry timers to crazy numbers.
Doesn't exponential back-off mean that you select the time until retry uniformly at random from the interval [0, ..., 2^n-1] in the n-th round of failure, or something along those lines?
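Putting the ideas in this subthread together, a minimal sketch might look like this: a uniformly random delay in [0, 2^n - 1] slot times on the n-th failure ("full jitter"), truncated at a cap so a short outage can't inflate the timers. The `base` and `cap` values are arbitrary placeholders, not recommendations.

```python
import random
import time

def backoff_delay(attempt: int, base: float = 0.05, cap: float = 10.0) -> float:
    # Truncated binary exponential backoff with full jitter:
    # pick uniformly from [0, min(cap, base * (2**attempt - 1))].
    upper = min(cap, base * (2 ** attempt - 1))
    return random.uniform(0, upper)

def call_with_retries(op, max_attempts: int = 8):
    # Retry `op` on any exception, sleeping a jittered, capped delay between tries.
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_delay(attempt))
```

The randomness is what spreads the retry spikes out; the cap is what keeps the worst-case wait bounded during a longer outage.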
Currently continuous views must read from a stream. However, in the very near future it will be possible to write to streams from triggers, which would probably give you enough flexibility to model the behavior you want if you could conceptualize a table as a stream of changes.
Our next release (2.1) is due in about three weeks and includes automatic failover/high availability. Feeds on table joins (and other greatly expanded feed functionality) will be in 2.2, which should happen ~6-8 weeks after 2.1.
(Sorry to jump in with a shameless plug; what PipelineDB is doing is super-cool; I also met the founders a few times, and they're awesome, smart, and very driven people -- I'm really excited about what PipelineDB has to offer!)
KONG definitely looks interesting, and I'd love to know more about it, but there's not a lot written about it yet.
For example: I've gone searching through the blog posts, github readme, and KONG documentation, but I still have no idea _why_ it needs Cassandra. What does it store in there?
One of the main graphics on the KONG docs shows a Caching plugin (http://getkong.org/assets/images/homepage/diagram-right.png), but the list of available plugins doesn't include such an entry. Is that because caching is built in? Is the cache state stored in Cassandra? Or is the plugin yet to be built?
All the data that Kong stores (including rate-limiting data, consumers, etc) is being saved into Cassandra.
nginx has a simple in-memory cache, but it can only be shared across workers on the same instance, so in order to scale Kong horizontally by adding more servers there must be a third-party datastore (in this case Cassandra) that stores and serves the data to the cluster.
Kong supports a simple caching mechanism that's basically the one that nginx supports. We are planning to add a more complex Caching plugin that will store data into Cassandra as well, and will make the cached items available across the cluster.
if you want to build a new derived datastore, you can just start a new consumer at the beginning of the log, and churn through the history of the log, applying all the writes to your datastore.
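The replay idea above can be sketched with a toy model: the "log" is just an ordered list of write events, and the derived datastore is a dict rebuilt by applying every event from the start. A real consumer (e.g. Kafka) would stream these from offset 0 instead; the event shape here (`op`/`key`/`value`) is invented for illustration.

```python
def apply(store: dict, event: dict) -> None:
    # Apply one write event to the derived store.
    if event["op"] == "put":
        store[event["key"]] = event["value"]
    elif event["op"] == "delete":
        store.pop(event["key"], None)

def rebuild(log: list) -> dict:
    # Start from empty and churn through the entire history, in order.
    store = {}
    for event in log:
        apply(store, event)
    return store
```

Because replay is deterministic, any number of differently-shaped derived stores can be built from the same log just by writing a different `apply`.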
For high-throughput environments with lots of appends to the log, how do you get around the ever-increasing size of your log file? I know the traditional answer is to take a periodic snapshot and compact the previous data, but is that built in to tools like Kafka?
There's a log compaction cleanup policy, yes. I've never used it myself, but if I'm not mistaken it works like this: for each message you send to Kafka, you set a key with it. When Kafka does log compaction, it keeps only the last value for each key.
The other cleanup policy is to just have a retention time. After X minutes/days/weeks, segments of the log are simply deleted.
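The key-based compaction described above can be modeled in a few lines: for each key, only the most recent value survives, and survivors keep their original log order. This sketches the semantics only, not Kafka's actual segment-by-segment cleaner.

```python
def compact(log: list) -> list:
    # log is a list of (key, value) records in append order.
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    # Emit the surviving record for each key, in original log order.
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda kv: kv[1][0])]
```

This is also why compaction only helps when each message is the complete state for its key, which is exactly the concern raised below about change events.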
That sounds great if your messages in the logs are the complete state for that key, but I'm not seeing how to use that compaction system if the messages are change events.
Is there a system designed for snapshotting the aggregate and logging the delta?
It's easy to store messages in HDFS or S3 for long-term storage. It's also easy to replay messages from those mediums, if you need to re-ingest data later on.
One idea is to shard the logs. By analogy with git: any given repo has a log of its commits, but you can have as many repos as you like.
It does limit throughput for any given shard, though, and then you're left with a distributed transaction problem to solve when you need to commit changes to objects in different repos.
Yes. When you're getting initial data you'll get a document of the form `{ new_val: data }`. When you're getting changes, the document is of the form `{ new_val: data, old_val: data }`. Note that in the former case, the `old_val` field is missing.
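A consumer can distinguish the two document shapes described above by checking for `old_val`. A minimal sketch (plain dicts standing in for the driver's feed documents; the deletion case, where `new_val` is `None`, follows RethinkDB's changefeed convention and isn't spelled out in the comment above):

```python
def classify(doc: dict) -> str:
    # Initial results carry only new_val; changes carry both fields.
    if "old_val" not in doc:
        return "initial"      # {"new_val": ...}
    if doc["new_val"] is None:
        return "deletion"     # {"old_val": ..., "new_val": None}
    return "change"           # {"old_val": ..., "new_val": ...}
```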