rajaravivarma_r's comments | Hacker News

Not related to the content itself, but people using psychology terminology to describe wrong behaviour is not acceptable.

The one use case where a DB-backed queue will fail for sure is when the payload is large. For example, if you queue a large JSON payload to be picked up and processed by a worker, the DB write overhead alone makes the background worker useless.

I've benchmarked Redis (Sidekiq), Postgres (using GoodJob) and SQLite (SolidQueue); Redis beats everything else for the above use case.

SolidQueue backed by SQLite may be good when you are just passing around primary keys. I still wonder whether you can have a lot of workers polling from the same database and updating the queue with the job status. I've done something similar in the past using SQLite for some personal work, and it is easy to hit a wall even with 10 or so workers.


In my experience you want job parameters to be one, maybe two, IDs. Do you have a real-world example where that is not the case?


I'm guessing that means adding indirection for what you're actually processing, in that case? So I guess the counter-case would be when you don't want/need that indirection.

If I understand what you're saying, it's that instead of doing:

- Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done

You're suggesting:

- Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve ID to the data needed for processing > Done

Is that more or less what you mean? I can definitely see use cases for both; it heavily depends on the situation, but more indirection isn't always better, nor are big payloads always OK.
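For concreteness, a minimal Sidekiq-style sketch of the two flows described above (Processor and Webhook are made-up names, not anything from this thread):

    # 1. The payload itself travels through the queue (fine while payloads stay small).
    class InlinePayloadWorker
      include Sidekiq::Job

      def perform(payload_hash)
        Processor.call(payload_hash)
      end
    end

    # 2. Only an ID travels; the worker resolves it back to the data it needs.
    class ByIdWorker
      include Sidekiq::Job

      def perform(webhook_id)
        webhook = Webhook.find(webhook_id) # one extra read, but the queue stays tiny
        Processor.call(webhook.payload)
      end
    end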


If we take webhooks, for example:

- Persist payload in db > Queue with id > Process via worker.

Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on the payload size, for good reasons. Plus, if you have already committed to the DB, you can guarantee the data is not lost and can be processed again however you want later. But if your queue is having issues, or the enqueue fails, you might lose the data forever.
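A rough Rails-flavoured sketch of that persist-first flow (WebhookEvent and ProcessWebhookJob are hypothetical names, not from the comment):

    class WebhooksController < ApplicationController
      def create
        # Commit the raw payload before anything else, so it can't be lost
        # even if the enqueue fails or the queue is down.
        event = WebhookEvent.create!(body: request.raw_post)
        ProcessWebhookJob.perform_later(event.id)
        head :ok
      end
    end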


> Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on the payload size, for good reasons.

Is that how microservice messages work? Do they push the whole payload so the other systems can consume it and take it from there?


A microservice architecture would probably use a message bus because they would also need to broadcast the result.


yes and no; as the sibling comment mentions, sometimes a message bus is used (Kafka, for example), but Netflix is (was?) all-in on HTTP (low-latency gRPC, HTTP/3, wrapped in nice type-safe SDK packages)

but ideally you don't break the glass and reach for a microservices architecture if you don't need the scalability afforded by very deep decoupling

which means ideally you have separate databases (and DB schemas, and even likely different kinds of data store), and through the magic of having minimally overlapping "bounded contexts" you don't need a lot of data to be sent over (the client SDK will pick what it needs, for example)

... of course serving a content recommendation request (which results in a cascade of requests that go to various microservices, eg. profile, rights management data, CDN availability, and metadata for the results, image URLs, etc) for a Netflix user doesn't need durability, so no Kafka (or other message bus), but when the user changes their profile it might be something that gets "broadcasted"

(and durable "replayable" queues help, because then services can be put to read-only mode to serve traffic, while new instances are starting up, and they will catch up. and of course it's useful for debugging too, at least compared to HTTP logs, which usually don't have the body/payload logged.)


> I can definitely see use cases for both

Me too; I was just wondering if you have any real-world examples of a project with a large payload.


I have been doing this for at least a decade now and it is a great pattern, but think of an ETL pipeline where you fetch a huge JSON payload, store it in the database, and then transform it and load it into another model. I had a use case where I wanted to process the JSON payload and pass it down the pipeline before storing it in the useful model. I didn't want to store the intermediate JSON anywhere. I benchmarked it for this specific use case.
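A small ActiveJob-style sketch of what passing the intermediate JSON down the pipeline (instead of persisting it) might look like; FetchJob, TransformJob and Report are hypothetical names, not the commenter's actual code:

    require "json"
    require "net/http"

    class FetchJob < ApplicationJob
      def perform(source_url)
        payload = JSON.parse(Net::HTTP.get(URI(source_url)))
        # Hand the parsed payload straight to the next stage; no intermediate row is stored.
        TransformJob.perform_later(payload)
      end
    end

    class TransformJob < ApplicationJob
      def perform(payload)
        # Only the final, useful shape is written to the database.
        Report.create!(row_count: payload.fetch("items", []).size)
      end
    end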

...well, that's good for scaling the queue, but it means the worker needs to load all relevant state/context from some DB (which might be sped up with a cache, but then things get really complex)

ideally you pass the context that's required for the job (let's say it's less than 100 KB), but I don't think that counts as a large JSON; on the other hand, request rate (load) can make even 512 bytes too much, therefore "it depends"

but in general, passing around large JSONs over the network/in memory is not really slow compared to writing them to a DB (WAL + fsync + MVCC management)


> Redis beats everything else for the above use case.

Reminds me of the Antirez blog post saying that when Redis is configured for durability it becomes about as slow as, or slower than, PostgreSQL: http://oldblog.antirez.com/post/redis-persistence-demystifie...


Maybe, but over 6 years of using Redis with a bare-minimum setup I have never lost any data, and my use case happens to be queuing intermediate results, so durability isn't an issue.

There have been 6 major Redis releases and countless improvements since then; I don't think we can say whether it's still relevant.

Also, Antirez has been very opinionated for a decade about not comparing or benchmarking Redis against other DBs.


> The one use case where a DB-backed queue will fail for sure is when the payload is large. For example, if you queue a large JSON payload to be picked up and processed by a worker, the DB write overhead alone makes the background worker useless.

Redis would suffer from the same issue. Possibly even more severely, due to being memory constrained?

I'd probably just stuff the "large data" in S3 or something like that, and include only the reference/location of the data in the actual job itself, if it was big enough to cause problems.
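A sketch of that pattern with the aws-sdk-s3 gem (the bucket name and ProcessLargePayloadJob are made up for illustration):

    require "aws-sdk-s3"
    require "json"
    require "securerandom"

    large_payload = { "rows" => (1..50_000).to_a } # stand-in for the real data

    s3  = Aws::S3::Client.new
    key = "job-payloads/#{SecureRandom.uuid}.json"

    # Park the big blob in object storage...
    s3.put_object(bucket: "my-job-payloads", key: key, body: large_payload.to_json)

    # ...and let the job carry only the reference.
    ProcessLargePayloadJob.perform_later("my-job-payloads", key)

    # Inside the worker, fetch it back:
    #   body = s3.get_object(bucket: bucket, key: key).body.read
    #   data = JSON.parse(body)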


Interesting, as a self-contained minimalistic setup.

Shouldn't one be using a storage system such as S3/garage with ephemeral settings and/or clean-up triggers after job end? I get the appeal of using one system for everything, but won't you need a storage system anyway for other parts of your system?

Have you written up your benchmarks somewhere, including where the cutoffs are (payload size / throughput / latency)?


FWIW, the Sidekiq docs strongly suggest passing only primary keys or identifiers to jobs.


Using Redis to store large queue payloads is usually a bad practice. Redis memory is finite.


this!! 100%.

pass around IDs


If Python's standard library and its backward-incompatible changes are the only problem, then Ruby would be a great replacement. The language is terse and the standard library is beautiful and consistent. Ruby hasn't had any large backward-incompatible changes since version 1.9, which was released 15 years ago.


I'm wondering the same, but honestly I have a soft spot for the old way of doing things as well, and I think it stems from that.

The performance numbers seem to show how bad it is in the real world.

For testing, I converted the CGI script into a FastAPI script and benchmarked it on my MacBook Pro M3. I'm getting super impressive performance numbers:

Read:

    Statistics     Avg        Stdev       Max
      Reqs/sec     2019.54    1021.75     10578.27
      Latency      123.45ms   173.88ms    1.95s
    HTTP codes:
      1xx - 0, 2xx - 30488, 3xx - 0, 4xx - 0, 5xx - 0
      others - 0
    Throughput:    30.29MB/s

Write (shown in the graph of the OP):

    Statistics     Avg        Stdev       Max
      Reqs/sec     931.72     340.79      3654.80
      Latency      267.53ms   443.02ms    2.02s
    HTTP codes:
      1xx - 0, 2xx - 0, 3xx - 13441, 4xx - 0, 5xx - 215
      others - 572
    Errors: timeout - 572
    Throughput:    270.54KB/s

At this point, the contention is probably the single SQL database. Throwing a beefy server at it, like in the original post, would increase the read numbers pretty significantly, but wouldn't do much for the write path.

I'm also thinking that in this day and age, one needs to go out of their way to do something with CGI. All macro and micro web frameworks come with an HTTP server, and there are plenty of options. I wouldn't do this for anything apart from fun.

FastAPI-guestbook.py https://gist.github.com/rajaravivarma-r/afc81344873791cb52f3...


Not the author of the parent comment, but from the description it looks like it is the Jolla Tablet, which was crowdfunded and whose delivery was delayed. After a year or so, some of the funders got the tablet, while some of us got half a refund.


It is funny how being a corporate Rails programmer taught me this early in my career: building Ruby from source was the only way to install the latest Ruby versions (using ruby-build), and installing Ruby from source meant its dependencies, like openssl and zlib, had to be installed from source too, to match the required versions.

And for a long time, using jemalloc was the only way to keep memory usage constant with multi-threaded Ruby programs like Puma and Sidekiq. This was achieved either by compiling Ruby with jemalloc or by preloading it with LD_PRELOAD.

Some developers also reported a 5% or so reduction in response times with jemalloc, IIRC.
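As an aside, here is one quick way to check whether the Ruby you're running was actually built against jemalloc, using only the bundled RbConfig (a sketch, not something from the comment above):

    require "rbconfig"

    built_with_jemalloc =
      RbConfig::CONFIG["MAINLIBS"].to_s.include?("jemalloc") ||
      RbConfig::CONFIG["configure_args"].to_s.include?("jemalloc")

    puts built_with_jemalloc ? "jemalloc is linked in" : "stock malloc"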

The problem with this approach, though, is that when a package has a lot of dependencies, like ImageMagick, which relies on jpeg, png, ghostscript and a lot of other libraries, you have to take a trial-and-error approach until the build succeeds. Fortunately, fixing the dependency errors is the easy part; sometimes building Python from source throws errors from headers that are impossible to understand. If you find a Stack Overflow solution you are good, or else you go down the rabbit hole, coming out either successful or empty-handed depending on your level of expertise.


I don't have a solution for the performance problem. But for the camelCase to snake_case conversion, I can see potential solutions.

1. If you are using axios or another fetch-based library, you can use an interceptor that converts the camelCase JavaScript objects to snake_case for requests and does the reverse for responses.

2. If you want to control that on the app side, you can use a helper method in ApplicationController, say `json_params`, that returns the JSON object with snake_case keys. Similarly, wrap `render json: json_object` in a helper method like `render_camel_case_json_response` and use that in all controllers (a rough sketch follows after this list). You can write a custom RuboCop cop to keep this behaviour consistent.

3. Handle the case transformation in a Rack middleware. This way you don't have to force developers to use those helper methods.
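A rough sketch of option 2, leaning on ActiveSupport's deep_transform_keys; the helper names are the ones suggested above, and this is an illustration rather than a drop-in implementation:

    class ApplicationController < ActionController::API
      private

      # Incoming camelCase params -> snake_case keys for Ruby-side code.
      def json_params
        params.to_unsafe_h.deep_transform_keys { |k| k.to_s.underscore }
      end

      # Outgoing snake_case hash -> camelCase JSON for the JS client.
      def render_camel_case_json_response(payload, status: :ok)
        render json: payload.deep_transform_keys { |k| k.to_s.camelize(:lower) },
               status: status
      end
    end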


I believe his point is that this transformation could be done in C, and therefore have better performance; it could be a flag to the JSON conversion.

I find the idea good; maybe it even already exists?


It could be done relatively efficiently in C indeed, but it would be yet another option, imposing yet another conditional, and as I mention in the post (and will keep hammering on in the follow-ups), conditionals are something you want to avoid for performance.

IMO that's the sort of conversion that is better handled by the "presentation" layer (as in ActiveModel::Serializers et al.).

In these gems you usually define something like:

    class UserSerializer < AMS::Serializer
      attributes :first_name, :email
    end
It wouldn't be hard for these libraries to apply a transformation on the attribute name at almost zero cost.
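For instance, active_model_serializers 0.10 already exposes a key_transform setting that does this globally (quoted here from memory, so double-check against the gem's docs):

    ActiveModelSerializers.config.key_transform = :camel_lower

    class UserSerializer < ActiveModel::Serializer
      attributes :first_name, :email   # rendered as "firstName" / "email"
    end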


I have rarely seen a car become unusable after 15 years. I have seen taxis with more than 400,000 km on the odometer. Not that they never had issues, but they are pretty much usable if you replace parts.

The 15-year car rule is only applicable in the New Delhi NCR region. You still need to obtain a fitness certificate for the vehicle every 5 years after the 15th year, but it need not be scrapped as long as it passes the evaluation.


I have this idea of creating a nano/micro bot of sorts that would replace the hair root and grow hair using nutrients (chemicals) available in the bloodstream or applied topically from time to time.

I know there are more important problems to solve than male pattern baldness, but somehow I think, in my limited understanding, that replicating hair follicles should be easier than growing organs in labs.


I loved Symbian because it was hackable. It had mShell and PyS60, which gave direct access to everything the system had. I remember backing up text messages and contacts using mShell.

I accessed a heart rate monitor via BT using PyS60. Both were pretty straightforward.

With Android, Termux comes close.

I had a colleague who worked at Nokia on a TCS contract. He explained how the team was devastated upon hearing the news that MeeGo would be abandoned and Windows Phone would be used instead.

I was hoping for MeeGo/Maemo 13 years ago, then Sailfish, and then Ubuntu Touch. I don't know if there is any hope for mobile Linux (not Android) anymore.

The problem is mostly the limiting hardware, more than the software itself. We had a Sailfish phone in India, but the hardware was so disappointing that even if you could live with the limited software ecosystem, the device was not really usable.

I see the same trend with the PinePhone and the Librem.

