Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Just to make this discussion less abstract: is a groupby operation monotonic?


I think it's logically monotonic in the sense that as you add input data to your query, the output simply accumulates (you never have to retract any prior conclusions).

Consistency between two accumulators is easy, they just sum their states.


you are right. but even in case where you have to retract prior conclusions, it is possible to to fold accumulators to maintain global consistency.

for example as you said: global_sum() = accum1.sum() + accum2.sum()

global_min():

  accum3 = [accum1.min(), accum2.min()]

  global_min = accum3.min()
any non-monotonic computation can be split into monotonic, where they process the bulk of the data, and folding the results to product the final result


Min is also logically monotonic in the same sense though, right? As a min accumulator I can throw away all past information once I've computed my result and never have consistency issues in light of new data.

A better example would be reachability analysis in a DAG - here you really do have monotonicity failure when merging results from multiple workers.


Yes. Based on the definition.


So I contradicted myself above. Aggregation is probably not monotonic -- the output of the aggregation is not a subset if the input is a subset.


it is possible to split aggregation into parts that use commutative operators.

for example global AVG can be expressed as SUM()/COUNT()

although AVG is not commutative, SUM() and COUNT() are commutative and can be run in scatter/gather mode and offloaded to individual nodes




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: