It's not from the server logs that I would verify this is a Poisson process; it's from the intuitive understanding that these events occur independently from each other and at a constant, known rate. This process also possesses the necessary properties[1] of a Poisson process.
Of course, for many web sites, the events are not independent.
If I visit one page, I'm likely to visit several more. If I see an interesting page, I'm likely to forward it to friends. If I happen to have 1M twitter followers, my visit could spark a cascade of visits.
Unless your models of site visits are very good (good enough to verify independence), you would need to examine logs to determine if effects like those above kick you completely out of a domain where independence is a good approximation.
Note: time-dependence (a non-constant but deterministic rate λ(t)) is usually easy to cope with. Lack of independence is much harder, because then you are left with Cox processes (http://en.wikipedia.org/wiki/Cox_process) and worse.
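For what it's worth, the time-dependent case really is easy to handle, even in simulation: Lewis–Shedler thinning turns a homogeneous Poisson process into an inhomogeneous one with any bounded rate λ(t). A minimal sketch (the daily-traffic shape of λ here is made up for illustration):

```python
import math
import random

def sample_inhomogeneous_poisson(rate_fn, rate_max, t_end, rng=random):
    """Sample arrival times on [0, t_end) for a Poisson process with
    deterministic rate rate_fn(t), via Lewis-Shedler thinning.
    rate_max must be an upper bound on rate_fn over [0, t_end)."""
    arrivals = []
    t = 0.0
    while True:
        # Candidate arrivals come from a homogeneous process at rate_max...
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return arrivals
        # ...and each candidate is kept with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            arrivals.append(t)

# Hypothetical rate with a midday peak, in events per hour, over one day.
lam = lambda t: 5.0 + 4.0 * math.sin(math.pi * t / 24.0)
times = sample_inhomogeneous_poisson(lam, rate_max=9.0, t_end=24.0)
```

The accepted points still land as a genuine Poisson process, just with the desired time-varying intensity, which is why the deterministic-λ(t) case stays tractable.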
Those are good points. The event in question is not a simple user visit, however. Let's say it's a particular user action that's a bit more involved, e.g. filling out a form. The assumption here is that users fill out this form independently from one another.
Sure, you have to start somewhere with any analysis, and Poisson is the place to start. Anyone (like me) can then question independence assumptions from the sidelines.
I was moved to speak up originally because a commenter referred to inspecting the logs to verify the Poisson assumption. In my experience, looking at the data is always an excellent principle, and generally preferable to stopping at an intuitive understanding of the arrival process.
Correct me if I'm wrong, but surely in the scenario that you are Ashton Kutcher and tweet a link to your followers, the average inter-arrival time would decrease but overall it would still be a Poisson process?
It would not be Poisson, because the arrivals are not statistically independent.
Let's take a simpler case, where I have one follower who has a high probability of clicking links from my tweets, and there is a webpage that only I know about. I visit that page, tweet it, and an hour later, with high probability, my follower visits it. The event
{ I visit page }
and the event
{ follower visits page }
are not independent, and neither is the arrival process associated with them.
In the extreme case, for large times, there are either zero visits to the link, or two, but never just one (obviously non-Poisson).
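That zero-or-two pattern is easy to see in a quick simulation (the probabilities below are made up for illustration): paired arrivals never produce a window with exactly one visit, while a Poisson count with the same mean would do so roughly 37% of the time.

```python
import math
import random

rng = random.Random(0)

# Extreme case from above: in each window I visit with probability 1/2,
# and if I do, my follower always visits too -- so counts are 0 or 2.
counts = [2 if rng.random() < 0.5 else 0 for _ in range(100000)]

mean = sum(counts) / len(counts)                    # about 1.0
frac_exactly_one = counts.count(1) / len(counts)    # exactly 0.0

# A Poisson count with the same mean equals 1 with probability
# mean * exp(-mean), i.e. about 0.37 here -- very non-zero.
poisson_one = mean * math.exp(-mean)
print(frac_exactly_one, poisson_one)
```

So even though the average rate looks fine, the distribution of counts gives the dependence away immediately.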
The Kutcher example is just this, magnified.
Independence is a very strong assumption, but it is often casually stated. Independence is what allows you to decompose a very complex process (all those arrivals) into their individual atomic events (a single arrival).
Without invoking independence, you have a very complex task in relating how each of the N possible sources of events might relate to each other. It could be arbitrarily complex (e.g., some followers might only click a link if two people they're following like it, unless Ashton likes it, in which case they will never visit it, but if Colbert likes it, that will override Ashton).
I understand that each { follower visits page } is dependent on the { Ashton Kutcher tweets } event, but I still don't understand why events are not independent of each other, and why the N visits to the webpage in time period T from the tweet would not likely be described by a Poisson process. Not expecting an answer btw, just thinking out loud here. :)
Why do you assume the rate is constant and known? I would assume that there are peak and off-peak times of the day when the website and databases would be used, no?
I was addressing the "verifying it's a Poisson process" part of their comment.
Verifying that the process was homogeneous involved analyzing the historical data. I didn't include it in the writeup, but that was done with a likelihood ratio test.
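For anyone curious, one way such a likelihood ratio test for homogeneity can be set up (this is a sketch of the standard construction, not necessarily what was done here): bin the arrivals into equal-width intervals, fit one shared Poisson rate under the null and one rate per bin under the alternative, and compare likelihoods. The hourly counts below are made up.

```python
import math

def lr_test_homogeneous(bin_counts):
    """Likelihood-ratio statistic for a single Poisson rate across
    equal-width bins vs. a separate rate per bin. Under the homogeneous
    null it is approximately chi-square with len(bin_counts) - 1
    degrees of freedom."""
    n = len(bin_counts)
    lam0 = sum(bin_counts) / n  # MLE under the null: one shared rate
    stat = 0.0
    for c in bin_counts:
        if c > 0:  # 0 * log(0) is taken as 0
            # The -c + lam0 terms cancel in the sum, leaving this form.
            stat += 2.0 * c * math.log(c / lam0)
    return stat

# Hypothetical hourly form-submission counts: flat vs. peaked traffic.
flat = [12, 10, 11, 13, 9, 12, 11, 10]
peaked = [2, 3, 25, 30, 28, 4, 3, 2]
```

With 8 bins the 95% chi-square critical value (7 degrees of freedom) is about 14.07, so the `flat` counts pass comfortably while the `peaked` counts reject homogeneity by a wide margin.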