
Not directly related but while I have the eyes of some technical people reading about the lottery, I want to throw this out there.

I have 4+ years' worth of daily scratch-off lottery ticket results from a dozen or so different states.

Every day, the state updates their website with the numbers of tickets remaining at each ticket level. I've been scraping that and saving it.

If anyone would find this data interesting I'd be happy to share the SQLite database. I just ask that you share your code/queries and what you find.

- Are the grand prizes truly random? Or are they stratified?

- Do games end with an unusually high number of grand prizes unclaimed?

- Is there a buffer when a game is first released when no grand prize is possible?

You could scan some working papers for ideas on what to check the integrity of: https://docs.zohopublic.com/file/pze38fbeed85562834d5696105b...

Those working papers have things like "guaranteed low-end prize structures" per pack of tickets.

Tips based on those working papers:

- Buy from a fresh pack until you get a winner then stop. Since there's a guaranteed number of winners per pack, each loser you scratch improves the odds for the rest of the pack.

- Don't buy from a pack that's already had a big winner. Most working papers stipulate no more than 1 large prize per pack.
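The first tip is just conditional probability, and a small sketch makes it concrete. The pack size and winner count below are made-up numbers, not taken from any real working paper, and the function name is only for illustration. It assumes the pack holds a fixed number of winners and that every ticket pulled so far was a loser.

```python
from fractions import Fraction

def win_chance_after_losers(pack_size, winners, losers_seen):
    """Chance the next ticket is a winner, given that only losers
    have come out of the pack so far (fixed winner count assumed)."""
    remaining = pack_size - losers_seen
    if remaining < winners or losers_seen < 0:
        raise ValueError("more losers seen than the pack can contain")
    return Fraction(winners, remaining)

# Hypothetical pack: 30 tickets with 8 guaranteed winners.
for losers in (0, 5, 10, 20):
    chance = win_chance_after_losers(30, 8, losers)
    print(f"{losers:2d} losers seen -> next ticket wins with p = {float(chance):.3f}")
```

The chance only resets once a winner comes out, which is why the tip says to stop at that point.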



Here's the data through ~October of last year. ~500MB.

https://docs.zohopublic.com/file/pze38c35faf87eb654907b51890...

I'm using it to power https://scratchoff-odds.com right now.


I see one game in Missouri has a score of 146. Is there anything stopping someone from buying all the remaining tickets (other than time and money obviously) and pocketing the $4 million difference?

Also, is it possible someone has a grand prize winner but incorrectly throws it in the trash (because they overlooked the fact that it was a winner/didn’t scratch it all the way off)? Would the website pick up on that?


Nothing stopping someone from buying all the remaining tickets. But if you do the math, I think you'll find it's still not worth it. How fast can you scratch/verify tickets? If you take the lump sum, you get something like 60% of the grand prize. If you take the annuity, it pays out over 20-40 years. Taxes will take out another chunk.

But everyone's situation is different. If you already have losses that could be tax-deducted from the win, that would help you. If you could monetize the process by selling your story or gaining YouTube fame, that would help. Some YouTuber did buy $1,000,000 worth of tickets without any particular strategy and presumably made a profit from the YouTube side of his business. As expected, he got back ~70%, so it only cost him ~$300k + production costs.
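A rough sketch of the "do the math" part, for anyone who wants to plug in their own numbers. Only the ~60% lump-sum figure comes from the comment above. The flat 30% tax rate, the 10 seconds per ticket, the example game, and the function name are all assumptions for illustration, and real tax treatment is more complicated than one flat rate.

```python
def buyout_estimate(ticket_price, tickets_left, small_prizes_left, grand_prize,
                    lump_sum_fraction=0.60,   # from the comment above
                    tax_rate=0.30,            # assumed flat rate, illustration only
                    seconds_per_ticket=10):   # assumed scratch/verify speed
    """Back-of-envelope result of buying every remaining ticket in a game."""
    cost = ticket_price * tickets_left
    grand_cash = grand_prize * lump_sum_fraction
    after_tax = (small_prizes_left + grand_cash) * (1 - tax_rate)
    return {
        "cost": cost,
        "after_tax_winnings": after_tax,
        "net": after_tax - cost,
        "scratch_hours": tickets_left * seconds_per_ticket / 3600,
    }

# Hypothetical game: 1M tickets left at $10, $6M in smaller prizes left,
# one $8M grand prize still unclaimed.
result = buyout_estimate(10, 1_000_000, 6_000_000, 8_000_000)
for key, value in result.items():
    print(f"{key}: {value:,.0f}")
```

With these made-up inputs the net comes out negative even though the face value of the remaining prizes exceeds the ticket cost, which is the point about lump sums and taxes.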


Thanks. Really cool site! I sent it to a friend of mine who regularly plays scratch offs. He’s in New York though.


What is with the fractional numbers in `game.num_tx_initial` for some of the rows? I am assuming this is number of tickets sold. Parsing error?

Edit: The site is pretty cool. I get strong vibes of the Winfall lottery story[0]

[0] https://highline.huffingtonpost.com/articles/en/lotto-winner...


Some states only publish claim numbers for prizes over a certain amount. For prizes below that amount, I estimate using the % claimed of all published prizes.

If 25% of the prizes greater than $30 have been claimed, then I assume 25% of the prizes less than $30 have been claimed. Everything in the low numbers has large enough data pools for it to average out accurately. It's not until you get to the $600+ prize level where things would be really inaccurate.
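Roughly, that estimate looks like the sketch below. The tier names and counts are invented, the function name is hypothetical, and pooling all published tiers into one claimed fraction is my reading of the approach, not the site's actual code. It also shows where fractional counts come from.

```python
def estimate_unpublished_remaining(published, unpublished_initial):
    """published: list of (initial_count, claimed_count) for tiers the state reports.
    unpublished_initial: dict of tier -> initial_count for tiers it doesn't.
    Applies the pooled claimed fraction of the published tiers to the rest."""
    total_initial = sum(initial for initial, _ in published)
    total_claimed = sum(claimed for _, claimed in published)
    claimed_fraction = total_claimed / total_initial
    return {tier: count * (1 - claimed_fraction)
            for tier, count in unpublished_initial.items()}

# Hypothetical game: two published tiers, both 25% claimed.
published = [(1000, 250), (200, 50)]
unpublished = {"$5": 100_001, "$10": 40_000}
print(estimate_unpublished_remaining(published, unpublished))
```

An odd initial count times 0.75 is not a whole number, so the estimated prizes remaining end up fractional.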

You'll also note there's usually a lag for prizes $600+.

When you look at aggregates across states, you might see something like 25% of prizes below $600 claimed but only 19% of prizes above $600 claimed. I figure that's because $600+ prizes have to be claimed at lottery headquarters and get reported for taxes. So people might delay, try to hide the money from their spouse, or wait for tax reasons, and the headquarters has to process the claim manually rather than through the automated machine at a retail outlet, whatever...


Actually that other explanation is for fractional tickets in other locations of the database, like prizes remaining.

Specifically in `num_tx_initial` it might be because they don't report the number of tickets printed. But if they print the odds of a win and numbers of winners available, then you can estimate how many non-winners there are and thus how many printed tickets there are.
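As a sketch of that inference: if the ticket says "overall odds 1 in X" and the state publishes the total number of winning tickets, the print run is roughly winners times X. The numbers and function name below are hypothetical, and this is not necessarily how `num_tx_initial` is actually computed.

```python
def estimate_tickets_printed(total_winners, odds_one_in):
    """Estimate the print run from published overall odds ("1 in odds_one_in")
    and the total count of winning tickets across all prize tiers."""
    return total_winners * odds_one_in

# Hypothetical game: 1,234,567 winning tickets at overall odds of 1 in 4.13.
printed = estimate_tickets_printed(1_234_567, 4.13)
non_winners = printed - 1_234_567
print(f"estimated printed: {printed:,.2f}")
print(f"estimated non-winners: {non_winners:,.2f}")
```

Published odds are rounded, so the product is rarely a whole number, which would explain the fractional values.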


Gotcha. Reasonable inferences from whatever data you can access.


If you do something cool with it, let me know at support@scratchoff-odds.com.


I am a university professor and a statistician. I'd love to get these data into my courses if you are willing to share them. It's a great example that students can easily relate to.


To #2, as a typical consumer walking in to buy a scratch-off, it's unlikely you will know the results of an in-use pack.


Employees could really game the system. On average there's 1 "big" prize (outside the GLEP prizes) every 4 packs. Any time you see a pack go from start to ~5 remaining without a big prize, buy every remaining ticket.

There are also guaranteed limits on the maximum number of losers in a row. So if you see ~6+ (depends on ticket) losers in a row, then buy the next few until you win. I've run simulations on those distributions and it's profitable. But it's a situation that only an employee could take advantage of. And it probably comes up rarely.
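This is not the simulation mentioned above, just a minimal sketch of the idea. It assumes a hypothetical 30-ticket pack with 8 winners and a cap of 6 losers in a row, and it assumes packs are uniformly random among the orderings that respect the cap. Real working papers will have different numbers and more rules.

```python
import random

def longest_loser_run(pack):
    """Length of the longest streak of consecutive losers in a pack."""
    longest = run = 0
    for is_winner in pack:
        run = 0 if is_winner else run + 1
        longest = max(longest, run)
    return longest

def random_pack(rng, pack_size, winners, max_losers_in_row):
    """Rejection sampling: reshuffle until the loser-streak cap is respected."""
    pack = [True] * winners + [False] * (pack_size - winners)
    while True:
        rng.shuffle(pack)
        if longest_loser_run(pack) <= max_losers_in_row:
            return list(pack)

def win_rate_after_streak(packs, streak):
    """Win rate of tickets that directly follow exactly `streak` losers in a row."""
    hits = trials = 0
    for pack in packs:
        run = 0
        for is_winner in pack:
            if run == streak:
                trials += 1
                hits += is_winner
            run = 0 if is_winner else run + 1
    return hits / trials if trials else float("nan")

rng = random.Random(0)  # fixed seed so runs are repeatable
packs = [random_pack(rng, 30, 8, 6) for _ in range(2000)]
for streak in range(7):
    print(f"after {streak} losers in a row: win rate {win_rate_after_streak(packs, streak):.3f}")
```

Under these assumptions a ticket that follows a full cap-length streak has to be a winner, since otherwise the pack would break the cap. Whether the shorter streaks are worth acting on depends on the actual prize values, which this sketch ignores.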


I'm a bit confused. How does the employee know that the previous 6 tickets were losers? It's not like all customers are scratching them off then and there in front of them?


> It's not like all customers are scratching them off then and there in front of them?

A good number of tickets (most tickets?) are purchased by habitual players who'll buy many tickets per sale, and many of those people will even scratch them off in the store:

> “Some customers come in up to three times a day to play, spending up to one hour to scratch-off tickets right in the store after spending $300 or more.”

https://www.cspdailynews.com/technologyservices/inside-marke...

If you don't see this happening often, you're not likely living in poor or low-income neighborhoods, not patronizing the local convenience stores, or at least not paying attention to the 1 or 2 individuals that you'll often see lingering near the counter.


Worked at a gas station for years when I was younger; this was a pretty common thing.


Didn't know that. I guess any patron could pull it off. Like watching a table and card counting.


I've heard of this off and on for years, but seeing as I don't know anyone who has actually done it I think practically it's not too profitable. If he has everything in a sqlite db though, hmm..



I knew a bartender who did exactly this with pull tabs.


Nope. As with coin flips, what happens in one scratch-off pack has no effect on the following packs. They're mixed precisely to prevent attacks like the one you describe.


I would be interested, if you are willing to share the data. Actually, I am thinking now: what if you introduce some anomaly in the data. Something like the man did in the article (draw from a different distribution), and a challenge would be to detect it.


A relative did this math the.. not so legal way... and the most he "won" on about 50k cards was £100.


Wow I'd be very interested in that


Are you addicted to scratchers?



