1) i actually think that’s too high, i bet it’s more like 30%. My logic is that they have to have _some_ margin, but LLMs are too expensive to have typical software margins. Total speculation though.
2) It generally tracks pretty well unless the model is gaming the metric (training on the test set, overfitting to the specific source of data, etc.). The relative rankings will typically match in both.
3) alas, not with the mild winter North America’s having. They only stop below -5C or so. I am lucky, though: the woodpecker stopped attacking my house and started attacking my neighbor’s. Even worse, it used to be a downy woodpecker, and it’s now been replaced by a pileated one (think: Woody).
I guess my question is can you please fix your braindead blacklisting?
Several times per year (I can practically guarantee it’ll happen sometime in December, and indeed I had to deal with this just five days ago), I end up with a bunch of users whose email notifications stop working because Microsoft have started blocking the entire netrange where my server lives. I don’t have control over other Linode customers, guys! Specifically to avoid blacklisting, I even wrote extra code to stop sending mail to addresses once they start bouncing, so after MS finally processes a blacklist mitigation request, someone also has to go in and re-enable those accounts.
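The bounce-suppression behavior described above could look roughly like this. This is a minimal, hypothetical sketch, not the commenter's actual code: the class name, the threshold of 3 consecutive bounces, and the per-domain re-enable step are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical threshold: suppress an address after this many consecutive bounces.
BOUNCE_LIMIT = 3


class BounceSuppressor:
    """Stop mailing addresses that keep bouncing, and allow re-enabling them
    once the receiving provider has lifted a block (hypothetical helper)."""

    def __init__(self, limit: int = BOUNCE_LIMIT) -> None:
        self.limit = limit
        self.consecutive_bounces: defaultdict[str, int] = defaultdict(int)
        self.suppressed: set[str] = set()

    def record_bounce(self, address: str) -> None:
        self.consecutive_bounces[address] += 1
        if self.consecutive_bounces[address] >= self.limit:
            self.suppressed.add(address)

    def record_delivery(self, address: str) -> None:
        # A successful delivery resets the bounce streak.
        self.consecutive_bounces[address] = 0

    def can_send(self, address: str) -> bool:
        return address not in self.suppressed

    def reenable_domain(self, domain: str) -> list[str]:
        """Un-suppress every address at `domain` (e.g. after a blocklist
        mitigation request is processed) and return the restored addresses."""
        suffix = "@" + domain.lower()
        restored = sorted(a for a in self.suppressed if a.lower().endswith(suffix))
        for address in restored:
            self.suppressed.discard(address)
            self.consecutive_bounces[address] = 0
        return restored
```

The manual `reenable_domain` step is the chore the comment complains about: suppression is right for genuinely dead addresses, but it turns into extra cleanup when the "bounces" were really a provider-side block.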
SPF, DKIM, DMARC are all configured; I’ve sent from the same IP address for about a decade; I’ve not once received an email abuse report; mail volume is low (most days, volume does not reach the minimum threshold for SNDS to report data[0]). I’ve never had any other mail provider blacklist my server. SNDS always says everything is OK as I am S3150s. What is even the purpose of SNDS at this point when it lies about what is going on?
[0] P.S. The janky SNDS calendar widget resets the month to the current month every time you click on a date, even if the date being viewed is in a previous month. I don’t have any hope that anyone will ever touch SNDS code again since it was clearly designed in the early 2000s and the copyright on the site is now ten years old, but this is a pretty silly bug.
My guess is that the effectiveness issue isn’t actually due to SNDS and is probably related to sender reputation having famously high false positive rates. I read a paper a while back which introduced a different algorithm with tighter bounds on regret. I didn’t really understand it tbh, but I can implement it behind a flight and run a data study to see if it works better. The problem is that most graph-based stuff doesn’t scale super well because of something-something complexity classes. I think the lady who architected it 5 years ago didn’t do a great job, and there’s a bunch of arbitrary config stuff which was put in as a placeholder and then became enshrined in stone… but the guy maintaining it rn is really smart, so I’ll have him review my half-assed PR when he’s back next week (and idk how long it’ll take to finish the other half of it, shit never ships around here).
About the calendar widget thing… man am I glad our team doesn’t own that. No one ever touches legacy stuff because they’re afraid it’ll break, or that no one will update it, but the trick is to file it as an accessibility bug: that gets someone to actually prioritize it, since it shows up in reports that the execs read. But dude, good luck getting that off the backlog; the one engineer we have who is good at UX stuff (i.e., can code with both quality and velocity instead of just one) has her hands full as is.
Whatever the problem is, all I know is that last year Linode said they tried and failed to get Microsoft to actually fix the problem[0], apparently despite assurances and multiple requests for a root cause analysis. Everyone else seems to have figured out how to not be overrun by spam and also not block entire netranges, so I’d say it is well past time for Microsoft to figure out how to do that too.
Asking infrastructure providers to police email content is a very invasive thing to want. I don't think I agree with that.
Realistically, what can they do here? Make servers unaffordable to discourage abuse? Give most servers "Internet*" access where some ports are missing?
But that's how the world works right now: every provider has an acceptable use policy (AUP), and not just for emails. Not necessarily because they care, but because they are beholden to an AUP from their upstream or peers. Which makes it viral: if they don't enforce their AUP, they'd get cut off, and there's very little use in an internet service provider without connectivity.
> Give most servers "Internet*" access where some ports are missing?
Disallow SMTP traffic unless an account has a certain reputation or verified identity related to it?
I mean, they don't have to do that, and I would agree the government shouldn't force it to happen. But if someone is constantly causing you problems you shouldn't be required to deal with their shit. If you don't want to behave, expect consequences from everyone else in society.
If every time my friends invited me over I brought along some random person who smears feces all over the walls and pees in the corner, I probably wouldn't get invited over very often. Linode (and other cheap VPS hosts) are the guest who keeps bringing that person, constantly enabling abusive people and subjecting others to them.
Personally inviting someone to your party? Surely the better analogy for Linode is something like an apartment building owner. You wouldn't ban your established friend just because you keep having problems with nearby tenants.
Oh man, I think around 2 years ago there was a 3x spike in outbound spam from Europe, and the fraud team had to disable like 200k+ tenants from some shady cloud VPS. We didn’t have a long-term plan for the abuse back then besides playing whack-a-mole, and if we have one now, I haven’t heard of it.
Dumb question, but wtf is the solution even? I’m confused about what you expect us to do. I haven’t thought about the problem much so I might be missing some obvious Pareto improvement.
You didn’t make any mistakes encoding, I just screwed up my decoding, it happens :D
Thanks for clarifying, ok, hmm… that seems hard to do if you can check the IP block using a subnet mask but the specific IP isn’t resolved until later in mailflow. It might not actually work like that in… ProtocolFilterHub? I always get this mixed up, wait… I think this might be something that we are already working on. And have been working on for a while, wow. Looks hairy. It’s been stuck since the guy working on it transferred to another team and no one picked it up, but some PM noticed before I did and put it up for a vote in semester planning. Always creepy to see engineers referred to as “resources”.
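For readers unfamiliar with the subnet-mask check mentioned here, the sketch below shows the basic arithmetic and why a netrange-level block catches innocent neighbors, which is the complaint at the top of the thread. It is a hypothetical illustration using Python's standard `ipaddress` module and the RFC 5737 documentation range 203.0.113.0/24; it says nothing about how Microsoft's mail flow actually works.

```python
import ipaddress


def in_block_by_mask(ip: str, network: str, prefix_len: int) -> bool:
    """Manual IPv4 membership test: AND both addresses with the netmask
    and check whether the network bits agree."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    addr = int(ipaddress.IPv4Address(ip))
    net = int(ipaddress.IPv4Address(network))
    return (addr & mask) == (net & mask)


# Hypothetical policy inputs: one blocked /24 and one blocked single address.
NETRANGE_BLOCKS = [ipaddress.ip_network("203.0.113.0/24")]
SINGLE_IP_BLOCKS = {ipaddress.ip_address("203.0.113.66")}


def is_blocked(ip: str, per_ip_only: bool) -> bool:
    addr = ipaddress.ip_address(ip)
    if per_ip_only:
        return addr in SINGLE_IP_BLOCKS
    # A netrange block matches every address in the range, abusive or not.
    return any(addr in net for net in NETRANGE_BLOCKS)
```

With the /24 block, an unrelated customer at 203.0.113.7 is rejected along with the abusive 203.0.113.66; with the per-IP block, only the latter is.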
Linode respond swiftly to abuse reports[0], block outgoing SMTP by default, and prevent so many people from even registering for services that it is the #1 question people ask on their IRC channel. What more should they do? What is “enough”?
90% of the spam that I receive from a DMARC-validated sender comes from Google; should every Gmail user be punished because Google aren’t “doing enough”?
[0] Linode twice threatened to shut off services within 24 hours due to some vigilante scanning the internet with a broken virus scanner and automatically sending reports: https://virtuallyfun.com/2014/04/23/dumbass-of-the-year-awar... (n.b. this is not my site)
> Linode respond swiftly to abuse reports[0], block outgoing SMTP by default
One instance of them supposedly responding quickly to an email abuse report doesn't show they're consistently responsive to abuse reports. I don't know whether they are or not. I don't even know that this blog post refers to Linode; they're not mentioned once.
And it's not true that they always block outgoing SMTP by default. Loads of old accounts do not have SMTP blocked. New accounts since 2019 sometimes have it blocked, but given that the last few times I've made an account I didn't have any blocks, it doesn't seem to happen that often. Maybe I just got lucky, though.
And don't get me wrong, I'm not intentionally singling out Linode here. There are loads of cheap VPS providers that enable this kind of abuse. They're not necessarily better or worse in this regard than many others.
> 90% of the spam that I receive from a DMARC-validated sender comes from Google; should every Gmail user be punished because Google aren’t “doing enough”?
Yes. Just like those telephone companies originating most of the spam phone calls should get disconnected. If they're going to enable abusers, they should get cut off.
Here is the issue that most ESPs are facing: every 5-6 months something gets enabled or disabled on Outlook's side that affects either the IPs or the sender's domain name, and messages land in the Junk folder or in quarantine.
Now, I do know that IPs can be affected by complaints or spamtraps, or maybe the client sent something suspicious, but trust me, most ESPs don't allow those messages to be sent. Also, when the IPs appear GREEN in SNDS, SPF, DKIM and DMARC are all in place for DNS authentication, and the headers look like this:
CAT:HSPM;SFS:(13230031)(4636009)(451199024)(7596003)(356005)(7636003)(86362001)(450100002)(8676002)(1096003)(14286002)(34206002)(5660300002)(336012)(26005)(42186006)(9686003)(33656002)(83380400001)(7846003)(33964004)(564344004);DIR:INB;
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info:
In that situation, you would expect the quarantine zone to be the last place to find a legit message.
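If you want to pull those header fields apart when comparing quarantined messages, a small parser is enough. This is a sketch under assumptions: the semicolon-separated CAT/SFS/DIR fields are, as far as I know, from Microsoft's X-Forefront-Antispam-Report header, and the CAT meanings in the lookup table reflect my reading of Microsoft's anti-spam header documentation, so verify them there. The SFS rule IDs are opaque, and the code makes no attempt to interpret them.

```python
import re

# Partial lookup table; verify against Microsoft's anti-spam header docs.
CAT_MEANINGS = {
    "BULK": "bulk",
    "SPM": "spam",
    "HSPM": "high confidence spam",
    "PHSH": "phishing",
}


def parse_antispam_report(value: str) -> dict[str, str]:
    """Split a 'KEY:VALUE;KEY:VALUE;' header value into a dict.
    Only the first ':' in each field separates the key from the value."""
    fields: dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition(":")
        fields[key.strip()] = val.strip()
    return fields


def sfs_rule_ids(sfs: str) -> list[str]:
    """Extract the numeric IDs from an SFS value like '(13230031)(4636009)'."""
    return re.findall(r"\((\d+)\)", sfs)


header = "CAT:HSPM;SFS:(13230031)(4636009)(451199024);DIR:INB;"
report = parse_antispam_report(header)
print(CAT_MEANINGS.get(report["CAT"], "unknown"), report["DIR"], sfs_rule_ids(report["SFS"]))
# prints: high confidence spam INB ['13230031', '4636009', '451199024']
```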
For obvious reasons I won't share more details, but I bet that from time to time someone is messing with the spam filters in ways that can easily result in false positives and angry senders.
In any case, when we raise tickets with Outlook, please at least tell your team not to reply like robots. If they shared the exact reason a message landed in the Junk folder, that would really help us. If it is the content, we will change it. If it is related to the sender, we will block the sender. If it is complaints, we will block the senders and check their subscription sources. But we need something, especially when SNDS shows a green IP, 0 spamtraps, and 0 complaints.
Thank you for reading this.
Yo I’m not even gonna apologize about this, it would be so wack if we didn’t do that:
a) if a mail server looks like it’s gonna send spam, then you gotta block it. I personally have philosophical hang-ups about this, like it’d be wrong to sentence someone to prison for crimes they didn’t commit just because a system added up some points and made a prediction with high confidence, but in real life, you absolutely need to be proactive.
b) there is literally no way to do this that won’t immediately get abused. Trust me, we’ve tried. We make it nearly impossible to get unlocked on purpose, because if it was easy, it’d be like 1 innocent person using it and 99 attackers due to the adversarial incentive structures.
Now ofc there’s more nuance here, we really do want to get it wrong less often, and you do pay us so it’s not fair to blame it all on the bad guys, so I’m grateful for the feedback but I think you should give me even more detailed feedback since there’s not much I can do except give a vague high level explanation unless you help me by being specific.
Do you somehow track the amount of false positives these predictions generate? How do you tune the prediction to not generate too many false positives?
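One generic way to do both, sketched here purely as an illustration and not as Microsoft's process, is to measure the false positive rate on messages later confirmed legitimate, then pick the lowest blocking threshold that stays within a chosen false-positive budget. The scores, labels and budgets below are made-up inputs; in practice the labels might come from something like "not junk" reports, which is an assumption.

```python
def false_positive_rate(scores: list[float], is_spam: list[bool], threshold: float) -> float:
    """Fraction of legitimate messages that would be blocked (score >= threshold)."""
    legit = [s for s, spam in zip(scores, is_spam) if not spam]
    if not legit:
        return 0.0
    return sum(1 for s in legit if s >= threshold) / len(legit)


def recall(scores: list[float], is_spam: list[bool], threshold: float) -> float:
    """Fraction of spam messages that would be blocked."""
    spam = [s for s, flag in zip(scores, is_spam) if flag]
    if not spam:
        return 0.0
    return sum(1 for s in spam if s >= threshold) / len(spam)


def pick_threshold(scores: list[float], is_spam: list[bool], max_fpr: float) -> float:
    """Lowest threshold whose false positive rate is within budget.
    Raising the threshold never increases FPR, so the first hit catches the most spam."""
    candidates = sorted(set(scores)) + [max(scores) + 1.0]  # last one blocks nothing
    for t in candidates:
        if false_positive_rate(scores, is_spam, t) <= max_fpr:
            return t
    return candidates[-1]


# Toy, made-up data: model scores and whether each message really was spam.
scores = [0.1, 0.2, 0.35, 0.4, 0.7, 0.8, 0.9, 0.95]
is_spam = [False, False, False, True, False, True, True, True]
for budget in (0.25, 0.0):
    t = pick_threshold(scores, is_spam, budget)
    print(budget, t, recall(scores, is_spam, t))
# prints: 0.25 0.4 1.0
#         0.0 0.8 0.75
```

Tightening the budget from 25% to 0% moves the threshold from 0.4 to 0.8 and lets one of the four spam messages through, which is exactly the trade-off being asked about.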
> but in real life, you absolutely need to be proactive
Why is Microsoft the only provider who needs to do such proactive blocking? Why don't you need to do that for email addresses associated with Office 365?
> I think you should give me even more detailed feedback since there’s not much I can do except give a vague high level explanation unless you help me by being specific.
My story is very much the same as everybody else's with the same trouble, including the person whose blog post sparked this discussion: a root server for personal use located in the data center of a mid-sized hoster, running a mail server as part of its duties. In my case the whole mail setup runs on IP addresses separate from everything else. Mail volume to Microsoft would probably average 1-2 emails per month. No issues whatsoever getting emails delivered to other mail providers, only to Microsoft. This whole setup has been in place for several years.
> Why is Microsoft the only provider who needs to do such proactive blocking? Why don't you need to do that for email addresses associated with Office 365?
The way you implement this, low-volume senders (nerdy individuals or small projects that can't use SES/Mailgun/… for GDPR reasons), even if they manage to get off the list once (olcsupport.office.com, escalate), never get the chance to build up reputation in the long term (I'd have to contact olcsupport again in a few months and that's just not sustainable for a small-time postmaster).
I get it, you're afraid that some VPS from a cheap cloud provider suddenly floods the inboxes of thousands of Outlook.com customers. I realize that a fresh IP that sends dozens of emails out of the blue has to be blacklisted.
But why don't you allow my VPS to send, say, 16 emails a day to Outlook.com inboxes? And if ⅛ of the recipients report junk, I get blacklisted. But if all 16 recipients are happy, my IP can now send 16+16=32 emails/day for the next few months (as long as the non-ISP hostname matches; otherwise, it might be a new VPS customer), and so on.
This way, your customers are happy (I don't think spammers rent/hack a fresh VPS in order to send 16 emails, and I don't think they are very good at building up IP reputation), and I'm happy (my personal VPS can send a few emails to my Outlook.com contacts every few weeks/months, and my project VPS can gradually build up and maintain the reputation it needs).
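The graduated scheme proposed above fits in a few lines. This is a minimal sketch of the commenter's proposal, not anything Microsoft implements. It assumes the allowance doubles after a complaint-free period (the comment's 16+16=32 step fits both doubling and adding 16), that a junk-report rate between zero and ⅛ leaves the allowance unchanged, and that a hostname change resets the sender to a fresh state. As another reply in the thread points out, a rule this legible is also easy for spammers to learn.

```python
from dataclasses import dataclass, replace

INITIAL_QUOTA = 16            # emails/day for a fresh sender (from the comment)
JUNK_FRACTION_LIMIT = 1 / 8   # blacklist at or above this junk-report rate


@dataclass(frozen=True)
class SenderState:
    hostname: str
    daily_quota: int = INITIAL_QUOTA
    blacklisted: bool = False


def end_of_period(state: SenderState, hostname: str, sent: int, junk_reports: int) -> SenderState:
    """Update a sender's allowance after one evaluation period."""
    if state.blacklisted:
        return state
    if hostname != state.hostname:
        # Different hostname: possibly a new VPS customer, so start over.
        return SenderState(hostname=hostname)
    if sent == 0:
        return state
    if junk_reports / sent >= JUNK_FRACTION_LIMIT:
        return replace(state, blacklisted=True)
    if junk_reports == 0:
        # Assumption: every recipient was happy, so the allowance doubles.
        return replace(state, daily_quota=state.daily_quota * 2)
    return state  # Assumption: a few complaints, below the limit, change nothing.
```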
I'm obviously being naive about that approach, but I don't remember having trouble reaching Gmail inboxes or those of local providers, and at least for Gmail, I know that they have pretty effective spam filtering too, so I reckon that they use some approach like the one I described.
For a side project, I have just given up contacting olcsupport and now instruct Postfix to send through our @outlook.com address instead, but that is a wobbly workaround at best. For personal email, I now relay through SMTP2GO because GDPR doesn't matter that much there, but it makes me sad to have that gaping hole (called Outlook.com) in my decentralized email fantasy, after having spent so much time researching, configuring, and diagnosing.
> I'm obviously being naive about that approach, but I don't remember having trouble reaching Gmail inboxes or those of local providers, and at least for Gmail
There are plenty of those who do have such issues with Gmail.
The simple reason behind all this is that spammers also have near-endless patience. If it takes sending 15 emails per day per IP, they'll do it. If it's a criterion you can figure out as a legitimate user, the spammer can as well. They'll "subtract one" and bypass it.
So the end result is that there's intentional fog over the methods. Just things you can try and get right and maybe that's sufficient. Eventually the good side tends to prevail, with some effort. Other than that it's one of the hardest problems out there with insane weight on both sides.
Similar question to my sibling comments. I have rented a server with a static IP address for over ten years now. Nobody else has used this IP during this time. Yet every few months I have to beg Microsoft to unblock the IP. In the beginning I could do this on my own, but something changed a few years ago, and now I instead have to beg my ISP (netcup) to contact Microsoft on my behalf to temporarily whitelist the domain. Then wait another 2-3 months and do the same dance again.
Why? Why can't Microsoft learn that an IP has been healthy and spam-free for 10+ years and only bother me when actual spam is being sent?
Aww man, not joking this actually breaks my heart, something about the way you wrote it makes it sink in how much we’ve failed you. I’m angry at how much of your time we’ve wasted and this experience is completely unacceptable.
…I think this is just a systemic issue beyond my ability to comprehend, let alone solve, and— I hope I’m wrong about this but honestly when I look ahead it seems the future is only going to get worse for people like you. Which I wish I could phrase in a way that was more kind and respectful, it’s not what anyone wants, these unthinking scars inflicted on email as a medium.
But what I can do is make sure that it’s not worse for you, specifically. If I was perfect I’d attack this rot at its core, but I’m not, so I’ll just solve the problem in front of me even though I know it doesn’t scale and hope God forgives me. Get in touch with me directly and I’ll figure out how to make sure you don’t have to jump through those hurdles again.
Exact same situation as the person you're replying to, except ~5y instead of 10, and I gave up trying to get unblocked after, at one point, even the mandated reply to an automatic follow-up e-mail a few steps down the appeal chain got blocked. That behavior was consistent over multiple weeks of retrying. It was truly Kafkaesque, and I resigned myself to just not being able to email Outlook/MS recipients. Getting outreach from someone who wants to get in touch and not being able to reply is the most frustrating part. So many people probably believe I'm ghosting them.
Outgoing email volume is a handful a week, zero automation ever, and I must have spent dozens if not in the low hundreds of hours over the years on e-mail deliverability to Microsoft alone until finally giving up. Not comparable to anywhere/anyone else.
Just to say, behind every single false-positive is a story like mine and TonyTrapp. Missing out on a group tour with the local club. An old lost friend or family member not being able to get back in touch. Missed recruitment opportunities. A lawyer not receiving a time-sensitive follow-up.
But why do you consider this good practice? It's (unnecessarily?) frustrating for senders and poses a legal risk for recipients (the sender has the logs to prove that they sent the invoice, while the recipient doesn't have any record).
Again, not the person you replied to. But some feedback mechanisms take time (so action has to be taken after a 2xx reply), and some indicators are just so very accurate that leaving the message even in Spam is a way bigger risk. Users have a terrible tendency to dig malware out of Spam folders.
This was not the cause in my case (no attachments, no URLs, just plain text, as far as I can remember). I know how to send email (ask mail-tester.com).
Regardless, there are always better options than silently discarding the whole email: delete attachments, erase everything that looks like a URL, even erase the whole message body, but please tell the recipient that you accepted an email and from whom.
Why doesn't whitelisting an address ensure that one receives messages from it? The address has never sent spam and sends at most a couple of emails a day, yet I couldn't receive emails from it, and there was no notification or information, despite the address being on my whitelist.
Huh? This shouldn’t be possible in principle? Don’t quote me on that though, I wish I’d paid more attention to my notes but they’re a mess and haven’t kept up with newer changes, if they were accurate at all in the first place. I’d submit an escalation so support can look into it.
E.g., known-bad domains, known-bad IP addresses, incorrectly set up DKIM/SPF, no reverse DNS, non-matching reverse DNS, and that's before even looking at the content to determine whether it's spam.
For privacy and compliance reasons (read: “oh boy wouldn’t wanna get sued, eh?” reasons) we actually don’t snoop into the message body much. Hooray, good job on not doing the maximally big brother thing for once, MS!
My hot take is that this prolly won’t last because every org descends to doing a creepy level of data collection eventually so I have a textbook on privacy preserving ML downloaded for when we join the “surveillance but we found a way to make it technically legal” squad. We haven’t done that yet though.
I was trying to ask generically because Microsoft deals with a universe-sized quantity of email traffic in comparison to my self-hosted barely used domains.
By tiers (which may be the wrong word, maybe just 'layers'), only relating to my setup, I mean things like:
- Tier 1: Spamhaus DROP and eDROP lists are outright blocked
- Tier 2: IP addresses that have illegitimately connected to my mail server ports are outright blocked (port scans, invalid login attempts, etc. - I manually check some of these against abuseipdb.com to determine their validity)
- Tier 3: IP addresses that have scanned non-open ports on my systems are outright blocked from connecting to my mail server ports
Just running these rules for a couple of months has cut unwanted connections to my mail server ports by a large percentage. One theory is that if you can block known-bad and highly-likely-bad connections, then the amount of actual spam detection (through email content review) needed is reduced to some degree.
I actually want to implement additional anti-spam IP address blocklists and just haven't gotten around to it yet, but the above does a good enough job for my essentially unknown domains (as I said, a universe of difference from what Microsoft has to deal with).
- Tier 4: Black-box spam detection built-in to the all-in-one mail server solution I use (I don't know how it works, I don't know how to edit the 'rules' or even if I can).
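The tier structure described above maps naturally onto an ordered sequence of checks, cheapest and most certain first. The sketch below is a hypothetical illustration of that ordering, not the commenter's actual setup: the class, method names and set of mail ports are assumptions, loading the real Spamhaus DROP/eDROP data is left out, and the addresses are RFC 5737 documentation ranges. Anything that reaches a mail port and survives tiers 1-3 is handed to the content filter (tier 4).

```python
import ipaddress

# Assumption: the ports this mail server exposes (SMTP, SMTPS, submission, IMAPS).
MAIL_PORTS = {25, 465, 587, 993}


class TieredFilter:
    def __init__(self, drop_cidrs: list[str]) -> None:
        # Tier 1: networks from the DROP/eDROP lists (loading them is out of scope).
        self.drop_networks = [ipaddress.ip_network(c) for c in drop_cidrs]
        # Tier 2: addresses seen misbehaving on mail ports (scans, bad logins).
        self.mail_port_abusers: set = set()
        # Tier 3: addresses seen probing ports that are not open.
        self.closed_port_scanners: set = set()

    def record_mail_port_abuse(self, ip: str) -> None:
        self.mail_port_abusers.add(ipaddress.ip_address(ip))

    def record_closed_port_probe(self, ip: str) -> None:
        self.closed_port_scanners.add(ipaddress.ip_address(ip))

    def decide(self, ip: str, port: int) -> str:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.drop_networks):
            return "tier1-block"            # outright blocked on every port
        if addr in self.mail_port_abusers:
            return "tier2-block"            # outright blocked on every port
        if port not in MAIL_PORTS:
            return "allow"                  # tiers 3-4 only concern mail ports
        if addr in self.closed_port_scanners:
            return "tier3-block"            # blocked from mail ports only
        return "tier4-content-filter"       # hand off to content-based spam detection
```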
'Tiers' I would expect Microsoft to have would be:
- Their own lists of known-bad IP addresses / ranges / ASNs
- Reverse DNS lookup validation
- DKIM checks
- SPF checks
- More protocol level 'things' beyond the understanding of a simple network admin such as myself.
- Weighting the results of all of the above to determine some kind of 'spam likelihood' score.
All of this is before reviewing the content of the actual message.
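The final weighting step can be sketched as a simple additive score. Every number here (the weights and the threshold of 3.0) is invented for illustration, and large providers very likely use learned models rather than a hand-tuned table; the point is only how pre-content signals like the ones listed can be combined into one "spam likelihood" number before any message content is examined.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSignals:
    on_ip_blocklist: bool        # own list of known-bad IPs / ranges / ASNs
    has_reverse_dns: bool
    reverse_dns_confirmed: bool  # PTR name resolves back to the same IP
    dkim_pass: bool
    spf_pass: bool


# Hypothetical weights: larger means "more likely spam".
WEIGHTS = {
    "on_ip_blocklist": 5.0,
    "no_reverse_dns": 2.0,
    "reverse_dns_mismatch": 1.0,
    "dkim_fail": 1.5,
    "spf_fail": 1.5,
}
SPAM_THRESHOLD = 3.0  # hypothetical cut-off


def spam_likelihood(sig: ConnectionSignals) -> float:
    score = 0.0
    if sig.on_ip_blocklist:
        score += WEIGHTS["on_ip_blocklist"]
    if not sig.has_reverse_dns:
        score += WEIGHTS["no_reverse_dns"]
    elif not sig.reverse_dns_confirmed:
        score += WEIGHTS["reverse_dns_mismatch"]
    if not sig.dkim_pass:
        score += WEIGHTS["dkim_fail"]
    if not sig.spf_pass:
        score += WEIGHTS["spf_fail"]
    return score


def looks_like_spam(sig: ConnectionSignals) -> bool:
    return spam_likelihood(sig) >= SPAM_THRESHOLD
```

With these made-up weights, a blocklist hit decides on its own, while weaker authentication problems only tip the result once several of them stack up.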
What's the best way to quickly get MS to trust a server/domain?
Does MS ignore IP reputation in cases where the domain has a good reputation?
How would you go about getting a new domain and an IP address from a public cloud provider working consistently?
I've had issues with outlook when it comes to new domains and IPs, but after some time it works. I do however usually have more email than a personal server so what's the best way - if such a thing exists - for a personal server that has much lower volume of mail to be trusted?
Hmm, oh wow, occasionally I’m reminded that if I flipped sides to run phishing campaigns I’d be totally unstoppable.
There isn’t a quick way, by design. You need to wait a minimum period and meet some predicates, and the organized scammers already know what the period is via empirical testing but I’m not comfortable disclosing details of those predicates for disorganized scammers to use. More so because I’d definitely get into trouble for it than due to any belief in security via obscurity. Cushy job makes you risk averse.
Since I can’t share any of the tricks, some general advice— the main thing that matters is a long track record of good behavior. You can end up in a vicious cycle where you fight the system when it punishes you and then it doubles down on the beatings— this is bizarre and kafkaesque and happens all the time. What you want is for there to be two-way communication, if it’s unbalanced with traffic being broadcast but no one engaging with it, that’s going to be cracked down on sooner than if recipients reply.
I don’t. I have slept in the daytime ever since covid and actually got a move to the east coast approved as a health accommodation after I started routinely missing important afternoon meetings due to my incurable insomnia (mornings are easy when you stay up all night). I still struggle with it, especially since it’s not a consistent offset to my circadian rhythm. There’s data I’ve collected but it’s hard to fit a simple function to it— it’s not like I’m on a 26 hour schedule either. This isn’t due to trauma or addiction, my brain is just an outlier in many dimensions and this is one of them.
My penis enlargement pill newsletter isn't showing up in my customers' inboxes. I could have been a penis-enlargement millionaire if it wasn't for your stupid spam filter. What to do?
the sheer polish of the design of the top half makes me think it is more complete than it is though. Maggie puts a "budding" status right at the top, but i'm sure 99% of readers completely missed that and have no idea what it means in her taxonomy.
i have my drafts public too but i separate them for this reason.
I actually found the contrast between the lower part of the article labelled 'draft' and the upper finished part instructive. I could sort of imagine how the unfinished paragraphs might be completed finally.
PS: the 1977 Tools for Thought was by C. H. Waddington, a UK biologist with an interest in complex systems. Club of Rome era, systems thinking and all.