
> If you could share, can you share what exactly led to the discovery that things are unreachable.

We have some heavy monitoring with InfluxDB and Grafana built into our application, and one of the alerts fires if the number of logins per minute drops below a threshold. That's how we noticed it the first time. The root cause was a network issue affecting our mail server provider. We were waiting on a support ticket with the provider while trying to find a solution for our customers.
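
A minimal sketch of that kind of login-rate check, in plain Python rather than the actual InfluxDB/Grafana alert. The function names, the threshold value, and the in-memory list of timestamps are all assumptions for illustration:

```python
from collections import Counter
from datetime import datetime, timedelta


def logins_per_minute(timestamps):
    """Count login events per one-minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def minutes_below_threshold(timestamps, start, end, threshold):
    """Return every minute in [start, end) with fewer logins than threshold.

    Minutes with no logins at all count as zero, so a total outage
    still trips the check instead of silently producing no data points.
    """
    counts = logins_per_minute(timestamps)
    low = []
    minute = start.replace(second=0, microsecond=0)
    while minute < end:
        if counts.get(minute, 0) < threshold:
            low.append(minute)
        minute += timedelta(minutes=1)
    return low


# Hypothetical data: three logins in the first minute, one in the second,
# none in the third.
t0 = datetime(2024, 1, 1, 12, 0)
events = [t0 + timedelta(seconds=s) for s in (1, 10, 20, 65)]
low = minutes_below_threshold(events, t0, t0 + timedelta(minutes=3), threshold=2)
```

The part worth copying is treating missing minutes as zero: when logins stop entirely there are no events to aggregate, and an alert that only looks at existing data points can miss exactly the outage it was meant to catch.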

After that we extended this monitoring to the mail queue as well. To be fair, we used to monitor it before, but that was with the infra team; now we have it in our SRE dashboard, with Slack notifications, etc.
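
As a rough illustration of a mail queue check with a Slack notification, here is a sketch that only builds the message payload. The queue depth metric, the limit, and the function name are hypothetical; the `{"text": ...}` body is the basic shape a Slack incoming webhook accepts, and actually posting it is left out:

```python
import json


def mail_queue_alert(queue_depth, max_depth):
    """Return a JSON payload for a Slack incoming webhook, or None if healthy.

    queue_depth and max_depth are assumed to be message counts; the real
    metric and limit depend on the mail setup.
    """
    if queue_depth <= max_depth:
        return None
    text = f"Mail queue depth is {queue_depth}, above the limit of {max_depth}"
    return json.dumps({"text": text})


healthy = mail_queue_alert(10, max_depth=50)   # no alert needed
backlog = mail_queue_alert(120, max_depth=50)  # payload to send to the webhook
```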

> Also, do you still continue that (Email login / OTP login) OR is it moved something else?

OTP as a second factor of authentication is something we couldn't disable, but it's a requirement for one very specific application. We just looked for different partners with better SLAs and built some monitoring around it.

The email login is still there, but we didn't roll it out to all our applications as we initially intended. We are still studying what the best solution would be here. The company is a heavy user of Microsoft 365's mail service, and although the overall experience is pretty good, we have zero influence over their SLA if we get impacted by an issue on their side. I don't think the solution is bad per se; you just have to plan mail infrastructure as a core part of your application.


