More

throwaway20371 · on Nov 2, 2021

My teammates create e-mail filters to send useless daily e-mail reports to the trash. I try to find out who controls the e-mail process, or the e-mail address, or something, so I can fix it... but I spend lots of time and it ends in vain. I couldn't find who "owns" the process, I couldn't find who controls the mailing list, and I couldn't get anyone to give me permission to change it even if I knew how.

Clearly the problem isn't just having the skill or permission to change something, it's also the friction involved in figuring out how the hell to do it. How do you lower friction? Documenting things, making it easy to find things, making it easy to get access to things. If you can come up with an internal system that combines all of that, you have a one-stop shop for fixing high-friction problems.

I think Wikis are highly underrated. They seem to encapsulate all those things. Anyone can edit (or revert edits), anyone can access it, anyone can find it (eventually). Somehow we need to tie all the rest of an organization into a Wiki.

throwaway20371 · on Nov 2, 2021

That's basically what the do-nothing script is. The difference is that before you write any automation, you document all the steps in the script. Right there - when you've got it all written down, and no automation work has been done yet - that in itself is a very valuable piece of work. Now you can point anyone in the company to that script, and they can all accomplish the task without having to figure it out for themselves. You can now scale that process N times (N = the number of people in your company). Just writing down the steps has become a force-multiplier of repeatable work. Then as you begin automating each step, people automatically receive the benefit of that automation. Because the documented steps and the automation are in the exact same place, both will always be up-to-date.

throwaway20371 · on Nov 2, 2021

In construction, if the foreman / lead whoever is always angry, people get fearful of speaking up about something, and then more mistakes get made because nobody wanted to point out the glaring flaw.

Soldiers also prefer trustworthiness over skill competency. Of course you want your brother-in-arms to do their job well, but it's more important that you can trust them with your life.

Technical skills are needed to work with a machine. People skills are needed to work with people.

throwaway20371 · on Nov 2, 2021

You could call it an "E-I Script" for Efficiency Interest Script. Over time your costs are gradually lowered as each step is automated - like accruing interest in a savings account.

genewitch · on Nov 3, 2021

ls -l

Let's see, e i... e i... Oh! There it is!

throwaway20371 · on Nov 2, 2021

This is the way. I wish this were taught in computer science class, development bootcamps, operations team onboarding, anywhere there is a procedure that is even slightly complicated to automate. It is the absolute best solution there is.

* Documentation of the entire procedure is contained in one place. No need to go sifting through 20 different sources of documentation. This lowers the human emotional barrier to "just get it done", as people will always avoid things they aren't comfortable/familiar with, or don't have all the steps to. This central point of documentation also enables rapidly improving the process by letting people see all the steps in one place, which makes it easier to fix/collapse/remove steps.

* Automation in small pieces over time avoids the trap of "a project" where one or more engineers have to be dedicated to this one task for a long period of time. Most things shouldn't be automated unless there is demonstrably greater value in the cost of automating them than the cost of not doing so. Automating only the most valuable/costly pieces first gives immediate gains without sinking too much into the entire thing.

* One unified "method" to encapsulate any kind of process means your organization can ramp up on processes easier, reducing overall organizational cost.

* In the absence of any other similar process, you are guaranteed to save time and money.

I would say that the only potential downside is if someone decides to "engineer" this method, making it more and more and more complicated, until it loses its value. KISS is a requirement for it to be sustainable.

csdvrx · on Nov 2, 2021

> only potential downside is if someone decides to "engineer" this method

It can be engineered, if you follow a gradual process.

On servers, I keep a log of what was deployed in a root directory following the sequential number _ goal format (ex: 00_partitions ... 90_web_server)

It is not fancy, most of the logs are not even scripts: many are just ASCII text files, that will only be used as a checklist if the same "goal" has to be achieved again. For example, 00_partition may be "gdisk /dev/nvme0n1" followed by a copy-pasted list of the partitions and some quick description about why it was done that way.

But that's on the first iteration only: the next iteration turns that into "do-nothing" script, the next iteration into a better script with basic checks (supporting both /dev/nvme0n1 and /dev/sda), then exception handling (if partitions already exist, etc), and so on: this gradual complexification process avoids the "premature optimization" of creating infrastructure-as-code for what you rarely need, while optimizing and fine tuning the parts you most often need.

Someone will certainly mention Terraform, or Ansible, or something else - yes, they exist and they are nice, but if you are doing everything there, you are over-engineering and wasting time: not everything needs your equal attention!

If you only install a webserver once in a blue moon, make a .txt checklist of the steps you followed.

But if you leave and breathe nginx options and certificate deployment, fully automatize all that, including the obscure details of what may fail if you use let's encrypt with some specific DNS configuration!

And if you don't know yet which is which, start small (a .txt checklist will cost you a few minutes) and the next time you find yourself doing the same thing, do it better using the previous artifact (the .txt file) to create a better one (a script, then a better script etc)

chousuke · on Nov 2, 2021

I would argue that all deployments (no matter how small) should have configuration management.

In the simplest case, a deployment consists of: 1. Install packages 2. Install configuration files, possibly from templates. 3. Configure services to start on boot.

This kind of automation is trivial to do with almost any tool, but there's no reason not to use something like Ansible that's designed for infrastructure automation, because you get encrypted secrets, templating and idempotency with zero effort and the result can be stored in a git repository somewhere.

The Ansible playbook is as fast to write as installing and configuring things manually, and after some practice, it's faster and the resulting system is of higher quality.

Even if you end up using the automation you write only once, it still has value because it doubles as a formalized description of what you did, and can be stored together with additional documentation. Over time you will also accumulate a library of bits and pieces that you can copy over to new setups, further improving your speed and quality.

csdvrx · on Nov 2, 2021

> I would argue that all deployments (no matter how small) should have configuration management.

I would argue that in most cases, they don't need anything but some documentation explaining what was installed, and why.

I will take a word file with screenshots over a broken script in an obscure language every single time.

> but there's no reason not to use something like Ansible that's designed for infrastructure automation,

There's a big one: my time isn't free.

If someone is willing to waste money on that, sure, I'll be happy to bill them for their extravagant tastes (but only after having done my best to explain them it's a waste of money)

And still, I will think about the next person that may have to maintain or tweak whay I wrote, so I will also leave a document full of screenshots in case they don't know ansible or whatever new fashionable tool that the client may have specifically requested.

> it's faster and the resulting system is of higher quality.

Not everything needs to be of high quality.

Forgive me if I'm assuming your gender, but I see a lot of black-and-white thinking among male sysadmins/devops: it's good or it's bad, it's high quality or it's not.

I prefer to have a "sufficient" degree of quality: if a checklist is enough, I will not waste time writing a script. If a shell script is enough, I will not waste time writing proper code - and so on.

> Over time you will also accumulate a library of bits and pieces that you can copy over to new setups, further improving your speed and quality

Except you assume a continuous progress, without any change of scope or tools, and with the tools themselves never evolving. It doesn't work like that: over time, you will accumulate a bunch of useless code for old versions.

Even small inconsequential changes (like unbound in debian 11 requesting spaces before some options, which wasn't the case before) will take some time and effort. Why waste your energy one one shots?

The do-nothing approach argues that you should avoid premature optimization, which strikes me a good approach in software in general.

brulard · on Nov 3, 2021

Such a needless dismissive use of "male" word here. Devops guys I work with are very reasonable and I have not seen a female one in my whole career.

loxias · on Nov 3, 2021

> Forgive me if I'm assuming your gender, but I see a lot of black-and-white thinking among male sysadmins/devops: it's good or it's bad, it's high quality or it's not.

Male lifetime linux nerd here, who started as a sysadmin, checking in just to say that I agree with everything "policy related" in your comments on this article. Knowing where to tune the knob between "high quality"/"good architecture" vs "can i just get this done now and move on?" is difficult, at least I don't know how it could be taught other than experientially.

IME, the predilection to see things as black-or-white is more correlated with age, than gender.

Anyway, "not all men". :P

csdvrx · on Nov 3, 2021

> checking in just to say that I agree with everything "policy related" in your comments on this article

Your nick seemed familiar - now I remember, I read your great comment in "I just want to serve 5TB" earlier today!

I also agree with everything you wrote about simplicity in software development: I'll take almost every time some dirty php scripts running baremetal over Docker + Golang + Kubernetes + Terraform + Gitlab + Saltstack + Prometheus + the new fashionable tool because with so many parts now begging for attention, nothing will get done quickly - if we're lucky and something gets done.

Knowing where to tune the knob is indeed very difficult, and I'm afraid most people now are just doing a cargo cult of whatever google does, except they are not google, and they don't understand the tradeoffs or the possible alternatives.

But at the scale of most companies, it's a folly to sacrifice flexibility and simplicity to some unachievable desire for software perfection!

It's also a very costly hubris: I have been asked way too often to improve the performance after having thown very expansive hardware at the problem, that still performs miserably due to missing the big picture.

The solution was almost always removing the useless parts, or when trying to disentangle the architecture astronaut fancy mess would have been too costly, start from scratch with a saner design: most recently, I replaced a few hundreds java files (and test and stuff) by about 10 lines of bash, and 20 lines of awk.

My work is not fancy, but it works, unlike the previous solution that was going to be ready the next month, every month, for almost a year...

To all those who want to do things like google, maybe apply there instead of over engineering/polishing your CV with fancy keywords at your employer or client expense?

> IME, the predilection to see things as black-or-white is more correlated with age, than gender.

I had noticed this weird pattern, and it was my best explanation even if I didn't like it much, because it's sexist.

But your version seems more plausible (Occam's Razor!), so thanks a lot for taking the time to post!

chousuke · on Nov 2, 2021

I do not think automation is "premature optimization", nor do I that think everything needs to be high-quality; I did not say that. I do think, however, that everything you do should be of acceptable quality.

And for me, having configuration management is the minimum level of acceptable quality. It's simply not possible to have acceptable quality of a system without some form of configuration management. I can't recall a single instance where I (or anyone else involved) ever said "wow, this unmanaged mess sure works well" :P

In some cases, the management can be as simple as a comment in some script explaining some part of the process was done manually, or simply a periodic snapshot backup of the server that can be restored when the configuration is broken, but the point is that a process must exist and it must be consciously chosen.

Free-form documentation is not an alternative to configuration management either; if you can document your configuration in a wiki, you might as well put it in a git repository in the form of a script or a template.

When done properly, It's the exact same amount of effort, except when you use automation tools, the documentation is in a format that's not ad-hoc and can actually help do the things it documents instead of requiring a human to interpret them (possibly introducing mistakes). "Setting up" Ansible requires literally nothing but a text file containing your server's hostname, and SSH access, which you already will have.

Also, I don't know where you got the idea that I would somehow assume unchanging scope? I am the first person to throw away useless code and tools; I consider code my worst enemy and it's practically my favourite activity to delete as much of it as I can. If some piece of automation is no longer fit for purpose, it gets rewritten if necessary. Throwing away code is no big deal, because the tools I use allow me to get things done efficiently enough that I can spend time refactoring and making sure the computer properly handles the tedious parts of whatever I'm working on.

Your unbound example is something that is trivially solved with configuration management. After an upgrade, you notice your configuration does not work, navigate into your git repository, update the template, and then deploy your changes to any server that happened to be running unbound using that same template (because you might have redundancy, if you're running DNS). If you make a mistake, you revert and try again. There is no manual ad-hoc process that comes even close to allowing you to manage change like this, but it is trivially enabled by existing, well-understood automation tools.

csdvrx · on Nov 3, 2021

Your definition of "acceptable quality" is my definition of "overengineering".

It does not take the same amount of effort, if only cause you mention how for unbound, you have to update the template.

For one shots, this is overkill.

randomswede · on Nov 3, 2021

For "truly one-shot", you're right. But a "truly one-shot" is not a production machine, it is a test bed, informing what the eventual production machine should look like.

Because even if you will only ever have a single production machine, it will have something go horribly wrong with it and need recreating from fresh hardware (or from a fresh VM or whatever).

I guess, if you're cloud-based, you could turn your finely tuned test box into a template, then you have something that is (effectively) scripted.

chousuke · on Nov 4, 2021

Leaving aside all the other benefits and even if you never need to rebuild your system, having some sort of IaaC automation in place allows for extremely powerful change management. When your system is defined as code[0], change over time can be reviewed with a "git log -p", which definitely beats searching through ticket comments or ad-hoc documentation and attempting to reconstruct the history of change.

It's a no-brainer nowadays that software should be developed with version control. I don't see why infrastructure should be treated differently.

[0] Ansible playbooks are code, no matter what some people may think. It's a declarative'ish programming language with a silly syntax.

chousuke · on Nov 3, 2021

There's no such thing as an oneshot if you're creating a system that someone will actually use and depend on.

All systems have a lifecycle, and even on a "trivial" system you have backups, access, monitoring, logging and security maintenance to worry about even before you consider how installing any useful software affects those things.

There are exceptions to any rule, of course, and I did in fact create a system where the configuration management is a snapshot backup just two weeks ago; but that system has no data on it, its lifecycle is expected to last for less than a year, and if/when it breaks, a backup restore can be performed without any additional considerations. It was also an emergency installation into a network that's not easily accessible with SSH, which is why I did not just use Ansible from the start.

I thought it would be a oneshot, but I did end up having to create a second instance of the system a few days later, fortunately with less emergency :P

Still even ignoring that, I fail to see what could possibly be overkill about literally 3 small files in a git repository. You call "overengineering" what is to me "5 minutes of effort with extremely relevant upsides". That's literally how much time it would take me to create a playbook for unbound if I already know what the configuration needs to look like; probably less, but most of the time will be lost to context-switching overhead.

My point being, most of the time will be spent actually configuring the software and the automation overhead is nothing in comparison compared to the value you get from it, and that's why I generally automate things by default: It provides more value than I put in effort.

When you start of learning configuration management and infra automation tools, there's a learning curve; in the beginning, you will be "wasting" time learning (what a silly statement) how to use your tools effectively, but with practice, you will learn how to effectively use the tools and where to apply them and how to approach managing specific kinds of systems such that over time, using the automation tools is simply easier and faster than doing it manually. That's what I meant when I mentioned "higher quality" earlier; you get it for free, with no effort, once you've put in a bit of practice first. It just sounds to me like you're arguing against doing things well in favour of doing things with strictly inferior tools.

rgj · on Nov 3, 2021

> Not everything needs to be of high quality.

But stuff connected to the internet needs to be or it will be compromised before you even finished installing it.

minetest2048 · on Nov 3, 2021

Rant/question:

The word software configuration management can mean 2 related but different thing:

1. Configuration management in system engineering sense, which is is a process to systematically manage, organize, and control the changes in the documents, codes, and other entities during the Software Development Life Cycle (guru99).

2. Something to manage your config files, from something simple as python/bash scripts to full infrastructure-as-code solutions such as terraform and ansible

When I think about configuration management, I (and the parent) thinks about the second meaning, but if I googled that, all of the search results points to the first meaning

chousuke · on Nov 3, 2021

Both are important. A good CMDB is key in finding your documentation that points you at the configuration management used for the actual system.

Let me tell you, just a wiki with a search is not enough beyond a certain size, and you hit that faster than you'd think.

crispyambulance · on Nov 3, 2021

> [...] all deployments (no matter how small) should have configuration management [...] Ansible [...] with zero effort [...]

No.

It's a powerful tool, Ansible, but let's not get carried way. There's a ton of complexity behind the scenes. If you over-do it you end up with a ream of ugly yaml and you're fighting with the tool as much as you are any real problems.

Arch-TK · on Nov 2, 2021

This is assuming you already know Ansible.

chousuke · on Nov 2, 2021

Sure, but then again, typing shell commands into text files is assuming you already know shell commands. You have to spend time to learn your tools at some point.

For simple configuration management, Ansible is a straight upgrade to most shells because of idempotency alone, never mind the fancier features like the more advanced modules, multi-node orchestration, or encrypted vaults. The YAML syntax is dumb and it has its issues, for sure, but it still does even the simple things much better than plain old shell.

Anyone who has any familiarity at all with UNIXy systems can learn Ansible from zero well enough in a day or two for it to start becoming truly useful, and if you don't have the foundation for that... why on Earth are you setting up a web server? I mean, it's of course fine to tinker with things for learning, but I was assuming a real deployment scenario.

Arch-TK · on Nov 2, 2021

> Sure, but then again, typing shell commands into text files is assuming you already know shell commands. You have to spend time to learn your tools at some point.

Don't I still need to know shell programming for Ansible? Or at least know all the systems I want to manage with it inside out?

Yes, I need to learn tools at some point. But as I see it, I am not a system administrator of anything but my own network of 8 infrastructure hosts. The effort required to recreate this with ansible (and I don't think ansible can actually idempotently handle ALL of these devices, not without serious limitations) seems far greater than maintaining a few scripts and keeping backups. Also, I already know bash (unlike ansible).

> Ansible is a straight upgrade to most shells because of idempotency alone

So, as I said, I know nothing about Ansible. But idempotency implies that Ansible always starts from nothing and builds from there. Does this mean that every time I want to change my server I have to wait 15 minutes for it to re-install the distro and re-configure everything? Do I have to keep my state on a different server? I don't see how this can't be achieved with just as much hassle with a script?

Surely I misunderstand this. But if I did, then surely it's not THAT idempotent.

> Anyone who has any familiarity at all with UNIXy systems can learn Ansible from zero well enough in a day or two for it to start becoming truly useful

My problem with this is that every time I've looked into Ansible, it didn't look like a day of work. It looked like a week of work converting my entire infrastructure to it, for very little benefit, in addition to having to change the way I do a lot of things to fit the Ansible blessed method of doing them. It may take a day to learn Ansible but it probably takes even more time than that to learn it to a standard where I would consider the knowledge reliable. It would require making mistakes and lots of practice before I felt like I could quickly recover from any mistake I could make using it as well as avoid those mistakes. Not just that, but because of my nonstandard setup I would likely have to spend extra time learning Ansible well enough that I can actually replicate my nontrivial setup.

jjnoakes · on Nov 3, 2021

> idempotency implies that Ansible always starts from nothing and builds from there

No, it doesn't. In Ansible you say something like "make sure apache is installed" and if apache is installed, nothing happens. If it isn't, it gets installed. Then you say "make sure apache is running" and if apache is running, nothing happens. If it isn't, it is started.

Arch-TK · on Nov 3, 2021

Okay, this is a rather limited form of idempotency. I don't see the advantage. My system's package manager and service manager already perform this function.

jjnoakes · on Nov 3, 2021

You should really spend a little time learning ansible before you critique it. Ansible isn't perfect, but the things you describe aren't how ansible works in general, so they aren't even valid criticism.

For example, it has idempotent modules for all sorts of things - contents in files, files and directories in the file system, etc - things that you COULD script in an ad-hoc and verbose way, but things which come built-in as one-liners in ansible.

It's quite convenient.

Arch-TK · on Nov 5, 2021

There are no resources which are seemingly suitable for my environment. If you're going to claim that I'm missing something, rather than telling me that I have things to learn (no shit sherlock), you could tell me specifically which initial impressions are wrong.

jjnoakes · on Nov 8, 2021

I did, a few comments up. This:

> idempotency implies that Ansible always starts from nothing and builds from there

...is wrong. It might be true that Ansible is unusable in your environment for some reason, but that's quite different fromage this specific false claim.

Here are a few more quotes that imply you should learn about Ansible before critiquing it for your use case (or, if you don't have time, then refrain from critiquing it in general):

> Don't I still need to know shell programming for Ansible?

No, Ansible uses a custom non-shell syntax and python modules. You can dip into shell scripts but you don't have to. Examples are everywhere in the Ansible documentation.

> Does this mean that every time I want to change my server I have to wait 15 minutes for it to re-install the distro and re-configure everything?

No. Ansible will examine your existing system and apply the changes you configure. Idempotency does not imply or require a functional-like OS or rebuilding from scratch; Ansible is more imperative.

Too · on Nov 3, 2021

> Don't I still need to know shell programming for Ansible?

No. Ansible has its own built in functions for creating files, managing systemd, docker and so on. These are built with idempotency in mind.

You can however call out to shell for situations where there is no built in. There are a lot of people who only ever use this role, and just see ansible as the fleet orchestration layer. Which imo defeats most of the benefits of using it, you might as well ssh a full script in that case.

As a side note I wouldn’t actually recommend Ansible for server management. Like you say learning all these blessed roles feels like relearning basics you already know and the syntax and directory structure is messy. It has no place if you use containers.

Arch-TK · on Nov 3, 2021

> Ansible has its own built in functions for creating files, managing systemd, docker and so on. These are built with idempotency in mind.

Do I still get idempotency if I do not use systemd or docker?

> You can however call out to shell for situations where there is no built in. There are a lot of people who only ever use this role, and just see ansible as the fleet orchestration layer. Which imo defeats most of the benefits of using it, you might as well ssh a full script in that case.

So it sounds like I wasn't entirely wrong in my first impressions that it would be useless for my situation where I don't think any of the "built ins" would really be suitable. Of the 8 machines on my network, only one has systemd (and I'm in the process of phasing it out because systemd seriously struggles to deal with services with dependencies on specific network interfaces being "UP", these issues are documented by freedesktop[0]).

> As a side note I wouldn’t actually recommend Ansible for server management.

Given the background of my infrastructure being a mixture of FreeBSD, OpenBSD, non-systemd Linux and systemd Linux machines. What would you recommend?

[0]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTar...

rgj · on Nov 3, 2021

You seem to have a very strong opinion about Ansible while you keep emphasizing that you don’t know anything about Ansible at the same time.

As a result, all your arguments against Ansible seem to be based upon assumptions, some of them completely false.

Arch-TK · on Nov 3, 2021

I have opinions of my limited experience of trying to look into ansible once.

Why don't you tell me which "arguments" (correction, they're my opinions) are based on false "assumptions" (correction, they're my impressions) rather than just giving me this blanket statement to work from?

chousuke · on Nov 3, 2021

If you already have something that works, by all means, stick with it.

If you want to learn Ansible, you don't even have to throw away your scripts; Ansible is a perfectly good way to run ad-hoc scripts if that solves your problem better than writing a full-blown playbook or even a custom module.

Ansible is weird and annoying in the beginning, but it's still a good tool to learn on top of your existing knowledge, because it provides extremely useful features beyond what's possible with plain old shell, and more importantly, it's a common language for system administration tasks that anyone can learn and understand without having to figure out how your specific scripts accomplish the things that Ansible gives you for free. The same applies to any management tool like Terraform, Puppet or even Kubernetes manifests. I put my expertise in my Ansible scripts and provide an easy interface to them such that a more junior person can, say, upgrade an Elasticsearch cluster by issuing a documented "make upgrade" (I like to use Makefiles to provide a neat interface for "standard" operations. "make help" and anyone can get going.) command that does everything correctly even though they have no idea how to actually upgrade it manually. If they wanted to learn, they have all the resources available required to read and understand my playbooks and figure it out without me being there to teach them the particulars of whatever unholy custom script setup I might have used instead.

Ansible is also mostly useful once you already have a server up and running but with 0 configuration; it's pretty bad at actually installing new servers, and I'd recommend using better tools for that part (Terraform, kickstart, or maybe just a script that clones an image). Just a manual next-next-next install is also perfectly acceptable way to get the base OS installed if the defaults are fine, though beyond a few servers it's a good idea to have a better process.

My perspective is that of someone who works with very varied systems daily, ranging in size from one to hundreds of nodes. I can manage that kind of scale alone because I use automation, and Ansible in particular is a tool that fits extremely well in the 1-20 "size range" for an environment; It is extremely lightweight and low-investment and can be used for even single nodes to great effect; once you get beyond a couple dozen, something more "heavyweight" like Puppet will start showing its usefulness.

As for idempotency, it's a very useful feature for automation: basically "Only do something if it is required". With a shell script, you have to implement manual checks for everything you run such that if you re-run a script on a system where it's already been run once, you won't accidentally break things by applying some things twice. A side benefit of this is that you can run your playbooks in "check mode", ie. "Tell me what you would do, but don't actually do it". Extremely useful and very error-prone to implement manually (Ansible doesn't always get it right either).

csdvrx · on Nov 3, 2021

> With a shell script, you have to implement manual checks for everything you run such that if you re-run a script on a system where it's already been run once, you won't accidentally break things by applying some things twice

Using tools like grep + basic logic like || and && goes a long way...

I'm not saying there is no place for ansible, but in my personal experience, it's a very small one.

> Ansible is also mostly useful once you already have a server up and running but with 0 configuration; it's pretty bad at actually installing new servers, and I'd recommend using better tools for that part (Terraform, kickstart, or maybe just a script that clones an image).

Agreed!

More recently, I've found zfs clones of base installs surprisingly flexible.

Now I only with there was a way to do some kind of merge or reconciliation of zfs snapshots from a common ancestor that haven't diverged much in practice, spawning the differences into separate datasets per subdirectory (ex: if /a/ hasn't changed but only /a/b/c/d1 and /a/b/c/d2 differs, move d1 and d2 off to create a separate d dataset mounted in /a/b/c/ so you can keep the common parts identical )

rambambram · on Nov 2, 2021

Oh, what do I love .txt files. I use them for all kinds of simple checklists, logs, and basically everything. I had a manager once (not in the software field though) and she asked me annoyed why I kept using these strange files, and if I wanted to "just use Word and .doc files because that was more safe and compatible". I wasn't able to explain that text files were there and will be there for a long time. She also didn't understand the difference between a text file and a document. Not even when I pointed her to the .txt in the filename.

throwaway20371 · on Nov 2, 2021

> If you only install a webserver once in a blue moon, make a .txt checklist of the steps you followed.

This brings up a very important point about checklists that I don't think gets enough attention.

The problem happens when somebody "updates" that web server in-place. If they try to record what changes they made in the middle of the checklist, eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected. This happens to me when I try to record changes in my VirtualBox configuration after I add a new system package or something; later I try to re-deploy my vbox, and it breaks.

So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end. This way you catch the unexpected problems and confirm the checklist still works for the next person.

notatoad · on Nov 3, 2021

i just went through this with a colleague this afternoon, and i was super happy when i realized what we had finally accomplished:

he asked if there was any way to access a server, and the answer was "no". the only way to "access" our production server is to modify the provisioner script. there is no way to "update it in place". it's taken a while to get here, but it's really freeing to realize that yes, i have the credentials and could probably get in, but i know my changes would be automatically reversed in the near-term and there's no point in even attempting to access a server directly. the server belongs to the deploy script, not to me.

csdvrx · on Nov 3, 2021

> the server belongs to the deploy script, not to me.

I prefer it when both the server and the deploy scripts belong to me :)

"infrastructure as code" with no way or extremely limited possibilities to ssh for emergencies strikes me as foolish overengineering / painting yourself in a corner, but if you like that, why not?

theshrike79 · on Nov 3, 2021

The possibility to ssh in an emergency is also a possibility ssh in when it's not an emergency and "just quickly change this one thing".

And then the server gets deployed via the script and the quick change isn't there any more.

Whoops.

My EC2 instances are all configured so that they can't be accessed from the outside. They boot up, fetch their install script from a set location and run it.

If they need changes, I either update the base image or the install script.

csdvrx · on Nov 3, 2021

> If they need changes, I either update the base image or the install script.

You lose some time and flexibility, just because you are afraid you may forget to integrate the quick change in your scripts.

My bash histories go into a global database to avoid this

nl · on Nov 3, 2021

Cattle, not pets.

Using SSH is what you do if it's a pet.

ynx · on Nov 3, 2021

Well...if I'm only tending to, say, three or four cows, then they may as well be pets for my purposes, even if most of my management is systematic.

You can do a lot with four servers.

csdvrx · on Nov 2, 2021

> The problem happens when somebody "updates" that web server in-place.

Imagine this is 28-nginx : I would create another script 29-nginx-update only recording the update, even if it: "echo apt-get update; apt-get upgrade nginx ; echo "make sure to fix variable $foo"

Next time I have to do that, I will integrate that into 28-nginx and remove 29-nginx-update

> eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected.

Maybe I don't understand the issue, but my scripts or text files are simple and meant to be used in sequence. If I hack the scripts, I make sure it still works as expected - and given my natural laziness, I only ever update scripts when deploying to a new server or VM, so I get an immediate feedback if they stop working

Still, sometimes something may work as expected (ex: above, maybe $foo depends on a context?), but it only means I need to generalize the previous solution - and since the script update only happen in the context of a new deployment, everything is still fresh in my head, so I can do it easily

To help me with that, I also use zfs snapshots at important steps, to be able to "observe" what the files looked like on the other server at a specific time. The snapshots conveniently share the same name (ex etc@28-nginx) so comparing the files to create one ot more scripts can be easily done with diff -Nur using .zfs/snapshot/ cf https://docs.oracle.com/cd/E19253-01/819-5461/gbiqe/index.ht...

Between that + a sqlite database containing the full history of commands types (including in which directory, and their return code), I rarely have such issues

Shameless plug for that bash history in sqlite: https://github.com/csdvrx/bash-timestamping-sqlite

> So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end.

I agree: if I don't have time to fix 28-nginx, I write 29-nginx-update instead, with the goal next time to integrate it. But I don't try to tweak 28-nginx if I know I won't have the time to test it.

throwaway20371 · on Nov 2, 2021

It can work this way (that's how software patches have historically worked) but if you don't test it from the beginning, you will still find the odd case where that added step is broken, even though it seemed like it should have worked. The more you use that method, the more chances for breakage.

If you don't want to repeat the steps from the beginning, you could make a completely separate checklist to be followed on a given system that includes things like "make sure X package is installed", "make sure Y configuration is applied", so that the new checklist accounts for any inconsistencies. This is pretty common anyway as checklists are broken up into discrete purposes and mixed and matched.

nerdponx · on Nov 2, 2021

The problem I see is that someone will inevitably update the procedure (or make a change that unknowingly requires a change in the procedure) and not update the script. Either because they are pressed for time or because they forgot. Same as any other documentation.

The solution ultimately is for PMs to get it into their heads that software and infrastructure require maintenance like anything else, and consistently refusing to schedule time for software/dev-tool maintenance (such as updating documentation) has the same effect as refusing to schedule time for physical equipment maintenance. Then and only then do engineers have the freedom to set up mandatory procedures and checklists for their work, the way all engineers should be allowed and encouraged to do.

masukomi · on Nov 2, 2021

> The problem I see is that someone will inevitably update the procedure (or make a change that unknowingly requires a change in the procedure) and not update the script

why would your procedure be to do anything _other_ than "run script foo and do what it says"? If your procedure is not that, then your procedure doesn't reflect reality, and thus is outdated documentation that needs to be updated.

if the steps of the procedure only exist within the script then there's only one place to update it. And yes, this suggests the script should be very readable.

nerdponx · on Nov 2, 2021

> If your procedure is not that, then your procedure doesn't reflect reality, and thus is outdated documentation that needs to be updated.

Configurations change all the time. There is no technological safeguard against someone forgetting to write down the change in the playbook script; it has to be organizational.

chousuke · on Nov 2, 2021

Declarative configuration management systems solve this by unchanging your configuration after someone messes with it manually. :) Hard to forget to change the automation when it persistently undoes all your hard labour.

You can help solve the problem with technology, you just have to make the solution easier than working around it.

nineteen999 · on Nov 3, 2021

> Declarative configuration management systems solve this by unchanging your configuration after someone messes with it manually

Not always, there are frequently ways to do an "end-run" around tools like Puppet and Ansible; take for example the following list of /etc/*.d directories on a Redhat distribution:

/etc/bash_completion.d

/etc/binfmt.d

/etc/chkconfig.d

/etc/cron.d

/etc/depmod.d

/etc/dracut.conf.d

/etc/gdbinit.d

/etc/grub.d

/etc/init.d

/etc/krb5.conf.d

/etc/ld.so.conf.d

/etc/logrotate.d

/etc/lsb-release.d

/etc/modprobe.d

/etc/modules-load.d

/etc/my.cnf.d

/etc/pam.d

/etc/popt.d

/etc/prelink.conf.d

/etc/profile.d

/etc/rc0.d

... <snip> ...

/etc/rc6.d

/etc/rc.d

/etc/rsyslog.d

/etc/rwtab.d

/etc/statetab.d

/etc/sudoers.d

/etc/sysctl.d

/etc/tmpfiles.d

/etc/xinetd.d

/etc/yum.repos.d

Someone can manually log onto the environment and drop additional configuration files into those directories that vastly effect what is run on the system (and when it's run in the case of cron.d for example).

"Idempotency" tools like Puppet and Ansible are very good at saying, "this file should exist in this directory with this MD5 hash", but not as good at saying "this directory shouldn't contain anything except these files".

Of course you can list all the files out that you consider to be valid and their signatures in the above directories, but that's going to break next time Redhat pushes an update that installs/removes files from those directories.

I guess you could setup an audit script that checks that all the files in those directories match the expected RPM signatures, and then account for any local customisations (additions, removals, changes etc). But you are starting to get into a lot of extra work there.

Point I am making, is that these tools are not as forcibly idempotent as a lot of people assume.

chousuke · on Nov 3, 2021

Of course; no tool is perfect. But in the general case, they're good enough, and they do help.

For example, I manage nodes with Puppet, and Puppet can and will "clean" things like sudoers.d, yum.repos.d, nginx.conf.d etc. of files that it does not manage.

I don't do this for every possible directory because so far configuration drift in those has not been a problem and generally whatever comes from the packages by default functions fine and crucially, the system can be rebuilt from scratch using the configuration that is managed, so the important bits are there.

I will simply start managing more directories as needed.

csdvrx · on Nov 2, 2021

A script comparing the md5 or the timestamp of the configuration files against the md5 or the timestamps of the log entry in charge of these files can do that

I mean, if /etc/hosts is more recent than /log-directory/03-static-hosts-in-etc or the md5 you have recorded for this file, a daemon can easily create a ticket / send an email to whoever was logged at the time of the change.

magicalhippo · on Nov 2, 2021

I'm primarily a Windows guy, but I'm dabbling more and more with Linux lately. Especially after I got a few Raspberry Pi's and various clones set up.

What I've ended up doing is to just write Word documents containing what I do. I keep my Word documents in a single directory on my OneDrive, so I can access them on all my machines.

Raspberry Pi iSCSI server? One document for that, containing the links to guides used, commands I've run and scripts made. I use headers to organize the sections, like one for compiling the custom kernel, one for configuring iSCSI on the Pi, one for using it on the NAS etc.

Need to create a new Docker SMB mount? Once the right incantation has been found, a small script is made and also added to the right Word document, ready to be pasted into the next machine I might need it on.

It's not terribly pretty and not at all fancy, but I found it's low enough barrier that it's easy to do and maintain, and it's very helpful to have it all in one place.

Had I been a Linux guy first and foremost I would probably have used something else than Word, but it works and I find the separation between the regular font for comments and monospace for script makes it easy to quickly distinguish. It's also easy to add screenshots for clarifications etc.

csdvrx · on Nov 2, 2021

> Had I been a Linux guy first and foremost I would probably have used something else than Word, but it works and I find the separation between the regular font for comments and monospace for script makes it easy to quickly distinguish. It's also easy to add screenshots for clarifications etc.

As a windows girl, I recommend you give a try to the notebooks like RStudio: you can add screenshots (or script them with ahk, nircmd etc to automatically screenshot at some points of the execution) and execute blocks of commands in about anything.

The notebook approach has the additional nice feature of stopping execution as soon as a block fails, giving you the opportunity to fix it before you continue from that point, something often more tedious with Linux scripts (where you need to commend the beginning if your script isn't idempotent)

Ideally, you would always write idempotent scripts, but who've got time for that :)

In practice, if you try to avoid wasting effort, it's often the icing on the cake, once everything else has been done. So I like notebook environments for this simple feature: piecewise execution with verbose output (similar to bash -xe)

That said, a directory per machine (and a subdirectory with all the drivers and specific software) on Onedrive with a RTF file full of screenshots (because wordpad is everywhere!) is how I work most of the time :)

xomodo · on Nov 3, 2021

Have you tried emacs org mode?

lucb1e · on Nov 2, 2021

I think using an executable format like in the article, you'll find it's not actually that much harder to read than a Word document with proportional fonts. And I'm not sure where you need screenshots if it's about doing things on a non-Windows system, even if that seems like an odd transition at first :)

magicalhippo · on Nov 2, 2021

Yeah might have to try it out.

The images could be for things like performance graphs, interactive menu choices, photos of GPIO wiring etc. Not a huge loss to not have them, but given it's dead easy to add to a Word document, why not?

cseleborg · on Nov 2, 2021

> It is the absolute best solution there is.

I prefer checklists. With a checklist, I can mark my progress as I go through the motions and, more importantly, interrupt the work even for several days, before I pick it up again. Checklists are much, much easier to adapt. They can also hold more information than progress indication, like important outputs of the procedure that need to be used somewhere else later, etc. They can be archived in case I need to understand how I did it on that particular occasion one year ago.

Google Docs added checklist a while ago and I think they are very handy. And very KISS.

csdvrx · on Nov 2, 2021

> Checklists are much, much easier to adapt. They can also hold more information than progress indication, like important outputs of the procedure that need to be used somewhere else later, etc.

echo "## Step 2/10 : preparing the SSH key for xxx"

(...)

KEY=$( cat .ssh/id_rsa.pub )

echo "## Do not forget to use this important output somewhere else later: $KEY"

> They can be archived in case I need to understand how I did it on that particular occasion one year ago.

(...)

# 20191102

# TODO: 2 years ago, we decide to use 4096 bits key, make sure to check the length

# in case I forget, how to do that, the number of bits can be forced with -b 4096

# 20201102

# WONTFIX: use ecdsa for the specific host yyyy

So I stand with the author: this is the absolute best solution there is: it can be as simple or as thorough as you need, while being very low tech and simple to use.

And when you find yourself needing to make say an Ansible configuration, you already have most of what you need.

MaulingMonkey · on Nov 3, 2021

This fails on !wsl windows (no `cat`.) Or on *nix if `ssh-keygen` is missing (minimal VM image? new box?) Or on *nix if I absentmindedly used `bash` specific syntax and I'm now in a real `sh` shell. And this is for basic SSH setup!

An aborted script has a good chance of having left things in a broken, half-constructed state, such that simply re-executing the script results in more errors. Attempting to manually resume the script midway will have likely discarded important env variables that I must now figure out how to reconstruct.

...which isn't to say the scripting approach doesn't have value - poor checklist discipline by my coworkers has led me to painstakingly automate more than one perfectly fine checklist, and some stuff is quite scripting friendly - but I've also written my share of scripts that ate up more time in debugging and troubleshooting than I ever saved by writing them in the first place.

csdvrx · on Nov 3, 2021

> This fails on !wsl windows (no `cat`.)

If you really want to argue about details, install git bash or msys2 or busybox and be done with it...

> Or on *nix if I absentmindedly used `bash` specific syntax and I'm now in a real `sh` shell.

Write your scripts in whatever you like as long as you are consistent!

Protip: starting your script with #!/bin/bash will limit the chances they are executed by sh

> An aborted script has a good chance of having left things in a broken, half-constructed state, such that simply re-executing the script results in more errors

Yes, please write good scripts that clean up and are idempotent (!!)

> but I've also written my share of scripts that ate up more time in debugging and troubleshooting than I ever saved by writing them in the first place.

With experience, you will avoid doing things like `echo something >> file` and will replace that with `grep something file || (echo something >> file)` (you get the idea)

MaulingMonkey · on Nov 3, 2021

> If you really want to argue about details, install git bash or msys2 or busybox and be done with it...

"And now you have two problems!"

(My main goal here isn't to argue the details, but to exemplify the many ways scripts end up brittle and go boom in opaque ways, in a way that a checklist and common sense might not. To argue the details anyways: git bash / msys2 scripting always causes more problems than it solves IME - at that point, fire up a real *nix, or switch to a non-shell language like Python or Rust, instead of further entertaining the sunk cost fallacy.)

> Protip: starting your script with #!/bin/bash will limit the chances they are executed by sh

I do so when I can, but bash is missing frequently on oddball targets, so I have good uses for the russian roulette that is `#!/bin/sh`.

> Yes, please write good scripts that clean up and are idempotent (!!)

Easy enough if the underlying tools the scripts invoke are idempotent. Of course, the underlying tools are never idempotent - that'd be too easy.

> With experience, you will avoid doing things like `echo something >> file` and will replace that with `grep something file || (echo something >> file)` (you get the idea)

I've done plenty of that style, and it still tends to be brittle as heck.

`file` might be in use or protected (IDE? Anti-Virus? Previous unkilled build? Uninstall gone awry? Directory corruption? Owner/permissions issues? I've seen it all...)

`file` might fail mid-write (packaging/archiving stuff is a great way to run out of disk IME)

`file`'s format can change wildly between tooling versions, and the resulting errors from using the wrong format might be absolute garbage to the point of not even mentioning `file`.

What you grep for can depend on all kinds of implicit state (I once had a Android unit-testing script break because `adb logcat`'s default format depends on settings stored on the phone, and the default changed between Android versions at some point! Grepping errors to differentiate retryable errors vs fatal vs expected non-errors tends to break as tool authors don't think of error messages as part of their stable ABI, nor should they...)

Experience helps, but sometimes the problem is fundamentally intractable/dynamic enough, and the task executed infrequently enough, that even an automation expert would be better served by manually executing a checklist and hand-editing files, instead of writing a script and trying to decode the error messages when it inevitably explodes each time it runs due to unexpected dependencies on undocumented underlying state.

Kinrany · on Nov 3, 2021

If the process takes a long time, it should be okay to just start the list from scratch and click OK a few times.

throwaway20371 · on Nov 2, 2021

What about the do-nothing script do you find is different than a checklist? To me the whole thing is already a checklist, there just aren't check-boxes.

runiq · on Nov 3, 2021

Well, then it's not a checklist, just a list, with all that entails.

neo2006 · on Nov 2, 2021

If you are going through writing the do-nothing script anyway why not do a do-something script and remove the error prone human out of the loop, it also can serve as documentation and if you are disciplined enough to never make a change other then trough the automation you can have the benefit of source controlling it and have documentation with historical context

computronus · on Nov 2, 2021

I happen to have just written a do-nothing script, so I can answer why I found it helpful vs. a do-something script.

- I am still developing the procedure in the script, so it is premature to automate. A do-nothing script still benefits you by telling you exactly what to do - in my case, spitting out exact commands to run - but you can assess its steps before performing them.

- The script still gathers a lot of information and associates it together in order to figure out the correct commands. That work is valuable all by itself.

- Even though you have to run commands yourself, it's copy-and-paste vs. hand-typing, so it's already less error-prone.

- The script documents the procedure even without automation, so the benefit is immediate.

Now, I think that it's better for a do-nothing script to _evolve_ into a do-something script. But, if that effort is delayed or never happens, at least you've got something.

renewiltord · on Nov 2, 2021

Simple.

Do nothing script:

    echo “Go to your Google Account settings and set a vacation auto responder”

Do something script:

    import oauth2 # I have already lost, there is no salvation in life, without truth or peace there is only the void that beckons

slightwinder · on Nov 2, 2021

A do-something-script is more work to make than a do-nothing-script, which can basically be just be a better tasklist with abstract instructions. Similar, the human may make more errors, but he is also more likely able to fetch errors and correct the script for changes which happened till the script was written.

dragonwriter · on Nov 2, 2021

> If you are going through writing the do-nothing script anyway why not do a do-something script and remove the error prone human out of the loop

You will, eventually, ideally.

But that's extra up-front cost (especially when do something involves a complex integration), the do-nothing script crystallizes the definition of the existing manual process allowing incremental automation of steps (also, making a to-do list for automation.) It is an example of “do the smallest useful unit of work”.

mdoms · on Nov 2, 2021

You have comprehensively missed the point. And your questions are answered in the article:

> At first glance, it might not be obvious that this script provides value. Maybe it looks like all we’ve done is make the instructions harder to read. But the value of a do-nothing script is immense:

> * It’s now much less likely that you’ll lose your place and skip a step. This makes it easier to maintain focus and power through the slog.

> * Each step of the procedure is now encapsulated in a function, which makes it possible to replace the text in any given step with code that performs the action automatically.

> * Over time, you’ll develop a library of useful steps, which will make future automation tasks more efficient.

> A do-nothing script doesn’t save your team any manual effort. It lowers the activation energy for automating tasks, which allows the team to eliminate toil over time.

throwaway20371 · on Nov 2, 2021

These kind of organizational problems happen everywhere, that doesn't bug me. What bugs me is when leadership knows about it and doesn't care. After low-level engineers stick their professional neck out to complain in internal town halls and through feedback forms, and leadership gives some bullshit answer that doesn't address or even acknowledge the problem. It would be less infuriating if they just said "I don't give a shit." It's the weasel words and pretending the problem doesn't exist that infuriates me. A lot of the time it doesn't even take much work at all to begin addressing the issue, like a working group for continuous improvement of highly-painful high-value processes. You don't even have to solve it. Just attempt to address it.

calmlynarczyk · on Nov 2, 2021

I work at a global corporation with 50,000 employees. Even though I've never been at Google I felt every pain point this video was getting at because our company is trying to implement all of this stuff right now.

"Oh you want to go to production? Here's a list from A-XX stating what you need to accomplish that." Thing is I thought they actually handled this gracefully when I started because lots of requirements were tiered with various criteria you had to meet to move up (mostly for brownie points).

But then one day the Tech Execs lose their minds and decide "everything needs to meet all criteria for every single process." You want to create an S3 bucket to store data? That will be a week of submitting paperwork and another month of meetings and approvals from various teams you've never heard of. Plus you have to register your schema, implement data quality checks, unit tests, regression tests, get a PR and CO approved for your central config change, remediate any CVEs in the tooling that you used, and build all of this using our in-house CI/CD platform we created because we're just soooo special. Now you're allowed to launch. Oh wait, NO because we've put the entire corporation on hold from launching new systems for the last calendar year because we're still trying to agree on the final process everyone needs to follow to go to production.

It's surreal how universally so many orgs makes the same mistake of trying to throw more and more process at problems.

oblio · on Nov 2, 2021

> It's surreal how universally so many orgs makes the same mistake of trying to throw more and more process at problems.

It's hard to find the right balance. You want a bit of process, but not too much.

But it's one of the hardest problems in the existence of humanity and whoever solves it should probably get all the Nobel prizes available (including peace and chemistry!).

unethical_ban · on Nov 2, 2021

In my previous role, the secdevops groups (matrixed teams) were building custom terraform modules for our devs to use in order to easily deploy compliant AWS infrastructure - and devs could only deploy via terraform/CI-CD. While TF specifically states that custom modules are not meant to be used as wrappers, I thought it was a clever way to try getting security "out of the way" while still enforcing best practices.

darkwater · on Nov 2, 2021

> While TF specifically states that custom modules are not meant to be used as wrappers

What do you mean with this?

easton · on Nov 2, 2021

"We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead."

from https://www.terraform.io/docs/language/modules/develop/index...

If you write different versions of the terraform modules that do some corporate specific magic, I think that would be okay under this rule. It's when you're writing a module that doesn't do any useful magic that they want you to stop and think.

acdha · on Nov 2, 2021

> It's surreal how universally so many orgs makes the same mistake of trying to throw more and more process at problems.

Followed by the inevitable ranting about “shadow IT”, AKA the requirements gathering they really should have done.

tetha · on Nov 3, 2021

Well we're small, but our development is currently starting to build new products and new extensions to their products. I'm pretty happy that everyone is pretty much onboard with our situation #1.

There are a few hard requirements, but most of the requirements we as operations put up are tied to the guaranteed service level agreement to the customer and possibly overall user count.

If there is just an entirely lax service level agreement there might be no need to invest time in clustering non-trivial applications, or implementing more monitoring than a simple HTTP check. On the other hand, if you're selling some 99.95 24/7 with penalties to a customer, the list of must-dos suddenly grows a lot.

The nice thing of approaching it like this: It allows a gradual increase in operational rigidity and robustness. A product team doesn't hit a wall of requirements for their first productive customer. They rather have to incorporate more requirements as the service becomes more successful. Or they don't if the idea doesn't work.

brightsim · on Nov 3, 2021

Literally every point you made applies also at my workplace. The optimist in me hopes we work at the same place, but I fear that your last statement might just be the truth :-)

m0zg · on Nov 2, 2021

At Google back then "leadership" might as well not even show up. It was super bottom-up, and _you_, not "leadership" were supposed to identify and fix issues. No "leadership" would stop you, either, at least in most cases. I don't believe that in all my years there anyone ever told me what to do. It was very easy to start projects, shut down projects, get headcount, get resources (if your business case is sufficiently persuasive to others). Not a complete free for all, but certainly _a lot_ more freedom than you'd normally see in companies of that size. And (IMO) people used that freedom and autonomy pretty well.

That kinda deteriorated over time, culminating with Sundar "McKinsey" Pichai, and then went rapidly downhill from there, and now I flat out reject their recruiters, based on the feedback from friends still employed there.

TideAd · on Nov 2, 2021

My team has issues deploying builds to test machines. It's like 15 steps and takes an hour. The tooling is atrocious and recently got even worse.

We eventually found the team responsible for this (the org structure is hard to penetrate because no one answers emails). They said they had no idea anyone was dissatisfied. Then they said that it was a low priority so they didn't care and nothing would be done.

In my experience, you can usually convince an engineer that their stuff has a problem and they need to fix it. But it's often impossible to convince management if they aren't on the hook for user satisfaction.

ts4z · on Nov 2, 2021

To be fair, they did, and many things have improved. And this video was used as an uncomfortable reminder to make some of those changes.

throwaway20371 · on Nov 2, 2021

Is Apple still the only company you can pay for both hardware and software support? Because all I want is to drop down a couple grand and never have to think about "computer maintenance" again. I maintain my car myself because it's so infrequent (pretty much just oil changes) but it feels like my computer maintenance is constant.

One of the reasons for that constant maintenance seems to be The Web. Remember when you didn't need 4 gigs of ram to browse the web? When you didn't need a high-power 3D graphics card to look at Google Maps? (bad example but WebGL is mandatory for some simple sites, and if your graphics sucks/doesn't do hardware acceleration...)

I don't remember ever having to upgrade my car every few years just to visit a new local business. At some point we need to admit that this constant tech churn isn't improving our lives, but it is enriching some billionaires.

cedws · on Nov 2, 2021

I completely get what you mean. I love freedom, I love being able to tinker, but these days I just want my machine to work. Linux on laptops is a nightmare, often even on the hardware that "supports" it. MacBooks are good enough that I can get things done and I don't have to think about Wi-Fi drivers or GPU drivers or whether I'm using the wrong CPU governor causing it to pump out heat. So I bought one and so far I haven't looked back.

tosihakkeri · on Nov 2, 2021

> Linux on laptops is a nightmare, often even on the hardware that "supports" it.

That's just not true. Try a ThinkPad with Fedora and you'll see.

andyp-kw · on Nov 2, 2021

What kind of maintenance are you doing on your machine?

I restart my work and home machines once per month for updates and that's it.

throwaway20371 · on Nov 2, 2021

Why do your machines need monthly updates? Do you constantly update any other machine that you own? Lawnmower, car, oven, microwave, bicycle, watch, reciprocating saw, vaccum, garage door, TV?

Daegalus · on Nov 2, 2021

Because software is buggy, even Apples. It's not like you are updating hardware (like your other examples) you are doing a software update so that the software interfaces with the hardware better or fixes bugs.

nowadays your TV if it's a smart tv also gets monthly or quarterly updates too. they just tend to happen in off-peak hours. and car software updates are when you take them in for service.

you aren't doing a fair comparison asking why your X hardware doesn't need updates when comparing mostly hardware with simple software and full operating systems.

throwaway20371 · on Nov 2, 2021

How is it not a fair comparison? They're machines. Just because we are currently building them in a way that is incredibly fragile and needs constant fixes, does not mean they have to be built that way.

Cars used to be built by hand, had tons of bugs, and were expensive. Then a man came along and found a way to produce them faster, cheaper, and with less bugs. That was pretty amazing for a time, but they still had plenty of bugs. And then some people from a culture of very fastidious craftsmen obsessed with quality began producing cars a little cheaper, and with far fewer bugs, and they lasted much longer. Then the whole world realized, "shit, our machines don't actually need to be so fragile," and they followed suit.

The lessons learned by those people in that culture were promoted around the world, and evolved to shape what we now call Lean and Agile. But the people using these new processes forgot the first lesson: we don't have to accept the status quo.

mickotron · on Nov 2, 2021

Where there is software, there are bugs.

It's not defeatist, it's reality.

You can test, but testing does not prove an absence of bugs. It just means your tests did not reveal any. Maybe your testing is flawed, incomplete, inappropriate, biased etc.

Just saying for devs to "not write bugs" is pretty naive. Almost like saying "don't have car accidents". We don't want to have them, yet here we are. In complex environments, things happens that are sometimes outside our immediate control.

throwaway20371 · on Nov 2, 2021

So then shouldn't we stop writing software? If it's really impossible to make software that doesn't have tons of bugs, yet it's perfectly possible to make hardware without those bugs, shouldn't we be "making hardware" instead?

Actually, now that I think of it, that's not the problem. The problem is we keep changing the software. My laptop from 15 years ago still functions exactly the same way it used to. It hasn't disintegrated into a puddle of bits. You just can't use it to visit any "modern website" or run any "modern software". If we just stopped upgrading everything every 5 seconds we could keep using old technology.

mickotron · on Nov 4, 2021

Nice straw man.

smoldesu · on Nov 2, 2021

Macs are neither cheaper nor less buggy, so I think I have to agree with the other guy. You're comparing Apples to oranges.

kokada · on Nov 2, 2021

Apple machines still needs constantly updates, and worse, they keep nagging you.

My only macOS installation is a Catalina one, and Apple keeps wanting me to upgrade it to Big Sur, that I don't want for some reasons. However, the only workaround I found to stop the update badge from appearing (that is really distracting since it confuses me if this is something important or not) is to set some strange flag and kill Finder. If I open settings for any reason, the update badge reappears and this is really infuriating.

My system with the last amount of maintenance is my NixOS installation, where any workaround that I need for software/hardware issues are forever described in my dotfiles. So yeah, I need to find how to fix something once, however afterwards it will just work. Also different from Apple I can do upgrades when I want, they're atomic and I can also do rollbacks, so they're pretty much safe from a user perspective.

mickotron · on Nov 2, 2021

Mitigation against security vulnerabilities? Bug fixes? New features?

This question is intentionally missing the point.

If you think an internet-connected computer used for modern workloads can be treated like a lawn mower and doesn't need any updates over its usable life, you're dreaming.

throwaway20371 · on Nov 1, 2021

I guess not if they're based in Estonia?

rectang · on Nov 1, 2021

Whether or not legal redress is practical is a separate question. But MangoDB is obviously creating "confusion in the marketplace" and sponging off the goodwill created by somebody else. That's not cool whether it's a commercial entity abusing a FOSS trademark, or the other way around as in this case.

tssva · on Nov 1, 2021

"and sponging off the goodwill created by somebody else."

Aren't they sponging off the ill will created by somebody else?

evv · on Nov 1, 2021

How many fruit-related names are available, anyways? There would surely be a different lawsuit coming their way if they named it AppleDB. (ok, I admit this is a terrible joke)

If "Mango" can get away with it, I think it is very cool to provide a graceful path to open source tech.

I don't have much pity for the $33B company that promotes its mediocre semi-proprietary database to unsuspecting devs/students who don't know better.

rectang · on Nov 1, 2021

It is important to be consistent in the application of the rules. Some of us made a stink about the "Commons Clause" people abusing the ASF's trademark when they were promoting "Apache License 2.0 with Commons Clause" for something incompatible with the Apache License. It would be hypocritical to apply trademark rules for that but not in the case of "MangoDB".

evv · on Nov 1, 2021

Fundamentally I do agree with you. (especially with the mention of "Apache License 2.0 with Commons Clause", which was giving me a headache this morning)

But also I'm not lifting a finger to help companies like MongoDB, unless properly compensated.

Personally I hope that MongoDB does go for a trademark lawsuit, triggering the Streisand Effect. Then Mango can find a better name and attract attention.

threeseed · on Nov 1, 2021

Estonia is a member country of the European Union which has well established IP laws.

The chance of them being successfully sued for this is ~100%.

james_in_the_uk · on Nov 1, 2021

I agree with your point, but (pedant mode on) a cease and desist would be more likely in the first instance. You’d only sue first in an exceptional case.

throwaway20371 · on Nov 1, 2021

"MangoDB is a proxy which uses PostgreSQL as a backend. The proxy translates MongoDB wire protocol commands into SQL queries, and use PostgreSQL as storage."

You don't have to support MongoDB, but you can support apps that were only written with Mongo as backend? That's awesome. I can't imagine it's production-ready yet but it's a great idea.

kumarvvr · on Nov 2, 2021

I have seen multiple benchmarks that show PostgreSQL has better performance than Mongo in almost all use cases.

A simple wrapper in a language like Go or Rust is sufficient to surpass Mongo performance.

Personally, I have shifted database operations behind a GRPC service that uses Go language and PostgreSQL back-end. Allows me to customize the data store to suit the requirement.

PostgreSQL does not get enough love in this world.

rustnote · on Nov 2, 2021

It's plenty loved, I'd say the most loved, but still a fair way behind Oracle in terms of popularity just because of legacy stuff. Damn legacy code, hecking JQuery was still the most used JS framework until just this year!

kvdveer · on Nov 2, 2021

Oracle and popularity don't really go together. Oracle is really common, just like root-canal treatments. Wouldn't refer to either as popular, though.

kumarvvr · on Nov 2, 2021

> but still a fair way behind Oracle in terms of popularity

I have a vague feeling that those who use Oracle are using it due to circumstance or corporate necessity.

pydry · on Nov 2, 2021

One of the last straws for a job I had once was realizing that the oracle python client would convert money fields into float.

Oracle had its tentacles everywhere in that company. I think even they wanted to stop using it but it was just integrated to too much.

strictfp · on Nov 2, 2021

Oracle is a hot mess but it also has some really powerful features, like table sharding , that actually works if you manage to set it up correctly.

keyle · on Nov 2, 2021

To be clear, Oracle's popularity isn't popularity in the sense of popular folk love. Unless you mean popularity in the most rancid disgust available en masse.

SergeAx · on Nov 2, 2021

> in almost all use cases

Including horizontal sharding and vertical replication out of the box?

mst · on Nov 3, 2021

I've seen quite a few instances where a mongo deployment using both of those can comfortably be replaced by a single postgres primary (on about the same hardware as a single member of the mongo cluster) with a couple of read replicas and a connection balancer.

There are absolutely mongodb deployments out there that are at sufficient scale that they genuinely need those features in any storage backend, but I suspect the vast majority of them only need those features to work around mongo's mediocre straight line performance.

How close this comes to counting as "almost all" is of course highly arguable.

SergeAx · on Nov 3, 2021

> can comfortably be replaced by a single postgres primary

And lose all the data in a single failure?

mst · on Nov 4, 2021

I said a single -primary- and explicitly mentioned having replicas.

I'm very confused why you might think that could "lose all the data" from a single one of those nodes failing.

SergeAx · on Nov 4, 2021

I was reading your reply too quick, sorry. Anyway, even failover replication in PG is not there out of the box and have it's quirks.

mst · on Nov 5, 2021

I don't disagree at all with that statement although invetably what mongo does have out of the box definitely has its own quirks.

Getting operations right is tricky no matter what you're doing/using, and you always have to learn your backend's idiosyncracies no matter which one you choose.

kumarvvr · on Nov 2, 2021

It has both, but not out of the box.

exelib · on Nov 2, 2021

More info?

robertlagrant · on Nov 2, 2021

What's vertical replication?

SergeAx · on Nov 2, 2021

Having several read-only replicas in every shard to distribute reading load.

irq-1 · on Nov 2, 2021

Off topic. What would you think of a 1 to 1 relation between gRPC functions and stored procedures? Keeping all the SQL together inside postgres is great for debugging and updates; you can see all dependencies. One to one would stop you from having to make a different API, and make the gateway a simple reusable service.

inkyoto · on Nov 2, 2021

> You don't have to support MongoDB, but you can support apps that were only written with Mongo as backend? That's awesome. I can't imagine it's production-ready yet but it's a great idea.

You have just described AWS DocumentDB, which is a Mongo compatible frontend using Postgres as the backend (AWS coyly refer to it as Aurora, though); the wire protocol compatibility is at the version 4.0 level, with some extras thrown in. Change event streams also works like a charm. We have been using it for a couple of years and have found DocumentDB stable, performant with the AWS support being very good. Support for complex compound indices is still missing as well as support for complex query projections is somewhat missing, but we have decided to change ways of how we use use documents instead, so it has not become a major impediment for us.

The main disadvantage, though, is cost, especially for smaller datasets where spinning up a separate DocumentDB cluster quickly turns into a money wasting excercise. Although, for our primary use cases, DocumentDB is still more than 3x cheaper than a comparable Atlas MongoDB PaaS.

julianlam · on Nov 2, 2021

Except DocDB is not a drop-in replacement despite what Amazon says.

People have tried and failed to get our software running on DocumentDB, whereas another developer got it running on CosmosDB with minimal changes upstream.

Not affiliated with MS in any way, just sharing what I've witnessed secondhand.

inkyoto · on Nov 2, 2021

Amazon do not say that DocumentDB is a drop-in replacement, neither do I. Moreover, I have outlined specific incompatibilites between Mongo and DocumentDB we have encountered for our use cases; RBAC controlled document filtering is not available in DocumentDB, either. Therefore, it won't suit everybody.

However, for simple to medium complexity projects, especially for brand new ones, DocumentDB is a viable and a more affordable alternative to Atlas MongoDB with a decent level compatibility.

taf2 · on Nov 1, 2021

Didn't the stripe team do something like this 5 or 10 years ago? I seem to remember them having a translation layer or doing some sort of streaming conversion from mongodb to pg?

Ah yes - https://github.com/stripe-archive/mosql

6 years ago

PeterZaitsev · on Nov 1, 2021

Does not look like the same thing - this looks like system to replicate from MongoDB rather than being able to talk to PostgreSQL as if it were MongoDB

tyingq · on Nov 2, 2021

Yes, that seems correct

"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."

sitkack · on Nov 2, 2021

This also means you don't have to use PostgreSQL, you are using its wire protocol. So ideally, this could work with Cockroach, Yugabyte, Cloud Spanner, Crate DB, immudb and tens of others that I am missing.

christkv · on Nov 1, 2021

There was another thing in that space aswell https://www.torodb.com

michaelpb · on Nov 1, 2021

Yeah, I could imagine this being a useful step to migrate away from MongoDB. I suspect there are plenty of "resume-driven development" MongoDB installations out there that could use something like this.

vosper · on Nov 1, 2021

> Yeah, I could imagine this being a useful step to migrate away from MongoDB.

What is the state of the art in this area? I did a little PoC of moving data from Mongo to a new schema in Postgres with Hexo and DBT. It worked nicely, but it was only a PoC.

michaelpb · on Nov 2, 2021

Oh, I don't know about the state of the art, I was just speculating. I'd imagine this technique would only useful if you want to support a MongoDB app at the same time as building new features with Postgres, and then gradually phase out the MongoDB interface (e.g. gradually transitioning between a v1 prototype and a v2 rewrite)

If it's not much data (eg ~100k or something), and you don't need any sort of gradual transition, then I'd do something really KISS like dump into a CSV or something and then re-import with whatever the new database management system has for importing files

threeseed · on Nov 1, 2021

MongoDB is a 12 year old database. And yet people are still using this disparaging argument that anyone that chooses it is doing so for their resume and not because it meets their needs in any way.

But by all means replace your production system with MangoDB which is unsupported, significantly slower, has no built-in HA/clustering and written in Go which is a GC language.

dang · on Nov 1, 2021

Would you please not post in the flamewar style to HN? You have a long history of doing this, and I have the impression that it got better in the last few years (yay! thanks), but I also have the impression that you've been relapsing recently (boo, please don't). You can make your substantive points respectfully and without snark, and we'd be very grateful if you'd stick to that.

https://news.ycombinator.com/newsguidelines.html

VWWHFSfQ · on Nov 2, 2021

It's crazy how this "flamewar" perspective has changed on HN about Mongo DB in the last 10 years. It used to be that any comment critical of MongoDB would get a warning. Now it's any comment supportive of it!

dang · on Nov 2, 2021

> It used to be that any comment critical of MongoDB would get a warning

That can't possibly be true, or even close to true!

michaelpb · on Nov 1, 2021

Hm, well, I never said that ANYONE who uses MongoDB is guilty of resume driven development. I specifically only indicated the ones that WERE chosen via resume driven development. Unless you were replying to the wrong comment?

threeseed · on Nov 1, 2021

No I am replying to the right comment.

I think it's disparaging to use the term resume driven development as though there is a large class of developers who are actively trying to harm projects by selecting inappropriate technologies.

I've worked with thousands of developers over the last 20+ years and never seen anyone do this.

michaelpb · on Nov 1, 2021

Huh, so if we are trotting out experience credentials, I have also worked with thousands of developers (I guess?) over the last ~20 years as well. My first programming language was Apple BASIC on an Apple II, and I haven't stopped learning since!

I think we use this term differently, perhaps? This term is not intended to be an attack, but rather just an acknowledgment of a common type of technical debt that results from people getting influenced by marketing teams and choosing tech based on how "trendy" it seems. Sometimes this might be done explicitly since they are intending to jump ship anyway... I've had conversations at the bar out of earshot of "the suits" where this exact topic was discussed! Most of the time it's not intentional or explicit, but just novice engineers directed by poor management to greenfield apps, and then falling for marketing claims and choosing based on how "trendy" the marketing claims it is vs real, observed needs. MongoDB is still getting taught at many bootcamps and coding curriculums as an "SQL, but better for beginners since you don't need that annoying schema thing!"

matttb · on Nov 1, 2021

I've worked with tens of developers over 8 years and I've seen this many times.

I don't think they're 'actively trying to harm projects', but the person you're responding to never implied that in any way.

michaelpb · on Nov 1, 2021

Yeah, I think the person we are replying to is taking this a lot more negatively than I intended. I always sort of thought it was kind of an "open secret" that this stuff went on, at least here in SV / Bay Area. Perhaps elsewhere, where engineers don't hop around jobs every year or two, this sounds more like an insult or accusation?

marginalia_nu · on Nov 1, 2021

Allow me to doubt that you have deep insight into the motivations of thousands of people that have all selected MongoDB for their projects. This seems unlikely for several reasons, if nothing else because of Dunbar's number.

threeseed · on Nov 1, 2021

I don’t need to have deep insights. Developers almost always have to justify why they pick certain technologies.

And given how old MongoDB is not sure how it benefits anyone’s resume.

marginalia_nu · on Nov 1, 2021

If you don't understand their motivations, how can you claim to know their motivations?

halostatue · on Nov 1, 2021

There are thousands of projects where MongoDB was selected precisely because it was a new-shiny No SQL thingy.

For some of these things, it may have been the right thing. For most of them, it was a chance to play with new technologies. I have seen multiple commercial projects where MongoDB was chosen by the developers with _no_ oversight by management (I have killed a couple of those projects, too, because MongoDB was always the wrong technology).

The original comment about the number of projects where MongoDB was chosen under résumé-driven-development is absolutely correct. That doesn’t make it _bad_; how _else_ is one supposed to get experience with new technologies than to try something new? (Sticking with Mongo after multiple data-loss incidents due to the “architecture” of Mongo, on the other hand…)

mapcars · on Nov 1, 2021

Since you haven't seen it means it doesn't happen? Sad that 20+ years didn't teach you basics of logic.

ryanianian · on Nov 1, 2021

From the HN guidelines:

> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community.

PeterZaitsev · on Nov 1, 2021

Note MangoDB is a stateless proxy as such you can use it with any PostgreSQL setup. For example you should be able to use it with Amazon Aurora PostgreSQL as backend which has HA built in

threeseed · on Nov 1, 2021

If you are using the cloud then you can just use DocumentDB.

kdasme · on Nov 1, 2021

Which is the same proxy on top of PostgreSQL if I remember correctly. :) But MangoDB is cloud-agnostic. I imagine it has the same limitations as DocumentDB or more.

kevinsundar · on Nov 1, 2021

But then you have to use DocumentDB.

PeterZaitsev · on Nov 1, 2021

It depends. With MangoDB you can test on your laptop with PostgreSQL and deploy to Production to Aurora... or any other PostgreSQL compatible DBaaS.

pulse7 · on Nov 1, 2021

"and written in Go which is a GC language" => Would it be better for you if it would be "written in JavaScript which is a GC language"?

zitterbewegung · on Nov 1, 2021

I agree either stay with mongodb or if you really want to migrate then just switch to Postgres by obviously exporting the data and putting it into Postgres

jerrysievert · on Nov 2, 2021

I did something similar for fun 8-9 years ago as well: https://github.com/jerrysievert/mongolike

at the time it was much faster than native mongo.

throwaway20371 · on Nov 1, 2021

I've always used whereis instead of which anyway. More useful info out of one command.

vimsee · on Nov 1, 2021

Say you want to edit a none binary executable (script file) that lives in PATH. Using the command: nano $(which scriptname) ..will open that script. This is one of the neat usecases of which.

pxc · on Nov 1, 2021

I use that all the time!

It's also useful if you want to find out where an executable lives when the thing on your PATH is a symlink (sometimes to another symlink, and so on). With GNU coreutils and which, you can run:

  realpath $(which python)
  realpath $(which php)

and so on, in order to start tracking down what package (if any) owns the interpreter on your PATH, or if it belongs to some other management tool, etc. This gives you slightly different information than something like `php --version`.

It's really useful on NixOS, too, since you might have multiple versions of the same package living in your /nix/store, and you want to know which one is the one you're using.

rezonant · on Nov 1, 2021

I used to use whereis, and... somewhat ironically... switched to using 'which' because it was more likely to be available on whatever given Linux environment I was using :-)