
> If you only install a webserver once in a blue moon, make a .txt checklist of the steps you followed.

This brings up a very important point about checklists that I don't think gets enough attention.

The problem happens when somebody "updates" that web server in-place. If they try to record what changes they made in the middle of the checklist, eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected. This happens to me when I try to record changes in my VirtualBox configuration after I add a new system package or something; later I try to re-deploy my vbox, and it breaks.

So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end. This way you catch the unexpected problems and confirm the checklist still works for the next person.



I just went through this with a colleague this afternoon, and I was super happy when I realized what we had finally accomplished:

He asked if there was any way to access a server, and the answer was "no". The only way to "access" our production server is to modify the provisioner script. There is no way to "update it in place". It's taken a while to get here, but it's really freeing to realize that yes, I have the credentials and could probably get in, but I know my changes would be automatically reversed in the near term and there's no point in even attempting to access a server directly. The server belongs to the deploy script, not to me.


> the server belongs to the deploy script, not to me.

I prefer it when both the server and the deploy scripts belong to me :)

"infrastructure as code" with no way or extremely limited possibilities to ssh for emergencies strikes me as foolish overengineering / painting yourself in a corner, but if you like that, why not?


The possibility to ssh in for an emergency is also the possibility to ssh in when it's not an emergency and "just quickly change this one thing".

And then the server gets deployed via the script and the quick change isn't there any more.

Whoops.

My EC2 instances are all configured so that they can't be accessed from the outside. They boot up, fetch their install script from a set location and run it.

If they need changes, I either update the base image or the install script.
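
Concretely, the boot-time user data is just a tiny bootstrap along these lines (a sketch; the URL and log path are made up):

    #!/bin/bash
    # Fetch the install script from a known location and run it at boot.
    set -euo pipefail
    curl -fsSL https://config.example.com/install.sh -o /tmp/install.sh
    chmod +x /tmp/install.sh
    /tmp/install.sh 2>&1 | tee /var/log/install.log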


> If they need changes, I either update the base image or the install script.

You lose some time and flexibility, just because you are afraid you may forget to integrate the quick change into your scripts.

My bash histories go into a global database to avoid this.


Cattle, not pets.

Using SSH is what you do if it's a pet.


Well...if I'm only tending to, say, three or four cows, then they may as well be pets for my purposes, even if most of my management is systematic.

You can do a lot with four servers.


> The problem happens when somebody "updates" that web server in-place.

Imagine this is 28-nginx: I would create another script, 29-nginx-update, recording only the update, even if it's just: apt-get update; apt-get upgrade nginx; echo "make sure to fix variable $foo"

Next time I have to do that, I will integrate it into 28-nginx and remove 29-nginx-update.
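
As a sketch, that interim script is nothing fancier than this (the shebang, set -eu and the --only-upgrade form are my additions, so only nginx gets touched):

    #!/bin/sh
    # 29-nginx-update: records only the delta on top of 28-nginx.
    # To be folded back into 28-nginx (and then deleted) at the next full deploy.
    set -eu
    apt-get update
    apt-get install --only-upgrade -y nginx
    echo "make sure to fix variable \$foo"   # reminder for the remaining manual step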

> eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected.

Maybe I don't understand the issue, but my scripts and text files are simple and meant to be used in sequence. If I hack the scripts, I make sure they still work as expected - and given my natural laziness, I only ever update scripts when deploying to a new server or VM, so I get immediate feedback if they stop working.

Still, sometimes something may not work as expected (ex: above, maybe $foo depends on the context?), but that only means I need to generalize the previous solution - and since script updates only happen in the context of a new deployment, everything is still fresh in my head, so I can do it easily.

To help me with that, I also use zfs snapshots at important steps, to be able to "observe" what the files looked like on the other server at a specific time. The snapshots conveniently share the same name (ex: etc@28-nginx), so comparing the files to create one or more scripts can easily be done with diff -Nur against .zfs/snapshot/, cf https://docs.oracle.com/cd/E19253-01/819-5461/gbiqe/index.ht...
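
In practice it's only a couple of commands (a sketch; the pool/dataset names are made up):

    # Snapshot the config dataset at each numbered step, reusing the step name
    zfs snapshot tank/etc@28-nginx
    # Expose the .zfs/snapshot directory if it's hidden
    zfs set snapdir=visible tank/etc
    # Later: diff a snapshot against the live files to extract the delta as a script
    diff -Nur /etc/.zfs/snapshot/28-nginx /etc > 29-nginx-update.diff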

Between that + a sqlite database containing the full history of commands typed (including in which directory, and their return code), I rarely have such issues.

Shameless plug for that bash history in sqlite: https://github.com/csdvrx/bash-timestamping-sqlite
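
The gist of it is roughly this (a simplified sketch, not the repo's actual code; the table and file names are made up):

    # In ~/.bashrc: after every command, record it with its directory and exit code
    _log_cmd() {
      local rc=$? cmd dir
      cmd=$(HISTTIMEFORMAT= history 1 | sed 's/^ *[0-9]* *//')
      cmd=${cmd//"'"/"''"}   # escape single quotes for SQL
      dir=${PWD//"'"/"''"}
      sqlite3 "$HOME/.bash_history.db" "
        CREATE TABLE IF NOT EXISTS hist(ts INT, dir TEXT, cmd TEXT, rc INT);
        INSERT INTO hist VALUES (strftime('%s','now'), '$dir', '$cmd', $rc);"
    }
    PROMPT_COMMAND=_log_cmd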

> So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end.

I agree: if I don't have time to fix 28-nginx, I write 29-nginx-update instead, with the goal of integrating it next time. But I don't try to tweak 28-nginx if I know I won't have the time to test it.


It can work this way (that's how software patches have historically worked) but if you don't test it from the beginning, you will still find the odd case where that added step is broken, even though it seemed like it should have worked. The more you use that method, the more chances for breakage.

If you don't want to repeat the steps from the beginning, you could make a completely separate checklist to be followed on a given system that includes things like "make sure X package is installed", "make sure Y configuration is applied", so that the new checklist accounts for any inconsistencies. This is pretty common anyway as checklists are broken up into discrete purposes and mixed and matched.
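
That kind of checklist also translates naturally into an idempotent check script (a sketch; the package and config names are just placeholders):

    # Make sure X package is installed
    dpkg -s nginx >/dev/null 2>&1 || { echo "nginx is missing"; exit 1; }
    # Make sure Y configuration is applied
    grep -q 'gzip on;' /etc/nginx/nginx.conf || echo "warning: gzip not enabled"
    # Sanity-check the config as a whole
    nginx -t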



