For anyone who's not used these "managed" services before, I want to add that it...

rlander · on April 4, 2019

Exactly. There's no silver bullet, only trade offs.

In this case you're only shifting the complexity from "maintaining" to "orchestrating". "Maintaining" means you build (in a semi-automated way) once and most of your work is spent keeping the services running. In the latter, you spend most of your time building the "orchestration" and little time maintaining.

If your product is still small, it makes sense to keep most of your infrastructure in "maintaining" since the number of services is small. As the product grows (and your company starts hiring ops people), you can slowly migrate to "orchestrating".

adamlett · on April 5, 2019

There's no silver bullet, only trade offs.

I see this a lot and it bugs me, because it implies that it's all zero sum and there's nothing that's ever unconditionally better or worse than anything else. But that's clearly ridiculous. A hammer is unconditionally better than a rock for putting nails in things. The beauty of progress is that it enables us to have our cake and eat it too. There is no law that necessitates a dichotomy between powerful and easy.

dwheeler · on April 5, 2019

I think you are misunderstanding the phrase trade-off. A trade-off means that something is good for one thing, but not as good for something else. Your examples are exactly an example of a trade-off. A hammer is good at hammering nails, but not good for screwing in screws, because there has been a trade-off to design the tool to make it better for hammering nails than for screwing in screws.

All tools are better at one thing then another. The point of examining trade-offs is to decide which advantages are more appropriate to which circumstance.

NoodleIncident · on April 6, 2019

Why did you rephrase his analogy as a comparison between a hammer and a screwdriver, when the whole point of the comparison was that a hammer is strictly better than a rock?

grumpyprole · on April 6, 2019

A hammer isn't strictly better than a rock, it costs a lot more money/resources to obtain :)

placebo · on April 5, 2019

> it implies that it's all zero sum and there's nothing that's ever unconditionally better or worse than anything else

Not necessarily - to start with, a good trade-off is unconditionally better than a bad trade-off.

Also, progress brings with it increasing complexity. Recognizing the best path includes assessing many more parameters and is far more difficult than deciding whether to use a hammer or a rock to nail things. The puzzling nature of many people to over-complicate things instead of simplifying them makes the challenge even more difficult.

By the way, the closest thing I've ever found to a silver bullet in software development (and basically any endeavor) is "Keep it simple". While this is a cliche already, it is still too often overlooked. I think this is because it isn't related to the ability to be a master at manipulating code and logic, but the ability to focus on what's really important - to know how to discard the trendy for the practical, the adventurous methods to the focused and solid - basically passionately applying occam's razor on every abstraction layer. If this was more common, I think articles like "You are not Google" would be less common.

mcguire · on April 5, 2019

Ever wonder how the Go team seems to get stuff done more efficiently than other groups? It's not Go, it's that they simplify (perhaps oversimplify).

tracker1 · on April 5, 2019

> Not necessarily - to start with, a good trade-off is unconditionally better than a bad trade-off.

Are you better off with a simple dokku server, or a k8s cluster?

There's lots of good/bad on both sides... it really depends on your needs. You could use one locally or through dev layers and another for production.

cellularmitosis · on April 6, 2019

You can still analyze the trade-offs of your hammer / rock choice, e.g.

Hammer pro’s: very efficient

Hammer cons: costs money, only available in industrialized societies

Rock pro’s: ubiquitous and free

Rock cons: inefficient

The point is that in tech, there is no choice which has only pros, and no choice which has only cons.

FabHK · on April 5, 2019

True, but nobody is using a rock still to hammer nails, or advocates for it.

Given any important problem field, it's quite likely that among the extant top solutions there is no product that is strictly dominant (better in every dimension) - most likely there will be trade offs.

anfilt · on April 6, 2019

I used rocks few times as kid to build some forts from scrap wood. You could nail the framing together for any modern house with a rock.

However, if you bend any nails you can't pull the nail out like with a normal framing hammer. Plus you can drive a nail faster with a proper hammer. So just use a hammer.

Problem is with software the tool also become part of what your working on. So it's never quite like this hammer and stone analogy.

Really a better analogy would be fasteners. So for instance we just have screws, nails, and joints. A simplified thing comparison would look like this, and they all become part of what your building.

  Nails
    Pros
      Fast to drive
      Still pretty strong
    Easier to remove mistakes or to disassembly.
    Cons
      Not as strong as other methods of fastening
      May crack wood if driven at the end of a board.

  Screws
    Pros
      Almost as fast as nails with a screw gun.
      Stronger joint than nails.
    Cons
      Can crack wood like nails without pre-drilling.
      Slower to remove.

  Joints
    Pros
      Strong as the wood used.
      Last as long the wood.
    Cons
      Very slow requires chiseling and cutting wood into tight inter locking shapes.
      If it's any type of joint with adhesive it can't be taken apart.

webmaven · on April 10, 2019

Nails are pretty forgiving to the type/size of hammer used.

Screws can be pretty finicky, for example using a too-small Philips screwdriver can strip the screw-head, and make it very difficult to tighten further or to remove.

I'm pretty fond of those screws that can take a Flathead or Philips screwdriver, though.

How about nuts+bolts?

andrewflnr · on April 6, 2019

You can have options that are definitely worse than the rest weight having a silver bullet, aka option that's definitely better than the rest. For nail driving consider a hammer and a pneumatic nail gun. Both are better than a rock, but you couldn't call either one a silver bullet. You still have to think about which one is best for your usage.

PaulHoule · on April 4, 2019

Funny enough, I've experienced the largest benefits from "scaling down" with Amazon's managed databases.

For instance I made an email newsletter system which handles subscriptions, verifications, unsubscribes, removing bounces, etc. based on Lambda, DynamoDB, and SES. What's nice about is that I don't need to have a whole VM running all the time when I just have to process a few subscriptions a day and an occasional burst of work when the newsletter goes out.

mcny · on April 5, 2019

I have a db.example.com $10 a month VM on digital ocean.

It is strictly for dev and staging. Not actual production use because prod doesn't exist yet anyways.

My question is what kind of maintenance should I be doing? I don't see any maintenance. I run apt upgrade about once a month. I'm sure you'd probably want to go with something like Google Cloud or Amazon or ElephantSQL for production purely to CYA but other than that if you don't anticipate heavy load, why not just get a cheap VM and run postgresql yourself? I mean I think ssh login is pretty safe if you disable password login, right? What maintenance am I missing? Assuming you don't care about the data and are willing to do some work to recreate your databases when something goes wrong, I think a virtual machine with linode or digital ocean or even your own data center is not bad?

Jedi72 · on April 5, 2019

Amazon are pushing very hard to convince the next generation of devs who grew up writing front-end JS that databases, servers etc. are some kind of technical wizardy best outsourced, when a few weeks of reading and playing around would be enough to get them up to speed.

noobermin · on April 5, 2019

Hate to be super paranoid, but isn't it rather convenient the top comment on this section expresses exactly this sentiment? If anything, it proves that this perspective is working, or at least, a huge host of devs online really are front-end JS'ers who have this opinion already.

rswail · on April 5, 2019

From an engineering/development sense, this is a good thing, because it means that devs are cheaper. Most devs can't even handle being a good client of things like databases. They barely comprehend what the underlying theories are behind SQL (eg sets etc).

Just like early electricity, people ran their own generators. That got outsourced, so the standard "sparky" wouldn't have the faintest idea of the requirements of the generation side, only the demand side.

The same is happening to programming.

mcguire · on April 5, 2019

1. Programmers are supposed to be professionals. They certainly want to be paid like professionals.

2. The effects of the back end have a nasty habit of poking through in a way that the differences between windmill generators and pumped hydraulic storage don't.

Jedi72 · on April 5, 2019

The difference is that in the case of programming, for the cost of some side-study you can get yourself a competitive advantage.

felbane · on April 5, 2019

Very much this. For most use cases, the out-of-the-box configuration is fine until you hit ridiculous scale, and it's not really all that complicated to keep a service running if you take time to read the docs.

rodolphoarruda · on April 5, 2019

It just came to mind Jason Fried's Getting Real chapter titled "Scale Later". Page 44.

"For example, we ran Basecamp on a single server for the first year. Because we went with such a simple setup, it only took a week to implement. We didn’t start with a cluster of 15 boxes or spend months worrying about scaling. Did we experience any problems? A few. But we also realized that most of the problems we feared, like a brief slowdown, really weren’t that big of a deal to customers. As long as you keep people in the loop, and are honest about the situation, they’ll understand."

scarface74 · on April 5, 2019

It’s not about “getting up to speed”. It’s about not having to manage it on and ongoing basis.

I wouldn’t work for a company that expects devs to manage resources that can be managed by a cloud provider and develop.

How well can you “manage” a Mysql database with storage redundancy across three availability zones and synchronous autoscaling read replicas?

How well can you manage an autoscaling database that costs basically nothing when you’re not using but scales to handle spikey traffic when you do?

stephenr · on April 5, 2019

Right.. so now your developers don't need to understand how to configure and tune open source directory services and RDBMS's and in-memory caches... they just need to understand how to configure and tune a cloud-provider's implementation of a directory service and RDBMS and in-memory cache..... ?

If you think using a cloud "service" out-of-the-box will "just work" your scale is probably small enough that a single server with the equivalent package installed with the default settings is going to "just work" too.

scarface74 · on April 5, 2019

You did just read our use case didn’t you? Yes we could overprovision a single server with 5x the resources for the once a week indexing.

We could also have 4 other servers running all of the time even when we weren’t demoing anything in our UAT environment.

We could also not have any redundancy and separate out the reads and writes.

No one said the developers didn’t need to understand how to do it. I said we didn’t have to worry about maintaining infrastructure and overprovisioning.

We also have bulk processors that run messages at a trickle based on incoming instances during the day but at night and especially st the end of the week, we need 8 times the resources to meet our SLAs. Should we also overprovision that and run 8 servers all of the time?

nilkn · on April 5, 2019

> Yes we could overprovision a single server with 5x the resources for the once a week indexing.

You assume this is bad, but why? Your alternative seems to be locking yourself into expensive and highly proprietary vendor solutions that have their own learning curve and a wide variety of tradeoffs and complications.

The point being made is that you are still worrying about maintaining infrastructure and overprovisioning, because you're now spending time specializing your system and perhaps entire company to a specific vendor's serverless solutions.

To be clear, I don't have anything against cloud infrastructure really, but I do think some folks really don't seem to understand how powerful simple, battle-tested tools like PostgreSQL are. "Overprovisioning" may be way less of an issue than you imply (especially if you seriously only need 8 machines), and replication for PostgreSQL is a long solved problem.

scarface74 · on April 5, 2019

You assume this is bad, but why? Your alternative seems to be locking yourself into expensive and highly proprietary vendor solutions that have their own learning curve and a wide variety of tradeoffs and complications.

So having 4-8 times as many servers that unlike AWS, we would also have five times (1 master and 8 slaves) as much storage is better than the theoretical “lock-in”? You realize with the read replicas, you’re only paying once for storage since they all use the same (redundant) storage?

Where is the “lock-in”? It is Mysql. You use the same tools to transfer data from Aurora/MySQL that you would use to transfer data from any other MySQL installation.

But we should host our entire enterprise on digital ocean or linode just in case one day we want to move our entire infrastructure to another provider?

Out of all of the business risks that most companies face, lock-in to AWS is the least of them.

The point being made is that you are still worrying about maintaining infrastructure and overprovisioning, because you're now spending time specializing your system and perhaps entire company to a specific vendor's serverless solutions.

How are we “specializing our system”? We have one connection string for read/writes and one for just reads. I’ve been doing the same thing since the mid 2000s with MySQL on prem. AWS simply load balances rate readers and adds more as needed.

And you don’t see any issue on spending 5 to 9 times as much on both storage and CPU? You do realize that while we are just talking about production databases, just like any other we company we have multiple environments some of which are only used sporadically. Those environments are mostly shut down - including the actually database server until we need them and scales up with reads and writes with Aurora Serverless when we do need it.

You have know idea how easy it is to set the autoscsling read replica up do you?

polote · on April 5, 2019

Usually you pay much more and have more servers when you use AWS than a basic server provider

scarface74 · on April 5, 2019

I’m making up numbers just to make the math easier.

If for production we need 5x of our baseline capacity to handle peak load are you saying that we could get our server from a basic server provider for 4 * 0.20 ( 1/5 of the time we need to scale our read replicas up) + 1?

Are you saying that we could get non production servers at 25% of the cost if they had to run all of the time compared to Aurora Serverless where we aren’t being charged at all for CPU/Memory until a request is made and the servers are brought up. Yes there is latency for the first request - but these are our non production/non staging environments.

Can we get point in time recovery?

And this is just databases.

We also have an autoscaling group of VMs based on messages in a queue. We have one relatively small instance that handles the trickle of messages that come during the day in production that can scale up to 10 at night when we do bulk processing. This just in production. We have no instances running when the queue is empty in non production environments. Should we also have enough servers to having 30-40 VMs running with only 20% utilization?

Should we also set up our own servers for object storage across multiple data centers?

What about our data center overseas close to our offshore developers?

If you have more servers on AWS you’re doing it wrong.

We don’t even manage build servers. When we push our code to git, CodeBuild spins up either prebuilt or custom Docker containers (on servers that we don’t manage) to build and run unit tests on our code based on a yaml file with a list of Shell commands.

It deploys code as lambda to servers we don’t manage. AWS gives you such a ridiculously high amount of lambda usage in the always free tier it’s ridiculous. No, our lambdas don’t “lock us in”. I deploy standard NodeJS/Express, C#/WebAPI, and Python/Django code that can be deployed to either lsmbda or a VM just by changing a single step in our deployment pipeline.

pcnix · on April 5, 2019

Basic replication is maybe close to what you could call solved, but I'd say that there's still complications like georeplicating multi master write machines are still quite complicated, and need a dedicated person to manage. Hiring being what it is, it might just be easier to let Amazon hire that person for you and pay Amazon directly.

I see cloud services as a proxy to hire talented devops/dba people, and efficiently multiplex their time across several companies, rather than each company hiring mediocre devops/dba engineers. That said, I agree that for quite a few smaller companies, in house infrastructure will do the job almost as well as managed services, at much cheaper numbers. Either way, this is not an engineering decision, it's a managerial one, and the tradeoffs are around developer time, hiring and cost.

stephenr · on April 5, 2019

> I see cloud services as a proxy to hire talented devops/dba people, and efficiently multiplex their time across several companies

It's only a proxy in the sense that it hides them (the ops/dbas) behind a wall, and you can't actually talk directly to them about what you want to do, or what's wrong.

If you don't want to hire staff directly, consulting companies like Percona will give you direct, specific advice and support.

scarface74 · on April 5, 2019

If something goes wrong, we can submit a ticket to support and chat/call a support person immediately at AWS. We have a real business that actually charges our (multi million dollar business) customers enough to pay for a business level support.

But in your experience, what has “gone wrong” with AWS that you could have fixed yourself if you were hosting on prem?

scarface74 · on April 5, 2019

Basic replication is maybe close to what you could call solved,

Locally hosted basic synchronous read replicas are a solved a problem?

mcguire · on April 5, 2019

"No one said the developers didn’t need to understand how to do it. I said we didn’t have to worry about maintaining infrastructure and overprovisioning."

If you do not do something regularly, you tend to lose the ability to do it at all. Personally, and especially organizationally.

scarface74 · on April 5, 2019

There is a difference between maintaining MySQL servers and the underlying operating system and writing efficient queries, optimizing indexes, knowing how to design a normalized table and knowing when to denormalize, looking at the logs to see which queries are performing slowly etc. using AWS doesn’t absolve you from knowing how to use AWS.

There is no value add in the “undifferentiated heavy lifting”. It is not a companies competitive advantage to know how to do the grunt work of server administration - unless it is. Of course Dropbox or Backblaze have to optimize their low profit margin storage business.

indigo945 · on April 5, 2019

Why not run eight servers all the time? If you are running at a scale where that is a cost you notice at all, you are not only in a very early stage, you're actually not even a company.

zbentley · on April 5, 2019

There are many, MANY software companies whose infrastructure needs are on the order of single digits of normal-strength servers and who are profitable to the tune of millions of dollars a year. These aren’t companies staffed with penny-pinching optimization savants; some software, even at scale, just doesn’t need that kind of gear.

user5994461 · on April 5, 2019

A multi million dollars tech company with tens of employees cannot run a sane infrastructure with a single digit of servers.

For a trivial website in production: 2 web servers + 1 database + 1 replica.

For internal tooling: 1 CI and build server + 2 development and testing servers + 1 storage, file share, ftp server + 1 backup server.

For desktop support: At least 1 server for DHCP, DNS, Active Directory + firewall + router.

That's already 10 servers and not counting networking equipment. Less than that and you're cutting corners.

scarface74 · on April 5, 2019

Web servers - lambda

Build server - CodeBuild you either run with prebuilt Docker containers or you use a custom built Docker container that automatically gets launched when you push your code to GitHub/CodeCommit. No server involved.

Fileshare - a lot of companies just use Dropbox or OneDrive. No server involved

FTP - managed AWS SFTP Service. No server involved.

DHCP - Managed VPN Service by AWS. No server involved.

DNS - Route 53 and with Amazon Certificate Manager it will manage SSL certificates attached to your load balancer and CDN and auto renew. No servers involved.

Active Directory - Managed by AWS no server involved.

Firewall and router - no server to Manage. You create security groups and attach them to your EC2 instances, databases, etc.

You set your routing table up and attach it to your VMs.

Networking equipment and routers - again that’s a CloudFormatiom template or go old school and just configuration on a website.

user5994461 · on April 5, 2019

[flagged]

scarface74 · on April 5, 2019

Yes I realize SFTP is not FTP. But I also realize that no one in their right mind is going to deliver data over something as insecure as FTP in 2019.

We weren’t allowed to use regular old FTP in the early 2000s when I was working for a bill processor. We definitely couldn’t use one now and be compliant with anything.

I was trying to give you the benefit of a doubt.

Amateur mistake that proves you have no experience running any this.

If it doesn’t give you a clue the 74 in my name is the year I was born. I’ve been around for awhile. My first internet enabled app was over the gopher protocol.

How else do you think I got shareware from the info-Mac archives over a 7 bit line using the Kermit protocol if not via ftp? Is that proof enough for you or do I need to start droning on about how to optimize 65C02 assembly language programs by trying to store as much data in the first page of memory because reading from the first page took two clock cycles on 8 bit Apple //e machines instead of 3?

We don’t “share” large files. We share a bunch of Office docs and PDF’s as do most companies.

Yes, you do have to run DNS, Active Directory + VPN. You said you couldn’t do it without running “servers”.

No we don’t have servers called

SFTP-01

ADFS-01

Etc.

either on prem or in the cloud.

Even most companies that I’ve worked for that don’t use a cloud provider have their servers at a colo.

We would still be using shares hosted somewhere not on prem. How is that different from using one of AWS storage gateway products.

scarface74 · on April 5, 2019

9 servers (8 reads and 1 writer) running all of the time with asynchronous replication (as opposed to synchronous replication) with duplicate data - yes the storage is shared between all of the replicas.

Not to mention the four lower environments some of which the databases are automatically spun up from 0 and scaled up as needed (Aurora Serverless)

Should we also maintain those same read replicas servers in our other environments when we want to do performance testing?

Should we maintain servers overseas for our outsourced workers?

Here we are just talking about Aurora/MySQL databases. I haven’t even gotten into our VMs, load balancer, object store (S3), queueing server (or lack there of since we use SQS/SNS), our OLAP database (Redshift - no we are not “locked in” it users standard Postgres drivers), etc.

AWS is not about saving money on like for like resources as you would on bare metal, but in the case of databases where your load is spiky you do. It’s about provisioning resources as needed when needed and not having to either pay as many infrastructure folks. Heck before my manager who hired me and one other person came in, the company had no one onsite that had any formal AWS expertise. They completely relied on a managed service provider - who they pay much less than they would pay for one dedicated infrastructure guy.

I’m first and foremost a developer/lead/software architect (depending on which way the wind is blowing at any given point in my career), but yes I have managed infrastructure on prem as part of my job years ago, including replicated MySQL servers. There is absolutely no way that I could spin up and manage all of the resources I need for a project and develop at the efficacy level at a colo as I can with just a CloudFormation template with AWS.

I’ve worked at a company that rented stacks of servers that sat idle most of the time but we used to simulate thousands of mobile connections to our backend servers - we did large B2B field services deployments. Today, it would be running a Pythom script that spun up an autoscaling group of VMs to whatever number we needed.

CuriousCosmic · on April 5, 2019

The question is how often is that necessary? Once again the point goes back to the article title. You are not Google. Unless your product is actually large, you probably don't need all of that and even if you do, you can probably just do part of it in the cloud for significantly cheaper and get close to the same result.

This obsession with making something completely bulletproof and scalable is the exact problem they are discussing. You probably don't need it in most cases but just want it. I am guilty of this as well and it is very difficult to avoid doing.

scarface74 · on April 5, 2019

You think only Google needs to protect against data loss?

We have a process that reads a lot of data from the database on a periodic basis and sends it to ElasticSearch. We would either have to spend more and overprovision it to handle peak load or we can just turn on autoscaling for read replicas. Since the read replicas use the same storage as the reader/writer it’s much faster.

Yes we need “bulletproof” and scalability or our clients we have six and seven figure contracts with won’t be happy and will be up in arms.

aprdm · on April 5, 2019

> You think only Google needs to protect against data loss?

Just have a cron job taking backups in a server and sending somewhere else?

It has been working for the last 40 years...

scarface74 · on April 5, 2019

So a cron job can give me point in time recovery?

Can a cron job give me automatic failover to another server that is in sync with the master? Can it give me autoscaling read replicas?

Who is going to get up in the middle of the night when the cron job fails?

Yes I am sure my company that has six and seven figure contracts would be just as well served self hosting MySQL on Linode.

candiodari · on April 5, 2019

OTOH the last "cloud" solution I've seen was a database that allowed a team to schedule itself. It was backed on Google cloud, had autoscaling backend. It was "serverless" (app engine), and had backups etc configured.

Cost to run it for 4 years: $70000.

QPS ? 5 was the highest I ever found.

You don't need this capacity. You just don't.

But it's got to be a good business to be in ...

scarface74 · on April 5, 2019

Instead of just assuming pricing based on Google pricing (which we weren’t talking about), you could always just find AWS pricing.

https://aws.amazon.com/rds/aurora/pricing/

Storage cost of 500Gb of data for four years - $2400

Backing up is free up to the amount of online storage.

IO request are $0.12 per million.

Transfer to/from VMs in the same availability zone is free.

A r5.large reserved instance is $6600 a year.

Of course if your needs are less, you could get much cheaper cpu and memory.

candiodari · on April 5, 2019

Issue is, what would the charge be for a typical application for some basic business function, say scheduling attendance.

So QPS really low, <5. But very spread out, as it's used during the workday by both team leader and team members. Every query results in a database query or maybe even 2 or 3. So every, say 20 minutes or so there's 4-5 database queries. Backend database is a few gigs, growing slowly. Let's say a schedule is ~10kb of data, and that has to be transferred on almost all queries, because that's what people are working on. That makes ~800 gig transfer per year.

This would be equivalent to something you could easily store on say a linode or digital ocean 2 machine system, for $20 month or $240/year + having backup on your local machine. This would have the advantage that you can have 10 such apps on that same hardware with no extra cost.

And if you really want to cheap out, you could easily host this with a PHP hoster for $10/year.

So how do you calculate the AWS costs here ?

scarface74 · on April 5, 2019

If you have a really low, spread out load, you could use Aurora Serverless (Mysql/Postgres) and spend even less if latency for the first request isn’t a big deal.

And storage for AWS/RDS is .10/gb per month and that includes redundant online storage and backup.

I’m sure I can tell my CTO we should ditch our AWS infrastructure that hosts our clients who each give us six figures each year to host on shared PHP provider and Linode....

And now we have to still manage local servers at a colo for backups....

We also wouldn’t have any trouble with any compliance using a shared PHP host.

aprdm · on April 5, 2019

You keep mentioning six figures contracts as it's something big. Most of fortune 500 companies have their own data center and still read database manuals somehow...

scarface74 · on April 5, 2019

No I’m mentioning our six figure customers because too often small companies are seen as being low margin B2C customers where you control for cost and you don’t have to be reliable, compliant, redundant or scalable from day one. One “customer” can equate to thousands of users.

We are a B2B company with a sales team that ensures we charge more than enough to cover our infrastructure.

Yes infrastructure cost scale with volume, but with a much lower slope.

But it’s kind of the point, we save money by not having a dedicated infrastructure team, and save time and move faster because it’s not a month long process to provision resources and we don’t request more than we need like in large corporations because the turn around time is long.

At most I send a message to my manager if it is going to cost more than I feel comfortable with and create resources using either a Python script or CloudFormation.

How long do you think it would take for me to spin up a different combination of processing servers and database resources to see which one best meets our cost/performance tradeoff on prem?

aprdm · on April 5, 2019

Depends on how you architecture it, theoretically spinning on prems or in baremetal or vsphere should be an ansible script with its roles and some docker file regardless.

Just for reference we ""devops"" around 2 thousand VMs and 120 baremetal servers and a little of cloud stuff through same scripts and workflows.

We don't really leverage locked in cloud things because we need the flexibility of on prems.

In my business hardware is essentially a drop in the bucket of our costs.

P.s: I totally think there are legit use cases for cloud, is just another tool you can leverage depending on the situation

scarface74 · on April 5, 2019

Spinning up on prem means you have to already the servers in your colo ready and have to pay for spare capacity. Depending on your tolerance for latency (production vs non-production environments or asynchronous batch processing), you can operate at almost 100% capacity all of the time:

Lambda vs VMs (yes you can deploy standard web apps to Lambda using a lambda proxy)

Serverless Aurora vs Regular Aurora.

Tx instances vs regular instances

Autoscaling VMs.

Autoscaling DynomoDB

Etc.

candiodari · on April 5, 2019

Redundant online storage sucks, doesn't protect you against

  DELETE FROM orders WHERE $condition_from_website;

So that's just useless. You need offline backups. Is there something like, euhm, perhaps checkpoints, that you can use ?

scarface74 · on April 5, 2019

Yes, you can schedule automatic snapshots where it will take snapshots on a schedule you choose. You get as much space for your backups as you have in your database for free. Anything above that costs more.

You also get point in time recovery with BackTrack.

https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-tur...

The redundant storage is what gives you the synchronous read replicas that all use the same storage and the capability of having autoscalinv read replicas that are already in sync.

creatornator · on April 6, 2019

> automatic snapshots where it will take snapshots on a schedule you choose

So... A cronjob?

scarface74 · on April 6, 2019

Does your cron job allow recovery with a minute granularity? Is your cron job also redundant? How long would it take you to restore from backup?

candiodari · on April 6, 2019

You say that as if filesystems with point-in-time and/or checkpoint recovery are anything hard.

scarface74 · on April 6, 2019

So if it “isn’t hard” then show me an implementation that does that with Mysql that can meet the same recovery time objectives as AWS with Aurora?

candiodari · on April 8, 2019

Actually with zfs and btrfs you can service mysql stop, put checkpoint back and bring it back up at roughly the same speed as AWS can do it.

I also have done enough recoveries to know that actually placing back a checkpoint is admitting defeat and stating that you

a) don't know what happened

b) don't have any way to fix it

And often it means

c) don't have any way to prevent it from reoccuring, potentially immediately.

It's a quick and direct action that often satisfies the call to action from above. It is exactly the wrong thing to do. It very rarely solves the issue.

Actually fixing things mostly means putting the checkpoint on another server, and going through a designed fix process.

scarface74 · on April 8, 2019

You haven’t actually tried using ZfS with Mysql have you? Do you know the drawbacks and performance penalties for doing so?

http://download.nust.na/pub6/mysql/tech-resources/articles/m...

Have you benchmarked the time it takes to bring up a ZFS based Mysql restore?

a) don't know what happened b) don't have any way to fix it And often it means c) don't have any way to prevent it from reoccuring, potentially immediately.

Can you ensure the filesystem consistency when you restore with ZFS?

a) don't know what happened b) don't have any way to fix it And often it means c) don't have any way to prevent it from reoccuring, potentially immediately.

In the example that was given in the parent post, we know what happened:

Someone inadvertently (hopefully) did

  Delete * From table

You restore the table to the previous state and you take measures to prevent human error.

tigershark · on April 5, 2019

Why do you have a specific need to read the database periodically polling it instead of just pushing the data to elastic search at the same time that it reaches the database? I don’t know anything about your architecture, but unless you’re handling data on a very big scale probably rationalising the architecture would give you much more performance and maintainability than putting everything in the cloud.

scarface74 · on April 5, 2019

Without going into specifics. We have large external data feeds that are used and correlated with other customer specific data (multitenant business customers) and it also needs to be searchable. There are times we get new data relevant to customers that cause us to reindex.

We are just focusing on databases here. There is more to infrastructure than just databases. Should we also maintain our own load balancers, queueing/messaging systems, CDN, object store, OLAP database, CI/CD servers, patch management system, alerting monitoring system, web application firewall, ADFS servers, OATH servers, key/value store, key management server, ElasticSearch cluster etc? Except for the OLAP database. All of this is set up in some form in multiple isolated environments with different accounts in one Organizational Account that manages all of the other sub accounts.

What about our infrastructure overseas so our off shore developers don’t have the latency of connecting back to the US?

For some projects we even use lambda where we don’t maintain any web servers and get scalability from 0 to $a_lot - and no there is no lock-in boogeyman there either. I can deploy the same NodeJS/Express, C#/WebAPI, Python/Django code to both lambda and a regular old VM just by changing my deployment pipeline.

Jedi72 · on April 5, 2019

Did you read the article? You are not Google. If you ever do really need that kind of redundancy and scale you will have the team to support it, with all the benefits of doing it in-house. No uptime guarantee or web dashboard will ever substitute for simply having people who know what they're doing on staff.

How a company which seems entirely driven by vertical integration is able to convince other companies that outsourcing is the way to go is an absolute mystery to me.

scarface74 · on April 5, 2019

No, we are not Google. We do need to be able to handle spiky loads - see the other reply. No we don’t “need a team” to support.

Yes “the people in the know” are at AWS. They handle failover, autoscaling, etc.

We also use Serverless Aurora/MySQL for non production environments with production like size of data. When we don’t need to access the database, we only pay for storage. When we do need it, it’s there.

mcny · on April 5, 2019

I agree for production because you want to be able to blame someone when autoscaling fails but we trust developers to run applications locally, right? Then why can't we trust them with dev and staging?

By the way what is autoscaling and why are we autoscaling databases? Im guessing the only resource thst autoscales is the bandwidth? Why can't we all get shared access to a fat pipe in production? I was under the impression that products like Google Cloud Spanner have this figured out. What exactly needs to auto scale? Isn't there just one database server in production?

In dev (which is the use case I'm talking about) you should be able to just reset the vm whenever you want, no?

davidjnelson · on April 5, 2019

The big innovation of aurora is they decoupled the compute from the storage [1]. Both compute and storage need to auto scale, but compute is the really important one that is hardest. Aurora serverless scales the compute automatically by keeping warm pools of db capacity around [2]. This is great for spiky traffic without degraded performance.

1. https://www.allthingsdistributed.com/files/p1041-verbitski.p... 2. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

scarface74 · on April 5, 2019

No, it’s not just the bandwidth. It’s also the CPU for read and writes in the case of Serverless Aurora.

For regular autoscaling of read replicas. It brings additional servers on line using the same storage array. It isn’t just “one database”.

That’s just it. There isn’t “just one database”.

davidjnelson · on April 5, 2019

Cloud spanner seems awesome but has a base price of $740/mo, doesn’t seem to autoscale, and doesn’t support a mysql or postgres interface.

ElFitz · on April 5, 2019

I've had servers with identical configurations, identical setup, identical programs installed the exact same way (via docker images), setup the same day

One worked, one didn't ¯\_(ツ)_/¯

That's when I gave up on servers, virtual or not

fredsir · on April 5, 2019

It stands to reason that they weren't identical because if they were they would both have worked.

The thing separating you (in this case) and Amazon/Google/* is that they make sure their processes produce predictable outcomes that if they were to fail can be shutdown and the process can be redone until it yields the correct outcome.

user5994461 · on April 5, 2019

Unless they were identical running in the same directory and listening to the same port. First one to start won.

fredsir · on April 6, 2019

That’s a whole different problem in a whole other place. It’s not either of them not working.

mcguire · on April 5, 2019

I spent this last week finally being forced to deal with Hibernate.

Devs are more than capable of pushing each other to the same conclusions.

brobdingnagians · on April 5, 2019

I have _more_ maintenance, _because_ of AWS. Most libraries/services reach some sort of a stable version where they are backwards compatible. AWS (and other big providers) changes libraries all the time and things break, then you have to figure out what they did instead of having a stable interface to work with.

dreamcompiler · on April 5, 2019

Curious about where you would put your DO data. A cheap VPS from DO doesn't come with much storage. You can either buy more storage for $$ or use DO Spaces for fewer $, but do Spaces talk to PostgresQL? I apologize for my ignorance; I'm just beginning to explore this stuff.

Scarbutt · on April 5, 2019

https://www.digitalocean.com/products/block-storage/

dreamcompiler · on April 5, 2019

Block storage is $.10/GB; Spaces is $.02/GB. I was hoping there was a glue layer that would allow PostgresQL to use Spaces, but such a thing might not exist or be performant enough to be worth building.

Scarbutt · on April 5, 2019

Spaces is for object storage, stuff like videos, images, documents, etc. You can store references(links) to this objects in postgres if you want (very common thing to do).

Block storage is for adding volumes to your droplets (like adding a second hard drive to a computer). One advantage is that they are separated from your droplet and replicated, if your droplet disappears or dies, you can reattach the volume to another droplet. So one common use is to place the data dir of PG for example, in the external volume. You should still have proper backups. DO also has a managed PG service.

mcny · on April 5, 2019

No need to apologize. I have very little data so the 50GB droplet is enough. But in my use case, the data is expendable. The data had no value in development environment. You probably shouldn't do in production what I'm doing.

vimslayer · on April 5, 2019

I have some hobby projects that are very infrequently accessed by me and my friends andy family. Relative to their usefulness, $10/mo would be way too expensive. With S3 and lambdas, my aws bill has been more like $1 to $3, depending on traffic and amount of active side projects.

Edit: those don't include a propoer DB, though, I have been basically using S3 as the DB. Not sure how much a RDS instance or something would add to the bill.

devit · on April 5, 2019

Why would you need "a whole VM"?

Just run it as a systemd service or a Docker/Kubernetes-managed container on one of the VMs that run the rest of the application.

Also no need to use all those AWS-proprietary services, just use an existing task queue system based on PostgreSQL/RabbitMQ/etc.

hinkley · on April 5, 2019

You have a database that only needs to be accessible part of the time. Why not keep going to an embedded database?

rightbyte · on April 5, 2019

If you are hunting vampires there are the choice of the Silver Bullet avaible.

Some times, the benefits from something compared to another thing are so great that there are no downsides. If the rasp in the closets doesn't suffice you can always add another one.

diminoten · on April 4, 2019

The costs associated with "maintaining" usually involve the possibility of a 3am call for whoever is in charge of maintaining. Orchestrating can be done ahead of time, during your 9-5, and that's super valuable. It's still a lot of work, but it's work that can be done on my time, at my pace.

throwaway2048 · on April 4, 2019

Managed services still have plenty of unexplained goings-on, and 3AM pages.

aprdm · on April 4, 2019

We moved some stuff from AWS back to on prems because it broke less often and in more obvious ways.

diminoten · on April 4, 2019

Said like a person who hasn't actually ever converted. It's nothing like that at all.

12 outages since 2011, and none of them are anything like what you're describing: https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...

bsagdiyev · on April 4, 2019

We've moved from on-prem to AWS fully and we see random issues all the time while their status page shows all green, so I feel you probably have a small amount of resources in use with them or something, because what you're saying doesn't jive with what we see daily. I see you've also copy-pasted your response to other comments too.

diminoten · on April 4, 2019

Can you actually quantify any of this or are you asking me to trust you? What I gave is an objective standard, what you've given so far is "trust me I'm right".

bsagdiyev · on April 5, 2019

Are you just ignoring that the cloud isn't a the savior for everyone because it furthers your own agenda, either publically or personally? We spend the GDP of some nations with AWS yearly, so I guarantee you're not at our scale to see these sorts of issues that most definitely are not caused by us and are indeed issues with AWS as confirmed by our account rep.

diminoten · on April 5, 2019

There are nations that have very tiny GDPs (and AWS is very expensive) so that's not saying much, and I didn't say AWS was flawless, I said that AWS was a hell of a lot better than anything you could come up with unless you're literally at Amazon scale (and we both know you arent), and nearly none of your software's problems are AWS's fault.

Using AWS has been a legitimate excuse for an outage or service delivery issue a dozen times since 2011. The end. Write your apps to be more resilient if you've had more than that many outages at those times (and honestly even those outages weren't complete).

kbenson · on April 5, 2019

> AWS was a hell of a lot better than anything you could come up with unless you're literally at Amazon scale

AWS is a hell of a lot better than most people here can come up with for solving AWS specific set of problems and priorities. But those problems and priorities generally do not match the problem space of the people using it exactly.

For example, AWS has a lot of complexity for dealing with services and deployment because it needs to be versatile enough to meet wildly different scenarios. In designing a solution for one company and their needs over the next few years, you wouldn't attempt to match what AWS does, because the majority of companies use only a small fraction of the available services, so that wouldn't be a good use of time and resources.

AWS is good enough for very many companies so they appeal to large chunks of the market, but let's not act like that makes them a perfect or "best" solution. They're just a very scaleable and versatile solution so it's unlikely they can't handle your problem. You'll often pay more for that scaleability and versatility though, and if you knew perfectly (or within good bounds) what to expect when designing a system, it's really not that hard to beat AWS on a cost and growth level with a well designed plan.

Edit: Whoops, s/dell/well/, but it probably works either way... ;)

bsagdiyev · on April 5, 2019

I'm starting to think this person refuses to be logical, not directed at you kbenson. PlayStation Network ran fine for years in our own datacenters, we decided to move to AWS instead of dealing with acquiring and maintaining our own hardware. Trust me, Sony isn't lacking some very bright people, we just don't want to deal with on-prem anymore, so yeah we'll throw ridiculous money at them for that pleasure.

diminoten -- relax, take a breath. Your rude and condescending tone is unnecessary. We don't see eye-to-eye, but I'm not discounting your experience. I can't say I'm getting the same vibe from you.

diminoten · on April 5, 2019

[flagged]

gcommer · on April 5, 2019

You do realize we can scroll up half a page and see you were the first person to levy a personal attack, right?

diminoten · on April 5, 2019

I think the only thing you wrote here I would point out is that you seem to vastly underestimate the operational cost of running your own in-house infrastructure. You're comparing AWS costs to hardware costs, but that's not what AWS gives you, it lets you restructure how you task your entire SA team. You still need them, but they can now work on very different issues, and you don't need to hire new people to work on the things they are now freed up to work on.

And again, I cannot over emphasize how much more control over your own time AWS gives you. It's night/day, and discounting that is a mistake.

bsagdiyev · on April 5, 2019

Without breaking my employer confidentiality agreement? No. But you basing off reported outages is you trusting Amazon in the same way you don't trust me, that is, on their word.

diminoten · on April 5, 2019

I'd rather trust a corporation providing some information over an individual providing absolutely nothing, especially when that corporation's information matches with my own internal information.

The reality is, if you're having problems with AWS, it's you, and not AWS, for 99.9999999% of your problems. Continuing to pretend it's AWS is a face-saving, ego protecting activity that no rational person plays part in.

ToFundorNot · on April 5, 2019

Which is funny because I've had an instance go down. I don't have ridiculously high volume, or distributed:

-One instance -No changes to the config -I'm the only person with access -No major outages at the time

It went down, and the AWS monitor didn't actually see it go down until 45 minutes later, which it needs to be down for a certain amount of time before their techs will take a look.

It was my first time using AWS, and I didn't want to risk waiting for tech so I rebooted the instance and it started back up again. I have no idea why, but it failed with out reason, and on reboot worked like it always did.

My point is that AWS has been solid, but they are like anything else, there are tradeoff's in using their service, and they aren't perfect.

phamilton · on April 5, 2019

I don't know the last time I had an instance go down. Not because it doesn't happen, but because it's sufficiently unimportant that we don't alert on it. Our ASG just brings up another to replace it.

Many applications won't be as resilient. That's the trade-off. We don't have a single stateful application. RDS/Redis/Dynamo/SQS are all managed by someone else. We had to build things differently to accommodate that constraint, but as a result we have ~35M active users and only 2 ops engineers, who spend their time building automation rather than babysitting systems.

If you lean in completely, it's a great ecosystem. Otherwise it's a terrible co-lo experience.

bsagdiyev · on April 5, 2019

Funny enough, that exact scenario is covered in the certification exams too, and the correct answer is to do what you did. An ASG will fix too like another poster said, also.

diminoten · on April 5, 2019

Yeah, you just demonstrated why AWS is keen on you having backups to your services on their platform. You failed to do that (follow their guidance) and suffered an outage because of it. How exactly is that AWS's fault?

MY point is that AWS is very solid, and while there are plenty of trade offs, to be sure, the tradeoff is "operational" vs. "orchestration", and operational doesn't let you decide when to work on it whereas orchestration does.

CuriousCosmic · on April 5, 2019

While I want to avoid getting into this argument, what you are saying is the same as "well it works on my machine" and "there can't be anything wrong with Oracle Database because Oracle says there are no bugs."

diminoten · on April 5, 2019

No, what I'm saying is, "None of your problems are consistent across use cases, therefore they're your problems not the system being used."

I haven't actually said anything about my own experience, so it's funny you claim I have...

CuriousCosmic · on April 5, 2019

"I'd rather trust a corporation providing some information over an individual providing absolutely nothing, especially when that corporation's information matches with my own internal information."

Ah, I had assumed that your internal information was from experience, either your own or your organisation's. Since that is not the case I am curious where your "internal information" comes from considering you completely disregarded @bsagdiyev's personal experience.

diminoten · on April 5, 2019

What part of what I said makes you think my internal information isn't from experience? Just to be clear; it is. I haven't described my specific experience, so claiming I said "it works perfectly for me" is not something I've claimed.

What you tried to say was I was claiming that since it worked for me it was therefore good. That isn't the case. I'm saying because it worked for me AND EVERYBODY ELSE, it's therefore good.

toomuchtodo · on April 5, 2019

It's bad enough that a Chrome extension exists to tease out real info from the lies that is the AWS status page:

https://chrome.google.com/webstore/detail/real-aws-status/ka...

jrsdav · on April 5, 2019

How about network partitions across availability zones? Happens all the time for us, so much in fact that we had to build a tool to test connectivity across AZs just to correlate outages and find the smoking gun.

stephenr · on April 5, 2019

AWS status page is notorious for not indicating there is an issue either at all, or until well after the event began.

adrianN · on April 5, 2019

When your Redshift instance locks up, it doesn't end up on Wikipedia.

maxxxxx · on April 4, 2019

"For anyone who's not used these "managed" services before, I want to add that it's still a fuck ton of work. The work shifts from "keeping X server running" to "how do I begin to configure and tune this service"."

I have noticed that too. With some managed services you are trading a set of generally understood problems with a lot of quirky behavior of the service that's very hard to debug.

erikpukinskis · on April 4, 2019

Yes, but I think very broadly speaking the quirky behavior is stuff you bump into, learn about, fix, and then can walk away from.

The daily/monthly maintenance cycle on a self hosted SQL server is “generally understood” but you still have to wake up, check your security patches, and monitor your redeployments.

You can do some of that in an automated fashion with public security updates for your containers and such. But if monitoring detects an anomaly, it’s YOU, not Heroku who gets paged.

It’s a little like owning a house vs renting. Yes if you rent you have to work around the existing building, and getting small things fixed is a process. But if the pipes explode, you make a phone call and it’s someone else’s problem. You didn’t eliminate your workload, but you shrunk the domain you need to personally be on call for.

maxxxxx · on April 4, 2019

The problem is that if I run my own servers I can fix problems (maybe with a lot of effort but at least it can be done) but with managed services I may not be able to do so. There is a lot of value in managed services but you have to be careful not to allow them to eat up your project with their bugs/quirks.

scarface74 · on April 5, 2019

So what “problems” were you unable to fix with AWS?

holoduke · on April 4, 2019

Exactly this

diminoten · on April 4, 2019

The point is with a managed service, none of your problems will be with the service. That's what the managed service is selling.

deathanatos · on April 5, 2019

I just finished a 2+ week support ticket w/ AWS. We were unable to connect over TLS to several of our instances, because the instance's hostname was not listed on the certificate. This is a niche bug that's trivially fixable if you own the service, but with AWS, it's a lot harder: you're going to need a technical rep who understands x509 — and nobody understands x509.

I've found & reported a bug in RDS whereby spatial indexes just didn't work; merely hinting the server to not use the spatial index would return results, but hinting it to use the spatial index would get nothing. (Spatial indexes were, admittedly, brand new at the time.)

I've had bugs w/ S3: technically the service is up, but trivial GETs from the bucket take 140 seconds to complete, rendering our service effectively down.

I've found & worked w/ AWS to fix a bug in ELB's HTTP handling.

All of these were problems with the service, since in each case it's failing to correctly implement some well-understood protocol. AWS is not perfect. (Still, it is worth it, IMO. But the parent is right: you are trading one set of issues for another, and it's worth knowing that and thinking about it and what is right for you.)

diminoten · on April 5, 2019

Okay, I'm sorry you thought I said AWS was perfect and bug free. I didn't, however, say that. I said (implied, really) it's better than anything you could possibly home brew. Nothing you've said here changes that.

Further, didn't I say that it's trading one set of issues for another? Or at least, I explicitly agreed with that.

I feel like you didn't read what I wrote honestly, and kind of came in with your own agenda. All I ever said was that the issues you trade off are orchestration issues vs. operational issues, and operational issues are 10x harder than orchestration issues because you don't get to decide when to work on operational issues, you tend to have to deal with them when they happen.

frankchn · on April 5, 2019

You wrote “The point is with a managed service, none of your problems will be with the service.”

What deathanatos wrote sounds awfully like problems with the service to me.

I don’t think S3 taking 100+ seconds to respond to a GET request can be solved by orchestration alone.

diminoten · on April 5, 2019

It definitely can. Reasonable timeouts and redundant systems.

indigo945 · on April 5, 2019

It's amazing the length some people are willing to go to to defend AWS marketing slogans as a source of truth. I've seen vendor lock-in before, but AWS seems to be unique in that people actually enjoy working with a vendor whose services go down randomly to the point where they blame themselves for not being "fault-tolerant".

Guess what, if your service is not required to be up because the consuming service is super tolerant to it timing out after 140 seconds, self-hosting it becomes even more of a no-brainer. After all, you clearly need none of the redundancy AWS features.

diminoten · on April 5, 2019

If it makes you feel better, everything I'm saying about AWS can be said about GCP as well.

Sorry, but AWS/GCP is infinitely better at managing infrastructure than you or your company will ever be.

maxxxxx · on April 4, 2019

That's the promise but in reality every software has bugs, including managed services.

diminoten · on April 4, 2019

Not really, not anything like what you're describing.

12 outages since 2011, and none of them are anything like what you're describing: https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...

bsagdiyev · on April 4, 2019

We've moved from on-prem to AWS fully and we see random issues all the time while their status page shows all green, so I feel you probably have a small amount of resources in use with them or something, because what you're saying doesn't jive with what we see daily. I see you've also copy-pasted your response to other comments too, so I'll do the same with my response.

diminoten · on April 4, 2019

I don't feel like copy/pasting all of our comments to each other, so I'd appreciate it if you didn't do that, thanks.

bsagdiyev · on April 5, 2019

Then don't do it yourself. You're dead set on ignoring people whose experience is different than yours, wrapping yourself in an echo chamber of sorts and telling others they are wrong.

diminoten · on April 5, 2019

I'm not dead set on anything, I'm trying to have conversations with multiple people, not create an immutable record.

And I don't think you know what an echo chamber is if you think one person can create one alone...

maxxxxx · on April 4, 2019

This is not about outages. There are many more things that can go wrong besides outages.

diminoten · on April 4, 2019

[flagged]

maxxxxx · on April 5, 2019

How long have you been working in tech? Just curious.

You sound like someone who hasn't had much real world experience and thinks AWS or whatever is the best thing because it's the only thing you know.

copperx · on April 5, 2019

You may want to ask the OP how much time she/he has been working at Amazon instead.

diminoten · on April 5, 2019

Long enough to know that some dinosaurs refuse to learn anything new (read: AWS) and will bend over backwards to try and keep themselves relevant.

Apaec · on April 5, 2019

I guess it can't be proved that this guy is a shill for AWS.

But this kind of toxic fanatism(yet trying to sound logical) is just harmful for the HN community.

dang: Can this kind of behavior be punished?

goostavos · on April 5, 2019

Bro, what're you so upset about in this thread? That people had different experiences than you with AWS..?

diminoten · on April 5, 2019

I'm not upset, I'm simply pointing out that AWS isn't the problem in any of these examples, it's the various commenter's lack of understanding about how to work in AWS that's caused these problems.

I don't think anyone is actually upset, do you? I certainly hope I haven't upset anyone... :/

Dylan16807 · on April 5, 2019

When your hammer snaps in half, you don't blame yourself for not using two hammers.

When the tool breaks under correct use, criticize the tool. Maybe the user should also have redundancy. The tool is still failing!

diminoten · on April 5, 2019

This analogy is what snapped in half, not the hammer. It's more like if your hammer says right on it, "YOU NEED A SECOND HAMMER" and this is true of all hammers, it's still not the hammer's fault you didn't bring a second hammer.

Dylan16807 · on April 5, 2019

And you're in other threads complaining that the people that had five hammers were still doing it wrong, that all the outages they report are fake somehow...

Even when you're supposed to have redundancy, there are still certain failure rates that are acceptable and some that are not. And redundancy doesn't solve every problem either.

diminoten · on April 5, 2019

What? No I'm not. Literally no where has anyone said they've built a system with redundancies as recommended by AWS and still had problems.

Of course there are unacceptable failure rates. AWS doesn't have them, and pretending like they do is simply lying to yourself to protect your own ego.

msla · on April 5, 2019

> The point is with a managed service, none of your problems will be with the service. That's what the managed service is selling.

Until the managed service simply goes away, of course, taking your data with it.

forty · on April 4, 2019

It's someone else problem unless it prevents you from living here, in which case it's still your problem too. I think the analogy works quite well :)

grigjd3 · on April 4, 2019

So I've worked with AWS and with our internal clusters as a dev. My experience has been that I have to make work-arounds for both, but at least with AWS, I don't have to spell out commands explicitly to the junior PEs.

EDIT: I should be clear, our PEs are generally pretty good, but because their product isn't seen by upper management as the thing which makes money, they're perpetually understaffed.

Macha · on April 4, 2019

Also Amazon documents their stuff in a nice public website, internal teams documented the n-2 iteration of the system and have change notes hidden in a Google drive somewhere that if you ask the right person on the other side of the world they might be able to share you a link to.

pm90 · on April 5, 2019

This. So. Much.

I can't explain just how much developing on GCP has helped me simply by having such amazing documentation. I don't think I appreciated how little I knew: every company where we worked with on premise/ internal services, we would have to use custom services built by others. With GCP, you have complete freedom, not just to design your application architecture from scratch, but to understand how others (coworkers mostly) have designed _their_ applications too! And as a company, it allows the sharing of a common set of best practices, automatically, since its "recommended by Google".

Its kinda like Google/Amazon are now the System/Operations engineers for our company. Which they're good at. And its awesome.

grigjd3 · on April 4, 2019

You were able to find documentation? Where do you work?

dfee · on April 4, 2019

But you’re talking about a reduction in the number of types of specialized people to the number of specializations per type of person. That makes this more scalable.

brobdingnagians · on April 4, 2019

The general fact of reality is that if you are building anything technical, then knowing and managing the details, whatever the details are, will get you a lot more bang for your buck. Reality isn't just a garden variety one-size fits all kind of thing, so creating something usually isn't either. If you just want a blog like everyone else's, then that comes packaged, but if you want something special, you will always have to put in the expertise.

luckylion · on April 4, 2019

> will get you a lot more bang for your buck

And it _really_ is a lot. A company I work with switched from running two servers with failover to AWS, bills went from ~€120/m to ~€2.2k/m for a similar work load. Granted, nobody has to manage those servers any more, but if that price tag continues to rise that way, it's going to be much cheaper to have somebody dedicated to manage those servers vs use AWS.

Also, maybe that's just me, but I prefer to have the knowledge in my team. If everything runs on AWS, I'm at the mercy of Amazon.

Roark66 · on April 5, 2019

If the bill went from ~€120/m to ~€2.2k/m and it was a surprise it is really a lack of proper planning. AWS pricing calculator is there for a reason...

However, often teams will use or attempt to use _a_lot_ more resources than what's needed, or they simply don't optimise for cost when using AWS.

My anecdote is that a company I worked for had an in-house app that run jobs on 8 m4.16xlarge instances costing around $20k/m and they were complaining it took hours to run said jobs. The actual tasks those servers were running were easily parallelizable and servers would run at capacity only for around 10% of the time as those jobs were submitted by users every few-up-to-24 hours. The app was basically a lift-and-shift from on-prem to AWS. The worst way one can use cloud. I created a demo for them where the exact jobs they were running on those m4.16xlarge would run on lambda using their existing code modified slightly. The time a job took to run went from many hours to few minutes with around 2k lambda functions running at the same time. The projected cost went from $20k/m to $1-5k/m depending on workload. I was quite happy with the result, unfortunately they ended up not using lambda and migrating off the app that in its entirety cost around $50k/m for the infrastructure. The point I'm trying to make here is that properly used cloud can save you a lot of money if you have a spikey workload.

Also, for internal apps that are used sparingly one can't beat a mix of client side JS code served from S3, using AWS Cognito tied to AD federation for auth with DynamoDB for data storage and lambda for typical server-side stuff. Such apps are easy to write, cost almost nothing if they are not used a lot and don't come with another server to manage. The only downside is that instead of managing servers, now you have to manage changes in AWS's api..., but nothing's perfect.

stephenr · on April 5, 2019

Proponents of the cloud love to ignore that for 'small' places, it's often perfectly fine to have a 'part-time +emergencies' ops/sysadmin person/team to manage infra.

Yes, some places will need a full time team of multiple people, but a lot of places don't, and can get a tailored solution to suit their needs perfectly, rather than just trying to fit into however it seems to work best using the soup of rube-goldberg like interdependent AWS services.

luckylion · on April 5, 2019

Absolutely, and unless you're riding a roller coaster, you can always grow your team with demand. When your operation has grown enough, you can shift from consultants to employees to save money and get the knowledge into the company, you don't have to start with a team of three to manage one server.

stephenr · on April 5, 2019

Exactly - and good consultants/contractors will still provide you with the in-house docs/guides/configuration management that will help you either transition existing, or hire new staff when the time comes.

dodobirdlord · on April 5, 2019

Holy cow, that's nuts. A pair of perfectly webservers (m5.xlarge) comes in at ~€250/m. And cheaper if you get reserved instances. ~€2.2k/m for a pair of instances and maybe a load balancer is incredible!

eastern · on April 5, 2019

If you can use servers _without_disks_ and without bandwidth, then yeah that's the price. Add some EBS, specially provisioned IOPS.

Not even getting into the fact that an m5.xlarge has roughly as much muscle as an old laptop.

user5994461 · on April 5, 2019

Don't use PIOPS, it's a scam.

Get normal disks of a larger size, it comes with 3 IOPS per GB.

luckylion · on April 5, 2019

It is, especially when they are running a lot of base load, so it's not about firing up 20 nodes for an hour and then idling along on one node for the rest of the week.

"We'd rather not manage the database ourselves" was the answer from the lead dev, and I do understand that: it's not the developers' job to manage servers. In this company, management has removed themselves from anything technical and the devs don't like managing servers, so they say "let's use AWS, we'll write some code once to create our environment and Amazon takes care of the rest" - and by this point they are committed, with months having been spent on getting things running on AWS, changing the code where necessary etc, so reversing it would be a hard thing to sell to investors and the team.

kokey · on April 5, 2019

What really bugs me about this is that it reminds me of the dotcom days. Microsoft ASP stacks made it easier to find less experienced developers to quickly develope sites, no one having an awareness of optimisation and instead would throw hardware at it. Large clusters of DEC Alpha servers would handle the same amount of traffic as a single FreeBSD on a PC, but this cost difference wasn't a problem when investor cash flowed easily and shifting Microsoft and Cisco products fed into the revenues of the system integrators which was fine since revenue growth mattered more than profits.

I've seen this with many AWS deployments, which on a pure hardware cost is 3-5 times more, but with the way it makes it 'easy' to scale instead of optimise instead costs 10-20 times more. When the investor cash starts drying up and the focus is going to be on competing for profits in a market that is much smaller in general, many organisations are going to find themselves locked into AWS and for them it's going to feel like IE6 and asp.net all over again.

jtdev · on April 5, 2019

Same goes for J2EE, Flash, and a bunch of other technologies. Your focus on ASP.NET seems rather biased.

int_19h · on April 5, 2019

I don't think you could describe J2EE as "made it easier to find less experienced developers to quickly develop sites". If anything, it's diametrically opposed to that goal.

And back in dotcom days, it was ASP, not ASP.NET. Which is to say, much like PHP, except with VBScript as a language, and COM as a library API/ABI.

cik · on April 4, 2019

Bang on. Realistically the value of Amazon's managed side is in the early stages. At latter stages with people, it's significantly lower cost to tune real resources, and you get added performance benefits.

We make a decent business out of doing just this, at scale for clients today.

raviolo · on April 5, 2019

Agree. AWS and the likes is an awesome tool to get access to a lot of compute power quickly which is great for unexpectedly large workloads or experimenting during early stages. For established businesses/processes the cost of running on premises is often significantly lower.

We manage about 150T of data for company on relatively inexpensive raid array + one replica + offline backup. It is accessible on our 10Gbps intranet for processing cluster, users to pull to their workstations, etc. The whole thing is running on stock HP servers for many years and never had outages beyond disks (which are in raid) predicting failure and maybe one redundant PSU failure. We did the math of replicating what we have with Amazon or Google and it would cost a fortune.

rashkov · on April 4, 2019

Would love to hear more about that business — do you help people go from cloud back to on-prem?

cik · on April 5, 2019

We're all about mixed mode. Let's never pretend that we ought focus on a specific one and only business. Clients like Amazon, clients like Azure - clients then get forced by (say the BC Health Authority) to run on physical machines in a British Columbia hosted datacenter.

We help make that happen, and help folks manage the lifecycle associated with it.

user5994461 · on April 5, 2019

Companies just really suck at managing hardware or resources. The bigger and the more consultants they get, the more terrible they get, that's what you come after.

Chances are you will find tens of instances without any naming or tagging, of the more expensive types, created by previous dudes who long left. Thanks to AWS tooling it's easy to identify them, see if they use any resource and either delete or scale down dramatically.

cik · on April 8, 2019

It's not a size thing - it's all companies, and it's about significantly more than just 'finding instances'. The 'finding instances' side of AWS is something we just do for free (AWS, Azure, etc) for our existing customers.

Our goal is to provide actual value and do real DevOps work, regardless of whether you're AWS, Azure, GCP, etc. This includes physical, mixed-mode, cloud bursting, and changing constantly. <----- That's what keeps it interesting.

scarface74 · on April 5, 2019

If that were universally true, then why did Netflix go all in on AWS?

hackerpacker · on April 5, 2019

hype maybe? Got a backroom deal for the exposure? Oh, and you probably aren't netflix either. But they have a pretty severe case of vendor lock in now, will be interesting to see how it plays out. As of 2018 they spend 40 million on cloud services, 23 million of that on aws.

Do you think you are gonna get AWS's attention with that $10 s3 instance when something goes wrong?!? You will have negative leverage after signing up for vendor lock in.

I'll take linux servers any day, thanks.

scarface74 · on April 5, 2019

So you think Netflix was suckered into using AWS and if they had just listened to random people on HN they would have made different choices?

I’m sure with all of your developers using the repository pattern to “abstract their database access”, you can change your database at the drop of a dime.

Companies rarely change infrastructure wholesale no matter how many levels of abstraction you put in front of your resources.

While at the same time you’re spending money maintaining hardware instead of focusing on your business’s competitive advantage.

hackerpacker · on April 5, 2019

so, you might be dealing with some half/truths here.

Netflix does NOT use aws for their meat and potatoes streaming, just the housekeeping. They use OWS for the heavy lifting.

https://www.networkworld.com/article/3037428/netflix-is-not-...

But re: maintaining hardware, I only maintain my dev box these days. We are fine with hosted linux services, but backup/monitoring/updating is just too trivial, and the hosting so affordable (and predictable) w/linux it would have to be a hype/marketing/nepotic decision to switch to aws in our case. The internet is still built on a backbone and run by linux, any programmer would be foolhardy to ignore that bit of reality for very long.

scarface74 · on April 5, 2019

So what do you think AWS services are running on if not Linux?

Netflix is by far AWSs largest customer.

From the horses mouth on why they decided to move to AWS:

https://www.se-radio.net/2014/12/episode-216-adrian-cockcrof...

You could also go to YouTube and watch any of the dozens of talks NetFlix has done at ReInvent.

Of course Netflix caches it’s videos in the ISPs data center. But caching is not the “heavy lifting”.

aaronblohowiak · on April 5, 2019

Please do not spread falsehoods.

stickfigure · on April 5, 2019

If your "managed services" are a ton of work, then they're not really managed.

I built a system selling and fulfilling 15k tshirts/day on Google App Engine using (what is now called) the Cloud Datastore. The programming limitations are annoying (it's useless for analytics) but it was rock solid reliable, autoscaled, and completely automated. Nobody wore a pager, and sometimes the entire team would go camping. You simply cannot get that kind of stress-free life out of a traditional RDBMS stack.

mijamo · on April 5, 2019

If anything I had much more troubles with the datastore than with any other DB ever. We are migrating away and the day it's over will be thz biggest profesional relief I've experienced. I guess you coule consider that you don't need to manage it, but the trade off is that you have to go around all the limitations through nasty software hacks instead of just a simple configuration.

stickfigure · on April 5, 2019

I also am curious to know what sorts of problems you've had.

I have noticed that people tend to get into trouble when they force the datastore to be something it's not. For example, I mentioned that it's terrible for analytics - there are no aggregation queries.

In the case of the tshirt retailer, I replicated a portion of our data into a database that was good for analytics (postgres, actually). We could afford to lose the analytics dashboard for a few hours due to a poorly tested migration (and did, a few times), but the sales flow never went down.

The datastore is not appropriate for every project (which echoes the original article) but it's a good tool when used appropriately.

spyspy · on April 5, 2019

It definitely sounds like they shouldn't have used datastore in the first place if it's giving them that much trouble. A common pattern we use is to replicate the data in bigquery, either by stream or batch job that flows over pubsub - perfectly scalable and resilient to failure.

ngrilly · on April 5, 2019

Regarding Google Cloud Datastore, what kind of limitations have you met?

Rapzid · on April 4, 2019

Yeah, I think it's a trade off. Certain services can be a no-brainer, but others will cause pain if your particular use case doesn't align precisely with the service's strengths and limitations.

DynamoDB vs RDS is a perfect example. Most of that boils down to the typical document store and lack of transactions challenges. God forbid you start with DynamoDB and then discover you really need a transnational unit of work, or you got your indexes wrong the first time around. If you didn't need the benefits of DynamoDB in the first place, you will be wishing you just went with a traditional RDBMS vs RDS to start with.

Lambda can be another mixed bag. It can be a PITA to troubleshoot, and their are a lot of gotchas like execution time limits, storage limits, cold start latency, and so on. But once you invested all the time getting everything setup, wired up...

In for a penny, in for a pound.

aaronblohowiak · on April 4, 2019

This was announced in nov, i think: https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transac...

Animats · on April 4, 2019

When you buy a service from a big company, and it doesn't work, you get to debug the service.

kingraoul3 · on April 4, 2019

Which is exactly why everything runs Linux instead of Windows.

Scarbutt · on April 4, 2019

I guess that's his point, you are not doing less work, all you did was moved your debugging from one place to another.

kingraoul3 · on April 4, 2019

And let's not forget! Got a support contract so we can all BLAME someone not in the room and feel good.

zaarn · on April 5, 2019

>You will run into performance issues, config gotchas, voodoo tuning, and maintenance concerns with any of AWS's managed databases or k8s.

The default config for a LAMP stack will easily handle 100 requests per second. 10 if you app isn't optimized.

Run apt upgrade once a month and enable automatic security updates on ubuntu.

That is neither hard nor "vodoo".

I've used managed services and I don't see the point until you hit massive scale, at which point you can afford to hire your engineers to do it.

staticassertion · on April 4, 2019

That doesn't sound like a shift of work. It sounds like work I already would have done - performance tuning doesn't go away by bringing things in house.

Now that person I pay to operate the service can focus on tuning, not backups and other work that's been automated away.

Sounds like a massive win to me.

opportune · on April 4, 2019

It does make performance tuning harder since you likely don't have access to the codebase of the managed service, requiring more trial-and-error or just asking someone on the support team ($$)

sokoloff · on April 4, 2019

Then you pay $O/month for AWS Enterprise Support (who are actually quite good and helpful) to help augment your $M/month employees and $N/month direct spend.

milquetoastaf · on April 4, 2019

Support - in the long run - is pretty cheap (I think I pay around 300) and 100 percent worth the tradeoff. The web chat offers the fastest resolution times in my experience once you are out of the queue

taeric · on April 4, 2019

To be fair, this is only a valid shift for folks moving. If you are creating something, you have both "how do I configure" and "how do I keep it moving?"

That is, the shift to managed services does largely remove a large portion, and just changes another.

ibejoeb · on April 4, 2019

Yes, but it's still pretty much a straight cost offset. If you hold your own metal, you have to do all of that and still administer the database. Sure, there could be a little overlap in storage design, nut most of the managed systems have typical operational concerns at a button click: backup, restore, HA... Unless your fleet is huge and your workload is special, you're going to win with managed services.

Scarbutt · on April 4, 2019

If you hold your own metal

That's going too much to the other extreme, ec2, droplets, etc.. are fine.

late2part · on April 4, 2019

define huge?

dodobirdlord · on April 5, 2019

Bigger than Netflix, assuming that you don't know something that Netflix doesn't.

dsfyu404ed · on April 5, 2019

And a lot of the time keeping X running is simpler than configuring and tuning this service.

scarface74 · on April 5, 2019

He specifically mentioned Dynamo. There is nothing to configure except indexes, and read and write capacity units.

thatoneuser · on April 4, 2019

That's no joke. I have a decent software background and it was far from trivial to get going with aws services. Their documentation doesn't always quite tel you everything you need to know and half thhe time there are conflicting docs both saying to do something that's wrong. Still has been less work than a production server at my last engineering job but then again that project had a lot of issues related to age and shitty code bases. Hard to say which would have been less work honestly.