Your site says it's distributed, but honestly it has no details about how it works. Is it built on top of Sia, Storj, or similar? How does it work? How does one "become a node", etc.?
The fact they seem to refuse to answer WHERE the storage is coming from is honestly pretty concerning. It's been asked multiple times in this thread and they don't seem interested in addressing it.
For all we know the storage is coming from compromised hosts that don't even know they're participating.
Your data is intelligently encrypted on your My Cloud Home and sliced into tiny pieces that only you can put back together. The encrypted data pieces are scattered across our community of users, using small portions of our member’s unused hard drive space. Spreading your data out in this way makes it more safe and secure than it would be if it was all stored in one location.
We are not built on top of Sia or Storj. Our technology has been developed for the past decade and is independent.
Storage comes from services that we offer to end consumers. In exchange for our services they agree to let us use the unused storage on the device, e.g. (https://crowdstorage.com/products/device-backup/). We plan on expanding to many more devices.
Storage also comes from being able to transparently leverage more traditional storage that we purchase from various providers.
We span many regions right now. Primarily APAC (serving customers in Jakarta by way of Singapore and soon Thailand), as well as South America.
How do you handle the problem of low interest? Do you preseed a region with storage servers so you have enough shards to store all the data until you have enough in-market storage endpoints?
I agree that the egress pricing seems a little disingenuous, but the model typically doesn't involve egressing much more than 100%. Will take a closer look when there's time.
We can seamlessly leverage more traditional storage as well as distributed endpoints. So in a way yes we can 'seed' it. We will likely wait until we have enough storage in a region before launching there.
> the egress pricing seems a little disingenuous
I have made small edits to the website to make it clearer that egress is free up to 100% of the data you have stored. This is definitely something we will focus on in the redesign.
It seems the catch is the same as with Wasabi. It says everywhere on the site that egress is free and there's no hidden fees. Except for this FAQ entry [1]:
> Pricing for Polycloud is only $0.004 per GB per month. There are no other fees as long as you do not egress more than 100% of your data during the month. If you do egress more than 100% of the amount of data you have stored then you will only be charged $0.01 per GB for any overages.
Looks like a hidden fee to me. Note that even the Pricing page [2] does not mention it, and the calculator does not allow to enter egress volume for calculation.
That's an interesting observation! Definitely creates weird incentives.
It seems incentives also depend on the nitty gritty details around how you are billed which are not defined very clearly. Granularity and timing, whether you pay for part months of storage etc.
However all games aside it does seem like if your own egress were free (or much cheaper than 0.01) it would be most efficient to send your data directly to where it needed to go, yourself.
Interesting thought thread on our pricing structure... It saves money if you intend to send the backup to 10 places every month. If you only intend to do it once, though, you don't come out ahead: $10 * 3 = $30 versus $1 * 3 + $22.50 = $25.50.
Good point -- the minimum 3-month storage duration eliminates this edge case for one-offs (since 3 * $0.4 > $1).
That said, it almost breaks even after the first month and pays for itself a week into the second month. To eliminate this incentive, I'd suggest setting the cost of egress at (or below) the cost of storage.
It also would make more sense to me as a prospective customer that I'm not paying crazy overage fees, and I'm paying as if I reuploaded the same data and downloaded it an extra time.
> I can't wrap my head around the fact that electrical signals are priced higher than equivalent HDDs.
I think this is standard with cloud storage. Check out this comparison [1], showing 2-6x higher costs for downloads versus monthly storage in B2, AWS, GCP, and Azure.
> They aren't doing 20x replication. They do 20+20? erasure coding, so the overhead is 2x or less.
With "sending that backup to 10 machines" I meant "egress 10x of 250GB" = 2500GB. With a pricing of $0.01 per GB after egress tops 100% of stored data, or 250GB, that's 0.01 * 2250 = $22.5. I'm not talking about replication overhead on their side.
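To make that arithmetic concrete, here's a tiny sketch of the billing formula as described in the FAQ quote earlier in the thread (rates and the 100%-of-stored threshold are taken from that quote; the function name is mine):

```python
def monthly_cost(stored_gb, egress_gb, storage_rate=0.004, overage_rate=0.01):
    """Polycloud-style bill: storage, plus egress beyond 100% of stored data."""
    storage = stored_gb * storage_rate
    billable_egress = max(0.0, egress_gb - stored_gb)  # first 100% is free
    return storage + billable_egress * overage_rate

# Storing 250 GB and egressing 10x of it (2500 GB):
# $1 storage + 2250 GB * $0.01 = $23.50
print(monthly_cost(250, 2500))
```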
The math could work out for me, though: I store a lot of data that gets served by a CDN. It rarely changes and is requested often enough to stay in the cache. I'm already paying for the CDN, so I can't save there, but if I can save on storage, it could be useful.
I'm not disputing the point about whether the pricing is clear or "hidden", just noting that there's a use case here that can make sense.
Wow, that's just outright false advertising (assuming they are actually charging for egress)... Here is everything the pricing page, signup page, and TOS (which reference the first) have to say about paying for egress... I especially like the part where they are calling this hot storage, which certainly implies that it might be accessed more than once a month.
Pricing page [1]
> We have no hidden fees and the monthly calculator to prove it.
> Egress $0.00
> No charge for egress, ops, or retrieval.
> With Polycloud from CrowdStorage, you only pay for the storage you need. And we never penalize you with fees for accessing your data.
> No egress charges
> We keep it simple—access your data when you want it, without being nickel-and-dimed with hidden fees.
> Hot storage for cold storage prices.
> Don’t overpay to get the speed you need. Polycloud delivers quick access you data whenever you need it.
> Egress $0.00
Signup page [2]
> Everything you need to store your data for only $4 per TB/mo and no hidden fees.
Terms of service [3]
> 6.1 Device Backup. In the event that you subscribe to a paid version of the Device Backup Services, we will put you on a recurring payment plan that charges you for the fees set forth at https://app.crowdstorage.com/pricing in advance for each billing cycle. We will charge the payment method you specify at the time of purchase and, if you do not cancel the Services prior to the end of the current billing cycle in accordance with Section 7, you will automatically be charged the then-current fee for the Services at the start of the following billing cycle.
> 6.2 Polycloud. With the Polycloud Services, you pay only for what you use. There are no set-up fees or commitments to begin using the Polycloud Services. At the end of each month, you will be charged for that month’s usage of the Polycloud Services as further set forth at https://polycloud.crowdstorage.com/pricing.
I'm assuming the catch with this is that you are letting your data be stored on some randomer's disks, which means availability is directly tied to the interest of nerds who are loaning out their HDD space for a pittance.
But "random farmers" have almost no bearing on the ACTUAL coffee supply chain. Starbucks doesn't use the local farmers market for their supply chain because it's not reliable. They have contracts in place that are fulfilled through obligation.
If anyone on this site has built a business that relies on wikipedia having accurate information at all times I'd call them crazy too.
In the same way I wouldn't call this "enterprise" as they have plastered all over the site. Using spare capacity on a bunch of random usb drives that users happen to have online gives me no guarantee of uptime. With a 20+20 they're betting that 21 users won't experience an outage at the same time, and that if a large portion experiences outages that they can rebuild faster than users fail.
Without knowing anything about where the users are coming from, or what kind of contract they've agreed to, you're just giving a company your data that has told you: it'll be secure, TRUST ME!
Thanks for that, I hadn't truly grasped the concept of supply and demand until you so clearly summed up macro economics.
To just fill you in: coffee growers grow coffee because the cost of growing it is less than the local wholesale price (most of the time). It is possible to subsist on the profits of growing coffee (depending on where you are).
The price per TB being charged to the consumer is $48 a year, which is significantly less than the initial setup cost to become a data host (Pi + SD card + HDD). That's before we get to the opex of paying for an ISP (plus any bandwidth overages), power, and general maintenance.
That's assuming the company is selling at the price it pays the "hosts". I'm assuming they have some ambition of profitability.
Which leads me back to the original statement: you're reliant on a bunch of randomers, who didn't really think about the economics, subsidising this company.
Erm, you've got it all backwards, coffee farming is less profitable than data hosting.
The money is made on egress, that's the case with Storj and most others. For domestic setup, the data connection is a fixed cost you would have anyway.
Stop assuming everyone is an idiot, people do the math for profitability of these platforms, and switch between them.
They don't charge for egress, its explicit in the front page.
> coffee farming is less profitable than data hosting.
Yes, the money is in roasting. You're missing the essential point: there is a reason why coffee growers don't roast their own raw material, and that's because they cannot get access to the capital for the equipment, let alone access to the customers.
But coffee isn't the point.
Unless there is a sustained incentive to store the data, there will not be anyone willing to host it. The amount they charge is not enough to make it sustainable for people to invest in equipment to host the data, so you'll be reliant on best effort, or worse still, on short-term incentives that are unsustainable, leading to widespread evaporation of storage.
You pay for guaranteed storage, which is why it's expensive: it's a slice of the capex for the equipment, the opex for running and maintenance, plus profit. At the current price, it doesn't cover any of them.
I guess this argument really depends on how much redundancy Polycloud implements. Your data is replicated on 2 random people's computers? Uh oh. Replicated across 10? Probably safe.
There are thousands (millions?) of coffee farmers, so they have no problem.
What happens if say 3 shards of a specific file you need are not online that day? Or never come back online again? I understand the files are copied to multiple locations but we are not talking about a 24/7 datacenter but random computers. People can decide to uninstall the program or never power back on or anything.
Unless it’s copied to basically every computer, I don’t see how someone wouldn’t eventually end up with an unrecoverable file, simply because they are unable to piece it back together from the available computers.
I confess that I haven't gone deep into Storj, but I am running a node for some weeks now and I didn't have to pay anything (besides the operational costs of keeping the server online, of course) to get into it.
You are earning money every month, but they keep a collateral, and payment to you is delayed. If your node combusts one day, that collateral disappears, broadly speaking.
Also, there are something like 32 shards for each piece of data, so it's unlikely 32 computers all go offline in one day.
We now manage a network of over 250,000 NAS devices that are part of our network. So we know how to handle devices going offline without losing data. This approach has been used at scale since 2014.
We are planning on releasing more information about this in the future. Using erasure coding we can spread an object across lots of devices, say 60, and then require only 25 of those 60 to retrieve the object.
We constantly monitor the devices storing data, and if they go offline we refresh the pieces to maintain integrity.
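Assuming independent node failures (a big assumption for consumer devices that may fail in correlated ways, e.g. a regional blackout), a "25 of 60" scheme can be sanity-checked with a binomial back-of-envelope calculation:

```python
from math import comb

def availability(n, k, p):
    """P(at least k of n shards are online), each online independently with prob p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 25-of-60 scheme, assuming each consumer device is online 90% of the time:
print(f"{availability(60, 25, 0.90):.9f}")  # extremely close to 1

# The same scheme degrades fast if devices are mostly offline:
print(f"{availability(60, 25, 0.30):.4f}")
```

The interesting failure mode is the correlated one the independence assumption hides, which is presumably why the pieces get actively refreshed.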
I think they meant it in the sense of being API compatible.
> Polycloud is 100% AWS S3 bit-compatible. If you’re used to using an S3 API, you can access Polycloud using the same API.
S3 has been around for a long time and they had a lot of objects to transition when they upgraded, so I imagine that is why it took a while.
There are other object storage systems that have strong consistency guarantees that came out after S3.
It greatly simplifies things that an object written to S3 is immutable.
On a high level: all writes to your storage use some UUID, and all reads go through consistent metadata storage (pick a modern database). After your write is complete and you are sure it is persisted, do the metadata update and return success. Everyone gets a consistent view of the operation.
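A minimal sketch of that write-then-publish pattern, with in-memory dicts standing in for the immutable blob store and the strongly consistent metadata database (all names are illustrative):

```python
import uuid

blob_store = {}  # stands in for immutable object storage
metadata = {}    # stands in for a strongly consistent metadata DB

def put_object(key: str, data: bytes) -> None:
    # 1. Write the payload under a fresh UUID; the blob itself never changes.
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = data  # must be durably persisted before step 2
    # 2. Only then swap the metadata pointer. Readers see either the old
    #    version or the new one, never a half-written blob.
    metadata[key] = blob_id

def get_object(key: str) -> bytes:
    return blob_store[metadata[key]]

put_object("photos/cat.jpg", b"v1")
put_object("photos/cat.jpg", b"v2")  # overwrite = new blob + pointer swap
print(get_object("photos/cat.jpg"))
```

Because every write lands under a new UUID, the only step that needs strong consistency is the tiny metadata update, which is what makes the scheme simple.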
The first thing I check on services which advertise as S3-compatible is permissions. As usual - no permissions, no ACLs. So practically it looks like anyone with an access key can wipe out everything you store with them.
Where is the other side, i.e how are people with hard drives recruited? It didn't seem to be on this site but it's important to know otherwise how do we know it's not, for instance, a botnet?
What is the catch here? How can it be cheaper than b2 while essentially offering the same service to the end-user?(geographically distributing data does not directly result in any reduction of costs). It’s okay to say that “we have lower profit margin” than B2, what is it technically that enables this lower cost? , and what are the trade offs made to get there?..
- First, like you indicated, you can take less margin (and believe me, there's still considerable margin).
- You can play with the erasure coding policy: an (n+10)/n scheme means you can tolerate 10 failures, and a higher n makes your storage overhead smaller (which means more margin).
- You can use even fancier storage schemes (e.g. online codes will bring your storage overhead down to something like 3%).
- you can use cheaper hardware.
- ...
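To put numbers on the erasure-coding point above: for an (n+10)/n code, the raw-to-usable storage ratio is (n+10)/n, so raising n shrinks the overhead while keeping the same failure tolerance:

```python
def overhead(n, m=10):
    """Raw storage needed per byte of user data for an (n+m)/n erasure code."""
    return (n + m) / n

for n in (10, 20, 40, 100):
    print(f"{n}+10 scheme: {overhead(n):.2f}x raw storage, tolerates 10 failures")
```

The trade-off is that a larger n means each object is spread across more nodes, so reads and repairs touch more machines.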
Managing a distributed network has a much lower cost structure - no capital investment for hard drives, no electrical cost for running the hard drives, no cooling costs, etc.
We’ve seen these types of systems before and I always wonder how well thought out this is - either on the part of the hosts, the company or the client.
A couple of questions I have right off the bat are:
How do we know how secure this solution is?
Even if it were incredibly well secured - what are the laws around this setup?
If someone stores illegal materials on this system who is responsible? (The person who stored it? The company? The unsuspecting host?)
What happens if the hosts lose interest or it fails commercially? Does the data get lost without warning when hosts start uninstalling the software?
The problems with these systems is almost never the technology - it’s finding a way to negotiate the millions of different implications storing information on other people’s behalf brings.
We've been storing data like this for several years and have kept hundreds of PB and tens of billions of objects without data loss, despite having many nodes in areas affected by large regional outages. One recent example is when Texas was affected by blackouts.
We are working on getting as much content up on the website as we can and telling more about us!
I’d genuinely be interested in seeing that, it would certainly give me more confidence.
Have you looked into the legal aspects of your setup as regards liabilities? It probably doesn’t matter as much from the customer’s perspective as it’s their data so they should know what it contains, but would be worth knowing where this leaves you as a company and/or the network of nodes if some other customer pushed data onto the system that was objectionable.
Precisely. Although we also believe you shouldn't have to worry about going bankrupt if you ever need to restore all your data or want to migrate somewhere else, which is where the free egress of 100% of what you have stored comes from.
Would love to have lower cost S3. Last time I tried one I got burned when they changed their prices and then made it really hard to migrate with super slow outbound bandwidth.
Anyway, my advice is to give people something to de-risk the cost of migration. IDK, maybe an abstraction layer that automatically keeps a backup copy inside S3 Glacier. Or something else...
$4/month for 1TB of storage and 1TB of bandwidth? Just get a 1TB Storage VPS (+8TB BW) from https://www.time4vps.com/storage-vps/ for 3 EURO a month instead and stick Minio on it ;) (Just a happy customer)
> this discount is valid only for the first invoice and it is not reoccurring
So it'll double in price after that. And that's on top of having to round up to the next terabyte or two. But if you want more bandwidth then it looks like it's worth considering.
Side note: It's interesting that above 2TB the deal gets steadily worse.
"We designed this service to reliably hold a huge amount of data. This setup will serve you best if it’s used to store compressed data archives or backups."
"Do you offer backups with this service?
Unfortunately, no. Users have to regularly back up the data themselves. "
That sounds a little contradictory.
You are missing the durability part here, leaving aside the overall operational burden. I am just saying, in case the drive storing your files catches fire or simply fails.
The catch is always in the fine print. The storage is cheap but the savings go away if you actually do anything with the data.
Storage is a foundational service in the cloud. There are huge advantages to having that storage sit adjacent to everything else. Such storage as a service somewhere else doesn’t make a lot of sense in many/most use cases. Now you’re having to pay to move data across comparatively slow networks links to where it’s actually needed. It’s a catchy headline but when putting this to the test in real world scenarios those numbers don’t pan out.
> storage is cheap but the savings go away if you actually do anything with the data.
This is the "data back up" use case, where you store heaps of data, and hope to never need to access it. For this use case the conditions seem excellent.
This is great feedback, we'll take this into consideration. You can always reach out to talk about specifics about your use case and how we can better meet it (there is a Contact Us button near the bottom of the page).
Glacier can get pricey when you store/retrieve your data because of ops, retrieval fees, and egress (if going to the internet). We feel like immediate availability is a compelling advantage of our product.
Best for archive and backup. You can also see how Vivint uses them to store video clips and stream them to customers when needed: https://crowdstorage.com/solutions/
> Yes, Polycloud is GDPR compliant. [...] processed and stored on servers located across the United States [...] certified with the EU-U.S. Privacy Shield Framework.
I don't think this means Polycloud is GDPR compliant. It's my understanding the Privacy Shield Framework was struck down by the EU courts. Might want to be a bit careful here if you are an EU business.
Says it is GDPR compliant - but I don't see it explained how?
You say you would use "Standard Data Protection Clauses" if the data goes outside of the US ... but I don't want my data in the US in the first place, so is this really a US only service?
How would those clauses be implemented?
Does every node have a contractual agreement with you?
How would you know if someone took their computer with them on holiday to another country?
Also, how is the data encrypted? "State of the art encryption" is just marketing fluff :)
>This means that with our 20/40 encoding scheme, a malicious attacker would need to physically access 20 different nodes within our network of almost 300,000 devices.
Which community is providing 300k devices? Is Polycloud building on top of IPFS or Sia?
One killer feature could be some form of ransomware protection. If a sudden entropy change is detected, snapshot the data, then provide the customer with the option to revert changes from that time (up to, say, a week).
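For what it's worth, a naive version of that detection could be a per-file Shannon-entropy check: encrypted (ransomed) data looks close to 8 bits per byte, while most user data doesn't. A rough sketch (the threshold is a guess, and a real system would look at the rate of change across many files, not single files):

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte; encrypted or well-compressed data approaches 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # A sudden jump of many files past this threshold could trigger
    # the snapshot-and-offer-rollback behaviour suggested above.
    return shannon_entropy(data) > threshold

print(looks_encrypted(b"hello world " * 100))  # plain text: low entropy
print(looks_encrypted(os.urandom(4096)))       # random bytes: high entropy
```

The obvious false positives are already-compressed or already-encrypted user data, which is why snapshot-and-ask beats block-and-alert here.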
If you scroll a bit on the Backblaze B2 pricing page, they have an easy calculator that will tell you exactly what the costs will be.
$0.005/GB/Month for storage
$0.01/GB for egress
The only "hidden" cost I'd say they have is they have a limit on some API calls. For example their b2_download_file_by_name is limited to 2500 calls per day and then $0.004 per 10k calls after that.
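Using the figures quoted above (2,500 free calls per day, then $0.004 per 10k calls), that overage is easy to estimate:

```python
def b2_download_call_overage(calls_per_day, free_calls=2500, rate_per_10k=0.004):
    """Daily overage cost for download-by-name calls, per the caps quoted above."""
    extra = max(0, calls_per_day - free_calls)
    return extra / 10_000 * rate_per_10k

# 102,500 calls/day -> 100,000 over the free cap -> $0.04/day
print(b2_download_call_overage(102_500))
```

Even at heavy call volumes this stays small relative to the per-GB egress charge, which is why it's only "hidden" in a mild sense.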
How do you arrive at twice as expensive as AWS glacier?
AWS S3 Glacier is the same $0.004/GB/month and requires the same minimum of 90 days, but retrieval takes "from 1 minute to 12 hours". They also have large retrieval costs (on top of the regular egress to the internet) and API call charges.
AWS S3 Deep glacier is only $0.00099/GB/month, but requires it to be stored 180 days and also have operation/retrieval fees. "For long-term data archiving that is accessed once or twice in a year and can be restored within 12 hours"
Our website is right now being redesigned and your comments here are helpful to help us know where we need to improve!