
> Would you say that all the common desktop filesystems, flash memory filesystems, optical filesystems, archive formats, etc. today all aren't robust since they don't do this?

I mean, yeah. They all guess at the mimetype, all the time. They all fall on their face fairly often in that regard.

> Combining multiple methods does solve that problem, but doesn't it kind of negate the benefits too?

No? A system that just stores a two-tuple of (type, data) and doesn't have to guess — when it knows the type — is strictly better than a system that always guesses. Where it integrates with other systems that understand how to type the incoming data, it would work flawlessly, every time. Where it integrates with systems that send untyped bytes, there is again no choice but to guess: the data simply isn't there.
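To make the (type, data) idea concrete, here's a rough Python sketch. Everything in it is made up for illustration: `TypedFile`, `resolve_type`, and the tiny magic-number table are my assumptions, not any real filesystem's API. The point is only that sniffing becomes the fallback path, not the default.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, deliberately tiny magic-number table for the fallback path.
_MAGIC = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


@dataclass(frozen=True)
class TypedFile:
    """The two-tuple: a recorded mimetype (if any) plus the raw bytes."""
    mimetype: Optional[str]
    data: bytes


def sniff(data: bytes) -> Optional[str]:
    """Guess a mimetype from leading bytes; None when nothing matches."""
    for magic, mimetype in _MAGIC:
        if data.startswith(magic):
            return mimetype
    return None


def resolve_type(f: TypedFile) -> Tuple[Optional[str], str]:
    """Return (mimetype, how_we_know). Only guess when no type was recorded."""
    if f.mimetype is not None:
        return (f.mimetype, "recorded")
    guessed = sniff(f.data)
    if guessed is not None:
        return (guessed, "sniffed")
    return (None, "unknown")
```

Note that a recorded type wins even when the bytes look like something else: `resolve_type(TypedFile("text/plain", b"%PDF-1.7"))` reports `text/plain`, because the system was told the type and doesn't second-guess it.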

> For example, it would need the creation of new mime type changing UI, which might be a hard thing to teach laypeople (they already barely understand extensions).

Yes, I agree. But people are only going to hit trouble where the mimetype isn't known and the sniffing fails, which is the same issue they'd hit today. I'd also argue that asking them to set a mimetype has a better shot at being good UX than asking them to set a file extension ever will.

E.g., right click → Set file type → a prompt listing candidate types (with friendly names, if at all possible), something like:

  This file looks like it is probably one of the following:

    JPEG image
      (can be opened with GIMP, Photoshop.)
    PDF document
      (can be opened with Adobe Acrobat.)

  > See all options
    (accordion dropdown to show all options)

Yes, that still requires the user to know the difference between a JPEG and a PNG, and to a layman that's considerably sub-optimal. But we're at the point where the system didn't record the mimetype and can't guess it correctly, so there's not much left, really: some human has to make a call.

A form of this exists today in most systems, with an "Open with" context menu, and generally with an option to "always open files of this type with this application". (But that's more about the binding between the mimetype — still determined currently through sniffing or extension — and the handler for that mimetype.)

But ideally, if the interfaces were there to allow the process to be deterministic & obvious when the data is known, things could or would grow to adopt those interfaces. (Though it'll likely be a looong time.) Unix screwed up, in that regard, in that it set us down the path of "everything is untyped bytes", vs. "everything is strongly typed bytes". With the latter, the system can start figuring out correct or incorrect actions. With the former, it simply lacks the information to make a decision.

> Aside from just being confusing, that could be purposely abused by malware.

OS X essentially just marks, in the metadata of the file, that it came from the Internet. It could keep doing that. Whatever process produces the "this file could harm your machine" warning in the browser today could keep working the same way.

Granted, there is the possibility of "foo.jpg" being sent with an "application/executable" mimetype. That's a real concern. I think this comes down to the system being clear about what you're dealing with, and about the consequences of actions. This problem already exists today: people have crafted ".pdf" files that are valid executables. Better security controls on apps (i.e., not having desktop apps run with the same privilege as the user) would help by limiting the damage.
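As a sketch of what "being clear about what you're dealing with" could mean for the "foo.jpg" case: compare what the name implies against the type the file actually carries, and surface any disagreement to the user. The function name is hypothetical, and "application/executable" is just the placeholder type from above, not a registered mimetype. The only real API used is Python's `mimetypes.guess_type`.

```python
import mimetypes
from typing import Optional


def name_disagrees_with_type(filename: str, declared: str) -> bool:
    """True when the extension implies a different type than the declared one.

    A name with no recognizable extension implies nothing, so it can't disagree.
    """
    implied: Optional[str]
    implied, _encoding = mimetypes.guess_type(filename)
    return implied is not None and implied != declared
```

So `name_disagrees_with_type("foo.jpg", "application/executable")` is true and the UI could warn, while the declared type stays the source of truth for what actually happens on open.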

We've learned, repeatedly, that strong typing results in more robust systems. I don't think the answer is any different for the bag of bytes a file comes with: knowing the type is better than guessing it. We've also learned, I think, that sniffing almost always leads to loopholes…

(I'm not a fan of the "or guess" bit of the proposed idea in my comment; I think a simple "the file carries the mime and that's that" would be better, but I suspect that the roughness of integrating with legacy code that can't communicate what type of data it is reading/writing would hamper that.)


