
At an old job, I knew some very idealistic folks who kept pushing the semantic web. "Let's do that everywhere!" As an exercise, I would have them open a browser, visit various sites, and then look at the source. "Go on, check to see if it validates," I would say with an anticipatory grin. Whether hand-crafted or generated by any number of frameworks, many sites can barely manage to close their HTML tags; asking for semantic markup on top of that is a "just won't happen in practice" thing.

I have also seen a great deal of consultant money, programmer time, sys-admin sweat, and the like focused on these toweringly-designed, completely-unused triple stores, layer upon layer of hot technologies (ever-moving, construction on the tower never ceased) fused together to create a resource-intensive monstrosity that, at the end of the day, barely got used. But hey, let's look at that jazz semantic web example one more time.

The most painful part is that I understand the urge to build a gleaming repository for information, where the cool URIs never change; SPARQLing pinnacles, ready to broadcast the Library of Alexandria, glimmer; and the serene manifold of abstract information lies RESTful ... but I have come to understand that the web of today is an endlessly bulldozed mudscape where Someone Very Important has to have that URL top-level yesterday (never mind that they will forget about it tomorrow), of shoddy materials and wildly varying workmanship, and where nobody is listening to your eager endpoints because the commercials are just too loud. I too once labored for information architecture, to have the correct thing in the obvious place, with accurate links and current knowledge, to provide visitors with the knowledge they desired ... but PR preempted all of it to push yet more nice photographs in yet another place: the Web as a technology for distributing images that would once live on glossy pamphlets.

The vision is lovely, but we who have always lived in the castle have walked alone.



I would argue the problem is not the broken tags, but the business disadvantage of exposing semantic data.

Remember when microformats were all the rage, and you could get hReview or hRecipe or XFN data everywhere?
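
(For anyone who has forgotten what that looked like, here is a minimal sketch of hReview-style markup, with class names as in the microformats.org hReview draft; the business and review text are made up.)

    <div class="hreview">
      <span class="item"><span class="fn">Blue Bottle Coffee</span></span>
      <span class="rating">5</span> out of 5, reviewed by
      <span class="reviewer vcard"><span class="fn">Jane Doe</span></span>:
      <span class="description">Great espresso, long line.</span>
    </div>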

Then every host in turn realized that actually, it's _better_ if people can't scrape your site, and it's even better if they can't even see it and it's behind a login wall.


“Better” is too strong: in many cases, structured data is not a problem (and if it is, people will scrape it anyway), but there's simply no business case for spending time on it. Most of the semweb stack had a horrible developer experience (bad documentation, tools, validators, etc.), and you rarely saw a tangible benefit from slogging through it.

The semantic data that actually has been implemented at wide scale happened because someone could go to their boss and say “Spending time on X will mean a better Google ranking” or “Facebook will use their new sharing display for our pages”, and it was orders of magnitude simpler to implement, so the time and risk were far more palatable.
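
That is essentially what Open Graph tags and schema.org JSON-LD give you: a few lines pasted into the page head in exchange for a nicer share card or a rich search snippet. A rough sketch of the kind of markup that did get adopted (property names come from the Open Graph and schema.org vocabularies; the URLs and values here are placeholders):

    <head>
      <!-- share card for Facebook and friends (Open Graph) -->
      <meta property="og:title" content="Smoked Brisket">
      <meta property="og:type" content="article">
      <meta property="og:image" content="https://example.com/brisket.jpg">
      <meta property="og:url" content="https://example.com/recipes/brisket">

      <!-- rich search result (schema.org, embedded as JSON-LD) -->
      <script type="application/ld+json">
      {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Smoked Brisket",
        "image": "https://example.com/brisket.jpg",
        "author": { "@type": "Person", "name": "Jane Doe" }
      }
      </script>
    </head>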


Well, whether it's better depends on local incentives. But it's true that in many cases those incentives push against making machine-readable data available, so "semantic" tech becomes mostly irrelevant. Similarly, Linked Data has been most successful as Linked Open Data, where the incentives are explicitly aligned.


Indeed. Why would you expose all of your data to your competitors like Google, so they can commoditize you? (Incidentally, note that the big tech companies like the search engines are some of the major proponents of microformats, like for restaurants or local businesses... As always, 'commoditize your complement': https://www.gwern.net/Complement )


That’s a proximal cause. The root cause is that the Internet is not free, despite appearances. If hosting and bandwidth were free, we wouldn’t need businesses to do what we want. Wikipedia wouldn’t need donations. Everything would be great.


I'm working on the Semantic Web stack in the more limited setting of biomedical data. Performance is definitely a problem, but the project is currently exiting its pilot due to what were seen as satisfactory results in indexing and summarizing biomedical information, and in bridging connections between domains of results (with human assistance).

This is a different outcome from the commercial setting, where the W3C is still imagining people as users of their own computers rather than consumers of the services those computers connect to. But it also means that in certain technical domains (where, e.g., the volume of published results has scaled out to oblivion but the ontologies are regular or can be made easily negotiable) there can be real benefits for researchers.


I've read my share of SW papers: the fact that, after a year, more than half of the links in such works are dead is more telling than the papers themselves.


The reason HTML pages don't validate is pretty simple: validating doesn't provide any benefit for the publisher. If the images didn't show up, you'd better believe the publisher would have that fixed immediately.

Same for the semantic web. Show the benefit for the publisher.


Agreed. Poetically written as well!


Just want to pile on more kudos. Nicely written.



