
At an old job, I knew some very idealistic folks who kept pushing the semantic web. "Let's do that everywhere!" As an exercise, I would have them open a browser, visit various sites, and then look at the source. "Go on, check to see if it validates," I would say with an anticipatory grin. Whether hand-crafted or generated by any number of frameworks, many sites can barely manage to close their HTML tags; asking for semantic markup on top of that is a "just won't happen in practice" thing.

I have also seen a great deal of consultant money, programmer time, sys-admin sweat, and the like focused on these toweringly-designed, completely-unused triple stores, layer upon layer of hot technologies (ever-moving, construction on the tower never ceased) fused together to create a resource-intensive monstrosity that, at the end of the day, barely got used. But hey, let's look at that jazz semantic web example one more time.

The most painful part is that I understand the urge to build a gleaming repository for information, where the cool URIs never change; SPARQLing pinnacles, ready to broadcast the Library of Alexandria, glimmer; and the serene manifold of abstract information lies RESTful ... but I have come to understand that the web of today is an endlessly bulldozed mudscape where Someone Very Important has to have that URL top-level yesterday (never mind that they will forget about it tomorrow), of shoddy materials and wildly varying workmanship, and where nobody is listening to your eager endpoints because the commercials are just too loud. I too once labored for information architecture, to have the correct thing in the obvious place, with accurate links and current knowledge, to provide visitors with the knowledge they desired ... but PR preempted all of it to push yet more nice photographs in yet another place: the Web as a technology for distributing images that would once live on glossy pamphlets.

The vision is lovely, but we who have always lived in the castle have walked alone.



I would argue the problem is not the broken tags, but the business disadvantage of exposing semantic data.

Remember when microformats were all the rage, and you could get hReview or hRecipe or XFN data everywhere?
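
(For anyone who has forgotten what that looked like, here is a minimal sketch of hReview-style markup, with class names as in the microformats.org hReview draft; the business and review text are made up.)

    <div class="hreview">
      <span class="item"><span class="fn">Blue Bottle Coffee</span></span>
      <span class="rating">5</span> out of 5, reviewed by
      <span class="reviewer vcard"><span class="fn">Jane Doe</span></span>:
      <span class="description">Great espresso, long line.</span>
    </div>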

Then every host in turn realized that actually, it's _better_ if people can't scrape your site, and it's even better if they can't even see it and it's behind a login wall.


“Better” is too strong: in many cases, structured data is not a problem (and if it is, people will scrape it anyway), but there's simply no business case for spending time on it. Most of the semweb stack had a horrible developer experience (bad documentation, tools, validators, etc.), and you rarely saw a tangible benefit from slogging through it.

The semantic data that actually has been implemented at wide scale happened because someone could go to their boss and say “Spending time on X will mean a better Google ranking” or “Facebook will use their new sharing display for our pages”, and it was orders of magnitude simpler to implement, so the time and risk were far more palatable.
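
That is essentially what Open Graph tags and schema.org JSON-LD give you: a few lines pasted into the page head in exchange for a nicer share card or a rich search snippet. A rough sketch of the kind of markup that did get adopted (property names come from the Open Graph and schema.org vocabularies; the URLs and values here are placeholders):

    <head>
      <!-- share card for Facebook and friends (Open Graph) -->
      <meta property="og:title" content="Smoked Brisket">
      <meta property="og:type" content="article">
      <meta property="og:image" content="https://example.com/brisket.jpg">
      <meta property="og:url" content="https://example.com/recipes/brisket">

      <!-- rich search result (schema.org, embedded as JSON-LD) -->
      <script type="application/ld+json">
      {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Smoked Brisket",
        "image": "https://example.com/brisket.jpg",
        "author": { "@type": "Person", "name": "Jane Doe" }
      }
      </script>
    </head>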


Well, whether it's better depends on local incentives. But it's true that in many cases those incentives push against making machine-readable data available, so "semantic" tech becomes mostly irrelevant. Similarly, Linked Data has been most successful as Linked Open Data, where the incentives are explicitly aligned.


Indeed. Why would you expose all of your data to your competitors like Google, so they can commoditize you? (Incidentally, note that the big tech companies like the search engines are some of the major proponents of microformats, like for restaurants or local businesses... As always, 'commoditize your complement': https://www.gwern.net/Complement )


That’s a proximal cause. The root cause is that the Internet is not free, despite appearances. If hosting and bandwidth were free, we wouldn’t need businesses to do what we want. Wikipedia wouldn’t need donations. Everything would be great.


I'm working on the Semantic Web stack in the more limited setting of biomedical data. Performance is definitely a problem, but the project is currently exiting its pilot due to what were seen as satisfactory results in indexing and summarizing biomedical information, and in bridging connections between domains of results (with human assistance).

This is a different outcome from the commercial setting, where the W3C is still imagining people as users of their own computers rather than consumers of the services those computers connect to. But it also means that in certain technical domains (where, e.g., the volume of published results has scaled out to oblivion but the ontologies are regular or can be made easily negotiable) there can be real benefits for researchers.


I've read my share of SW papers: the fact that, after a year, more than half of the links in such works are dead is more telling than the papers themselves.


The reason HTML pages don't validate is pretty simple: validating doesn't provide any benefit for the publisher. If the images didn't show up, you'd better believe the publisher would have that fixed immediately.

Same for the semantic web. Show the benefit for the publisher.


Agreed. Poetically written as well!


Just want to pile on more kudos. Nicely written.



