
It is a good one to fix. Thank you!

The "guesswork" done by browsers is actually pretty nuanced and not standardised in a slightest way. Some defaults are pretty common, and could be maybe considered de-facto standard, but I wouldn't want to draw the line where "most" browsers agree or should agree.

Personally, I have my browser set up to "guess" as little as possible: never do a search from the URL bar unless explicitly told to do so using a dedicated search keyword (plus I still keep a separate auto-collapsing search bar). I have disabled all guessing for TLDs and the auto-prepending of www. In short, when I enter "whatever" into my URL bar, my browser tries to load "http://whatever/", which could be my local domain and I could get an answer -- it is a valid URL after all. On a related note, I strongly doubt that any browser does a web search for "localhost".

The rabbit hole naturally goes even deeper: for example, most browsers still interpret top-level data: URIs. It was not that long ago that browsers interpreted top-level `javascript:` URIs entered into the URL bar; these now survive only in bookmarklets, having been taken from all users for the sake of a pitiful "self-XSS prevention".

So I would be really careful telling what happens -- or, god forbid, should happen -- when someone types something into their URL bar. "whatever" could be:

- a search keyword with a set meaning: it could be bound to an http URL (a bookmark), and if the bookmark URL contains a `%s` or `%S`, the browser would do the substitution;

- a `javascript:…` bookmark ("bookmarklet"/"favelet"; yes, most browsers still let you do that, yet alas, mostly fail to treat CSP in a way that would keep it operational);

- a local domain.

The keyword and bookmarklet cases are sketched below.
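
For a concrete illustration of the keyword and bookmarklet cases (the keyword "ex", the URL, and the alert text are made-up examples, not anyone's actual setup):

    // Keyword bookmark: with "ex" as the keyword for the bookmark URL
    //   https://example.com/search?q=%s
    // typing "ex whatever" in the URL bar substitutes "whatever" for %s.
    //
    // A minimal javascript: bookmarklet, runnable against the current page:
    javascript:(function () {
      alert(document.links.length + " links on this page");
    })();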

The claim that, statistically, "most" browsers will do a web search using some default engine is probably correct, but it is an oversimplification that glosses over quite a lot of interesting possibilities.


Thank you for the suggestion! Would writing something like "DOM in modern browsers" be more correct then?

> Would writing something like "DOM in modern browsers" be more correct then?

No, I don't think so. I don't know why the GP comment is at the top, beyond historical interest. If you continue with your plans mentioned elsewhere to cover things like layout, rendering, scripting, etc., then by this standard almost everything will have to have the "in modern browsers" qualifier added to it.

Part of the problem is the term "DOM" is overloaded. Fundamentally it's an API, so in that sense it only has meaning for a browser to "have a DOM" if it supports scripting that can use that API. And, in fact, all browsers that ever shipped with scripting have had some form of a DOM API (going back to the retroactively named DOM Level 0). That makes sense, because what's the point of scripting if it can't interact with page contents in some way?
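
To make "DOM Level 0" concrete, here is a rough sketch of that pre-specification style of API (assuming a hypothetical page with a form control named "q" and at least one image):

    var form = document.forms[0];         // named collections predating DOM Level 1
    form.elements["q"].value = "hello";   // access a form control by name
    document.images[0].src = "logo.png";  // the classic image-rollover trick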

So, "Lynx remains a non-DOM browser by design" is true, but only in the sense that it's not scripted at all, so of course it doesn't have DOM APIs, the same way it remains a non-canvas browser and a non-webworker browser. There's no javascript to use those things (it's a non-cssanimation browser too).

There's a looser sense of the "DOM", though, that refers to how HTML parsers turn an HTML text document into the tree structure that will then be interpreted for layout, rendering, etc.

The HTML spec[1] uses this language ("User agents must use the parsing rules described in this section to generate the DOM trees from text/html resources"), but notes that it is a specification convenience to act as if you'll end up with a DOM tree at the end of parsing, even if you don't actually use it as a DOM tree ("Implementations that do not support scripting do not have to actually create a DOM Document object, but the DOM tree in such cases is still used as the model for the rest of the specification.")

In that broader sense, all browsers, even non-modern ones (and Lynx) "have a DOM", since they're all parsing a text resource and turning it into some data structure that will be used for layout and rendering, even if it's the very simple layouts of the first browsers, or the subset of layout that browsers like Lynx support.
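
As a small sketch of that parsing-only sense of "DOM", the scripted DOMParser API can be used to show the tree that parsing produces, without ever attaching it to a page:

    // Parse text/html into a detached Document tree.
    const doc = new DOMParser().parseFromString(
      "<p>Hello <b>world</b></p>",
      "text/html"
    );
    console.log(doc.body.firstChild.textContent); // "Hello world"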

[1] https://html.spec.whatwg.org/multipage/parsing.html


I wouldn't do anything to "correct" your guide - I think it is "correct" as is. This comment is great for its informational content but I'd consider it an addendum, not an erratum.

If you like, it might be nice to include a section on historical and/or niche browsers that lack some of the elements this guide describes -- e.g. Dillo, which is a modern browser that supports HTML4 but doesn't support JavaScript. But your guide should (imho) centrally focus on the common expectation of how popular browsers work.


I will cover the rendering engine in more detail. I didn't know which sections to go deeper on, so I just stopped and published it to gather more feedback.

Thank you!


I thought about this, but I tried to keep it simple. Let me figure out how to add these blocks without over-complicating the guide.

Thank you!


Thank you! Fixing it...

I am planning to add more sections with more details, but I decided to collect some feedback first.

Thank you! It is a good suggestion. Let me think about it.


I spent a lot on servers. All monthly expenses were around $5,500.


Thank you!


It was unimaginably tough. If I were to start again, I wouldn’t do it. I would choose a much easier niche.

SEO, social media and other channels. I spent a lot of time on all of that.


What would be an easier niche in your opinion?


I managed my own cluster.

I didn’t consider wrapping any service.

What's needed for scraping is a bit different from what's needed to screenshot websites.

I need to have full control over my cluster to guarantee the best possible quality.


It is great!

I signed up on my phone and tested in the playground.

It will fit perfectly into my workflow. I'm building a hyper-local directory site.

Getting good images for businesses is hard, so I'll use this to grab an image of their site as a placeholder.

I can also add it to my AI workflow, where I pass a website to an OpenAI Assistant to extract data. OpenAI is not as robust with URLs as it is with images or PDFs. Often it won't visit the URL.

I can use this to get an image or PDF, pass it on, and ask for the data back. OpenAI is better with files than URLs in my experience.
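
A rough sketch of that flow, assuming the official openai Node SDK and a vision-capable model (the screenshot URL and prompt are placeholders, not the actual setup):

    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Placeholder: a screenshot of the business's site, captured beforehand.
    const screenshotUrl = "https://example.com/screenshots/acme-homepage.png";

    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{
        role: "user",
        content: [
          { type: "text",
            text: "Extract the business name, address and phone number from this screenshot." },
          { type: "image_url", image_url: { url: screenshotUrl } },
        ],
      }],
    });

    console.log(response.choices[0].message.content);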

Good job!

Well done!


Don't you get problems with Cloudflare blocking your browsers?

