Hacker News | freediver's comments

So sad to hear this. Heartbroken.

Welcoming a PR to fix it.

What’s the review process for all this? It looks to only be you based on git and your HN bio. Can you get some trusted community lieutenants who can police the lists and create a file with their names? And then anyone can raise a PR to remove or add someone to that list of lieutenants and enough approvals from community can get the PR through. I’m not super familiar with GitHub settings but sure there is a way to set up some sort of democratic process for this.

Love the site btw, but I fear if you don’t get some help you’re going to trust all PRs and let them fly and trust people will issue removal PRs. For instance, my girlfriend loves the concept as well and was immediately hooked but she doesn’t know GitHub, so there’s no practical mechanism for her to remove bad content. And editing files on mobile GitHub is a pain, I probably wouldn’t even consider doing it, especially with a text file that big. So there’s going to need to be some sort of loyal community policing or voting system, because at end of day all these sites could be flipped to bad content as soon as they make it past review.

It is a 'best effort' effort. GitHub friction is there on purpose, as it is a one man effort.

Click the Topics button.

How did I miss topics button!? It makes the whole experience absolutely different.

Next post is random. Show similar uses semantic search to find most similar indexed post. You also get the same effect if you 'like' a post.

It would've been ironic if we had infinite resources, but this entire project is built and maintained by one person - me. If this is the only thing you found wrong with it - I'll pat myself on the back :)

ps. If you don't want to see this blog again in Small Web, just submit a PR to remove it - index of sites is a crowd-sourced effort. Cheers


Kagi Small Web has about 32K sites and I'd like to think that we have captured most of (English-speaking) personal blogs out there (we are adding about 10 per day and a significant effort went into discovering/finding them).

It is kind of sad that the entire size of this small web is only 30k sites these days.


Suspect there's a long tail/iceberg you still haven't captured (source: you haven't found me yet and I'm not hiding, I'm just not chasing SEO).

Same - but mine are also primarily so I can hand out links to specific articles - they're not hidden but they're not advertised either (and they're static sites with almost zero logging, so I wouldn't really notice either except that this site has a published list :-)

I am happy to hear this.

Hi, I took a quick look around the niche I'm interested in, and there's a lot of local history blogs you're missing. One of the bigger examples: https://threadinburgh.scot/

On reflection, maybe you've captured the bulk of the "Small Web Movement" (the technology-leaning bit of the blogosphere that is self-consciously part of a reactionary movement against the corporate web) but you haven't captured the bulk of the still-active blogosphere?

So I've got a question: What's the mission statement for kagisearch/smallweb - a curated list of Small Web sites, or a curated list of active blogosphere sites?

Because the current strategy for adding sites seems heavily biased towards the small web movement to me.


> I'd like to think that we have captured most of (English-speaking) personal blogs

I think that's naive.

But maybe that's just because my blog wasn't on the list :)


Neither was either of mine, but I don't advertise them and specifically don't post them on social media

Neither is mine, but that's fine with me.

That is about to change :)

Hell yeah !

What methods are you using to find them? I notice my own doesn't appear, although it does show up well under some (very niche) Google search terms. I suspect there's the potential for an order of magnitude more sites than have been found.

Checking HN every day to see if something interesting surfaces :)

I noticed that Kagi Small Web tends to lean towards more tech-focused blogs. So it feels more like you've captured that subset of the small web, especially if your main source is Hacker News.

Not sure if you've used this as a source too but there's a lot of tiny personal sites in this directory too. https://melonland.net/surf-club


Does this use frames or iframe? https://kagi.com/smallweb

I would expect a raw link in the top bar to the page shown, to be able to bookmark it etc.


There is a '↗'-shaped icon in the navigation bar at the top. If you click on that it takes you to the original post in a new tab. On Firefox and Safari, you can also right click that icon and add the original post to the bookmarks.

Not visible on iPhone XS/13 mini.

FYI frames don't exist any more. They're not supported by browsers.


Does this concept of "personal blog" include people periodically sharing, say, random knowledge on technical topics? Or is it specifically people writing about their day-to-day lives?

How would I check if my site is included?


You can check: <https://github.com/kagisearch/smallweb/blob/main/smallweb.tx...>. I can see that your RSS URL is listed there.

But it currently does not appear in the search results here: <https://kagi.com/smallweb/?search=zahlman>. The reason appears to be this:

"If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site: • Blog has recent posts (<7 days old) [...]"

(Source: https://github.com/kagisearch/smallweb#criteria-for-posts-to...)
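For anyone who would rather script the first check than search the file by eye, here is a minimal sketch. It assumes the list is plain text with one feed URL per line, which may not hold exactly; the function name `listed_feeds_for` and the sample entries are made up for illustration, and the list text is passed in as a string rather than fetched.

```python
from urllib.parse import urlparse


def listed_feeds_for(list_text: str, domain: str) -> list[str]:
    """Return the feed URLs in list_text whose host is domain or a subdomain of it.

    Assumption: list_text has one feed URL per line (hypothetical format).
    """
    domain = domain.lower().removeprefix("www.")
    matches = []
    for line in list_text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and anything that looks like a comment
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(url)
    return matches


# Made-up sample entries, not taken from the real list.
sample = """https://example.com/feed.xml
https://blog.example.org/rss
"""

print(listed_feeds_for(sample, "example.org"))
```

Being on the list is only the first half: as the quoted README text says, a listed blog still needs a recent post before it shows up on the site.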


Why would you only include blogs in your small web index? That must be a minute fraction of what is out there?

I can't think of a single blog that I read these days (small or not), yet there are loads of small "old school" sites out there that are still going strong.


> Why would you only include blogs in your small web index?

I am not associated with this project, so this would be a question for the project maintainer. As far as I understand, the project relies on RSS/Atom feeds to fetch new posts and display them in the search results. I believe, this is an easier problem to solve than using a full blown web crawler.

However, as far as I know, Kagi does have its own full blown crawler, so I am not entirely sure why they could not use it to present the Small Web search results. Perhaps they rely on date metadata in RSS feeds to determine whether a post was published within the last seven days? But having worked on an open source web crawler myself, many years ago, I know that this is something a web crawler can determine too if it is crawling frequently enough.
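To illustrate the guess about date metadata, here is a rough sketch of how a seven-day filter could be driven by RSS `pubDate` values alone. This is not Kagi's code; the function name `recent_items`, the sample feed, and the fixed `now` are all assumptions for the example, and Atom feeds (which use `updated`/`published` instead) are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_items(rss_xml: str, now: datetime,
                 max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return titles of RSS 2.0 items whose pubDate is within max_age of now."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # no date metadata, so recency cannot be judged from the feed
        published = parsedate_to_datetime(pub)  # parses RFC 2822 style dates
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        if timedelta(0) <= now - published <= max_age:
            titles.append(item.findtext("title", default=""))
    return titles


# Made-up feed with one fresh and one stale post.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Fresh</title><pubDate>Mon, 06 Jan 2025 10:00:00 +0000</pubDate></item>
<item><title>Stale</title><pubDate>Sun, 01 Dec 2024 10:00:00 +0000</pubDate></item>
</channel></rss>"""

now = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(recent_items(SAMPLE, now))
```

The point is only that a feed hands you the publication date directly, whereas a crawler has to infer it from when it first saw the page.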

So yes, I think you have got a good point and only the project maintainer can provide a definitive answer.


I think it includes anything that's in the form of a chronological list of posts and noncommercial.

If you made a website instead of a blog, well... you're excluded. It's the small blogosphere, not the small web


It’s not 30k, it’s well over a million: https://screenshots.nry.me/

I'm noticing sites that break the rules. I report (flag) them, is that useful or should I just PR to remove them?

PR is better!

It doesn't have mine, because no rss

I mainly use Kagi Small Web as a starting point of my day, with my morning coffee. Especially now when categories are added, always find something worth reading. The size here does not present a problem as I would usually browse 20-30 sites this way.

Right, but that basically works as a retro alternative to scrolling through social media. If you're looking for something specific, it's simultaneously true that there's a small web page that answers your question and that it's not on any "small web" list because the owner of the webpage never submitted it there, or didn't meet the criteria for inclusion.

For example, I have several non-commercial, personal websites that I think anyone would agree are "small web", but each of them fails the Kagi inclusion criteria for a different reason. One is not a blog, another is a blog but with the wrong cadence of posts, etc.


Feel free to suggest changes to criteria for inclusion. It is mostly the way it is now as the entire project is maintained by one person - me :)

It might sound stupid, but I'm not a git or github user, I would rather fill in a webform to submit a new website and feed.

The (artificial) barrier to entry is there for a reason - one person maintains the entire project, and because it is fairly technical to submit, the acceptance rate has been close to 99%.

I guessed that might be the reason, smart move. Have you tried a webform in the past which resulted in a lot of crappy submissions?

Looking at the criteria again, I can think of at least three things that arbitrarily exclude large swathes of the small web:

1) The requirement that it needs to be a blog. There's plenty of small-web sites of people who obsess over really wonderful and wacky stuff (e.g., https://www.fleacircus.co.uk/History.htm) but don't qualify here.

2) The requirement that it needs to be updated regularly. Same as above - I get that infrequently updated websites don't generate a "daily morning" feed, but admitting them wouldn't harm in any way.

3) Blanket ban on Substack-like platforms while allowing Blogspot, WordPress.com, YouTube, etc. Bloggers follow trends, so you're effectively excluding a significant proportion of personal blogs created in the last six years, including the stuff that isn't monetized or behind interstitials. The outcomes are pretty weird: for example, noahpinionblog.blogspot.com is on your list, but noahpinion.blog is apparently no longer small web.


1) It has to have a feed (we don't want to overcrawl) so hence 'blog' - more accurately any site with an RSS/Atom feed would do

2) 'Regularly' means posted in the last 2 years to be included

3) Substack has an annoying subscribe popup and ads/popups are against the spirit of what this represents


To clarify, the criterion is less than 2 years since the last blog post.

You may want to clarify that on https://github.com/kagisearch/smallweb because the README there says:

> Blog has recent posts (<7 days old)

This may be different than inclusion criteria for websites in general, but on first read it looks like it has to be very active.

I might have missed something while skimming it, but would assume others would miss it as well.


There's two criteria, I agree it's hard to skim:

* The blog must have a recent post, no older than 12 months, to meet the recency criteria for inclusion.

* Criteria for posts to show on the website: Blog has recent posts (<7 days old), The website can appear in an iframe

The latter criterion is for the website / post to appear in Kagi's random Small Web feature, where they display the blog post in an iframe. (So I think only posts from the last week are displayed there.) Being on the list should ensure that any new posts could be displayed in Small Web though, and presumably that the website is indexed in Kagi's Teclis index as well. At least, I really hope that the Teclis index is including all of those old blog posts too, and not discarding them.
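Since the two thresholds are easy to mix up, here is a small sketch of how I read them. The constants follow the README wording quoted above (12 months, 7 days); elsewhere in this thread a 2-year figure is given for inclusion, so treat `INCLUSION_MAX_AGE` as an adjustable assumption. The function name and return strings are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

INCLUSION_MAX_AGE = timedelta(days=365)  # assumption: "no older than 12 months"
DISPLAY_MAX_AGE = timedelta(days=7)      # "<7 days old" for posts shown on the site


def classify(latest_post: datetime, now: datetime, iframe_ok: bool) -> str:
    """Apply the list-inclusion check first, then the stricter display check."""
    age = now - latest_post
    if age > INCLUSION_MAX_AGE:
        return "not eligible for the list"
    if age < DISPLAY_MAX_AGE and iframe_ok:
        return "listed and shown"
    return "listed, not currently shown"


now = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(classify(now - timedelta(days=3), now, iframe_ok=True))
print(classify(now - timedelta(days=90), now, iframe_ok=True))
print(classify(now - timedelta(days=500), now, iframe_ok=True))
```

So a blog that posts every few months stays on the list the whole time, but only surfaces on the site during the week after each post.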

EDIT: I just realized freediver actually is Vladimir - I'd love to know if Teclis does index all those older blog posts too. I assume it does index everything that is still present in the RSS feeds?


Thank you. I swear I read that three times and missed the other criteria until you pointed it out and then I found it. :/

As much as I use AI in daily workflows, I do not think an AI-first society will ever be a thing.

Historically there is no evidence of that happening with tech revolutions - or rather perhaps you could say to some extent - you can not say that we are an internet-first society, or cars-first society or mobile phone - first society despite these being profound technological revolutions.

And more importantly, the only science fiction movies that talk about "AI first societies" tend to be dystopian in nature (eg Terminator). And humans eventually always do better than that.

As much as the world in Star Trek is advanced for example, with all the fancy AI there is, it is still a human-first society. Only 10% of any Star Trek is about AI and fancy technologies, 90% is still human drama.


"Historically there is no evidence of that happening with tech revolutions - or rather perhaps you could say to some extent - you can not say that we are an internet-first society, or cars-first society or mobile phone - first society despite these being profound technological revolutions."

I'm... not actually sure I agree. The US *has* become a more cars first society. Our cities are designed around cars: parking space requirements for business, a lack of biking infrastructure in favor of more lanes, even the introduction of jaywalking as a crime. We've become much more of an internet first society too: we don't use books for research, our banking is largely done online, even human social circles have moved much more online (probably to the detriment of society).

None of those technologies are as powerful/disruptive as where it seems that AI and LLMs are headed, so it's possible that society's shift towards "AI-first" will be more profound than it was for any of the other technologies listed.


People could not imagine how the PC was going to be a dominant computing paradigm until it was. I think I would argue in the direction that "this seems less likely". But I have been in this game almost 30 years. Anything goes. Also America looks "car first" empirically speaking from where I sit. The thing I am asking is if AI alters the collective human survival loop enough. Cars absolutely did. If people collectively can use AI to create a survival benefit they will. If enough people do this it starts looking more and more like an essential thing and not separable from the society's survival. So maybe instead of the framing of "x-first" it is more like "x-dependent" perhaps? And what is a survival benefit? Just ask your brain why we go to work every week :)



FWIW, I'm also a paid Kagi user and would like very much if I could use it with SearXNG or potentially have it include my own self-hosted services as part of my personal search results.

