Kinda wish they would use some more letters when writing the sample code. This terse style in general is fine but when the purpose of the code is to try to explain how something works, it's adding an extra layer of difficulty decoding what the heck 'nns, 'cv', 'd', 'e' and 'ef' is supposed to mean.
Like this ChatGPT-looking function
nns = [best]
visit = set(best) # set of visited nodes
candid = [best] # candidate nodes to insert into nearest neighbors
# find top-k nearest neighbors
while candid:
cv = heappop(candid)
if nns[-1][0] < cv[0]:
break
# loop through all nearest neighbors to the candidate vector
for e in graph[cv[1]][1]:
d = np.linalg.norm(graph[e][0] - query)
if (d, e) not in visit:
visit.add((d, e))
# push only "better" vectors into candidate heap
if d < nns[-1][0] or len(nns) < ef:
heappush(candid, (d, e))
insort(nns, (d, e))
if len(nns) > ef:
nns.pop()
code could have been trivially cleaned up and it would have been so much easier to follow. I'm guessing what some of these variables are supposed to mean.
nearest_neighbors = [best]
visited = set(best)
candidates = [best]
# find top-k nearest neighbors
while candidates:
candidate_vector = heappop(candidates)
if nearest_neighbors[-1][0] < candidate_vector[0]:
break
# loop through all nearest neighbors to the candidate vector
for neighbor in graph[candidate_vector[1]][1]:
distance = np.linalg.norm(graph[e][0] - query)
if (distance, neighbor) in visit:
continue
visited.add((distance, neighbor))
# push only "better" vectors into candidate heap
if distance < nearest_neighbors[-1][0] or len(nearest_neighbors) < epsilon:
heappush(candidates, (distance, neighbor))
insort(nns, (distance, neighbor))
if len(nearest_neighbors) > epsilon:
nearest_neighbors.pop()
Yeah it wasn't referenced or mentioned anywhere in the article so I had to guess what it was :-/ ... which I guess supports my point.
ef being "epsilon factor" was the least nonsensical thing I could come up with since it looked analogous to how you do like abs(a-b)<epsilon in floating point comparisons.
Probably a comment above the function explaining the incoming parameters, with a link to the specs if necessary, and then comments in the text and better variable names.
I think in this scenario when the primary audience of the code isn't an interpreter, but people who don't understand HNSWs, and who will probably only glance over the code once or twice, you really can't lay on the verbosity heavily enough.
Actually that’s a common misunderstanding of the ePrivacy directive (often bundled with the GDPR(, if you’re only providing an accept dialog you might as well save the effort and not show a cookie notice at all.
You need to obtain consent before you set any marketing cookies and effective consent can’t be obtained if your only option is “yeah track me”
Like this ChatGPT-looking function
code could have been trivially cleaned up and it would have been so much easier to follow. I'm guessing what some of these variables are supposed to mean.