Unless I'm missing something, this uses a simple synchronous for loop:
for text in texts:
    key = (text, model)
    if key not in pickle_cache:
        pickle_cache[key] = openai_client.create_embedding(text, model=model)
    embeddings.append(pickle_cache[key])
operations.save_pickle_cache(pickle_cache, pickle_path)
return embeddings
At the throughput I was seeing, about one embedding per second, a million comments would take over a week to process!
I had to call the Gemini model with ten comments at a time from eight threads to reach even the paltry 3K rpm rate limit they offer to "Tier 1" customers.
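For illustration, the ten-at-a-time, eight-thread pattern can be sketched like this; `embed_batch` is a hypothetical stand-in for whatever batched embedding call your client actually offers (here it just returns fake vectors):

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_batch(batch):
    # Hypothetical stand-in: send up to len(batch) texts in one API request
    # and return one vector per text. Fake one-element vectors for illustration.
    return [[float(len(text))] for text in batch]

def embed_all(texts, batch_size=10, workers=8):
    """Embed texts in batches across worker threads, preserving input order."""
    batches = list(chunk(texts, batch_size))
    embeddings = []
    # Eight threads, ten texts per request, as described above.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(embed_batch, batches):
            embeddings.extend(result)
    return embeddings
```

`pool.map` yields results in submission order, so the returned embeddings still line up with the input texts even though requests complete out of order.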
Based on this experience, for real "enterprise" customers I might implement a generic wrapper for Google's Batch API that could handle continuous streaming from a database, chunking it, uploading, and then in parallel checking the status of the pending jobs and streaming the results back into a database.
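A rough sketch of that wrapper shape, with a streaming producer and a polling consumer running in parallel. Every name here is hypothetical: `upload`, `poll`, and `store` are stubs standing in for real Batch API calls and database reads/writes.

```python
import queue
import threading

def batch_pipeline(rows, upload, poll, store, chunk_size=100):
    """Hypothetical sketch: chunk streamed `rows`, upload each chunk as a
    batch job, and poll pending jobs on a separate thread, storing results."""
    pending = queue.Queue()

    def poller():
        while True:
            job = pending.get()
            if job is None:          # sentinel: no more jobs coming
                return
            while not poll(job):     # wait until the batch job finishes
                pass                 # a real version would sleep between polls
            store(job)               # stream results back into the database

    t = threading.Thread(target=poller)
    t.start()
    buf = []
    for row in rows:                 # stream rows in, chunking as we go
        buf.append(row)
        if len(buf) == chunk_size:
            pending.put(upload(buf))
            buf = []
    if buf:                          # flush the final partial chunk
        pending.put(upload(buf))
    pending.put(None)
    t.join()
```

A production version would poll many jobs concurrently and back off between status checks, but the queue-plus-sentinel shape stays the same.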
Hey, I don't know if this helps, but I developed something similar to the wrapper you're describing as an open-source Python library.
Just plug any async function into the provided async context manager and you get Batch APIs in two lines of code with whatever framework you're already using: https://github.com/vienneraphael/batchling
Let me know if you have any questions; looking forward to your feedback!
Looks very nice! This is exactly what I was thinking of doing, except that I work mostly with C# in enterprise settings.
Looking at your approach, the equivalent in .NET land would be if the Microsoft.AI.Extensions package added some sort of batch abstraction side-by-side (or on top of) their existing IChatClient or IEmbeddingGenerator interfaces.
Re-reading your comment :)
Yes, my demo has just a simple loop when loading the embeddings.
I was replying more to the latency you mentioned. Because DuckDB runs on-device, you save the additional network round-trip time when comparing similarities.
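Concretely, the on-device comparison is just local arithmetic over vectors you already have; here is a minimal cosine-similarity sketch in plain Python (DuckDB would do the equivalent in SQL, with no network round trip either way):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```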
This is still missing the "what" for me. What do you write down about the work?
Is it a plan for what you're about to work on? Is it a breakdown? Is it facts you learn as you work through something? Is it a minute-by-minute journal of what you've done? Is it just interesting details? Is it to-dos? Is it opinions you're trying to clarify?
Diagrams I get, my desk is covered in scribbled diagrams to help me visualise something or communicate it to a colleague.
For me, if it's worth thinking about, it's worth writing down. It doesn't matter if it's a to-do list I just came up with, a system diagram, whatever I'm currently working on, or thoughts on a human interaction I just witnessed. The act of writing it down guides my thinking.
- Notes about what I did, every so often. Or what I talked to someone about, what was decided.
- If I'm programming, I try to have a kind of plan for the next fifteen minutes / hour in a few sentences. "Going to refactor this now." "Updating the state here so it can hold this information." "Adding a component for this". Just so that I do think about what I'm going to do for a bit.
That sort of thing.
Apart from the to-dos, the main point is to keep my focus: when I'm writing thoughts on paper, I'm not on Hacker News. What the writing actually is doesn't matter all that much to me.
I really only find it useful when I'm investigating or troubleshooting some system I'm not familiar with.
A stupid yet accurate analogy: I turn up the log level for my brain, lol.
It's basically just a log file of everything I did and the result so I can pick it back up later, plus I include timestamps which helps me realize when I'm spinning my wheels for too long.
For building stuff, scribbling diagrams and flows is more useful if I need to work out something complex.
Every time you look something up on Stack Overflow, refer to the API docs, or refer back to the ticket, use case, or requirements document, make a note of your question and the answer. Even when you stop typing to take a break for a moment, or after pushing code while you wait for the CI/CD pipeline, note down where you are and your last action or change.
Every time you start to write a TODO comment, make a note instead (or in addition).
Consider Kent Beck's recommendation to write down every decision you make.
Frankly, at the beginning? Anything you feel like. You can start, perhaps, with just a title of what you're doing, Pomodoro-style.
Maybe a note of something you thought but couldn't follow up on that moment.
Diagrams are good. It's much easier to think, and much faster, doing them by hand. I always get distracted by the tool when I'm drawing on a computer, even in artist mode.
I also make bullet points of general ideas that I'm trying to accomplish.
Doodles.
Important thing is, don't fret. Over time you'll find how it works for you.
> Version 2.1.32:
• Claude Opus 4.6 is now available!
• Added research preview agent teams feature for multi-agent collaboration (token-intensive feature, requires setting CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1)
• Claude now automatically records and recalls memories as it works
• Added "Summarize from here" to the message selector, allowing partial conversation summarization.
• Skills defined in .claude/skills/ within additional directories (--add-dir) are now loaded automatically.
• Fixed @ file completion showing incorrect relative paths when running from a subdirectory
• Updated --resume to re-use --agent value specified in previous conversation by default.
• Fixed: Bash tool no longer throws "Bad substitution" errors when heredocs contain JavaScript template literals like ${index + 1}, which previously interrupted tool execution
• Skill character budget now scales with context window (2% of context), so users with larger context windows can see more skill descriptions without truncation
• Fixed Thai/Lao spacing vowels (สระ า, ำ) not rendering correctly in the input field
• VSCode: Fixed slash commands incorrectly being executed when pressing Enter with preceding text in the input field
• VSCode: Added spinner when loading past conversations list
If it works anything like the memories on Copilot (which have been around for quite a while), you need to be pretty explicit about it being a permanent preference for it to be stored as a memory. For example, "Don't use emoji in your response" would only be relevant for the current chat session, whereas this is more sticky: "I never want to see emojis from you, you sub-par excuse for a roided-out spreadsheet"
90-98% of the time I want the LLM to only have the knowledge I gave it in the prompt. I'm actually kind of scared that I'll wake up one day and the web interface for ChatGPT/Opus/Gemini will pull information from my prior chats.
I've had Claude reference prior conversations when I'm trying to get technical help on thing A, and it will ask me if the conversation is because of thing B that we talked about in the immediate past.
All of these providers support this feature. I don't know about ChatGPT, but the rest are opt-in. I imagine with Gemini it'll be on by default soon enough, since it's consumer-focused. Claude does constantly nag me to enable it, though.
Had ChatGPT reference three prior chats a few days ago. So if you're looking for a total reset of context, you'd probably need to do a small bit of work.
Claude told me it can disable it by putting instructions in the MEMORY.md file not to use it. So only a soft disable, AFAIK, and you'd need to do it on each machine.
I ran into this yesterday and disabled it by changing permissions on the project’s memory directory. Claude was unable to advise me on how to disable. You could probably write a global hook for this. Gross though.
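The permissions workaround described above amounts to stripping write access from the memory directory. A minimal sketch in Python (the path is illustrative; substitute your project's actual memory directory):

```python
import os
import stat

def make_read_only(path):
    """Recursively remove write permission so memory files can't be updated."""
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            p = os.path.join(root, name)
            os.chmod(p, os.stat(p).st_mode & no_write)
    os.chmod(path, os.stat(path).st_mode & no_write)

# e.g. make_read_only(os.path.expanduser("~/.claude/projects/<project>/memory"))
```

This is the same effect as `chmod -R a-w` on the directory; a hook-based approach would be less brittle but, as noted, gross in its own way.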
I understand everyone's trying to solve this problem but I'm envisioning 1 year down the line when your memory is full of stuff that shouldn't be in there.
I looked into it a bit. It stores memories near where it stores the JSONL session history. It's per-project (and specific to the machine). Claude writes to it pretty aggressively and frequently. It uses MEMORY.md as a sort of index, and writes out other files for other topics, linking to them from the main MEMORY.md file.
It gives you a convenient way to say "remember this bug for me; we should fix it tomorrow." I'll be playing around with it more for sure.
I asked Claude to give me a TLDR (condensed from its system prompt):
----
- Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations
- MEMORY.md is always injected into the system prompt; truncated after 200 lines, so keep it concise
- Separate topic files for detailed notes, linked from MEMORY.md
- What to record: problem constraints, strategies that worked/failed, lessons learned
- Proactive: when I hit a common mistake, check memory first; if nothing is there, write it down
- Maintenance: update or remove memories that are wrong or outdated
- Organization: by topic, not chronologically
- Tools: use Write/Edit to update (so you always see the tool calls)
> Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations
I create a git worktree, start Claude Code in that tree, and delete it after. I notice each worktree gets its own memory directory in this location. So is memory fragmented, rather than combined for the "main" repo?
Yes, I noticed the same thing, and Claude told me that it's going to be deleted.
I'll have it improve the skill that's part of our worktree cleanup process to consolidate that memory into the main memory if there's anything useful.
Wild how a single poor word choice can derail so many of HN's comments into the kind of nit-picking we're seeing here. Thanks for correcting! We'll probably have much better comments here because of it.
A lot of apps people use these days are cloud-first and automatically save all the time, so there's not even a save button to have a floppy icon for! The icon to say that it's synced looks like a cloud, and if you're using a web browser it'll probably have a Download button with a download icon. No floppy disks in sight.
I wouldn't be surprised if there are computer users out there who wouldn't recognise the "save icon".
I disagree. Not everything is "autosave to the cloud", and some apps still have an explicit save button or option.
I recently had a discussion about replacing the "save icon" (i.e. the old floppy disk icon) with an arrow-pointing-down icon, for a button that saves (not downloads!) a user's custom query in the system. Perhaps it could be replaced with another icon, but not one that everyone would read as "download".
Huh, that makes sense given "all dressed" came from French and New Orleans' French history.
I'm not sure why we both ended up with "dressed", given the French is literally "all garnishes/toppings" or "wholly garnished/topped". I'm sure some linguist could do a dissertation on this. And hopefully also cover how Saskatchewan ended up using "all dressed", because I'm really curious about that outlier.
> 1. To decorate with ornaments; to adorn; to embellish.
Bonus: "garnish" is etymologically related to "warn". There are many other such pairs in English, e.g. "guarantee"/"warranty" and "guard"/"ward". (As I understand it, the Gauls could pronounce the "g", but the Franks couldn't.)
https://github.com/patricktrainer/duckdb-embedding-search