How's that different from pasting the text into the first chat message and running the vector embedding step on it on the server (which might at least bypass the chat text limit)? Does this fix the amnesia issue, where info from chats longer than the context length is forgotten because the document isn't baked directly into the weights the way it would be with fine-tuning?