
could someone please explain what this means?

> cluster the inconsistent labels by embedding them in a vector space



For each label it generates, he uses an embedding model to compute an embedding vector for it. (You could say that this "embeds the label in a vector space.") Then he takes that embedding and checks whether any previously generated label has a _very_ similar embedding. If you set some sort of "how close is close enough" threshold for that, you are "clustering" all the generated labels: in effect, you are saying, "These 10 labels have slightly different words, but essentially mean the same thing."
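Here is a rough sketch of that idea in Python, in case code helps. Everything in it is my own illustration, not the author's actual code: `toy_embed` is a made-up stand-in (character-trigram counts) so the example runs without a real embedding model, `cluster_labels` and the `0.6` threshold are hypothetical, and comparing each new label only against the first label of each existing cluster is a simplification.

```python
import math
from collections import Counter


def toy_embed(label: str) -> Counter:
    """Hypothetical stand-in for a real embedding model.

    Returns character-trigram counts as a sparse vector. A real setup
    would call an embedding model here and get a dense float vector.
    """
    text = f"  {label.lower().strip()} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_labels(labels, threshold=0.6, embed=toy_embed):
    """Greedy threshold clustering.

    Each label joins the first existing cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it
    starts a new cluster. The threshold is the "how close is close
    enough" knob and is an assumed value here.
    """
    clusters = []  # list of (representative_embedding, [member labels])
    for label in labels:
        vec = embed(label)
        for rep_vec, members in clusters:
            if cosine_similarity(vec, rep_vec) >= threshold:
                members.append(label)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vec, [label]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    labels = ["bug report", "Bug Report", "bug reports", "feature request"]
    for group in cluster_labels(labels):
        print(group)
```

With a real embedding model you would swap out `toy_embed` and tune the threshold by eye on your own labels. A trigram stand-in only catches spelling variants, whereas a real model can also group labels that use different words for the same meaning.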



