Hacker News

> no one training these models feeds their own proprietary source code into publicly available models

You are presupposing that the company’s own code would somehow be a massive boon for the model, resulting in lower loss overall.

In reality, it would skew the model towards that company's "mode" of coding, which isn't what "normal" programmers expect. Programmers are most likely to expect the coding styles they learned from, and those are most likely to be found in public examples (GitHub, Stack Overflow, textbooks, Reddit, etc.).

This argument is so silly to me. Anyone who has worked at a large enterprise, whether it's Google, Amazon or Target, knows that company code is effectively guaranteed to be extremely hard to work with. This happens for organizational reasons, and really the best thing to do about it is to admit that it's happening rather than pretend it's all perfect.



The reason they don't train the model on their code is specifically because they don't want it accidentally spitting out snippets of their proprietary code, not because the code is "extremely hard to work with."

I'm amazed you called that argument silly while countering with this.


It is because of both, and I agree that your reasoning takes clear precedence. I was merely pointing out the good-faith position that "even if they wanted to, they wouldn't do it".

That definitely wasn't very clear from my comment, however.



