I can easily believe that you can predict the ratings of Amazon reviews quite well with a simple keyword-counting algorithm. Just by counting words like "good", "enjoyed", "fantastic" vs. "terrible", "boring", "awful" you're gonna get a very strong correlation in that limited domain.
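For concreteness, here is a minimal sketch of the kind of keyword counter meant here. The word lists are just the six examples above, and the `keyword_score` name and the "positives minus negatives" scoring are assumptions made for illustration, not anyone's actual product.

```python
import re

# Illustrative word lists only: the six examples from the comment above.
POSITIVE = {"good", "enjoyed", "fantastic"}
NEGATIVE = {"terrible", "boring", "awful"}


def keyword_score(text):
    """Return (# positive keywords) - (# negative keywords) found in text."""
    # Lowercase and split into simple word tokens; no stemming, no negation handling.
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


if __name__ == "__main__":
    # Hypothetical reviews, made up for this example.
    for review in [
        "A fantastic book, I enjoyed it",
        "Terrible and boring",
        "It arrived on Tuesday",
    ]:
        print(keyword_score(review), review)
```

A score like this can then be compared against the star rating of each review, which is exactly the limited domain where it should do well.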
But what tests have you done on your broader methodology? What experiments can you really do to figure out the extent to which the use of the word "nosegay" is correlated with actual depression?
Also, as someone else said, where are the error bars? If there really is a correlation between word choice and other metrics, then some simple statistics should give you error bars on your other metrics, right?
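As a rough sketch of the "simple statistics" in question: given a sample correlation `r` measured on `n` paired observations, the standard Fisher z-transformation gives an approximate confidence interval. The function names here are made up for illustration, and the interval assumes independent, roughly bivariate-normal data, which may well not hold for word counts.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation via Fisher's z.

    Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)                  # Fisher transform of r
    std_err = 1.0 / math.sqrt(n - 3)   # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - crit * std_err), math.tanh(z + crit * std_err)


if __name__ == "__main__":
    # Hypothetical numbers: r = 0.5 observed on 103 samples.
    low, high = correlation_interval(0.5, 103)
    print(f"r = 0.50, 95% CI roughly [{low:.2f}, {high:.2f}]")
```

The width of that interval is the error bar: with few samples it gets wide enough that a claimed link between a rare word and a mood metric says very little.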
Oh, one more thing: in the example on your website you say that the sentence:
"That joke kills me!"
is "subconsciously" aggressive. My question: would your algorithm rate that at exactly the same level of aggression as the sentence:
"I'm gonna kill you!"?
cuz, y'know, intuitively one seems rather more aggressive than the other.
It seems that the point is to introduce their special-sauce black box, with an appeal to authority about its methodology. I think the correlations you ask for are where the problems will lie, in that there is a hidden value judgement. If I can go out on a limb here, I'd say that the measurement is going to be fundamentally unscientific.
> Also, as someone else said, where are the error bars? If there really is a correlation between word choice and other metrics, then some simple statistics should give you error bars on your other metrics, right?
Please see my response to the original question about this. [1]
"That joke kills me!" vs. "I'm going to kill you!" (corrected grammar as we do not usually focus on slang)
They do not score exactly the same, though they are both very high on anxiety, hostility, and depression. The former also has high marks for happiness and compassion because it uses the word "joke".