Your mention of YouTube fast cuts is sort of the reaction I had too, but at least in those cases the clips are usually filmed around the same time by a human.
Outside of the generated glitching in the sound here, my main complaint is that sentence umpteen sounds the same as sentence one. When we speak normally, our intonation and cadence shift over time and with the subject matter. A single sentence here sounds okayish, but all the sentences in a row sound like they're generated discretely (which I assume they technically are), and all the cohesion is gone.