Nice post. As the author of the original snippet I'll admit it wasn't meant to be production-quality code, just a toy example in the context of a blog post. When I find code like that in a reasonably large production codebase I tend to refactor it too. I do believe in optimizing for readability by default; in Clojure this sometimes requires a bit of effort because of the "stalagmite" effect OP describes.
There is no loop, and it almost reads like English: "Using :start as a departure point, iteratively pick the next words until you find one that ends with a dot. Join all the words generated so far with spaces."
Small catch: take-until is not yet in clojure.core, however it's easy to add to your utility toolbelt and there's already an issue open for it http://dev.clojure.org/jira/browse/CLJ-1451
Iterate was what I was looking for! That version is definitely better, even if you have to DIY take-until.
I, er, may have spent a while clicking around the related docs for loop and recur searching for a better function. I remembered there was something like this out there, but I never did find it.
In terms of cleanly communicating one's intent, I'd also suggest a look at Sean Johnson's talk about letting the function arguments determine what version of the function is used. This is a common idea in most programming languages and some Functional languages, such as Haskell, tend to take it to an extreme. And yet, it is not as common as it should be in Clojure.
One of his rules is "If your function starts with a conditional, get rid of the conditional and look for that pattern in your function arguments." That is, move the conditional into the different versions of your function.
Also, his "Recursive Function Pattern Matching Pattern" makes recursion much easier to read. See the screenshots here and then click through to the video:
I really enjoy Adam's blog. Very plain english explanations for everything. This and Clojure for Brave and True make the language much more approachable.
For me, this refactoring doesn't solve anything of note, and it introduces risks of its own. The function make-markov-mapping returns a weird intermediate data structure (a lazy sequence of tiny maps). You don't want that getting out into your codebase, which wasn't a risk in the original. We've also further embedded the string handling, which means when it comes to refactor away from just strings, we have more work to do and more places to do it. Also, just in terms of readability which is of course slightly better, we've still got things like nested threading macros which are a big cognitive overhead.
Ultimately, we're revealing no new insight about the underlying problem and offering no new capabilities for the future.
So, worried that I was being overly-negative, I thought I'd offer something of substance. If you wanted to separate the Markov bits from the string bits, you could do it thusly:
I'm not saying this is in any way strictly better, but it leaves you with abstractions that can be reused, and text-specific stuff in one place that could be a separate namespace. This is what I'd look for in a code-review of a refactoring like this - not just neatness, but making abstractions explicit and separate, naming as many things as possible, etc etc.
I think the point of the refactoring was not to "solve" anything, and certainly not to make anything more abstract. Rather it was to make the code more readable. For me, a Clojure newbie, it succeeded admirably at that purpose.
$ txr -i markov.tl
1> (defvar map (markov-data
"Influence in society, however, is a capital which has to be economized \
\ if it is to last. Prince Vasili knew this, and having once realized that \
\ if he asked on behalf of all who begged of him, he would soon be unable \
\ to ask for himself, he became chary of using his influence. But in \
\ Princess Drubetskaya's case he felt, after her second appeal, something \
\ like qualms of conscience. She had reminded him of what was quite true; \
\ he had been indebted to her father for the first steps in his career. \
\ Moreover, he could see by her manners that she was one of those women-- \
\ mostly mothers--who, having once made up their minds, will not rest \
\ until they have gained their end, and are prepared if necessary to go on \
\ insisting day after day and hour after hour, and even to make scenes. \
\ This last consideration moved him."))
map
2> (sentence map)
"Prince Vasili knew this, and having once realized that she was quite true; he
could see by her manners that if it is to ask for himself, he felt, after hour,
and even to last."
The problem here is that termination is indicated by a word which ends in a period, and that word must be included in the word list. This is a poor form of "in-band signaling".
Kludgy sequence operators which take or produce one extra item which doesn't satisfy the inclusion predicate can be avoided by proper design of the data representation which avoids this type of framing.
TXR Lisp version fixed to use the t symbol for start, nil for end, and straight giterate while true. (I.e. while the sequence element isn't nil):
(There's actually a very good reason Haskell cares about termination: it means you can be precise about side-effects. With a classic lazy list, any side effects caused by it are determined by how many items you actually evaluate.)
I've looked at this before; I decided to implement it finally, calling it expand-right. (I use the "reduce" terminology rather than "fold", so the antonym has to be consistent).
With explicit lambdas, decrement from 5 to zero, not including it in list: