Writing Friendlier Clojure

diego · on Oct 5, 2015

Nice post. As the author of the original snippet I'll admit it wasn't meant to be production-quality code, just a toy example in the context of a blog post. When I find code like that in a reasonably large production codebase I tend to refactor it too. I do believe in optimizing for readability by default; in Clojure this sometimes requires a bit of effort because of the "stalagmite" effect OP describes.

mraison · on Oct 5, 2015

Nice post! What about the following (untested) version of the sentence function:

    (defn sentence [mapping]
        (->> :start
             (iterate #(pick-next-word mapping %))
             next
             (take-until #(= (last %) \.))
             (clojure.string/join " ")))

There is no loop, and it almost reads like English: "Using :start as a departure point, iteratively pick the next words until you find one that ends with a dot. Join all the words generated so far with spaces."

Small catch: take-until is not yet in clojure.core, however it's easy to add to your utility toolbelt and there's already an issue open for it http://dev.clojure.org/jira/browse/CLJ-1451

adambard · on Oct 5, 2015

Iterate was what I was looking for! That version is definitely better, even if you have to DIY take-until.

I, er, may have spent a while clicking around the related docs for loop and recur searching for a better function. I remembered there was something like this out there, but I never did find it.

mjcohen · on Oct 5, 2015

So many programming languages are like magic: you have to know the name of something before you can control it.

moomin · on Oct 6, 2015

While we're code golfing:

1) Use a #"[.]$" to detect the full stop 2) Just pass in the iteration function (partial pick-next-word mapping)

lkrubner · on Oct 5, 2015

In terms of cleanly communicating one's intent, I'd also suggest a look at Sean Johnson's talk about letting the function arguments determine what version of the function is used. This is a common idea in most programming languages and some Functional languages, such as Haskell, tend to take it to an extreme. And yet, it is not as common as it should be in Clojure.

One of his rules is "If your function starts with a conditional, get rid of the conditional and look for that pattern in your function arguments." That is, move the conditional into the different versions of your function.

Also, his "Recursive Function Pattern Matching Pattern" makes recursion much easier to read. See the screenshots here and then click through to the video:

http://www.smashcompany.com/technology/the-recursive-functio...

Wonnk13 · on Oct 5, 2015

I really enjoy Adam's blog. Very plain english explanations for everything. This and Clojure for Brave and True make the language much more approachable.

papaf · on Oct 5, 2015

I know that writing code comments isn't in fashion at the moment, but both the original and the refactored code would have benefited from comments.

thom · on Oct 5, 2015

For me, this refactoring doesn't solve anything of note, and it introduces risks of its own. The function make-markov-mapping returns a weird intermediate data structure (a lazy sequence of tiny maps). You don't want that getting out into your codebase, which wasn't a risk in the original. We've also further embedded the string handling, which means when it comes to refactor away from just strings, we have more work to do and more places to do it. Also, just in terms of readability which is of course slightly better, we've still got things like nested threading macros which are a big cognitive overhead.

Ultimately, we're revealing no new insight about the underlying problem and offering no new capabilities for the future.

thom · on Oct 5, 2015

So, worried that I was being overly-negative, I thought I'd offer something of substance. If you wanted to separate the Markov bits from the string bits, you could do it thusly:

    (defn markov-chain [coll]
      (let [adjacencies (partition 2 1 coll)
	    initial-chain {::start [(first coll)]
			   (last coll) [::end]}]
	(reduce 
	 (fn [chain [predecessor successor]]
	   (assoc chain predecessor (conj (get chain predecessor []) successor)))
	 initial-chain
	 adjacencies)))

    (defn successor [chain item]
      (rand-nth (get chain item)))

    (defn follow-chain [chain]
      (->> ::start
	   (iterate (partial successor chain))
	   (remove #{::start ::end})
	   (take-while some?)))

    (defn sentences [text]
      (clojure.string/split text #"\."))

    (defn words [sentence]
      (remove empty? (clojure.string/split sentence #"\s+")))

    (defn markov-text [text]
      (->> text
	   sentences
	   (map words)
	   (map markov-chain)
	   (apply merge-with into)))

    (defn render-sentence [words]
      (str (clojure.string/join " " words) \.))

    (defn sentence [chain]
      (render-sentence (follow-chain chain)))

I'm not saying this is in any way strictly better, but it leaves you with abstractions that can be reused, and text-specific stuff in one place that could be a separate namespace. This is what I'd look for in a code-review of a refactoring like this - not just neatness, but making abstractions explicit and separate, naming as many things as possible, etc etc.

o_nate · on Oct 5, 2015

I think the point of the refactoring was not to "solve" anything, and certainly not to make anything more abstract. Rather it was to make the code more readable. For me, a Clojure newbie, it succeeded admirably at that purpose.

kazinator · on Oct 5, 2015

TXR Lisp:

  (defun wordlist-sentence (sentence)
    [(opip
       `@1.`
       (split-str @1 #/\s+/)
       (cons :start)
       (remqual ""))
     sentence])
  
  (defun make-markov-mapping (sentence)
    (let ((wordlist (wordlist-sentence sentence)))
      (hash-from-pairs (mapcar (ret ^(,@1 (,@2))) wordlist (rest wordlist))
                       :equal-based)))
  
  (defun markov-data (text)
    [(opip
       (split-str @1 #/\./)
       (mapcar make-markov-mapping)
       (reduce-left (op hash-uni @1 @2 append) @1 (hash :equal-based)))
     text])
  
  (defun pick-next-word (map word)
    (let ((choices [map word]))
      [choices (rand (length choices))]))
  
  (defun sentence (map)
    (for
      ((words ())
       (this-word :start)
       (next-word "X"))
      ((not (eql [next-word -1] #\.))
       (cat-str words " "))
      ((set next-word (pick-next-word map this-word))
       (set words ^(,*words ,next-word))
       (set this-word next-word))))

Run:

  $ txr -i markov.tl
  1> (defvar map (markov-data
    "Influence in society, however, is a capital which has to be economized \
    \ if it is to last. Prince Vasili knew this, and having once realized that \

    \ if he asked on behalf of all who begged of him, he would soon be unable \
    \ to ask for himself, he became chary of using his influence. But in \
    \ Princess Drubetskaya's case he felt, after her second appeal, something \
    \ like qualms of conscience. She had reminded him of what was quite true; \
    \ he had been indebted to her father for the first steps in his career. \
    \ Moreover, he could see by her manners that she was one of those women-- \
    \ mostly mothers--who, having once made up their minds, will not rest \
    \ until they have gained their end, and are prepared if necessary to go on \

    \ insisting day after day and hour after hour, and even to make scenes. \
    \ This last consideration moved him."))
  map
  2> (sentence map)
  "Prince Vasili knew this, and having once realized that she was quite true; he
 could see by her manners that if it is to ask for himself, he felt, after hour,
 and even to last."

kazinator · on Oct 5, 2015

sentence function based on giterate instead of for loop.

  (defun sentence (map)
    (let* ((words (cdr [giterate true (op pick-next-word map) :start]))
           (dot (pos-if (op eq [@1 -1] #\.) words)))
      (cat-str [words 0..(succ dot)] " ")))

Requires small weak to pick-next-word

  (defun pick-next-word (map word)
    (let ((choices [map word]))
      (if choices [choices (rand (length choices))])))

giterate returns an infinite list, but we take from it up to the word which ends with dot, ignoring the endless stream of nil objects which follows.

Alternative: partition the list on the ending word, then glue the first two partitions:

  (defun sentence (map)
    (let* ((words (cdr [giterate true (op pick-next-word map) :start]))
           (part (partition-by (op eq [@1 -1] #\.) words)))
      (cat-str (append (first part) (second part)) " ")))

moomin · on Oct 5, 2015

The implementation of "sentence" highlights that Clojure really needs an "interate" function with built in termination. Because no-one uses iterate...

kazinator · on Oct 5, 2015

I cannot quite agree.

The problem here is that termination is indicated by a word which ends in a period, and that word must be included in the word list. This is a poor form of "in-band signaling".

Kludgy sequence operators which take or produce one extra item which doesn't satisfy the inclusion predicate can be avoided by proper design of the data representation which avoids this type of framing.

TXR Lisp version fixed to use the t symbol for start, nil for end, and straight giterate while true. (I.e. while the sequence element isn't nil):

  (defun wordlist-sentence (sentence)
    [(opip
       `@1.`
       (split-str @1 #/\s+/)
       ^(t ,*@1 nil)          ;; list starts with t, ends with nil
       (remqual ""))          ;; blanks not permitted
     sentence])

  (defun make-markov-mapping (sentence)
    (let ((wordlist (wordlist-sentence sentence)))
      (hash-from-pairs (mapcar (ret ^(,@1 (,@2))) wordlist (rest wordlist))
                       :equal-based)))

  (defun markov-data (text)
    [(opip
       (split-str @1 #/\./)
       (mapcar make-markov-mapping)
       (reduce-left (op hash-uni @1 @2 append) @1 (hash :equal-based)))
     text])

  (defun pick-next-word (map word)
    (let ((choices [map word]))
        [choices (rand (length choices))]))

  (defun sentence (map)
    (let ((words (cdr [giterate true (op pick-next-word map) t])))
      (cat-str words " ")))

In this representation, the words which end with a period are still there, but they have a next-word entry in the map: they all point to nil.

So we have a nice "spindle":

                  t
                  |
      +---------+-------+----+
      |         |       |    |
    various   start   ..... words 
      |         |       |    |
                 .
                 :
                 .
                 :
        |         |       |
     sentence.  end.   words.
        |         |      |
        +---------+------+
                  |
                 nil

moomin · on Oct 6, 2015

It's not ideal, but there's plenty of stuff that looks like this in reality.

Personally, I favour Haskell's unfoldr http://hackage.haskell.org/package/base-4.8.1.0/docs/Data-Li... which both doesn't require the output to be the same as the intermediate values and cleanly supports termination.

(There's actually a very good reason Haskell cares about termination: it means you can be precise about side-effects. With a classic lazy list, any side effects caused by it are determined by how many items you actually evaluate.)

kazinator · on Oct 6, 2015

I've looked at this before; I decided to implement it finally, calling it expand-right. (I use the "reduce" terminology rather than "fold", so the antonym has to be consistent).

With explicit lambdas, decrement from 5 to zero, not including it in list:

  1> (expand-right (lambda (item) (if (zerop item) nil (cons item (pred item)))) 5)
  (5 4 3 2 1)

Using functional combinators:

  2> [expand-right [iff zerop false [callf cons identity pred]] 5]
  (5 4 3 2 1)

Include zero; bit of a mouthful:

  1> [expand-right [iff null nilf [callf cons identity [iff zerop nilf pred]]] 5]
  (5 4 3 2 1 0)

Though it can do it, it doesn't look like a great interface when inclusion of the last element is required.

Comparison with the inclusive iterate (which I did implement and called "ginterate"--thanks for the nice name:

  1> [ginterate [chain zerop not] pred 5]
  (5 4 3 2 1 0)