
Essential string processing functions in Common Lisp

September 10, 2012

Standard Common Lisp is somewhat lacking in several important areas of string processing, and it sometimes shows. Today I needed to heavily process a large body of regular text, so I will write down some functions that are, AFAIK, considered "standard" in modern languages but are not so easily accessible or particularly intuitive in CL.

In all the following code snippets, the symbol input stands for the input string.

  1. Trimming spaces, tabs and newlines from a string

     (string-trim '(#\Space #\Newline #\Return #\Linefeed #\Tab) input)
    

    All named characters are listed in the HyperSpec, section 13.1.7 Character Names.
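
    A minimal sketch of how the trimming functions behave, using only standard functions. The names *whitespace* and trim-whitespace are my own helpers for illustration, and the set of characters treated as whitespace is an assumption you may want to extend:

    ```lisp
    ;; Characters treated as whitespace in this sketch (an assumption; extend as needed).
    (defparameter *whitespace* '(#\Space #\Tab #\Newline #\Return))

    (defun trim-whitespace (input)
      "Return INPUT without leading and trailing whitespace characters."
      (string-trim *whitespace* input))

    (trim-whitespace (format nil "  hello, world~%"))  ; => "hello, world"
    (string-left-trim *whitespace* "  padded  ")       ; => "padded  "
    (string-right-trim *whitespace* "  padded  ")      ; => "  padded"
    (trim-whitespace "a  b")                           ; => "a  b" (inner spaces are kept)
    ```

    Note that string-trim only touches the two ends of the string; whitespace in the middle has to be handled by other means, for example a regex replacement.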

  2. Replacing with regular expressions

    Provided by the CL-PPCRE package.

    In the next snippet I remove all tokens enclosed in square brackets from the input string:

     (ql:quickload :cl-ppcre)
     (cl-ppcre:regex-replace-all "\\[[^]]+\\]" input "")
    

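    Honestly, I don't know when you would need plain regex-replace, which replaces only the first match, rather than regex-replace-all. Also, note the double escaping of special characters (\\[ instead of \[): the regex is written as a Lisp string literal, where the backslash itself has to be escaped.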
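
    To make both points concrete, here is a small sketch. It assumes Quicklisp is installed so that CL-PPCRE can be loaded; the variable *sample* is a made-up example string:

    ```lisp
    (ql:quickload :cl-ppcre)   ; assumes Quicklisp is available

    (defparameter *sample* "a [x] b [y] c")

    ;; Only the first bracketed token is removed.
    (cl-ppcre:regex-replace "\\[[^]]+\\]" *sample* "")      ; => "a  b [y] c"

    ;; Every bracketed token is removed.
    (cl-ppcre:regex-replace-all "\\[[^]]+\\]" *sample* "")  ; => "a  b  c"

    ;; The Lisp reader turns "\\[" into two characters: a backslash and a bracket.
    (length "\\[")                                          ; => 2
    ```

    The spaces that surrounded the removed tokens stay in place, which is why the results contain double spaces.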

  3. Splitting a string by a separator character

    Provided by the CL-UTILITIES package.

    In the next snippet I split the input string on commas:

     (ql:quickload :cl-utilities)
     (cl-utilities:split-sequence #\, input)
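
    If pulling in a library is not an option, the same idea can be sketched with the standard position and subseq functions. The function split-by-char below is a hypothetical helper of mine, not part of any library, and it handles only a single-character separator:

    ```lisp
    (defun split-by-char (separator input)
      "Split INPUT at every SEPARATOR character and return a list of substrings."
      (loop with start = 0
            ;; END is the index of the next separator, or NIL when none is left.
            for end = (position separator input :start start)
            collect (subseq input start end)
            while end
            do (setf start (1+ end))))

    (split-by-char #\, "a,b,,c")   ; => ("a" "b" "" "c")
    (split-by-char #\, "single")   ; => ("single")
    ```

    Empty fields between adjacent separators are kept as empty strings in this sketch.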
    
  4. Applying the same modification to every string in a given list

    In the next snippet I trim spaces around all strings in the list input-list:

     (map 'list
          (lambda (input) (string-trim " " input))
          input-list)
    

    However, it is much better to wrap the transformation of a single string in a separate function and pass just the name of that function to the mapping call:

     (defun trim-spaces (input)
       "Remove trailing and leading spaces from input string"
       (string-trim '(#\Space) input))
    
     (map 'list #'trim-spaces input-list)
    

    Do not forget that a string is just a sequence of characters, so all sequence functions work on strings. They work equally well on "abcd" and on the list '(#\a #\b #\c #\d). The list is not a string, however: string-specific functions such as string-trim expect the "abcd" form, not a list of characters.
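
    A few standard calls illustrating this, together with coerce for converting between the two forms when a real string is required:

    ```lisp
    ;; Sequence functions accept both a string and a list of characters.
    (length "abcd")                       ; => 4
    (position #\c "abcd")                 ; => 2
    (reverse "abcd")                      ; => "dcba"
    (reverse '(#\a #\b #\c #\d))          ; => (#\d #\c #\b #\a)

    ;; COERCE converts between the two representations.
    (coerce '(#\a #\b #\c #\d) 'string)   ; => "abcd"
    (coerce "abcd" 'list)                 ; => (#\a #\b #\c #\d)

    ;; MAP can build a string directly when given the result type STRING.
    (map 'string #'char-upcase "abcd")    ; => "ABCD"
    ```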

  5. Removing characters from a string by condition

    In the next snippet I keep only the alphanumeric characters of the input string:

     (remove-if-not #'alphanumericp input)
    

    There is also remove-if, which removes the characters that satisfy the predicate instead of keeping them.

    As with map, you can build arbitrarily complex predicates either with lambdas or by wrapping them in separate functions.
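
    A short sketch of both styles. The predicate word-char-p is a made-up helper for illustration, and the punctuation set in the lambda is an arbitrary choice:

    ```lisp
    (defun word-char-p (char)
      "True for letters, digits and the space character."
      (or (alphanumericp char)
          (char= char #\Space)))

    ;; Named predicate: keep letters, digits and spaces.
    (remove-if-not #'word-char-p "Hello, world! 42")       ; => "Hello world 42"

    ;; Standard predicate: drop the digits.
    (remove-if #'digit-char-p "a1b2c3")                    ; => "abc"

    ;; Lambda predicate: drop a few punctuation characters.
    (remove-if (lambda (c) (find c ",.!?")) "Hi, there!")  ; => "Hi there"
    ```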
