Essential string processing functions in Common Lisp

September 10, 2012

Standard Common Lisp is somewhat lacking in several important areas of string processing, and it shows sometimes. Today I needed to heavily process a large body of plain text, so I will write down some operations which are, AFAIK, considered "standard" in modern languages but which are not so easily accessible or particularly intuitive in CL.

In all the following code snippets, the symbol input stands for the input string.

Trimming spaces, tabs and newlines from a string
```
 (string-trim '(#\Space #\Newline #\Return #\Linefeed #\Tab) input)
```
All named characters are listed in the HyperSpec, section 13.1.7 Character Names.
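
To make the trimming behaviour concrete, here is a small sketch on made-up sample strings. The variable name *whitespace* is introduced only for this illustration; string-left-trim and string-right-trim are the standard one-sided variants of string-trim.

```lisp
;; Illustrative only: *whitespace* and the sample strings are made up.
(defparameter *whitespace* '(#\Space #\Newline #\Return #\Tab))

;; Trim both ends; (format nil "...~%") builds a string ending in a newline.
(string-trim *whitespace* (format nil "  hello world~%"))  ; => "hello world"

;; Trim only one end.
(string-left-trim *whitespace* "  hello  ")                ; => "hello  "
(string-right-trim *whitespace* "  hello  ")               ; => "  hello"
```

Note that whitespace inside the string is left alone; only the leading and trailing characters are examined.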

Replacing with regular expressions

Provided by the CL-PPCRE library.

In the next snippet I remove all tokens enclosed in square brackets from the input string:
```
 (ql:quickload :cl-ppcre)
 (cl-ppcre:regex-replace-all "\\[[^]]+\\]" input "")
```
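Honestly, I don't know when you would need plain regex-replace, which replaces only the first match, rather than regex-replace-all. Also, note the double escaping of special characters (\\[ instead of \[): the backslash itself has to be escaped inside a Lisp string literal.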
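
The difference between the two functions, and the effect of the double escaping, can be seen on a made-up sample string. This is only a sketch and assumes Quicklisp is installed so that CL-PPCRE can be loaded; the variable *sample* is hypothetical.

```lisp
(ql:quickload :cl-ppcre)

;; Made-up sample input for illustration.
(defparameter *sample* "a [x] b [y] c")

;; regex-replace touches only the first match.
(cl-ppcre:regex-replace "\\[[^]]+\\]" *sample* "")      ; => "a  b [y] c"

;; regex-replace-all touches every match.
(cl-ppcre:regex-replace-all "\\[[^]]+\\]" *sample* "")  ; => "a  b  c"

;; The Lisp reader turns "\\[" into two characters: a backslash and a bracket.
(length "\\[")                                          ; => 2
```

Both functions also return a second value telling whether anything matched; the comments above show only the first value.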

Splitting a string by a separator character

Provided by the CL-UTILITIES library.

In the next snippet I split the input string by commas:
```
 (ql:quickload :cl-utilities)
 (cl-utilities:split-sequence #\, input)
```
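
If pulling in a library is not an option, splitting on a single character can be sketched with the standard position and subseq functions. The helper split-by-char below is hypothetical, written for this post only, and it keeps empty substrings between adjacent separators.

```lisp
(defun split-by-char (separator string)
  "Split STRING at every occurrence of the character SEPARATOR.
Hypothetical helper for illustration; empty substrings are kept."
  (loop for start = 0 then (1+ end)
        ;; END is the index of the next separator, or NIL if none is left.
        for end = (position separator string :start start)
        ;; SUBSEQ with END = NIL runs to the end of the string.
        collect (subseq string start end)
        while end))

(split-by-char #\, "a,b,,c")  ; => ("a" "b" "" "c")
(split-by-char #\, "abc")     ; => ("abc")
```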

Applying the same modification to every string in a given list

In the next snippet I trim the spaces around every string in the list input-list:
```
 (map 'list
      (lambda (input) (string-trim " " input))
      input-list)
```
However, it is much better to wrap the string transformation in a separate function and pass just its name to the mapping function:
```
 (defun trim-spaces (input)
   "Remove trailing and leading spaces from input string"
   (string-trim '(#\Space) input))

 (map 'list #'trim-spaces input-list)
```
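Do not forget that a string is just a sequence (a vector) of characters, so all sequence functions work on a string such as "abcd" just as they work on a list of characters such as '(#\a #\b #\c #\d). This applies only to sequence functions, however: string-specific functions such as string-upcase expect an actual string, not a list of characters.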
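
A few standard sequence functions applied to both forms illustrate the point. The sample values are made up; coerce converts between the two representations when a string-specific function is needed.

```lisp
;; Sequence functions accept a string and a list of characters alike.
(reverse "abcd")                     ; => "dcba"
(reverse '(#\a #\b #\c #\d))         ; => (#\d #\c #\b #\a)
(position #\c "abcd")                ; => 2
(count #\a "banana")                 ; => 3

;; Converting between the two forms.
(coerce '(#\a #\b #\c #\d) 'string)  ; => "abcd"
(coerce "abcd" 'list)                ; => (#\a #\b #\c #\d)

;; String-specific functions need a real string, so convert first.
(string-upcase (coerce '(#\a #\b) 'string))  ; => "AB"
```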

Removing characters from a string by condition

In the next snippet I leave only the alphanumeric characters in the input string:
```
 (remove-if-not #'alphanumericp input)
```
There is also remove-if, which removes the characters that satisfy the predicate instead of keeping them.

As with map, you can build arbitrarily complex predicates, either with lambdas or by wrapping them in separate named functions.
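
Both styles of predicate can be sketched on made-up sample strings. The function keep-char-p is a hypothetical helper defined only for this example.

```lisp
(defun keep-char-p (char)
  "Hypothetical predicate: true for letters, digits and the space character."
  (or (alphanumericp char) (char= char #\Space)))

;; Named predicate: keep letters, digits and spaces.
(remove-if-not #'keep-char-p "Hello, World! 42")        ; => "Hello World 42"

;; Standard predicate: drop every digit.
(remove-if #'digit-char-p "a1b2c3")                     ; => "abc"

;; Lambda predicate: drop a fixed set of punctuation characters.
(remove-if (lambda (c) (find c ",.!?")) "Hi, there!")   ; => "Hi there"
```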