Essential string processing functions in Common Lisp
September 10, 2012
It should be noted that standard (ANSI) Common Lisp lacks several important string-processing facilities, and it sometimes shows. Today I needed to heavily process a large body of plain text, so here I will write down some functions which are, AFAIK, considered "standard" in modern languages but which in CL are not so easily accessible and/or not exactly intuitive.
In all the following code snippets, the token input stands for the input string.
Trimming spaces, tabs and newlines from a string
(string-trim '(#\Space #\Newline #\Return #\Linefeed #\Tab) input)
The standard and semi-standard character names (#\Space, #\Newline, #\Tab, #\Return and so on) are listed in the HyperSpec, section 13.1.7 "Character Names".
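Because the same set of characters tends to be reused, one option is to name it once and build small helpers around string-trim and its standard one-sided siblings, string-left-trim and string-right-trim. This is a minimal sketch; the names *whitespace-chars*, trim-whitespace, trim-left-whitespace and trim-right-whitespace are hypothetical, chosen only for illustration.

```lisp
;; Hypothetical name: the bag of characters to strip.
;; #\Tab, #\Return and #\Linefeed are semi-standard names (CLHS 13.1.7);
;; on most implementations #\Linefeed is the same character as #\Newline.
(defparameter *whitespace-chars*
  '(#\Space #\Newline #\Return #\Linefeed #\Tab)
  "Characters removed by the trimming helpers below.")

(defun trim-whitespace (input)
  "Remove leading and trailing whitespace from INPUT."
  (string-trim *whitespace-chars* input))

(defun trim-left-whitespace (input)
  "Remove only leading whitespace from INPUT."
  (string-left-trim *whitespace-chars* input))

(defun trim-right-whitespace (input)
  "Remove only trailing whitespace from INPUT."
  (string-right-trim *whitespace-chars* input))

;; Usage: FORMAT's ~% directive emits a #\Newline.
(trim-whitespace (format nil "  hello~%"))  ; → "hello"
```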
Replacing with regular expressions
Provided by the CL-PPCRE library.
In the next snippet I remove all tokens enclosed in square brackets, brackets included, from the input string:
(ql:quickload :cl-ppcre)
(cl-ppcre:regex-replace-all "\\[[^]]+\\]" input "")
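Honestly, I don't know when you would need plain regex-replace, which replaces only the first match, instead of regex-replace-all. Also, note the double escaping of special characters (\\[ instead of \[): inside a Lisp string literal the backslash is itself an escape character, so it has to be doubled before the regex engine ever sees it.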
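To see the double escaping in action, it helps to inspect the string the Lisp reader actually produces from the pattern in the regex-replace-all snippet. This sketch uses only standard CL; *bracket-regex* is a hypothetical name for that pattern.

```lisp
;; The pattern exactly as written in the source code (hypothetical name).
(defparameter *bracket-regex* "\\[[^]]+\\]"
  "Regex matching a [bracketed] token.")

;; Inside a string literal, \\ reads as ONE backslash, so the 11 source
;; characters between the quotes become a 9-character string.
(length *bracket-regex*)  ; → 9
(char *bracket-regex* 0)  ; → #\\ (a single backslash character)
(princ *bracket-regex*)   ; prints \[[^]]+\], which is what CL-PPCRE receives
```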
Splitting a string by a separator character
Provided by the CL-UTILITIES library (the same function is also available as the standalone SPLIT-SEQUENCE library).
In the next snippet I split the input string on commas:
(ql:quickload :cl-utilities)
(cl-utilities:split-sequence #\, input)
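If loading a library just for this feels heavy, the simple single-character case can be sketched in standard CL with position and subseq. split-by-char is a hypothetical name; it takes the separator first, like split-sequence, and, like split-sequence with its default arguments, it keeps the empty fields between consecutive separators. Unlike split-sequence, it returns only the list of substrings (no second value).

```lisp
(defun split-by-char (separator input)
  "Split INPUT at every occurrence of SEPARATOR, keeping empty fields.
Returns a fresh list of substrings."
  ;; START begins at 0, then jumps just past the previous separator;
  ;; END is the next separator position, or NIL for the last field.
  (loop for start = 0 then (1+ end)
        for end = (position separator input :start start)
        collect (subseq input start end)
        while end))

(split-by-char #\, "a,b,,c")  ; → ("a" "b" "" "c")
```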
Applying the same modification to every string in a list
In the next snippet I trim spaces around every string in the list input-list:
(map 'list (lambda (input) (string-trim " " input)) input-list)
However, it is usually better to wrap the per-string transformation in a separate function and pass just its name to the mapping function:
(defun trim-spaces (input)
  "Remove trailing and leading spaces from input string"
  (string-trim '(#\Space) input))
(map 'list #'trim-spaces input-list)
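Named transformations also compose easily. The sketch below uses a hypothetical helper, normalize-token, which trims and lower-cases a string, and applies it with mapcar, which does the same job as (map 'list ...) when the input is a list.

```lisp
(defun normalize-token (input)
  "Trim spaces around INPUT and convert it to lower case (hypothetical helper)."
  (string-downcase (string-trim '(#\Space) input)))

;; MAPCAR works only on lists; (MAP 'LIST ...) accepts any sequence,
;; including a vector of strings.
(mapcar #'normalize-token '("  Foo " "BAR" " baz"))  ; → ("foo" "bar" "baz")
(map 'list #'normalize-token #(" A " "b"))           ; → ("a" "b")
```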
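Do not forget that a string is just a vector of characters, so all sequence functions (map, remove-if, position, subseq and so on) work on strings like "abcd". They accept a list of characters such as '(#\a #\b #\c #\d) just as well, but such a list is not a string: string-specific functions like string-trim or string-upcase will not accept it, so convert it with (coerce list 'string) first.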
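A few generic sequence functions applied to a string, followed by the coerce round trip between a string and a list of characters (standard CL only):

```lisp
;; Generic sequence functions treat a string as a vector of characters.
(position #\c "abcd")               ; → 2
(reverse "abcd")                    ; → "dcba"
(map 'string #'char-upcase "abcd")  ; → "ABCD"

;; A list of characters is a different type; COERCE converts both ways.
(coerce "abcd" 'list)               ; → (#\a #\b #\c #\d)
(coerce '(#\a #\b #\c #\d) 'string) ; → "abcd"

;; Convert first, then string-specific functions work.
(string-upcase (coerce '(#\a #\b) 'string))  ; → "AB"
```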
Removing characters from a string by a condition
In the next snippet I leave only the alphanumeric characters in the input string:
(remove-if-not #'alphanumericp input)
There is also remove-if, which removes the characters that satisfy the predicate instead of keeping them. As with map, you can build arbitrarily complex predicates, either with lambdas or by wrapping them in separate functions.
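For instance, remove-if with a built-in predicate, then the same custom predicate used inline as a lambda and as a named function (word-char-p is a hypothetical name):

```lisp
;; REMOVE-IF drops the characters for which the predicate is true.
(remove-if #'digit-char-p "a1b2c3")  ; → "abc"

;; Inline lambda: keep letters, digits and spaces, drop everything else.
(remove-if-not (lambda (c) (or (alphanumericp c) (char= c #\Space)))
               "Hello, world!")      ; → "Hello world"

;; The same predicate wrapped in a named function.
(defun word-char-p (c)
  "True for alphanumeric characters and spaces."
  (or (alphanumericp c) (char= c #\Space)))

(remove-if-not #'word-char-p "Hello, world!")  ; → "Hello world"
```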