Splitting

Splitting strings into substrings is handled by three API functions: all-prefixes, all-suffixes and all-subs.

Getting all prefixes

To get all possible prefixes of a given string, call the all-prefixes function from the API. In its basic form it takes a single argument, which should be a string, and returns a lazy sequence of strings with the original string in the last position (or nil for an empty string):

(require '[smangler.api :as sa])

(sa/all-prefixes       "")  ; => nil
(sa/all-prefixes "abcdef")  ; => ("a" "ab" "abc" "abcd" "abcde" "abcdef")
(sa/all-prefixes      "a")  ; => ("a")
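
To make the shape of the result concrete, here is a minimal sketch in plain Clojure that produces the same values for a string argument. The name prefixes-sketch is hypothetical and the code is not taken from smangler; it only assumes that prefixes are ordered from shortest to longest, as in the examples above.

```clojure
(defn prefixes-sketch
  "Returns a lazy sequence of all non-empty prefixes of the string s,
  shortest first, or nil when s is empty or nil.
  Hypothetical helper for illustration; not part of smangler."
  [s]
  (when (seq s)
    ;; Take substrings starting at index 0 and ending at 1, 2, ..., (count s).
    (map #(subs s 0 %) (range 1 (inc (count s))))))

(prefixes-sketch "abcdef")  ; => ("a" "ab" "abc" "abcd" "abcde" "abcdef")
(prefixes-sketch "a")       ; => ("a")
(prefixes-sketch "")        ; => nil
```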

Coercion to strings

You can pass other types of arguments and they will be coerced to strings. Single characters and numbers are supported, and so are collections of strings, characters and numbers:

(require '[smangler.api :as sa])

(sa/all-prefixes        12345)  ; => ("1" "12" "123" "1234" "12345")
(sa/all-prefixes           \a)  ; => ("a")
(sa/all-prefixes      [0 1 2])  ; => ("0" "01" "012")
(sa/all-prefixes   [\a \b \c])  ; => ("a" "ab" "abc")
(sa/all-prefixes ["abc" "de"])  ; => ("a" "ab" "abc" "abcd" "abcde")
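
The coercion shown above can be approximated in plain Clojure. This is only a sketch under assumptions: the helper name to-str-sketch is hypothetical, it handles sequential collections only, and smangler's actual coercion rules may differ in edge cases.

```clojure
(defn to-str-sketch
  "Coerces x to a string: strings pass through, sequential collections
  are concatenated element by element, anything else goes through str.
  Hypothetical helper for illustration; not part of smangler."
  [x]
  (cond
    (string? x)     x
    (sequential? x) (apply str x)
    :else           (str x)))

(to-str-sketch 12345)         ; => "12345"
(to-str-sketch \a)            ; => "a"
(to-str-sketch [0 1 2])       ; => "012"
(to-str-sketch [\a \b \c])    ; => "abc"
(to-str-sketch ["abc" "de"])  ; => "abcde"
```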

Custom splitter

Optionally, you can call all-prefixes with two arguments. In this case the first argument should be a function which takes a single character and returns a character, false or nil:

(fn [character]
  (and (some-lookup character) character))

The function is used to partition the string: a new partition starts each time the splitter returns a value different from the one it returned for the previous character. Prefixes are then generated at partition boundaries rather than after every character.

(require '[smangler.api :as sa])

(sa/all-prefixes #(and (= \a %) %)
                 "abcdef")  ; => ("a" "abcdef")

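One way to read the splitter behaviour is in terms of clojure.core/partition-by, which starts a new group whenever the function's return value changes. The sketch below assumes that reading; split-prefixes-sketch is a hypothetical name and the code is not smangler's implementation.

```clojure
(defn split-prefixes-sketch
  "Partitions s with splitter, then returns the running concatenation
  of the partitions. Returns nil for an empty or nil string.
  Hypothetical helper for illustration; not part of smangler."
  [splitter s]
  (when (seq s)
    (->> (partition-by splitter s)  ; e.g. ((\a) (\b \c \d \e \f))
         (map #(apply str %))       ; e.g. ("a" "bcdef")
         (reductions str))))        ; e.g. ("a" "abcdef")

(split-prefixes-sketch #(and (= \a %) %) "abcdef")  ; => ("a" "abcdef")
(split-prefixes-sketch #{\a \b} "abcdef")           ; => ("a" "ab" "abcdef")
```
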
Sets as splitters

It is common to use sets to partition the string. This works because Clojure sets implement the function interface, so a set can be called directly to perform a quick lookup:

(require '[smangler.api :as sa])

(sa/all-prefixes #{\a \b} "abcdef")  ; => ("a" "ab" "abcdef")
(sa/all-prefixes #{\a}    "abcdef")  ; => ("a" "abcdef")
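
The set lookup itself can be tried in plain Clojure without smangler. The last form assumes that partitioning behaves like clojure.core/partition-by, which is an interpretation of the description above rather than something the library documents.

```clojure
;; A set called as a function returns the element when it is a member
;; and nil otherwise.
(#{\a \b} \a)                     ; => \a
(#{\a \b} \z)                     ; => nil

;; Mapping the set over a string shows the value seen for each character.
(map #{\a \b} "abcdef")           ; => (\a \b nil nil nil nil)

;; partition-by starts a new group whenever that value changes.
(partition-by #{\a \b} "abcdef")  ; => ((\a) (\b) (\c \d \e \f))
```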

Coercion to splitter

You can pass other types of arguments and they will be coerced to splitters. Single characters, strings and numbers are supported, and so are collections of strings, characters and numbers:

(require '[smangler.api :as sa])

(sa/all-prefixes \a        "abcde")  ; => ("a" "abcde")
(sa/all-prefixes 1         "abcde")  ; => ("abcde")
(sa/all-prefixes 12      "12abcde")  ; => ("1" "12" "12abcde")
(sa/all-prefixes [1 2]   "12abcde")  ; => ("1" "12" "12abcde")
(sa/all-prefixes [\a \b]   "abcde")  ; => ("a" "ab" "abcde")
(sa/all-prefixes "ab"      "abcde")  ; => ("a" "ab" "abcde")
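
The results above are consistent with each non-function argument being turned into a set of characters. The following sketch makes that assumption explicit; to-splitter-sketch is a hypothetical name, and the real coercion in smangler may be implemented differently.

```clojure
(defn to-splitter-sketch
  "Coerces x to something usable as a splitter: functions and sets pass
  through, sequential collections become the set of all characters of
  their stringified elements, anything else becomes the set of
  characters of its string form.
  Hypothetical helper for illustration; not part of smangler."
  [x]
  (cond
    (fn? x)         x
    (set? x)        x
    (sequential? x) (set (mapcat str x))
    :else           (set (str x))))

(to-splitter-sketch \a)       ; => #{\a}
(to-splitter-sketch 12)       ; => a set containing \1 and \2
(to-splitter-sketch [1 2])    ; => a set containing \1 and \2
(to-splitter-sketch "ab")     ; => a set containing \a and \b

;; Used with partition-by (assumed here to model the splitting):
(partition-by (to-splitter-sketch "ab") "abcde")  ; => ((\a) (\b) (\c \d \e))
```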

Getting all suffixes

Getting all suffixes is possible with all-suffixes. It takes the same arguments and returns the same kind of values as all-prefixes but, as the name suggests, generates all possible suffixes of the given string:

(require '[smangler.api :as sa])

(sa/all-suffixes                "")  ; => nil
(sa/all-suffixes          "abcdef")  ; => ("abcdef" "bcdef" "cdef" "def" "ef" "f")
(sa/all-suffixes               "a")  ; => ("a")

(sa/all-suffixes             12345)  ; => ("12345" "2345" "345" "45" "5")
(sa/all-suffixes                \a)  ; => ("a")
(sa/all-suffixes           [0 1 2])  ; => ("012" "12" "2")
(sa/all-suffixes        [\a \b \c])  ; => ("abc" "bc" "c")
(sa/all-suffixes      ["abc" "de"])  ; => ("abcde" "bcde" "cde" "de" "e")

(sa/all-suffixes  #(and (= \a %) %)
                    "abcdef")        ; => ("abcdef" "bcdef")

(sa/all-suffixes #{\a \b} "abcdef")  ; => ("abcdef" "bcdef" "cdef")
(sa/all-suffixes #{\a}    "abcdef")  ; => ("abcdef" "bcdef")

(sa/all-suffixes \a        "abcde")  ; => ("abcde" "bcde")
(sa/all-suffixes 1         "abcde")  ; => ("abcde")
(sa/all-suffixes 12      "12abcde")  ; => ("12abcde" "2abcde" "abcde")
(sa/all-suffixes [1 2]   "12abcde")  ; => ("12abcde" "2abcde" "abcde")
(sa/all-suffixes [\a \b]   "abcde")  ; => ("abcde" "bcde" "cde")
(sa/all-suffixes "ab"      "abcde")  ; => ("abcde" "bcde" "cde")
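
As a rough illustration of the ordering (longest suffix first), here is a plain Clojure sketch. Both function names are hypothetical, the splitter variant assumes partition-by semantics, and neither is taken from smangler.

```clojure
(defn suffixes-sketch
  "Returns a lazy sequence of all non-empty suffixes of s, longest
  first, or nil when s is empty or nil.
  Hypothetical helper for illustration; not part of smangler."
  [s]
  (when (seq s)
    (map #(subs s %) (range (count s)))))

(defn split-suffixes-sketch
  "Like suffixes-sketch, but only drops whole partitions produced by
  splitter instead of single characters.
  Hypothetical helper for illustration; not part of smangler."
  [splitter s]
  (when (seq s)
    (->> (partition-by splitter s)
         (map #(apply str %))   ; partitions as strings
         (iterate rest)         ; drop one leading partition at a time
         (take-while seq)       ; stop when no partitions remain
         (map #(apply str %)))))

(suffixes-sketch "abcdef")                 ; => ("abcdef" "bcdef" "cdef" "def" "ef" "f")
(split-suffixes-sketch #{\a \b} "abcdef")  ; => ("abcdef" "bcdef" "cdef")
```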

Getting all substrings

You can get all possible substrings of a string by calling all-subs from smangler.api. It works similarly to all-prefixes and all-suffixes but returns all prefixes, infixes and suffixes, including the original string:

(require '[smangler.api :as sa])

(sa/all-subs                "")  ; => nil
(sa/all-subs             "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
(sa/all-subs               "a")  ; => ("a")

(sa/all-subs               123)  ; => ("1" "12" "2" "123" "23" "3")
(sa/all-subs                \a)  ; => ("a")
(sa/all-subs           [0 1 2])  ; => ("0" "01" "1" "012" "12" "2")
(sa/all-subs        [\a \b \c])  ; => ("a" "ab" "b" "abc" "bc" "c")
(sa/all-subs        ["ab" "c"])  ; => ("a" "ab" "b" "abc" "bc" "c")

(sa/all-subs  #(and (= \a %) %)
              "abc")             ; => ("a" "abc" "bc")

(sa/all-subs #{\a \b}    "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
(sa/all-subs #{\a}       "abc")  ; => ("a" "abc" "bc")

(sa/all-subs \a          "abc")  ; => ("a" "abc" "bc")
(sa/all-subs 1           "abc")  ; => ("abc")
(sa/all-subs 12          "12c")  ; => ("1" "12" "2" "12c" "2c" "c")
(sa/all-subs [1 2]       "12c")  ; => ("1" "12" "2" "12c" "2c" "c")
(sa/all-subs [\a \b]     "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
(sa/all-subs "ab"        "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
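
The order of the results in the examples above (grouped by end position, then by start position) can be reproduced with a short for comprehension. This is a sketch for the single-argument case only; subs-sketch is a hypothetical name and the code is not smangler's implementation.

```clojure
(defn subs-sketch
  "Returns a lazy sequence of all non-empty substrings of s, ordered by
  end index and then by start index, or nil when s is empty or nil.
  Hypothetical helper for illustration; not part of smangler."
  [s]
  (when (seq s)
    (for [end   (range 1 (inc (count s)))
          start (range 0 end)]
      (subs s start end))))

(subs-sketch "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
(subs-sketch "a")    ; => ("a")
(subs-sketch "")     ; => nil
```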

Low-level splitting

Certain applications may require more efficient or stricter splitting functions. Although using it is generally not recommended, the smangler.core namespace contains splitting operations that are a bit faster than those in the API. They require specific argument types and perform no coercion:

(require '[smangler.core :as c])

(c/all-prefixes          nil)  ; => nil
(c/all-prefixes           "")  ; => nil
(c/all-prefixes        "abc")  ; => ("a" "ab" "abc")
(c/all-prefixes  #{\a} "abc")  ; => ("a" "abc")

(c/all-suffixes          nil)  ; => nil
(c/all-suffixes           "")  ; => nil
(c/all-suffixes        "abc")  ; => ("abc" "bc" "c")
(c/all-suffixes  #{\a} "abc")  ; => ("abc" "bc")

(c/all-subs              nil)  ; => nil
(c/all-subs               "")  ; => nil
(c/all-subs            "abc")  ; => ("a" "ab" "b" "abc" "bc" "c")
(c/all-subs      #{\a} "abc")  ; => ("a" "abc" "bc")
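
To illustrate what "no coercion" means for a caller, the sketch below shows a strict function in plain Clojure that accepts only a string or nil and rejects everything else through a precondition. The name strict-prefixes-sketch is hypothetical, and how smangler.core actually reacts to arguments of the wrong type is not stated here, so treat the failure mode as an assumption.

```clojure
(defn strict-prefixes-sketch
  "Returns all prefixes of s without coercing the argument.
  Accepts only a string or nil; anything else fails the precondition.
  Hypothetical helper for illustration; not part of smangler."
  [s]
  {:pre [(or (nil? s) (string? s))]}
  (when (seq s)
    (map #(subs s 0 %) (range 1 (inc (count s))))))

(strict-prefixes-sketch "abc")  ; => ("a" "ab" "abc")
(strict-prefixes-sketch nil)    ; => nil

;; A number is not coerced; the precondition throws an AssertionError
;; (as long as assertions are enabled, which is the default).
(try
  (strict-prefixes-sketch 123)
  (catch AssertionError _ :rejected))  ; => :rejected
```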