Java: Splitting a query string

While working on our search solution at finn.no, we needed to split a query string into its individual parts while keeping any quoted phrases intact. I'm sure this has been done before, and that there are more efficient implementations, for instance somewhere in the Lucene/Solr packages, but we just needed a simple variant that does only the splitting, similar to

String.split(String regexp)

The key was to find a regular expression that handles both separators and phrases. The one that seemed to do the trick, written as a Java string literal (Java patterns take no surrounding slashes), was:

"\"([^\"]+)\"|(\\S+)"
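To make the two alternatives concrete, here is a minimal sketch that runs the expression with java.util.regex. The class name PhraseRegexDemo and the method tokens are made up for this illustration, and the pattern is compiled without slashes, since Java does not use regex delimiters. The first alternative captures a quoted phrase in group 1; the second captures any run of non-whitespace characters in group 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
public class PhraseRegexDemo {

    // Either a double-quoted phrase (group 1) or a run of non-whitespace characters (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|(\\S+)");

    public static List<String> tokens(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is non-null only when the quoted alternative matched.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ipod touch, iphone, 4s]
        System.out.println(tokens("\"ipod touch\" iphone 4s"));
    }
}
```

Because the quoted alternative comes first, a phrase wins over the plain-token alternative whenever a match starts at a double quote that has a closing partner.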

Enter KeywordList, which seems to do exactly what we want:

import java.util.*;
import java.util.regex.*;

import static org.apache.commons.lang.StringUtils.isEmpty;

/**
 * Splits a query string into its individual parts.
 *
 * @author ©2012 Fredrik Rodland, http://rodland.no
 * Date: 12.10.12
 * Time: 11:15
 */
public class KeywordList implements Iterable<String> {

    // Separator characters, written for use inside a regex character class: comma and whitespace.
    private static final String SEPARATORS = ",\\s";
    // Matches either a double-quoted phrase or a run of non-separator characters.
    private static final Pattern SPLIT_PATTERN = Pattern.compile("\"([^\"]+)\"|([^" + SEPARATORS + "]+)");
    private final List<String> keywords;

    public KeywordList(String queryString) {
        keywords = isEmpty(queryString) ? Collections.<String>emptyList() : split(queryString);
    }

    private static List<String> split(String queryString) {
        Matcher matcher = SPLIT_PATTERN.matcher(queryString);
        List<String> list = new ArrayList<String>();
        while (matcher.find()) {
            // Strip the quotes surrounding a phrase and any stray whitespace.
            list.add(matcher.group().replaceAll("\"", "").trim());
        }
        return list;
    }

    public List<String> getKeywords() {
        return keywords;
    }

    @Override
    public Iterator<String> iterator() {
        return keywords.iterator();
    }
}

KeywordList accepts both an empty String and null as input, and treats both comma (,) and whitespace as separators.
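The null, empty-string and separator handling can be tried out without the commons-lang dependency. The following is only a sketch under that assumption: SeparatorDemo is a made-up class name, and the explicit null/length check stands in for StringUtils.isEmpty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, dependency-free variant used only to illustrate the edge cases.
public class SeparatorDemo {

    // Separator characters for use inside a character class: comma and any whitespace.
    private static final String SEPARATORS = ",\\s";
    private static final Pattern SPLIT_PATTERN =
            Pattern.compile("\"([^\"]+)\"|([^" + SEPARATORS + "]+)");

    public static List<String> split(String queryString) {
        // Plain-JDK stand-in for StringUtils.isEmpty: null or zero-length input.
        if (queryString == null || queryString.length() == 0) {
            return Collections.emptyList();
        }
        List<String> list = new ArrayList<String>();
        Matcher matcher = SPLIT_PATTERN.matcher(queryString);
        while (matcher.find()) {
            // Remove the quotes around a phrase; plain tokens pass through unchanged.
            list.add(matcher.group().replace("\"", "").trim());
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(split(null));                    // prints []
        System.out.println(split(""));                      // prints []
        System.out.println(split("ipad,,,  imac"));         // prints [ipad, imac]
        System.out.println(split("\"macbook pro\",ipad"));  // prints [macbook pro, ipad]
    }
}
```

Repeated separators never produce empty keywords, because the pattern only matches runs of at least one non-separator character.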

Here are some test runs, with the output of System.out.println for each:

<input> => new KeywordList(<input>).getKeywords()
ipod => [ipod]
ipod touch => [ipod, touch]
"ipod touch" => [ipod touch]
"ipod touch" iphone 4s => [ipod touch, iphone, 4s]
"ipod touch" "iphone 4s",ipad,"macbook pro",,,,,  imac => [ipod touch, iphone 4s, ipad, macbook pro, imac]

Source code

KeywordList.java
KeywordListTest.java

If you have any comments, bugs or suggestions on a better implementation, please feel free to comment below.
