Solr: Cache Responses of Slow Queries

I previously extended Solr's StatsComponent to support stats.query, but found that it is quite slow when running stats queries and stats.facet against 50 million documents.
Earlier post: Solr: Extend StatsComponent to Support stats.query, stats.query and facet.topn

A query like the one below usually takes more than 2 minutes, and distributed stats queries are much slower still.
http://localhost:8080/solr/select?q=*:*&rows=0&stats=true&stats.pagination=true&stats.field=size&f.size.stats.query=(size:[0 TO 1024))&f.size.stats.query=(size:[1024 TO 102400])&stats.facet=type&stats.facet.type.limit=5&stats.facet.ext_name.offset=0

So I decided to cache the responses of stats queries, so that clients can get a response immediately.
The basic idea is:
1. When making a request, we can add one parameter:
cache=true/false – whether to use the cache for this query and whether to save the response of this query into the cache.

For the first query, Solr runs the query and puts the result into the cache; for later identical queries, Solr returns the response directly from the cache.
The query itself is not saved, which means Solr will not rebuild this cache entry after a restart.

2. Use cachehandler to manually add queries to the cache. If the parameter persist is true, these queries are saved to a file, and after a restart Solr reruns them and saves the responses into the cache.
Example:
http://localhost:8080/solr/cachehandler?action=add&persist=true&cachequery=q=*:*%26rows=0%26facet=true%26facet.field=type
We can add multiple queries at once: cachequery=query1,query2.
Special characters in each query must be escaped: convert & to %26.

After adding some queries, run http://localhost:8080/solr/cachehandler?action=refill&sync=true/false; this runs the queries synchronously or asynchronously and pushes the results into the cache.

We can use http://localhost:8080/solr/cachehandler?action=remove&cachequery=q=*:*%26rows=0%26facet=true%26facet.field=type to remove queries from the cache and from the property file.

The cached values are rebuilt automatically when the index changes and the user commits the change to the Solr server.
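The cachehandler URLs described above can also be built programmatically. The helper below is only a sketch: the class and method names are hypothetical, and only the URL shapes (action, persist, sync, cachequery) come from this post. It uses URLEncoder, which escapes more than just & (for example = and : as well); this assumes the servlet container URL-decodes the cachequery value once before the handler sees it. Because the persisted queries are split on commas when they are read back, a query that itself contains a comma would presumably need separate handling.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical client-side helper; only the URL shapes come from the post. */
public class CacheHandlerUrls {

    /** Percent-encode one query so that its & and = stay inside cachequery. */
    public static String encodeQuery(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Encode each query separately and join them with a literal comma. */
    private static String joinQueries(String... queries) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < queries.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(encodeQuery(queries[i]));
        }
        return sb.toString();
    }

    public static String addUrl(String solrBase, boolean persist, String... queries) {
        return solrBase + "/cachehandler?action=add&persist=" + persist
                + "&cachequery=" + joinQueries(queries);
    }

    public static String removeUrl(String solrBase, String... queries) {
        return solrBase + "/cachehandler?action=remove&cachequery=" + joinQueries(queries);
    }

    public static String refillUrl(String solrBase, boolean sync) {
        return solrBase + "/cachehandler?action=refill&sync=" + sync;
    }
}
```

For example, CacheHandlerUrls.addUrl("http://localhost:8080/solr", true, "q=*:*&rows=0") produces an add request whose cachequery value contains no raw & characters.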
Implementation Code
The main code is shown below; you can review the complete code on GitHub.
QueryResultLRUCache
This class extends LRUCache. The key is a custom <String, String[]> hash map, CacheKeyHashMap, whose entries look like: q=query, fq=fq1,fq2, rows=rows.

We convert the cachequery string passed to cachehandler into a CacheKeyHashMap (also adding the defaults, appends and invariants parameters defined in solrconfig.xml for the handler), and put it into the cache.
public class QueryResultLRUCache<K, V> extends
  LRUCache<CacheKeyHashMap, NamedList<Object>> {
 private String description = "My LRU Cache";
 private static final String PROPERTY_FILE = "querycache.properties";
 private static final String PRO_QUERIES = "queries";

 private final Set<String> cachedQueries = new HashSet<String>();
 private SolrCore core;

 public void init(int size, int initialSize, SolrCore core) {
  Map<String, String> args = new HashMap<String, String>();
  args.put("size", String.valueOf(size));
  args.put("initialSize", String.valueOf(initialSize));
  super.init(args, null, regenerator);
  this.core = core;
  cachedQueries.addAll(readCachedQueriesProperty(core));
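  for (String query : cachedQueries) {
   CacheKeyHashMap key = convertQueryStringToParams(query, null);
   put(key, null);
  }
  // Start a single background refill once all persisted queries are registered,
  // instead of spawning one refill thread per query.
  if (!cachedQueries.isEmpty()) {
   asyncRefill();
  }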
 }

 private Set<String> readCachedQueriesProperty(SolrCore core) {
  Set<String> queries = new LinkedHashSet<String>();
  File propertyFile = new File(getPropertyFilePath(core));
  if (propertyFile.exists()) {
   InputStream is = null;
   try {
    is = new FileInputStream(propertyFile);
    Properties properties = new Properties();
    properties.load(is);
    String queriesStr = properties.getProperty(PRO_QUERIES);
    if (queriesStr != null) {
     String[] queriesArray = queriesStr.split(",");
     for (String query : queriesArray) {
      queries.add(query);
     }
    }

   } catch (Exception e) {
    logger.error("Exception happened when read " + propertyFile, e);
   } finally {
    if (is != null) {
     try {
      is.close();
     } catch (IOException e) {
      logger.error("Exception happened when close "
        + propertyFile, e);
     }
    }
   }
  }

  return queries;
 }

 private void saveCachedQueries() {
  if (!cachedQueries.isEmpty()) {
   File propertyFile = new File(getPropertyFilePath(core));
   OutputStream out = null;
   try {
    out = new FileOutputStream(propertyFile);
    Properties properties = new Properties();
    StringBuilder queries = new StringBuilder(
      16 * cachedQueries.size());
    Iterator<String> it = cachedQueries.iterator();
    while (it.hasNext()) {
     queries.append(it.next());
     if (it.hasNext()) {
      queries.append(",");
     }
    }
    properties.setProperty(PRO_QUERIES, queries.toString());
    properties.store(out, null);
   } catch (Exception e) {
    logger.error("Exception happened when save " + propertyFile, e);
   } finally {
    if (out != null) {
     try {
      out.close();
     } catch (IOException e) {
      logger.error("Exception happened when close "
        + propertyFile, e);
     }
    }
   }
  }
 }

 /*
  * Save cachedQueries to property File
  */
 public void close() {
  saveCachedQueries();
 }

 private static final String getPropertyFilePath(SolrCore core) {
  return core.getDataDir() + File.separator + PROPERTY_FILE;
 }
 public NamedList<Object> remove(String query) {
  CacheKeyHashMap params = convertQueryStringToParams(query, null);
  synchronized (cachedQueries) {
   cachedQueries.remove(query);
  }
  synchronized (map) {
   return remove(params);
  }
 }

 public NamedList<Object> remove(CacheKeyHashMap key) {
  synchronized (map) {
   return map.remove(key);
  }
 }
 public NamedList<Object> put(String query, NamedList<Object> value,
   boolean persist) {
  return put(query, value, persist, true, null);
 }
 public NamedList<Object> put(String query, NamedList<Object> value,
   boolean persist, boolean addDefault) {
  return put(query, value, persist, addDefault, null);
 }
 public NamedList<Object> put(String query, NamedList<Object> value,
   boolean persist, boolean addDefault, String handlerName) {
  CacheKeyHashMap key = convertQueryStringToParams(query, handlerName);
  if (persist) {
   synchronized (cachedQueries) {
    cachedQueries.add(query);
   }
  }
  return put(key, value);
 }

 @Override
 public NamedList<Object> put(CacheKeyHashMap key, NamedList<Object> value) {
  if (value != null) {
   value.remove("CachedAt");
   value.add("CachedAt",
     DateUtil.getThreadLocalDateFormat().format(new Date()));
  }
  return super.put(key, value);
 }
 public void asyncRefill() {
  refill(false);
 }
 public void refill(boolean sync) {
  if (sync) {
   refillImpl();
  } else {
   new Thread(new Runnable() {
    @Override
    public void run() {
     refillImpl();
    }
   }).start();
  }
 }
 @SuppressWarnings("unchecked")
 private void refillImpl() {
  synchronized (map) {
   SolrQueryRequest myreq = null;
   try {
    Iterator<CacheKeyHashMap> it = map.keySet().iterator();
    SolrRequestHandler searchHandler = core
      .getRequestHandler("/select");

    Map<CacheKeyHashMap, NamedList<Object>> newValue = new HashMap<CacheKeyHashMap, NamedList<Object>>();
    myreq = new LocalSolrQueryRequest(core,
      new ModifiableSolrParams());
    while (it.hasNext()) {
     CacheKeyHashMap query = it.next();
     // Set this query's parameters before running the request; otherwise
     // each response would be computed from the previous query's parameters.
     myreq.setParams(new MultiMapSolrParams(query));
     SolrQueryResponse rsp = new SolrQueryResponse();
     searchHandler.handleRequest(myreq, rsp);
     newValue.put(query, rsp.getValues());
    }
    map.putAll(newValue);
   } finally {
    if (myreq != null) {
     myreq.close();
    }
   }
  }
 }
 public void clearValues() {
  synchronized (map) {
   Iterator<Map.Entry<CacheKeyHashMap, NamedList<Object>>> it = map
     .entrySet().iterator();
   while (it.hasNext()) {
    Map.Entry<CacheKeyHashMap, NamedList<Object>> entry = it.next();
    entry.setValue(null);
   }
  }
 }
 public static CacheKeyHashMap getKey(SolrQueryRequest req) {
  ModifiableSolrParams modifiableParams = new ModifiableSolrParams(
    req.getParams());
  modifiableParams.remove("cache");
  modifiableParams.remove("refresh");
  return paramsToHashMap(modifiableParams);
 }
 @Override
 public void warm(SolrIndexSearcher searcher,
   SolrCache<CacheKeyHashMap, NamedList<Object>> old) {
  throw new UnsupportedOperationException();
 }
 private static CacheKeyHashMap paramsToHashMap(
   ModifiableSolrParams modifiableParams) {
  CacheKeyHashMap map = new CacheKeyHashMap();
  map.putAll(SolrParams.toMultiMap(modifiableParams.toNamedList()));
  return map;
 }

 @SuppressWarnings("rawtypes")
 private CacheKeyHashMap convertQueryStringToParams(String query,
   String handlerName) {
  ModifiableSolrParams modifiableParams = new ModifiableSolrParams();
  if (handlerName == null) {
   handlerName = "/select";
  }
  RequestHandlerBase handler = (RequestHandlerBase) core
    .getRequestHandler(handlerName);
  NamedList initArgs = handler.getInitArgs();
  if (initArgs != null) {
   Object o = initArgs.get("defaults");
   if (o != null && o instanceof NamedList) {
    modifiableParams.add(SolrParams.toSolrParams((NamedList) o));
   }
   o = initArgs.get("appends");
   if (o != null && o instanceof NamedList) {
    modifiableParams.add(SolrParams.toSolrParams((NamedList) o));
   }
   o = initArgs.get("invariants");
   if (o != null && o instanceof NamedList) {
    modifiableParams.add(SolrParams.toSolrParams((NamedList) o));
   }
  }
  modifiableParams.add(SolrRequestParsers.parseQueryString(query));
  return paramsToHashMap(modifiableParams);
 }
}
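CacheKeyHashMap itself is not listed in this post. Since its values are String[] arrays, and Java arrays compare by identity, a key like this presumably has to override equals and hashCode to compare array contents; otherwise two requests with the same parameters would never hit the same cache entry. The class below is a hypothetical sketch of that idea under this assumption, not the actual class from the repository.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; the real CacheKeyHashMap is not shown in this post. */
public class CacheKeyHashMapSketch extends HashMap<String, String[]> {
    private static final long serialVersionUID = 1L;

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CacheKeyHashMapSketch)) {
            return false;
        }
        CacheKeyHashMapSketch other = (CacheKeyHashMapSketch) o;
        if (other.size() != size()) {
            return false;
        }
        for (Map.Entry<String, String[]> e : entrySet()) {
            if (!other.containsKey(e.getKey())) {
                return false;
            }
            // Compare array contents, not array identity.
            if (!Arrays.equals(e.getValue(), other.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        int h = 0;
        // Summing per-entry hashes keeps the result independent of entry order.
        for (Map.Entry<String, String[]> e : entrySet()) {
            h += Objects.hashCode(e.getKey()) ^ Arrays.hashCode(e.getValue());
        }
        return h;
    }
}
```

With a key like this, parameter order does not matter, but the order of values within one parameter (for example several fq values) still does. The key should also not be modified after it has been put into the cache, since that changes its hash code.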
Use QueryResultLRUCache in SearchHandler
In handleRequestBody, if the request sets cache=true, it first tries to get the response from the cache. If a cached response exists, it is returned directly; otherwise, the query runs and its response is put into the cache.
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp, 
List<SearchComponent> components, ResponseBuilder rb) throws Exception
  {
   QueryResultLRUCache<NamedList<Object>,NamedList<Object>> queryResultCache = req.getCore().getQuerycaCache();
   SolrParams reqParams = req.getParams();
   boolean cacheRst = reqParams.getBool("cache", false);
   if(cacheRst)
   {
      boolean refresh = reqParams.getBool("refresh", false);
      if(!refresh)
      {
           NamedList<Object> cacheNL =queryResultCache.get(QueryResultLRUCache.getKey(req));
           if(cacheNL!=null)
           {
             NamedList<Object> responseHeader =rsp.getResponseHeader();
             responseHeader.add("UseCache", "true");
             NamedList<Object> rstNL= cacheNL.clone();
             rsp.getValues().addAll(rstNL);
             return;
           }
      }
   }
   ....
    if(cacheRst)
    {
      queryResultCache.put(QueryResultLRUCache.getKey(req), rsp.getValues()) ;
    }
  } 
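Stripped of the Solr types, the lookup-then-store flow above, including the refresh flag, can be modeled in isolation. This is only a sketch for experimenting with the behavior: the class, its String-based parameters and responses, and the LinkedHashMap-based LRU are all assumptions standing in for SolrQueryRequest, NamedList and Solr's LRUCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical stand-alone model of the cache/refresh flow; not Solr code. */
public class ResponseCacheSketch {
    private final Map<String, String> lru;
    private int slowRuns = 0;

    public ResponseCacheSketch(final int maxSize) {
        // accessOrder=true orders entries by last access, so the "eldest"
        // entry dropped by removeEldestEntry is the least recently used one.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Cache key: all parameters except the control flags, in sorted order. */
    static String keyOf(Map<String, String> params) {
        Map<String, String> sorted = new TreeMap<String, String>(params);
        sorted.remove("cache");
        sorted.remove("refresh");
        return sorted.toString();
    }

    public synchronized String handle(Map<String, String> params,
            Function<Map<String, String>, String> slowQuery) {
        boolean useCache = Boolean.parseBoolean(params.get("cache"));
        boolean refresh = Boolean.parseBoolean(params.get("refresh"));
        String key = keyOf(params);
        if (useCache && !refresh) {
            String cached = lru.get(key);
            if (cached != null) {
                return cached; // cache hit: skip the slow query
            }
        }
        slowRuns++;
        String response = slowQuery.apply(params);
        if (useCache) {
            lru.put(key, response); // store, or overwrite when refresh=true
        }
        return response;
    }

    public synchronized int getSlowRuns() {
        return slowRuns;
    }
}
```

In this model, cache=true&refresh=true skips the lookup but still stores the fresh response, which matches how the handler code above treats the two flags.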
Rebuild the cache asynchronously after commit in RunUpdateProcessorFactory
public void processCommit(CommitUpdateCommand cmd) throws IOException {
    updateHandler.commit(cmd);
    super.processCommit(cmd);
    changesSinceCommit = false;
    
    QueryResultLRUCache<NamedList<Object>,NamedList<Object>> querycaCache = req
        .getCore().getQuerycaCache();
    querycaCache.clearValues();
    querycaCache.asyncRefill();
  }
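Because asyncRefill starts a new thread on every call, several commits in quick succession would each start their own refill. One possible alternative, shown here only as a sketch and not taken from the original code, is to route refills through a single-thread executor and coalesce requests that arrive while a refill is already waiting. The class name and the Runnable passed in are hypothetical; in the real code the Runnable would call something like refillImpl.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper that coalesces refill requests; not part of the post's code. */
public class CoalescingRefiller {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean queued = new AtomicBoolean(false);
    private final Runnable refill;

    public CoalescingRefiller(Runnable refill) {
        this.refill = refill;
    }

    /** Request a refill; returns false if one is already waiting to start. */
    public boolean requestRefill() {
        if (!queued.compareAndSet(false, true)) {
            return false;
        }
        executor.submit(new Runnable() {
            @Override
            public void run() {
                // Clear the flag before running, so a commit that arrives
                // during this refill can queue exactly one more pass.
                queued.set(false);
                refill.run();
            }
        });
        return true;
    }

    /** Stop accepting work and wait for queued refills to finish. */
    public boolean shutdownAndWait(long seconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this arrangement, at most one refill runs at a time and at most one more is pending, however many commits arrive in between.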
QueryResultCacheHandler
This class is simple; its code can be viewed on GitHub.