Solr: Use DocTransformer to Dynamically Generate groupCount and time Values for Group Docs


Summary
Use a DocTransformer to dynamically generate the groupCount and time values for group docs (type:1) efficiently: in almost all cases there is no need to run a Solr query for each group doc.

The Use Case
There are two types of docs in Solr. The first is the child doc, with fields such as type (value 0), groupId, and time.
The second is the group doc, with type (value 1); group docs are just fake placeholder docs, one per group.

We use a join query with includeParent=true together with result grouping: group.main=true&group.sort=map(type,1,1,-1) asc. This makes sure that groups are sorted by time (the max value in the group) and that the group doc always comes before all of its child docs.
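To see why group.sort=map(type,1,1,-1) asc puts the group doc first, here is a minimal plain-Java sketch of the same ordering. It does not call Solr; the class and method names are made up for illustration. Solr's map(x,min,max,target) function returns target when x falls in [min,max] and x otherwise, so type 1 becomes -1 and type 0 stays 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupSortSketch {
  /** Same rule as Solr's map(x,min,max,target): values in [min,max] become target. */
  public static int map(int x, int min, int max, int target) {
    return (x >= min && x <= max) ? target : x;
  }

  /** Orders the type values of one group as group.sort=map(type,1,1,-1) asc would. */
  public static List<Integer> sortWithinGroup(List<Integer> types) {
    List<Integer> sorted = new ArrayList<>(types);
    // Ascending on the mapped value: -1 (group doc) sorts before 0 (child doc).
    sorted.sort(Comparator.comparingInt(t -> map(t, 1, 1, -1)));
    return sorted;
  }

  public static void main(String[] args) {
    // One group: three child docs (type 0) and one group doc (type 1).
    System.out.println(sortWithinGroup(Arrays.asList(0, 0, 1, 0))); // prints [1, 0, 0, 0]
  }
}
```

Because the group doc maps to the smallest value, an ascending sort always moves it to the front of its group.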

But Solr doesn't return groupCount in flat mode. In grouped mode, Solr can return the count in the group header, but there is no equivalent in flat mode (group.main=true).
So we have to dynamically generate groupCount and the time value for each group (type=1) doc.


The last step, covered here, is to actually generate groupCount and the time value on the fly for each group doc (type:1).

The Solution
After we hit a group doc, all we need to do is count the child docs that follow it (++lastGroupCount) until we hit the next group doc:
we set groupCount on the group doc once we have iterated past the last doc in its group, and
we update the time field of the group doc when we iterate the first doc in its group.

If the results end before we hit another group doc, we need to run a query to get the group count, because the accumulated lastGroupCount may be incomplete.

Updating the time value of the group doc is easy: when we hit its first child doc, copy that child's time onto the group doc. Note the boundary condition: if the group doc is the last doc in the results, no child doc follows it, so we have to run a query for it.
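Before the Solr code, the single-pass idea can be sketched in plain Java with no Solr dependency. The Doc class, its fields, and the needsQuery flag are hypothetical stand-ins for SolrDocument and for the fallback query; this is only an illustration of the counting logic under those assumptions, not the transformer itself.

```java
import java.util.Arrays;
import java.util.List;

public class GroupCountSketch {
  /** Hypothetical stand-in for a SolrDocument: type 1 = group doc, type 0 = child doc. */
  public static class Doc {
    public final int type;
    public long time;
    public Integer groupCount;  // null until the count is known
    public boolean needsQuery;  // true when the page ended before the group was closed
    public Doc(int type, long time) { this.type = type; this.time = time; }
  }

  /** Walks one flat result page and fills in groupCount and time on group docs. */
  public static void annotate(List<Doc> page) {
    Doc lastGroupDoc = null;
    int lastGroupCount = 0;
    for (Doc doc : page) {
      if (doc.type == 1) {
        // A new group starts, so the previous group's count is now final.
        if (lastGroupDoc != null) {
          lastGroupDoc.groupCount = lastGroupCount;
        }
        lastGroupDoc = doc;
        lastGroupCount = 0;
      } else if (lastGroupDoc != null) {
        if (lastGroupCount == 0) {
          // First child of the group: copy its time onto the group doc.
          lastGroupDoc.time = doc.time;
        }
        lastGroupCount++;
      }
    }
    // The trailing group was never closed by a following group doc, so its
    // count may be incomplete; flag it for a real query instead of guessing.
    if (lastGroupDoc != null) {
      lastGroupDoc.needsQuery = true;
    }
  }

  public static void main(String[] args) {
    List<Doc> page = Arrays.asList(
        new Doc(1, 0L), new Doc(0, 30L), new Doc(0, 20L),
        new Doc(1, 0L), new Doc(0, 10L));
    annotate(page);
    System.out.println(page.get(0).groupCount + " " + page.get(0).time); // prints 2 30
    System.out.println(page.get(3).needsQuery);                          // prints true
  }
}
```

Only the trailing group on a page is flagged, which is why the approach needs at most one extra query per response rather than one per group doc.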

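import java.io.IOException;
import java.util.Date;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.DateUtil;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformContext;
import org.apache.solr.response.transform.TransformerFactory;

// SolrUtil is the author's own helper class (not shown).
public class UpdateGroupDocTransfomerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new UpdateGroupDocTransfomer(req, params);
  }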
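  /**
   * Solr builds the transformers for each request in
   * org.apache.solr.search.SolrReturnFields.parseFieldList(String[],
   * SolrQueryRequest) (DocTransformers augmenters = new DocTransformers()),
   * so each request gets its own instance and it is safe to keep
   * per-request state in the fields below.
   */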
  private static class UpdateGroupDocTransfomer extends DocTransformer {
    private SolrQueryRequest req;
    private SolrDocument lastGroupDoc = null;
    private int lastGroupCount = 0;
    private TransformContext transContext;
    
    public void transform(SolrDocument doc, int docid) throws IOException {
      String type = SolrUtil.getFieldValue(doc, "type");
      if ("1".equals(type)) {
        if (lastGroupDoc != null) {
          lastGroupDoc.setField("[groupCount]", lastGroupCount);
        }
        lastGroupDoc = doc;
        lastGroupCount = 0;
        
        if (!transContext.iterator.hasNext()) {
          // This group doc is the last doc: no child docs follow it, so run a
          // query to get both its group count and its time value.
          runQueryToGetGroupCountAndTimeField(doc);
        }
      } else if (lastGroupDoc != null) {
        if (lastGroupCount == 0) {
          // the first doc in this group
          lastGroupDoc.setField(
              "time",
              DateUtil.getThreadLocalDateFormat()
                  .format(
                      new Date(Long.parseLong(SolrUtil.getFieldValue(doc,
                          "time")))));
        }
        if (!transContext.iterator.hasNext()) {
          // This is the last doc, so lastGroupCount may be incomplete for
          // lastGroupDoc; run a query to get the real group count.
          runQueryToGetGroupCount(lastGroupDoc);
        } else {
          ++lastGroupCount;
        }
      }
      // else lastGroupDoc==null, and this is normal doc, nothing to do
    }
    
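    public UpdateGroupDocTransfomer(SolrQueryRequest req, SolrParams params) {
      this.req = req;
    }
    // DocTransformer declares getName() as abstract. The name returned here is
    // an assumption; use the name the transformer is registered under.
    @Override
    public String getName() {
      return "updateGroupDoc";
    }
    @Override
    public void setContext(TransformContext context) {
      this.transContext = context;
    }
    // runQueryToGetGroupCount and runQueryToGetGroupCountAndTimeField are
    // helper methods that are not shown in this post.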
  }
}
Resources
Solr Join: Return Parent and Child Documents
Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode
Solr: Update other Document in DocTransformer by Writing custom SolrWriter
