Programmer: Lifelong Learning: September 2013

JSON Output of Solr NamedList

There are different JSON output formats for a Solr NamedList, as described on the wiki page JSON Response Writer and in org.apache.solr.response.JSONWriter.

We can control the output format of NamedLists with the parameter json.nl. Its value can be flat, map, arrarr, or arrmap.

We can check the output by running the test case org.apache.solr.request.JSONWriterTest.testJSON() with different values for json.nl.
Basically, the response looks like this for each value:
json.nl=flat: [name1,val1, name2,val2]
json.nl=map: {name1:val1, name2:val2}
json.nl=arrarr: [[name1,val1], [name2,val2]]
json.nl=arrmap: [{name1:val1}, {name2:val2}]
If json.nl is not specified, flat mode is used.
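To make the four shapes concrete, here is a minimal, Solr-free sketch. The class NamedListJsonStyles and its methods are hypothetical (not part of Solr): it treats every name and value as a string, escapes only backslashes and double quotes, and only illustrates the output shapes; it is not a reimplementation of JSONWriter. Map mode is the only style that yields a JSON object; the other three yield arrays.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Solr class): renders ordered (name, value)
// pairs in the four json.nl shapes. Values are always treated as strings.
class NamedListJsonStyles {

    static Map.Entry<String, String> pair(String name, String value) {
        return new SimpleEntry<>(name, value);
    }

    // Minimal JSON string quoting: escapes only backslash and double quote.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String render(List<Map.Entry<String, String>> pairs, String style) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : pairs) {
            String n = quote(e.getKey());
            String v = quote(e.getValue());
            switch (style) {
                case "flat":   parts.add(n + "," + v); break;             // name,val
                case "map":    parts.add(n + ":" + v); break;             // name:val
                case "arrarr": parts.add("[" + n + "," + v + "]"); break; // [name,val]
                case "arrmap": parts.add("{" + n + ":" + v + "}"); break; // {name:val}
                default: throw new IllegalArgumentException("unknown json.nl: " + style);
            }
        }
        String body = String.join(",", parts);
        // Only map mode produces a JSON object; the other styles produce arrays.
        return style.equals("map") ? "{" + body + "}" : "[" + body + "]";
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs =
                Arrays.asList(pair("name1", "val1"), pair("name2", "val2"));
        for (String style : Arrays.asList("flat", "map", "arrarr", "arrmap")) {
            System.out.println(style + " -> " + render(pairs, style));
        }
    }
}
```

In map mode a repeated name simply becomes a repeated key, which many JSON parsers resolve by keeping only the last value; the array styles keep every entry and its order.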

Usually we should use json.nl=map, so that client code can easily parse the response and convert it into a JavaScript object.

We can specify a default json.nl value in solrconfig.xml when declaring our request handler: in the defaults section, or in the invariants section if we don't want users to change it.

Another option, in our own code, is to use SimpleOrderedMap, a subclass of NamedList. It is always written in map mode, regardless of json.nl:

org.apache.solr.response.JSONWriter.writeNamedList(String, NamedList)
  public void writeNamedList(String name, NamedList val) throws IOException {
    if (val instanceof SimpleOrderedMap) {
      writeNamedListAsMapWithDups(name,val);
    } else if (namedListStyle==JSON_NL_FLAT) {
      writeNamedListAsFlat(name,val);
    } else if (namedListStyle==JSON_NL_MAP){
      writeNamedListAsMapWithDups(name,val);
    } else if (namedListStyle==JSON_NL_ARROFARR) {
      writeNamedListAsArrArr(name,val);
    } else if (namedListStyle==JSON_NL_ARROFMAP) {
      writeNamedListAsArrMap(name,val);
    }
  }

Keeping Users' Changes When Updating solr.xml

Problem
In our product, users can create additional cores, and Solr updates solr.xml to add the new core information. But at some point, we also need to update solr.xml ourselves to add our own extra cores.
This is a common requirement: we need to upgrade Solr config files in future releases while keeping the customer's changes.

For solrconfig.xml or schema.xml, we can use xi:include to put our changes in a separate XML file (for example solrconfig-upgrade.xml), while users only change solrconfig.xml, either manually or via the GUI.
Please refer to http://wiki.apache.org/solr/SolrConfigXml#XInclude

But this doesn't work for solr.xml. Suppose we define xi:include in solr.xml:

<solr persistent="true" sharedLib="lib" shareSchema="true"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1"/>
    <!-- this doesn't work -->
    <xi:include href="solr_extracores.xml"/>
  </cores>
</solr>

Later, if a user creates another Solr core, Solr rewrites solr.xml: it deletes the <xi:include> and overwrites the cores section to list all cores.
Solution
We can define the extra cores in a properties file, for example cvanalytics_solr_extra.properties:

# supported keys: corename_name, corename_instanceDir, corename_schemaName, corename_configName
extra_cores=extracore1, extracore2
extracore1_name=extracore1
extracore1_instanceDir=extracore1
extracore1_schemaName=extracore1_schema.xml
extracore1_configName=extracore1_config.xml
....

We iterate over all core names in extra_cores. If a core is already loaded (CoreContainer.getCoreFromAnyList(core_name) returns non-null), we skip it; otherwise, we create a SolrCore and register it.
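The lookup part of this logic can be tried without a running Solr. The sketch below is hypothetical: ExtraCoresPlan and CoreSpec are illustrative names, and a Set of already-loaded names stands in for CoreContainer.getCoreFromAnyList(name) != null. It reads the properties format shown above and returns the cores that would still need to be created, rejecting entries without an instanceDir.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical, Solr-free sketch of the extra-cores lookup. The "loaded" set
// stands in for CoreContainer.getCoreFromAnyList(name) != null.
class ExtraCoresPlan {

    // What would be passed to a CoreDescriptor; schemaName/configName may be null.
    static final class CoreSpec {
        final String name, instanceDir, schemaName, configName;

        CoreSpec(String name, String instanceDir, String schemaName, String configName) {
            this.name = name;
            this.instanceDir = instanceDir;
            this.schemaName = schemaName;
            this.configName = configName;
        }

        @Override
        public String toString() {
            return name + " (instanceDir=" + instanceDir + ", schema=" + schemaName
                    + ", config=" + configName + ")";
        }
    }

    // Returns the trimmed value, or null when the key is missing or blank.
    static String get(Properties p, String key) {
        String v = p.getProperty(key);
        return (v == null || v.trim().isEmpty()) ? null : v.trim();
    }

    static List<CoreSpec> plan(Properties props, Set<String> loaded) {
        List<CoreSpec> result = new ArrayList<>();
        String list = get(props, "extra_cores");
        if (list == null) {
            return result;
        }
        for (String key : list.split(",")) {
            key = key.trim();
            if (key.isEmpty() || loaded.contains(key)) {
                continue; // already loaded (e.g. persisted in solr.xml): skip
            }
            String name = get(props, key + "_name");
            if (name == null) {
                name = key; // default the core name to the list entry
            }
            String instanceDir = get(props, key + "_instanceDir");
            if (instanceDir == null) {
                throw new IllegalStateException("No instanceDir defined for " + name);
            }
            result.add(new CoreSpec(name, instanceDir,
                    get(props, key + "_schemaName"), get(props, key + "_configName")));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "extra_cores=extracore1, extracore2\n"
                + "extracore1_instanceDir=extracore1\n"
                + "extracore1_schemaName=extracore1_schema.xml\n"
                + "extracore2_instanceDir=extracore2\n"));
        // Pretend extracore2 is already loaded, so only extracore1 is planned.
        for (CoreSpec spec : plan(props, Collections.singleton("extracore2"))) {
            System.out.println(spec);
        }
    }
}
```

Keeping the lookup separate from create/register makes it easy to check the skip and validation rules before touching a real CoreContainer.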

Implementation Code

org.apache.solr.core.CoreContainer.Initializer
// Uses java.io.File, java.util.Properties and Commons Lang StringUtils.
private void initExtraCores(CoreContainer cores) throws RuntimeException {
  try {
    // getCVProperties() is product-specific (not part of stock Solr); its
    // "solrxml_extra" entry names the extra-cores properties file.
    Properties cvProps = cores.getCVProperties();
    String solrxml_extra = cvProps.getProperty("solrxml_extra");
    if (!StringUtils.isBlank(solrxml_extra)) {
      File solrxml_extraFile = new File(cores.getSolrHome(), solrxml_extra);
      // readProperties(File) is a small helper (not shown) that loads the file
      Properties solrxml_extraProps = readProperties(solrxml_extraFile);

      String extra_cores = solrxml_extraProps.getProperty("extra_cores");
      if (!StringUtils.isBlank(extra_cores)) {
        String[] extra_coresArr = extra_cores.split(",");
        for (String extra_core : extra_coresArr) {
          extra_core = extra_core.trim();
          // skip cores that are already loaded (e.g. persisted in solr.xml)
          if (cores.getCoreFromAnyList(extra_core) == null) {
            String coreName = solrxml_extraProps.getProperty(extra_core + "_name");
            if (StringUtils.isBlank(coreName)) {
              coreName = extra_core;
            }
            String instanceDir = solrxml_extraProps.getProperty(extra_core + "_instanceDir");
            if (StringUtils.isBlank(instanceDir)) {
              throw new RuntimeException("No instanceDir defined for "
                  + coreName + " in " + solrxml_extraFile);
            }
            CoreDescriptor cd = new CoreDescriptor(cores, coreName, instanceDir);
            String schemaName = solrxml_extraProps.getProperty(extra_core + "_schemaName");
            if (!StringUtils.isBlank(schemaName)) {
              cd.setSchemaName(schemaName);
            }
            String configName = solrxml_extraProps.getProperty(extra_core + "_configName");
            if (!StringUtils.isBlank(configName)) {
              cd.setConfigName(configName);
            }
            SolrCore core = cores.create(cd);
            cores.register(core, false);
          }
        }
      }
    }
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}

public CoreContainer initialize() {
  CoreContainer cores = null;
  String solrHome = SolrResourceLoader.locateSolrHome();
  // fconf is the solr.xml File under solrHome, defined as in the stock
  // Initializer.initialize(); that line is omitted from this excerpt.
  cores = new CoreContainer(solrHome);

  if (fconf.exists()) {
    cores.load(solrHome, fconf);
  } else {
    log.info("no solr.xml file found - using default");
    try {
      cores.load(solrHome, new InputSource(new ByteArrayInputStream(DEF_SOLR_XML.getBytes("UTF-8"))));
    } catch (Exception e) {
      throw new SolrException(ErrorCode.SERVER_ERROR,
          "CoreContainer.Initialize failed when trying to load default solr.xml file", e);
    }
    cores.configFile = fconf;
  }
  // call initExtraCores: create and register the extra cores not loaded yet
  initExtraCores(cores);
  return cores;
}

Import CSV that Contains Double-Quotes into Solr

My colleague ran into a problem when trying to import into Solr a CSV file in which a column value contains double quotes.
Looking at the CSV standard (RFC 4180):
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
For example:
"aaa","b""bb","ccc"

After adding another double quote in front of each embedded double quote, the import works.
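The escaping can also be done when the CSV file is generated. A minimal sketch, assuming every field is enclosed in double quotes; CsvQuote is a hypothetical helper name, not part of Solr or any CSV library:

```java
import java.util.StringJoiner;

// Hypothetical helper: writes CSV rows in which every field is enclosed in
// double quotes and each embedded double quote is doubled (RFC 4180 style).
class CsvQuote {

    static String quoteField(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    static String toRow(String... fields) {
        StringJoiner row = new StringJoiner(",");
        for (String f : fields) {
            row.add(quoteField(f));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // prints: "aaa","b""bb","ccc"
        System.out.println(toRow("aaa", "b\"bb", "ccc"));
    }
}
```

Always enclosing fields keeps the writer simple; the standard also allows leaving fields unquoted when they contain no delimiter, quote, or line break.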
Implementation in Solr
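When Solr imports a CSV file, it honors the CSV standard.
In Solr, the default encapsulator is also the double quote ("). See the wiki page Updating a Solr Index with CSV.
From org.apache.solr.internal.csv.CSVParser.encapsulatedTokenLexer(Token, int), we can see how Solr parses an encapsulated value: when it reads an encapsulator that is immediately followed by another one, it appends a single encapsulator to the token instead of ending the token.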

private Token encapsulatedTokenLexer(Token tkn, int c) throws IOException {
  for (;;) {
    c = in.read();
    if (c == '\\' && strategy.getUnicodeEscapeInterpretation() && in.lookAhead() == 'u') {
      tkn.content.append((char) unicodeEscapeLexer(c));
    } else if (c == strategy.getEscape()) {
      tkn.content.append((char) readEscape(c));
    } else if (c == strategy.getEncapsulator()) {
      if (in.lookAhead() == strategy.getEncapsulator()) {
        // double or escaped encapsulator -> add single encapsulator to token
        c = in.read();
        tkn.content.append((char) c);
      } else {
        // token finish mark (encapsulator) reached: ignore whitespace till delimiter
        for (;;) {
          c = in.read();
          if (c == strategy.getDelimiter()) {
            tkn.type = TT_TOKEN;
            tkn.isReady = true;
            return tkn;
          } else if (isEndOfFile(c)) {
            tkn.type = TT_EOF;
            tkn.isReady = true;
            return tkn;
          } else if (isEndOfLine(c)) {
            // ok eo token reached
            tkn.type = TT_EORECORD;
            tkn.isReady = true;
            return tkn;
          } else if (!isWhitespace(c)) {
            // error invalid char between token and next delimiter
            throw new IOException(
                "(line " + getLineNumber()
                    + ") invalid char between encapsulated token end delimiter"
            );
          }
        }
      }
    } else if (isEndOfFile(c)) {
      // error condition (end of file before end of token)
      throw new IOException(
          "(startline " + startLineNumber + ")"
              + "eof reached before encapsulated token finished"
      );
    } else {
      // consume character
      tkn.content.append((char) c);
    }
  }
}
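To see the doubled-encapsulator rule in isolation, here is a simplified, hypothetical single-line parser (MiniCsvLine is not a Solr class). Unlike Solr's CSVParser, it ignores the escape character, Unicode escapes, and fields spanning several lines; it only mirrors the encapsulator handling in the method above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified single-line CSV parser (not Solr's CSVParser).
// Handles only the encapsulator rules: a doubled encapsulator inside an
// encapsulated field becomes one literal encapsulator, and only whitespace
// may appear between the closing encapsulator and the next delimiter.
class MiniCsvLine {

    static List<String> parse(String line, char delim, char encap) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (true) {
            StringBuilder sb = new StringBuilder();
            if (i < n && line.charAt(i) == encap) {
                i++; // skip the opening encapsulator
                while (true) {
                    if (i >= n) {
                        throw new IllegalArgumentException(
                                "eof reached before encapsulated token finished");
                    }
                    char c = line.charAt(i++);
                    if (c != encap) {
                        sb.append(c);
                    } else if (i < n && line.charAt(i) == encap) {
                        sb.append(encap); // doubled encapsulator -> one literal
                        i++;
                    } else {
                        break; // closing encapsulator
                    }
                }
                // Skip whitespace up to the delimiter; anything else is an error.
                while (i < n && line.charAt(i) != delim) {
                    if (!Character.isWhitespace(line.charAt(i))) {
                        throw new IllegalArgumentException(
                                "invalid char between encapsulated token end delimiter");
                    }
                    i++;
                }
            } else {
                // Unencapsulated field: take everything up to the delimiter.
                while (i < n && line.charAt(i) != delim) {
                    sb.append(line.charAt(i++));
                }
            }
            fields.add(sb.toString());
            if (i >= n) {
                return fields;
            }
            i++; // skip the delimiter
        }
    }

    public static void main(String[] args) {
        // prints [aaa, b"bb, ccc]
        System.out.println(parse("\"aaa\",\"b\"\"bb\",\"ccc\"", ',', '"'));
    }
}
```

With an unescaped quote, as in "b"bb", the field closes right after b and the following b triggers the invalid-char error, the same branch that throws in the Solr method above.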

Eclipse Issues and Fixes

Copy/Paste doesn't work
Disabling "Java->Editor->Typing->Update imports" in the Eclipse preferences seems to be an effective workaround.
