Programmer: Lifelong Learning: March 2013

How Solr Executes a Request

How Solr Executes Search Request to a Single Server
SolrDispatchFilter.doFilter
Find the RequestHandler corresponding to the path.
core.getRequestHandler( path )
SolrDispatchFilter.execute
sreq.getContext().put( "webapp", req.getContextPath() );

SolrCore.execute calls handler.handleRequest.
RequestHandlerBase.handleRequest defines the basic skeleton and calls handleRequestBody, which each subclass implements.
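
The skeleton-plus-hook arrangement is the template method pattern. Here is a minimal sketch of the idea, not Solr's actual code: HandlerBase and EchoHandler are hypothetical names, and plain maps stand in for Solr's request and response types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for RequestHandlerBase: the base class owns the
// skeleton, subclasses only supply the body.
abstract class HandlerBase {
  public final Map<String, Object> handleRequest(Map<String, String> params) {
    Map<String, Object> rsp = new LinkedHashMap<String, Object>();
    try {
      handleRequestBody(params, rsp);   // hook implemented by the subclass
      rsp.put("status", 0);
    } catch (Exception e) {
      rsp.put("status", 500);           // the skeleton turns failures into a status
      rsp.put("error", e.getMessage());
    }
    return rsp;
  }

  protected abstract void handleRequestBody(Map<String, String> params, Map<String, Object> rsp)
      throws Exception;
}

// Hypothetical subclass: only the request-specific work lives here.
class EchoHandler extends HandlerBase {
  @Override
  protected void handleRequestBody(Map<String, String> params, Map<String, Object> rsp) {
    rsp.put("echo", params.get("q"));
  }
}
```

Error handling and response bookkeeping stay in one place, so a subclass cannot forget them.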
How Solr parses query string and content stream
SolrDispatchFilter will call solrReq = (SolrRequestParsers)parser.parse( core, path, req );

SolrRequestParsers.parse puts the data into the SolrParams and content streams of the request.
SolrParams params = parser.parseParamsAndFillStreams( req, streams );
SolrQueryRequest sreq = buildRequestFrom( core, params, streams );
sreq.getContext().put( "path", path );

SolrRequestParsers wraps multiple SolrRequestParser implementations. In parseParamsAndFillStreams, it calls the SolrRequestParser that matches the request method (GET or POST) and content type: form data ("application/x-www-form-urlencoded") or multipart content.

SolrRequestParsers.init
  parsers.put( MULTIPART, multi );
  parsers.put( FORMDATA, formdata );
  parsers.put( RAW, raw );
  parsers.put( SIMPLE, new SimpleRequestParser() );
  parsers.put( STANDARD, standard );
  parsers.put( "", standard );
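
The dispatch on request method and content type can be pictured roughly as follows. This is a simplified sketch under my own assumptions, not the real parser logic: ParserSelectionSketch and choose are hypothetical names, and the returned strings are just labels for the parser kinds listed above.

```java
import java.util.Locale;

class ParserSelectionSketch {
  static final String MULTIPART = "multipart";
  static final String FORMDATA = "formdata";
  static final String RAW = "raw";
  static final String SIMPLE = "simple";

  // Simplified dispatch; the real parser handles more cases than this.
  static String choose(String method, String contentType) {
    if (!"POST".equalsIgnoreCase(method)) {
      return SIMPLE;                      // GET: parameters come from the query string
    }
    if (contentType == null) {
      return RAW;
    }
    String ct = contentType.toLowerCase(Locale.ROOT);
    if (ct.startsWith("application/x-www-form-urlencoded")) {
      return FORMDATA;
    }
    if (ct.startsWith("multipart/")) {
      return MULTIPART;
    }
    return RAW;                           // any other body becomes a raw content stream
  }
}
```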

RequestHandlerBase.init parses the request handler definition in solrconfig.xml and stores its parameters in defaults, appends, and invariants accordingly.
There is only one instance of each request handler per core, so be careful about its thread safety.
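
Since every request thread shares that single instance, per-request data should live in local variables and any shared field should be thread-safe. A minimal sketch of the rule, using a hypothetical CountingHandler that is not a real Solr class:

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical handler shared by every request thread of a core.
class CountingHandler {
  // Shared state must be thread-safe: AtomicLong instead of a plain long.
  private final AtomicLong requestCount = new AtomicLong();

  public String handle(String query) {
    // Per-request state lives in locals, never in instance fields.
    String normalized = query.trim().toLowerCase(Locale.ROOT);
    long n = requestCount.incrementAndGet();
    return n + ":" + normalized;
  }

  public long getRequestCount() {
    return requestCount.get();
  }
}
```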

The SolrCore constructor calls loadSearchComponents, then initPlugins, to load all the components defined. Every class defined in solrconfig.xml that implements SolrCoreAware is also put into waitingForCore.

Later it calls SolrResourceLoader.inform(SolrCore), which calls inform(SolrCore) on each of those classes.

For a non-shard query, SearchHandler.handleRequestBody will call prepare on all components, and then call process on all components.
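
The two passes can be sketched like this. Component, NamedComponent, and TwoPhaseHandler are hypothetical stand-ins, not Solr classes; the point is only that every component is prepared before any component is processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SearchComponent.
interface Component {
  void prepare(List<String> log);
  void process(List<String> log);
}

class NamedComponent implements Component {
  private final String name;

  NamedComponent(String name) {
    this.name = name;
  }

  public void prepare(List<String> log) {
    log.add("prepare:" + name);
  }

  public void process(List<String> log) {
    log.add("process:" + name);
  }
}

class TwoPhaseHandler {
  // First pass prepares all components, second pass processes them.
  static List<String> handle(List<Component> components) {
    List<String> log = new ArrayList<String>();
    for (Component c : components) {
      c.prepare(log);
    }
    for (Component c : components) {
      c.process(log);
    }
    return log;
  }
}
```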

QueryComponent.prepare uses QParser to parse the parameters into the query and sort.
FacetComponent.prepare checks whether faceting is enabled: facet=true
MoreLikeThisComponent.prepare does nothing
HighlightComponent.prepare checks whether highlighting is enabled: hl=true
StatsComponent.prepare checks whether stats is enabled: stats=true
DebugComponent.prepare does nothing

QueryComponent.process does the real search work, including grouping:
SolrIndexSearcher searcher = req.getSearcher();
searcher.search(QueryResult qr, QueryCommand cmd)

SearchHandler.inform creates the SearchComponents and the shardHandlerFactory.
SolrCore.postDecorateResponse puts status, QTime, and possibly the request handler and params into the response header.
Then SolrDispatchFilter will choose QueryResponseWriter and write response.
Classes: XMLResponseWriter, XMLWriter; CSVResponseWriter, CSVWriter.
Classes
QParser: Parse query, sort
QParserPlugin.DEFAULT_QTYPE=lucene
QueryParsing.LOCALPARAM_START
How to parse Local Param
QParser.getParser(String, String, SolrQueryRequest)
if(qstr.startsWith(QueryParsing.LOCALPARAM_START))
localParamsEnd = QueryParsing.parseLocalParams(qstr, 0, localMap, globalParams);
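
To make the idea concrete, here is a much simplified local-params parser. It is a sketch under stated assumptions, not QueryParsing.parseLocalParams: LocalParamsSketch is a hypothetical class, and quoted values and parameter dereferencing are not supported.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LocalParamsSketch {
  static final String LOCALPARAM_START = "{!";

  // Fills localMap and returns the index where the real query starts
  // (0 when the string has no local-params prefix).
  static int parse(String qstr, Map<String, String> localMap) {
    if (!qstr.startsWith(LOCALPARAM_START)) {
      return 0;
    }
    int end = qstr.indexOf('}');
    if (end < 0) {
      throw new IllegalArgumentException("Missing '}' in local params: " + qstr);
    }
    String body = qstr.substring(LOCALPARAM_START.length(), end).trim();
    if (!body.isEmpty()) {
      for (String token : body.split("\\s+")) {
        int eq = token.indexOf('=');
        if (eq < 0) {
          localMap.put("type", token);   // a bare token is shorthand for type=<token>
        } else {
          localMap.put(token.substring(0, eq), token.substring(eq + 1));
        }
      }
    }
    return end + 1;
  }

  public static void main(String[] args) {
    Map<String, String> local = new LinkedHashMap<String, String>();
    String q = "{!dismax qf=title}solr rocks";
    int start = parse(q, local);
    System.out.println(local + " -> " + q.substring(start));
  }
}
```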

ThreadLocal: SolrRequestInfo
protected final static ThreadLocal<SolrRequestInfo> threadLocal = new ThreadLocal<SolrRequestInfo>();
It is cleared in the finally block of SolrDispatchFilter.doFilter.
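
The set-then-clear discipline matters because servlet containers reuse threads. A minimal sketch of the pattern, where RequestInfoSketch is a hypothetical class and a String stands in for SolrRequestInfo:

```java
class RequestInfoSketch {
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<String>();

  static String current() {
    return CURRENT.get();
  }

  // Mirrors the filter: set before the work, always clear in finally.
  static String doFilter(String requestId) {
    CURRENT.set(requestId);
    try {
      // Code deep in the call stack can read the value without it being passed around.
      return "handled " + current();
    } finally {
      // Without this, a pooled thread would leak the value into its next request.
      CURRENT.remove();
    }
  }
}
```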
How Solr Executes Shard Requests
A shard request follows the same path as a non-shard request: through SolrDispatchFilter to SearchHandler.handleRequestBody.
SearchHandler calls prepare on all components and then steps through the stages defined for a distributed request. In its distributedProcess method, each component checks the current stage and responds accordingly.
If needed, a component creates a ShardRequest; ResponseBuilder.addRequest(SearchComponent, ShardRequest) then calls modifyRequest on each component.
SearchHandler then removes parameters such as shards, sets distrib to false, and uses ShardHandler, backed by a CompletionService and Callable tasks, to send the request to all shards. Each shard later returns a ShardResponse, which wraps a SolrResponse. handleResponses is then called on all components, which usually merges the responses from multiple servers.
Finally, finishStage is called on all components.

ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
shardHandler1.checkDistributed(rb);

for (String shard : sreq.actualShards) {
shardHandler1.submit(sreq, shard, params);
}

handler.component.HttpShardHandler.submit uses the completionService to submit a Callable task that calls HttpSolrServer to send the request:
SolrServer server = new HttpSolrServer(url, httpClient);
ssr.nl = server.request(req);

In the shards parameter, each shard can list multiple Solr servers separated by |, so that requests are balanced across those servers.
handler.component.HttpShardHandler.getURLs(String)
urls = StrUtils.splitSmart(shard, "|", true);
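
A rough illustration of the | syntax using only the JDK. ShardUrlsSketch is a hypothetical class, String.split stands in for StrUtils.splitSmart, and the random pick is my own stand-in for whatever load balancing Solr actually applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ShardUrlsSketch {
  // "host1:8983/solr|host2:8983/solr" lists two equivalent servers for one shard.
  static List<String> getUrls(String shard) {
    List<String> urls = new ArrayList<String>();
    for (String url : shard.split("\\|")) {
      url = url.trim();
      if (!url.isEmpty()) {
        urls.add(url);
      }
    }
    return urls;
  }

  // Picks one of the alternatives at random; assumes the shard entry is not empty.
  static String pick(String shard, Random random) {
    List<String> urls = getUrls(shard);
    return urls.get(random.nextInt(urls.size()));
  }
}
```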
Resources
http://wiki.apache.org/solr/DistributedSearch

Learning Solr Code: How Solr Is Started

Entrance
SolrDispatchFilter.init
new CoreContainer.Initialize().initialize();
CoreContainer.Initialize.initialize()
1. SolrResourceLoader.locateSolrHome()
Solr home is looked up in this order: the JNDI entry java:comp/env/solr/home, then the system property solr.solr.home, otherwise solr/ relative to the current directory.
2. new CoreContainer(solrHome).load(solrHome, solr.xml)
This adds the jars in solr-home/lib to the class loader, parses solr.xml, and uses a ThreadPoolExecutor and CompletionService to load the defined cores in parallel. It creates a CoreDescriptor for each core; in CoreDescriptor we can see the default values for core properties: loadOnStartup=true, isTransient=false.
CompletionService<SolrCore> completionService = new ExecutorCompletionService<SolrCore>(
coreLoadExecutor);
Set<Future<SolrCore>> pending = new HashSet<Future<SolrCore>>();
pending.add(completionService.submit(task));

In the Callable task, it will call CoreContainer.createFromLocal(String, CoreDescriptor).
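
The submit-then-collect pattern used for loading cores can be tried out with plain JDK classes. In this sketch ParallelLoadSketch is a hypothetical class and the "core" is just a string; only the CompletionService usage is the point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelLoadSketch {
  // Loads every named "core" in parallel and returns the results in completion order.
  static List<String> loadAll(List<String> coreNames, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      CompletionService<String> completionService = new ExecutorCompletionService<String>(executor);
      for (final String name : coreNames) {
        completionService.submit(new Callable<String>() {
          public String call() {
            return "loaded:" + name;   // stand-in for creating a core
          }
        });
      }
      List<String> loaded = new ArrayList<String>();
      for (int i = 0; i < coreNames.size(); i++) {
        // take() blocks until the next task finishes, whichever one that is.
        loaded.add(completionService.take().get());
      }
      return loaded;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while loading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A load task failed", e.getCause());
    } finally {
      executor.shutdown();
    }
  }
}
```

Compared with keeping a list of Futures and calling get() in submission order, take() lets a slow task at the front of the list stop blocking the handling of tasks that already finished.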
CoreContainer.createFromLocal

1. Create SolrConfig, which represents solrconfig.xml. This reads the XML, loads the jars into the class loader, creates SolrIndexConfig for the indexConfig section, creates CacheConfig and HttpCachingConfig, and loads requestHandler, queryParser, transformer, etc.
2. Create IndexSchema, which represents schema.xml. This parses schema.xml, creates the field types and SchemaFields, and reads the SimilarityFactory.
3. Create one SolrCore: core = new SolrCore(dcore.getName(), null, config, schema, dcore);
This initializes the listeners defined in solrconfig.xml and calls initIndex, initQParsers, initValueSourceParsers, and initTransformerFactories. It also initializes RequestHandlers: it creates one instance of each request handler defined and puts it into a map. So there is only one instance of each request handler; be careful about thread safety when writing your own request handler.
reqHandlers = new RequestHandlers(this);
reqHandlers.initHandlersFromConfig(solrConfig);

Class in IndexSchema: DynamicField, DynamicCopy

CoreContainer.cfg represents solr.xml; in CoreContainer, you can find all the configuration options available in solr.xml.
Example: String dcoreName = cfg.get("solr/cores/@defaultCoreName", null);

SolrCore.SolrCore(String, String, SolrConfig, IndexSchema, CoreDescriptor, UpdateHandler, SolrCore)
reqHandlers = new RequestHandlers(this);
reqHandlers.initHandlersFromConfig(solrConfig);
SolrResourceLoader
Add solr-home/lib jars to class loader.
this.classLoader = createClassLoader(null, parent);
addToClassLoader("./lib/", null);
reloadLuceneSPI();
Learned
1. Add jars into class loader?
SolrResourceLoader.replaceClassLoader
2. CompletionService

Eclipse: Enable Java Assertions Globally

When running Solr test cases, Eclipse reports this error:
Test class requires enabled assertions, enable globally (-ea) or for Solr/Lucene subpackages only: org.apache.solr.TestJoin

It's easy to fix: just add -ea to the VM arguments of the JUnit launch configuration. But it's annoying to have to add -ea every time I run a new test case.

Fortunately, there are several ways to enable Java assertions globally:

1. Define _JAVA_OPTIONS

Add an environment variable in the OS: _JAVA_OPTIONS=-ea

2. Change JUnit Settings

Go to Window -> Preferences -> Java -> JUnit.
Select "Add '-ea' to VM arguments when creating a new JUnit launch configuration".

3. Add Default VM Arguments in "Installed JREs"

Go to Window -> Preferences -> Java -> Installed JREs, select the JRE you're using, click Edit, and add -ea to the default VM arguments.
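
To confirm that one of these settings actually took effect, a small check like the following can help. AssertionCheck is a hypothetical helper class; the trick is that the assignment inside the assert statement only runs when assertions are enabled.

```java
class AssertionCheck {
  // Returns true only when this class runs with assertions enabled (-ea).
  static boolean assertionsEnabled() {
    boolean enabled = false;
    // The assignment is evaluated only when assertions are on.
    assert enabled = true;
    return enabled;
  }

  public static void main(String[] args) {
    System.out.println("assertions enabled: " + assertionsEnabled());
  }
}
```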

Resources:
http://mindprod.com/jgloss/javaexe.html#JAVAOPTIONS
http://stackoverflow.com/questions/1798016/junit-enable-assertions-in-class-under-test
http://stackoverflow.com/questions/10639322/how-can-i-specify-the-default-jvm-arguments-for-programs-i-run-from-eclipse

Solr: Split Big CSV File to Speed up Indexing

When importing a big CSV file, we can split it into multiple smaller CSV files and import them with multiple threads, which improves import performance.
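
The splitting step on its own looks roughly like this. It is a minimal sketch, assuming every CSV record fits on one line (no quoted fields with embedded newlines); CsvSplitSketch is a hypothetical class that keeps the chunks in memory instead of writing temp files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class CsvSplitSketch {
  // Groups non-blank lines into chunks of at most linesPerChunk lines.
  // Each chunk could be written to its own temp file and given to a worker thread.
  static List<List<String>> split(Reader source, int linesPerChunk) {
    if (linesPerChunk <= 0) {
      throw new IllegalArgumentException("linesPerChunk must be positive");
    }
    List<List<String>> chunks = new ArrayList<List<String>>();
    List<String> current = new ArrayList<String>();
    BufferedReader reader = new BufferedReader(source);
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) {
          continue;                       // skip blank lines
        }
        current.add(line);
        if (current.size() == linesPerChunk) {
          chunks.add(current);
          current = new ArrayList<String>();
        }
      }
    } catch (IOException e) {
      throw new IllegalStateException("Failed to read CSV source", e);
    }
    if (!current.isEmpty()) {
      chunks.add(current);                // last, possibly shorter, chunk
    }
    return chunks;
  }
}
```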

In a test with one 882 MB CSV file and 2 GB of memory, the new threaded request handler took 6 minutes 32 seconds.
Importing the same 882 MB file directly, without splitting, took 8 minutes 22 seconds.
So the new request handler cut the import time by about 22% (392 seconds versus 502 seconds).
Implementation
The code is shown below; you can review the complete code on GitHub.

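package org.codeexample.jeffery.solr;

// Imports below assume Solr 4.x package names.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.params.UpdateParams;
import org.apache.solr.common.util.ContentStream;
import org.apache.solr.common.util.ContentStreamBase;
import org.apache.solr.common.util.ContentStreamBase.FileStream;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.UpdateRequestHandler;
import org.apache.solr.handler.loader.ContentStreamLoader;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryRequestBase;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorChain;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;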

public class ThreadedCSVFileRequestHandler extends UpdateRequestHandler {
  private static final Logger logger = LoggerFactory.getLogger(ThreadedCSVFileRequestHandler.class);

  private static final String PARAM_THREADED_CSV_FILE = "threaded_csv_file";

  public static final String PARAM_THREAD_POOL_SIZE = "threaded_csv_file_thread_pool_size";
  public static final String PARAM_QUEUE_SIZE = "threaded_csv_file_queue_size";

  private static final String PARAM_USE_CONFIGURED_FIELD_NAMES = "useconfiguredfieldnames";

  private static final String PARAM_LINES_READ_FILE = "lines_per_file";
  private static final String PARAM_FILE_SIZE_LIMIT_MB = "file_limit_mb";
  private static final String PARAM_SPLIT_FILE_NUMBER_LIMIT = "split_file_number_limit";

  private static final String PARAM_FIELD_NAMES = "fieldnames";
  private String fieldnames;

  private boolean threadedcsvfile = false;
  private int threadPoolSize;
  private int threadPoolQueueSize;
  // default 1 million (1000000)
  private int linesPerFile;
  // unit mb
  private int defaultFileSizeLimitMB;
  private static final long MB = 1024 * 1024;

  private int splitFileNumberLimit;

  @SuppressWarnings("rawtypes")
  @Override
  public void init(NamedList args) {
    super.init(args);
    if (args != null) {
      SolrParams params = SolrParams.toSolrParams(args);
      threadedcsvfile = params.getBool(PARAM_THREADED_CSV_FILE, false);
      threadPoolSize = params.getInt(PARAM_THREAD_POOL_SIZE, 1000);
      threadPoolQueueSize = params.getInt(PARAM_QUEUE_SIZE, 1000);
      fieldnames = params.get(PARAM_FIELD_NAMES);

      linesPerFile = params.getInt(PARAM_LINES_READ_FILE, 1000000);
      defaultFileSizeLimitMB = params.getInt(PARAM_FILE_SIZE_LIMIT_MB, 200);
      splitFileNumberLimit = params.getInt(PARAM_SPLIT_FILE_NUMBER_LIMIT, 50);
    }
  }

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrParams params = req.getParams();

    boolean useconfiguredfieldnames = true;
    boolean tmpThreadedcsvfile = threadedcsvfile;
    if (params != null) {
      useconfiguredfieldnames = params.getBool(PARAM_USE_CONFIGURED_FIELD_NAMES, true);
      tmpThreadedcsvfile = params.getBool(PARAM_THREADED_CSV_FILE, threadedcsvfile);
    }

    if (useconfiguredfieldnames) {
      ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
      modifiableSolrParams.set(PARAM_FIELD_NAMES, fieldnames);
      req.setParams(modifiableSolrParams);
    }

    if (tmpThreadedcsvfile) {
      List<ContentStream> streams = getStreams(req);
      if (streams.size() > splitFileNumberLimit) {
        super.handleRequestBody(req, rsp);
      } else {
        threadedCSVFile(req, streams);
      }
    } else {
      super.handleRequestBody(req, rsp);
    }
  }

  private void threadedCSVFile(SolrQueryRequest req, List<ContentStream> streams) throws IOException,
      InterruptedException {
    ThreadCSVFilePoolExecutor importExecutor = null;
    ThreadCSVFilePoolExecutor submitFileExecutor = null;
    final List<File> tmpDirs = new ArrayList<File>();

    try {
      if (req instanceof SolrQueryRequestBase) {
        SolrQueryRequestBase requestBase = (SolrQueryRequestBase) req;
        requestBase.setContentStreams(null);
      }
      List<ContentStream> otherStreams = new ArrayList<ContentStream>(streams.size());

      List<ContentStreamBase.FileStream> streamFiles = new ArrayList<ContentStreamBase.FileStream>(streams.size());

      Iterator<ContentStream> iterator = streams.iterator();
      while (iterator.hasNext()) {
        ContentStream stream = iterator.next();
        iterator.remove();
        if (stream instanceof ContentStreamBase.FileStream) {
          streamFiles.add((FileStream) stream);
        } else {
          otherStreams.add(stream);
        }
      }

      importExecutor = newExecutor(threadPoolSize, threadPoolQueueSize);

      iterator = otherStreams.iterator();
      while (iterator.hasNext()) {
        List<ContentStream> tmpStreams = new ArrayList<ContentStream>();
        tmpStreams.add(iterator.next());
        iterator.remove();

        ImportCSVDataTask callable = new ImportCSVDataTask(this, req, new SolrQueryResponse(), tmpStreams);
        importExecutor.submit(callable);
      }

      Throwable throwable = importExecutor.getTaskThrows();
      if (throwable != null) {
        // should already be shutdown
        logger.error(this.getClass().getName() + " throws exception, shutdown threadpool now.", throwable);
        importExecutor.shutdownNow();
        throw new RuntimeException(throwable);
      }
      if (!streamFiles.isEmpty()) {

        long fileLimit = getFileLimitMB(req);
        // now handle csv files
        Iterator<ContentStreamBase.FileStream> fileStreamIt = streamFiles.iterator();
        // List<Thread> threads = new ArrayList<Thread>(streamFiles.size());

        submitFileExecutor = newExecutor(threadPoolSize, threadPoolQueueSize);

        while (fileStreamIt.hasNext()) {
          ContentStreamBase.FileStream fileStream = fileStreamIt.next();
          fileStreamIt.remove();

          if (fileStream.getSize() <= fileLimit) {
            List<ContentStream> tmpStreams = new ArrayList<ContentStream>();
            tmpStreams.add(fileStream);
            ImportCSVDataTask callable = new ImportCSVDataTask(this, req, new SolrQueryResponse(), tmpStreams);
            importExecutor.submit(callable);
          } else {
            SubmitCSVFileTask task = new SubmitCSVFileTask(importExecutor, this, req, fileStream, linesPerFile, tmpDirs);
            submitFileExecutor.submit(task);
          }
        }
        throwable = submitFileExecutor.getTaskThrows();
      }
      if (throwable != null) {
        // should already be shutdown
        importExecutor.shutdownNow();
        if (submitFileExecutor != null) {
          submitFileExecutor.shutdownNow();
        }
        logger.error(this.getClass().getName() + " throws exception, shutdown threadpool now.", throwable);
        throw new RuntimeException(throwable);
      }
      boolean terminated = false;
      if (submitFileExecutor != null) {
        submitFileExecutor.shutdown();
        terminated = submitFileExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.MINUTES);
        if (!terminated) {
          logger.error("shutdown submitFileExecutor takes too much time");
          throw new RuntimeException("Request takes too much time");
        }
      }
      importExecutor.shutdown();
      terminated = importExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.MINUTES);
      if (!terminated) {
        logger.error("shutdown importExecutor takes too much time");
        throw new RuntimeException("Request takes too much time");
      }
    } finally {
      if (submitFileExecutor != null) {
        try {
          submitFileExecutor.shutdownNow();
        } catch (Exception e) {
          logger.error("submitFileExecutor.shutdownNow throws: " + e);
        }
      }
      if (importExecutor != null) {
        try {
          importExecutor.shutdownNow();
        } catch (Exception e) {
          logger.error("importExecutor.shutdownNow throws: " + e);
        }
      }

      // remove all files in tmpDirs
      new Thread() {
        public void run() {
          for (File dir : tmpDirs) {
            try {
              deleteDirectory(dir);
              logger.info("Deleted tmp dir:" + dir);
            } catch (IOException e) {
              logger.error("Exception happened when delete tmp dir: " + dir, e);
            }
          }
        };
      }.start();
    }

  }

  void deleteDirectory(File dir) throws IOException {
    if (dir.isDirectory()) {
      for (File file : dir.listFiles()) {
        if (file.isDirectory()) {
          deleteDirectory(file);
        } else {
          file.delete();
        }
      }
    }
    if (!dir.delete())
      throw new FileNotFoundException("Failed to delete file: " + dir);
  }

  private long getFileLimitMB(SolrQueryRequest req) {
    long mb = req.getParams().getInt(PARAM_FILE_SIZE_LIMIT_MB, defaultFileSizeLimitMB);
    return mb * MB;
  }

  private static class SubmitCSVFileTask implements Callable<Void> {
    // private volatile boolean running = true;
    private ContentStreamBase.FileStream fileStream;
    private ThreadCSVFilePoolExecutor executor;
    private ThreadedCSVFileRequestHandler requestHandler;

    private SolrQueryRequest req;
    private BufferedReader srcBr;
    private int linesPerFile;
    private List<File> tmpDirs;

    public SubmitCSVFileTask(ThreadCSVFilePoolExecutor executor, ThreadedCSVFileRequestHandler requestHandler,
        SolrQueryRequest req, FileStream fileStream, int lines, List<File> tmpDirs) throws IOException {
      super();
      this.executor = executor;
      this.requestHandler = requestHandler;
      this.req = req;
      this.fileStream = fileStream;
      srcBr = new BufferedReader(fileStream.getReader());
      this.linesPerFile = lines;
      this.tmpDirs = tmpDirs;
    }

    private void doSplitSubmit(File srcFile, File tmpDir) throws Exception {
      logger.info("Start to split " + srcFile + " to " + tmpDir);
      int counter = 0;
      try {
        while (srcBr.ready()) {
          String newFileName = tmpDir.getAbsolutePath() + File.separator + srcFile.getName() + counter;
          File newFile = new File(newFileName);
          ++counter;
          boolean created = copyTo(newFile);
          if (!created) {
            break;
          } else {
            // submit file
            FileStream tmpFileStream = new FileStream(newFile);
            List<ContentStream> tmpStreams = new ArrayList<ContentStream>();
            tmpStreams.add(tmpFileStream);
            ImportCSVDataTask callable = new ImportCSVDataTask(requestHandler, req, new SolrQueryResponse(), tmpStreams);
            executor.submit(callable);
          }
        }
      } finally {
        if (srcBr != null) {
          try {
            srcBr.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }
      logger.info("Finished split " + srcFile + " to " + tmpDir);
    }

    private boolean copyTo(File newFile) throws Exception {
      boolean created = true;
      int linesRead = 0;
      BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
      try {
        while (linesRead < linesPerFile) {
          String line = srcBr.readLine();
          if (line == null) {
            break;
          } else {
            line = line.trim();
            if (line.length() != 0) {
              ++linesRead;
              bw.write(line);
              bw.newLine();
            }
          }
        }
      } finally {
        if (bw != null) {
          try {
            bw.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
        if (linesRead == 0) {
          newFile.delete();
          created = false;
        }
      }
      return created;
    }

    @Override
    public Void call() throws Exception {
      try {
        URI uri = URI.create(fileStream.getSourceInfo());
        File srcFile = new File(uri);
        File tmpDir = createTempDir(srcFile.getName());
        tmpDirs.add(tmpDir);
        if (logger.isDebugEnabled()) {
          logger.debug("Create tmpdir: " + tmpDir.getAbsolutePath());
        }
        doSplitSubmit(srcFile, tmpDir);
      } catch (Exception e) {
        logger.error("Exception happened when handle file: " + fileStream.getName(), e);
        throw e;
      } finally {
        if (srcBr != null) {
          try {
            srcBr.close();
          } catch (IOException e) {
            logger.error("Exception happened when close BufferedReader for file: " + fileStream.getName(), e);

          }
        }
      }
      return null;
    }

    public static File createTempDir(String prefix) {
      File baseDir = new File(System.getProperty("java.io.tmpdir"));
      String baseName = System.currentTimeMillis() + "-";
      int TEMP_DIR_ATTEMPTS = 20;
      for (int counter = 0; counter < TEMP_DIR_ATTEMPTS; counter++) {
        File tempDir = new File(baseDir, prefix + "-" + baseName + counter);
        if (tempDir.mkdir()) {
          return tempDir;
        }
      }
      throw new IllegalStateException("Failed to create directory within " + TEMP_DIR_ATTEMPTS + " attempts (tried "
          + baseName + "0 to " + baseName + (TEMP_DIR_ATTEMPTS - 1) + ')');
    }
  }

  /**
   *
   */
  private static class ImportCSVDataTask implements Callable<Void> {
    private SolrQueryRequest req;
    private SolrQueryResponse rsp;
    private ThreadedCSVFileRequestHandler requestHandler;
    private List<ContentStream> streams;

    // private long endLine;

    public ImportCSVDataTask(ThreadedCSVFileRequestHandler requestHandler, SolrQueryRequest req, SolrQueryResponse rsp,
        List<ContentStream> streams) {
      super();
      this.req = req;
      this.rsp = rsp;
      this.requestHandler = requestHandler;
      this.streams = streams;
    }

    @Override
    public Void call() throws Exception {
      UpdateRequestProcessor processor = null;

      UpdateRequestProcessorChain processorChain = req.getCore().getUpdateProcessingChain(
          req.getParams().get(UpdateParams.UPDATE_CHAIN));

      processor = processorChain.createProcessor(req, rsp);
      ContentStreamLoader documentLoader = requestHandler.newLoader(req, processor);

      for (ContentStream stream : streams) {
        if (stream.getName() != null) {
          logger.info("Start to import " + stream.getName());
        }
        documentLoader.load(req, rsp, stream, processor);
        if (stream.getName() != null) {
          logger.info("Finished import " + stream.getName());
        }
      }

      return null;
    }
  }

  private ThreadCSVFilePoolExecutor newExecutor(int threadPoolSize, int queueSize) {
    ThreadCSVFilePoolExecutor executor;
    executor = new ThreadCSVFilePoolExecutor(threadPoolSize, threadPoolSize, 60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(queueSize), new ThreadPoolExecutor.CallerRunsPolicy());
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }

  private static class ThreadCSVFilePoolExecutor extends ThreadPoolExecutor {

    // Written by worker threads and read by the request thread, hence volatile.
    private volatile Throwable taskThrows;

    public Throwable getTaskThrows() {
      return taskThrows;
    }

    public ThreadCSVFilePoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit,
        ArrayBlockingQueue<Runnable> workQueue, CallerRunsPolicy callerRunsPolicy) {
      super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, new CSVFileThreadedThreadFactory(
          ThreadedCSVFileRequestHandler.class.getName()), callerRunsPolicy);
    }

    /*
     * From:
     * http://stackoverflow.com/questions/2248131/handling-exceptions-from-java
     * -executorservice-tasks
     */
    @Override
    protected void afterExecute(Runnable runnable, Throwable throwable) {
      super.afterExecute(runnable, throwable);
      // Use a local variable so a later successful task cannot reset a recorded failure to null.
      Throwable failure = throwable;
      if (failure == null && runnable instanceof Future<?>) {
        try {
          Future<?> future = (Future<?>) runnable;
          future.get();
        } catch (CancellationException ce) {
          failure = ce;
        } catch (ExecutionException ee) {
          failure = ee.getCause();
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // ignore/reset
        }
      }
      if (failure != null) {
        taskThrows = failure;
        logger.error("Task throws exception, shutdown threadpool.", failure);
        shutdownNow();
      }
    }

  }

  private static class CSVFileThreadedThreadFactory implements ThreadFactory {
    private static final AtomicInteger POOLNUMBER = new AtomicInteger(1);
    private final ThreadGroup group;
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private String namePrefix;

    CSVFileThreadedThreadFactory(String prefix) {
      SecurityManager s = System.getSecurityManager();
      group = (s != null) ? s.getThreadGroup() : Thread.currentThread().getThreadGroup();
      namePrefix = (prefix == null ? this.getClass().getSimpleName() : prefix) + POOLNUMBER.getAndIncrement()
          + "-thread-";
    }

    public Thread newThread(Runnable r) {
      Thread t = new Thread(group, r, namePrefix + threadNumber.getAndIncrement(), 0);
      if (t.isDaemon())
        t.setDaemon(false);
      if (t.getPriority() != Thread.NORM_PRIORITY)
        t.setPriority(Thread.NORM_PRIORITY);
      return t;
    }
  }
}

Solr: Run Embedded Jetty As Windows Service

In previous posts, I showed how to bundle embedded Jetty, solr.war, and solr.home into one package, how to reduce the size of that package, and how to start and stop it programmatically.
Part 1: Shrink Solr Application Size
Part 2: Use Proguard to Shrink Solr Application Size
Part 3: Use Pack200 to Shrink Solr Application Size
Start Stop Embedded Jetty Programmatically

In this article, I show how to use Apache Commons Daemon to install it as a Windows service.

First download latest commons-daemon-*-bin-windows.zip from here.
Download latest commons-daemon-*-src.tar.gz from here.

The src\samples folder in commons-daemon-*-src.tar.gz shows how to create a service and contains the scripts to install and remove Windows services:
ProcrunService.java, ProcrunServiceInstall.cmd, ProcrunServiceRemove.cmd.

The Java code to start/stop the embedded Jetty server is here.
Next we create two scripts: one to install it as a Windows service and one to remove the service.
installEmbededJettyService.bat

@echo off
setlocal
set MYPATH=%~dp0
set PATH_PRUNSRV=%MYPATH%
set SERVICE_JAVA=EmbededJetty
set PR_DESCRIPTION=Embedded Jetty Server
set PR_DISPLAYNAME=Embedded Jetty Server
set "PR_LOGPATH=%MYPATH%/../logs"
set PATH="%MYPATH%/../lib/servlet-api-3.0.jar;%MYPATH%/../lib/jetty-all.jar;%MYPATH%/../lib/startjetty.jar"

rem Allow prunsrv to be overridden
if "%PRUNSRV%" == "" set PRUNSRV=%PATH_PRUNSRV%prunsrv

echo Installing %SERVICE_JAVA%
%PRUNSRV% //DS//%SERVICE_JAVA%
%PRUNSRV% //IS//%SERVICE_JAVA% --Install="%MYPATH%prunsrv"

if not errorlevel 1 goto installed
echo Failed installing '%SERVICE_JAVA%' service
goto end
:installed
echo The service '%SERVICE_JAVA%' has been installed.

set MY_JVMOPTIONS=-server;-Xms512M;-Xmx2048M
echo Setting the parameters for %SERVICE_JAVA%
%PRUNSRV% //US//%SERVICE_JAVA% --StdOutput auto --StdError auto ^
--Classpath=%PATH% --JvmOptions=%MY_JVMOPTIONS% --Startup=manual ^
--StartMode=java --StartClass=com.codeexample.solr.EmbeddedSolrJettyServer --StartParams=start;-dynamicPort;true;%1;%2;%3;%4;%5;%6 ^
--StopMode=java  --StopClass=com.codeexample.solr.EmbeddedSolrJettyServer  --StopParams=shutdown

if not errorlevel 1 goto updated
echo Failed updating '%SERVICE_JAVA%' service
goto end
:updated
echo The service '%SERVICE_JAVA%' has been updated.

echo Installation of %SERVICE_JAVA% is complete
:end
endlocal 
@echo on

uninstallEmbededJettyService.bat

@echo off
setlocal
set MYPATH=%~dp0
echo %MYPATH%
set PATH_PRUNSRV=%MYPATH%
set SERVICE_JAVA=EmbededJetty
set "PR_LOGPATH=%MYPATH%/../logs"
if "%PRUNSRV%" == "" set PRUNSRV=%PATH_PRUNSRV%prunsrv

echo Removing %SERVICE_JAVA%
%PRUNSRV% //DS//%SERVICE_JAVA%

if not errorlevel 1 goto removed
echo.
echo Failed uninstalling '%SERVICE_JAVA%' service
goto end
:removed
echo The service '%SERVICE_JAVA%' has been removed
:end
endlocal
@echo on

We can copy prunmgr.exe to the bin folder and rename it to ${service-name}w.exe, so users can run it to edit the service configuration, or to start and stop the service.

Two scripts start and stop the Windows service:
startService.bat

@echo off
setlocal
set MYPATH=%~dp0
set PATH_PRUNSRV=%MYPATH%
set SERVICE_JAVA=EmbededJetty
if "%PRUNSRV%" == "" set PRUNSRV=%PATH_PRUNSRV%prunsrv

if [%1] == [] goto startService

echo Changing the parameters for %SERVICE_JAVA%
%PRUNSRV% //US//%SERVICE_JAVA% --StartParams=start;-dynamicPort;true;%1;%2;%3;%4;%5;%6 

if not errorlevel 1 goto updated
echo Failed updating '%SERVICE_JAVA%' service
goto end
:updated
echo The service '%SERVICE_JAVA%' has been updated.

:startService
"%PRUNSRV%" //ES//%SERVICE_JAVA%
if not errorlevel 1 goto started
echo Failed starting '%SERVICE_JAVA%' service
goto end

:started
echo %SERVICE_JAVA% is started.

:end
endlocal 
@echo on

stopService.bat

@echo off
setlocal
set MYPATH=%~dp0
set PATH_PRUNSRV=%MYPATH%
set SERVICE_JAVA=EmbededJetty
if "%PRUNSRV%" == "" set PRUNSRV=%PATH_PRUNSRV%prunsrv
:startService
%PRUNSRV%  //SS//%SERVICE_JAVA%
if not errorlevel 1 goto stopped
echo Failed stopping '%SERVICE_JAVA%' service
goto end

:stopped
echo %SERVICE_JAVA% is stopped.

:end
endlocal 
@echo on

Learning Solr: How to import CSV String into Solr

Examples on the web usually import CSV files into Solr; here I want to show how to import a CSV string into Solr.
Why import a CSV string?
1. Compared to importing CSV files, importing a CSV string is much faster because it avoids unnecessary I/O: there is no need to write the CSV file on the client side and read it on the server side.
2. Compared to importing an XML string, importing a CSV string should also be faster: building the XML string on the client and parsing it on the server costs more than doing the same with CSV.
3. A CSV string is usually much smaller than the equivalent XML, so less time is spent on network transfer, and it saves a little bandwidth.
How to import CSV String into Solr?
1. Use stream.body to specify the CSV data. One thing to pay attention to: lines must be separated by the URL-encoded CRLF %0D%0A; a literal \r\n won't work.
2. In the stream body, special characters must be escaped: change " to \" and \ to \\.

The following request would import 2 lines into Solr.
curl -d "stream.body=2,0,1,0,1,\"c:\\\",1,0,\"c:\",0,1,16 %0D%0A 2,0,1,0,1,\"x:\\\",2,0,\"x:\",0,1,16 &separator=,&fieldnames=omiited&literal.id=9000&stream.contentType=text/csv;charset=utf-8&commit=true" http://localhost:8080/solr/update/csv
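
When the request is built from Java rather than from the shell, the line separator and URL encoding can be handled with the JDK alone. This is a minimal sketch: StreamBodySketch and encodeStreamBody are hypothetical names, and the result is only the value for the stream.body parameter, not the whole request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class StreamBodySketch {
  // Joins CSV rows with CRLF and URL-encodes the result for use as stream.body.
  static String encodeStreamBody(String... rows) {
    StringBuilder csv = new StringBuilder();
    for (int i = 0; i < rows.length; i++) {
      if (i > 0) {
        csv.append("\r\n");
      }
      csv.append(rows[i]);
    }
    try {
      // "\r\n" becomes %0D%0A; commas and quotes are percent-encoded as well.
      return URLEncoder.encode(csv.toString(), "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("stream.body=" + encodeStreamBody("1,a", "2,b"));
  }
}
```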

To simplify client code, we can create a new requestHandler that sets fieldnames, so the client need not specify fieldnames in each request.
Solr Code

org.apache.solr.handler.loader.CSVLoaderBase.load(SolrQueryRequest, SolrQueryResponse, ContentStream, UpdateRequestProcessor)
org.apache.solr.internal.csv.CSVParser
getLine()
nextToken(Token)
  private boolean isEndOfLine(int c) throws IOException {
    // check if we have \r\n...
    if (c == '\r') {
      if (in.lookAhead() == '\n') {
        // note: does not change c outside of this method !!
        c = in.read();
      }
    }
    return (c == '\n');
  }

References:
UpdateCSV - Solr Wiki

Gotcha in Java Properties

Don't mix Properties.put and getProperty
Today I made a mistake in code like the following: I stored a key/value pair with put, but getProperty returned null, so the code threw a NumberFormatException that I didn't expect.

// Don't mix put and getProperty
properties.put("aint", 123);
// Throws NumberFormatException, as properties.getProperty("aint") returns null.
aint = Integer.parseInt(properties.getProperty("aint"));

Why does getProperty return null?
Properties extends Hashtable and stores its values in that Hashtable, but getProperty only returns String values. It first calls super.get(key) to search the Hashtable; if the key doesn't exist, or its value is not a String, it falls back to the string value from defaults.

public String getProperty(String key) {
        Object oval = super.get(key);
        String sval = (oval instanceof String) ? (String)oval : null;
        return ((sval == null) && (defaults != null)) ? defaults.getProperty(key) : sval;
    }

As we usually read Properties from a configuration file and later save them back to the file, it's better to always use getProperty and setProperty, not put and get.
defaults in Properties
defaults is itself a Properties object. When constructing a new Properties (newProp), we can pass in an existing Properties (oldProp), which is stored in the defaults field and left untouched. When getting the value of a key, getProperty first searches the Properties itself; if the key is not found, it searches defaults.

When you use remove to remove a key, it is removed only from the Properties' own Hashtable, not from defaults. Afterwards get(key) returns null, as it only searches the Properties' own Hashtable, but getProperty still returns the value from defaults.

If you don't like this behavior, you can use properties.putAll(aProperties); this copies all elements into the Hashtable of the new Properties.
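
The difference between the two approaches shows up as soon as a key is removed. A small sketch, where PropertiesCopySketch and flatCopy are hypothetical names; note that putAll copies only the entries stored directly in the source, not the source's own defaults.

```java
import java.util.Properties;

class PropertiesCopySketch {
  // Copies the entries stored directly in source into a new Properties.
  static Properties flatCopy(Properties source) {
    Properties copy = new Properties();
    copy.putAll(source);
    return copy;
  }

  public static void main(String[] args) {
    Properties old = new Properties();
    old.setProperty("k", "v");

    Properties viaDefaults = new Properties(old);   // old becomes the defaults
    Properties viaCopy = flatCopy(old);             // entries are copied

    viaDefaults.remove("k");
    viaCopy.remove("k");

    System.out.println(viaDefaults.getProperty("k"));   // still found through defaults
    System.out.println(viaCopy.getProperty("k"));       // really gone
  }
}
```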
Summary
Java Properties is not well designed: it extends Hashtable directly and exposes many Hashtable methods that it should not. But it's still the simplest way to read and save properties from/to a file.

Just be careful when using it: always use getProperty and setProperty, never put or get.
The test code is below:

private static void testProperties() {
 int aint;
 // Should always use setProperty and getProperty
 Properties properties = new Properties();
 properties.setProperty("aint", "123");
 aint = Integer.parseInt(properties.getProperty("aint"));

 // Don't mix put and getProperty
 properties = new Properties();
 properties.put("aint", 123);
 // Throws NumberFormatException, as properties.getProperty("aint") returns null.
 try {
  aint = Integer.parseInt(properties.getProperty("aint"));
 } catch (NumberFormatException e) {
  e.printStackTrace();
 }

 Properties oldProperties = new Properties();
 oldProperties.setProperty("defaultKey", "value");
 Properties newProps = new Properties(oldProperties);
 newProps.remove("defaultKey");
 // print null
 System.out.println(newProps.get("defaultKey"));

 // print value
 System.out.println(newProps.getProperty("defaultKey"));
}