How To Conduct a Technical Interview Effectively

Technical Skills
- Problem solving: not-easy algorithm questions
- Coding
- Design
Soft Skills
- Communication
- Retrospection
  - Mistakes related to design/decisions
  - What you learned from your mistakes
  - Bugs/troubleshooting
- Eager to learn
- Be flexible, willing to listen, not stubborn

What questions to ask
- ask interesting/challenging questions
- Or questions that are not difficult but focus on coding (writing bug-free code)
- Ask questions that can be solved in different ways
- Avoid questions that can only be solved with one specific approach, unless the approach is obvious (binary search, etc.) and you are testing coding skills rather than problem-solving skills

Don't ask 
- brain teasers, puzzles, riddles
- problems you ask only because you are interested in them, happen to know them, or just learned them recently

Know the questions very well
- Different approaches
- Expect approaches that you don't even know about
  - Verify them (with examples or a proof); if an approach works, the candidate has done a good job and you have also learned something new

Know common causes of bugs
- Be able to detect bugs in the candidate's code quickly

Give candidates the opportunity to prove themselves and shine
We are trying to evaluate the candidate's skills thoroughly: what he/she is good at and what he/she is not.
If you plan to ask 2 coding questions, one simple and one more difficult, tell the candidates.
Let the candidates know your expectations

Make the candidates learn something
- If the candidate doesn't give the right solution/answer and, at the end of the interview, wants to know how to approach it, tell him/her.
Why?
- Candidates put a lot of effort into the interview (a day off and the commute); if they want to learn something, learning it makes the time feel well spent
- It proves that you know the solution and have a reasonable answer, and that you don't ask questions you yourself don't know well

No surprises
If you find issues/bugs in the candidate's code or design, point them out
The candidate should have a rough idea about how he/she performs in this interview

Be fair

Phone interview
Prefer coding questions over design questions
- as design is partly about communication, and it's hard to test communication skills over the phone

About me - Jeffery Yuan (2017)

This is a short list of what I am good at and what I should improve.
- I will keep updating it, and I hope that when I look back after a year, I will realize that I have improved and learned a lot.

Strengths
Retrospection and Learning Logs
- I like to summarize what I have learned and write it down

Sharing Knowledge

Problem Solving and troubleshooting
- I like to solve difficult problems, as I can always learn something from them.
- I also summarize how (what steps) I took to solve the problems and what I learned that can help me solve problems more quickly later.
- See more at my blog: Troubleshooting

Proactively find problems and fix them
- such as finding problems in existing design and code, and thinking about how to improve them

Be honest
- to myself and colleagues about what I know and what I don't
Be modest
- I know there are still a lot of things that I should learn and improve.
- I like to learn from others

Proactively learning
- Have a safaribooksonline account
- Like to learn from books and from people
- When I used Cassandra and Kafka in our project, I took time to learn not only how to use them but, more importantly, their high-level design.
- Read more at my log System Design
Programmer: Lifelong Learning

Weaknesses - things that need improving
System design
Knowledge about distributed system
Public Speaking
Presentation
Visibility

How to Review and Discuss Software Design

Talk/think about all related aspects
- how do we store data
- client api
- ui changes
- backward compatibility: how to handle old data/clients

But focus on most important stuff (first)

Talk/think about design principles/practices
- such as idempotency, parallelization, monitoring, etc.
- Check more at System Design - Summary

What's the impact on other (internal and cross-team) components?


How do other components use it?

What're the known and potential constraints/issues/flaws in current design?
Don't only talk about its advantages;
also talk about issues, don't hide them

What are alternatives?
Think about alternative and different approaches; this can help find a better solution
We can't really review and compare if there are no alternatives

Welcome different approaches
- although that doesn't mean it's better or that we will use it


Development Cost
- How difficult is it to implement?

What may change and How to evolve

What may change in (very) near future?

Visibility/Monitoring
How can we know whether the new feature works or doesn't work?
How can we know when problems happen?

Feature Flag
Can we enable/disable the feature at runtime

Be Prepared
Ok to have informal/impromptu discussion with one or two colleagues

But make sure everyone is prepared for the formal team design discussion
All attendees should know the topic: how they would design it

Don't make design decisions immediately - for things that really matter
Take time to reflect and let any disagreements develop, then talk about it again later

Attitude
Listen first

When you don't agree with others' approaches
Don't get too defensive
Talk about ideas, not people

Be prepared

Related
System Design - Summary

Problem Solving Practice - Redis cache.put Hangs

The Issue
After deploying the change (Multi Tiered Caching - Using in-process EhCache in front of Distributed Redis) to the test environment (along with some other changes, and someone also made changes on the server, such as a restart), we found that cache.put hangs when saving data to redis.

Troubleshooting Process
First I tried to reproduce the issue in my local setup, where it always worked. But we could easily reproduce it in the test environment.

This made me think it might be something related to the test environment.

Then I used kill -3 processId to generate several thread dumps while reproducing the issue on the test machine, and found a suspect:
"ajp-nio-8009-exec-10" #91 daemon prio=5 os_prio=0 tid=0x00007f49c400a800 nid=0x75db waiting on condition [0x00007f495333e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at  RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).waitForLock(RedisConnection) line: 600
RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).doInRedis(RedisConnection) line: 564
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:207)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:157)
at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:226)
at org.springframework.data.redis.cache.RedisCache.put(RedisCache.java:194)
at com.lifelong.example.MultiTieredCache.lambda$put$40(MultiTieredCache.java:130)
at com.lifelong.example.MultiTieredCache$$Lambda$18/1283186866.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at com.lifelong.example.MultiTieredCache.put(MultiTieredCache.java:128)
at org.springframework.cache.interceptor.AbstractCacheInvoker.doPut(AbstractCacheInvoker.java:85)
at org.springframework.cache.interceptor.CacheAspectSupport$CachePutRequest.apply(CacheAspectSupport.java:784)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:417)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327)
at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)

Check the code at RedisCache$AbstractRedisCacheCallback to understand how it works:
For operations like put/putIfAbsent/evict/clear, and for @Cacheable with sync=true (RedisWriteThroughCallback), it checks whether a key like cacheName~lock exists in redis; if it exists, it waits until the key is gone.

This lock is created and deleted for @Cacheable with sync=true in RedisWriteThroughCallback, which calls the lock and unlock methods.

This made me check redis directly: after creating a tunnel to redis, I ran the command keys cacheName~lock and found that the lock key was indeed there.

Now everything makes sense:
- We had set sync=true and run a performance test, then restarted the server and removed the setting. The cacheName~lock key was probably left behind because of the server restart. Due to the leftover cacheName~lock, all redis update operations stopped working.

After removing cacheName~lock in redis, everything worked fine.
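If this happens again, the stale lock can also be removed programmatically. Below is a minimal sketch (assuming Spring Data Redis with a StringRedisTemplate bean; the helper method and cache name are just illustrative):

// Hypothetical helper: delete the leftover <cacheName>~lock key so redis writes can proceed again
@Autowired
private StringRedisTemplate stringRedisTemplate;

public void removeStaleCacheLock(final String cacheName) {
    final String lockKey = cacheName + "~lock";
    if (Boolean.TRUE.equals(stringRedisTemplate.hasKey(lockKey))) {
        // the lock was left behind by a server restart, so nothing will ever delete it
        stringRedisTemplate.delete(lockKey);
    }
}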

Takeaway
- When using a feature (@Cacheable(sync=true) in this case), know how it's implemented.

Multi Tiered Caching - Using in-process Cache in front of Distributed Cache

Why Multi Tiered Caching?
  To improve an application's performance, we usually cache data in a distributed cache like redis/memcached or an in-process cache like EhCache. 

  Each has its own strengths and weaknesses:
  An in-process cache is faster, but it's hard to keep consistent and it can't store a lot of data; a distributed cache solves both problems, but it's slower due to network latency and serialization/deserialization.

  In some cases, we may want to use both: mainly use a distributed cache, but also keep data that is small and doesn't change often (or at all), such as configuration, in an in-process cache.
The Implementation
  Spring uses CacheManager to determine which cache implementation to use.
  We define our own MultiTieredCacheManager and MultiTieredCache like below.
public class MultiTieredCacheManager extends AbstractCacheManager {
    private final List<CacheManager> cacheManagers;
    /**
     * @param cacheManagers - the order matters: when fetching data, it checks the first one; if the
     *        data is not there, it checks the second one, then back-fills the first one
     */
    public MultiTieredCacheManager(final List<CacheManager> cacheManagers) {
        this.cacheManagers = cacheManagers;
    }
    @Override
    protected Collection<? extends Cache> loadCaches() {
        return new ArrayList<>();
    }
    @Override
    protected Cache getMissingCache(final String name) {
        return new MultiTieredCache(name, cacheManagers);
    }
}

public class MultiTieredCache implements Cache {
    private static final Logger logger = LoggerFactory.getLogger(MultiTieredCache.class);

    private final List<Cache> caches = new ArrayList<>();
    private final String name;

    public MultiTieredCache(final String name, @Nonnull final List<CacheManager> cacheManagers) {
        this.name = name;
        for (final CacheManager cacheManager : cacheManagers) {
            caches.add(cacheManager.getCache(name));
        }
    }

    @Override
    public ValueWrapper get(final Object key) {
        ValueWrapper result = null;
        final List<Cache> cachesWithoutKey = new ArrayList<>();
        for (final Cache cache : caches) {
            result = cache.get(key);
            if (result != null) {
                break;
            } else {
                cachesWithoutKey.add(cache);
            }
        }
        if (result != null) {
            for (final Cache cache : cachesWithoutKey) {
                cache.put(key, result.get());
            }
        }
        return result;
    }

    @Override
    public <T> T get(final Object key, final Class<T> type) {
        T result = null;
        final List<Cache> noThisKeyCaches = new ArrayList<>();
        for (final Cache cache : caches) {
            result = cache.get(key, type);
            if (result != null) {
                break;
            } else {
                noThisKeyCaches.add(cache);
            }
        }
        if (result != null) {
            for (final Cache cache : noThisKeyCaches) {
                cache.put(key, result);
            }
        }

        return result;
    }
    // called when set sync = true in @Cacheable
    public <T> T get(final Object key, final Callable<T> valueLoader) {
        T result = null;
        for (final Cache cache : caches) {
            result = cache.get(key, valueLoader);
            if (result != null) {
                break;
            }
        }
        return result;
    }
    @Override
    public void put(final Object key, final Object value) {
        caches.forEach(cache -> cache.put(key, value));
    }
    @Override
    public void evict(final Object key) {
        caches.forEach(cache -> cache.evict(key));
    }
    @Override
    public void clear() {
        caches.forEach(cache -> cache.clear());
    }
    @Override
    public String getName() {
        return name;
    }
    @Override
    public Object getNativeCache() {
        return this;
    }
}

@Configuration
@EnableCaching
public class CacheConfig extends CachingConfigurerSupport {
  @Bean
  @Primary
  public CacheManager cacheManager(EhCacheCacheManager ehCacheCacheManager, RedisCacheManager redisCacheManager) {
      if (!cacheEnabled) {
          return new NoOpCacheManager();
      }
      // Be careful when make change - the order matters
      ArrayList<CacheManager> cacheManagers = new ArrayList<>();
      if (ehCacheEnabled) {
          cacheManagers.add(ehCacheCacheManager);
      }
      if (redisCacheEnabled) {
          cacheManagers.add(redisCacheManager);
      }
      return new MultiTieredCacheManager(cacheManagers);
  }

  @Bean(name = EH_CACHE_CACHE_MANAGER)
  public EhCacheCacheManager ehCacheCacheManager() {
      final EhCacheManagerFactoryBean ehCacheManagerFactoryBean = new EhCacheManagerFactoryBean();
      ehCacheManagerFactoryBean.setConfigLocation(new ClassPathResource("ehcache.xml"));
      ehCacheManagerFactoryBean.setShared(true);
      ehCacheManagerFactoryBean.afterPropertiesSet();

      final EhCacheManagerWrapper ehCacheManagerWrapper = new EhCacheManagerWrapper();
      ehCacheManagerWrapper.setCacheManager(ehCacheManagerFactoryBean.getObject());
      return ehCacheManagerWrapper;
  }

  @Bean(name = "redisCacheManager")
  public RedisCacheManager redisCacheManager(final RedisTemplate<String, Object> redisTemplate) {
      final RedisCacheManager redisCacheManager =
              new RedisCacheManager(redisTemplate, Collections.<String>emptyList(), true);
      redisCacheManager.setDefaultExpiration(DEFAULT_CACHE_EXPIRE_TIME_IN_SECOND);
      redisCacheManager.setExpires(expires);
      redisCacheManager.setLoadRemoteCachesOnStartup(true);
      return redisCacheManager;
  }
}
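With this configuration in place, callers keep using the standard Spring cache annotations; Spring resolves them through MultiTieredCacheManager, so reads hit EhCache first and fall back to Redis (back-filling EhCache on a hit). A minimal usage sketch - the service, dao, and cache name below are hypothetical:

@Service
public class ConfigurationService {
    @Autowired
    private ConfigurationDao configurationDao;

    // Reads check the in-process cache first, then redis; a redis hit back-fills EhCache.
    @Cacheable(cacheNames = "configData", key = "#name")
    public Configuration findByName(final String name) {
        return configurationDao.findByName(name);
    }

    // Writes go through MultiTieredCache.put, i.e. to both tiers.
    @CachePut(cacheNames = "configData", key = "#config.name")
    public Configuration save(final Configuration config) {
        return configurationDao.save(config);
    }
}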
Misc
  Other things we can do when using multiple (tiered) caches in a CacheManager:
- We can use a cache name prefix to determine which cache to use.
- We can add logic to only cache some kinds of data in a specific cache.

TODO
- Support using only the distributed cache or only the in-process cache

Making Child Documents Work with Spring-data-solr

The Problem
We use spring-data-solr in our project, as we like its conversion feature, which can convert strings to enums, entities to json data, etc., and vice versa. Recently we needed to use Solr's nested documents feature, which spring-data-solr doesn't support.

Issues in Spring-data-solr
The SolrInputDocument class contains a Map _fields and a List _childDocuments.

Spring-data-solr converts java entity class to SolrDocument. It provides two converters: MappingSolrConverter and SolrJConverter.

MappingSolrConverter converts the entity to a Map: MappingSolrConverter.write(Object, Map, SolrPersistentEntity)

SolrJConverter uses solr's DocumentObjectBinder to convert the entity to a SolrInputDocument;
it converts fields annotated with @Field(child = true) into child documents.
- This also means that spring-data-solr's conversion features will not work with SolrJConverter

But spring-data-solr still treats the SolrInputDocument as just a map and copies everything into the destination Map sink:
- SolrJConverter.write(Object, Map)

After this, the child documents are discarded.

The Fix
We still want to use spring-data-solr's conversion functions - partly because we don't want to rewrite everything to use SolrJ directly.

So when saving to solr, we use spring-data-solr's MappingSolrConverter to convert the parent entity to a SolrInputDocument, then convert the child entities to SolrInputDocuments and add them into the parent's SolrInputDocument.

When reading from solr, we read the SolrDocument as the parent entity, then read its child documents as child entities and set them on the parent entity.
public class ParentEntity {
  @Field(child = true)
  private List<ChildEntity> children;
}
@Autowired
protected SolrClient solrClient;

// we add our own converters into MappingSolrConverter
// for more, please check 
// http://lifelongprogrammer.blogspot.com/2015/09/mix-spring-data-solr-and-solrj-in-solr.html
@Autowired
protected MyMappingSolrConverter solrConverter;

public void save(@Nonnull final ParentEntity parentEntity) {
    final SolrInputDocument solrInputDocument = solrConverter.createAndWrite(parentEntity);
    addChildDocuments(parentEntity, solrInputDocument);
    try {
        solrClient.add(getCollection(), solrInputDocument);
        solrClient.commit(getCollection());
    } catch (SolrServerException | IOException e) {
        throw new BusinessException(e, "failed to save " + parentEntity);
    }
}

protected void addChildDocuments(@Nonnull final ParentEntity parentEntity,
        @Nonnull final SolrInputDocument solrInputDocument) {
    solrInputDocument.addChildDocuments(parentEntity.getChildren().stream()
            .map(child -> solrConverter.createAndWrite(child)).collect(Collectors.toList()));
}

public List<ParentEntity> querySolr(final SolrParams query) {
    try {
        final QueryResponse response = solrClient.query(getCollection(), query);
        return convertFromSolrDocs(response.getResults());
    } catch (final Exception e) {
        throw new BusinessException("data retrieve failed." + query);
    }
}
/*
 * Also return child documents in solr response as ChildEntity if it exists
 */
protected List<ParentEntity> convertFromSolrDocs(final SolrDocumentList docList) {
    List<ParentEntity> result = new ArrayList<>();
    if (docList != null) {
        result = docList.stream().map(solrDoc -> {
            final ParentEntity parentEntity = solrConverter.read(ParentEntity.class, solrDoc);
            final List<SolrDocument> childDocs = solrDoc.getChildDocuments();
            if (childDocs != null) {
                parentEntity.setChildren(
                        childDocs.stream().map(childSolrDoc -> solrConverter.read(ChildEntity.class, childSolrDoc))
                                .collect(Collectors.toList()));
            }

            return parentEntity;
        }).collect(Collectors.toList());
    }

    return result;
}
Related
Mix Spring Data Solr and SolrJ in Solr Cloud 5
SolrJ: Support Converter and make it easier to extend DocumentObjectBinder

Eclipse: Add another Project as Dependency may Cause Unexpected Exception

The Problem
In local development, we run the spring-boot application in eclipse tomcat, as we also deploy the project as a war.

But for some reason, one developer still runs it as a java application, and it fails with an error, while it works well when run in (eclipse) tomcat.
Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException:
Failed to instantiate [com.amazonaws.services.s3.AmazonS3]: Factory method 's3Client' threw exception;
nested exception is java.lang.NoSuchMethodError: com.amazonaws.handlers.HandlerChainFactory.getGlobalHandlers()Ljava/util/List;

The Root Cause
Maven uses the nearest-wins strategy to determine which version to use, so we explicitly specify which version of aws-java-sdk-s3 to use in admin's pom.xml, but one library in the common-module project also implicitly depends on aws-java-sdk-s3.

Run mvn dependency:tree -Dverbose -Dincludes=com.amazonaws:aws-java-sdk-core, which shows that maven chooses the right version:
[INFO] |  \- com.amazonaws:aws-java-sdk-cloudfront:jar:1.11.32:compile
[INFO] |     \- (com.amazonaws:aws-java-sdk-core:jar:1.11.32:compile - omitted for conflict with 1.11.98)

- We can also see which version maven uses and why in eclipse:
open pom.xml, go to the Dependency Hierarchy tab, and select the library in the right panel.

I tried to run it as a java application in my environment and it worked fine. So why did it fail in his environment?

I compared his eclipse setup with mine and found that he had manually added common-module in admin's Java Build Path -> Projects tab.

Now it's kind of clear why it failed: when we add a project as a dependency, Eclipse also adds all the libraries it depends on to the project. So the project now includes both versions, and Eclipse chooses the wrong one to use.

I created Bug 514094 - Adding another Project as Dependency Causes Unexpected Exception to track it.

Troubleshooting - JsonMappingException: Already had POJO for id

The Problem
We have two entities with a one-to-many relationship that reference each other, but serialization failed with the exception:
com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id

The Fix
To easily troubleshoot the issue, I created a sample class like below:
@Data
@Accessors(chain = true)
@EqualsAndHashCode(of = {"employeeId"}, callSuper = false)
@JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "employeeId")
public static class Employee {
    private UUID employeeId;
    private String name;
    private Department department;
}
@Data
@Accessors(chain = true)
@EqualsAndHashCode(of = {"departmentId"}, callSuper = false)
@JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "departmentId")
@ToString(exclude = "employees")
private static class Department {
    private UUID departmentId;
    private String name;
    private Set<Employee> employees;

    public Set<Employee> getEmployees() {
        if (employees == null) {
            employees = new HashSet<>();
        }
        return employees;
    }
}


public static void main(String[] args) throws IOException {
    Employee e1 = new Employee().setEmployeeId(UUID.randomUUID()).setName("e1");
    Department d1 =
            new Department().setDepartmentId(UUID.randomUUID()).setName("oldD1").setEmployees(Sets.newHashSet(e1));
    e1.setDepartment(d1);
    ObjectMapper objectMapper = new ObjectMapper();

    String departmentStr = objectMapper.writeValueAsString(d1);
    objectMapper.writeValueAsString(e1);

    Department oldD1 = objectMapper.readValue(departmentStr, Department.class);

    Department newD1 = new Department().setDepartmentId(d1.getDepartmentId()).setName("newD1");
    newD1.getEmployees().addAll(oldD1.getEmployees());
    // without the following statements: it will throw
    // com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id
    // for (Employee e : oldD1.getEmployees()) {
    // e.setDepartment(newD1);
    // }

    departmentStr = objectMapper.writeValueAsString(newD1);
    System.out.println("new department: " + departmentStr);
    // now reading it back will throw JsonMappingException: Already had POJO for id
    Department newNewD1 = objectMapper.readValue(departmentStr, Department.class);
    System.out.println("---" + newNewD1);
}
This reproduces the issue, and from the output:
new department: {"departmentId":"e3e0e676-0c52-493d-8f49-bedde05cbb11","name":"newD1","employees":[{"employeeId":"6b7bbbec-8be6-4423-a4ef-af7924df177b","name":"e1","department":{"departmentId":"e3e0e676-0c52-493d-8f49-bedde05cbb11","name":"oldD1","employees":["6b7bbbec-8be6-4423-a4ef-af7924df177b"]}}]}
I found that after I changed the department to newD1, the employee still refers to the old department object, with department name oldD1.

This leads to my fix below: after making changes to the department object, make sure the employees refer to the new department object.

// without the following statements: it will throw
// com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id
for (Employee e : oldD1.getEmployees()) {
  e.setDepartment(newD1);
}

Misc
We need to exclude employees from Department's toString: @ToString(exclude = "employees")
- Otherwise it would throw java.lang.StackOverflowError
Likewise, we need to exclude employees from @EqualsAndHashCode.

Support Spring Expression Language in Spring AOP

Use Case
We want to create @Loggable so developers can use it to specify the log level and what to log before or after the method is called. Developers can use #p0, #p1 to log parameter values, #result to log the response, or specify any valid spring expression.

Learning How to Do it from Spring Code
We know that the spring cache annotations support the spring expression language: we can use #p0, #p1 in @Cacheable, and #result in @CachePut. So we can debug the spring cache code to figure out how it works.

Related code in Spring: CacheAspectSupport.execute
CacheAspectSupport.generateKey(CacheOperationContext, Object)

CacheOperationExpressionEvaluator.createEvaluationContext
if (result != NO_RESULT) evaluationContext.setVariable(RESULT_VARIABLE, result);
- To support #result, we just put the method's return value into the variable result.

The Implementation
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Loggable {
    Level level() default Level.INFO;
    /**
     * This value has to be a valid spring expression<br>
     * Example: "'-begin'" or use #p0, #p1 to refer method params.
     * 
     * @return
     */
    String beginMessage() default "";
    /**
     * This value has to be a valid spring expression<br>
     * Example: "#result", "'-end'"
     * 
     * @return
     */
    String endMessage() default "";
}
@Aspect
@Component
public class LoggableAspect {
    @Value("${throw.exception.if.invalid.expression:true}")
    private boolean throwExceptionIfInvalidExpression;
    public static final String RESULT_VARIABLE = "result";
    /**
     * It is recommended to reuse ParameterNameDiscoverer instances as far as possible.
     */
    private static final ParameterNameDiscoverer parameterNameDiscoverer =
            new LocalVariableTableParameterNameDiscoverer();
    /**
     * SpEL parser. Instances are reusable and thread-safe.
     */
    private static final ExpressionParser parser = new SpelExpressionParser();

    @Around("@annotation(com.sony.sie.kamaji.vue.metadata.aop.Loggable)")
    public Object logExecutionTime(ProceedingJoinPoint joinPoint) throws Throwable {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        final Class<?> declaringClass = method.getDeclaringClass();
        final Logger logger = LoggerFactory.getLogger(declaringClass);
        final Loggable loggable = method.getAnnotation(Loggable.class);
        Object result = null;
        if (loggable != null) {
            final Level logLevel = loggable.level();
            if (whetherParseExpression(loggable.beginMessage(), logger, logLevel)) {
                final EvaluationContext beginContext = new MethodBasedEvaluationContext(joinPoint.getThis(), method,
                        joinPoint.getArgs(), parameterNameDiscoverer);
                parseExpressionValue(logger, logLevel, loggable.beginMessage(), beginContext);
            }
            try {
                result = joinPoint.proceed();
                if (whetherParseExpression(loggable.endMessage(), logger, logLevel)) {
                    final EvaluationContext context = new MethodBasedEvaluationContext(joinPoint.getThis(), method,
                            joinPoint.getArgs(), parameterNameDiscoverer);
                    context.setVariable(RESULT_VARIABLE, result);
                    parseExpressionValue(logger, logLevel, loggable.endMessage(), context);
                }
                return result;
            } catch (final RuntimeException e) {
                logValues(logger, "Failed with exception: " + e.getMessage(), logLevel);
                throw e;
            }
        }
        return joinPoint.proceed();
    }

    private void parseExpressionValue(final Logger logger, final Level logLevel, final String expression,
            final EvaluationContext context) {
        if (StringUtils.isBlank(expression)) {
            return;
        }
        Object value = null;
        try {
            value = parser.parseExpression(expression).getValue(context);
        } catch (final Exception e) {
            if (throwExceptionIfInvalidExpression) {
                throw new VMSBusinessException(ErrorCode.internal_error, e,
                        "Failed to parse expression: " + expression);
            }
        }
        logValues(logger, value != null ? value.toString() : expression, logLevel);
    }

    /**
     * If the log level is not enabled, no need to do anything at all.
     */
    private boolean whetherParseExpression(String expression, final Logger logger, final Level logLevel) {
        if (StringUtils.isBlank(expression)) {
            return false;
        }
        switch (logLevel) {
            case INFO:
                return logger.isInfoEnabled();
            case ERROR:
                return logger.isErrorEnabled();
            case WARN:
                return logger.isWarnEnabled();
            case DEBUG:
                return logger.isDebugEnabled();
            case TRACE:
                return logger.isTraceEnabled();
            default:
                return false;
        }
    }
}
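A usage sketch of the annotation (the service, dao, and messages are made up); #p0 refers to the first method parameter and #result to the return value:

@Service
public class UserService {
    @Autowired
    private UserDao userDao;

    // Logs "loading user <id>" before the call and the returned object after it.
    @Loggable(level = Level.INFO,
            beginMessage = "'loading user ' + #p0",
            endMessage = "'loaded user: ' + #result")
    public User findUser(final String id) {
        return userDao.findById(id);
    }
}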

Cassandra in Theory and Practice

Not using the “in” query for multiple partitions
- Query them one by one instead (see the sketch below)
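For example, with the DataStax java driver (3.x) we can issue one single-partition async query per key and then collect the results; a rough sketch where the keyspace, table, column, session, and userIds are placeholders:

// session is an existing com.datastax.driver.core.Session; userIds holds the partition keys
final List<ResultSetFuture> futures = new ArrayList<>();
for (final UUID userId : userIds) {
    // one single-partition query per key instead of WHERE user_id IN (...)
    final Statement stmt = QueryBuilder.select().all()
            .from("my_keyspace", "user_events")
            .where(QueryBuilder.eq("user_id", userId));
    futures.add(session.executeAsync(stmt));
}
final List<Row> rows = new ArrayList<>();
for (final ResultSetFuture future : futures) {
    rows.addAll(future.getUninterruptibly().all());
}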

Primary key vs partition key
The first part of the primary key is the partition key, which determines which node stores the data.
Composite/compound keys
Skinny rows
- the primary key only contains the partition key
Wide rows
- the primary key contains columns other than the partition key

Materialized view primary key restrictions
- it must contain all the primary key columns of the base table. This ensures that every row of the view corresponds to exactly one row of the base table.
- it can only contain a single column that is not a primary key column in the base table.

Materialized view
- implemented as a normal Cassandra table, which takes up about the same amount of disk space as the base table

Table design
- Determine what queries to support; use different tables (or materialized views) for different queries if needed
- Avoid hot spot and unbounded row growth
- Spreads data evenly
- Minimal partitions read
Use DESCending clustering order on the time column to search for recent, time-based data


We can only use EQ or IN restrictions on the partition key.

How deletes are implemented and why
Delete and tombstones
- grace period
Understanding Deletes
A row tombstone is a row with no liveness_info and no cells.
A cell tombstone: no liveness_info at the column level
Range delete
Partition delete


Local Index
A secondary index is slow and requires accessing all nodes
- only suited for low-cardinality data

SASI - SSTable-Attached Secondary Index
- a new on-disk format based on B+ trees
- it attaches to each sstable/memtable its own immutable index file

memtable
- SSTable in memory
- write-back cache

off-heap memory
- Same concept for Cassandra, Kafka

Cache
- serialize cache data (row cache, key cache) to disk to avoid a cold restart

cqlsh
DESCRIBE keyspaces;
describe tables;

COPY keyspace.table to 'output.txt';
COPY keyspace.table(column1,c2) to 'output.txt';

Write query result to file
cqlsh -e'cqlQuery' > output.txt

Use CAPTURE command to export the query result to a file:
cqlsh> CAPTURE
cqlsh> CAPTURE '~/output.txt';

File Store Format
Data (Data.db)
Primary Index (Index.db)
SSTable Index Summary (SUMMARY.db)
Bloom filter (Filter.db)
Compression Information (CompressionInfo.db)
Statistics (Statistics.db)
SSTable Table of Contents (TOC.txt)

Secondary Index (SI_.*.db)

Iterator vs Iterable - Don't use Iterator as cache value

The Problem
Can you figure out what the issue is in the following code?
@Cacheable(key = "#appName", sync = true)
public Iterator<Message> findActiveMessages(final String appName) {}

Iterator vs Iterable
In most cases we can use either Iterable or Iterator, but there is one key difference:
- if we need to traverse the data and get the values multiple times, then we can't use Iterator.

Once you loop over all the data in the iterator, the iterator points past the last position and is not usable anymore. Now if we call iterator.hasNext(), it returns false; if we try to loop over it again, we get no data.

Unless you define it as a ListIterator and manually move back one element at a time until you reach the beginning, which is inefficient and not how we usually write code.
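A small illustration of the pitfall (the values are made up): the second traversal of an Iterator returns nothing, while a cached List (Iterable) can be traversed again and again.

final List<String> messages = Arrays.asList("a", "b", "c");

final Iterator<String> iterator = messages.iterator();
iterator.forEachRemaining(System.out::println); // first pass prints a, b, c
iterator.forEachRemaining(System.out::println); // second pass prints nothing - the iterator is exhausted

// Caching the List itself instead lets every caller start iterating from the beginning.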

Takeaway
Don't use Iterator when you need to traverse the data multiple times
Don't use Iterator as a cache value
Don't store an Iterator in a collection.

Eclipse - Run Code Clean Up Manually + Save Action

The Scenario
Eclipse shows compiler warnings in Problems view.
Some may be trivial such as unused import, but some may be more serious such as null access.

But if we don't fix trivial issues, there may be too many warnings in the project; this may cause us to just ignore all of them, which can lead us to ignore vital warnings and potential bugs.

So usually I don't like to see any compiler warning in the current editor or the whole project - we can check this in the Problems view.

How Eclipse can Help
First, we can configure the compiler at Preferences -> Java -> Compiler -> Errors/Warnings.

Save Action
We can configure Eclipse "Save Action" at Java -> Editor -> Save Action to auto-format code, organize imports, and do many other things.
- We can also configure save actions for Javascript, Scala, and other languages.

But sometimes, when we only modify a few lines of a file, we don't want to change other parts; otherwise, when others review the change, it's difficult for them to figure out what actually changed.

So usually I only configure "Save Action" to format edited lines and organize imports.

We can also configure General -> Editors -> AutoSave to save dirty editors every X seconds.

Run Code Cleanup Manually
First we assign a shortcut key such as Ctrl+Alt+Command+C in Preferences (Command+,) -> Keys
- We can also configure this for Javascript.

Then we configure what Code Clean Up does at Preferences -> Java -> Code Style -> Clean Up

It can do more than 20 things, such as format code, organize imports, add @Override, final, and serial version IDs, add unimplemented methods, remove trailing spaces, correct indentation, and much more.

If I change most of the current file, or I think it's necessary, I press Ctrl+Alt+Command+C to run Code Clean Up manually.

Caching Data in Spring Using Redis

The Scenario
We would like to cache Cassandra data to Redis for better read performance.

Cache Configuration
To make the data in Redis more readable and easier to troubleshoot and debug, we use GenericJackson2JsonRedisSerializer to serialize values as json data in Redis, and StringRedisSerializer to serialize keys.

To make GenericJackson2JsonRedisSerializer work, we also configure the objectMapper to store type info: objectMapper.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL, JsonTypeInfo.As.PROPERTY); 
- as we need to store the class info. 
- The data looks like: {\"@class\":\"com....Configuration\",\"name\":\"configA\",\"value\":\"805\"}

We also configure the objectMapper with configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false).
- When the data doesn't exist in Cassandra, we still want to cache that fact in redis, so later we can answer from the redis cache and don't need to read the database again.
- Spring cache stores org.springframework.cache.support.NullValue in this case; its json data is {}. We need to configure the ObjectMapper to serialize an empty object with no properties. By default it throws an exception:
org.springframework.data.redis.serializer.SerializationException: Could not write JSON: No serializer found for class org.springframework.cache.support.NullValue and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS); nested exception is com.fasterxml.jackson.databind.JsonMappingException: No serializer found for class org.springframework.cache.support.NullValue and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) 

We configured the cacheManager to store null values.
- We use a configuration-driven approach and have a lot of configurations; we define default configuration values in property files. The code first reads from the db and, if the value is null, falls back to the property files. This is why we want to cache null values.

We also use SpEL to set different TTLs for different caches.
redis.expires={configData:XSeconds, userSession: YSeconds}

@Configuration
@EnableCaching
public class RedisCacheConfig extends CachingConfigurerSupport {
    @Value("${redis.hostname}")
    String redisHostname;
    @Value("${redis.port}")
    int redisPort;
    @Value("#{${redis.expires}}")
    private Map<String, Long> expires;
    @Bean
    public JedisConnectionFactory redisConnectionFactory() {
        final JedisConnectionFactory redisConnectionFactory = new JedisConnectionFactory();
        redisConnectionFactory.setHostName(redisHostname);
        redisConnectionFactory.setPort(redisPort);
        redisConnectionFactory.setUsePool(true);
        return redisConnectionFactory;
    }
    @Bean("redisTemplate")
    public RedisTemplate<String, Object> genericJacksonRedisTemplate(final JedisConnectionFactory cf) {
        final RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setKeySerializer(new StringRedisSerializer());
        redisTemplate.setHashKeySerializer(new StringRedisSerializer());
        final GenericJackson2JsonRedisSerializer valueSerializer =
                new GenericJackson2JsonRedisSerializer(createRedisObjectmapper());
        redisTemplate.setValueSerializer(valueSerializer);
        redisTemplate.setHashValueSerializer(valueSerializer);
        redisTemplate.setConnectionFactory(cf);
        return redisTemplate;
    }
    @Bean
    public CacheManager cacheManager(final RedisTemplate<String, Object> redisTemplate) {
        final RedisCacheManager cacheManager =
                new RedisCacheManager(redisTemplate, Collections.<String>emptyList(), true);
        cacheManager.setDefaultExpiration(86400);
        cacheManager.setExpires(expires);
        cacheManager.setLoadRemoteCachesOnStartup(true);
        return cacheManager;
    }

    public static ObjectMapper createRedisObjectmapper() {
        final SimpleDateFormat sdf = new SimpleDateFormat(DEFAULT_DATE_FORMAT, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        final SimpleModule dateModule = (new SimpleModule()).addDeserializer(Date.class, new JsonDateDeserializer());
        return new ObjectMapper()
                .enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL,JsonTypeInfo.As.PROPERTY)//\\
                .registerModule(dateModule).setDateFormat(sdf)
                .configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true)
                .configure(DeserializationFeature.UNWRAP_SINGLE_VALUE_ARRAYS, true)
                .configure(DeserializationFeature.FAIL_ON_IGNORED_PROPERTIES, false)
                .configure(DeserializationFeature.FAIL_ON_INVALID_SUBTYPE, false)
                .configure(DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES, false)
                .configure(DeserializationFeature.FAIL_ON_NUMBERS_FOR_ENUMS, false)
                .configure(DeserializationFeature.FAIL_ON_READING_DUP_TREE_KEY, false)
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
                .configure(DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS, false)
                .configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false) //\\
                .setSerializationInclusion(JsonInclude.Include.NON_NULL)
                .setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
    }
}
Cache CassandraRepository
@Repository
@CacheConfig(cacheNames = Util.CACHE_CONFIG)
public interface ConfigurationDao extends CassandraRepository<Configuration> {
    @Query("Select * from configuration where name=?0")
    @Cacheable
    Configuration findByName(String name);

    @Query("Delete from configuration where name=?0")
    @CacheEvict
    void delete(String name);

    @Override
    @CacheEvict(key = "#p0.name")
    void delete(Configuration config);

    /*
     * Check https://docs.spring.io/spring/docs/current/spring-framework-reference/html/cache.html
     * about what #p0 means
     */
    @Override
    @SuppressWarnings("unchecked")
    @CachePut(key = "#p0.name")
    Configuration save(Configuration config);

    /*
     * This API doesn't work very well with cache - as spring cache doesn't support put or evict
     * multiple keys. Call save(Configuration config) in a loop instead.
     */
    @Override
    @CacheEvict(allEntries = true)
    @Deprecated
    <S extends Configuration> Iterable<S> save(Iterable<S> configs);

    /*
     * This API doesn't work very well with cache - as spring cache doesn't support put or evict
     * multiple keys. Call delete(Configuration config) in a loop instead.
     */
    @Override
    @CacheEvict(allEntries = true)
    @Deprecated
    void delete(Iterable<? extends Configuration> configs);
}
Admin API to Manage Cache 
We inject CacheManager to add or evict data from Redis.
But to scan all keys in a cache (like config), I need to use stringRedisTemplate.opsForZSet() to get the keys of that cache:
- the cache's keys entry (cacheName~keys) is a zset of string keys, so I need to use StringRedisTemplate to read it.

After getting the keys, I use redisTemplate.opsForValue().multiGet to get their values.

- I will update this post if I find some better ways to do this. 

@RestController
public class CacheResource {
    private static final String REDIS_CACHE_SUFFIX_KEYS = "~keys";
    @Autowired
    @Qualifier("redisTemplate")
    RedisTemplate<String, Object> redisTemplate;

    @Autowired
    @Qualifier("stringRedisTemplate")
    StringRedisTemplate stringRedisTemplate;

    @Autowired
    private CacheManager cacheManager;

    /**
     * If sessionId is not null, return its associated user info.<br>
     * It also returns other cached data: they are small data.
     *
     * @return
     */
    @GetMapping(produces = MediaType.APPLICATION_JSON_VALUE, path = "/cache")
    public Map<String, Object> get(@RequestParam("sessionIds") final String sessionIds,
            @RequestParam(name = "getConfig", defaultValue = "false") final boolean getConfig) {
        final Map<String, Object> resultMap = new HashMap<>();
        if (getConfig) {
            final Set<String> configKeys =
                    stringRedisTemplate.opsForZSet().range(Util.CACHE_CONFIG_DAO + REDIS_CACHE_SUFFIX_KEYS, 0, -1);
            final List<Object> objects = redisTemplate.opsForValue().multiGet(configKeys);
            resultMap.put(Util.CACHE_CONFIG + REDIS_CACHE_SUFFIX_KEYS, objects);
        }
        if (StringUtils.isNotBlank(sessionIds)) {
            final Map<String, Object> sessionIdToUsers = new HashMap<>();
            final Long totalUserCount = stringRedisTemplate.opsForZSet().size(Util.CACHE_USER + REDIS_CACHE_SUFFIX_KEYS);
            sessionIdToUsers.put("totalUserCount", totalUserCount);
            final ArrayList<String> sessionIdList = Lists.newArrayList(Util.COMMA_SPLITTER.split(sessionIds));
            final List<Object> sessionIDValues = redisTemplate.opsForValue().multiGet(sessionIdList);
            for (int i = 0; i < sessionIdList.size(); i++) {
                sessionIdToUsers.put(sessionIdList.get(i), sessionIDValues.get(i));
            }
            resultMap.put(Util.CACHE_USER + REDIS_CACHE_SUFFIX_KEYS, sessionIdToUsers);
        }
        return resultMap;
    }

    @DeleteMapping("/cache")
    public void clear(@RequestParam("removeSessionIds") final String removeSessionIds,
            @RequestParam(name = "clearSessions", defaultValue = "false") final boolean clearSessions,
            @RequestParam(name = "clearConfig", defaultValue = "false") final boolean clearConfig) {
        if (clearConfig) {
            final Cache configCache = getConfigCache();
            configCache.clear();
        }
        final Cache userCache = getUserCache();
        if (clearSessions) {
            userCache.clear();
        } else if (StringUtils.isNotBlank(removeSessionIds)) {
            final ArrayList<String> sessionIdList = Lists.newArrayList(Util.COMMA_SPLITTER.split(removeSessionIds));
            for (final String sessionId : sessionIdList) {
                userCache.evict(sessionId);
            }
        }
    }

    /**
     * Only handle client() data - as other caches such as configuration we can use server side api
     * to update them
     */
    @PutMapping("/cache")
    public void addOrupdate(...) {
        if (newUserSessions == null) {
            return;
        }
        final Cache userCache = getUserCache();
        // userCache.put to add key, value
    }

    private Cache getConfigCache() {
        return cacheManager.getCache(Util.CACHE_CONFIG_DAO);
    }

    private Cache getUserCache() {
        return cacheManager.getCache(Util.CACHE_USER);
    }
}
StringRedisTemplate
@Bean("stringRedisTemplate")
public StringRedisTemplate stringRedisTemplate(final JedisConnectionFactory cf, final ObjectMapper objectMapper) {
    final StringRedisTemplate redisTemplate = new StringRedisTemplate();
    redisTemplate.setConnectionFactory(cf);
    return redisTemplate;
}

Misc
- If you want to disable cache in some env, use NoOpCacheManager.
- When debugging, check the code:
CacheAspectSupport.execute
SpringCacheAnnotationParser.parseCacheAnnotations

Spring @Cacheable Not Working - How to Troubleshoot and Solve it

The Scenario
Spring's cache abstraction annotations make it easy to add caching to an application: just define a CacheManager in the configuration, then use the annotations @Cacheable, @CachePut, and @CacheEvict to use and maintain the cache.

But what do we do if the cache annotations don't seem to work?

How do we know the cache doesn't work?
Logging
We can change the log level to print database queries in the log. For Cassandra, we can change the log level of com.datastax.driver.core.RequestHandler to TRACE.
Debug
We can set a breakpoint at CacheAspectSupport.execute.
If the cache works, calling a method annotated with a cache annotation will not invoke the method directly; instead the call will be intercepted and hit the breakpoint.
Test

Possible Root Causes
1. The class using cache annotations is initialized too early
This usually happens when we use classes with cache annotations in configuration or AOP classes.

Spring first creates the configuration and AOP classes, which then cause the beans of the cache-annotated classes to be created before the cache configuration is fully set up. As a result, these beans are created without the cache annotations being processed.

Please check Bean X of type Y is not eligible for getting processed by all BeanPostProcessors for a detailed explanation.

How to troubleshoot
Add a breakpoint at the default constructor of the bean (add one if it doesn't exist); then from the stack trace we can figure out which bean (or configuration class) causes this bean to be created, and why.

Use @Lazy, ObjectFactory, or other approaches to break the eager dependency, then restart and check again until the cached method works as expected.

Also check whether there is a log line like: Bean of type is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
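For example, if an aspect (or configuration class) injects a bean that has @Cacheable methods, a lazy injection can break the eager chain; a rough sketch with made-up names:

@Aspect
@Component
public class AuditAspect {
    // Without @Lazy, creating this aspect would force ConfigurationService (and its
    // @Cacheable methods) to be created before the cache infrastructure is ready.
    @Autowired
    @Lazy
    private ConfigurationService configurationService;
}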

2. Calling the cached method from within the same class
Solutions:
- Use self-injection or applicationContext.getBean, then call the cached method through that bean (see the sketch below)
- Use @Scope(proxyMode = ScopedProxyMode.TARGET_CLASS)
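A sketch of the self-injection workaround (all names are illustrative): the outer call goes through the injected proxy, so the @Cacheable interceptor actually runs.

@Service
public class ReportService {
    // Inject the proxied bean of this same class; call cached methods through it.
    @Autowired
    @Lazy
    private ReportService self;

    public List<String> buildReport(final String id) {
        // this.loadData(id) would bypass the cache; self.loadData(id) goes through the proxy
        return self.loadData(id);
    }

    @Cacheable(cacheNames = "reportData", key = "#id")
    public List<String> loadData(final String id) {
        return expensiveDatabaseCall(id); // hypothetical expensive lookup
    }
}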

Script to Setup SolrCloud Environment

Scenario
Here is the script that creates a SolrCloud environment in a local dev setup: a collection with numShards=2&replicationFactor=3 on 3 separate (local) nodes.

solr_init creates the folders example/cloud/node{1,2,3}/solr, copies solr.xml and zoo.cfg to these folders, starts the servers, and creates the collection using the admin collections API.

The other scripts, which start/stop Solr, are easier to implement.
The implementation
function solr_init()
{
  cd $SOLR_HOME
  mkdir -p $SOLR_NODE1_REL_HOME
  mkdir -p $SOLR_NODE2_REL_HOME
  mkdir -p $SOLR_NODE3_REL_HOME

  cp $SOLR_HOME/server/solr/solr.xml $SOLR_HOME/server/solr/zoo.cfg $SOLR_NODE1_REL_HOME
  cp $SOLR_HOME/server/solr/solr.xml $SOLR_HOME/server/solr/zoo.cfg $SOLR_NODE2_REL_HOME
  cp $SOLR_HOME/server/solr/solr.xml $SOLR_HOME/server/solr/zoo.cfg $SOLR_NODE3_REL_HOME

  solr_start
  data_solr_create
}

function solr_start() {
  if [[ `solr_pid` ]]
  then
    echo "solr is already running...";
  else
    echo "Starting solr-cloud... $SOLR_NODE1_PORT, $SOLR_NODE2_PORT, $SOLR_NODE3_PORT";
    $SOLR_HOME/bin/solr start -cloud -Dsolr.ltr.enabled=true -s "$SOLR_NODE1_REL_HOME" -p $SOLR_NODE1_PORT -h $SOLR_HOSTNAME;
    $SOLR_HOME/bin/solr start -cloud -Dsolr.ltr.enabled=true -s "$SOLR_NODE2_REL_HOME" -p $SOLR_NODE2_PORT -z $SOLR_ZKHOST -h $SOLR_HOSTNAME;
    $SOLR_HOME/bin/solr start -cloud -Dsolr.ltr.enabled=true -s "$SOLR_NODE3_REL_HOME" -p $SOLR_NODE3_PORT -z $SOLR_ZKHOST -h $SOLR_HOSTNAME;
  fi
}

function solr_stop() {
  echo "Stopping solr-cloud...";
  $SOLR_HOME/bin/solr stop -all;
}

function solr_restart() {
  echo "Restarting solr-cloud...";
  solr_stop && solr_start
}

function solr_pid() {
  pgrep -f "solr-6.4.0/server";
}

function data_solr_create() {
  # Go to the solr config directory
  currdir=`pwd`;
  cd "$WS/resource/solr";

  # Retrieve list of collections
  collections_list=`curl -s -v -X GET  -H 'Content-type:application/json' "$SOLR_NODE1_ENDPOINT/admin/collections?action=LIST&wt=json" | jq '.collections | join(" ")' `;

  # Create/update schema
  mv solrconfig solrconfig.old.`datetimestamp`;
  unzip -d solrconfig solr-core-config.zip;

  # create myCollection
  cp myCollection_solrconfig.xml solrconfig/conf/solrconfig.xml;
  cp myCollection_schema.xml solrconfig/conf/schema.xml;
  $SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh -zkhost "$SOLR_ZKHOST" -cmd upconfig -confname myCollection -confdir "solrconfig/conf/";
  if grep -q "myCollection" <<< $collections_list; then
    curl -s -v -X GET "$SOLR_NODE1_ENDPOINT/admin/collections?action=RELOAD&name=myCollection";
    echo "Updated myCollection";
  else
    curl -s -v -X GET "$SOLR_NODE1_ENDPOINT/admin/collections?action=CREATE&name=myCollection&numShards=2&collection.configName=myCollection&replicationFactor=3&maxShardsPerNode=2";
    echo "Created myCollection";
  fi

  rm -rf solrconfig;
  cd $currdir;
}

Java APIs to Get and Update Solr Configuration Files

Use Case
SolrCloud stores its configuration files (for example, elevate.xml) in Zookeeper. Usually we need APIs that clients (for example, a UI) can call to get or update these configuration files.

Related: Build Web Service APIs to Update Solr's Managed Resources (stop words, synonyms)

The Implementation
We use SolrJ's SolrZkClient APIs to get data, create znodes, and set a znode's data.

public String getConfigData(final String filePath) {
    final ZkStateReader zkStateReader = getZKReader(getSolrClient());
    final String path = normalizeConfigPath(filePath);
    final SolrZkClient zkClient = zkStateReader.getZkClient();
    try {
        return new String(zkClient.getData(path, null, null, true));
    } catch (KeeperException | InterruptedException e) {
        throw new BusinessException(ErrorCode.data_access_error, e, "Failed to get " + path);
    }
}
public void setConfigData(final String filePath, final String data, final boolean createPath,
        final boolean reloadCollection) {
    Validate.notNull(filePath);
    Validate.notNull(data);
    final ZkStateReader zkStateReader = getZKReader(getSolrClient());
    final String path = normalizeConfigPath(filePath);
    final SolrZkClient zkClient = zkStateReader.getZkClient();
    try {
        if (createPath) {
            zkClient.makePath(path, false, true);
        }
        zkClient.setData(path, data.getBytes(), true);

        if (reloadCollection) {
            reloadCollection();
        }
    } catch (KeeperException | InterruptedException e) {
        throw new BusinessException(ErrorCode.data_access_error, e, "Failed to set " + path);
    }
}

public void reloadCollection() {
    try {
        final CollectionAdminRequest.Reload reload = new CollectionAdminRequest.Reload();
        reload.setCollectionName(getSolrClient().getDefaultCollection());
        final CollectionAdminResponse response = reload.process(getSolrClient());
        logger.info(MessageFormat.format("reload collection: {0} rsp: {1}", getSolrClient().getDefaultCollection(),
                response));
        final int status = response.getStatus();
        if (status != 0) {
            throw new BusinessException(ErrorCode.data_access_error,
                    "Failed to reload collection, status: " + status);
        }
    } catch (SolrServerException | IOException e) {
        throw new BusinessException(ErrorCode.data_access_error, e,
                "Failed to reload collection: " + getSolrClient().getDefaultCollection());
    }
}

public static ZkStateReader getZKReader(final CloudSolrClient solrClient) {
    final ZkStateReader zkReader = solrClient.getZkStateReader();
    if (zkReader == null) {
        // This only happens the first time we call solrClient to do anything.
        // Usually we call solrClient during application startup (for example, for a
        // healthCheck), so in most cases it's already connected.
        solrClient.connect();
    }
    return solrClient.getZkStateReader();
}
/**
 * @param filePath
 * @return add prefix to make elevate.xml - /configs/myCollection/elevate.xml
 */
private String normalizeConfigPath(final String filePath) {
    return ZkStateReader.CONFIGS_ZKNODE + "/" + getSolrClient().getDefaultCollection() + "/" + filePath;
}
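With these helpers, exposing a configuration file through a REST endpoint is straightforward. A minimal sketch (the controller, paths, and the SolrConfigService wrapper are illustrative names for the class holding the methods above):

@RestController
@RequestMapping("/solr/config")
public class SolrConfigResource {
    @Autowired
    private SolrConfigService solrConfigService;

    @GetMapping("/elevate")
    public String getElevate() {
        return solrConfigService.getConfigData("elevate.xml");
    }

    @PutMapping("/elevate")
    public void updateElevate(@RequestBody final String content) {
        // overwrite elevate.xml in Zookeeper and reload the collection so the change takes effect
        solrConfigService.setConfigData("elevate.xml", content, false, true);
    }
}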

Resources
Build Web Service APIs to Update Solr's Managed Resources (stop words, synonyms)

Java APIs to Build Solr Suggester and Get Suggestion

Use Case
Usually we provide REST APIs to manage Solr; the same applies to the suggester.
This article focuses on how to build the Solr suggester and get suggestions programmatically using java code.

The implementation
Please check the end of the article for Solr configuration files.

Build Suggester
In Solr, after we add docs, we call suggest?suggest.build=true to build the suggester and make them available for autocompletion.

The only trick here is that the suggest.build request doesn't build the suggester for all cores in the collection, but only for the core that receives the request.

We need to get all replica urls of the collection, add them to the shards parameter, and also add shards.qt=/suggest:
shards=127.0.0.1:4567/solr/myCollection_shard1_replica3,127.0.0.1:4565/solr/myCollection_shard1_replica2,127.0.0.1:4566/solr/myCollection_shard1_replica1,127.0.0.1:4567/solr/myCollection_shard2_replica3,127.0.0.1:4566/solr/myCollection_shard2_replica1/,127.0.0.1:4565/solr/myCollection_shard2_replica2&shards.qt=/suggest

public void buildSuggester() {
    final SolrQuery solrQuery = new SolrQuery();
    final List<String> urls = getAllSolrCoreUrls(getSolrClient());

    solrQuery.setRequestHandler("/suggest").setParam("suggest.build", "true")
            .setParam(ShardParams.SHARDS, COMMA_JOINER.join(urls))
            .setParam(ShardParams.SHARDS_QT, "/suggest");
    try {
        final QueryResponse queryResponse = getSolrClient().query(solrQuery);
        final int status = queryResponse.getStatus();
        if (status >= 300) {
            throw new BusinessException(ErrorCode.data_access_error,
                    MessageFormat.format("Failed to build suggestions: status: {0}", status));
        }
    } catch (SolrServerException | IOException e) {
        throw new BusinessException(ErrorCode.data_access_error, e, "Failed to build suggestions");
    }
}
public static List<String> getAllSolrCoreUrls(final CloudSolrClient solrClient) {
    final ZkStateReader zkReader = getZKReader(solrClient);
    final ClusterState clusterState = zkReader.getClusterState();

    final Collection<Slice> slices = clusterState.getSlices(solrClient.getDefaultCollection());
    if (slices.isEmpty()) {
        throw new BusinessException(ErrorCode.data_access_error, "No slices");
    }
    return slices.stream().map(slice -> slice.getReplicas()).flatMap(replicas -> replicas.stream())
            .map(replica -> replica.getCoreUrl()).collect(Collectors.toList());
}

private static ZkStateReader getZKReader(final CloudSolrClient solrClient) {
    final ZkStateReader zkReader = solrClient.getZkStateReader();
    if (zkReader == null) {
        // This only happens the first time we call solrClient to do anything.
        // Usually we call solrClient during application startup (for example, for a
        // healthCheck), so in most cases it's already connected.
        solrClient.connect();
    }
    return solrClient.getZkStateReader();
}

Get Suggestions


public Set<SearchSuggestion> getSuggestions(final String prefix, final int limit) {
   final Set<SearchSuggestion> result = new LinkedHashSet<>(limit);
   try {
       final SolrQuery solrQuery = new SolrQuery().setRequestHandler("/suggest").setParam("suggest.q", prefix)
               .setParam("suggest.count", String.valueOf(limit)).setParam(CommonParams.TIME_ALLOWED,
                       mergedConfig.getConfigByNameAsString("search.suggestions.time_allowed.millSeconds"));
       // context filters
       solrQuery.setParam("suggest.cfq", getContextFilters());
       final QueryResponse queryResponse = getSolrClient().query(solrQuery);
       if (queryResponse != null) {
           final SuggesterResponse suggesterResponse = queryResponse.getSuggesterResponse();
           final Map<String, List<Suggestion>> map = suggesterResponse.getSuggestions();
           final List<Suggestion> infixSuggesters = map.get("infixSuggester");
           if (infixSuggesters != null) {
               for (final Suggestion suggester : infixSuggesters) {
                   if (result.size() < limit) {
                       result.add(new SearchSuggestion().setText(suggester.getTerm())
                               .setHighlightedText(replaceTagB(suggester.getTerm())));
                   } else {
                       break;
                   }
               }
           }
       }
       logger.info(MessageFormat.format("User: {0}, prefix: {1}, limit: {2}, result: {3}", user, prefix, limit, result));
       return result;
   } catch (final Exception e) {
       throw new BusinessException(ErrorCode.data_access_error, e, "Failed to get suggestions for " + prefix);
   }
}
private static final Pattern TAGB_PATTERN = Pattern.compile("<b>|</b>");

public static String replaceTagB(final String input) {
    return TAGB_PATTERN.matcher(input).replaceAll("");
}
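
For context, a minimal usage sketch (hypothetical JAX-RS resource and service names, imports omitted to match the snippets above) showing how the two calls fit together: buildSuggester is invoked after indexing finishes, and getSuggestions is invoked as the user types:

@Path("/search")
public class SuggestionResource {

    @Inject
    private SuggesterService suggesterService; // hypothetical class holding buildSuggester/getSuggestions

    // Called after a (re)indexing job completes, so new docs show up in autocompletion.
    @POST
    @Path("/suggestions/build")
    public Response build() {
        suggesterService.buildSuggester();
        return Response.noContent().build();
    }

    // Called as the user types, e.g. GET /search/suggestions?prefix=sta&limit=10
    @GET
    @Path("/suggestions")
    @Produces(MediaType.APPLICATION_JSON)
    public Set<SearchSuggestion> suggest(@QueryParam("prefix") final String prefix,
            @QueryParam("limit") @DefaultValue("10") final int limit) {
        return suggesterService.getSuggestions(prefix, limit);
    }
}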

Schema.xml
We define the suggester field (of type textSuggest) and suggesterContextField: the fields shown in autocompletion are copied into the suggester field, and filter fields such as zipCodes and genres are copied into suggesterContextField.

The Solr suggester supports filters on multiple fields: we just need to copy all these filter fields into suggesterContextField. A minimal textSuggest field type definition is sketched after the copyField declarations below.


<field name="suggester" type="textSuggest" indexed="true"
  stored="true" multiValued="true" />
<field name="suggesterContextField" type="string" indexed="true" stored="true"
  multiValued="true" />

<copyField source="seriesTitle" dest="suggester" />
<copyField source="programTitle" dest="suggester" />

<copyField source="zipCodes" dest="suggesterContextField" />
<copyField source="genres" dest="suggesterContextField" />
SolrConfig.xml
We can add multiple suggester implementations to the searchComponent. Another very useful dictionary implementation is FileDictionaryFactory, which allows us to use an external file that contains suggest entries; we may use it in the future (a sketch follows the request handler below).


<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="field">suggester</str>
    <str name="contextField">suggesterContextField</str>
    <str name="minPrefixChars">4</str>
    <str name="suggestAnalyzerFieldType">textSuggest</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="highlight">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
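
For reference, a hedged sketch of an additional suggester entry using FileDictionaryFactory that could be added inside the suggest searchComponent above (the suggester name, file name, and lookup implementation are assumptions):

<lst name="suggester">
  <str name="name">fileSuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">FileDictionaryFactory</str>
  <!-- each line of the file: a suggestion, optionally followed by a tab-separated weight -->
  <str name="sourceLocation">suggest_entries.txt</str>
  <str name="suggestAnalyzerFieldType">textSuggest</str>
  <str name="buildOnStartup">false</str>
</lst>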

Resources
Solr Suggester

Using YAML Configuration Files in Spring Boot

The Scenario
Sometimes developers prefer to use YAML property files in Spring Boot.

This tutorial shows how to load the properties for the correct profile and make them available in the Environment, so we can use appContext.getEnvironment().getProperty to get property values in static or non-Spring-managed contexts.
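
For example, once the properties are in the Environment, a static holder like the following (ApplicationContextProvider is a hypothetical name) lets non-Spring-managed code read them:

// Hypothetical holder that captures the ApplicationContext so static code can read properties.
@Component
public class ApplicationContextProvider implements ApplicationContextAware {

    private static ApplicationContext appContext;

    @Override
    public void setApplicationContext(final ApplicationContext applicationContext) {
        appContext = applicationContext;
    }

    // e.g. getProperty("cassandra.contactPoints") defined in cassandra.yaml
    public static String getProperty(final String name) {
        return appContext.getEnvironment().getProperty(name);
    }
}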

The implementation
EnvironmentAwarePropertySourcesPlaceholderConfigurer
First we create EnvironmentAwarePropertySourcesPlaceholderConfigurer: we can use its addYamlPropertySource method to add a YAML file, which will load the properties defined for the active profile as well as the default properties.

/**
 * From http://jdpgrailsdev.github.io/blog/2014/12/30/groovy_script_spring_boot_yaml.html. This
 * adds the propertySources into the Environment, so we can use environment.getProperty to get
 * property values when needed.
 */
public class EnvironmentAwarePropertySourcesPlaceholderConfigurer extends PropertySourcesPlaceholderConfigurer
        implements EnvironmentAware, InitializingBean {

    private List<PropertySource<?>> propertySources = new ArrayList<>();
    private ConfigurableEnvironment environment;

    public EnvironmentAwarePropertySourcesPlaceholderConfigurer() {}

    /**
     * @param propertySources order: abc-default.yaml, abc-{env}.yaml <br>
     *        This makes its usage the same as org.springframework.context.annotation.PropertySource
     */
    public EnvironmentAwarePropertySourcesPlaceholderConfigurer(@Nonnull final List<PropertySource<?>> propertySources) {
        this.propertySources = propertySources;
    }

    @Override
    public void setEnvironment(final Environment environment) {
        // all standard Environment implementations are ConfigurableEnvironment
        this.environment = (ConfigurableEnvironment) environment;
        super.setEnvironment(environment);
    }

    public EnvironmentAwarePropertySourcesPlaceholderConfigurer addYamlPropertySource(@Nonnull Resource resource)
            throws IOException {
        return addYamlPropertySource(resource.getFilename(), resource);
    }

    public EnvironmentAwarePropertySourcesPlaceholderConfigurer addYamlPropertySource(@Nonnull String name,
            @Nonnull Resource resource) throws IOException {
        YamlPropertySourceLoader loader = new YamlPropertySourceLoader();
        // order: abc-default.yaml, abc-{env}.yaml
        PropertySource<?> defaultYamlPropertySource = loader.load(name + ".default", resource, null);
        propertySources.add(defaultYamlPropertySource);
        PropertySource<?> applicationYamlPropertySource =
                loader.load(name + "." + System.getProperty("env"), resource, System.getProperty("env"));
        propertySources.add(applicationYamlPropertySource);
        return this;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        // This will add them as abc-{env}.properties, abc-default.properties into
        // environment.propertySources.
        // Spring returns the value from the first propertySource that defines the property;
        // see org.springframework.core.env.PropertySourcesPropertyResolver.getProperty(String,
        // Class<T>, boolean)
        if (propertySources != null) {
            propertySources.forEach(propertySource -> environment.getPropertySources().addFirst(propertySource));
        }
    }
}
PropertySourcesPlaceholderConfigurer
Then we create a static PropertySourcesPlaceholderConfigurer bean and load the YAML files. Due to a Spring restriction, we can't reference YAML files in @PropertySource.
public static PropertySourcesPlaceholderConfigurer propertyConfig() throws IOException {
    final String password = System.getenv(ENV_APP_ENCRYPTION_PASSWORD);
    if (StringUtils.isBlank(password)) {
        return new EnvironmentAwarePropertySourcesPlaceholderConfigurer()
                .addYamlPropertySource(new ClassPathResource("cassandra.yaml")); // add more
    }
    return new EncryptedPropertySourcesPlaceholderConfigurer(password)
            .addYamlPropertySource(new ClassPathResource("cassandra.yaml"));
}
Check Spring - Encrypt Properties by Customizing PropertySourcesPlaceholderConfigurer to learn more about how to implement EncryptedPropertySourcesPlaceholderConfigurer.
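
For completeness, a hedged sketch of how the propertyConfig() method above might be registered (the configuration class name is hypothetical); the key point is that the @Bean method is static, so the placeholder configurer is created before the beans whose placeholders it resolves:

@Configuration
public class PropertyConfig {

    // static: the configurer must exist before regular beans are instantiated
    @Bean
    public static PropertySourcesPlaceholderConfigurer propertyConfig() throws IOException {
        return new EnvironmentAwarePropertySourcesPlaceholderConfigurer()
                .addYamlPropertySource(new ClassPathResource("cassandra.yaml")); // add more as needed
    }
}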

Resources
Leverage Spring Boot’s YAML Configuration Files in Groovy Scripts
