Concurrency, Caching, and Guava

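As a Java developer who has used caches for a long time, I honestly only learned today how a cache should really be used. Why do I say that? Consider the following code: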

public Object compute(Object key) {
  Object value = cache.get(key);
  if (value == null) {
    // Cache miss: compute the value and store it (check-then-act, not atomic).
    value = computeValue(key);
    cache.put(key, value);
  }
  return value;
}

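Look familiar? This is typical cache client code, but it has a problem. The problem is not in the cache; it is in this code itself. Consider the following scenario: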

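Thread A and thread B call compute with the same key.
Thread A sees a cache miss (the cache returns null) and starts computing the value.
While thread A is still computing, thread B sees the same cache miss and starts computing too.
Thread A finishes and puts its result into the cache.
Thread B finishes and puts its result into the cache.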

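See the problem? Besides the two puts, the computation itself ran twice. That defeats the purpose of a cache, which is to compute once and reuse many times. In concurrency terms, the check-then-act sequence (get, compute, put) is not atomic.
You might argue that since the cache is atomic, the client code should be atomic too. It is not: each cache operation is atomic on its own, but the sequence of them is not. The longer the computation takes and the more concurrent requests there are, the more likely the violation becomes.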
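
The interleaving above can be reproduced deterministically. The following is a minimal sketch, not the author's code: the class NaiveCacheRace, the runRace helper, and the CountDownLatch that holds both threads inside the computation are all assumptions made for illustration. The latch is an artificial device; in real code the same overlap happens whenever the computation is slow enough.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: forces two threads through the same cache miss.
final class NaiveCacheRace {
  private final Map<String, Integer> cache = new ConcurrentHashMap<>();
  private final AtomicInteger computations = new AtomicInteger();
  // Opens only once both threads are inside computeValue, which forces the overlap.
  private final CountDownLatch bothComputing = new CountDownLatch(2);

  private Integer computeValue(String key) throws InterruptedException {
    computations.incrementAndGet();
    bothComputing.countDown();
    bothComputing.await(); // neither thread can reach cache.put before the other has missed
    return key.length();
  }

  // Same check-then-act shape as the client code in the article.
  Integer compute(String key) throws InterruptedException {
    Integer value = cache.get(key);
    if (value == null) {
      value = computeValue(key);
      cache.put(key, value);
    }
    return value;
  }

  // Runs threads A and B with the same key and returns how often the value was computed.
  static int runRace() {
    NaiveCacheRace demo = new NaiveCacheRace();
    Runnable task = () -> {
      try {
        demo.compute("key");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    Thread a = new Thread(task, "A");
    Thread b = new Thread(task, "B");
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return demo.computations.get();
  }

  public static void main(String[] args) {
    System.out.println("computations = " + runRace()); // prints "computations = 2"
  }
}
```

Note that the map here is already a ConcurrentHashMap, so every single get and put is thread-safe; the duplicate computation comes purely from the gap between them.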

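So what is the fix?
The simplest and oldest approach is to add synchronized to the method signature or synchronize on the cache, the well-known monitor pattern. With the synchronized keyword the client code becomes atomic and the duplicate computation disappears. But computations for different keys are now serialized as well, so performance under high concurrency becomes poor.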

public class Memoier1<K, V> implements Computable<K, V> {
  private final Map<K, V> cache = new HashMap<K, V>();
  private final Computable<K, V> c;

  public Memoier1(Computable<K, V> c) {
    this.c = c;
  }

  @Override
  public synchronized V compute(K key) throws InterruptedException {
    V result = cache.get(key);
    if (result == null) {
      result = c.compute(key);
      cache.put(key, result);
    }
    return result;
  }
}

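Let us look at this code again. The part prone to breaking atomicity is the compute followed by the put. If compute returned almost instantly, the chance of a violation would drop dramatically. The catch is that in practice compute does not return that quickly, otherwise there would be no need for a cache. But this does suggest an approach: when thread A starts compute, it reserves a slot in the cache, so that thread B does not run compute at the same time and can instead wait for the result of A's computation. In Java this is exactly what a Future is for. Put a Future in the cache: thread A waits on the Future's result, and thread B, having obtained a reference to the same Future by key, waits on that result as well.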

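Note: Memoier2 (not shown here) simply drops the synchronized keyword and switches to ConcurrentHashMap. Besides the fact that HashMap cannot be used directly in a concurrent environment, the atomicity of compound operations on the cache still depends on the atomicity of the cache itself. That is why Memoier3 also uses ConcurrentHashMap.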

import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class Memoier3<K,V> implements Computable<K,V>{
  private final Map<K,Future<V>> cache = new ConcurrentHashMap<K,Future<V>>();
  private final Computable<K, V> c;
	
  public Memoier3(Computable<K,V> c){
    this.c = c;
  }

  @Override
  public V compute(final K k) throws InterruptedException {
    Future<V> f = cache.get(k);
    if(f == null){
      FutureTask<V> ft = new FutureTask<V>(new Callable<V>(){

        @Override
        public V call() throws Exception {
          return c.compute(k); 
        }
				 
      });
      cache.put(k, ft);
      f = ft;
      ft.run();
    }	
    try{
      return f.get();
    }catch(ExecutionException e){
      throw lanuderThrowable(e.getCause());
    }
  }
	
  public static RuntimeException lanuderThrowable(Throwable t){
    if( t instanceof RuntimeException ){
      return (RuntimeException)t;
    }else if(t instanceof Error){
      throw (Error)t; // rethrow the original Error instead of a new, empty one
    }else{
      throw new IllegalStateException("Not Unchecked", t);
    }
  }
}

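Is that the end of it? Not yet. As just described, thread A puts the Future in and thread B retrieves it. The problem is that under high concurrency, thread B may call the cache's get before A has put its Future, which still causes a duplicate computation. For this case put is not enough; we need the atomic putIfAbsent that ConcurrentHashMap provides. (Since Java 8, HashMap also has a putIfAbsent method, but it is not atomic.)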

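However the earlier get calls play out, the worst case is that threads A and B both miss and both try to put a Future into the cache, so both call putIfAbsent. The semantics of putIfAbsent are: put the value only if the key is absent. Whatever order A and B call it in, exactly one thread gets back null (the previous value), meaning its own Future was installed, and the other thread gets back the Future that the first one installed. We can rely on this guarantee to write the following code: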

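Note: real code also has to handle the case where the Future does not complete normally (for example, it is cancelled), which is why there is a loop.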

import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class Memoier<K, V> implements Computable<K, V> {
  private final ConcurrentHashMap<K, Future<V>> cache =
      new ConcurrentHashMap<K, Future<V>>();
  private final Computable<K, V> c;

  public Memoier(Computable<K, V> c) {
    this.c = c;
  }

  public V compute(final K k) throws InterruptedException {
    while (true) {
      Future<V> f = cache.get(k);
      if (f == null) {
        FutureTask<V> ft = new FutureTask<V>(new Callable<V>() {

          public V call() throws Exception {
            return c.compute(k);
          }

        });
        f = cache.putIfAbsent(k, ft);
        if (f == null) {
          f = ft;
          ft.run();
        }
      }
      try {
        return f.get();
      } catch (CancellationException e) {
        cache.remove(k);
      } catch (ExecutionException e) {
        throw lanuderThrowable(e.getCause());
      }
    }
  }

  public static RuntimeException lanuderThrowable(Throwable t) {
    if (t instanceof RuntimeException) {
      return (RuntimeException) t;
    } else if (t instanceof Error) {
      throw (Error) t; // rethrow the original Error instead of a new, empty one
    } else {
      throw new IllegalStateException("Not Unchecked", t);
    }
  }

}

It is worth taking some time to appreciate what putIfAbsent does here.
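
To make the putIfAbsent guarantee concrete, here is a small sketch; the class PutIfAbsentDemo and the countWinners helper are hypothetical names introduced only for this illustration. It first shows the return values in a single thread, then lets several threads race for the same key and counts how many of them got null back, that is, how many would have gone on to run the computation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: shows that exactly one racing thread "wins" putIfAbsent.
final class PutIfAbsentDemo {

  // Returns how many of the racing threads saw null, i.e. installed their own value.
  static int countWinners(int threads) {
    ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    AtomicInteger winners = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      final int id = i;
      workers[i] = new Thread(() -> {
        try {
          start.await(); // release all threads at once to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        // null means there was no previous mapping: this thread's value was installed.
        if (cache.putIfAbsent("key", id) == null) {
          winners.incrementAndGet();
        }
      });
      workers[i].start();
    }
    start.countDown();
    try {
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return winners.get();
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, String> m = new ConcurrentHashMap<>();
    System.out.println(m.putIfAbsent("k", "first"));  // prints "null": the key was absent
    System.out.println(m.putIfAbsent("k", "second")); // prints "first": existing value is kept
    System.out.println(m.get("k"));                   // prints "first"
    System.out.println("winners = " + countWinners(8)); // prints "winners = 1"
  }
}
```

In the memoizer, the single winner is the thread that calls ft.run(); every other thread receives the winner's Future and simply waits on it.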

Guava

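None of the cache code above deals with expiration or invalidation. The original book, Java Concurrency in Practice, does not discuss these features. Personally I think that is acceptable, since most cache products offer expiration and invalidation; ConcurrentHashMap is the exception. So if you need to build an in-memory cache on top of ConcurrentHashMap, and you do not want to make your cached objects serializable just to satisfy the requirements of some cache product, I suggest trying CacheBuilder from Google's Guava. (Of course, you could also write your own based on WeakReference or SoftReference.)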

The following is typical usage:

LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
       .maximumSize(1000)
       .build(
           new CacheLoader<Key, Graph>() {
             public Graph load(Key key) throws AnyException {
               return createExpensiveGraph(key);
             }
           });

...
try {
  return graphs.get(key);
} catch (ExecutionException e) {
  throw new OtherException(e.getCause());
}

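In my opinion, what sets this apart from ordinary cache code is that you supply the function that maps a key to its cached object when the cache is created. That is a clever design, although it only suits caches that hold a single kind of content. The CacheBuilder options I find most useful are: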

maximumSize(1000)
expireAfterWrite(10, TimeUnit.MINUTES)

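These set the maximum number of entries and the expiration time. Expiration comes in two policies: after access (expireAfterAccess) and after write (expireAfterWrite).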

try {
  // If the key wasn't in the "easy to compute" group, we need to
  // do things the hard way.
  cache.get(key, new Callable<Value>() {
    @Override
    public Value call() throws AnyException {
      return doThingsTheHardWay(key);
    }
  });
} catch (ExecutionException e) {
  throw new OtherException(e.getCause());
}

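For cases where the computation is not simple, pass a Callable as a callback so the value is computed lazily on a miss, much like a Future.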

RemovalListener<Key, DatabaseConnection> removalListener = new RemovalListener<Key, DatabaseConnection>() {
  public void onRemoval(RemovalNotification<Key, DatabaseConnection> removal) {
    DatabaseConnection conn = removal.getValue();
    conn.close(); // tear down properly
  }
};

return CacheBuilder.newBuilder()
  .expireAfterWrite(2, TimeUnit.MINUTES)
  .removalListener(removalListener)
  .build(loader);

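This is a removal listener, which can also be used for logging. Note that removal depends on the expiration policy: Guava's cache does not remove an entry the moment it expires. Cleanup happens as a side effect of later cache operations, unless you call cleanUp yourself.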

Those are some of the things you need to know to use a cache correctly. And remind yourself: Concurrency is hard.
