Enterprise Cloud and Parallel, Distributed, and Asynchronous Processing
@maruyama097
Fujio Maruyama (丸山不二夫)
Agenda
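Part I Parallel programming on multi-core
Part II Trends in distributed environments over the network
Part III Asynchronous programming techniques
Parallel Programming on Multi-core
The advance of multi-core
JSR166y:ForkJoin
Parallel programming in Java SE8
Parallel programming in the .NET Framework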
Intel OpenCL
JavaScript Intel River Trail
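The Advance of Multi-core
In the PC world, nearly every machine already ships with a multi-core chip, and this trend will not reverse. Cloud devices are going multi-core as well.
Core counts keep growing. Chips with 100 cores have already been announced.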
Multi-core CPUs
CPU name      Cores  Manufacturer
Nehalem-EX    8      Intel
Power 7       8      IBM
Magny-Cours   12     AMD
T3            16     Oracle
Intel SCC (Single-chip Cloud Computer)
Announced by Intel Labs on March 30, 2010. http://techresearch.intel.com/articles/Tera-Scale/1826.htm
Built from 24 tiles with two IA cores per tile, for 48 cores in total.
A mesh network of 24 routers with 256 GB/s of bisection bandwidth.
Four integrated DDR3 controllers. 64 GB.
Tilera GX: 36, 48, 100 cores
Multi-core starts to reach mobile: Tegra 3, 5 cores
CPU core management based on workload
JSR166y ForkJoin Divide and Conquer
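ForkJoin is one of the basic algorithms of parallel processing today. It is widely used, not only in Java.
ForkJoin tries to improve performance by splitting the work and running the pieces in parallel on multiple cores.
First, let us look at its essence, the "Divide and Conquer" technique.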
Divide and Conquer
Result solve(Problem problem) {
    if (problem is small)
        solve problem directly;
    else {
        split problem into independent parts;
        fork a subtask to solve each part;
        join all subtasks;
        compose the result from the subresults;
    }
}
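The pseudocode above can be sketched as a small runnable task. This is only an illustration under assumptions: the class name DivideAndConquerSum, the choice of summing an array, and the THRESHOLD value are hypothetical and not from the slides, and ForkJoinPool.commonPool() requires Java SE 8 or later.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class DivideAndConquerSum {
    // Hypothetical cut-off, chosen only for illustration.
    static final int THRESHOLD = 1_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] array;
        private final int lo;
        private final int hi;

        SumTask(long[] array, int lo, int hi) {
            this.array = array;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo < THRESHOLD) {          // the problem is small: solve it directly
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += array[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;          // split into two independent parts
            SumTask left = new SumTask(array, lo, mid);
            SumTask right = new SumTask(array, mid, hi);
            left.fork();                        // fork one subtask
            long rightSum = right.compute();    // solve the other part in this thread
            return left.join() + rightSum;      // join, then compose the result
        }
    }

    public static long parallelSum(long[] array) {
        return ForkJoinPool.commonPool().invoke(new SumTask(array, 0, array.length));
    }
}
```

Calling DivideAndConquerSum.parallelSum(data) splits the array recursively until each range is shorter than THRESHOLD.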
class SortTask extends RecursiveAction {
    final long[] array;
    final int lo;
    final int hi;
    SortTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }
    protected void compute() {
        if (hi - lo < THRESHOLD)
            sequentiallySort(array, lo, hi);
        else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new SortTask(array, lo, mid),
                      new SortTask(array, mid, hi));
            merge(array, lo, hi);
        }
    }
}
Below THRESHOLD: an ordinary sequential sort
SortTask is invoked recursively.
The results are merged.
The recursive calls divide the work
[Figure: recursion tree. Each range lo..hi is split at (lo+hi)/2 by invokeAll(sortTask…, sortTask…); once (hi - lo) < THRESHOLD the range is sorted sequentially, and the halves are combined by a sequential merge.]
class IncrementTask extends RecursiveAction {
    final long[] array;
    final int lo;
    final int hi;
    IncrementTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }
    protected void compute() {
        if (hi - lo < THRESHOLD) {
            for (int i = lo; i < hi; ++i)
                array[i]++;
        } else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new IncrementTask(array, lo, mid),
                      new IncrementTask(array, mid, hi));
        }
    }
}
Below THRESHOLD: add 1 to each element of the array
IncrementTask is invoked recursively.
[Figure: recursion tree. Each range lo..hi is split at (lo+hi)/2 by invokeAll(incrementTask…, incrementTask…); once (hi - lo) < THRESHOLD, array[i]++ is applied to each element of the range.]
Effect of the threshold
If the threshold is too large, little parallelism is left to exploit. If it is too small, the overhead of parallelization grows.
Parallelization can carry extra cost.
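To get a feel for this trade-off, the following sketch counts how many leaf tasks the splitting rule used in SortTask and IncrementTask (stop when the range is shorter than the threshold, otherwise halve) would create. The class and method names are hypothetical, and the count says nothing about actual running time.

```java
public class ThresholdTradeoff {
    /**
     * Number of leaf tasks produced when a range of n elements is halved
     * until each piece is shorter than the threshold.
     */
    public static long leafTasks(long n, long threshold) {
        if (threshold < 2) {
            // With threshold < 2 a one-element range could never stop splitting.
            throw new IllegalArgumentException("threshold must be at least 2");
        }
        if (n < threshold) return 1;            // small enough: one sequential leaf
        long half = n / 2;                      // same split as mid = (lo + hi) >>> 1
        return leafTasks(half, threshold) + leafTasks(n - half, threshold);
    }
}
```

For 1024 elements, a threshold of 2000 yields a single leaf (no parallelism), a threshold of 256 yields 8 leaves, and a threshold of 2 yields 1024 leaves, each of which pays the task-management overhead.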
JSR166y ForkJoin Work-Steal
Alongside splitting the work, the other heart of ForkJoin is the work-stealing algorithm. Work stealing offers a very smart way to even out the tasks assigned to each core. Here is an overview.
Multi-core and Workers
[Figure: eight cores, numbered 0 to 7, each running one Worker with its own Queue.]
Each Worker thread keeps its runnable tasks in its own scheduling queue.
Double-ended queue (deque)
LIFO (Last In / First Out)
FIFO (First In / First Out)
The queue is managed as a double-ended queue (deque) that supports LIFO push and pop, and FIFO take.
[Figure: push and pop at one end of the deque, take at the other end.]
Pushing subtasks
Subtasks created by a task running on a Worker thread are pushed onto that Worker's deque.
[Figure: a Worker runs invokeAll(Task1…, Task2…), and both tasks are pushed onto its deque.]
Executing tasks
[Figure: the Worker pops Task2 and runs it, then pops Task1 and runs it.]
A Worker thread processes its own deque by popping tasks in LIFO order (youngest first).
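Work Steal
When a Worker thread has no local tasks left to run, it takes ("steals") a task from another, randomly chosen Worker, following the FIFO rule (oldest first).
[Figure: one Worker pushes onto its deque while another Worker takes from it.]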
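How work stealing plays out
When pool.invoke() is called, the task is placed on a randomly chosen deque.
While a Worker is running a task:
Usually it just pushes two tasks,
and then pops one of them and runs it.
Before long, some Workers start stealing top-level tasks.
Once forking is done, the tasks end up spread naturally across many work queues.
Then the time-consuming sequential parts run.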
Work-Stealing
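When a Worker thread hits a join operation, it keeps processing other available tasks until it is notified (isDone) that the joined task has completed.
If a Worker thread has no work and fails to steal any from the other threads, it backs off and keeps retrying until it learns that all the other threads are idle as well.
When all threads are idle, the Workers block until another task is submitted from the top level.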
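The two ends of the work queue described above can be sketched with a plain deque. This is a single-threaded illustration only: the class WorkStealingDequeSketch is hypothetical, it is not thread-safe, and the real ForkJoinPool uses its own lock-free queue implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WorkStealingDequeSketch {
    private final Deque<String> tasks = new ArrayDeque<>();

    /** Owner side: push a newly forked subtask onto the LIFO end. */
    public void push(String task) {
        tasks.addFirst(task);
    }

    /** Owner side: pop the most recently pushed task (youngest first). */
    public String pop() {
        return tasks.pollFirst();
    }

    /** Thief side: take the oldest task from the opposite, FIFO end. */
    public String take() {
        return tasks.pollLast();
    }
}
```

After pushing A, B and C, the owner's pop() returns C while a thief's take() returns A, so the thief tends to get the older, and usually larger, piece of work.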
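extra JSR166y ParallelArray: partitioning data
ParallelArray (extra166y) is an application of ForkJoin. The ForkJoin algorithm itself is not necessarily easy to understand.
ParallelArray is easier for ordinary programmers to picture, as a flow of operations over bulk data. Parallel arrays are gaining wide acceptance as a parallel programming technique in Java, C#, and JavaScript.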
ParallelArray code sample
// Find the highest score among the students of a given year
ParallelArray students =
    new ParallelArray(fjPool, data);
double bestGpa = students
    .withFilter(isSenior)    // filter by graduation year
    .withMapping(selectGpa)  // extract the score
    .max();                  // pick the highest score
No explicit for loop is used here. This style of processing is sometimes called bulk data processing.
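Basic operations supported by ParallelArray
Apply – run an action on each selected element
Filtering – select a subset of the elements
Multiple filters can be specified
Binary search is supported on a sorted ParallelArray
Mapping – convert the selected elements to another form
Replacement – create a new ParallelArray
Sorting, running accumulation
Aggregation – combine all values into a single value
max, min, sum, average
A general-purpose reduce() method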
Apply
public void apply( Ops.Procedure<? super T> procedure)
Applies procedure to each element.
static final class Proc implements Ops.Procedure<Rand> {
    public void op(Rand x) {
        for (int k = 0; k < (1 << 10); ++k)
            x.next();
    }
}
ForkJoinPool fjp = new ForkJoinPool(i);
ParallelArray pa = ParallelArray.createUsingHandoff(array, fjp);
final Proc proc = new Proc();
pa.apply(proc);
withFilter
public ParallelArray withFilter(Ops.Predicate<? super T> selector)
Selects the elements for which selector is true.
ForkJoinPool fjp = new ForkJoinPool(ps);
ParallelArray<Rand> pa = ParallelArray.createUsingHandoff(array, fjp);
final IsPrime pred = new IsPrime();
List<Rand> result = pa.withFilter(pred).all().asList();
static final Ops.Predicate isSenior = new Ops.Predicate() {
    public boolean op(Student s) {
        return s.graduationYear == Student.THIS_YEAR;
    }
};
withMapping / Reduce
public <U> ParallelArrayWithMapping<T,U> withMapping(Ops.Op<? super T,? extends U> op)
sum += pa.withMapping(getNext).reduce(accum, zero);
final GetNext getNext = new GetNext();
final Accum accum = new Accum();
final Long zero = Long.valueOf(0);
static final class GetNext implements Ops.Op<Rand, Long> {
    public Long op(Rand x) { return x.next(); }
}
static final class Accum implements Ops.Reducer<Long> {
    public Long op(Long a, Long b) { long x = a; long y = b; return x + y; }
}
Argument types and return types
public class Ops {
    private Ops() {} // disable construction
    // Thanks to David Biesack for the above html table
    // You want to read/edit this with a wide editor panel
    public static interface Op<A,R> { R op(A a); }
    public static interface BinaryOp<A,B,R> { R op(A a, B b); }
    public static interface Predicate<A> { boolean op(A a); }
    public static interface BinaryPredicate<A,B> { boolean op(A a, B b); }
    public static interface Procedure<A> { void op(A a); }
    public static interface Generator<R> { R op(); }
    public static interface Reducer<A> extends BinaryOp<A, A, A> {}
    ……
    ……
}
Basically, you have to supply an implementation of the op method.
Introducing closures greatly reduces this tedium.
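To make the boilerplate concrete, here is a minimal sketch that writes the same mapping twice: once as an anonymous class and once as a closure, using the lambda syntax that Java SE 8 eventually shipped. The Op interface below is a local stand-in for Ops.Op, redeclared so that the sketch compiles on its own, and mapAll is a hypothetical helper rather than part of ParallelArray.

```java
import java.util.ArrayList;
import java.util.List;

public class ClosureBoilerplate {
    // Local stand-in for Ops.Op<A,R>; not the extra166y type itself.
    interface Op<A, R> {
        R op(A a);
    }

    // Hypothetical sequential helper that applies f to every element.
    static <A, R> List<R> mapAll(List<A> in, Op<A, R> f) {
        List<R> out = new ArrayList<>();
        for (A a : in) out.add(f.op(a));
        return out;
    }

    /** Without closures: a full anonymous class just to call length(). */
    public static List<Integer> lengthsWithAnonymousClass(List<String> words) {
        return mapAll(words, new Op<String, Integer>() {
            public Integer op(String s) {
                return s.length();
            }
        });
    }

    /** With a closure: the same operation in one expression. */
    public static List<Integer> lengthsWithLambda(List<String> words) {
        return mapAll(words, s -> s.length());
    }
}
```

Both methods return the same list; only the amount of ceremony around the op method differs.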
There’s not a moment to lose! http://mreinhold.org/blog/closures 2009/11/24
The free lunch is over. Multicore processors are not just coming—they’re here.
Leveraging multiple cores requires writing scalable parallel programs, which is incredibly hard.
Tools such as fork/join frameworks based on work-stealing algorithms make the task easier, but it still takes a fair bit of expertise and tuning.
Bulk-data APIs such as parallel arrays allow computations to be expressed in terms of higher-level, SQL-like operations (e.g., filter, map, and reduce) which can be mapped automatically onto the fork-join paradigm.
Working with parallel arrays in Java, unfortunately, requires lots of boilerplate code to solve even simple problems.
Closures can eliminate that boilerplate.
There’s not a moment to lose! Closures for Java, by M. Reinhold
Now is the time to add closures to Java.
Reinhold made this argument two years ago. Unfortunately, although ForkJoin was introduced in Java SE7, closures were postponed and carried over to Java SE8.
ForkJoin in Java SE7
http://docs.oracle.com/javase/7/docs/technotes/guides/concurrency/index.html
Key classes of ForkJoin in Java SE7
ForkJoinPool
An executor service for running ForkJoinTasks
ForkJoinTask
The base class for fork/join tasks
RecursiveAction
A subclass of ForkJoinTask
A recursive task that returns no result
Implement the abstract method compute() to do the computation.
RecursiveTask
Like RecursiveAction, but returns a result
Java SE7 ForkJoin Example – Fibonacci
import java.util.concurrent.RecursiveTask;

public class Fibonacci extends RecursiveTask<Integer> {
    private final int number;
    public Fibonacci(int n) { number = n; }
    @Override protected Integer compute() {
        switch (number) {
            case 0: return 0;
            case 1: return 1;
            default:
                Fibonacci f1 = new Fibonacci(number - 1);
                Fibonacci f2 = new Fibonacci(number - 2);
                f1.fork();                        // run f1 asynchronously
                return f2.compute() + f1.join();  // compute f2 in this thread, then wait for f1
        }
    }
}
Project Lambda and Parallel Programming in Java SE8
It was unfortunate that closures were left out of Java 7. Here we look at the closures that Project Lambda brings to the upcoming Java SE8, and at the multi-core parallel programming style they enable.
Ordinary sequential processing: iterating with a for statement
class Student { String name; int gradYear; double score; }
List<Student> students = ……;
double max = Double.NEGATIVE_INFINITY;
for (Student s : students) {
    if (s.gradYear == 2011)
        max = Math.max(max, s.score);
}
return max;
Processing with ParallelArray, without closures
Double max = students
    .filter(new Predicate<Student>() {
        public boolean eval(Student s) { return s.gradYear == 2011; }
    })
    .map(new Mapper<Student, Double>() {
        public Double map(Student s) { return s.score; }
    })
    .reduce(0.0, new Reducer<Double, Double>() {
        public Double reduce(Double max, Double score) { return Math.max(max, score); }
    });
Java SE8: closures, and simplification through type inference
Double max = students
    .filter((Student s) -> s.gradYear == 2011)
    .map((Student s) -> s.score)
    .reduce(0.0, (Double best, Double score) -> Math.max(best, score));
Double max = students
    .filter(s -> s.gradYear == 2011)
    .map(s -> s.score)
    .reduce(0.0, (best, score) -> Math.max(best, score));
Java SE8 method literal: Math#max
Double max = students
    .filter(s -> s.gradYear == 2011)
    .map(s -> s.score)
    .reduce(0.0, (best, score) -> Math.max(best, score));
Double max = students
    .filter(s -> s.gradYear == 2011)  // Iterable
    .map(s -> s.score)                // Iterable
    .reduce(0.0, Math#max);           // Double
The notation still seems to be in flux between Math::max and Math#max.
Java SE8: extending the Iterable interface
interface Iterable<T> {
    Iterator<T> iterator();
    void forEach(Block<T> block) default …;
    Iterable<T> filter(Predicate<? super T> predicate);
    <U> Iterable<U> map(Mapper<? super T, ? extends U> mapper);
    <U> U reduce(U base, Reducer<U, ? super T> reducer);
}
Because Collection<E> extends Iterable<E>, Iterable is the most fundamental container type in Java.
Java SE8 default implementation
interface Iterable<T> {
    Iterator<T> iterator();
    Iterable<T> filter(Predicate<? super T> predicate) default Iterable.filter;
    <U> Iterable<U> map(Mapper<? super T, ? extends U> mapper) default Iterable.map;
    <U> U reduce(U base, Reducer<U, ? super T> reducer) default Iterable.reduce;
}
default: if the implementing class does not provide the method, this implementation is used.
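For comparison, here is a small sketch of the same idea in the default-method syntax that Java SE 8 finally shipped, where the body is written inline in the interface. Bulk, Numbers and evensOf are hypothetical names for illustration; the released JDK did not add filter to Iterable itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class DefaultMethodSketch {
    // Hypothetical interface, not the JDK's Iterable: it carries its own filter.
    interface Bulk<T> extends Iterable<T> {
        default List<T> filter(Predicate<? super T> predicate) {
            List<T> out = new ArrayList<>();
            for (T t : this) {
                if (predicate.test(t)) out.add(t);
            }
            return out;
        }
    }

    // Implements only iterator(); filter comes from the default implementation.
    static final class Numbers implements Bulk<Integer> {
        private final List<Integer> values;

        Numbers(Integer... values) {
            this.values = Arrays.asList(values);
        }

        @Override
        public Iterator<Integer> iterator() {
            return values.iterator();
        }
    }

    public static List<Integer> evensOf(Integer... values) {
        return new Numbers(values).filter(x -> x % 2 == 0);
    }
}
```

Numbers never declares filter, yet evensOf(1, 2, 3, 4) can call it, which is exactly the behaviour the slide describes.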
Java SE8: the problem with Iterable
Double max = students
    .filter(s -> s.gradYear == 2011)  // Iterable
    .map(s -> s.score)                // Iterable
    .reduce(0.0, Math::max);          // Double
filter, map, and reduce are processed sequentially. What if students is huge? What if reduce is a very expensive operation?
Java SE8: bulk processing in parallel
Double max = students
    .filter(s -> s.gradYear == 2011)  // Iterable
    .map(s -> s.score)                // Iterable
    .reduce(0.0, Math::max);          // Double
Double max = students
    .parallel()
    .filter(s -> s.gradYear == 2011)
    .map(s -> s.score)
    .reduce(0.0, Math::max);
parallel() returns a Spliterable. The methods of Spliterable are almost the same as those of Iterable, except that it has a spliterator in place of an iterator.
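The API on these slides is a Project Lambda draft. As a point of reference, the same query can be sketched with the java.util.stream API that Java SE 8 eventually released, where parallelStream() plays the role of parallel(). The Student fields follow the slide; the class name ParallelBulkSketch, the sample data, and the NaN fallback for an empty result are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelBulkSketch {
    static final class Student {
        final String name;
        final int gradYear;
        final double score;

        Student(String name, int gradYear, double score) {
            this.name = name;
            this.gradYear = gradYear;
            this.score = score;
        }
    }

    // Hypothetical sample data.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("A", 2011, 3.2),
                new Student("B", 2011, 3.9),
                new Student("C", 2012, 4.0));
    }

    /** Highest score among students graduating in the given year, or NaN if none. */
    static double maxScore(List<Student> students, int year) {
        return students.parallelStream()           // opt in to parallel execution
                .filter(s -> s.gradYear == year)
                .mapToDouble(s -> s.score)
                .max()
                .orElse(Double.NaN);
    }

    public static double maxScoreOfSample(int year) {
        return maxScore(sample(), year);
    }
}
```

Replacing parallelStream() with stream() gives the sequential version of the same pipeline.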
Java SE8: introducing the Spliterable interface
public interface Spliterable<E> {
boolean canSplit();
long estimateElements();
Spliterable<E> left();
Spliterable<E> right();
Iterator<E> iterator();
……
}
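The interface returned by parallel() essentially expresses ForkJoin's Divide and Conquer.
A Spliterable splits itself into left() and right().
right() is placed on the ForkJoin work queue, and left() is executed.
Once it reaches the point where it no longer splits (ForkJoin's THRESHOLD), the rest is processed sequentially with an Iterator.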
Parallel Programming in the .NET Framework
Among today's major languages, the .NET Framework appears to be the furthest along in its support for parallel programming.
Parallel Programming in the .NET Framework
http://msdn.microsoft.com/en-us/library/dd460693.aspx
Parallel Programming in the .NET Framework
Many personal computers and workstations have two or four cores (that is, CPUs) that enable multiple threads to be executed simultaneously.
Computers in the near future are expected to have significantly more cores. To take advantage of the hardware of today and tomorrow, you can parallelize your code to distribute work across multiple processors.
In the past, parallelization required low-level manipulation of threads and locks.
Visual Studio 2010 and the .NET Framework 4 enhance support for parallel programming by providing a new runtime, new class library types, and new diagnostic tools.
.NET 4 new runtime, new class library
Task Parallel Library
Parallel LINQ (PLINQ)
Data Structures for Parallel Programming
Parallel Diagnostic Tools
Custom Partitioners for PLINQ and TPL
Task Factories
Task Schedulers
Lambda Expressions in PLINQ and TPL
………
The .NET user-mode scheduler: CLR Thread Pool
[Figure: Program Thread, Global Queue, Worker Thread 1 … Worker Thread p.]
The .NET 4.0 user-mode scheduler for Tasks: CLR Thread Pool with work-stealing
[Figure: Program Thread, Global Queue, and Worker Thread 1 … Worker Thread p, each with a Local Queue; Task 1 to Task 6 are spread across the queues.]
PLINQ Code Sample
var source = Enumerable.Range(1, 10000);
// Opt-in to PLINQ with AsParallel
var evenNums = from num in source.AsParallel()
where Compute(num) > 0
select num;
var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
where Compute(item) > 42
select item;
evenNums = from num in numbers.AsParallel().AsOrdered()
where num % 2 == 0
select num;
ForAll Operation
var nums = Enumerable.Range(10, 10000);
var query = from num in nums.AsParallel()
where num % 10 == 0
select num;
// Process the results as each thread completes
// and add them to a System.Collections.Concurrent.ConcurrentBag(Of Int)
// which can safely accept concurrent add operations
query.ForAll((e) => concurrentBag.Add(Compute(e)));
Sequential Fallback .NET 4.0
intArray.AsParallel()
    .Select(x => Foo(x))        // slide annotation: sequential in .NET 4
    .TakeWhile(x => Filter(x))
    .ToArray();
Force Parallel .NET 4.5
intArray.AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Select(x => Foo(x))
    .TakeWhile(x => Filter(x))
    .ToArray();
Sequential Fallback in .NET 4 and .NET 4.5
Operators that may cause sequential fallback in both .NET 4 and .NET 4.5 are marked
in blue, and operators that may cause fallback in .NET 4 but no longer in .NET 4.5 are
marked in orange.
Intel OpenCL
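Intel OpenCL is a comprehensive parallel programming framework that targets a wide variety of computing environments. That said, I do not expect ordinary programmers to use it directly.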
http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/
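Diagram of the 48-core SCC
Memory structure of the SCC
Shared external memory (variable size)
Per core (cpu_0 … cpu_47): external memory (variable size), L1 cache 16K, L2 cache 256K
On-chip shared message-passing buffer: 384K, 8K per core
Shared virtual memory space of the SCC
A shared virtual address space that spans the cores is available.
To the application it looks like a single memory space.
Data structures and pointers can be shared seamlessly among multiple cores.
[Figure: applications on top of the shared virtual memory.]
Basically a parallel-style usage model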
Intel OpenCL
OpenCL™ (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems
OpenCL provides a uniform programming environment for software developers to write efficient, portable code for client computer systems, high-performance computing servers, and handheld devices using a diverse mix of multi-core CPUs and other parallel processors.
OpenCL Device Architecture Diagram
OpenCL - Class Diagram
Intel RiverTrail
An attempt to parallelize JavaScript.
It adopts ParallelArray. https://github.com/RiverTrail/RiverTrail/wiki
Intel RiverTrail https://github.com/RiverTrail/RiverTrail/wiki
The goal of Intel Lab’s River Trail project is to enable data-parallelism in web applications.
River Trail gently extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer.
By leveraging multiple CPU cores and vector instructions, River Trail achieves significant speedup over sequential JavaScript.
ParallelArray
ParallelArray();
ParallelArray(size, elementalFunction, arg1, ..., argN);
ParallelArray(anArray);
ParallelArray(constructor, anArray);
ParallelArray(element0, element1, ..., elementN);
ParallelArray(canvas);
pa1 = new ParallelArray([ [0,1], [2,3], [4,5] ]); // <<0,1>, <2,3>, <4,5>>
pa2 = new ParallelArray(pa1); // <<0,1>, <2,3>, <4,5>>
new ParallelArray(<0,1>, <2,3>); // <<0,1>, <2,3>>
new ParallelArray([ [0,1],[2] ]) // <<0,1>, <2>>
new ParallelArray([<0,1>,<2>]); // <<0,1>, <2>>
new ParallelArray(3,
  function(i){return [i, i+1];}); // <<0,1>, <1,2>, <2,3>>
new ParallelArray([3,2],
  function(iv){return iv[0]*iv[1];}); // <<0,0>, <0,1>, <0,2>>
new ParallelArray(canvas); // CanvasPixelArray
Parallel Methods
map
combine
reduce
scan
scatter
filter
flatten
partition
get
Map
myArray.map(elementalFunction, arg1, arg2, ...)
Returns: a freshly minted ParallelArray
Example (an identity function): pa.map(function(val){return val;})
Filter
myArray.filter(elementalFunction, arg1, arg2, ...)
Returns: a freshly minted ParallelArray holding the source elements for which the elemental function returns true.
Example: pa.filter(function(){return true;})
Reduce
myArray.reduce(elementalFunction)
myArray.reduce(elementalFunction, arg1, arg2, ...)
The elemental function returns the result of reducing a and b, which is typically used in further applications of the elemental function.
Reduce is free to group calls to the elemental function in arbitrary ways and order the calls arbitrarily. If the elemental function is associative then the final result will be the same regardless of the ordering.
Flatten
myArray.flatten()
Returns: a freshly minted ParallelArray whose outermost two dimensions have been collapsed into one.
Example
pa = new ParallelArray([[1,2],[3,4]]) // <<1,2>,<3,4>>
pa.flatten() // <1,2,3,4>
Partition
myArray.partition(size)
size: the size of each element of the newly created dimension; the outermost dimension of myArray needs to be divisible by size
Returns: a freshly minted ParallelArray where the outermost dimension has been partitioned into elements of size size.
Example
pa = new ParallelArray([1,2,3,4]) // <1,2,3,4>
pa.partition(2) // <<1,2>,<3,4>>
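The shape changes made by partition and flatten can be illustrated on ordinary Java lists. This sketch is sequential and only mirrors the shapes in the examples above; the class and method names are hypothetical and it does not use River Trail.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionFlattenSketch {
    /** Groups the outermost dimension into chunks of the given size. */
    public static <T> List<List<T>> partition(List<T> in, int size) {
        if (size <= 0 || in.size() % size != 0) {
            throw new IllegalArgumentException("length must be divisible by size");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i += size) {
            out.add(new ArrayList<>(in.subList(i, i + size)));
        }
        return out;
    }

    /** Collapses the outermost two dimensions into one. */
    public static <T> List<T> flatten(List<List<T>> in) {
        List<T> out = new ArrayList<>();
        for (List<T> row : in) out.addAll(row);
        return out;
    }
}
```

partition([1,2,3,4], 2) gives [[1,2],[3,4]], and flatten turns that back into [1,2,3,4].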
Trends in Distributed Environments over the Network
Scale-out and stateless servers
WebSocket
SPDY
Scale-out and Stateless Servers
Scaling out multi-tier Web applications
Java EE6: StatelessSessionBean + Servlet
Java EE6: RESTful Web Service
Play 2.0: the routes file and Actions
Multi-tier Web application
[Diagram: Web Server, Business Logic, Database.]
Scaling out a multi-tier Web application
[Diagram: a Load Balancer in front of replicated Web Server and Business Logic instances, plus the Database; labels mark where the tiers scale out.]
Availability of a multi-tier Web application
[Diagram: same layout; two instances are marked "Crash!!".]
[Diagram: same layout; two instances are marked "New Instance".]
The Web Server / HTTP tier is stateless
[Diagram: Load Balancer, Web Server, Business Logic, Database.]
Is the Business Logic tier stateful?
[Diagram: same layout, with the Business Logic tier marked "Stateless?".]
Making the whole Application Server stateless
The Database carries the application's state.
State that spans sessions: sticky sessions?
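The idea that the database carries the application's state can be sketched as follows. Store stands in for the database tier and AppServer for one stateless application-server instance; both names, and the request counter used as "state", are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StatelessTierSketch {
    // Stand-in for the database tier: the only place where state lives.
    static final class Store {
        private final Map<String, Integer> requestCounts = new ConcurrentHashMap<>();

        int increment(String sessionId) {
            return requestCounts.merge(sessionId, 1, Integer::sum);
        }
    }

    // One application-server instance: it keeps no per-session fields.
    static final class AppServer {
        private final Store store;

        AppServer(Store store) {
            this.store = store;
        }

        int handle(String sessionId) {
            return store.increment(sessionId);
        }
    }

    /** Two requests of the same session served by two different instances. */
    public static int demo() {
        Store database = new Store();
        AppServer first = new AppServer(database);
        AppServer second = new AppServer(database);
        first.handle("session-1");
        return second.handle("session-1");
    }
}
```

Because neither instance remembers anything itself, the load balancer can send each request to any instance, and a crashed instance can be replaced without losing state.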
Java EE6 StatelessSessionBean+Servlet
/**
 * Contains methods to create and query data
 */
@Stateless
public class StatelessSessionBean {
    @PersistenceContext
    private EntityManager em;
    public void createData(ServletOutputStream outputStream) {…}
    private void createOrder(int orderNumber) {…}
    public void queryData(ServletOutputStream outputStream) throws IOException {…}
    private void queryForOrderContainingItem(String itemName, ServletOutputStream outputStream) throws IOException {…}
    private void queryDataForOrder(int orderId, ServletOutputStream outputStream) throws IOException {…}
    … …
@WebServlet(name="TestServlet", urlPatterns={"/test/*"})
public class TestServlet extends HttpServlet {
    @EJB
    private StatelessSessionBean testEJB;
    protected void processRequest(…){…}
    protected void doGet(…){…}
    protected void doPost(…){…}
    public String getServletInfo() {…}
    … …
Java EE6 RESTful Web Service
@Stateless
public class MessageBoardResourceBean {
    @Context
    private UriInfo ui;
    @EJB
    MessageHolderSingletonBean singleton;

    @GET
    public List<Message> getMessages() {
        return singleton.getMessages();
    }

    @POST
    public Response addMessage(String msg) throws URISyntaxException {
        Message m = singleton.addMessage(msg);
        URI msgURI = ui.getRequestUriBuilder().
            path(Integer.toString(m.getUniqueId())).build();
        return Response.created(msgURI).build();
    }

    @Path("{msgNum}")
    @GET
    public Message getMessage(@PathParam("msgNum") int msgNum) throws NotFoundException {
        Message m = singleton.getMessage(msgNum);
        if (m == null) throw new NotFoundException();
        return m;
    }

    @Path("{msgNum}")
    @DELETE
    public void deleteMessage(@PathParam("msgNum") int msgNum) throws NotFoundException {
        boolean deleted = singleton.deleteMessage(msgNum);
        if (!deleted) throw new NotFoundException();
    }
}
Play 2.0: the routes file and Actions
RESTful architecture
A Web application receives an HTTP request and returns a response.
Servlets and Struts offer an abstract, Java-level view of HTTP, but a Web application framework should give full and more direct access to HTTP and its concepts.
With a template engine, servlets are unnecessary.
“Share-Nothing” stateless architecture
Some Java Web frameworks are stateful.
That approach helps with remembering page state automatically, but it also brings awkward problems, such as handling the Back button.
Like PHP and Ruby on Rails, Play adopts a stateless “share-nothing” architecture.
# Routes
# This file defines all application routes (Higher priority routes first)
# ~~~~

# The home page
GET     /                                controllers.Projects.index

# Authentication
GET     /login                           controllers.Application.login
POST    /login                           controllers.Application.authenticate
GET     /logout                          controllers.Application.logout

# Projects
POST    /projects                        controllers.Projects.add
POST    /projects/groups                 controllers.Projects.addGroup()
DELETE  /projects/groups                 controllers.Projects.deleteGroup(group: String)
PUT     /projects/groups                 controllers.Projects.renameGroup(group: String)
DELETE  /projects/:project               controllers.Projects.delete(project: Long)
PUT     /projects/:project               controllers.Projects.rename(project: Long)
POST    /projects/:project/team          controllers.Projects.addUser(project: Long)
DELETE  /projects/:project/team          controllers.Projects.removeUser(project: Long)

# Tasks
GET     /projects/:project/tasks         controllers.Tasks.index(project: Long)
POST    /projects/:project/tasks         controllers.Tasks.add(project: Long, folder: String)
PUT     /tasks/:task                     controllers.Tasks.update(task: Long)
DELETE  /tasks/:task                     controllers.Tasks.delete(task: Long)
POST    /tasks/folder                    controllers.Tasks.addFolder
DELETE  /projects/:project/tasks/folder  controllers.Tasks.deleteFolder(project: Long, folder: String)
PUT     /project/:project/tasks/folder   controllers.Tasks.renameFolder(project: Long, folder: String)

# Javascript routing
GET     /assets/javascripts/routes       controllers.Application.javascriptRoutes

# Map static resources from the /public folder to the /public path
GET     /assets/*file                    controllers.Assets.at(path="/public", file)
Writing a controller Action: app/controllers/Application.java
The Java/Scala files under app/controllers/ define the Actions that the routes file maps HTTP requests to.
package controllers;

import play.*;
import play.mvc.*;
import views.html.*;

public class Application extends Controller {
    public static Result index() {
        return ok(index.render("Hello World!"));
    }
}
WebSocket Truly Web Competitive ?
http://www.infoq.com/presentations/WebSockets-The-Web-Communication-Revolution
Hack the Web for Real-Time
Ajax applications use various "hacks" to simulate real-time communication
Polling - sends HTTP requests at regular intervals and immediately receives a response
Long Polling - HTTP request is kept open by the server for a set period
Streaming - more efficient, but complex to implement and unreliable
Excessive HTTP header traffic adds significant overhead to each request/response
HTTP Characteristics
HTTP is designed for document transfer
Resource addressing
Request / Response interaction
Caching
HTTP is bidirectional, but half-duplex
Traffic flows in only one direction at a time
HTTP is stateless
Header information is resent for each request
Traditional vs Web
Traditional Computing
Full-duplex bidirectional TCP sockets
Access any server on the network
Web Computing
Half-duplex HTTP request-response
HTTP polling, long polling fraught with problems
Lots of latency, lots of bandwidth, lots of server-side resources
Bespoke solutions became very complex over time
HTML5 WebSocket
WebSockets provide an improved Web Comms fabric
Consists of W3C API and IETF Protocol
Provides a full-duplex, single socket over the Web
Traverses firewalls, proxies, and routers seamlessly
Leverages Cross-Origin Resource Sharing
Share port with existing HTTP content
Can be secured with TLS (much like HTTPS)
HTTP Is Not Full Duplex
Half-Duplex Web Architecture
With WebSocket, the Web moves from half duplex to full duplex
The Legacy Web Stack
Designed to serve static documents
HTTP
Half duplex communication
High latency
Bandwidth intensive
HTTP header traffic approx. 800 to 2000 bytes overhead per request/response
Complex architecture
Not changed since the 90’s
Plug-ins
Polling / long polling
Legacy application servers
Expensive to "Webscale" applications
WebSocket Handshake Client Request
Required
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Optional
Sec-WebSocket-Key: 16-byte nonce, Base64 encoded
Sec-WebSocket-Version: 6
Sec-WebSocket-Origin: http://example.com
Sec-WebSocket-Protocol: protocol [, protocol]*
Sec-WebSocket-Extensions: extension [, extension]*
Cookie: cookie content and other cookie-related headers
WebSocket Handshake Server Response
Required
HTTP/1.1 101 “Switching Protocols” or other descriptions
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: 20-byte SHA-1 hash in Base64
Optional
Sec-WebSocket-Protocol: protocol
Sec-WebSocket-Extensions: extension [, extension]*
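As an illustration of how the server derives Sec-WebSocket-Accept from the client's key, here is a sketch that follows the final RFC 6455 definition: append a fixed GUID to the key, take the SHA-1 hash, and Base64-encode it. The slides describe an earlier draft (version 6), so details there may differ; the class and method names are hypothetical, and java.util.Base64 requires Java SE 8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class WebSocketAcceptSketch {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Computes the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key. */
    public static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);  // 20 bytes -> 28 chars
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

With the sample key from RFC 6455, dGhlIHNhbXBsZSBub25jZQ==, this yields s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.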
JavaScript How do I use: WebSocket API
// Create new WebSocket
var mySocket = new WebSocket("ws://www.WebSocket.org");
// Associate listeners
mySocket.onopen = function(evt) {
  alert("Connection open…");
};
mySocket.onmessage = function(evt) {
  alert("Received message: " + evt.data);
};
JavaScript How do I use: WebSocket API
mySocket.onclose = function(evt) {
  alert("Connection closed…");
};
// Sending data
mySocket.send("WebSocket Rocks!");
// Close WebSocket
mySocket.close();
WebSocket Frames
Frames have a few header bytes
Data may be text or binary
Frames from client to server are masked (XORed w/ random value) to avoid confusing proxies
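The masking step can be sketched as a plain XOR of the payload with a 4-byte key, repeating the key every four bytes, as RFC 6455 defines it. The class and method names are hypothetical; a real client must pick a fresh random key for every frame.

```java
public class FrameMaskingSketch {
    /** XORs each payload byte with the masking key byte at index i mod 4. */
    public static byte[] mask(byte[] payload, byte[] key) {
        if (key.length != 4) {
            throw new IllegalArgumentException("masking key must be 4 bytes");
        }
        byte[] out = new byte[payload.length];
        for (int i = 0; i < payload.length; i++) {
            out[i] = (byte) (payload[i] ^ key[i % 4]);
        }
        return out;
    }
}
```

Because XOR is its own inverse, the server unmasks a frame by calling the same function with the same key.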
HTTP Header Traffic Analysis
Example network throughput for HTTP request and response headers associated with polling
Use case A: 1,000 clients polling every second:
Network throughput is (871 x 1,000) = 871,000 bytes = 6,968,000 bits per second (~6.6 Mbps)
Use case B: 10,000 clients polling every second:
Network throughput is (871 x 10,000) = 8,710,000 bytes = 69,680,000 bits per second (~66 Mbps)
Use case C: 100,000 clients polling every second:
Network throughput is (871 x 100,000) = 87,100,000 bytes = 696,800,000 bits per second (~665 Mbps)
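The arithmetic in the three use cases can be reproduced with a few lines. The 871-byte header size comes from the slide; the class and method names are hypothetical. Note that the slide's "Mbps" figures match a division by 1024 × 1024 rather than by 1,000,000, so the sketch does the same.

```java
public class PollingHeaderTraffic {
    // Header bytes per request/response pair, as given on the slide.
    static final int HEADER_BYTES = 871;

    /** Header traffic in bits per second when every client polls once a second. */
    public static long bitsPerSecond(long clients) {
        return HEADER_BYTES * clients * 8;
    }

    /** The same figure in Mbps, using 1 Mbps = 1024 * 1024 bits/s as the slide appears to. */
    public static double mbps(long clients) {
        return bitsPerSecond(clients) / (1024.0 * 1024.0);
    }
}
```

For 1,000, 10,000 and 100,000 clients this gives 6,968,000, 69,680,000 and 696,800,000 bits per second.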
Reduction in Network Traffic
With WebSocket, each frame has only several bytes of packaging (a 500:1 or even 1000:1 reduction)
No latency involved in establishing new TCP connections for each HTTP message
Dramatic reduction in unnecessary network traffic and latency
Remember the Polling HTTP header traffic? 665 Mbps network throughput for just headers
HTTP versus WebSockets
Example: Entering a character in a search field with auto suggestion
WebSockets reduces bandwidth overhead up to 1000x
          HTTP Traffic     WebSocket Traffic
Google    788 + 1 byte     2 + 1 byte
Yahoo     1737 + 1 byte    2 + 1 byte
Polling vs. Web Sockets
“Reducing kilobytes of data to 2 bytes…and reducing latency from 150ms to 50ms is far more than marginal. In fact, these two factors alone are enough to make WebSocket seriously interesting to Google.”
—Ian Hickson (Google, HTML5 spec lead)
SPDY: An experimental protocol for a faster web
http://www.chromium.org/spdy/spdy-whitepaper
Let's make the web faster
As part of the "Let's make the web faster" initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced "SPeeDY"), an application-layer protocol for transporting content over the web, designed specifically for minimal latency.
In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY.
Background: web protocols and web latency
Unfortunately, HTTP was not particularly designed for latency. Furthermore, the web pages transmitted today are significantly different from web pages 10 years ago and demand improvements to HTTP that could not have been anticipated when HTTP was developed.
Single request per connection.
Exclusively client-initiated requests.
Uncompressed request and response headers.
Redundant headers.
Optional data compression.
Goals for SPDY
To target a 50% reduction in page load time.
To minimize deployment complexity.
To avoid the need for any changes to content by website authors.
To bring together like-minded parties interested in exploring protocols as a way of solving the latency problem.
Some specific technical goals
To allow many concurrent HTTP requests to run across a single TCP session.
To define a protocol that is easy to implement and server-efficient.
To make SSL the underlying transport protocol, for better security and compatibility with existing network infrastructure.
To enable the server to initiate communications with the client and push data to the client whenever possible.
SPDY design and features
SPDY adds a session layer atop of SSL that allows for multiple concurrent, interleaved streams over a single TCP connection.
The usual HTTP GET and POST message formats remain the same; however, SPDY specifies a new framing format for encoding and transmitting the data over the wire.
Streams are bi-directional i.e. can be initiated by the client and server.
Basic features
Multiplexed streams
SPDY allows for unlimited concurrent streams over a single TCP connection. Because requests are interleaved on a single channel, the efficiency of TCP is much higher: fewer network connections need to be made, and fewer, but more densely packed, packets are issued.
Request prioritization
SPDY implements request priorities: the client can request as many items as it wants from the server, and assign a priority to each request.
HTTP header compression
SPDY compresses request and response HTTP headers, resulting in fewer packets and fewer bytes transmitted.
Advanced features
Server push
SPDY experiments with an option for servers to push data to clients via the X-Associated-Content header. This header informs the client that the server is pushing a resource to the client before the client has asked for it. For initial-page downloads (e.g. the first time a user visits a site), this can vastly enhance the user experience.
Server hint
Rather than automatically pushing resources to the client, the server uses the X-Subresources header to suggest to the client that it should ask for specific resources, in cases where the server knows in advance of the client that those resources will be needed.
Asynchronous Programming Techniques
Java Future
.NET Async
Scala Future, Promise
Akka Future, Promise
JMS 2.0
Java Future
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
Since Java SE5
public interface Future<V>
A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled. Once a computation has completed, the computation cannot be cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you can declare types of the form Future<?> and return null as a result of the underlying task.
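The lifecycle described above (blocking `get`, `isDone`, `cancel`, and `Future<?>` for tasks with no usable result) can be sketched as follows. The class name `FutureLifecycle` and the latch-based task that never finishes are assumptions made for illustration; only standard `java.util.concurrent` calls are used.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureLifecycle {
    // Returns a short trace of what each Future reported.
    static String run() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // A task that completes normally.
            Future<Integer> sum = executor.submit(() -> 1 + 2);
            int value = sum.get();          // blocks until the result is ready
            boolean done = sum.isDone();    // true once get() has returned

            // A task that never finishes on its own; it exists only to be cancelled.
            CountDownLatch never = new CountDownLatch(1);
            Future<?> blocked = executor.submit(() -> {
                try {
                    never.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean cancelled = blocked.cancel(true);   // interrupt the worker if running
            String outcome;
            try {
                blocked.get();
                outcome = "completed";
            } catch (CancellationException e) {
                outcome = "cancelled";
            }

            // A computation that has already completed cannot be cancelled.
            boolean lateCancel = sum.cancel(true);
            return value + "," + done + "," + cancelled + "," + outcome + "," + lateCancel;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 3,true,true,cancelled,false
    }
}
```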
Future Sample
interface ArchiveSearcher { String search(String target); }
class App {
ExecutorService executor = ...
ArchiveSearcher searcher = ...
void showSearch(final String target)
throws InterruptedException {
Future<String> future
= executor.submit(new Callable<String>() {
public String call() {
return searcher.search(target);
}});
displayOtherThings(); // do other things while searching
try {
displayText(future.get()); // use future
} catch (ExecutionException ex) { cleanup(); return; }
}
}
FutureTask
FutureTask<String> future =
new FutureTask<String>(new
Callable<String>() {
public String call() {
return searcher.search(target);
}});
executor.execute(future);
.NET Async
http://media.ch9.ms/teched/na/2011/ppt/DEV324.pptx
http://lunarfrog.com/blog/2012/01/23/simplicity-of-async-and-await/
var data = DownloadData(...); ProcessData(data);
DownloadDataAsync(... , data => { ProcessData(data); });
DoWorkAsync
async void DoWorkAsync() {
var t1 = ProcessFeedAsync("www.acme.com/rss");
var t2 = ProcessFeedAsync("www.xyznews.com/rss");
await Task.WhenAll(t1, t2);
DisplayMessage("Done");
}
async Task ProcessFeedAsync(string url) {
var text = await DownloadFeedAsync(url);
var doc = ParseFeedIntoDoc(text);
await SaveDocAsync(doc);
ProcessLog.WriteEntry(url);
}
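For readers on the JVM, roughly the same shape can be sketched with `CompletableFuture` (available since Java 8). This is only an analogue: Java has no `await` keyword, so each continuation is attached explicitly. The `downloadFeed` and `saveDoc` methods are hypothetical stand-ins that do no real I/O, and `toUpperCase` merely stands in for parsing.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class FeedPipeline {
    // Hypothetical stand-ins for real network and storage calls.
    static CompletableFuture<String> downloadFeed(String url) {
        return CompletableFuture.supplyAsync(() -> "<rss from " + url + ">");
    }

    static CompletableFuture<Void> saveDoc(String doc) {
        return CompletableFuture.runAsync(() -> { /* pretend to write doc */ });
    }

    // Download, "parse", save, then log; each step runs when the previous one completes.
    static CompletableFuture<Void> processFeed(String url, List<String> log) {
        return downloadFeed(url)
                .thenApply(String::toUpperCase)
                .thenCompose(FeedPipeline::saveDoc)
                .thenRun(() -> log.add(url));
    }

    // Start both feeds at once, then continue when both have finished.
    static CompletableFuture<List<String>> doWork() {
        List<String> log = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> t1 = processFeed("www.acme.com/rss", log);
        CompletableFuture<Void> t2 = processFeed("www.xyznews.com/rss", log);
        return CompletableFuture.allOf(t1, t2).thenApply(ignored -> log);
    }

    public static void main(String[] args) {
        System.out.println("Done: " + doWork().join().size() + " feeds processed");
    }
}
```

Here `CompletableFuture.allOf` plays the role of `Task.WhenAll`, and `thenCompose` plays the role of awaiting a nested asynchronous call.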
WriteFileAsync
public async Task WriteFileAsync(string filename, string contents)
{
var localFolder =
Windows.Storage.ApplicationData.Current.LocalFolder;
var file = await localFolder.CreateFileAsync(filename,
Windows.Storage.CreationCollisionOption.ReplaceExisting);
var fs = await file.OpenAsync(
Windows.Storage.FileAccessMode.ReadWrite);
//...
}
await WriteFileAsync("FileName", "Some Text");
GetRssAsync
async Task<XElement> GetRssAsync(string url) {
var client = new WebClient();
var task = client.DownloadStringTaskAsync(url);
var text = await task;
var xml = XElement.Parse(text);
return xml;
}
Downloading YouTube Videos in Parts
try {
// Network-bound
string[] videoUrls = await ScrapeYoutubeAsync(url);
// Start two downloads
Task<Video> t1 = DownloadVideoAsync(videoUrls[0]);
Task<Video> t2 = DownloadVideoAsync(videoUrls[1]);
// Wait for both
Video[] vids = await Task.WhenAll(t1, t2);
// CPU-bound
Video v = await MashupVideosAsync(vids[0], vids[1]);
// IO-bound
await v.SaveAsync(textbox.Text);
}
catch (WebException ex) { ReportError(ex);
}
Scala Future, Promise
http://docs.scala-lang.org/sips/pending/futures-promises.html
Futures
A future is an abstraction which represents a value which may become available at some point.
A Future object either holds a result of a computation or an exception in the case that the computation failed.
An important property of a future is that it is in effect immutable– it can never be written to or failed by the holder of the Future object.
val f: Future[List[String]] = future {
session.getRecentPosts
}
f onFailure {
case t => render("An error has occurred: " + t.getMessage)
} onSuccess {
case posts => for (post <- posts) render(post)
}
Callbacks
Registering an onComplete callback on the future ensures that the corresponding closure is invoked after the future is completed.
Registering an onSuccess or onFailure callback has the same semantics, with the difference that the closure is only called if the future is completed successfully or fails, respectively.
Registering a callback on the future which is already completed will result in the callback being executed eventually (as implied by the first point above). Furthermore, the callback may even be executed synchronously on the same thread.
Callbacks
In the event that multiple callbacks are registered on the future, the order in which they are executed is not defined. In fact, the callbacks may be executed concurrently with one another. However, a particular Future implementation may have a well-defined order.
In the event that some of the callbacks throw an exception, the other callbacks are executed regardless.
In the event that some of the callbacks never complete (e.g. the callback contains an infinite loop), the other callbacks may not be executed at all.
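Two of these guarantees can be observed in a rough Java analogue built on `CompletableFuture`. This is not the Scala API, and the ordering and threading rules of the two libraries differ; the class name `LateCallback` is hypothetical. The sketch registers callbacks on a future that is already completed and shows that a throwing callback does not prevent the others from running.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class LateCallback {
    static List<String> run() {
        List<String> seen = new CopyOnWriteArrayList<>();
        CompletableFuture<String> done = CompletableFuture.completedFuture("posts");

        // Registered after completion: the callbacks are still invoked.
        done.thenAccept(v -> seen.add("first:" + v));

        // A failing callback only fails the stage returned by thenAccept...
        done.thenAccept(v -> { throw new IllegalStateException("boom"); });

        // ...so callbacks registered separately still run.
        done.thenAccept(v -> seen.add("second:" + v));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```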
Functional Composition
val rateQuote = future {
connection.getCurrentValue(USD)
}
rateQuote onSuccess { case quote =>
val purchase = future {
if (isProfitable(quote)) connection.buy(amount, quote)
else throw new Exception("not profitable")
}
purchase onSuccess {
case _ => println("Purchased " + amount + " USD")
}
}
For-Comprehensions
val usdQuote = future { connection.getCurrentValue(USD) }
val chfQuote = future { connection.getCurrentValue(CHF) }
val purchase = for {
usd <- usdQuote
chf <- chfQuote
if isProfitable(usd, chf)
} yield connection.buy(amount, chf)
purchase onSuccess {
case _ => println("Purchased " + amount + " CHF")
}
Promises
While futures are defined as a type of read-only placeholder object created for a result which doesn’t yet exist, a promise can be thought of as a writeable, single-assignment container, which completes a future.
That is, a promise can be used to successfully complete a future with a value (by “completing” the promise) using the success method. Conversely, a promise can also be used to complete a future with an exception, by failing the promise, using the failure method.
import scala.concurrent.{ future, promise }
val p = promise[T]
val f = p.future
val producer = future {
val r = produceSomething()
p success r
continueDoingSomethingUnrelated()
}
val consumer = future {
startDoingSomething()
f onSuccess {
case r => doSomethingWithResult()
}
}
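In Java, `CompletableFuture` combines both roles, so it can serve as a rough analogue of the promise/future pair: `complete` corresponds to `success`, and `completeExceptionally` corresponds to `failure`. The sketch below is an illustration under that assumption, not the Scala API; the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

class PromiseSketch {
    // A producer thread writes the value; the consumer only reads it.
    static String produceAndConsume() {
        CompletableFuture<String> promise = new CompletableFuture<>();
        CompletableFuture<String> consumer = promise.thenApply(r -> "consumed " + r);
        Thread producer = new Thread(() -> promise.complete("result"));
        producer.start();
        return consumer.join();
    }

    // Single assignment: a second write is rejected and the first value is kept.
    static boolean secondWriteIgnored() {
        CompletableFuture<Integer> p = new CompletableFuture<>();
        boolean first = p.complete(1);
        boolean second = p.complete(2);
        return first && !second && p.join() == 1;
    }

    // Failing the promise completes the future with an exception.
    static String failing() {
        CompletableFuture<String> p = new CompletableFuture<>();
        p.completeExceptionally(new IllegalStateException("not profitable"));
        return p.handle((value, error) ->
                error == null ? value : "failed: " + error.getMessage()).join();
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume());
        System.out.println(secondWriteIgnored());
        System.out.println(failing());
    }
}
```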
Akka Future, Promise
http://akka.io/docs/akka/2.0-M2/scala/futures.html
import akka.dispatch.Await
implicit val timeout = system.settings.ActorTimeout
val future = actor ? msg
val result = Await.result(future, timeout.duration).
asInstanceOf[String]
import akka.dispatch.Future
val future: Future[String] = (actor ? msg).mapTo[String]
import akka.dispatch.Await
import akka.dispatch.Future
import akka.util.duration._
val future = Future {
"Hello" + "World"
}
val result = Await.result(future, 1 second)
Composition
val f1 = Future {
"Hello" + "World"
}
val f2 = Promise.successful(3)
val f3 = f1 flatMap { x ⇒
f2 map { y ⇒
x.length * y
}
}
val result = Await.result(f3, 1 second)
result must be(30)
For Comprehension
val f = for {
a ← Future(10 / 2) // 10 / 2 = 5
b ← Future(a + 1) // 5 + 1 = 6
c ← Future(a - 1) // 5 - 1 = 4
} yield b * c // 6 * 4 = 24
// Note that futures a, b, and c are not
// executed in parallel.
val result = Await.result(f, 1 second)
result must be(24)
val f1 = actor1 ? msg1
val f2 = actor2 ? msg2
val a = Await.result(f1, 1 second).asInstanceOf[Int]
val b = Await.result(f2, 1 second).asInstanceOf[Int]
val f3 = actor3 ? (a + b)
val result = Await.result(f3, 1 second).asInstanceOf[Int]
// Create a sequence of Futures
val futures = for (i ← 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
Await.result(futureSum, 1 second) must be(1001000)
// Create a sequence of Futures
val futures = for (i ← 1 to 1000) yield Future(i * 2)
val futureSum = Future.reduce(futures)(_ + _)
Await.result(futureSum, 1 second) must be(1001000)
Beyond Mere Actors
http://www.slideshare.net/bostonscala/beyond-mere-actors
On Time-Travel
Promised values are available in the future.
What does it mean to get a value out of the future? Time-travel into the future is easy. Just wait. But we don't have to go into the future. We can give our future-selves instructions.
Instead of getting values out of the future, we send computations into the future.
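The contrast between waiting for a value and sending a computation into the future can be sketched in Java with `CompletableFuture`. The class and method names are hypothetical, chosen to mirror the wording above.

```java
import java.util.concurrent.CompletableFuture;

class IntoTheFuture {
    // "Time travel by waiting": blocks the caller until the value exists.
    static int waitForIt(CompletableFuture<Integer> quote) {
        return quote.join() * 2;
    }

    // "Instructions for our future self": returns at once, runs when the value arrives.
    static CompletableFuture<Integer> sendInstructions(CompletableFuture<Integer> quote) {
        return quote.thenApply(q -> q * 2);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> quote = new CompletableFuture<>();
        CompletableFuture<Integer> doubled = sendInstructions(quote); // nothing is blocked here
        quote.complete(21);                                           // the value arrives later
        System.out.println(doubled.join());
    }
}
```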
JMS 2.0
Last maintenance release (1.1) was in 2003
March 2011: JSR 343 launched to develop JMS 2.0
Initial goals of JMS 2.0
Simpler and easier to use
simplify the API
make use of CDI (Contexts and Dependency Injection)
clarify any ambiguities in the spec
Support new themes of Java EE 7
PaaS
Multi-tenancy
Initial goals of JMS 2.0
Standardise interface with application servers
Clarify relationship with other Java EE specs
some JMS behaviour defined in other specs
New messaging features
standardize some existing vendor extensions (or will retrospective standardisation be difficult?)
Simplifying the JMS API
Receiving messages in Java EE
@MessageDriven(mappedName = "jms/inboundQueue")
public class MyMDB implements MessageListener {
public void onMessage(Message message) {
try {
String payload = ((TextMessage) message).getText();
// do something with payload
} catch (JMSException e) {
// getText() can fail; handle or log the error here
}
}
}
Sending messages in Java EE
@Resource(lookup = "jms/connFactory")
ConnectionFactory cf;
@Resource(lookup="jms/inboundQueue")
Destination dest;
public void sendMessage (String payload) throws JMSException {
Connection conn = cf.createConnection();
Session sess =
conn.createSession(false,Session.AUTO_ACKNOWLEDGE);
MessageProducer producer = sess.createProducer(dest);
TextMessage textMessage = sess.createTextMessage(payload);
producer.send(textMessage);
conn.close();
}
Possible new API
@Resource(mappedName="jms/contextFactory")
ContextFactory contextFactory;
@Resource(mappedName="jms/orderQueue")
Queue orderQueue;
public void sendMessage(String payload) {
try (MessagingContext mCtx =
contextFactory.createContext();){
TextMessage textMessage =
mCtx.createTextMessage(payload);
mCtx.send(orderQueue,textMessage);
}
}
Annotations for the new API
@Resource(mappedName="jms/orderQueue")
Queue orderQueue;
@Inject
@MessagingContext(lookup="jms/contextFactory")
MessagingContext mCtx;
@Inject
TextMessage textMessage;
public void sendMessage(String payload) {
textMessage.setText(payload);
mCtx.send(orderQueue,textMessage);
}
Annotations for the old API
@Inject
@JMSConnection(lookup="jms/connFactory")
@JMSDestination(lookup="jms/inboundQueue")
MessageProducer producer;
@Inject
TextMessage textMessage;
public void sendMessage (String payload){
try {
textMessage.setText(payload);
producer.send(textMessage);
} catch (JMSException e) {
// do something
}
}
Send a message with async acknowledgement from server
Send a message and return immediately without blocking until an acknowledgement has been received from the server.
Instead, when the acknowledgement is received, an asynchronous callback will be invoked
Why? Allows thread to do other work whilst waiting for the acknowledgement
producer.send(message, new AcknowledgeListener(){
public void onAcknowledge(Message message) {
// process ack
}
});
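The control flow can be sketched without a JMS provider. Everything here is a stand-in: `AcknowledgeListener` is a hypothetical interface modelled on the slide rather than a `javax.jms` type, and `FakeProducer` simulates the server round trip with a latch so that the trace is deterministic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncAckSketch {
    // Hypothetical callback modelled on the slide; not a javax.jms type.
    interface AcknowledgeListener { void onAcknowledge(String message); }

    // Simulated provider: acknowledges on another thread once "the server" is ready.
    static final class FakeProducer {
        private final ExecutorService server = Executors.newSingleThreadExecutor();
        private final CountDownLatch serverReady = new CountDownLatch(1);

        // Returns immediately; the listener fires later on the server thread.
        void send(String message, AcknowledgeListener listener) {
            server.execute(() -> {
                try {
                    serverReady.await();               // simulated round trip
                    listener.onAcknowledge(message);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        void releaseAcks() { serverReady.countDown(); }
        void close() { server.shutdownNow(); }
    }

    static List<String> run() {
        List<String> trace = new CopyOnWriteArrayList<>();
        CountDownLatch acked = new CountDownLatch(1);
        FakeProducer producer = new FakeProducer();
        try {
            producer.send("order-1", m -> { trace.add("ack " + m); acked.countDown(); });
            trace.add("other work");                   // the sending thread was not blocked
            producer.releaseAcks();
            acked.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            producer.close();
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [other work, ack order-1]
    }
}
```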
Topic hierarchies
Topics can be arranged in a hierarchy
STOCK.NASDAQ.TECH.ORCL
STOCK.NASDAQ.TECH.GOOG
STOCK.NASDAQ.TECH.ADBE
STOCK.NYSE.TECH.HPQ
Consumers can subscribe using wildcards
STOCK.*.TECH.*
STOCK.NASDAQ.TECH.*
Most vendors support this already
Details TBD
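Since the details are still to be decided, the following is only one plausible reading of the wildcard examples, assuming that `*` matches exactly one dot-separated segment. The `TopicMatcher` class is hypothetical and is not part of any JMS API.

```java
class TopicMatcher {
    // Assumption: '*' matches exactly one dot-separated segment, nothing more.
    static boolean matches(String pattern, String topic) {
        String[] p = pattern.split("\\.");
        String[] t = topic.split("\\.");
        if (p.length != t.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(t[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] topics = {
            "STOCK.NASDAQ.TECH.ORCL", "STOCK.NASDAQ.TECH.GOOG",
            "STOCK.NASDAQ.TECH.ADBE", "STOCK.NYSE.TECH.HPQ"
        };
        for (String topic : topics) {
            System.out.println(topic
                    + " any-exchange=" + matches("STOCK.*.TECH.*", topic)
                    + " nasdaq-only=" + matches("STOCK.NASDAQ.TECH.*", topic));
        }
    }
}
```

Under this reading, `STOCK.*.TECH.*` selects all four topics, while `STOCK.NASDAQ.TECH.*` excludes `STOCK.NYSE.TECH.HPQ`.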
Multiple consumers on a topic subscription
Allows scalable consumption of messages from a topic subscription
multiple threads
multiple JVMs
No further change to API for durable subscriptions (clientID not used)
New API for non-durable subscriptions
Why? Allows greater scalability
MessageConsumer messageConsumer= session.createSharedConsumer(
topic,sharedSubscriptionName);
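The intended effect, several consumers draining one subscription with each message going to exactly one of them, can be simulated with a shared queue. This sketch makes no claim about how a JMS provider implements it; the `SharedSubscriptionSketch` class is hypothetical, and a `BlockingQueue` stands in for the subscription.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SharedSubscriptionSketch {
    // Returns every message that was delivered, in delivery order.
    static List<String> consumeAll(int consumers, int messages) {
        // Stand-in for one topic subscription shared by all consumers.
        BlockingQueue<String> subscription = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            subscription.add("msg-" + i);
        }

        List<String> delivered = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                String m;
                // poll() hands each message to exactly one of the competing threads.
                while ((m = subscription.poll()) != null) {
                    delivered.add(m);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(4, 100).size() + " messages delivered");
    }
}
```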
Batch delivery
Will allow messages to be delivered asynchronously in batches
New method on MessageConsumer
New listener interface BatchMessageListener
Acks also sent in a batch
Why? May be more efficient for JMS provider or application
void setBatchMessageListener(
BatchMessageListener listener,
int batchSize,
long batchTimeOut)
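How `batchSize` and `batchTimeOut` might interact can be sketched as follows, assuming that a batch is handed over as soon as it is full or the timeout expires, whichever comes first. The `BatchMessageListener` shape below is a hypothetical single-method interface, and a `BlockingQueue` stands in for the provider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchDeliverySketch {
    // Hypothetical listener shape; the slide does not define its methods.
    interface BatchMessageListener { void onMessages(List<String> batch); }

    // Collects up to batchSize messages, or fewer if the timeout expires first.
    static List<String> nextBatch(BlockingQueue<String> queue, int batchSize, long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break;
                }
                String m = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (m == null) {
                    break;   // timed out waiting for more messages
                }
                batch.add(m);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    // Delivers batches until a whole timeout passes without any message.
    static void deliver(BlockingQueue<String> queue, int batchSize, long timeoutMillis,
                        BatchMessageListener listener) {
        List<String> batch;
        while (!(batch = nextBatch(queue, batchSize, timeoutMillis)).isEmpty()) {
            listener.onMessages(batch);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5; i++) {
            queue.add("m" + i);
        }
        deliver(queue, 2, 50L, batch -> System.out.println(batch));
    }
}
```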