Enterprise Cloud and Parallel, Distributed, and Asynchronous Processing

@maruyama097

丸山不二夫

Agenda

Part I Parallel Programming on Multi-core

Part II Developments in Distributed Environments on the Network

Part III Techniques for Asynchronous Programming

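Parallel Programming on Multi-core

The Advance of Multi-core

JSR166y: ForkJoin

Parallel Programming in Java SE 8

Parallel Programming in the .NET Framework

Intel OpenCL

JavaScript: Intel River Trail

The Advance of Multi-core

In the PC world, almost every machine already ships with a multi-core chip, and this trend will not reverse. Cloud devices are moving to multi-core as well.

Core counts keep growing. Chips with 100 cores have already been announced.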

Multi-core CPUs

CPU name       Cores   Manufacturer
Nehalem-EX     8       Intel
Power 7        8       IBM
Magny-Cours    12      AMD
T3             16      Oracle

Intel SCC (Single-chip Cloud Computer)

Announced by Intel Labs on March 30, 2010. http://techresearch.intel.com/articles/Tera-Scale/1826.htm

Built from 24 tiles with two IA cores per tile, for 48 cores in total.

A 24-router mesh network with 256 GB/s bisection bandwidth.

Four integrated DDR3 memory controllers; 64 GB.






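Tilera GX: 36, 48, and 100 cores

Multi-core is reaching mobile devices: Tegra 3, 5 cores

CPU core management based on workload

JSR166y ForkJoin: Divide and Conquer

ForkJoin is one of the basic algorithms of today's parallel processing. It is used widely, not only in Java.

ForkJoin aims to improve performance by splitting a job into pieces and running those pieces in parallel on multiple cores.

Here we first look at its essence, the "Divide and Conquer" technique.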

Divide and Conquer

Result solve(Problem problem) {
    if (problem is small)
        solve problem directly;
    else {
        split problem into independent parts;
        fork a subtask to solve each part;
        join all subtasks;
        compose the result from the subresults;
    }
}
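The pseudocode above can be made concrete with a small runnable sketch. The class SumTask, its THRESHOLD value, and the helper sum() are names chosen for this illustration only; they do not come from the slides. It sums a long[] with java.util.concurrent.RecursiveTask, following the four steps (split, fork, join, compose) one by one.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a long[] by divide and conquer.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // assumed cut-off, not tuned

    private final long[] array;
    private final int lo;
    private final int hi; // half-open range [lo, hi)

    SumTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // the problem is small: solve it directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;           // split into two independent parts
        SumTask left = new SumTask(array, lo, mid);
        SumTask right = new SumTask(array, mid, hi);
        left.fork();                         // fork one subtask
        long rightSum = right.compute();     // solve the other part in this thread
        long leftSum = left.join();          // join the forked subtask
        return leftSum + rightSum;           // compose the result from the subresults
    }

    // Convenience entry point (an assumption of this sketch).
    static long sum(long[] array) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(array, 0, array.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // prints 5000050000
    }
}
```

Forking only one half and computing the other in the current thread is a common variation; it keeps the current worker busy instead of leaving it idle while both halves are queued.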

class SortTask extends RecursiveAction {
    final long[] array;
    final int lo;
    final int hi;

    SortTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    protected void compute() {
        if (hi - lo < THRESHOLD)
            sequentiallySort(array, lo, hi);
        else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new SortTask(array, lo, mid),
                      new SortTask(array, mid, hi));
            merge(array, lo, hi);
        }
    }
}

Below THRESHOLD, fall back to an ordinary sequential sort.

Otherwise, invoke SortTask recursively on each half.

Merge the results.

Recursive calls split the work

[Figure: the range lo..hi is halved repeatedly at (lo+hi)/2 by invokeAll(sortTask…, sortTask…); the pieces with (hi - lo) < THRESHOLD are labelled sequentialMerge.]

class IncrementTask extends RecursiveAction {
    final long[] array;
    final int lo;
    final int hi;

    IncrementTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    protected void compute() {
        if (hi - lo < THRESHOLD) {
            for (int i = lo; i < hi; ++i)
                array[i]++;
        } else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new IncrementTask(array, lo, mid),
                      new IncrementTask(array, mid, hi));
        }
    }
}

Below THRESHOLD, add 1 to each array element.

Otherwise, invoke IncrementTask recursively on each half.

[Figure: the range lo..hi is halved repeatedly at (lo+hi)/2 by invokeAll(incrementTask…, incrementTask…); the pieces with (hi - lo) < THRESHOLD execute array[i]++.]

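The effect of the threshold

If the threshold is too large, there is little parallelism left to exploit.
If the threshold is too small, the overhead of parallelization grows.

Parallelization can carry extra cost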
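To make the trade-off tangible, the following sketch counts how many sequential leaf tasks the halving scheme of IncrementTask produces for a given threshold. ThresholdDemo and leafTasks are hypothetical names introduced here; the code only counts tasks and runs nothing in parallel.

```java
// Hypothetical helper: counts leaf tasks for the halving scheme used by IncrementTask.
class ThresholdDemo {
    // Mirrors compute(): a range smaller than threshold is handled sequentially,
    // anything else is split at the midpoint.
    static int leafTasks(int lo, int hi, int threshold) {
        if (threshold < 2) {
            // With threshold < 2 a one-element range would split forever.
            throw new IllegalArgumentException("threshold must be at least 2");
        }
        if (hi - lo < threshold) return 1;
        int mid = (lo + hi) >>> 1;
        return leafTasks(lo, mid, threshold) + leafTasks(mid, hi, threshold);
    }

    public static void main(String[] args) {
        System.out.println(leafTasks(0, 1024, 1025)); // prints 1    (no parallelism at all)
        System.out.println(leafTasks(0, 1024, 64));   // prints 32
        System.out.println(leafTasks(0, 1024, 2));    // prints 1024 (one task per element)
    }
}
```

With a single leaf nothing can run in parallel, while with 1024 leaves each task does almost no work, so task creation and scheduling dominate. A reasonable threshold lies somewhere in between and usually has to be found by measurement.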

JSR166y ForkJoin: Work-Steal

Alongside the splitting of work, the other heart of ForkJoin is the Work-Steal algorithm. Work-Steal offers a very smart way to even out the tasks assigned to each core. Here we look at an outline of it.

Multi-core and Workers

[Figure: eight cores, numbered 0 to 7, each running one Worker with its own Queue.]

Each Worker thread manages its runnable tasks in its own scheduling queue.

Double-ended queue (deque)

[Figure: a deque with push and pop at one end (LIFO, Last In / First Out) and take at the other end (FIFO, First In / First Out).]

Each queue is maintained as a double-ended queue (deque) that supports LIFO push and pop as well as FIFO take.

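Pushing subtasks

Subtasks generated by a task running on a Worker thread are pushed onto that Worker's deque.

[Figure: a Worker calls invokeAll(Task1…, Task2…), which pushes both tasks onto its deque.]

Running tasks

[Figure: the Worker pops Task2 and runs it, then pops Task1 and runs it.]

A Worker thread processes its own deque in LIFO (youngest first) order, popping tasks as it goes.

Work Steal

When a Worker thread has no local tasks left to run, it takes ("steals") a task from another, randomly chosen Worker, following the FIFO (oldest first) rule.

[Figure: the owner pushes at one end of the deque while a thief takes from the other end.]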
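The two access patterns on a Worker's deque can be illustrated with a single-threaded sketch built on java.util.ArrayDeque. This is only a model of the ordering rules: the real ForkJoinPool uses its own concurrent deque, and the names WorkStealDemo, push, pop, and steal are assumptions of this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded model of one Worker's deque; not the actual ForkJoinPool data structure.
class WorkStealDemo {
    // Owner side: push and pop use the same end, so the youngest task runs first (LIFO).
    static void push(Deque<String> deque, String task) { deque.addLast(task); }
    static String pop(Deque<String> deque) { return deque.pollLast(); }

    // Thief side: take from the opposite end, so the oldest task is stolen first (FIFO).
    static String steal(Deque<String> deque) { return deque.pollFirst(); }

    public static void main(String[] args) {
        Deque<String> deque = new ArrayDeque<>();
        push(deque, "task1");
        push(deque, "task2");
        push(deque, "task3");
        System.out.println(pop(deque));   // prints task3 (owner takes the youngest)
        System.out.println(steal(deque)); // prints task1 (thief takes the oldest)
        System.out.println(pop(deque));   // prints task2
    }
}
```

Because the oldest task in a divide-and-conquer computation is normally the largest remaining piece, a thief that takes from the FIFO end obtains a big chunk of work in one steal, while the owner keeps working on small, recently created pieces.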

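How Work-Steal behaves

When pool.invoke() is called, the task is placed on a randomly chosen deque.

While a Worker is running a task:

It usually just pushes two tasks,

and then pops one of them and runs it.

Before long, some Workers start stealing the top-level tasks.

By the time forking is finished, the tasks have spread naturally over many work queues.

Then the time-consuming sequential parts are executed.

Work-Stealing

When a Worker thread encounters a join operation, it keeps processing other available tasks until it is notified (isDone) that the target task has completed.

When a Worker thread has no work and fails to steal any from other threads, it backs off and tries again later, unless all the other threads are known to be similarly idle.

When all Workers are idle, they block until another task is submitted from the top level.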

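Extra JSR166y ParallelArray: splitting the data

ParallelArray (extra JSR166y) is an application of ForkJoin. The ForkJoin algorithm itself is not necessarily easy to understand.

ParallelArray is easier for ordinary programmers to picture, as a flow of operations over bulk data. As a parallel programming technique, the parallel array is gaining wide acceptance across Java, C#, and JavaScript.

ParallelArray code sample

// Find the top score among the students of a given year
ParallelArray students =
    new ParallelArray(fjPool, data);
double bestGpa = students
    .withFilter(isSenior)     // filter by graduation year
    .withMapping(selectGpa)   // extract the score
    .max();                   // pick the highest score

No explicit for loop is used here. This style of processing is sometimes called bulk data processing.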

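Basic operations supported by ParallelArray

Apply: perform an action on each selected element

Filtering: select a subset of the elements
  Multiple filters can be specified
  Binary search is supported on sorted parallel arrays

Mapping: convert the selected elements to another form

Replacement: create a new ParallelArray
  Sorting, running accumulation

Aggregation: combine all values into a single value
  max, min, sum, average
  General-purpose reduce() method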

Apply

public void apply(Ops.Procedure<? super T> procedure)

Applies procedure to each element.

static final class Proc implements Ops.Procedure<Rand> {
    public void op(Rand x) {
        for (int k = 0; k < (1 << 10); ++k)
            x.next();
    }
}

ForkJoinPool fjp = new ForkJoinPool(i);
ParallelArray pa = ParallelArray.createUsingHandoff(array, fjp);
final Proc proc = new Proc();
pa.apply(proc);

withFilter

public ParallelArray withFilter(Ops.Predicate<? super T> selector)

Selects the elements for which selector is true.

ForkJoinPool fjp = new ForkJoinPool(ps);
ParallelArray<Rand> pa = ParallelArray.createUsingHandoff(array, fjp);
final IsPrime pred = new IsPrime();
List<Rand> result = pa.withFilter(pred).all().asList();

// The type argument is needed so that op(Student) actually implements the interface method.
static final Ops.Predicate<Student> isSenior = new Ops.Predicate<Student>() {
    public boolean op(Student s) {
        return s.graduationYear == Student.THIS_YEAR;
    }
};

withMapping / Reduce

public ParallelArrayWithMapping<T,U> withMapping(Ops.Op<? super T,? extends U> op)

sum += pa.withMapping(getNext).reduce(accum, zero);

final GetNext getNext = new GetNext();
final Accum accum = new Accum();
final Long zero = Long.valueOf(0);

static final class GetNext implements Ops.Op<Rand, Long> {
    public Long op(Rand x) { return x.next(); }
}

static final class Accum implements Ops.Reducer<Long> {
    public Long op(Long a, Long b) { long x = a; long y = b; return x + y; }
}

Ops.Op<Rand, Long>: argument type, return type

Ops.Reducer<Long>: argument type

public class Ops {
    private Ops() {} // disable construction

    public static interface Op<A,R> { R op(A a); }
    public static interface BinaryOp<A,B,R> { R op(A a, B b); }
    public static interface Predicate<A> { boolean op(A a); }
    public static interface BinaryPredicate<A,B> { boolean op(A a, B b); }
    public static interface Procedure<A> { void op(A a); }
    public static interface Generator<R> { R op(); }
    public static interface Reducer<A> extends BinaryOp<A, A, A> {}
    ……
}

Basically, you have to supply an implementation of the method op.

This tedium is greatly reduced by introducing closures.

There’s not a moment to lose! http://mreinhold.org/blog/closures 2009/11/24

The free lunch is over. Multicore processors are not just coming—they’re here.

Leveraging multiple cores requires writing scalable parallel programs, which is incredibly hard.

Tools such as fork/join frameworks based on work-stealing algorithms make the task easier, but it still takes a fair bit of expertise and tuning.

Bulk-data APIs such as parallel arrays allow computations to be expressed in terms of higher-level, SQL-like operations (e.g., filter, map, and reduce) which can be mapped automatically onto the fork-join paradigm.

Working with parallel arrays in Java, unfortunately, requires lots of boilerplate code to solve even simple problems.

Closures can eliminate that boilerplate.

From "There's not a moment to lose! Closures for Java" by M. Reinhold.

Reinhold's conclusion: now is the time to add closures to Java.

This argument of Reinhold's dates from two years ago. Unfortunately, although ForkJoin was introduced in Java SE 7, the introduction of closures was put off and deferred to Java SE 8.

ForkJoin in Java SE 7

http://docs.oracle.com/javase/7/docs/technotes/guides/concurrency/index.html

Key classes of ForkJoin in Java SE 7

ForkJoinPool
  An ExecutorService for running ForkJoinTasks

ForkJoinTask
  The base class of fork/join tasks

RecursiveAction
  A subclass of ForkJoinTask
  A recursive task that returns no result
  Implement the abstract method compute() to do the computation

RecursiveTask
  The same as RecursiveAction, except that it returns a result

Java SE 7 ForkJoin example: Fibonacci

import java.util.concurrent.RecursiveTask;

public class Fibonacci extends RecursiveTask<Integer> {
    private final int number;

    public Fibonacci(int n) { number = n; }

    @Override
    protected Integer compute() {
        switch (number) {
            case 0: return (0);
            case 1: return (1);
            default:
                Fibonacci f1 = new Fibonacci(number - 1);
                Fibonacci f2 = new Fibonacci(number - 2);
                f1.fork(); f2.fork();              // run both subtasks asynchronously
                return (f1.join() + f2.join());    // wait for both and add the results
        }
    }
}
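The slide shows only the task class. As a hedged sketch of how such a task might be driven, the following self-contained variant submits the root task to a ForkJoinPool with invoke(). The class name FibTask and the helper fib() are assumptions of this example, and it forks only one subtask while computing the other in the current thread, which is a common variation rather than the slide's exact code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical variant of the Fibonacci task, plus a driver.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Integer compute() {
        if (n <= 1) return n;            // base cases: fib(0) = 0, fib(1) = 1
        FibTask f1 = new FibTask(n - 1);
        FibTask f2 = new FibTask(n - 2);
        f1.fork();                       // schedule f1 on this worker's deque
        int r2 = f2.compute();           // compute f2 directly in this thread
        return f1.join() + r2;           // wait for f1 and combine
    }

    // Runs the computation in a fresh pool; invoke() blocks until the result is ready.
    static int fib(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new FibTask(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fib(20)); // prints 6765
    }
}
```

This is a teaching example only: the tasks are far too small to pay for their scheduling cost, so a plain loop would be faster in practice.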

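Project Lambda and Parallel Programming in Java SE 8

It was unfortunate that the introduction of closures was put off in Java 7. Here we look at the introduction of closures in the upcoming Java SE 8, based on Project Lambda, and at the multi-core-oriented parallel programming style that comes with it.

Ordinary sequential processing: iterating with a for loop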

class Student { String name; int gradYear; double score; }

List<Student> students = ……;
// Double.MIN_VALUE is the smallest positive double, so start from negative infinity instead.
double max = Double.NEGATIVE_INFINITY;
for (Student s : students) {
    if (s.gradYear == 2011)
        max = Math.max(max, s.score);
}
return max;

Processing with ParallelArray, without closures

Double max = students
    .filter(new Predicate<Student>() {
        public boolean eval(Student s) { return s.gradYear == 2011; }
    })
    .map(new Mapper<Student, Double>() {
        public Double map(Student s) { return s.score; }
    })
    .reduce(0.0, new Reducer<Double, Double>() {
        public Double reduce(Double max, Double score) { return Math.max(max, score); }
    });

Java SE 8: introducing closures, and simplification through type inference

Double max = students
    .filter((Student s) -> s.gradYear == 2011)
    .map((Student s) -> s.score)
    .reduce(0.0, (Double max, Double score) -> Math.max(max, score));

With type inference, the parameter types can be omitted:

Double max = students
    .filter(s -> s.gradYear == 2011)
    .map(s -> s.score)
    .reduce(0.0, (max, score) -> Math.max(max, score));

Java SE 8: method literal Math#max

Double max = students
    .filter(s -> s.gradYear == 2011)   // Iterable
    .map(s -> s.score)                 // Iterable
    .reduce(0.0, Math#max);            // Double

The notation seems to be in flux between Math::max and Math#max.

Java SE 8: extending the Iterable interface

interface Iterable<T> {
    Iterator<T> iterator();
    void forEach(Block<? super T> block) default …;
    Iterable<T> filter(Predicate<? super T> predicate);
    <U> Iterable<U> map(Mapper<? super T, ? extends U> mapper);
    <U> U reduce(U base, Reducer<U, ? super T> reducer);
}

Because Collection<E> extends Iterable<E>, Iterable is the most basic container type in Java.

Java SE 8: default implementation

interface Iterable<T> {
    Iterator<T> iterator();
    Iterable<T> filter(Predicate<? super T> predicate) default Iterable.filter;
    <U> Iterable<U> map(Mapper<? super T, ? extends U> mapper) default Iterable.map;
    <U> U reduce(U base, Reducer<U, ? super T> reducer) default Iterable.reduce;
}

default: if the implementing class does not provide the method, this implementation is used.
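The interface above uses draft syntax from Project Lambda. As a hedged illustration of the same idea in the form that shipped with Java SE 8, the sketch below relies on Iterable.forEach, which is a default method in the released API. The classes Countdown and DefaultMethodDemo are hypothetical; Countdown implements only iterator() and still gets forEach for free.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical container: it implements only iterator().
class Countdown implements Iterable<Integer> {
    private final int from;

    Countdown(int from) { this.from = from; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = from;

            @Override public boolean hasNext() { return next > 0; }

            @Override public Integer next() {
                if (next <= 0) throw new NoSuchElementException();
                return next--;
            }
        };
    }
}

class DefaultMethodDemo {
    // forEach is not implemented by Countdown; the default in Iterable is used.
    static List<Integer> collect(Iterable<Integer> source) {
        List<Integer> out = new ArrayList<>();
        source.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new Countdown(3))); // prints [3, 2, 1]
    }
}
```

This is what allows an existing interface to grow new methods without breaking the classes that already implement it.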

Java SE 8: the problem with Iterable

Double max = students
    .filter(s -> s.gradYear == 2011)   // Iterable
    .map(s -> s.score)                 // Iterable
    .reduce(0.0, Math::max);           // Double

filter, map, and reduce are processed sequentially. What if students is huge? What if reduce is a very expensive operation?

Java SE 8: bulk processing in parallel

Adding parallel() to the same pipeline:

Double max = students
    .parallel()
    .filter(s -> s.gradYear == 2011)
    .map(s -> s.score)
    .reduce(0.0, Math::max);

parallel() returns a Spliterable. The methods of Spliterable are almost the same as those of Iterable, except that it has a spliterator in place of an iterator.
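For comparison, here is a hedged sketch of the same query written against the java.util.stream API that eventually shipped in Java SE 8. Note that this is not the draft Iterable/parallel() API shown on the slides: in the released API the entry points are stream() and parallelStream(). The Student class, the sample data, and the helper maxScore are assumptions of this example.

```java
import java.util.Arrays;
import java.util.List;

class StudentScores {
    // Hypothetical data class, mirroring the fields used on the slides.
    static final class Student {
        final String name;
        final int gradYear;
        final double score;

        Student(String name, int gradYear, double score) {
            this.name = name;
            this.gradYear = gradYear;
            this.score = score;
        }
    }

    // The pipeline is identical in both modes; only the stream source differs.
    static double maxScore(List<Student> students, int year, boolean parallel) {
        return (parallel ? students.parallelStream() : students.stream())
                .filter(s -> s.gradYear == year)
                .mapToDouble(s -> s.score)
                .reduce(Double.NEGATIVE_INFINITY, Math::max);
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Ann", 2011, 3.2),
                new Student("Bob", 2011, 3.9),
                new Student("Cy", 2012, 4.0));
        System.out.println(maxScore(students, 2011, false)); // prints 3.9
        System.out.println(maxScore(students, 2011, true));  // prints 3.9
    }
}
```

Because max is associative and the identity value is neutral, the sequential and parallel runs give the same answer; that property is what makes the reduction safe to split.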

Java SE 8: introducing the Spliterable interface

public interface Spliterable<E> {
    boolean canSplit();
    long estimateElements();
    Spliterable<E> left();
    Spliterable<E> right();
    Iterator<E> iterator();
    ……
}

The interface returned by parallel() essentially expresses the Divide and Conquer of ForkJoin.

A Spliterable splits itself into right() and left().

right() is placed on the ForkJoin work queue, and left() is executed.

Once it reaches the point where it will not split any further (the THRESHOLD of ForkJoin), the rest is processed sequentially with an Iterator.
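The splitting scheme described above can be sketched with the interface that shipped in Java SE 8, java.util.Spliterator, whose trySplit() plays the role of the draft left()/right() pair. The class SplitSum, its THRESHOLD, and the helper sum() are hypothetical names for this illustration; the draft Spliterable interface itself is not part of the released JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums the elements covered by a Spliterator by divide and conquer.
class SplitSum extends RecursiveTask<Long> {
    static final long THRESHOLD = 1_000; // assumed cut-off, not tuned

    private final Spliterator<Integer> part;

    SplitSum(Spliterator<Integer> part) { this.part = part; }

    @Override
    protected Long compute() {
        // trySplit() hands off a prefix of the elements and keeps the rest,
        // or returns null when this spliterator cannot be split further.
        Spliterator<Integer> prefix =
                part.estimateSize() > THRESHOLD ? part.trySplit() : null;
        if (prefix == null) {
            long[] sum = {0};
            part.forEachRemaining(x -> sum[0] += x); // sequential processing of a small piece
            return sum[0];
        }
        SplitSum forked = new SplitSum(prefix);
        forked.fork();                                // one half goes to the work queue
        long mine = new SplitSum(part).compute();     // the other half is processed here
        return mine + forked.join();
    }

    static long sum(List<Integer> list) {
        return ForkJoinPool.commonPool().invoke(new SplitSum(list.spliterator()));
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 10_000; i++) numbers.add(i);
        System.out.println(sum(numbers)); // prints 50005000
    }
}
```

Parallel streams in the released JDK follow the same overall pattern internally, although their splitting policy is more elaborate than this fixed threshold.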

Parallel Programming in the .NET Framework

Among today's major languages, the .NET Framework appears to have gone furthest in its support for parallel programming.

Parallel Programming in the .NET Framework (MSDN)

http://msdn.microsoft.com/en-us/library/dd460693.aspx

Many personal computers and workstations have two or four cores (that is, CPUs) that enable multiple threads to be executed simultaneously.

Computers in the near future are expected to have significantly more cores. To take advantage of the hardware of today and tomorrow, you can parallelize your code to distribute work across multiple processors.

In the past, parallelization required low-level manipulation of threads and locks.

Visual Studio 2010 and the .NET Framework 4 enhance support for parallel programming by providing a new runtime, new class library types, and new diagnostic tools.

.NET 4 new runtime, new class library

Task Parallel Library

Parallel LINQ (PLINQ)

Data Structures for Parallel Programming

Parallel Diagnostic Tools

Custom Partitioners for PLINQ and TPL

Task Factories

Task Schedulers

Lambda Expressions in PLINQ and TPL

………

.NET User Mode Scheduler: CLR Thread Pool

[Figure: the Program Thread places work in a single Global Queue, which feeds Worker Thread 1 … Worker Thread p.]

.NET 4.0 User Mode Scheduler for Tasks: CLR Thread Pool with Work-Stealing

[Figure: the Program Thread feeds the Global Queue; each of Worker Thread 1 … Worker Thread p also has its own Local Queue. Task 1 to Task 6 are shown distributed over the queues.]

PLINQ Code Sample

var source = Enumerable.Range(1, 10000);

// Opt-in to PLINQ with AsParallel
var evenNums = from num in source.AsParallel()
               where Compute(num) > 0
               select num;

var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;

evenNums = from num in numbers.AsParallel().AsOrdered()
           where num % 2 == 0
           select num;

ForAll Operation

var nums = Enumerable.Range(10, 10000);

var query = from num in nums.AsParallel()
            where num % 10 == 0
            select num;

// Process the results as each thread completes
// and add them to a System.Collections.Concurrent.ConcurrentBag(Of Int)
// which can safely accept concurrent add operations
query.ForAll((e) => concurrentBag.Add(Compute(e)));

Sequential Fallback in .NET 4.0

intArray.AsParallel()
    .Select(x => Foo(x))
    .TakeWhile(x => Filter(x))
    .ToArray();

[Callout: Sequential in .NET 4]

Force Parallel in .NET 4.5

intArray.AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Select(x => Foo(x))
    .TakeWhile(x => Filter(x))
    .ToArray();

Sequential Fallback in .NET 4 and .NET 4.5

Operators that may cause sequential fallback in both .NET 4 and .NET 4.5 are marked in blue, and operators that may cause fallback in .NET 4 but no longer in .NET 4.5 are marked in orange.

Intel OpenCL

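Intel OpenCL is a comprehensive parallel programming framework that supports a wide variety of computing environments. However, I do not expect ordinary programmers to use it directly.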

http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/













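Diagram of the 48-core SCC

SCC memory structure

[Figure: each core, cpu_0 through cpu_47, has a 16K L1 cache, a 256K L2 cache, and its own per-core off-chip memory (variable size). All cores share an off-chip shared memory (variable size) and an on-chip shared message-passing buffer of 384K (8K per core).]

SCC shared virtual memory space

A shared virtual address space that spans the cores is available.

To the application it looks like a single memory space.

Data structures and pointers can be shared seamlessly among multiple cores.

[Figure: the application sits on top of the shared virtual memory.]

Basically a parallel-style usage model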

Intel OpenCL

OpenCL™ (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems

OpenCL provides a uniform programming environment for software developers to write efficient, portable code for client computer systems, high-performance computing servers, and handheld devices using a diverse mix of multi-core CPUs and other parallel processors.

OpenCL Device Architecture Diagram

OpenCL - Class Diagram

Intel RiverTrail

An attempt to parallelize JavaScript.

It adopts ParallelArray. https://github.com/RiverTrail/RiverTrail/wiki

The goal of Intel Lab’s River Trail project is to enable data-parallelism in web applications.

River Trail gently extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer.

By leveraging multiple CPU cores and vector instructions, River Trail achieves significant speedup over sequential JavaScript.

ParallelArray

ParallelArray();

ParallelArray(size, elementalFunction, arg1, ..., argN);

ParallelArray(anArray);

ParallelArray(constructor, anArray);

ParallelArray(element0, element1, ..., elementN);

ParallelArray(canvas);

pa1 = new ParallelArray([ [0,1], [2,3], [4,5] ]); // <<0,1>, <2,3>, <4,5>>
pa2 = new ParallelArray(pa1);                     // <<0,1>, <2,3>, <4,5>>
new ParallelArray(<0,1>, <2,3>);                  // <<0,1>,<2,3>>
new ParallelArray([ [0,1],[2] ])                  // <<0,1>, <2>>
new ParallelArray([<0,1>,<2>]);                   // <<0,1>, <2>>
new ParallelArray(3,
    function(i){return [i, i+1];});               // <<0,1><1,2><2,3>>
new ParallelArray([3,2],
    function(iv){return iv[0]*iv[1];});           // <<0,0><0,1><0,2>>
new ParallelArray(canvas);                        // CanvasPixelArray

Parallel Methods

map

combine

reduce

scan

scatter

filter

flatten

partition

get

Map

myArray.map(elementalFunction, arg1, arg2, ...)

Returns: a freshly minted ParallelArray.

Example, an identity function: pa.map(function(val){return val;})

Filter

myArray.filter(elementalFunction, arg1, arg2, ...)

Returns: a freshly minted ParallelArray holding the source elements for which the result of applying the elemental function is true.

Example: pa.filter(function(){return true;})

Reduce

myArray.reduce(elementalFunction)
myArray.reduce(elementalFunction, arg1, arg2, ...)

The elemental function returns the result of reducing a and b, which is typically used in further applications of the elemental function.

Reduce is free to group calls to the elemental function in arbitrary ways and order the calls arbitrarily. If the elemental function is associative then the final result will be the same regardless of the ordering.

Flatten

myArray.flatten()

Returns A freshly minted ParallelArray whose outermost two dimensions have been collapsed into one.

Example

pa = new ParallelArray([[1,2],[3,4]]) // <<1,2>,<3,4>>
pa.flatten()                          // <1,2,3,4>

Partition

myArray.partition(size)

size

the size of each element of the newly created dimension; the outermost dimension of myArray needs to be divisible by size

Returns: a freshly minted ParallelArray whose outermost dimension has been partitioned into elements of the given size.

Example

pa = new ParallelArray([1,2,3,4]) // <1,2,3,4>
pa.partition(2)                   // <<1,2>,<3,4>>

Developments in Distributed Environments on the Network

Scale-out and Stateless Servers

WebSocket

SPDY

Scale-out and Stateless Servers

Scale-out of Multi-tier Web Applications

Java EE 6: StatelessSessionBean + Servlet

Java EE 6: RESTful Web Service

Play 2.0: Routes file and Actions

Multi-tier Web Application

[Figure: three tiers: Web Server, Business Logic, Data Base.]

Scale-out of a Multi-tier Web Application

[Figure: a Load Balancer in front of multiple Web Server / Business Logic instances, which scale out, backed by a Data Base.]

Availability of a Multi-tier Web Application

[Figure: when instances behind the Load Balancer crash, new instances are started in their place.]

Web Server/HTTP is stateless

Is the Business Logic tier stateful? Or stateless?

Making the whole Application Server stateless

The Database carries the application's state.

Sticky sessions that span sessions?
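The point of the slides above, that any server instance can handle any request once the state lives in the data tier, can be sketched as follows. Everything here is hypothetical and simplified: Store stands in for the database, AppServer for one application server instance, and the visit counter for application state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StatelessDemo {
    // Stand-in for the database tier: the only place where state is kept.
    static final class Store {
        private final Map<String, Integer> visits = new ConcurrentHashMap<>();

        int increment(String user) {
            return visits.merge(user, 1, Integer::sum); // atomic read-modify-write
        }
    }

    // One application server instance. It has no per-user fields,
    // so instances are interchangeable and can be added or replaced freely.
    static final class AppServer {
        private final Store store;

        AppServer(Store store) { this.store = store; }

        String handleVisit(String user) {
            return user + " visit #" + store.increment(user);
        }
    }

    public static void main(String[] args) {
        Store database = new Store();
        AppServer a = new AppServer(database);
        AppServer b = new AppServer(database);
        System.out.println(a.handleVisit("alice")); // prints alice visit #1
        System.out.println(b.handleVisit("alice")); // prints alice visit #2
        AppServer replacement = new AppServer(database); // e.g. after instance a crashed
        System.out.println(replacement.handleVisit("alice")); // prints alice visit #3
    }
}
```

With this arrangement the load balancer needs no sticky sessions: it may route each request to whichever instance is available.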

Java EE 6: StatelessSessionBean + Servlet

/**
 * Contains methods to create and query data
 */
@Stateless
public class StatelessSessionBean {
    @PersistenceContext
    private EntityManager em;

    public void createData(ServletOutputStream outputStream) {…}
    private void createOrder(int orderNumber) {…}
    public void queryData(ServletOutputStream outputStream) throws IOException {…}
    private void queryForOrderContainingItem(String itemName,
            ServletOutputStream outputStream) throws IOException {…}
    private void queryDataForOrder(int orderId,
            ServletOutputStream outputStream) throws IOException {…}
    … …

@WebServlet(name="TestServlet", urlPatterns={"/test/*"})
public class TestServlet extends HttpServlet {
    @EJB
    private StatelessSessionBean testEJB;

    protected void processRequest(…){…}
    protected void doGet(…){…}
    protected void doPost(…){…}
    public String getServletInfo() {…}
    … …

Java EE 6: RESTful Web Service

@Stateless
public class MessageBoardResourceBean {
    @Context
    private UriInfo ui;

    @EJB
    MessageHolderSingletonBean singleton;

    @GET
    public List<Message> getMessages() {
        return singleton.getMessages();
    }

    @POST
    public Response addMessage(String msg) throws URISyntaxException {
        Message m = singleton.addMessage(msg);
        URI msgURI = ui.getRequestUriBuilder()
                .path(Integer.toString(m.getUniqueId())).build();
        return Response.created(msgURI).build();
    }

    @Path("{msgNum}")
    @GET
    public Message getMessage(@PathParam("msgNum") int msgNum) throws NotFoundException {
        Message m = singleton.getMessage(msgNum);
        if (m == null) throw new NotFoundException();
        return m;
    }

    @Path("{msgNum}")
    @DELETE
    public void deleteMessage(@PathParam("msgNum") int msgNum) throws NotFoundException {
        boolean deleted = singleton.deleteMessage(msgNum);
        if (!deleted) throw new NotFoundException();
    }
}

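Play 2.0: Routes file and Actions

RESTful architecture

A web application receives an HTTP request and returns a response.

Servlets and Struts offer a particular Java-level abstraction of HTTP, but a web application framework should give full and more direct access to HTTP and its concepts.

With a template engine, servlets are not needed.

"Share-Nothing" stateless architecture

Some Java web frameworks are stateful.

That approach helps in remembering page state automatically. At the same time, it brings awkward problems, for example in handling the back button.

Like PHP and Ruby on Rails, Play adopts a stateless "share-nothing" architecture.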

# Routes
# This file defines all application routes (Higher priority routes first)
# ~~~~

# The home page
GET     /                                 controllers.Projects.index

# Authentication
GET     /login                            controllers.Application.login
POST    /login                            controllers.Application.authenticate
GET     /logout                           controllers.Application.logout

# Projects
POST    /projects                         controllers.Projects.add
POST    /projects/groups                  controllers.Projects.addGroup()
DELETE  /projects/groups                  controllers.Projects.deleteGroup(group: String)
PUT     /projects/groups                  controllers.Projects.renameGroup(group: String)
DELETE  /projects/:project                controllers.Projects.delete(project: Long)
PUT     /projects/:project                controllers.Projects.rename(project: Long)
POST    /projects/:project/team           controllers.Projects.addUser(project: Long)
DELETE  /projects/:project/team           controllers.Projects.removeUser(project: Long)

# Tasks
GET     /projects/:project/tasks          controllers.Tasks.index(project: Long)
POST    /projects/:project/tasks          controllers.Tasks.add(project: Long, folder: String)
PUT     /tasks/:task                      controllers.Tasks.update(task: Long)
DELETE  /tasks/:task                      controllers.Tasks.delete(task: Long)
POST    /tasks/folder                     controllers.Tasks.addFolder
DELETE  /projects/:project/tasks/folder   controllers.Tasks.deleteFolder(project: Long, folder: String)
PUT     /project/:project/tasks/folder    controllers.Tasks.renameFolder(project: Long, folder: String)

# Javascript routing
GET     /assets/javascripts/routes        controllers.Application.javascriptRoutes

# Map static resources from the /public folder to the /public path
GET     /assets/*file                     controllers.Assets.at(path="/public", file)

Writing a Controller Action: app/controllers/Application.java

The Java/Scala files under app/controllers/ define the Actions to which the routes file maps HTTP requests.

package controllers;

import play.*;
import play.mvc.*;
import views.html.*;

public class Application extends Controller {
    public static Result index() {
        return ok(index.render("Hello World!"));
    }
}

WebSocket Truly Web Competitive ?

http://www.infoq.com/presentations/WebSockets-The-Web-Communication-Revolution

Hack the Web for Real-Time

Ajax applications use various "hacks" to simulate real-time communication

Polling - HTTP requests are sent at regular intervals and immediately receive a response

Long Polling - the HTTP request is kept open by the server for a set period

Streaming - more efficient, but complex to implement and unreliable

Excessive HTTP header traffic, significant overhead to each request response

HTTP Characteristics

HTTP is designed for document transfer

Resource addressing

Request / Response interaction

Caching

HTTP is bidirectional, but half-duplex

Traffic flows in only one direction at a time

HTTP is stateless

Header information is resent for each request

Traditional vs Web

Traditional Computing

Full-duplex bidirectional TCP sockets

Access any server on the network

Web Computing

Half-duplex HTTP request-response

HTTP polling, long polling fraught with problems

Lots of latency, lots of bandwidth, lots of server-side resources

Bespoke solutions became very complex over time

HTML5 WebSocket

WebSockets provide an improved Web Comms fabric

Consists of W3C API and IETF Protocol

Provides a full-duplex, single socket over the Web

Traverses firewalls, proxies, and routers seamlessly

Leverages Cross-Origin Resource Sharing

Share port with existing HTTP content

Can be secured with TLS (much like HTTPS)

HTTP Is Not Full Duplex

Half-Duplex Web Architecture

With WebSocket, the Web goes from half duplex to full duplex

The Legacy Web Stack

Designed to serve static documents

HTTP

Half duplex communication

High latency

Bandwidth intensive

HTTP header traffic approx. 800 to 2000 bytes overhead per request/response

Complex architecture

Not changed since the 90’s

Plug-ins

Polling / long polling

Legacy application servers

Expensive to "Webscale" applications

WebSocket Handshake: Client Request

Required

GET /chat HTTP/1.1
HOST: server.example.com
Upgrade: websocket
Connection: Upgrade

Optional

Sec-Websocket-Key: 16-byte nonce, Base64 encoded
Sec-Websocket-Version: 6
Sec-Websocket-Origin: http://example.com
Sec-Websocket-Protocol: protocol [, protocol]*
Sec-Websocket-Extension: extension [, extension]*
Cookie: cookie content and other cookie-related headers

WebSocket Handshake: Server Response

Required

HTTP/1.1 101 "Switching Protocols" or other description
Upgrade: websocket
Connection: Upgrade
Sec-Websocket-Accept: 20-byte SHA-1 hash in Base64

Optional

Sec-Websocket-Protocol: protocol
Sec-Websocket-Extension: extension [, extension]*
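How the server derives the accept value from the client's key can be shown in a few lines. The sketch below follows the rule in RFC 6455, the final version of the protocol (the slides describe an earlier draft, version 6): append a fixed GUID to the key, hash with SHA-1, and Base64-encode the 20-byte digest. The class and method names are assumptions of this example, and it needs Java 8 or later for java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class WebSocketAccept {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.
    static String acceptFor(String key) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((key + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest); // 20 bytes -> 28 characters
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample key and expected answer taken from RFC 6455.
        System.out.println(acceptFor("dGhlIHNhbXBsZSBub25jZQ==")); // prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}
```

The value proves that the server understood the WebSocket handshake rather than replaying a cached HTTP response; it is not a security mechanism in itself.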

JavaScript: How do I use the WebSocket API?

// Create new WebSocket
var mySocket = new WebSocket("ws://www.WebSocket.org");

// Associate listeners
mySocket.onopen = function(evt) {
    alert("Connection open…");
};

mySocket.onmessage = function(evt) {
    alert("Received message: " + evt.data);
};

mySocket.onclose = function(evt) {
    alert("Connection closed…");
};

// Sending data
mySocket.send("WebSocket Rocks!");

// Close WebSocket
mySocket.close();

WebSocket Frames

Frames have a few header bytes

Data may be text or binary

Frames from client to server are masked (XORed w/ random value) to avoid confusing proxies
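The masking step mentioned above is a plain XOR of each payload byte with one of four key bytes, as specified in RFC 6455. The sketch below uses the hypothetical names FrameMask and mask; applying the same function twice with the same key restores the original payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FrameMask {
    // XORs payload byte i with key byte (i mod 4). Masking and unmasking are the same operation.
    static byte[] mask(byte[] payload, byte[] key) {
        if (key.length != 4) {
            throw new IllegalArgumentException("masking key must be 4 bytes");
        }
        byte[] out = new byte[payload.length];
        for (int i = 0; i < payload.length; i++) {
            out[i] = (byte) (payload[i] ^ key[i % 4]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Key and payload from the masked "Hello" example in RFC 6455.
        byte[] key = {0x37, (byte) 0xfa, 0x21, 0x3d};
        byte[] hello = "Hello".getBytes(StandardCharsets.US_ASCII);
        byte[] masked = mask(hello, key);
        System.out.println(Arrays.equals(masked,
                new byte[] {0x7f, (byte) 0x9f, 0x4d, 0x51, 0x58})); // prints true
        System.out.println(new String(mask(masked, key), StandardCharsets.US_ASCII)); // prints Hello
    }
}
```

In the real protocol the client picks a fresh random key for every frame and sends it in the frame header, so the server can always undo the mask.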

HTTP Header Traffic Analysis

Example network throughput for HTTP request and response headers associated with polling

Use case A: 1,000 clients polling every second:

Network throughput is (871 x 1,000) = 871,000 bytes = 6,968,000 bits per second (~6.6 Mbps)

Use case B: 10,000 clients polling every second:

Network throughput is (871 x 10,000) = 8,710,000 bytes = 69,680,000 bits per second (~66 Mbps)

Use case C: 100,000 clients polling every second:

Network throughput is (871 x 100,000) = 87,100,000 bytes = 696,800,000 bits per second (~665 Mbps)
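The arithmetic behind the three use cases is simple enough to check in code. In this sketch, PollingOverhead and its methods are hypothetical names; the 871-byte header size comes from the slide, and the conversion divides by 2^20 because that appears to be how the slide arrives at its rounded Mbps figures.

```java
class PollingOverhead {
    // Header traffic in bits per second for clients that each poll pollsPerSecond times.
    static long bitsPerSecond(int headerBytes, int clients, int pollsPerSecond) {
        return (long) headerBytes * clients * pollsPerSecond * 8;
    }

    // Converts bits per second to megabits per second, using 1 Mb = 2^20 bits
    // (an assumption that matches the rounded figures on the slide).
    static double mbps(long bits) {
        return bits / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int[] clientCounts = {1_000, 10_000, 100_000};
        for (int clients : clientCounts) {
            long bits = bitsPerSecond(871, clients, 1);
            System.out.printf("%,d clients: %,d bits/s (about %.1f Mbps)%n",
                    clients, bits, mbps(bits));
        }
    }
}
```

The traffic grows linearly with the number of clients and with the polling rate, and all of it is header overhead that carries no application data.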

Reduction in Network Traffic

With WebSocket, each frame has only several bytes of packaging (a 500:1 or even 1000:1 reduction)

No latency involved in establishing new TCP connections for each HTTP message

Dramatic reduction in unnecessary network traffic and latency

Remember the Polling HTTP header traffic? 665 Mbps network throughput for just headers

HTTP versus WebSockets

Example: Entering a character in a search field with auto suggestion

WebSockets reduces bandwidth overhead up to 1000x

HTTP Traffic WebSocket Traffic

Google 788 + 1 byte 2 + 1 byte

Yahoo 1737 + 1 byte 2 + 1 byte

Polling vs. Web Sockets

“Reducing kilobytes of data to 2 bytes…and reducing latency from 150ms to 50ms is far more than marginal. In fact, these two factors alone are enough to make WebSocket seriously interesting to Google.”

—Ian Hickson (Google, HTML5 spec lead)

SPDY: An experimental protocol for a faster web

http://www.chromium.org/spdy/spdy-whitepaper





Let's make the web faster

As part of the "Let's make the web faster" initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced "SPeeDY"), an application-layer protocol for transporting content over the web, designed specifically for minimal latency.

In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY.

Background: web protocols and web latency

Unfortunately, HTTP was not particularly designed for latency. Furthermore, the web pages transmitted today are significantly different from web pages 10 years ago and demand improvements to HTTP that could not have been anticipated when HTTP was developed.

Single request per connection.

Exclusively client-initiated requests.

Uncompressed request and response headers.

Redundant header

Optional data compression.

Goals for SPDY

To target a 50% reduction in page load time.

To minimize deployment complexity.

To avoid the need for any changes to content by website authors.

To bring together like-minded parties interested in exploring protocols as a way of solving the latency problem.

Some specific technical goals

To allow many concurrent HTTP requests to run across a single TCP session.

To define a protocol that is easy to implement and server-efficient.

To make SSL the underlying transport protocol, for better security and compatibility with existing network infrastructure.

To enable the server to initiate communications with the client and push data to the client whenever possible.

SPDY design and features

SPDY adds a session layer atop of SSL that allows for multiple concurrent, interleaved streams over a single TCP connection.

The usual HTTP GET and POST message formats remain the same; however, SPDY specifies a new framing format for encoding and transmitting the data over the wire.

Streams are bi-directional i.e. can be initiated by the client and server.

Basic features

Multiplexed streams

SPDY allows for unlimited concurrent streams over a single TCP connection. Because requests are interleaved on a single channel, the efficiency of TCP is much higher: fewer network connections need to be made, and fewer, but more densely packed, packets are issued.

Request prioritization

SPDY implements request priorities: the client can request as many items as it wants from the server, and assign a priority to each request.

HTTP header compression

SPDY compresses request and response HTTP headers, resulting in fewer packets and fewer bytes transmitted.

Advanced features

Server push

SPDY experiments with an option for servers to push data to clients via the X-Associated-Content header. This header informs the client that the server is pushing a resource to the client before the client has asked for it. For initial-page downloads (e.g. the first time a user visits a site), this can vastly enhance the user experience.

Server hint

Rather than automatically pushing resources to the client, the server uses the X-Subresources header to suggest to the client that it should ask for specific resources, in cases where the server knows in advance of the client that those resources will be needed.

Asynchronous Programming Techniques

Java Future

.NET Async

Scala Future, Promise

Akka Future, Promise

JMS 2.0

Java Future

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html

Since Java SE5





public interface Future<V>

A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled. Once a computation has completed, the computation cannot be cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you can declare types of the form Future<?> and return null as a result of the underlying task.
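The methods described above can be exercised with a small self-contained sketch. The class name `FutureBasicsSketch` and the two tasks (one quick, one that sleeps for a minute) are made up for the example.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureBasicsSketch {
    /** Runs a quick task and a slow task and reports what the Future methods return. */
    static String demo() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> quick = executor.submit(() -> 6 * 7);
            // Future<?>: used only for cancellability, the task has no useful result.
            Future<?> slow = executor.submit(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            int answer = quick.get(5, TimeUnit.SECONDS);  // blocks until the result is ready
            boolean cancelAfterDone = quick.cancel(true); // a completed task cannot be cancelled
            boolean cancelledSlow = slow.cancel(true);    // interrupts the sleeping task

            String slowOutcome;
            try {
                slow.get();
                slowOutcome = "completed";
            } catch (CancellationException e) {
                slowOutcome = "cancelled";                // get() on a cancelled Future throws
            }
            return answer + " " + cancelAfterDone + " " + cancelledSlow + " " + slowOutcome;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "42 false true cancelled"
    }
}
```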

Future Sample

interface ArchiveSearcher { String search(String target); }

class App {

ExecutorService executor = ...

ArchiveSearcher searcher = ...

void showSearch(final String target)

throws InterruptedException {

Future<String> future

= executor.submit(new Callable<String>() {

public String call() {

return searcher.search(target);

}});

displayOtherThings(); // do other things while searching

try {

displayText(future.get()); // use future

} catch (ExecutionException ex) { cleanup(); return; }

}

}

FutureTask

FutureTask<String> future =

new FutureTask<String>(new

Callable<String>() {

public String call() {

return searcher.search(target);

}});

executor.execute(future);

.NET Async

http://media.ch9.ms/teched/na/2011/ppt/DEV324.pptx

http://lunarfrog.com/blog/2012/01/23/simplicity-of-async-and-await/
















var data = DownloadData(...); ProcessData(data);

DownloadDataAsync(... , data => { ProcessData(data); });

DoWorkAsync

async void DoWorkAsync() {

var t1 = ProcessFeedAsync("www.acme.com/rss");

var t2 = ProcessFeedAsync("www.xyznews.com/rss");

await Task.WhenAll(t1, t2);

DisplayMessage("Done");

}

async Task ProcessFeedAsync(string url) {

var text = await DownloadFeedAsync(url);

var doc = ParseFeedIntoDoc(text);

await SaveDocAsync(doc);

ProcessLog.WriteEntry(url);

}

WriteFileAsync

public async Task WriteFileAsync(string filename, string contents)

{

var localFolder =

Windows.Storage.ApplicationData.Current.LocalFolder;

var file = await localFolder.CreateFileAsync(filename,

Windows.Storage.CreationCollisionOption.ReplaceExisting);

var fs = await file.OpenAsync(

Windows.Storage.FileAccessMode.ReadWrite);

//...

}

await WriteFileAsync("FileName", "Some Text");

GetRssAsync

async Task<XElement> GetRssAsync(string url) {

var client = new WebClient();

var task = client.DownloadStringTaskAsync(url);

var text = await task;

var xml = XElement.Parse(text);

return xml;

}

Downloading YouTube videos in parts

try {

// Network-bound

string[] videoUrls = await ScrapeYoutubeAsync(url);

// Start two downloads

Task<Video> t1 = DownloadVideoAsync(videoUrls[0]);

Task<Video> t2 = DownloadVideoAsync(videoUrls[1]);

// Wait for both

Video[] vids = await Task.WhenAll(t1, t2);

// CPU-bound

Video v = await MashupVideosAsync(vids[0], vids[1]);

// IO-bound

await v.SaveAsync(textbox.Text);

}

catch (WebException ex) { ReportError(ex);

}

Scala Future, Promise

http://docs.scala-lang.org/sips/pending/futures-promises.html







Futures

A future is an abstraction which represents a value which may become available at some point.

A Future object either holds a result of a computation or an exception in the case that the computation failed.

An important property of a future is that it is in effect immutable: it can never be written to or failed by the holder of the Future object.

val f: Future[List[String]] = future {

session.getRecentPosts

}

f onFailure {

case t => render("An error has occurred: " +

t.getMessage)

}

f onSuccess {

case posts => for (post <- posts) render(post)

}

Callbacks

Registering an onComplete callback on the future ensures that the corresponding closure is invoked after the future is completed.

Registering an onSuccess or onFailure callback has the same semantics, with the difference that the closure is only called if the future is completed successfully or fails, respectively.

Registering a callback on a future that is already completed will result in the callback being executed eventually (as implied by the first rule above). Furthermore, the callback may even be executed synchronously on the same thread.

Callbacks

In the event that multiple callbacks are registered on the future, the order in which they are executed is not defined. In fact, the callbacks may be executed concurrently with one another. However, a particular Future implementation may have a well-defined order.

In the event that some of the callbacks throw an exception, the other callbacks are executed regardless.

In the event that some of the callbacks never complete (e.g. the callback contains an infinite loop), the other callbacks may not be executed at all.
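These rules are stated for Scala futures. As a rough Java analogue only (not the Scala API), `java.util.concurrent.CompletableFuture` from Java SE 8 shows similar behaviour: a callback registered after completion still runs, and a callback that throws does not prevent the others from running. The class name and log messages below are made up for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class CallbackSketch {
    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        CompletableFuture<String> f = new CompletableFuture<>();

        // Three callbacks registered before completion; their relative order is not specified.
        f.thenAccept(v -> log.add("first saw " + v));
        f.thenAccept(v -> { throw new IllegalStateException("callback failed"); });
        f.thenAccept(v -> log.add("third saw " + v));

        f.complete("posts");

        // Registered on an already completed future: the callback still runs.
        f.thenAccept(v -> log.add("late saw " + v));
        return log;
    }

    public static void main(String[] args) {
        // The failing callback is absent from the log; the other three all ran.
        System.out.println(demo());
    }
}
```

The exception thrown by the second callback is captured in the future returned by that `thenAccept` call; it does not propagate to `complete` or to the other callbacks.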

Functional Composition

val rateQuote = future {

connection.getCurrentValue(USD)

}

rateQuote onSuccess { case quote =>

val purchase = future {

if (isProfitable(quote)) connection.buy(amount, quote)

else throw new Exception("not profitable")

}

purchase onSuccess {

case _ => println("Purchased " + amount + " USD")

}

}

For-Comprehensions

val usdQuote = future { connection.getCurrentValue(USD) }

val chfQuote = future { connection.getCurrentValue(CHF) }

val purchase = for {

usd <- usdQuote

chf <- chfQuote

if isProfitable(usd, chf)

} yield connection.buy(amount, chf)

purchase onSuccess {

case _ => println("Purchased " + amount + " CHF")

}

Promises

While futures are defined as a type of read-only placeholder object created for a result which doesn’t yet exist, a promise can be thought of as a writeable, single-assignment container, which completes a future.

That is, a promise can be used to successfully complete a future with a value (by “completing” the promise) using the success method. Conversely, a promise can also be used to complete a future with an exception, by failing the promise, using the failure method.

import scala.concurrent.{ future, promise }

val p = promise[T]

val f = p.future

val producer = future {

val r = produceSomething()

p success r

continueDoingSomethingUnrelated()

}

val consumer = future {

startDoingSomething()

f onSuccess {

case r => doSomethingWithResult()

}

}
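For comparison, the same producer/consumer hand-off can be approximated in Java. This is an analogue, not the Scala API: `CompletableFuture` plays both roles here, acting as the writable promise for the producer and as the read side for the consumer. The class name and values are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PromiseSketch {
    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The "promise": a single-assignment container that nobody has written yet.
            CompletableFuture<Integer> promise = new CompletableFuture<>();

            // Consumer: registers what to do with the value once it exists.
            CompletableFuture<String> consumed = promise.thenApply(r -> "got " + r);

            // Producer: completes the promise from another thread.
            pool.execute(() -> promise.complete(21 * 2));

            String result = consumed.join();           // waits for the producer
            boolean secondWrite = promise.complete(0); // rejected: already completed
            return result + ", second write accepted: " + secondWrite
                    + ", value still " + promise.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints "got 42, second write accepted: false, value still 42"
    }
}
```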

Akka Future, Promise

http://akka.io/docs/akka/2.0-M2/scala/futures.html





import akka.dispatch.Await

implicit val timeout = system.settings.ActorTimeout

val future = actor ? msg

val result = Await.result(future, timeout.duration).

asInstanceOf[String]

import akka.dispatch.Future

val future: Future[String] = (actor ? msg).mapTo[String]

import akka.dispatch.Await

import akka.dispatch.Future

import akka.util.duration._

val future = Future {

"Hello" + "World"

}

val result = Await.result(future, 1 second)

Composition

val f1 = Future {

"Hello" + "World"

}

val f2 = Promise.successful(3)

val f3 = f1 flatMap { x ⇒

f2 map { y ⇒

x.length * y

}

}

val result = Await.result(f3, 1 second)

result must be(30)

For Comprehension

val f = for {

a ← Future(10 / 2) // 10 / 2 = 5

b ← Future(a + 1) // 5 + 1 = 6

c ← Future(a - 1) // 5 - 1 = 4

} yield b * c // 6 * 4 = 24

// Note that futures a, b, and c are not executed

// in parallel.

val result = Await.result(f, 1 second)

result must be(24)

val f1 = actor1 ? msg1

val f2 = actor2 ? msg2

val a = Await.result(f1, 1 second).asInstanceOf[Int]

val b = Await.result(f2, 1 second).asInstanceOf[Int]

val f3 = actor3 ? (a + b)

val result = Await.result(f3, 1 second).asInstanceOf[Int]

// Create a sequence of Futures

val futures = for (i ← 1 to 1000) yield Future(i * 2)

val futureSum = Future.fold(futures)(0)(_ + _)

Await.result(futureSum, 1 second) must be(1001000)

// Create a sequence of Futures

val futures = for (i ← 1 to 1000) yield Future(i * 2)

val futureSum = Future.reduce(futures)(_ + _)

Await.result(futureSum, 1 second) must be(1001000)

Beyond Mere Actors

http://www.slideshare.net/bostonscala/beyond-mere-actors









On Time-Travel

Promised values are available in the future.

What does it mean to get a value out of the future? Time-travel into the future is easy. Just wait. But we don't have to go into the future. We can give our future-selves instructions.

Instead of getting values out of the future, we send computations into the future.
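The contrast between "getting a value out of the future" and "sending a computation into the future" can be sketched in Java with `CompletableFuture` (Java SE 8). The method names `waitThenDouble` and `doubled` are made up for the example.

```java
import java.util.concurrent.CompletableFuture;

class SendComputationSketch {
    /** Getting the value out: the calling thread waits until the value exists. */
    static int waitThenDouble(CompletableFuture<Integer> price) {
        return price.join() * 2;
    }

    /** Sending the computation in: returns at once, the doubling runs when the value arrives. */
    static CompletableFuture<Integer> doubled(CompletableFuture<Integer> price) {
        return price.thenApply(p -> p * 2);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> price = new CompletableFuture<>();

        CompletableFuture<Integer> result = doubled(price);
        System.out.println(result.isDone()); // prints false: nothing has been computed yet

        price.complete(21);                  // the value arrives
        System.out.println(result.join());   // prints 42
    }
}
```

Calling `waitThenDouble(price)` before `price.complete(21)` would block the calling thread, which is exactly the waiting that the composed style avoids.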

JMS 2.0

Last maintenance release (1.1) was in 2003

March 2011: JSR 343 launched to develop JMS 2.0

Initial goals of JMS 2.0

Simpler and easier to use

simplify the API

make use of CDI (Contexts and Dependency Injection)

clarify any ambiguities in the spec

Support new themes of Java EE 7

PaaS

Multi-tenancy

Initial goals of JMS 2.0

Standardise interface with application servers

Clarify relationship with other Java EE specs

some JMS behaviour defined in other specs

New messaging features

standardise some existing vendor extensions (or will retrospective standardisation be difficult?)

Simplifying the JMS API

Receiving messages in Java EE

@MessageDriven(mappedName = "jms/inboundQueue")

public class MyMDB implements MessageListener {

public void onMessage(Message message) {

try {

String payload = ((TextMessage) message).getText();

// do something with payload

} catch (JMSException e) {

// handle a message whose body cannot be read

}

}

}

Sending messages in Java EE

@Resource(lookup = "jms/connFactory")

ConnectionFactory cf;

@Resource(lookup="jms/inboundQueue")

Destination dest;

public void sendMessage (String payload) throws JMSException {

Connection conn = cf.createConnection();

Session sess =

conn.createSession(false,Session.AUTO_ACKNOWLEDGE);

MessageProducer producer = sess.createProducer(dest);

TextMessage textMessage = sess.createTextMessage(payload);

producer.send(textMessage);

conn.close();

}

Possible new API

@Resource(mappedName="jms/contextFactory")

ContextFactory contextFactory;

@Resource(mappedName="jms/orderQueue")

Queue orderQueue;

public void sendMessage(String payload) {

try (MessagingContext mCtx =

contextFactory.createContext();){

TextMessage textMessage =

mCtx.createTextMessage(payload);

mCtx.send(orderQueue,textMessage);

}

}

Annotations for the new API

@Resource(mappedName="jms/orderQueue")

Queue orderQueue;

@Inject

@MessagingContext(lookup="jms/contextFactory")

MessagingContext mCtx;

@Inject

TextMessage textMessage;

public void sendMessage(String payload) {

textMessage.setText(payload);

mCtx.send(orderQueue,textMessage);

}

Annotations for the old API

@Inject

@JMSConnection(lookup="jms/connFactory")

@JMSDestination(lookup="jms/inboundQueue")

MessageProducer producer;

@Inject

TextMessage textMessage;

public void sendMessage (String payload){

try {

textMessage.setText(payload);

producer.send(textMessage);

} catch (JMSException e) {

// do something

}

}

Send a message with async acknowledgement from server

Send a message and return immediately without blocking until an acknowledgement has been received from the server.

Instead, when the acknowledgement is received, an asynchronous callback will be invoked.

Why? Allows thread to do other work whilst waiting for the acknowledgement

producer.send(message, new AcknowledgeListener(){

public void onAcknowledge(Message message) {

// process ack

}

});
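The control flow of an asynchronous acknowledgement can be simulated without a JMS provider. Everything in this sketch is a stand-in: `AcknowledgeListener` mirrors the proposed interface shown above but takes a `String` instead of a JMS `Message`, and `FakeProducer` is a hypothetical class that acknowledges from a background thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncAckSketch {
    /** Stand-in for the proposed listener; not the real JMS type. */
    interface AcknowledgeListener {
        void onAcknowledge(String message);
    }

    /** Hypothetical producer: send() returns immediately, the ack arrives on another thread. */
    static final class FakeProducer implements AutoCloseable {
        private final ExecutorService broker = Executors.newSingleThreadExecutor();

        void send(String message, AcknowledgeListener listener) {
            broker.execute(() -> listener.onAcknowledge(message));
        }

        @Override
        public void close() {
            broker.shutdown();
        }
    }

    static List<String> demo() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch acked = new CountDownLatch(1);
        try (FakeProducer producer = new FakeProducer()) {
            producer.send("order-1", m -> {
                events.add("ack " + m);
                acked.countDown();
            });
            events.add("sender continues"); // the sender is free to do other work here
            acked.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    public static void main(String[] args) {
        // Both events are recorded; which one comes first depends on thread timing.
        System.out.println(demo());
    }
}
```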

Topic hierarchies

Topics can be arranged in a hierarchy

STOCK.NASDAQ.TECH.ORCL

STOCK.NASDAQ.TECH.GOOG

STOCK.NASDAQ.TECH.ADBE

STOCK.NYSE.TECH.HPQ

Consumers can subscribe using wildcards

STOCK.*.TECH.*

STOCK.NASDAQ.TECH.*

Most vendors support this already

Details TBD
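Since the details are still to be decided, the following is only one possible reading of the wildcard examples above: the sketch assumes that `*` matches exactly one level of the hierarchy and that levels are separated by dots. The class and method names are made up.

```java
class TopicWildcardSketch {
    /** Returns true if the topic matches the pattern, where "*" stands for exactly one level. */
    static boolean matches(String pattern, String topic) {
        String[] p = pattern.split("\\.");
        String[] t = topic.split("\\.");
        if (p.length != t.length) {
            return false; // assumption: a wildcard never spans several levels
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(t[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("STOCK.*.TECH.*", "STOCK.NYSE.TECH.HPQ"));         // prints true
        System.out.println(matches("STOCK.NASDAQ.TECH.*", "STOCK.NASDAQ.TECH.ORCL")); // prints true
        System.out.println(matches("STOCK.NASDAQ.TECH.*", "STOCK.NYSE.TECH.HPQ"));    // prints false
    }
}
```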

Multiple consumers on a topic subscription

Allows scalable consumption of messages from a topic subscription

multiple threads

multiple JVMs

No further change to API for durable subscriptions (clientID not used)

New API for non-durable subscriptions

Why? Allows greater scalability

MessageConsumer messageConsumer= session.createSharedConsumer(

topic,sharedSubscriptionName);

Batch delivery

Will allow messages to be delivered asynchronously in batches

New method on MessageConsumer

New listener interface BatchMessageListener

Acks also sent in a batch

Why? May be more efficient for JMS provider or application

void setBatchMessageListener(

BatchMessageListener listener,

int batchSize,

long batchTimeOut)
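How `batchSize` and `batchTimeOut` might interact can be sketched with a plain `BlockingQueue`. This is an assumption about the semantics, not the proposed JMS behaviour: a batch is handed over as soon as it is full, or when the timeout expires with fewer messages. The method `nextBatch` and the use of strings as messages are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchSketch {
    /** Collects up to batchSize messages, waiting at most batchTimeoutMillis in total. */
    static List<String> nextBatch(BlockingQueue<String> queue, int batchSize, long batchTimeoutMillis) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(batchTimeoutMillis);
        try {
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // timeout: deliver what has arrived so far
                }
                String message = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (message == null) {
                    break; // nothing more arrived before the deadline
                }
                batch.add(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return the partial batch
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5; i++) {
            queue.add("m" + i);
        }
        System.out.println(nextBatch(queue, 3, 200)); // prints [m1, m2, m3]
        System.out.println(nextBatch(queue, 3, 200)); // prints [m4, m5] after the timeout
    }
}
```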
