consistency oblivious programming

Post on 12-Jan-2016


Consistency Oblivious Programming

Hillel Avni, Tel Aviv University

Agenda

• Transactional Memory and Locking
• Consistency Oblivious Programming (COP)
• COP with STM
• COP with HTM
• Future Work

Global Lock

Easy to use

Composable - Concatenate critical sections

Not scalable


Fine Grain Locking

Hard to use

Not Composable

Scalable

Lazy linked list is a good example…


Lazy Traversal

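[Figure: sorted linked list a → b → d → e. add(c) traverses the list without taking locks and finds its position after b: "Aha!"]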

Lock and Validate

[Figure: sorted linked list a → b → d → e. add(c) takes its locks and validates: "Yes, b still points to d."]

Perform Updates and Release Locks

[Figure: add(c) links the new node c between b and d, giving a → b → c → d → e, and then releases its locks.]
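The three steps on the preceding slides (lazy traversal, lock and validate, update and release) can be sketched roughly as follows. This is an illustrative sketch only, not code from the talk: the names Node and LazyList, the sentinel keys, and the use of std::mutex and std::atomic are assumptions made here. Removal is omitted, so the marked flag is never set and nodes are never freed.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>

// Hypothetical node layout for a sorted list with per-node locks.
struct Node {
    const int key;
    std::atomic<Node*> next;
    std::atomic<bool> marked{false};   // would be set by a remove(), not shown
    std::mutex lock;
    Node(int k, Node* n) : key(k), next(n) {}
};

class LazyList {
    // Sentinels: every real key lies strictly between INT_MIN and INT_MAX.
    Node* tail_ = new Node(INT_MAX, nullptr);
    Node* head_ = new Node(INT_MIN, tail_);

    // Validation: both nodes are unmarked and pred still points to curr.
    static bool validate(Node* pred, Node* curr) {
        return !pred->marked.load() && !curr->marked.load() &&
               pred->next.load() == curr;
    }

public:
    bool add(int key) {
        assert(key > INT_MIN && key < INT_MAX);
        for (;;) {
            // 1. Lazy traversal: no locks are taken.
            Node* pred = head_;
            Node* curr = pred->next.load();
            while (curr->key < key) {
                pred = curr;
                curr = curr->next.load();
            }
            // 2. Lock (in list order) and validate.
            std::lock_guard<std::mutex> lock_pred(pred->lock);
            std::lock_guard<std::mutex> lock_curr(curr->lock);
            if (!validate(pred, curr)) continue;   // locks released, retry
            // 3. Perform the update; the guards release the locks on return.
            if (curr->key == key) return false;    // already present
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool contains(int key) const {
        Node* curr = head_->next.load();
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }
};
```

With the list from the slides (a, b, d, e), add('c') traverses to the pair (b, d), locks both, confirms that b still points to d, and links c between them.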

Transactional Memory

Easy to use

Composable

Scalable

How is it done?


Java (Deuce)

    // Pseudocode: `location` stands for a shared memory word.
    bool CAS(int location, int expected, int new_val) {
        atomic {
            if (location != expected) return false;
            location = new_val;
        }
        return true;
    }

C/C++ (GCC 4.7)

    // Build with: gcc -fgnu-tm
    bool CAS(int *location, int expected, int new_val) {
        __transaction_atomic {
            if (*location != expected) return false;
            *location = new_val;
        }
        return true;
    }

Software Transactional Memory

• Different algorithms are used:
– consistency checking
– rollback
• Compiler recognizes shared accesses.

STM Problem - Overhead

    template <typename V> static V load(const V* addr, ls_modifier mod)
    {
      // RfW (read for write): acquire write access first, then read directly.
      if (unlikely(mod == RfW))
        {
          pre_write(addr, sizeof(V));
          return *addr;
        }
      // RaW (read after write): read directly.
      if (unlikely(mod == RaW))
        return *addr;

      gtm_thread *tx = gtm_thr();
      gtm_rwlog_entry* log = pre_load(tx, addr, sizeof(V));

      // Ordinary load followed by an acquire fence.
      V v = *addr;
      atomic_thread_fence(memory_order_acquire);

      post_load(tx, log);
      return v;
    }

The load function from GCC 4.8.1.

STM Problem - Overhead

    static gtm_rwlog_entry* pre_load(gtm_thread *tx, const void* addr, size_t len)
    {
      size_t log_start = tx->readlog.size();
      gtm_word snapshot = tx->shared_state.load(memory_order_relaxed);
      gtm_word locked_by_tx = ml_mg::set_locked(tx);

      size_t orec = ml_mg::get_orec(addr);
      size_t orec_end = ml_mg::get_orec_end(addr, len);
      do
        {
          gtm_word o = o_ml_mg.orecs[orec].load(memory_order_acquire);

          if (likely (!ml_mg::is_more_recent_or_locked(o, snapshot)))
            {
              success:
              // Log the orec and the value observed for later validation.
              gtm_rwlog_entry *e = tx->readlog.push();
              e->orec = o_ml_mg.orecs + orec;
              e->value = o;
            }
          else if (!ml_mg::is_locked(o))
            {
              // More recent than the snapshot: extend the snapshot, then log.
              snapshot = extend(tx);
              goto success;
            }
          else
            {
              // Locked: restart unless the lock is held by this transaction.
              if (o != locked_by_tx)
                tx->restart(RESTART_LOCKED_READ);
            }
          orec = o_ml_mg.get_next_orec(orec);
        }
      while (orec != orec_end);
      return &tx->readlog[log_start];
    }

load always calls pre_load

STM Problem - Overhead

    static void post_load(gtm_thread *tx, gtm_rwlog_entry* log)
    {
      // Re-check every orec logged by pre_load; restart if any has changed.
      for (gtm_rwlog_entry *end = tx->readlog.end(); log != end; log++)
        {
          gtm_word o = log->orec->load(memory_order_relaxed);
          if (log->value != o)
            tx->restart(RESTART_VALIDATE_READ);
        }
    }

... and post_load

Compare to mov eax, [ebx] on x86

Hardware Transactional Memory

• Exploit native cache coherence:
– consistency checking
– rollback

HTM Problem – Resources

A transaction cannot commit if it is:
– too big: cache size limits data footprint
– too slow: quantum size limits duration

All TM Problem – False Conflicts

Any address that was encountered during the transaction is monitored until the end of that transaction.

An address may abort a transaction long after it is no longer relevant…

Consistency Oblivious Programming (COP)

COP Operation

• In non-transactional mode:
– Execute the read-only prefix of the operation and record its output.
• In transactional mode:
– Verify that the output is correct.
– Perform updates.

COP Example – RB Tree

[Figure: red-black tree holding the keys 10, 20, 25, 27, 28, 30 and 40, with 20 at the root.]

Add 26 – Tree Unbalanced

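[Figure: key 26 has been added to the tree rooted at 20, leaving it unbalanced. Label: "TM Search 26".]

Tree Balanced

[Figure: the tree after rebalancing, holding the keys 10, 20, 25, 26, 27, 28, 30 and 40, with 27 at the root. Labels: "TM Search continues from 27", "Conflict and Abort".]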

Add 26 – Tree Unbalanced

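[Figure: key 26 has been added to the tree rooted at 20, leaving it unbalanced. Label: "COP Search 26".]

Tree Balanced

[Figure: the tree after rebalancing, holding the keys 10, 20, 25, 26, 27, 28, 30 and 40, with 27 at the root. Labels: "TM Search continues from 27", "Found".]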

COP RB-Tree Verify

To facilitate verification:

• All nodes in the RB-Tree are connected in a successor-predecessor doubly linked list, and each node has a live mark.
• Search returns a node n with key k, or a leaf with k's successor or predecessor.

COP RB-Tree Suffix

• Resume a transaction.
• Verify:
– k found and n is live: done.
– k not found, check:
  (n.k > k > n.pred.k && !n.left) or (n.k < k < n.succ.k && !n.right)
• If verification fails, abort the transaction.
• Complete updates (add / remove / rebalance) using n.
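A rough sketch of the verification step, under assumptions that are not from the talk: the Node layout, the Verdict enum and the name cop_verify are hypothetical, a null pred or succ pointer is treated as minus or plus infinity, and the live mark is checked in both branches. The empty-child test is paired with the side on which k would be attached: the left child when k < n.k, the right child when k > n.k. In a real COP operation this check would run inside the resumed transaction.

```cpp
#include <cassert>

// Hypothetical RB-tree node: tree links plus the successor-predecessor
// list and the live mark used for verification. Color is omitted.
struct Node {
    int key;
    bool live = true;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* pred = nullptr;   // in-order predecessor, nullptr if none
    Node* succ = nullptr;   // in-order successor, nullptr if none
    explicit Node(int k) : key(k) {}
};

enum class Verdict { Found, AttachLeft, AttachRight, Abort };

// Checks the output n of an unsynchronized search for key k.
Verdict cop_verify(const Node* n, int k) {
    assert(n != nullptr);
    if (!n->live) return Verdict::Abort;            // n left the tree
    if (n->key == k) return Verdict::Found;         // k found and n is live
    if (k < n->key) {
        // k must lie strictly between n.pred and n, with n.left free.
        const bool above_pred = (n->pred == nullptr) || (n->pred->key < k);
        return (above_pred && n->left == nullptr) ? Verdict::AttachLeft
                                                  : Verdict::Abort;
    }
    // k must lie strictly between n and n.succ, with n.right free.
    const bool below_succ = (n->succ == nullptr) || (k < n->succ->key);
    return (below_succ && n->right == nullptr) ? Verdict::AttachRight
                                               : Verdict::Abort;
}
```

In the add-26 example from the slides, a search that ends at leaf 25 verifies as AttachRight because 25 < 26 < 27 and 25 has no right child. A stale result such as node 27, whose left child is occupied, is rejected.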

COP Template for op

    start-transaction
        any-code
        suspend-transaction
            output = op-rop();
        resume-transaction
        if (not(op-verify(output)))
            abort-transaction
        op-complete(output)
        any-code
    end-transaction

COP Correctness

The underlying TM:
• Transactional Regular Registers

The COP algorithm:
• Obliviousness
• Verifiability
• Separation

We prove that if the TM yields transactional regular registers, and the COP algorithm demonstrates obliviousness, verifiability, and separation, then the COP operation is linearizable.

COP with STM

STM Algorithm

• The GCC default STM algorithm is the one that proved to be the most efficient and scalable in most scenarios:
– Write Through (WT)
– Encounter Time Locking (ETL)
– Multi Lock (ML)

STM: WT – ETL - ML

1. RV ← Shared Version Clock
2. On read: check unlocked and v# <= RV, then add to read-set
3. On write: check v# <= RV, lock, and add to undo-set
4. WV = F&I(VClock)
5. Validate that in the read-set each v# <= RV
6. Release locks with v# ← WV

[Figure: animation of a transaction over memory words X and Y and their locks (version number v# and lock bit). The transaction starts with RV = 100, the shared version clock advances from 100 to 120 to 121, and at commit the locked locations are released with v# = 121.]
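The six steps can be illustrated with a toy sketch. This is not libitm's implementation: the names Tx, Orec, g_clock and TxAbort, the fixed four-word memory, and the use of exceptions for aborts are all assumptions made here. Unlike the GCC code shown earlier, this toy never extends its snapshot; it simply aborts. It also releases locks with a fresh version on rollback, a choice of this sketch so that a concurrent reader of a written-through value fails validation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr std::size_t kSlots = 4;           // toy shared memory: 4 words
std::atomic<std::uint64_t> g_clock{0};      // shared version clock
std::atomic<int> g_mem[kSlots];             // data words (zero-initialized)
struct Orec {                               // one version lock per word
    std::atomic<std::uint64_t> version{0};  // v#
    std::atomic<bool> locked{false};
};
Orec g_orec[kSlots];

struct TxAbort : std::runtime_error {
    TxAbort() : std::runtime_error("transaction aborted") {}
};

class Tx {
    std::uint64_t rv_;
    std::vector<std::size_t> read_set_;
    std::vector<std::pair<std::size_t, int>> undo_;  // (slot, old value), locked by us

    bool owns(std::size_t i) const {
        for (const auto& u : undo_) if (u.first == i) return true;
        return false;
    }
    bool readable(std::size_t i) const {
        return !g_orec[i].locked.load() && g_orec[i].version.load() <= rv_;
    }
    [[noreturn]] void abort() { rollback(); throw TxAbort(); }

public:
    Tx() : rv_(g_clock.load()) {}                    // 1. RV <- version clock

    int read(std::size_t i) {                        // 2. check, then log the read
        assert(i < kSlots);
        if (owns(i)) return g_mem[i].load();
        if (!readable(i)) abort();
        const int v = g_mem[i].load();
        if (!readable(i)) abort();                   // re-check after the load
        read_set_.push_back(i);
        return v;
    }

    void write(std::size_t i, int v) {               // 3. lock at encounter time
        assert(i < kSlots);
        if (!owns(i)) {
            if (g_orec[i].locked.exchange(true)) abort();   // held by another tx
            if (g_orec[i].version.load() > rv_) {
                g_orec[i].locked.store(false);
                abort();
            }
            undo_.emplace_back(i, g_mem[i].load());
        }
        g_mem[i].store(v);                           // write through
    }

    void commit() {
        const std::uint64_t wv = g_clock.fetch_add(1) + 1;  // 4. WV = F&I(clock)
        for (std::size_t i : read_set_)                     // 5. validate reads
            if (!owns(i) && !readable(i)) abort();
        release(wv);                                        // 6. release with WV
    }

    void rollback() {
        if (undo_.empty()) { read_set_.clear(); return; }
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it)
            g_mem[it->first].store(it->second);      // restore old values
        release(g_clock.fetch_add(1) + 1);           // fresh version (sketch choice)
    }

private:
    void release(std::uint64_t version) {
        for (const auto& u : undo_) {
            g_orec[u.first].version.store(version);
            g_orec[u.first].locked.store(false);
        }
        undo_.clear();
        read_set_.clear();
    }
};
```

A transaction that started before another one committed a write sees a version newer than its RV when it reads that word, and aborts. That corresponds to the check in step 2.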

GCC Constructs

• __transaction_atomic{}: Mark the transaction.
• __transaction_cancel: Explicit abort.
• __attribute__((transaction_safe)): Instrument the code.
• __attribute__((transaction_pure)): Do not instrument the code.

We will show that this attribute can be used efficiently as a __transaction_suspend with the default WT-ETL-ML STM algorithm in GCC.

pure = suspend

• Transactional Regular Registers: all values up to one architecture-word size are written and read atomically. The rollback may use memcpy, but the memcpy is optimized to write at maximal alignment.
• Now we will compare the future Power architecture HTM suspended mode to transaction_pure with the WT-ETL-ML STM algorithm.

Power tsuspend - tresume

1. Until failure occurs, load instructions that access memory locations that were transactionally written by the same thread will return the transactionally written data.
2. In the event of transaction failure, failure recording is performed, but failure handling is deferred until transactional execution is resumed.
3. The initiation of a new transaction is prevented.
4. Store instructions that access memory locations that have been accessed transactionally (due to load or store) by the same thread will cause the transaction to fail.

[Chart: RB – 1M sz – 20% U – 10 op/tx]

[Chart: RB – 1K sz – 8 Threads – 20% U]

COP with HTM

Haswell HTM with COP

There is no suspend mode, so to compose COP operations, we execute all ROP before the transaction. This limits the composition to at most one writing COP operation per transaction.
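The shape of a COP operation without suspend mode (read-only prefix first, then a short transaction that verifies and updates) might be sketched as below. A std::mutex stands in for the hardware transaction, since real HTM needs processor support. The names cop_execute, CopList and Node are hypothetical, and the list supports insertion only, so no live mark is needed here.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>
#include <utility>
#include <vector>

struct Node {
    const int key;
    std::atomic<Node*> next;
    Node(int k, Node* n) : key(k), next(n) {}
};

// Generic driver: rop runs outside the "transaction"; verify and complete
// run inside it. A failed verification is treated as abort-and-retry.
template <class Rop, class Verify, class Complete>
bool cop_execute(std::mutex& tx, Rop rop, Verify verify, Complete complete) {
    for (;;) {
        auto output = rop();                       // read-only prefix
        std::lock_guard<std::mutex> in_tx(tx);     // stand-in for the transaction
        if (!verify(output)) continue;             // "abort": unlock and retry
        return complete(output);                   // updates, then "commit"
    }
}

class CopList {
    std::mutex tx_;
    Node* tail_ = new Node(INT_MAX, nullptr);
    Node* head_ = new Node(INT_MIN, tail_);

public:
    bool add(int key) {
        assert(key > INT_MIN && key < INT_MAX);
        using Out = std::pair<Node*, Node*>;       // (pred, curr)
        return cop_execute(
            tx_,
            [&]() -> Out {                         // ROP: unsynchronized search
                Node* pred = head_;
                Node* curr = pred->next.load();
                while (curr->key < key) {
                    pred = curr;
                    curr = curr->next.load();
                }
                return {pred, curr};
            },
            [](const Out& o) {                     // verify: still adjacent?
                return o.first->next.load() == o.second;
            },
            [&](const Out& o) {                    // complete: link the new node
                if (o.second->key == key) return false;
                o.first->next.store(new Node(key, o.second));
                return true;
            });
    }

    std::vector<int> keys() const {                // snapshot for inspection
        std::vector<int> out;
        for (Node* n = head_->next.load(); n != tail_; n = n->next.load())
            out.push_back(n->key);
        return out;
    }
};
```

Only the verify and complete steps run inside the critical section, so the stand-in transaction touches two nodes regardless of how long the search was.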

Capacity and Cache Associativity

Packed Memory Array (PMA) search is done by divide and conquer. Assume the PMA has size 0x800000 and starts at address 0. A search for an item located in the address range 0x0…0x7FFF must go through the addresses:

0x400000 0x200000 0x100000 0x80000

0x40000 0x20000 0x10000 0x8000

As the cache size in Haswell is 0x8000, all these addresses have the same cache index (0), and the transaction will always abort.
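The address arithmetic can be checked with a few lines of code. The cache geometry below (32 KiB, 8-way set associative, 64-byte lines, hence 64 sets) is an assumption added here for the sketch, and the function names set_index and probes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed L1 data cache geometry for this sketch.
constexpr std::uint64_t kCacheBytes = 0x8000;   // 32 KiB
constexpr std::uint64_t kLineBytes  = 64;
constexpr std::uint64_t kWays       = 8;
constexpr std::uint64_t kSets = kCacheBytes / (kLineBytes * kWays);   // 64

// Cache set that the line containing addr maps to.
std::uint64_t set_index(std::uint64_t addr) {
    return (addr / kLineBytes) % kSets;
}

// Midpoints visited by a halving search over [0, size) whose target lies
// in [0, stop): size/2, size/4, ... down to and including stop.
std::vector<std::uint64_t> probes(std::uint64_t size, std::uint64_t stop) {
    assert(stop > 0);   // otherwise the loop below would never end
    std::vector<std::uint64_t> out;
    for (std::uint64_t mid = size / 2; mid >= stop; mid /= 2)
        out.push_back(mid);
    return out;
}
```

For probes(0x800000, 0x8000) this produces the eight addresses listed above. Each is a multiple of 0x8000, so set_index returns 0 for all of them and they all compete for the ways of a single set.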

[Chart: PMA]

[Chart: RB-Tree Capacity Aborts]

[Chart: RB-Tree Conflict Aborts]

Future Work

Data Structures

We already have COP versions of:
• RB-Tree
• Linked list
• PMA
• Cache Oblivious B-Tree
• Leaplist (k-ary skip list, tailored for range queries)

Can we design more COP data structures?

Applications

Use COP in applications.

Many applications use shared data structures, so it is interesting to see the impact of COP on their performance.

Infrastructure

Add statistics (transactional accesses, conflicts) to GCC.

Add a real suspend mode to GCC and to hardware.

Theory

How can the transformation to COP be made automatic?

Is COP applicable outside the data-structures area?

Bounds on the number of transactional accesses?

Bounds on the number of false conflicts?

Thank You
