week 11 - lecture 21 · compare order to previous slide. as soon as a data satisfies the condition...

BMMB 597D - Practical Data Analysis for Life Scientists Week 11 - Lecture 21 István Albert BMB, Bioinformatics

Post on 07-Oct-2020

Page 1: Week 11 - Lecture 21 · Compare order to previous slide. As soon as a data satisfies the condition it will passed to the next function. Slicing iterators. Slicing a stream (open file)

BMMB 597D - Practical Data Analysis for Life Scientists

Week 11 - Lecture 21

István Albert

BMB, Bioinformatics

Page 2:

Infinite datasets

• Large dataset: the meaning of the word LARGE changes over time

• What is the limiting factor? The data processing or data representation?

• Infinite datasets: a stream that gets processed

Page 3:

Iterators

• Iterators are objects that have a method called next() that returns the next element

• We don’t usually need to call the next() method ourselves. It gets called automatically while iterating

Page 4:

Get the iterator with the iter() function
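A minimal sketch of this step, written in Python 3 syntax (an assumption: the slides use Python 2, where the method is spelled it.next() rather than the built-in next(it)). The list `bases` is made-up example data.

```python
# Hypothetical example data, not taken from the slides.
bases = ["A", "C", "G", "T"]

# iter() asks the list for an iterator over its elements.
it = iter(bases)

# Each call to next() returns one element and advances the iterator.
first = next(it)
second = next(it)
print(first, second)
```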

Page 5:

Loops use the iterators
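One way to picture what a for loop does with an iterator is to write the loop out by hand. This is an illustrative sketch in Python 3 syntax, not the interpreter's literal implementation; `values` and `collected` are hypothetical names.

```python
values = [10, 20, 30]

# Roughly what "for v in values: collected.append(v)" does behind the scenes.
collected = []
it = iter(values)
while True:
    try:
        v = next(it)
    except StopIteration:
        break  # the iterator ran out, so the loop ends
    collected.append(v)

print(collected)
```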

Page 6:

Iterators can be moved forward
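A small sketch of moving an iterator forward by hand before looping over it (Python 3 syntax assumed; the data is made up):

```python
it = iter(range(5))

# Advance the iterator manually, discarding the first two elements.
next(it)
next(it)

# A loop continues from wherever the iterator currently points.
rest = [x for x in it]
print(rest)
```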

Page 7:

File streams are iterators!
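A sketch of the claim, assuming Python 3. To keep the example runnable without a file on disk, io.StringIO stands in for an open file; that substitution is an assumption, and a real open(...) handle iterates line by line in the same way.

```python
import io

# io.StringIO is a stand-in for an open text file.
stream = io.StringIO("line 1\nline 2\nline 3\n")

# A file stream is its own iterator: iter() hands back the same object.
same = iter(stream) is stream

# Iterating yields one line at a time.
lines = [line.strip() for line in stream]
print(same, lines)
```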

Page 8:

Advancing the file stream
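A common use of advancing a stream is skipping a header line. The sketch below assumes Python 3 and uses io.StringIO with made-up tab-separated content in place of a real file.

```python
import io

# Stand-in for an open file with a header row (hypothetical content).
stream = io.StringIO("name\tlength\nchr1\t100\nchr2\t200\n")

# Advance the stream by one line to consume the header.
header = next(stream)

# The loop starts at the second line.
rows = [line.strip().split("\t") for line in stream]
print(rows)
```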

Page 9:

Iterators can run out
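A short sketch of what running out looks like in practice (Python 3 syntax assumed, example data made up):

```python
it = iter([1, 2])

first_pass = list(it)   # consumes every element
second_pass = list(it)  # nothing is left the second time

# Asking an exhausted iterator for more raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_pass, second_pass, exhausted)
```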

Page 10:

Multiple iterators on the same data
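A sketch showing that two iterators over the same list keep separate positions (Python 3 syntax assumed; names are hypothetical):

```python
data = ["a", "b", "c"]

# Two independent iterators over the same underlying list.
it1 = iter(data)
it2 = iter(data)

next(it1)
next(it1)

# it2 has not moved, even though it1 is already two elements in.
from_it1 = next(it1)
from_it2 = next(it2)
print(from_it1, from_it2)
```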

Page 11:

Unwinding the iterator

If the data easily fits into memory, it is convenient to unwind the iterator

Page 12:

Operations that unwind an iterator

Operations that unwind the iterator:

• List, map, filter, dictionary creation, sort

Operations that do not unwind the iterator:

• for loop, reversed, iterator versions of map and filter (next lecture)
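A sketch contrasting the two groups, assuming Python 3. Note one difference from the Python 2 behavior described above: in Python 3 the built-in map and filter are already lazy, so only list, sorted and dict are shown as unwinding operations here.

```python
words = ["pear", "apple", "fig"]

# These consume the whole iterator and build a result in memory.
as_list = list(iter(words))
as_sorted = sorted(iter(words))
as_dict = dict(iter([("a", 1), ("b", 2)]))

# A for loop takes one element at a time and can stop early.
it = iter(words)
for w in it:
    if w == "apple":
        break
leftover = list(it)  # elements the loop never touched

print(as_list, as_sorted, as_dict, leftover)
```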

Page 13:

Infinite iterators

• Can be merged with finite data to produce certain types of behaviors: counting, cycling, repeating, chaining results

• Don’t unwind an infinite generator: it has no end

• Many of the solutions are nicer and simpler with an iterator

Page 14:

itertools.count
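A minimal sketch of itertools.count; the start and step values are arbitrary choices for illustration.

```python
from itertools import count

counter = count(10, 5)  # 10, 15, 20, ... with no end

# Take only as many values as needed; never call list(counter).
first_three = [next(counter) for _ in range(3)]
print(first_three)
```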

Page 15:

itertools.repeat
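A minimal sketch of itertools.repeat, assuming Python 3 (where zip is lazy; Python 2 would use itertools.izip). The sample names are made up.

```python
from itertools import repeat

# repeat(x) yields x forever; repeat(x, n) stops after n times.
three_times = list(repeat("N", 3))

# zip stops at the shortest input, so pairing with finite data is safe.
labeled = list(zip(["s1", "s2"], repeat("control")))
print(three_times, labeled)
```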

Page 16:

itertools.cycle
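A minimal sketch of itertools.cycle merged with finite data (Python 3 syntax assumed; the sample and color names are hypothetical):

```python
from itertools import cycle

colors = cycle(["red", "green"])  # red, green, red, green, ... forever
samples = ["s1", "s2", "s3"]

# The finite list decides when the pairing stops.
assigned = list(zip(samples, colors))
print(assigned)
```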

Page 17:

“silent killer” – implicit infinite loop
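The slide's own example is not preserved in this transcript. As an assumption, one common way to fall into this trap is to unwind or loop over an infinite iterator without a stopping condition; the sketch below shows the dangerous calls only as comments and a bounded alternative using itertools.islice.

```python
from itertools import cycle, islice

endless = cycle("AB")

# list(endless), or "for x in endless:" without a break, would never
# finish and raises no error; the program just keeps running.

# Bounding the iterator explicitly avoids the trap.
bounded = list(islice(endless, 5))
print(bounded)
```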

Page 18:

itertools.chain
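A minimal sketch of itertools.chain joining several iterables into one stream; the contents are made up for illustration.

```python
from itertools import chain

header = ["#id"]
body = iter(["r1", "r2"])

# chain walks through each input in turn without copying them first.
combined = list(chain(header, body, ["#end"]))
print(combined)
```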

Page 19:

Iterator recap

• Use minimal memory regardless of the data size

• Can be run forward only, can’t go back.

• Getting a value moves the iterator to next element

• Single use only: at the end they run out

BUT

• You may usually run two iterators on the same underlying data structure

Page 20:

Homework 21

• Solve the problems on the next pages with iterators

• Provide either the code or the output

• If it is output then you should try to determine what the output is without running the code!

Page 21:

Problem 1

?

Page 22:

Problem 2

?

Page 23:

Problem 3

?

Page 24:

Problem 4

?

Page 25:

Problem 5

?

Page 26:

Problem 6

?

Page 27:

Extra Credit.

? which iterator stops the loop?

Page 28:

BMMB 597D - Practical Data Analysis for Life Scientists

Week 11 - Lecture 22

István Albert

BMB, Bioinformatics

Page 29:

Compute the sum of N numbers

Uses almost 1 GB of memory

Page 30:

Same result with xrange

Memory use is not noticeable
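A sketch of the comparison, written in Python 3 as an assumption: there, list(range(N)) plays the role of Python 2's range and a bare range(N) plays the role of xrange. The value of N is hypothetical, since the transcript does not state it.

```python
import sys

N = 1_000_000  # hypothetical size; the slides do not state N

# Building the full list keeps all N numbers in memory at once.
full = list(range(N))
total_list = sum(full)

# A lazy range produces the numbers one at a time.
lazy = range(N)
total_lazy = sum(lazy)

print(total_list == total_lazy)
# getsizeof reports only the container itself, not the integers it holds,
# so the real gap in memory use is even larger than these numbers suggest.
print(sys.getsizeof(full), sys.getsizeof(lazy))
```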

Page 31:

range vs xrange

range(10) is equivalent to list(xrange(10))

One of the earliest iterators, created before the iterator formalism was developed

Unlike most iterators, xrange knows how long it is: len(xrange(10))
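A sketch of the length behavior, assuming Python 3, where range behaves like Python 2's xrange:

```python
lazy = range(10)
length = len(lazy)  # works: the range knows its own length

# A plain iterator obtained from it does not support len().
it = iter(lazy)
try:
    len(it)
    has_len = True
except TypeError:
    has_len = False

print(length, has_len)
```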

Page 32:

Naming conventions for iterator tools

• Usually they are labeled with i*: imap, ifilter, izip, etc.

• All iterators are imported from the itertools module

• One exception: xrange

• In Python 3.0 everything is an iterator! (map, filter, zip and range are lazy by default)

Page 33:

Reminder on map and filter
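A short refresher sketch, assuming Python 3: there map and filter return lazy objects, so list() is needed to see the values, whereas in Python 2 they return lists directly. The helper functions `square` and `is_even` are hypothetical.

```python
def square(x):
    return x * x

def is_even(x):
    return x % 2 == 0

numbers = [1, 2, 3, 4]

# map applies a function to every element.
squares = list(map(square, numbers))

# filter keeps the elements for which the function returns True.
evens = list(filter(is_even, numbers))

print(squares, evens)
```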

Page 34:

Iterator imap and ifilter

Compare order to previous slide

Page 35:

As soon as a data item satisfies the condition, it will be passed to the next function
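The difference in call order can be made visible by logging every call. This is a sketch under stated assumptions: Python 3, where the lazy map and filter stand in for imap and ifilter, and hypothetical helpers `square` and `is_even`; the slide's actual functions are not in the transcript.

```python
log = []

def is_even(x):
    log.append(("is_even", x))
    return x % 2 == 0

def square(x):
    log.append(("square", x))
    return x * x

numbers = [1, 2, 3, 4]

# Eager: every element is tested before any element is squared.
kept = list(filter(is_even, numbers))
eager = list(map(square, kept))
eager_log = list(log)
log.clear()

# Lazy: each element that passes the test is squared right away.
lazy = list(map(square, filter(is_even, numbers)))
lazy_log = list(log)

print(eager_log)
print(lazy_log)
```

Both versions return the same values; only the order in which the functions are called differs.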

Page 36:

Slicing iterators
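Iterators do not support the [start:stop] syntax, so slicing is done with itertools.islice. A minimal sketch with made-up data:

```python
from itertools import count, islice

evens = count(0, 2)  # 0, 2, 4, 6, ... with no end

# Take the elements at positions 2, 3 and 4 (counting from 0).
window = list(islice(evens, 2, 5))
print(window)
```

The slice consumes the underlying iterator, so the next value read from `evens` is the one after the window.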

Page 37:

Slicing a stream (open file)
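The same slicing works on an open file. In this sketch io.StringIO with ten made-up lines stands in for a real file (an assumption made so the example runs anywhere).

```python
import io
from itertools import islice

# Stand-in for an open file with ten lines.
stream = io.StringIO("".join("line %d\n" % i for i in range(1, 11)))

# Read only the 3rd through 5th lines; islice counts from 0.
chunk = [line.strip() for line in islice(stream, 2, 5)]
print(chunk)
```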

Page 38:

Merge generators with streams
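One possible reading of this slide, offered as an assumption, is pairing an infinite counter with the lines of a file to number them. The sketch uses Python 3's lazy zip (izip in Python 2) and io.StringIO with made-up sequences in place of a file.

```python
import io
from itertools import count

stream = io.StringIO("ACGT\nTTGA\nGGCC\n")

# The infinite counter is cut off when the stream runs out of lines.
numbered = [(n, line.strip()) for n, line in zip(count(1), stream)]
print(numbered)
```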

Page 39:

Finding consecutive objects in an iterator
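The tool used on the slide is not preserved in the transcript. A plausible candidate, named here as an assumption, is itertools.groupby, which groups runs of consecutive equal elements. The input string is made up.

```python
from itertools import groupby

data = "AAACCAGG"

# groupby starts a new group every time the value changes.
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)
```

The letter A appears in two separate runs because the second A is not adjacent to the first three.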

Page 40:

Sort if you want all identical objects found
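A sketch of why sorting helps, assuming the grouping tool is itertools.groupby (the slide's own code is not in the transcript): sorting first places identical objects next to each other, so each value ends up in a single group. Sorting unwinds the data, so this only suits data that fits in memory.

```python
from itertools import groupby

data = "AAACCAGG"

# After sorting, all identical letters are consecutive.
counts = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(counts)
```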

Page 41:

Equivalence

map and filter are iterators that are already unwound

map(something) is equivalent to list(imap(something))

filter(something) is equivalent to list(ifilter(something))

Page 42:

Homework 22

• Run the examples from the class

• Create an iterator version file reader that reads off only the sequences from the DRR03.fastq file (2nd, 6th, 10th … lines) and writes them to a file

Possible solution:

1. Merge another iterator with your stream that labels every 4th line with True (starting with the second), for example cycle!

2. Filter on the cycled element to keep the valid lines

3. Create a filter that extracts the sequence and imap it onto the valid element

4. Write the element to a file
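The four steps above could be sketched as follows. This is one possible shape under stated assumptions, not the course's reference solution: it uses Python 3 (lazy zip, filter and map in place of izip, ifilter and imap), and io.StringIO objects with two made-up FASTQ records stand in for DRR03.fastq and the output file.

```python
import io
from itertools import cycle

# Two made-up FASTQ records standing in for the real input file.
fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
)
out = io.StringIO()  # stands in for the output file

# 1. Label every line; only the 2nd line of each 4-line record is True.
labels = cycle([False, True, False, False])
labeled = zip(labels, fastq)

# 2. Keep only the (label, line) pairs whose label is True.
valid = filter(lambda pair: pair[0], labeled)

# 3. Extract the sequence line from each remaining pair.
sequences = map(lambda pair: pair[1], valid)

# 4. Write the sequences out; nothing is read until this step runs.
out.writelines(sequences)

print(out.getvalue())
```

Because every stage is lazy, only one line of the input is held in memory at a time, regardless of the file size.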