Introduction to Medical Computing

Stephen M. Watt

The University of Western Ontario

CS 2125

Outline

• Data Representations

• Cryptography

UWO CS 2125 © Stephen M. Watt

Data Representations

• A reductionist view of data: it is all bits

• Data in main memory, data in files.

• All about interpreting bits in memory.

Basic Data Types

• Characters ‘a’ ‘é’ ‘中’

• Character Strings “Hello” “Κωνστάντζα” “孔丘”

• Integers 123

• Floating point numbers 123.7

Characters

• Older operating systems stored them as ASCII or EBCDIC, typically as 7 or 8-bit bytes, e.g. ‘a’ stored as 97 in an 8-bit byte, i.e. 0110 0001

• Problems with multiple alphabets.

• ISO/IEC extended 8-bit encodings, e.g. Latin-1, Latin-Thai, etc. Typically use first 128 characters for ASCII + second 128 characters for additional letters. http://en.wikipedia.org/wiki/ISO/IEC_8859-1

• Problem working with multiple alphabets at once.

Unicode

• Represent all character sets at once.

• Initially 16 bits.

• Now 17 planes of $2^{16}$ code points each, so a code point needs up to 21 bits.

• Typically represented with variable-length encodings:

– UTF-8 (one to four 8-bit bytes)

– UTF-16 (one or two 16-bit units)

UTF-8
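As a minimal sketch of how a variable-length encoding works, the function below turns one Unicode code point into one to four UTF-8 bytes using the standard UTF-8 bit layout. The name `utf8_encode` and its return convention are assumptions made for illustration, not part of the course material.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (hypothetical helper).
   Writes 1 to 4 bytes into out and returns the number of bytes written,
   or 0 if cp is not encodable (above 10FFFF or in D800..DFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                         /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* reserved for UTF-16 */
        return 0;
    if (cp <= 0xFFFF) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, ‘a’ (code point 61 hex) stays a single byte, while ‘é’ (E9 hex) needs two bytes and ‘中’ (4E2D hex) needs three.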

Numbers Base 16

• Numbers in base 2 are long and error prone to write.

• It is easy to work in base 16 by grouping the digits of base 2 numbers 4 at a time.

0000 -> 0    0100 -> 4    1000 -> 8    1100 -> c
0001 -> 1    0101 -> 5    1001 -> 9    1101 -> d
0010 -> 2    0110 -> 6    1010 -> a    1110 -> e
0011 -> 3    0111 -> 7    1011 -> b    1111 -> f

• So 60 (base 10) = 11 1100 (base 2) = 3c (base 16)

• 3a1.5 (base 16) means $3 \times 256 + a \times 16 + 1 \times 1 + 5 \times \tfrac{1}{16} = 3 \times 16^2 + 10 \times 16^1 + 1 \times 16^0 + 5 \times 16^{-1}$
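The positional rule above can be sketched in C as follows. The helper names `hex_digit` and `hex_to_double` are hypothetical, and the function only handles plain digit strings with an optional fractional part.

```c
/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a base-16 string such as "3a1.5" to a double.
   Parsing stops at the first character that is neither a hex digit
   nor the single '.' separating integer and fractional parts. */
double hex_to_double(const char *s)
{
    double value = 0.0;
    double place = 1.0 / 16.0;   /* weight of the next fractional digit */
    int d;

    while ((d = hex_digit(*s)) >= 0) {
        value = value * 16.0 + d;        /* shift left one hex digit */
        s++;
    }
    if (*s == '.') {
        s++;
        while ((d = hex_digit(*s)) >= 0) {
            value += d * place;          /* 16^-1, 16^-2, ... */
            place /= 16.0;
            s++;
        }
    }
    return value;
}
```

For example, `hex_to_double("3c")` gives 60 and `hex_to_double("3a1.5")` gives 929.3125.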

UTF-16

• Represent a character as one or two 16-bit units.

• The values in the range D800 .. DFFF are special.

• They are not used to represent characters, but are instead used to store parts of characters that need more than 16 bits, that is, code points in the range 10000..10FFFF (hex).
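A minimal sketch of the surrogate mechanism, following the standard UTF-16 rule: subtract 10000 (hex) from the code point, then split the remaining 20 bits across two units taken from the D800..DFFF range. The name `utf16_encode` is an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 (hypothetical helper).
   Returns the number of 16-bit units written (1 or 2),
   or 0 if cp is a surrogate value or lies above 10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* not a character */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                  /* fits in one unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;              /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 | (v >> 10));    /* high 10 bits */
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low 10 bits */
        return 2;
    }
    return 0;
}
```

With this sketch, 4E2D (‘中’) is stored as the single unit 4E2D, while 1F600 becomes the pair D83D DE00.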

Integers

• Typically represented as 16 bit, 32 bit or 64 bit.

– Different sizes have different ranges.

    • 8 bits: -128 to 127 or 0 to 255

    • 16 bits: -32,768 to 32,767 or 0 to 65,535

    • 32 bits: -2,147,483,648 to 2,147,483,647 or 0 to 4,294,967,295

    • 64 bits: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 or 0 to 18,446,744,073,709,551,615

– 16 bits used when space is at a premium, e.g. large data sets or small devices.

• Arbitrarily large integers possible with dynamic storage allocation, but used mainly for advanced math software and cryptography.
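The ranges listed above follow from the bit width: an n-bit two's-complement signed integer runs from $-2^{n-1}$ to $2^{n-1}-1$, and an n-bit unsigned integer from 0 to $2^n-1$. A small sketch, with hypothetical function names and assuming 1 <= bits <= 64:

```c
#include <stdint.h>

/* Smallest value of a two's-complement signed integer with the given width. */
int64_t signed_min(int bits)
{
    /* Shifting into the sign bit is undefined, so treat 64 separately. */
    return bits == 64 ? INT64_MIN : -((int64_t)1 << (bits - 1));
}

/* Largest value of a two's-complement signed integer with the given width. */
int64_t signed_max(int bits)
{
    return bits == 64 ? INT64_MAX : ((int64_t)1 << (bits - 1)) - 1;
}

/* Largest value of an unsigned integer with the given width. */
uint64_t unsigned_max(int bits)
{
    return bits == 64 ? UINT64_MAX : ((uint64_t)1 << bits) - 1;
}
```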

Floating Point Numbers

• Represents quantities as

$\text{fraction} \times 2^{\text{power}}$

• fraction = 1.<fraction bits>

• power = exponent - 1023 (for the 64-bit double precision format)

• See http://en.wikipedia.org/wiki/Double_precision_floating-point_format
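The fields described above can be inspected directly. This is a sketch that assumes `double` is the 64-bit IEEE 754 format (1 sign bit, 11 exponent bits, 52 fraction bits), which is true on most current platforms but is not guaranteed by C. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct double_parts {
    int      sign;       /* 0 for positive, 1 for negative */
    int      exponent;   /* stored 11-bit exponent field */
    int      power;      /* exponent - 1023 */
    uint64_t fraction;   /* the 52 stored fraction bits */
};

/* Split a double into its stored fields (assumes IEEE 754 binary64). */
struct double_parts split_double(double x)
{
    uint64_t bits;
    struct double_parts p;

    memcpy(&bits, &x, sizeof bits);          /* reinterpret the 8 bytes */
    p.sign     = (int)(bits >> 63);
    p.exponent = (int)((bits >> 52) & 0x7FF);
    p.power    = p.exponent - 1023;
    p.fraction = bits & (((uint64_t)1 << 52) - 1);
    return p;
}
```

For example, 6.0 is 1.5 × 2^2, so its exponent field is 1025 (power 2) and only the top fraction bit is set.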

Compound Data in Programs

• Programming languages have ways to represent collections of data.

• These may be used to represent single things with many properties, or tables of many things.

Homogeneous Collections

• “arrays”

• A collection of things that are all the same.

double boneDensity[1000];

Heterogeneous Collections

• “records”, “structs”

struct patient {
    int patientNumber;
    char familyName[20];
    char givenName[20];
    int billingCode;
};

Combinations

• Arrays may have elements that are themselves arrays or other structured data.

double doses[100][100];

struct patient studyGroup[100];

Compound Data in Files

• Text files

• Binary files

• XML files

(actually, these are all just interpretations of bits)

Text Files

• Files of ASCII or Unicode data.

• Typically one line per item, with fields separated by blanks or commas.

• May be in “fixed format”, i.e. specific data lies in certain columns,

2034632Smith Jane   22 Main St        London
3927321Doe   John   1004 Peppercorn WaMontreal
2379820Brown Charles2001 King St      Toronto

or free-form that is parsed,

2034632, Smith, Jane, 22 Main St, London
3927321, Doe, John, 1004 Peppercorn Way, Montreal
2379820, Brown, Charles, 2001 King St, Toronto
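A rough sketch of parsing one free-form, comma-separated record with `sscanf`. The struct `address_record`, its field sizes and the function `parse_record` are assumptions for illustration. Real CSV files (quoted fields, embedded commas) need a more careful parser.

```c
#include <stdio.h>

/* Hypothetical record matching the sample lines above. */
struct address_record {
    int  patientNumber;
    char familyName[20];
    char givenName[20];
    char street[40];
    char city[20];
};

/* Parse "number, family, given, street, city".
   Returns 1 if all five fields were read, 0 otherwise.
   Leading blanks in each field are skipped; trailing blanks are kept. */
int parse_record(const char *line, struct address_record *r)
{
    int n = sscanf(line, " %d , %19[^,\n] , %19[^,\n] , %39[^,\n] , %19[^,\n]",
                   &r->patientNumber, r->familyName, r->givenName,
                   r->street, r->city);
    return n == 5;
}
```

The field widths in the format string (19, 39) are one less than the array sizes, so an over-long field is truncated rather than overflowing the buffer.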

Text Files

• Easy for programs to construct and to read.

• Easy for people to check and debug.

• Many programs use them as an exchange format, e.g.

– Most Unix/Linux programs

– Excel CSV

Binary Files

• Store numbers (e.g. integers and floating point numbers) so the bytes in the file are the same as the bytes in the representation in main memory.

• Pros: Store data more compactly.

– Less space required to store (e.g. 4 bytes vs 10 digits).

– Faster to read and write.

• Cons:

– Unforgiving format

– Harder to program

– Usually specific to one program or family of programs.
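A minimal sketch of the idea using the standard `fwrite` and `fread` calls: the integers go to the file byte for byte as they sit in memory. The function names are hypothetical. A file written this way is only readable on a machine with the same `int` size and byte order, which is one reason such formats are program-specific.

```c
#include <stdio.h>

/* Write n ints to f exactly as laid out in memory. Returns 1 on success. */
int write_ints(FILE *f, const int *values, size_t n)
{
    return fwrite(values, sizeof values[0], n, f) == n;
}

/* Read n ints back from f. Returns 1 if all n were read. */
int read_ints(FILE *f, int *values, size_t n)
{
    return fread(values, sizeof values[0], n, f) == n;
}
```

Where `int` is 4 bytes, the value 2147483647 occupies 4 bytes in such a file, against 10 digit characters in a text file.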

Binary Files

• Typically used for:

– Images

– Audio

– Databases

XML

• “Extensible Markup Language”

• Textual representation of structured data, very easy to parse.

• GML -> SGML -> HTML -> XML

• Can represent complex data objects.

• Important to know, easy to learn. E.g. http://www.w3schools.com/xml/

XML Basics

<?xml version="1.0"?>
<patient>
  <patientNumber>102001</patientNumber>
  <name>
    <family>Jones</family>
    <middle initialOnly="yes">M</middle>
    <given>Veronica</given>
  </name>
  <billingCode>7993321</billingCode>
</patient>

XML Basics

• XML is not a programming language

– It doesn’t do anything

– It represents data

• Designed to represent and transport data

• Lets you design your own tags.

• Is a W3C “recommendation” (standard).

XML Basics

• Tags <patient>

• Attributes initialOnly="yes"

• Elements <patient> ... </patient>

• Text Jones

• Comments <!-- Do not disturb -->

XML Formats

• Represents data as a tree

• What is acceptable is specified by a grammar in the form of a DTD (old) or Schema (new)

• Various standards are specifications of XML grammars, e.g. MathML, InkML, ChemML, …

Cryptography

• Some things should be public, and some things should not be.

• Can secure data by physical means, e.g. locks, guards.

• Can secure data by access controls, e.g. passwords.

• Can secure data by encryption.

Types of Cryptography

• Secret Key Cryptography

• Public Key Cryptography

• Hash Functions

Some Vocabulary

• Plain text – the original data in unencrypted form

• Cipher text – the data in encrypted form

• Key – a piece of data used to do the encryption, like a code word.

• Dramatis Personae:

– Alice and Bob: want to exchange secret info

– Eve: an eavesdropper

Secret Key Cryptography

• Single “key” is used for both encryption and decryption.

• Encrypt(plain text, key) -> cipher text

• Decrypt(cipher text, key) -> plain text

• E.g. (simple)

Encrypt(char, offset) -> (char + offset) mod 256

Decrypt(char, offset) -> (char - offset) mod 256
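The simple example above can be written in C roughly as follows, treating each character as an 8-bit byte. The function names are hypothetical, and this cipher is for illustration only since it is trivially breakable.

```c
/* Encrypt one byte by adding the secret offset, wrapping around at 256. */
unsigned char shift_encrypt(unsigned char c, unsigned char offset)
{
    return (unsigned char)((c + offset) % 256);
}

/* Decrypt one byte by subtracting the same offset.
   256 is added first so the intermediate value is never negative. */
unsigned char shift_decrypt(unsigned char c, unsigned char offset)
{
    return (unsigned char)((c + 256 - offset) % 256);
}
```

The same key (the offset) is used in both directions, which is what makes this a secret key scheme.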

Secret Key Cryptography

• Electronic Codebook (ECB)

– Data divided into blocks and each encrypted separately.

– Pro: Simple. Con: Same plaintext -> same ciphertext

Secret Key Cryptography

• Cipher Block Chaining (CBC)

• Invented at IBM in the 1970s

• Each ciphertext block is used to modify the input of the next block (it is XORed with the next plaintext block before encryption).
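The difference between the two modes can be sketched with a toy cipher. Here each "block" is a single byte and the "block cipher" is just the shift cipher, both assumptions chosen to keep the example short. A real system would use a proper block cipher with 128-bit blocks. All names are hypothetical.

```c
#include <stddef.h>

/* Toy one-byte "block cipher" (a shift cipher). NOT secure. */
static unsigned char toy_encrypt_block(unsigned char block, unsigned char key)
{
    return (unsigned char)((block + key) % 256);
}

static unsigned char toy_decrypt_block(unsigned char block, unsigned char key)
{
    return (unsigned char)((block + 256 - key) % 256);
}

/* ECB: every block is encrypted independently. */
void ecb_encrypt(const unsigned char *plain, unsigned char *cipher,
                 size_t n, unsigned char key)
{
    for (size_t i = 0; i < n; i++)
        cipher[i] = toy_encrypt_block(plain[i], key);
}

/* CBC: each plaintext block is XORed with the previous ciphertext
   block (or the initialization vector iv for the first block). */
void cbc_encrypt(const unsigned char *plain, unsigned char *cipher,
                 size_t n, unsigned char key, unsigned char iv)
{
    unsigned char prev = iv;
    for (size_t i = 0; i < n; i++) {
        cipher[i] = toy_encrypt_block((unsigned char)(plain[i] ^ prev), key);
        prev = cipher[i];
    }
}

/* CBC decryption undoes the block cipher, then the XOR. */
void cbc_decrypt(const unsigned char *cipher, unsigned char *plain,
                 size_t n, unsigned char key, unsigned char iv)
{
    unsigned char prev = iv;
    for (size_t i = 0; i < n; i++) {
        plain[i] = (unsigned char)(toy_decrypt_block(cipher[i], key) ^ prev);
        prev = cipher[i];
    }
}
```

Encrypting four identical plaintext bytes with `ecb_encrypt` yields four identical ciphertext bytes, while `cbc_encrypt` generally yields different ones because each output feeds into the next input.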

Many More

Resources

• http://www.garykessler.net/library/crypto.html

• http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation

Public Key Cryptography

• Most significant advance in cryptography in hundreds of years.

• First described publicly by Stanford professor Martin Hellman and graduate student Whitfield Diffie in 1976.

Public Key Cryptography

• Uses idea of functions that are hard to invert, “one way” functions.

• E.g. Multiplication vs Factorization

– Multiplication is easy. Takes time proportional to b log b log log b to multiply two b bit numbers.

– Factorization is thought to be hard. The best known classical algorithm for a b-bit number, the general number field sieve, takes time that grows faster than any polynomial in b.

Public Key Cryptography

• Uses idea of functions that are hard to invert, “one way” functions.

• E.g. Exponentiation vs logarithms

– Easy to compute $3^6$ to get 729

– Hard to take 729 and find 3 and 6.

Public and Private Keys

• In both cases (multiplication, exponentiation) we have 2 pieces of information combining to give a result from which it is hard to find the pieces.

• Each participant can have a private number and reveal something publicly that does not give away the private number.

Original Diffie Hellman

• Multiplying integers mod p, a prime.

• Need g, a “primitive root” mod p.

– That is, a number g such that $\{ g^1 \bmod p,\ g^2 \bmod p,\ g^3 \bmod p,\ \ldots,\ g^{p-1} \bmod p \}$ gives the values $\{1, 2, \ldots, p-1\}$ in some order.

Original Diffie Hellman

• E.g. 3 is a primitive root mod 7 because

$3^1 = 3 \equiv 3 \pmod 7$

$3^2 = 9 \equiv 2 \pmod 7$

$3^3 = 27 \equiv 6 \pmod 7$

$3^4 = 81 \equiv 4 \pmod 7$

$3^5 = 243 \equiv 5 \pmod 7$

$3^6 = 729 \equiv 1 \pmod 7$
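The check above can be automated for small primes. This brute-force sketch relies on the fact that g is a primitive root mod p exactly when no power $g^k$ with $1 \le k < p-1$ is congruent to 1. The function name is hypothetical, and p is assumed to be a prime below $2^{32}$ so the products fit in 64 bits.

```c
#include <stdint.h>

/* Return 1 if g is a primitive root modulo the prime p, else 0.
   Brute force: suitable only for small p. */
int is_primitive_root(uint64_t g, uint64_t p)
{
    uint64_t x = g % p;
    uint64_t v = 1;

    if (x == 0)
        return 0;                  /* multiples of p generate nothing */
    for (uint64_t k = 1; k < p - 1; k++) {
        v = v * x % p;             /* v = g^k mod p */
        if (v == 1)
            return 0;              /* powers repeat too early */
    }
    return 1;
}
```

For p = 7 this accepts 3 (as in the table above) but rejects 2, whose powers cycle through only 2, 4, 1.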

Original Diffie Hellman

• Alice and Bob agree to use a prime p and base g.

• Alice chooses secret a.

• Bob chooses secret b.

• Alice sends Bob $A = g^a \pmod p$.

• Bob sends Alice $B = g^b \pmod p$.

• Alice computes $s = B^a = g^{ab} \pmod p$.

• Bob computes $s = A^b = g^{ab} \pmod p$.

• Now Alice and Bob share a secret to use for Secret Key Cryptography.
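A toy sketch of the exchange, using square-and-multiply modular exponentiation. The names `mod_pow`, `dh_public` and `dh_shared` are hypothetical, the modulus is assumed to be below $2^{32}$ so products fit in 64 bits, and real deployments use primes of thousands of bits with a big-integer library.

```c
#include <stdint.h>

/* Compute base^exp mod m by repeated squaring (assumes 0 < m < 2^32). */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % m;   /* use this bit of the exponent */
        base = base * base % m;           /* square for the next bit */
        exp >>= 1;
    }
    return result;
}

/* The value a participant sends: g^secret mod p. */
uint64_t dh_public(uint64_t g, uint64_t secret, uint64_t p)
{
    return mod_pow(g, secret, p);
}

/* The shared secret: (other side's public value)^secret mod p. */
uint64_t dh_shared(uint64_t other_public, uint64_t secret, uint64_t p)
{
    return mod_pow(other_public, secret, p);
}
```

With p = 23, g = 5, a = 6 and b = 15, `dh_public` gives A = 8 and B = 19, and both `dh_shared(19, 6, 23)` and `dh_shared(8, 15, 23)` give 2. Eve sees p, g, A and B but neither secret exponent.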

Example

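• With p = 23, g = 5, a = 6 and b = 15: $A = 5^6 \bmod 23 = 8$ and $B = 5^{15} \bmod 23 = 19$.

• Alice computes $s = B^a \bmod p$

– $s = 19^6 \bmod 23$

– s = 47,045,881 mod 23

– s = 2

• Bob computes $s = A^b \bmod p$

– $s = 8^{15} \bmod 23$

– s = 35,184,372,088,832 mod 23

– s = 2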

Example

• Alice and Bob now share a secret: s = 2.

• Somebody who had known both these private integers might also have calculated s as follows:

– $s = 5^{6 \cdot 15} \bmod 23$

– $s = 5^{15 \cdot 6} \bmod 23$

– $s = 5^{90} \bmod 23$

– s = 807,793,566,946,316,088,741,610,050,849,573,099,185,363,389,551,639,556,884,765,625 mod 23

– s = 2

http://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

Public Key Cryptography

• Can use Diffie-Hellman for public key crypto.

– Alice chooses her “private key” a

– Alice publishes a “public key” ($A = g^a \bmod p$, g, p)

– Bob chooses a random b and sends Alice ($B = g^b \bmod p$, message encrypted with $A^b \bmod p$)

– Alice computes the same key as $B^a \bmod p$ and decrypts the message

RSA

• Ron Rivest, Adi Shamir, Leonard Adleman (1978)

RSA Cryptography

• Alice’s key generation:

– Choose two distinct random primes of similar size p and q.

– Compute $N = pq$ and $\varphi = (p-1)(q-1)$.

– Choose e such that $1 < e < \varphi$ and $\gcd(e, \varphi) = 1$.

– Compute $d = e^{-1} \bmod \varphi$, i.e. $d e \equiv 1 \pmod{\varphi}$.

– e is the public key exponent, d is the private key exponent.

– Alice’s public key is (N, e).

• Communication:

– Bob sends Alice the message M by sending $c = M^e \bmod N$.

– Alice decrypts the message by computing $M = c^d \bmod N$.
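The steps above can be traced with deliberately tiny numbers. This is a sketch only: the primes p = 61 and q = 53, the exponent e = 17 and the message M = 65 are illustrative choices, the function names are hypothetical, and real RSA uses primes of a thousand bits or more together with message padding.

```c
#include <stdint.h>

/* Compute base^exp mod m by repeated squaring (assumes 0 < m < 2^32). */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

/* Inverse of e modulo phi by the extended Euclidean algorithm.
   Returns 0 if gcd(e, phi) != 1, i.e. no inverse exists. */
uint64_t mod_inverse(uint64_t e, uint64_t phi)
{
    int64_t r0 = (int64_t)phi, r1 = (int64_t)(e % phi);
    int64_t t0 = 0, t1 = 1;

    while (r1 != 0) {
        int64_t q = r0 / r1;
        int64_t tmp;

        tmp = r0 - q * r1; r0 = r1; r1 = tmp;   /* remainder sequence */
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;   /* coefficient of e */
    }
    if (r0 != 1)
        return 0;
    if (t0 < 0)
        t0 += (int64_t)phi;                     /* normalize to 0..phi-1 */
    return (uint64_t)t0;
}
```

With p = 61 and q = 53 we get N = 3233 and φ = 3120. Choosing e = 17 gives d = `mod_inverse(17, 3120)` = 2753. Bob encrypts M = 65 as `mod_pow(65, 17, 3233)` = 2790, and Alice recovers 65 with `mod_pow(2790, 2753, 3233)`.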

Cryptography in Medicine

Examples:

• Piotr Kasztelowicz, Marek Czubenko, Iwona Zięba, Security of Medical Data Transfer and Storage in Internet. Cryptography, Antiviral Security and Electronic Signature Problems, which Must Be Solved in Nearest Future in Practical Context, Pol J Pathol 2003, 54, 3, 209-214

• Johannes Heurix, Thomas Neubauer, Privacy-Preserving Storage and Access of Medical Data through Pseudonymization and Encryption

• Y. Zhou, K. Panetta, S. Agaian, A lossless encryption method for medical images using edge maps, Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE, Sept 2009
