Data Storage Choices File or Database ? Binary or Text file ? Variable or fixed record length ? Choice of text file record and field delimiters XML anyone ?

Upload: jordan-owens

Post on 01-Jan-2016

Data Storage Choices

File or Database ?

Binary or Text file ?

Variable or fixed record length ?

Choice of text file record and field delimiters

XML anyone ?

File or Database ?

Files are simple and straightforward for standalone, single-user, non-concurrent applications.

Concurrent applications with multiple simultaneous users can use lockfiles to protect file access.
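
A minimal sketch of the lock file idea in C. It assumes a POSIX system where open() supports O_CREAT | O_EXCL, and the helper names acquire_lock() and release_lock() are hypothetical. Creating the lock file with O_EXCL is atomic, so at most one process can succeed. The others must wait or retry.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to create the lock file exclusively.
   Returns 1 if this process now holds the lock, 0 if another process
   already holds it, -1 on any other error. */
int acquire_lock(const char *lockpath)
{
    /* O_CREAT | O_EXCL makes creation atomic: open() fails with EEXIST
       if the file already exists, so only one process can succeed. */
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;
    close(fd);
    return 1;
}

/* Hypothetical helper: release the lock by deleting the lock file.
   Returns 0 on success, -1 on error. */
int release_lock(const char *lockpath)
{
    return unlink(lockpath);
}
```

A process would call acquire_lock() with an agreed lock file name before updating the shared data file, and call release_lock() afterwards. If a process crashes while holding the lock, the lock file is left behind. Everyone else is then blocked until the file is removed by hand, which is one reason databases are preferred for production use.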

Databases are more reliable for production use in a networked, multi-user context, because they manage concurrent access, locking and transactions themselves.

Binary or Text file ?

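With binary files, data can be stored in the program's internal (in-memory) format.

Advantage: no need to convert data on input or output.

Disadvantage: binary files are less portable between machines on a network, which may differ in byte order or type sizes. They can be non-portable even between programs built with different compiler settings.

For example, are structure members aligned on word or byte boundaries? Different padding between members changes the layout of the data written to the file.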
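
A sketch of writing and reading structures in internal format with fwrite() and fread(). The struct student layout and the helper names are hypothetical. No conversion is needed. However, the bytes written include any padding the compiler inserts between members, along with the machine's own int size and byte order. The file may therefore be unreadable by a program built for another platform or with different alignment settings.

```c
#include <stdio.h>

/* Hypothetical student record, stored in the program's internal format. */
struct student {
    int    id;
    char   name[32];
    double mark;
};

/* Write n records as raw bytes, padding included.
   Returns the number of records written. */
size_t write_students(FILE *fp, const struct student *s, size_t n)
{
    return fwrite(s, sizeof *s, n, fp);
}

/* Read up to n records back as raw bytes.
   Returns the number of records read. */
size_t read_students(FILE *fp, struct student *s, size_t n)
{
    return fread(s, sizeof *s, n, fp);
}
```

sizeof(struct student) may be larger than the sum of its members' sizes. That difference is padding, and it is written to the file along with the data.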

Variable or fixed record length ?

Technically this choice is unrelated to whether a file is in text or binary format. In practice, fixed-length records are more common in binary files. The advantages of fixed-length records include:

* Records can be accessed randomly, based on their position within the file.

* No space is wasted on field and record delimiters.

* The problem of choosing delimiters that cannot occur in the data, or of escaping the data, is avoided.
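
A sketch of random access with fixed-length records. It assumes every record is written as one struct record, which is a hypothetical layout, and the helper names are also hypothetical. Record n starts at byte offset n * sizeof(struct record), so fseek() can jump straight to it. A record can also be updated in place without rewriting the rest of the file.

```c
#include <stdio.h>

/* Hypothetical fixed-length record: every record occupies exactly
   sizeof(struct record) bytes in the file. */
struct record {
    int  id;
    char name[24];
};

/* Read record number n (counting from 0) directly, without reading the
   records before it. Returns 1 on success, 0 on failure. */
int read_record(FILE *fp, long n, struct record *r)
{
    /* Every record has the same length, so its offset is n * size. */
    if (fseek(fp, n * (long)sizeof *r, SEEK_SET) != 0)
        return 0;
    return fread(r, sizeof *r, 1, fp) == 1;
}

/* Overwrite record number n in place. Returns 1 on success, 0 on failure. */
int write_record(FILE *fp, long n, const struct record *r)
{
    if (fseek(fp, n * (long)sizeof *r, SEEK_SET) != 0)
        return 0;
    return fwrite(r, sizeof *r, 1, fp) == 1;
}
```

Open the file in binary update mode (e.g. "r+b") so that no newline translation alters the byte offsets. Each function seeks before reading or writing, which also satisfies the C rule that an update stream must be repositioned between output and input.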

Variable or fixed record length ?

Advantages of variable-length records include:

* Apart from the delimiters, only the space needed for the data is used, so less storage is wasted.

* The design of the system does not directly constrain the quantity of data, e.g. the maximum length of a name.
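
A sketch, in ISO C, of reading newline-delimited records of any length. The function read_line_any_length() is a hypothetical helper. It grows its buffer with realloc() as needed, so the only limit on record length is available memory. POSIX systems also provide getline(), which does a similar job.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one newline-delimited record of any length.
   Returns a malloc'd, NUL-terminated string without the newline (the
   caller must free it), or NULL at end of file or if memory runs out. */
char *read_line_any_length(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep room for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Call the function in a loop until it returns NULL, and free each record once you are finished with it. An empty line comes back as an empty string, not NULL.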

Choice of text file record delimiter

Normally, characters that do not appear in the data are chosen as the record (end-of-line) and field delimiters. Newline is the most commonly used record delimiter.

Problem: what happens if newline is required within a data field ?

Possible solution: escape the newline, e.g. as the two characters \n. Backslash itself must then also be escaped (e.g. as \\), and data must be converted on both input and output. Other record delimiters are possible, but the fgets() function assumes newline.
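
A sketch of this escaping scheme, using hypothetical function names. On output, backslash becomes \\ and newline becomes \n. On input, the conversion is reversed after fgets() has read the record and the trailing newline has been removed. Every real newline inside a field is escaped, so the only newline left in the file is the record delimiter.

```c
#include <stdio.h>

/* Hypothetical helper: write a field, escaping backslash as \\ and
   newline as \n so the field can never contain the record delimiter. */
void write_escaped(FILE *fp, const char *field)
{
    for (; *field != '\0'; field++) {
        if (*field == '\\')
            fputs("\\\\", fp);
        else if (*field == '\n')
            fputs("\\n", fp);
        else
            putc(*field, fp);
    }
}

/* Hypothetical helper: undo write_escaped() in place on a string read
   back from the file, with its record-delimiting newline already removed. */
void unescape_in_place(char *s)
{
    char *out = s;
    for (; *s != '\0'; s++) {
        if (*s == '\\' && s[1] != '\0') {
            s++;
            *out++ = (*s == 'n') ? '\n' : *s;   /* \n -> newline, \\ -> \ */
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}
```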

Choice of text file field delimiters

Field delimiters are more likely than record delimiters to conflict with the data. The field delimiter is normally different from the record delimiter, unless fields are counted or labelled. Popular field delimiters include space, tab, comma ',' and colon ':'. Many applications can export and import comma-delimited format, typically with strings enclosed in double quotes, e.g.:

Record No.,Name,Mark
123,"Asif Mohammed",76.2
145,"Joe Brown",72.1
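
A sketch of writing the comma-delimited layout above with fprintf(). The struct mark_record type and write_marks_csv() are hypothetical. The sketch also assumes that names never contain a double quote.

```c
#include <stdio.h>

/* Hypothetical record matching the example columns. */
struct mark_record {
    int         record_no;
    const char *name;
    double      mark;
};

/* Write a header line, then one record per line: numbers unquoted,
   the name in double quotes, the mark to one decimal place.
   Assumes no name contains a double quote. */
void write_marks_csv(FILE *fp, const struct mark_record *r, size_t n)
{
    fputs("Record No.,Name,Mark\n", fp);
    for (size_t i = 0; i < n; i++)
        fprintf(fp, "%d,\"%s\",%.1f\n", r[i].record_no, r[i].name, r[i].mark);
}
```

Called with the two example records (123, Asif Mohammed, 76.2 and 145, Joe Brown, 72.1), it writes exactly the three lines shown above.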

Choice of text file field delimiters

The appearance of a double quote within a string complicates this approach further.
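
One common convention, specified in RFC 4180, is to write an embedded double quote as two double quotes, e.g. "Joe ""JB"" Brown". The sketch below splits fields using that convention. next_csv_field() is a hypothetical helper that works on one line already held in memory. A quoted field containing a newline would need extra handling when records are read with fgets(). A trailing empty field after a final comma is also not reported.

```c
#include <stddef.h>

/* Hypothetical helper: copy the next comma-delimited field of *p into out
   (at most outsize-1 chars, outsize >= 1). In a quoted field, "" stands
   for one literal double quote. Advances *p past the field and its
   trailing comma. Returns 0 at end of line, 1 otherwise. */
int next_csv_field(const char **p, char *out, size_t outsize)
{
    const char *s = *p;
    size_t len = 0;

    if (*s == '\0' || *s == '\n')
        return 0;
    if (*s == '"') {                          /* quoted field */
        s++;
        while (*s != '\0') {
            if (*s == '"') {
                if (s[1] == '"') {
                    s++;                      /* "" -> keep one quote */
                } else {
                    s++;                      /* closing quote */
                    break;
                }
            }
            if (len + 1 < outsize)
                out[len++] = *s;
            s++;
        }
    }
    while (*s != '\0' && *s != ',' && *s != '\n') {   /* unquoted text */
        if (len + 1 < outsize)
            out[len++] = *s;
        s++;
    }
    if (*s == ',')
        s++;
    out[len] = '\0';
    *p = s;
    return 1;
}
```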

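The fscanf() function assumes data delimited by spaces, tabs or newlines. It is insecure if data of the wrong length is encountered, because a %s conversion without a field width can overflow its buffer. It stops part-way through a record if data of the wrong type is encountered. It also splits strings containing embedded spaces into separate fields.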
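
A safer sketch than calling fscanf() directly: read a whole record with fgets(), then parse it with sscanf(). The %s conversion is given a maximum field width, and the number of fields converted is checked. The "id name mark" record layout and read_mark() are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read one "id name mark" record, e.g.
   "123 Asif_Mohammed 76.2". fgets() bounds the line length, the %31s
   width stops the name overflowing its 32-byte buffer, and checking
   sscanf()'s return value rejects records with wrongly typed fields.
   Returns 1 on success, 0 on a malformed record, EOF at end of file. */
int read_mark(FILE *fp, int *id, char name[32], double *mark)
{
    char line[256];

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return sscanf(line, "%d %31s %lf", id, name, mark) == 3;
}
```

With a name containing an embedded space, such as "Joe Brown", sscanf() converts only two fields. The record is therefore reported as malformed instead of silently shifting the fields that follow. Lines longer than the 256-byte buffer would be split by fgets(), so a real program would also check for a missing newline.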

Some data can be simplified by converting embedded spaces into underscores, e.g. for file and variable names.
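
A one-function sketch of that conversion. spaces_to_underscores() is a hypothetical name. The conversion can only be reversed if underscores never occur in the original data.

```c
/* Hypothetical helper: replace every space in s with an underscore, in
   place, so "Joe Brown" can be read back as one whitespace-delimited field. */
void spaces_to_underscores(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == ' ')
            *s = '_';
}
```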

XML anyone ?

XML (eXtensible Markup Language) encloses data within opening and closing tags, e.g. <name>Joe Brown</name>.

XML solves many internationalisation problems: documents are based on Unicode and can declare their character encoding.

Probably not useful for simple standalone applications.

Can be useful for communicating data with a common and defined purpose between different platforms.

Not a good match for most C applications. Better suited to Java, Perl and Python business and web-enabled applications, for which XML libraries are readily available.