ntrs.nasa.gov · designed with flexibility and uniformity as the main considerations, ... poisson...

16
.......................................................... * * * * * * * USL / DBMS NASA / PC R&D * * * * * SERIES * * WORKING PAPER * * * * * * Report Number DM.NASA/PC RBI-5 .......................................................... The USLID- NASA/PC R&D Working Paper Series contains a collection of formal and informal reports representing results of PC-based research and development activities being conducted by the Computer Science Department of the University of Southwestern Louisiana pursuant to the specifications of National Aeronautics and Space Administration Contract Number NASW-3846. For more information, contact: Wayne D. Dominick Editor USLIDBMS NASA/PC -Working Paper Series Computer Science Department University of Southwestern Louisiana P. 0. Box 44330 Lafayette, Louisiana 70504 (318) 231-6308 https://ntrs.nasa.gov/search.jsp?R=19890005606 2020-03-12T02:27:26+00:00Z

Upload: others

Post on 11-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . * * * * * * * U S L / D B M S N A S A / P C R & D * * * * *

S E R I E S * * W O R K I N G P A P E R * * * * * *

Report Number

DM.NASA/PC RBI-5

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The USLID- NASA/PC R&D Working Paper Series contains a collection of formal and informal reports representing results o f PC-based research and development activities being conducted b y the Computer Science Department o f the University of Southwestern Louisiana pursuant to the specifications of National Aeronautics and Space Administration Contract Number NASW-3846.

For more information, contact:

Wayne D. Dominick

Editor USLIDBMS NASA/PC -Working Paper Series

Computer Science Department University of Southwestern Louisiana

P. 0. Box 44330 Lafayette, Louisiana 70504

(318) 231-6308

https://ntrs.nasa.gov/search.jsp?R=19890005606 2020-03-12T02:27:26+00:00Z

Page 2: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

GENERAL SPECIFICATIONS

FOR THE DEVELOPME"'

OF A USL NASA PC R&D

STATISTICAL ANALYSIS SUPPORT PACKAGE

Jinous Bassari

Spiros Triantafyllopoulos

T h e U n i v e r s i t y of Southwestern Louisiana C o m p u t e r S c i e n c e Department

Lafagette, Louisiana

August 2, 1984

........................ - 1 - I WRKING PAPER SERIES I

Page 3: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

GENERAL SPECIFICATIONS

FOR THE DEVELOPMENT

OF A USL NASA PC R&D

STATISTICAL ANALYSIS SUPPORT PACKAGE

ABSTRACT

This is a three-level package designed to allow statistical

analysis for a variety of applications within the USL D M

NASA/RECON project. Designed with flexibility and uniformity as

the main considerations, i t is expected to provide computational

capabilities for a variety of user needs, beginner to expert, in

three different forms: a library package, an interactive package

and a batch-processing package.

I DM.NASA/PC R&D-S I ......................

Page 4: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

I.

1 1 .

IV.

TABLE OF CONTENTS

Introduction . . . . . . . . . . . . . . . . 4

Objectives . . . . . . . . . . . . . . . . . 6

1 1 1 . M e t h o d o l o g y . . . . . . . . . . . . . . . . 8

Proposed Configuration . . . . . . . . . . . 10

Sumnary . . . . . . . . . . . . . . . . . . 14

- 3 -

Page 5: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

GENERAL SPECIFICATIONS

FOR THE DEVELOPMENT

OF A USL NASA PC R&D

STATISTICAL ANALYSIS SUPPORT PACKAGE

I . INTRODUCTION

This is a proposal for the design, development and

implementation of a general-purpose statistical package for the

USL DIMS NASA/RECON project.

Statistical Packages offer to the user the power and

flexibility they need, without having to write complicated

programs. In addition, the user can be assured of the accuracy of

the results. Many statistical packages have been developed s o

far, for all types and sizes of computers.

There are three major types of statistical packages

available for the user:

i. Statistical Program Libraries.

Statistical Libraries are collections of programs that

are bound together in one collection. The user can call

them from his/her application programs, supplying the

appropriate arguments and obtaining the results in a

similar way.

Page 6: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

i i . I nteractive Statistical Packages.

Interactive Statistical Packages a l l o w the user to

interact with the computer. T h e user is put o n at a

"comnand level", w h e r e h e / s h e issues comnands and

enters data. T h e p r o g r a m processes the d a t a and returns

the results t o the user o n the terminal screen.

i i i . B a t c h Statistical Programs.

B a t c h programs a l l o w the user to collect all his/her

comnands t o the p r o g r a m in one group, code t h e m in a

particular language, and t h e n process the entire batch.

T h e user d o e s not interact with the execution at all.

T h i s research a n d development proposal intends to implement

all t h r e e packages under a unified interface. T h e result is

e x p e c t e d t o be a flexible and powerful package with common

characteristics b e t w e e n its three forms. I t is a l s o intented to

be c o m p l e t e l y transportable among a n y computer that c a n support

the "C" programning language. T h i s includes 3 of the 4 large

c o m p u t e r systems available at USL, n a m e l y the DEC UNIX

VAX-11/780, DEC W VAX-11/780, the P y r a m i d T e c h n o l o g i e s 90x,

and, of course, the IBMPC/xT.

Page 7: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

I I . OBJECTIVES OF THE PROJECT

The generic objectives of the project are as follows:

i. T o develop a powerful, flexible, easy-to-use and

transportable statistical package.

i i . T o improve our knowledge in the fields of statistics and

numerical computation.

i i i . T o obtain further experience on the design, implementation,

testing and maintenance of a major software product.

The specific objectives of the statistical package design are as

follows:

i. Computational power: the objective i s a design that can

satisfy most user needs in terms of available functions and

options. The package should offer a full range of comnands

for most applied statistical computations.

i i . Design flexibility: The design must be flexible so that

changes, improvments and addition of more functions can be

acconmodated without major changes, if any, to the entire

package.

i i i . Ease o f use: the design should be such that any of the three

interfaces will be easy t o use efficiently. This includes

Page 8: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

error checking and, in the case of the interactive user

interfaces, available online help. Uniform comnand and

function formats in all three modes will be used also.

iv. Efficiency and accuracy: Efficiency of the algorithms

used is very important considering that the package to be

developed will be used in mini and micro computers with

often limited resources and speed of execution problems.

Accuracy is also critical so that the user is assured of the

quality of the results.

v. Package Transportability: The programs should be written in

a way that ensures transportability between varying

operating environments. Standard programning policy will be

adopted for all modules.

This design will be first implemented on the IBMPersonal

Computers of the NASA/RECON Project. Parallel development on the

DEC VAX-11/780 will also be considered.

Page 9: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

I I I. METHODOLOGY

A highly modular approach will be followed in the design and

implementation of this project. This will ensure that many of the

design objectives, in particular, flexibility and modifiability,

are inherent in the implementation.

For achieving the transportability and modularity goals, the

"C" programning language was chosen as the implementation

language. I t offers good performance characteristics and high

modularity which make i t most desirable. I t is also powerful in

character and file manipulation, facts that make i t more

desirable to use in the second and third phase of the design.

Algorithm selection is critical in the computational parts.

Therefore, extensive research will have to be performed in order

t o determine the most appropriate ones t o be used. W h i l e a

computer approach to manual algorithms can be used, for some

cases i t is not efficient and better methods should be found.

At this point only the first phase of the statistical

package has been totally defined. The interactive and batch

interfaces will be designed under the considerations applied to

the first phase, with the final goal being to create a comnon,

efficient interface for all three modes. A defined comnand

Page 10: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

language for the interactive phase would consist of either a

menu-selection procedure, a comnand language or a combination.

Again, the batch interface can be similar to the interactive in

terms of comnand names, arguments, e t c , or be a completely

different programing language by itself.

For the interactive interface, a spreadsheet configuration

like MINITAB is likely to be implemented, with its comnands,

arguments and options combined to form the batch programning

language. Therefore, by making the library interface with a

similar structure, the goal of uniformity can be achieved.

As a minimum, the functions shown below are expected to be

implemented for the first phase (program library). Then the

interactive and batch interfaces can be built on top o f the

packages. The modularity of the design will allow the addition

of n e w functions and/or the modification of existing ones to be

performed efficiently, with no major code changes in the entire

program structure.

Page 11: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

P R O P O S E D CONF I GURAT I ON

OF PHASE 1

1. Basic Input/Output

R e a d f r o m a g i v e n file

R e a d f r o m terminal

W r i t e to a g i v e n file

W r i t e to terminal

R e p o r t error/warning m e s s a g e s

2. B a s i c One-Vector C a l c u l a t i o n s

sumx, sumx2, sum2x

M e a n , m o d e , m e d i a n

V a r i a n c e , standard d e v i a t i o n

S o r t ascending, descending, rank

F r e q u e n c y , m o s t / l e a s t frequent

R e l a t i v e frequencies, signs

M a x - m i n , local rnax/min, k-th maximin

3. One-Vector T e s t Statistics

C o n f i d e n c e intervals

z-scores, z-tests

P r o p o r t i o n tests

S t u d e n t ’ s t-test

Page 12: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

Small/large sample sizes

4. Ba s i c one-Vector G r a p h s

B a r C h a r t s - Histograms

F r e q u e n c y graphs

5 . Two-way S t a t i s t i c s

H y p o t h e s i s testing

D i f f e r e n c e of m e a n s

V a r i a n c e known/unknown

D-test

P a i r e d Samples T e s t s

T e s t s for Standard D e v i a t i o n

D e g r e e s o f Freedom, F-test

T e s t s for proportions

6. Two-Vector G r a p h i c s

P l o t s

S c a t t e r Grams

Fr e q u e n c y P l o t s

C h a r t s

x-Y plots

D i s t r i b u t i o n s

Page 13: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

7 . Linear Regression and Correlation

Linear Regression Analysis

Regression Line calculations

Correlation Analysis

Standard Error

8 . Multiple Regression

Exponential regression analysis

Logarithmic regression analysis

Parabolic regression analysis

Multiple Analysis

9 . Basic Probability Calculations

Probability distributions

Normal distribution

Binomial distribution

Poisson distribution

Probability tests

p-test Probability confidence intervals

10. Advanced Probability Calculations

Conditional Probabilities

Independent Probabilities

Probability Estimations

Bayes’ Theorem

Page 14: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

Basic Combinatoric Calculations

P(m,n) value

C(m,n) value

n! value

Probability distributions of samples

11. Chi Square Analysis

Contingency Tables

Chi Square tests

Chi Square distribution

Lambda Index of association

1 2 . Analysis of Variance

One-way analysis

Two-way analysis

Difference of several means

Total Variance calculations

----------- I N A S A I -----------

13. Non-Parametric Tests

Sign test

Mann-Wh i t ne y t e s t

Non-parametric ANOVA

Page 15: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

IV. SuvlkI4RY

Design, implementation, testing and maintenance of this

major software package i s expected to generate a support

environment for any other activities that require statistical

analysis within the NASA/RECON or related DIMS projects. The

applicability of statistical analysis methods in information

storage and retrieval systems is increasing, ranging from

performance measurement and evaluation to natural language text

analysis, thus making this project an interesting consideration

for further research and development.

The unified environment that this document proposes is

expected to further improve the user/system interface and make i t

more effective. Portability is also provided in order to have a

single data analysis environment for more than one hardware

configuration.

Page 16: ntrs.nasa.gov · Designed with flexibility and uniformity as the main considerations, ... Poisson distribution Probability tests p-test Probability confidence intervals ... C(m,n)

1. Report No.

' IM-RL

University of Southwestern Louisiana The Center for Advanced Computer Studies P.O. Box 44330

2, Government Accession No. / 83.3 75 3. Recipient's Catalog No.

11. Contract or Grant No.

NGT-19-010-900

16 i t 4. Title and Subtitle

USL/NGT-19-010-900: GENERAL SPECIFICATIONS FOR THE DEVELOPMENT OF A USL NASA PC R&D STATISTICAL ANALYSIS SUPPORT PACKAGE

7. Authods)

JINOUS BASSARI AND SPIROS TRIANTAFYLLOPOULOS

9. Performing Organization Name and Address

13. Type of Report and Period Covered

FINAL; 07/01/85 - 12/31/87 14. Sponsoring Agency Code

Lafayette, LA 70504-4330 12. Sponsoring Agency Name and Address

15. Supplementary Nates

5. Report Date D d / C August 2, 1984 &/L&-:,~'.+', .:

6. Performing Organization Code

8. Performing Organization Report No.

10. Work Unit No.

~~

16. Abstract

The USL NASA PC R&D statistical analysis support package is designed to be a three level package to allow Statistical analysis for a variety of applications within the USL DBMS NASA Contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statisti- cal analysis system, and as a batch processing package.

17. Key Words (Suggested by Authorb) 1 18. Distribution Statement

USL NASA PC R&D Statistical Analysis Package Specifications, PC-Based Research and Development

19. Security Classif. (of this report) 20. Security Classif. (of this page) 21. NO. of Pages 22. Rice'

Unclassified . Unclassified 14 L

This report represents one of the 72 attachment reports to the University of Southwestern Louisiana's Final Report on NASA Grant NGT-19-010.900. Accordingly, appropriate care should be taken in using this report out of the context of the full Final Report.