mca - iv semester

18
Laboratory Manual (Version 9.0) for Data Warehousing and Data Mining-Lab (MCA-254) MCA - IV Semester Compiled by: Dr. Imran Khan (Assistant Professor, BVICAM, New Delhi) Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM) A-4, Paschim Vihar, Rohtak Road, New Delhi-63 Visit us at: www.bvicam.in

Upload: others

Post on 22-Mar-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Laboratory Manual (Version 9.0)

for

Data Warehousing and Data

Mining-Lab (MCA-254)

MCA - IV Semester

Compiled by:

Dr. Imran Khan (Assistant Professor, BVICAM, New Delhi)

Bharati Vidyapeeth’s

Institute of Computer Applications

and Management (BVICAM) A-4, Paschim Vihar, Rohtak Road, New Delhi-63

Visit us at: www.bvicam.in

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 2 of 18

Index

List of Abbreviations

Declaration

1. Vision of the Department 5

2. Mission of the Department 5

3. Programme Educational Objectives (PEOs) 5

4. Programme Outcomes (POs) 6

5. Institutional Policy for Students’ Conduct 8

6. Learning Outcomes of Laboratory Work 9

7. Course/Lab Outcomes (COs) 10

8. Mapping of COs with POs 10

9. Course/Lab Description 10

10. Grading Policy 11

11. Lesson Plan 11

12. Assignments 12

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 3 of 18

List of Abbreviations

BTL Bloom’s Taxonomy Level

CE Communication Efficacy

CICP Conduct Investigations of Complex Computing Problems

CK Computational Knowledge

CO Course Outcome

DAC Departmental Advisory Committee

DDS Design and Development of Solutions

I&E Innovation and Entrepreneurship

I&T Individual & Team Work

IQAC Internal Quality Assurance Cell

LLL Life-Long Learning

MTU Modern Tool Usage

PA Problem Analysis

PE Professional Ethics

PEO Programme Educational Objective

PMF Project Management and Finance

PO Programme Outcome

SEC Societal and Envoirnmental Concern

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 4 of 18

Declaration

Department : Department of Computer Science and

Applications

Course, Year and the

Semester to which Lab is

offered

: MCA - II Year, IV Semester

Name of the Lab Course : Data Warehousing and Data Mining Lab

Course Code : MCA-254

Version No. : 9.0

Name of Course/Lab

Teacher

: Dr. Imran Khan

Laboratory Manual

Committee

: 1. Mrs. Vaishali Joshi, Chairperson

2. Dr. Anupam Baliyan, Member

3. Dr. Ritika Wason, Member

4. Mrs. Tanya Pathak Garg, Member

5. Mr. Uttam Singh Bist, Member

6. Prof. P. S. Grover, Margdarshak

7. Mr. Amit Sharma, Alumni & Industry Expert

8. Dr. Ritika Wason, Concerned Subject Teacher,

Convener

Approved by : DAC Date: 02/12/2019

Approved by : IQAC Date: 03/12/2019

Signature

(Course Teacher)

Signature

(Head of Department)

Signature

(IQAC Coordinator)

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 5 of 18

1. Vision of the Department

To become a Centre of excellence in the field of Computer Science and

Applications, to contribute effectively in the rapidly changing global economy

directed towards national development ensuring prosperity for the mankind.

2. Mission of the Department

M1 To become a centre of excellence in the field of Computer Science and

Applications and produce professionals as per global industry standards.

M2 To foster innovation, entrepreneurial skills, research capabili ties and

bring all-round development amongst budding professionals.

M3 To promote analytical and collaborative life-long learning skills, among

students and faculty members involving all stakeholders.

M4 To inculcate strong ethical values and professional behaviour while

giving equal emphasis to social commitment and nation building.

3. Programme Educational Objectives (PEOs)

The PEOs for the MCA programme are as follows:

PEO1 Exhibit professional competencies and knowledge for being a successful

technocrat.

PEO2 Adopt creative and innovative practices to solve real-life complex

problems.

PEO3 Be a lifelong learner and contribute effectively to the betterment of the

society.

PEO4 Be effective and inspiring leader for fellow professionals and face the

challenges of the rapidly changing multi-dimensional, contemporary

world.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 6 of 18

4. Programme Outcomes (POs)

PO1:

Computational Knowledge (CK)

Demonstrate competencies in fundamentals of computing, computing

specialisation, mathematics, and domain knowledge suitable for the

computing specialisation to the abstraction and conceptualisation of

computing models from defined problems and requirements.

PO2:

Problem Analysis (PA)

Identify, formulate, and analyze complex real-life problems in order to

arrive at computationally viable conclusions using fundamentals of

mathematics, computer sciences, management and relevant domain

disciplines.

PO3: Design and Development of Solutions (DDS)

Design efficient solutions for complex, real-world problems to design

systems, components or processes that meet the specifications with

suitable consideration to public health, and safety, cultural, societal, and

environmental considerations.

PO4: Conduct Investigations of Complex Computing Problems (CICP)

Ability to research, analyze and investigate complex computing problems

through design of experiments, analysis and interpretation of data, and

synthesis of the information to arrive at valid conclusions.

PO5: Modern Tool Usage (MTU)

Create, select, adapt and apply appropriate technologies and tools to a

wide range of computational activities while understanding their

limitations.

PO6: Professional Ethics (PE)

Ability to perform professional practices in an ethical way, keeping in

mind cyber regulations & laws, responsibilities, and norms of

professional computing practices.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 7 of 18

PO7: Life-Long Learning (LLL)

Ability to engage in independent learning for continuous self-

development as a computing professional.

PO8: Project Management and Finance (PMF)

Ability to apply knowledge and understanding of the computing and

management principles and apply these to one’s own work, as a member

and leader in a team, to manage projects in multidisciplinary

environments.

PO9: Communication Efficacy (CE)

Ability to effectively communicate with the technical community, and

with society at large, about complex computing activities by being able to

understand and write effective reports, design documentation, make

effective presentations, with the capability of giving and taking clear

instructions.

PO10: Societal and Envoirnmental Concern (SEC)

Ability to recognize and assess societal, environmental, health, safety,

legal, and cultural issues within local and global contexts, and the

consequential responsibilities applicable to professional computing

practices.

PO11: Individual & Team Work (I&T)

Ability to work in multi-disciplinary team collaboration both as a

member and leader as per need.

PO12: Innovation and Entrepreneurship (I&E)

Ability to apply innovation to track a suitable opportunity to create value

and wealth for the betterment of the individual and society at large.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 8 of 18

5. Institutional Policy for Students’ Conduct

The following guidelines shall be followed:-

5.1 All the students in their introductory Lab. shall be assigned a system, which

shall be their workplace for the complete semester. Students can store

records of all their Lab. assignments on their individual workstations.

5.2 Introductory Lab. shall include an introduction to the appropriate

software/tool, followed by a basic Introductory Assignment having Practice

Questions. All the students are expected to complete this assignment within

a week time, as the same shall be assessed through a lab. test.

5.3 Each week the instructor, in parallel to respective topics covered in the

theory lecture, shall assign a set of practical problems to the students in form

of Assignments (A, B, C, .....). The problems in these assignments shall be

divided into two parts. The first set of Problems shall be compulsory for all

the students and its record need to be maintained in the Prcatical File,

having prescribed format, as given in Appendix-A. All the students should

get the weekly assigntment checked and signed in the Practical File by the

respective teacher in the immediate succeeding week. The second set of

problems are Advanced Problems and shall be optional. Student may solve

these advanced problems for their further practice.

5.4 Cellular phones, pagers, CD players, radios and similar devices are

prohibited in the classrooms, laboratories and examination halls.

5.5 Laptop-size computers / Tablets may be used in lectures for the purpose of

taking notes or working on team-projects.

5.6 The internal practical exam shall be conducted towards the end of the

semester and shall include the complete set of Lab exercises conducted as

syllabus. However, students shall be assessed on continuos basis through

overall performances in regular lab. tests, both announced and surprise and

viva-voce.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 9 of 18

5.7 The respective faculty shall prepare and submit sufficient number of

practical sets of computing problems to the Dean (Examinations), atleast two

weeks prior to the actual exam. It is the responsibility of the faculty to

ensure that a set should not be repeated for more than 5 students in a given

batch.

5.8 The exam shall be of 3 hours duration where the student shall be expected to

implement solutions to his/her assigned set of problems on appropriate

software tools in the lab.

5.9 Once implemented, student shall also appropriately document code

implemented in the assigned answer sheets, which shall be submitted at the

end of the examination. All the students shall also appear for viva-voce

examination during the exam.

5.10 Co-operate, Collaborate and Explore for the best individual learning

outcomes but copying or entering into the act of plagiarism is strictly

prohibited.

6. Learning Outcomes of Laboratory Work

The student shall demonstrate the ability to:

Verify and Implement the concepts and theory learnt in class.

Code and use Software Tools to solve problems and present their optimal

solutions.

Apply numerical/statistical formulas for solving problems/questions.

Develop and apply critical thinking skills.

Design and present Lab as well as project reports.

Apply appropriate methods for the analysis of raw data.

Perform logical troubleshooting as and when required.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 10 of 18

Work effectively as a member of a team in varying roles as need be.

Communicate effectively, both oral and written.

Cultivate ethics, social empathy, creativity and entrepreneurial mindset.

7. Course/Lab Outcomes (COs)

CO1 Apply BI tools in construction of data warehouse at its different stages.

(BTL3)

CO2 Manage enterprise data warehouse for BI applications. (BTL6)

CO3 Implement various data mining concepts for knowledge discovery. (BTL6)

CO4 Work in teams to implement data mining techniques to build decision

making systems for real life complex problems. (BTL6)

8. Mapping of CO’s with PO’s

Table 1: Mapping of COs with POs

PO/CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12

CO1 3 - - - 3 3 - - - 3 - -

CO2 3 3 3 3 - 3 1 - 2 3 2 -

CO3 3 2 2 3 3 3 2 - - 3 - -

CO4 3 3 3 3 3 3 2 3 3 3 3 -

9. Course/Lab Description

Course (Lab) Title : Data Warehousing and Data Mining -Lab.

Course (Lab) Code : MCA-254

Credits : 01

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 11 of 18

Pre-requisites : MS- SQL Server, SQL Server BIDS tool, Rapidminer,

Anaconda

Academic Session : January to June

Contact Hours/Week : 02 (01 Labs of 02 hours/Week)

Internal Assessment : 40 Marks

External Assessment : 60 Marks

10. Grading Policy

Item Points Marks Remarks

Weekly Lab Assignments

including Practical Files

10 10 Closed Book/Open Book

Internal End-Term Practical

Examination

20 10 Closed Book

Viva-Voce 10 10 Closed Book

Project 10 10 Innovative Applications of

Programming

External End-Term

Examinations

60 60 Closed Book (conducted and

evaluated by the University)

Total 100

11. Lesson Plan

Week

No.

Lab

No.

Topics / Concepts to be Covered Reference of the Book

and its Chapter

1. 1. Basic SQL revision and Data warehouse

design issues and

Exercise-1-10 [AM]

2. 2. Buffer Reserved for Revision ASN [AP1-AP6]

3. 3. Data warehouse construction and BIDS

tool introduction.

ASN [BP1- BP10]

4. 4. Buffer Reserved for Revision ASN [BA1]

5. 5. Introduction to Rapidminer ASN [CP1-CP4]

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 12 of 18

Week

No.

Lab

No.

Topics / Concepts to be Covered Reference of the Book

and its Chapter

6. 6. Buffer Reserved for Revision ASN [CA1]

7. 7. Data Preprocessing ASN [DP1- DP4]

8. 8. Buffer Reserved for Revision ASN [DP5- DP7]

9. 9. Data mining algorithms ASN [DA1- DA3]

10. 10. Python introduction ASN [EP1- EP7]

11. 11. Buffer Reserved for Revision ASN [EP7- EP10, EA1-

EA2]

12. 12. Advanced data analysis ASN [FP1- FP5]

13. 13. Buffer Reserved for Revision ASN [FA1- FA3]

12. Assignments

Assignment Set: A

Objectives of the Assignment:

Familiarize with basic constructs of data wareshousing.

Familiarize with problem analysis and data warehouse designing.

Revise the Relational OLAP and perform the task of data warehouse

design.

CO/BTLCovered: CO1/BTL3 & BTL6

Problems:

AP1 Identify the subject and analysis questions for online shopping website like

Flipkart/Amazon etc.

AP2 Identify the dimension and Fact Tables.

AP3 Design the schema (Star/Snowflake) as per the requirement and justfy.

AP4 Create some excel files for data in dimension and fact tables.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 13 of 18

AP5 Import the data from the excel files to respective dimension and fact tables

AP6 Revise the following basic SQL commands in SQL Server

1. Create a database and create tables.

2. Insert data into the tables.

3. Update the records in the tables.

4. Alter table and add constraineds.

5. Export/Import function.

6. Delete records and tables.

Advanced Problems:

AA1 Install Microsoft Bussiness Intelligence Developement Studio.

AA2 Create an analysis project and integrate the SQL Server with the Bussiness

Intelligence Developement Studio.

Assignment Set: B

Objectives of the Assignment:

Familiarize with the tool of Business Intelligence.

Implement data warehousing using Business Intelligence tool.

CO/BTLCovered: CO1 & CO2/BTL4 & BTL6

Problems:

BP1 Create a Data warehouse for online shopping website and import all the

requisite data.

BP2 Open your BIDS tool and create the source & source view.

BP3 Create a data cube.

BP4 Modify the attributes of dimension tables according to the analysis questions.

BP5 Perform the OLAP operations and find out the answers of all the analysis

questions.

BP6 Create Data warehouse for two case studies of your choice.

BP7 Open your BIDS tool and create the source & source view.

BP8 Create a data cube.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 14 of 18

BP9 Modify the attributes of dimension tables according to the analysis questions.

BP10 Perform the OLAP operations and find out the answers of all the analysis

questions.

Advanced Questions:

BA1 Consider two case studies of your choice and perform all taks of data

warehousing (Design, create and OLAP operations). Also, analyze the results

obtained.

Assignment Set: C

Objectives of the Assignment:

Familiarize with Data Mining tool Rapidminer.

Practice basic data handling tasks in Rapidminer using sample data.

CO/BTLCovered: CO3 /BTL6

Problems:

CP1 Install Rapid miner and explore various options.

CP2 Create a reopository for data in rapidminer and import different data files.

CP3 Visualize the loaded data in the repository and explore its meta-data.

CP4 Perform the task of type conversion using rapidminer.

Advanced Questions:

CA1 Download dataset any different study from kaggle and create a repository.

Import all the data files of different formats and combine them to a unified

format and visualize them.

Assignment Set: D

Objectives of the Assignment:

Perform the task of data preprocessing using Rapidminer.

Implement various data mining algorithms using Rapidminer.

CO/BTLCovered: CO3 & CO4/BTL6

Problems:

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 15 of 18

DP1 Perform the task of data cleaning using appropriate operators of rapidminer.

DP2 Perform the task of data sampling using appropriate operators of rapidminer.

DP3 Perform the task of attribute selection using appropriate operators of

rapidminer.

DP4 Elaborate Classification algorithms using rapidminer.

DP5 Elaborate Regression algorithms using rapidminer.

DP6 Elaborate Clustering algorithms using rapidminer.

DP7 Elaborate Association rule mining algorithms using rapidminer.

Advanced Problems

DA1 Consider a case study of your choice and import data. Perform all steps of

data preprocessing on the data and apply classification/regression algorithms

accordingly.

DA2 Consider a case study of your choice and import data. Perform all steps of

data preprocessing on the data and apply Clustering algorithms accordingly.

DA3 Consider a case study of your choice and import data. Perform all steps of

data preprocessing on the data and identify Association Rules based by

choosing different support and confidence.

Assignment Set: E

Objectives of the Assignment:

Familarize with the basic syntax of python.

Practice and implement various problems using python.

CO/BTLCovered: CO3 /BTL6

Problems:

EP1 Install Anaconda and explore different IDEs like Jupyter notebook and

Spider.

EP2 Create a program that asks the user to enter their name and their age. Print

out a message addressed to them that tells them the year that they will turn

100 years old.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 16 of 18

EP3 Take a list of numbers and write a program that prints out all the elements of

the list that are less than 5.

EP4 Create a program that asks the user for a number and then prints out a list of

all the divisors of that number.

EP5 Take two lists and write a program that returns a list that contains only the

elements that are common between the lists (without duplicates). Make sure

your program works on two lists of different sizes.

EP6 Ask the user for a string and print out whether this string is a palindrome or

not.

EP7 Write one line of Python that takes a list and makes a new list that has only

the even elements of this list in it.

EP8 Given a .txt file that has a list of a bunch of names, count how many of each

name there are in the file, and print out the results to the screen.

EP9 Create a Python program that accepts a string and calculate the number of

digits and letters.

EP10 Create a Python function to check whether a number is perfect or not.

Advanced Problems

EA1 Create a Python project to get the value of e to n number of decimal places.

Note: Input a number and the program will generate e to the 'nth digit

EA2 Create a Python project to guess a number that has randomly selected.

Assignment Set: F

Objectives of the Assignment:

Create and Implement different mini projects of data analysis using

python.

Solve real world problems of data analysis and data mining through

various libereries available in python.

CO/BTLCovered: CO3 to CO4/ BTL6

Problems:

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi | Page 17 of 18

FP1 Create a Python function that takes a number as a parameter and check the

number is prime or not.

FP2 Create a Python function that takes a list and returns a new list with unique

elements of the first list.

FP3 Create a Python program to replace dictionary values with their sum.

FP4 Create a Python program to remove spaces from dictionary keys.

FP5 Create a Python program to convert Python objects into JSON strings. Print

all the values.

Advanced Problems

FA1 Design a function called word_stats(), which will take the word frequency

dictionary( output of count_words_fast()/count_words() ) as a parameter.The

function will return the total no of unique words(sum/total keys in the word

frequency dictionary) and a dict_values holding total count of them together,

as a tuple.

FA2 Exploratory Data Analysis is a technique to analyze data with visual

techniques and all statistical results. Import a csv file and apply statistical

techniques to visualize the data using exploratory data analysis.

FA3 Use Pandas to analyze data on Country Data.csv file from UN public Data

Sets of a popular ‘statweb.stanford.edu’ website.

Laboratory Manual for MCA-254; Data Warehousing and Data Mining. | Version 9.0

© Bharati Vidyapeeth’s Institute of Computer Applications and Ma nagement (BVICAM), New Delhi | Page 18 of 18

APPENDIX-A

Template for the Index of Lab File

WEEK

NO.

PROBLEMS WITH DESCRPTION PAGE

NO.

SIGNATURE

OF THE

TEACHER

WITH DATE

1 AP1

AP2

AP3

2 AA1

AA2

AA3

3 BP1

BP2

BP3

BP4

Note: The students should use Header and Footer mentioning their roll no. & name

in header and page no. in footer

**********