lds assignment mpaani
-
m.Paani Lead Data Science Role Assignment Question 1
For the first question, you will dive into and analyse daily coal stock reports of Indian thermal power
plants for the years 2009-2012. The data is available as part of the Republic of India's Open
Government Data (OGD) Initiative and can be accessed at http://bit.ly/1yfkXZN or by searching for
"Coal Statement of Thermal Power Stations" at data.gov.in. You will:
1) Arrange the 48 CSV files into a continuous time series database or data frame. We are
especially interested in how you deal with missing, erroneous, and non-uniform data. Briefly
explain how you handled missing and erroneous data in the time series.
2) Choose one of the following tasks and WOW us with your modelling skills and data-driven
insights:
- Segment the coal thermal plants into distinguishable groups based on a clustering
strategy of your choice. You are free to define the number of groups and methodology
yourself but make sure you have a method to assess the separability of your resultant
groups.
OR
- Forecast whether coal stocks will reach Super-Critical state the following (next) day for
any given power plant. How accurate is your model? What are its limitations and
strengths?
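For step 1, one possible way to stitch the monthly files into a continuous daily series is sketched below. It is only an illustration under stated assumptions: the column names (date, plant, stock_days), the list of date formats, and the rule of forward-filling gaps of at most max_fill days are all hypothetical and would need to be checked against the actual OGD files.

```python
import csv
import io
import math
from datetime import datetime, timedelta

# Hypothetical column names; the real OGD files may use different headers.
DATE_COL, PLANT_COL, STOCK_COL = "date", "plant", "stock_days"
# Assumed date variants, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed date format; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_stock(text):
    """Return a finite, non-negative float, or None for blanks and bad values."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) and value >= 0 else None


def build_series(csv_texts, max_fill=3):
    """Merge CSV texts into {plant: {date: stock or None}} on one daily grid."""
    raw = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            day = parse_date(row.get(DATE_COL) or "")
            plant = (row.get(PLANT_COL) or "").strip().upper()
            if day is None or not plant:
                continue  # unusable row: no valid date or plant name
            raw.setdefault(plant, {})[day] = parse_stock(row.get(STOCK_COL))
    if not raw:
        return {}

    all_days = [d for days in raw.values() for d in days]
    start, end = min(all_days), max(all_days)
    grid = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    series = {}
    for plant, observed in raw.items():
        filled, last, gap = {}, None, 0
        for day in grid:
            value = observed.get(day)
            if value is not None:
                last, gap = value, 0
            elif last is not None and gap < max_fill:
                value, gap = last, gap + 1  # forward-fill short gaps only
            filled[day] = value  # longer gaps stay None (explicitly missing)
        series[plant] = filled
    return series
```

Keeping long gaps as None rather than filling them is a deliberate choice in this sketch: it makes the missing stretches visible to whatever model comes next instead of hiding them behind stale values.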
Instructions: Document and present your answers and results in a document, presentation,
web application, or medium of your choice. Make sure to attach your code (if any) and
heavily comment it so we can really dive into your thought process. Have fun!
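For the clustering option, one common way to assess how separable the resulting groups are is the mean silhouette coefficient. The sketch below is a plain-Python illustration, not a prescribed method: it assumes each plant has already been reduced to a numeric feature vector (which features to use is left open by the assignment) and that Euclidean distance is meaningful for those features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance to itself is 0, so divide by len(own) - 1)
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores close to 1 suggest compact, well-separated groups, scores near 0 suggest overlapping groups, and negative scores suggest that many plants sit closer to another group than to their own. Comparing the score across several candidate numbers of groups is one way to justify the chosen count.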
-
m.Paani Lead Data Science Role Assignment Question 2
For the second question, we would like you to imagine that you are the Lead Data Scientist for a
loyalty company which handles millions of shopping transactions a day. These transactions are
carried out both online and through physical retail outlets. Management's main objective is to
make the transactions simple and easy so members can earn points towards redeeming the
reward of their choice. However, it has come to your attention that there is a small but steady
increase in the number of fraud and identity theft incidents among the company's customer base.
You and your team of 5 Data Scientists (1 Senior Data Scientist, 2 Junior Data Scientists, 1 Data
Visualization Expert, and 1 GIS Expert) are entrusted with the task of detecting fraudulent behaviour
efficiently in a real-time manner. As you are well aware, any false positives on your end will result in
erosion of customer trust whereas false negatives cost the company serious money.
Please:
1) Develop a high-level workflow of how you propose to tackle the challenge of developing
and implementing a real-time fraud detection engine that can sort through millions of
transactions a day. We would like you to focus on but by no means limit yourself to the
following points:
- Allocation of tasks and responsibilities among each member of your team
- What technology and platform you would be using to implement the fraud
detection engine
- Model structure, execution, and validation
- How to handle the transactions that do get flagged as fraudulent?
- The use of data visualization in enhancing the fraud discovery process
2) Present your system in a format that can effectively communicate your thought process.
Be ready to discuss your proposed solution with a member of the mPaani data science
team once you have submitted the assignment.
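The trade-off stated in Question 2 (false positives erode customer trust, false negatives cost money) can be made concrete when validating a model by choosing the flagging threshold that minimises total cost on labelled historical transactions. The sketch below is one possible illustration: the per-error costs fp_cost and fn_cost are hypothetical inputs that the business would have to estimate, and the fraud scores are assumed to come from some upstream model.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of flagging every transaction whose score is >= threshold."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += fp_cost  # legitimate customer wrongly flagged
        elif not flagged and is_fraud:
            cost += fn_cost  # fraud that slipped through
    return cost


def best_threshold(scores, labels, fp_cost, fn_cost):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus infinity (flag nothing).
    Ties resolve to the lowest threshold because min() keeps the first
    minimum and the candidates are sorted in ascending order.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )
```

When a missed fraud is assumed to be far more expensive than a false alarm, the selected threshold moves down and more transactions get flagged for review; reversing the cost ratio moves it up.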