Common Analytics Interview Questions


Upload: rohitkumarilu

Post on 16-Dec-2015



Question 1. Can you outline the various steps in an analytics project?

Broadly speaking, these are the steps. Of course, they may vary slightly depending on the type of problem, the data, the tools available and so on.

1. Problem definition: The first step is, of course, to understand the business problem. What is the problem you are trying to solve, and what is the business context? Very often, however, your client may just give you a whole lot of data and ask you to do something with it. In such a case you would need to take a more exploratory look at the data. Nevertheless, if the client has a specific problem that needs to be tackled, then the first step is to clearly define and understand the problem. You will then need to convert the business problem into an analytics problem. In other words, you need to understand exactly what you are going to predict with the model you build. There is no point in building a fabulous model, only to realise later that what it is predicting is not exactly what the business needs.

2. Data exploration: Once you have the problem defined, the next step is to explore the data and become more familiar with it. This is especially important when dealing with a completely new data set.

3. Data preparation: Now that you have a good understanding of the data, you will need to prepare it for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required and so on. This stage is heavily influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a whole different kind of preparation compared to other techniques.

4. Modelling: Once the data is prepared, you can begin modelling. This is usually an iterative process where you run a model, evaluate the results, tweak your approach, run another model, evaluate the results, re-tweak and so on. You go on doing this until you come up with a model you are satisfied with, or what you feel is the best possible result with the given data.

5. Validation: The final model (or maybe the best 2-3 models) should then be put through the validation process. In this process, you test the model using a completely new data set, i.e. data that was not used to build the model. This process ensures that your model is a good model in general and not just a very good model for the specific data used earlier (technically, this is called avoiding overfitting).

6. Implementation and tracking: The final model is chosen after the validation. Then you start implementing the model and tracking the results. You need to track results to see the performance of the model over time. In general, the accuracy of a model goes down over time. How quickly will depend on the variables (how dynamic or static they are) and on the general environment (how static or dynamic that is).

Question 2. What do you do in data exploration?

Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step:

a. What is there in the data: look at the list of all the variables in the data set. Understand the meaning of each variable using the data dictionary. Go back to the business for more information in case of any confusion.
b. How much data is there: look at the volume of the data (how many records) and the time frame of the data (last 3 months, last 6 months etc.).
c. Quality of the data: how much information is missing, and what is the quality of the data in each variable? Are all fields usable? If a field has data for only 10% of the observations, then maybe that field is not usable.
d. You will also identify some important variables and may do a deeper investigation of these, such as looking at averages, min and max values, and maybe the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data preparation stage.

Question 3. What do you do in data preparation?

In data preparation, you prepare the data for the next stage, i.e. the modelling stage. What you do here is influenced by the choice of technique you use in the next stage.

Some things are done in most cases, for example identifying missing values and treating them, identifying outlier (unusual) values and treating them, transforming variables, and creating binary variables if required.

This is also the stage where you partition the data, i.e. create training data (for modelling) and validation data (for validation).

Question 4. How will you treat missing values?

The first step is to identify variables with missing values and assess the extent of the missing values. Is there a pattern in the missing values? If yes, try to identify the pattern. It may lead to interesting insights.

If there is no pattern, then we can either ignore the missing values (SAS will not use any observation with missing data) or impute them:

Simple imputation: substitute with mean or median values
OR
Case-wise imputation: for example, if we have missing values in the income field.

Question 5. How will you treat outlier values?

You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values.

If there is a lot of data, you may decide to ignore records with outliers.

Not all extreme values are outliers. Not all outliers are extreme values.

Question 6. How do you assess the results of a logistic regression analysis?

You can use different methods to assess how good a logistic model is.

a. Concordance: this tells you about the ability of the model to discriminate between the event happening and not happening.
b. Lift: this helps you assess how much better the model is compared to random selection.
c. Classification matrix: this helps you look at the false positives and false negatives.

Some other general questions you will most likely be asked:

What have you done to improve your data analytics knowledge in the past year?
What are your career goals?
Why do you want a career in data analytics?

The answers to these questions will have to be unique to the person answering. The key is to show confidence and give well thought out answers that demonstrate you are knowledgeable about the industry and have the conviction to work hard and excel as a data analyst.
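To make Questions 4 and 5 above more concrete, here is a minimal sketch of simple imputation (mean or median) and of capping outliers at the 1st and 99th percentiles. It is written in Python with the standard library only, purely for illustration; the function names `impute_missing` and `cap_outliers` and the sample income values are assumptions, not part of the original answers, and the percentile method used is just one of several reasonable choices.

```python
import statistics


def impute_missing(values, method="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Cap values that fall outside the given percentiles (1st and 99th by default)."""
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    incomes = [42, 38, None, 51, 47, None, 40]  # hypothetical income field
    print(impute_missing(incomes, "median"))
    print(impute_missing(incomes, "mean"))
    print(cap_outliers(list(range(1, 102)))[:3])
```

Whether to impute, cap or drop records still depends on how much data there is and on whether the missing or extreme values follow a pattern, as the answers above point out.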

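The three measures named in Question 6 can be sketched in a few lines. The example below is a rough illustration in Python (standard library only), assuming a list of actual outcomes coded 1/0 and a list of predicted probabilities; the function names, the 0.5 cutoff and the sample data are hypothetical. Concordance is computed here as the share of (event, non-event) pairs in which the event received the higher score, with tied pairs counted as not concordant.

```python
def concordance(actual, scores):
    """Share of (event, non-event) pairs where the event has the higher score."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    pairs = len(events) * len(non_events)
    concordant = sum(1 for e in events for n in non_events if e > n)
    return concordant / pairs


def classification_matrix(actual, scores, cutoff=0.5):
    """Count true/false positives and negatives at a probability cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, s in zip(actual, scores):
        if s >= cutoff:
            counts["TP" if a == 1 else "FP"] += 1
        else:
            counts["FN" if a == 1 else "TN"] += 1
    return counts


def lift_at(actual, scores, fraction=0.2):
    """Event rate in the top-scored fraction divided by the overall event rate."""
    ranked = sorted(zip(scores, actual), reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(a for _, a in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


if __name__ == "__main__":
    actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.5, 0.35]
    print(concordance(actual, scores))
    print(classification_matrix(actual, scores))
    print(lift_at(actual, scores))
```

A lift above 1 in the top-scored group means the model finds events more often than random selection would.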
SAS Macro Interview Questions (for Freshers)

1. Have you used macros? For what purpose have you used them?

Yes, I have. I used macros in creating analysis data sets and tables, where it is necessary to make a small change throughout the program and where it is necessary to use the same code again and again.

2. How would you invoke a macro?
After I have defined a macro, I can invoke it by adding the percent sign prefix to its name, like this: %macroname. A semicolon is not required when invoking a macro, though adding one generally does no harm.

3. How can you create a macro variable within a DATA step?
With CALL SYMPUT.

4. How would you identify a macro variable?
By the ampersand (&) sign in front of its name.

5. How would you define the end of a macro?
The end of a macro is defined by the %MEND statement.

6. For what purposes have you used SAS macros?
When we want to execute the same PROC step on multiple data sets. We can accomplish repetitive tasks quickly and efficiently. A macro program can be reused many times. Parameters passed to the macro program customize the results without having to change the code within the macro program. With macros you can make a small change in one place and have SAS echo that change throughout the program.

7. What is the difference between %LOCAL and %GLOBAL?
%LOCAL creates macro variables in the local symbol table. It can be used only inside a macro definition, and the variables exist only while that macro executes.
%GLOBAL creates macro variables in the global symbol table. It can be used in open code (outside a macro) or inside a macro, and the variables are available anywhere for the rest of the SAS session.

8. How long can a macro variable be? A token?
A macro variable name can be up to 32 characters long, and its value can be up to 65,534 characters long. A component of SAS known as the word scanner breaks the program text into fundamental units called tokens. Tokens are passed on demand to the compiler. The compiler then requests tokens until it receives a semicolon. Then the compiler performs the syntax check on the statement.

9. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
The macro variable created by the CALL SYMPUT routine cannot be referenced with an ampersand in the same DATA step in which it is created, because macro variable references are resolved before the DATA step executes. Once that DATA step has finished, we can use the macro variable at any time.

10. What do you code to create a macro? End one?
We create a macro with the %MACRO statement and end a macro with the %MEND statement.

11. What is the difference between %PUT and SYMBOLGEN?
%PUT is a macro statement that writes user-defined messages to the log window, whereas SYMBOLGEN is a system option that prints the resolved value of each macro variable reference in the log window.

12. How do you add a number to a macro variable?
Using the %EVAL function, or the %SYSEVALF function if the number is a floating-point number.

13. Can you execute a macro within a macro? Describe.
Yes. A macro can be invoked from inside another macro by coding %macroname in its definition; this is commonly called nesting. Note that SYMGET and CALL SYMPUT are not macros: they are the DATA step function and routine used to read and create macro variables.

14. If you need the value of a variable rather than the variable itself, what would you use to load the value to a macro variable?
To load the value of a DATA step variable into a macro variable, use CALL SYMPUT. If we need the value of a macro variable everywhere in the program, we must define it as global. There are different ways of creating a global macro variable; the simplest method is %LET in open code.

Example: A is a macro variable. The following statements work with the value of A rather than the variable itself:

    %let A=xyz;
    %put x="&A";

The %PUT statement writes x="xyz" to the log: the reference &A resolves to the text xyz, not to a variable named xyz.

15. Can you execute a macro within another macro? If so, how would SAS know where the current macro ended and the new one began?

Yes, I can execute a macro within a macro; we call it nesting of macros, which is allowed. The beginning of every macro is identified by the keyword %MACRO and its end by %MEND.

16. How are parameters passed to a macro?
A macro variable defined in parentheses in a %MACRO statement is a macro parameter. Macro parameters allow you to pass information into a macro. Because yvar and xvar below are declared as keyword parameters, the invocation must name them.

    %macro plot(yvar= , xvar= );
        proc plot;
            plot &yvar*&xvar;
        run;
    %mend plot;
    %plot(yvar=age, xvar=sex)

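17. How would you code a macro statement to produce information on the SAS log? This statement can be coded anywhere.
The %PUT statement writes text to the log and can be used in open code or inside a macro. The following system options also control what macro processing writes to the log:

    OPTIONS MPRINT MLOGIC MERROR SYMBOLGEN;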

Advanced SAS Certification Questions (recently updated)

Which options control input/output buffering?
Ans. BUFSIZE= and BUFNO=

The following SAS program is submitted:

    %macro execute;
        proc print data=sasuser.houses;
        run;
        %end;
    %mend;
    %execute

Which statement completes the program so that it executes on Tuesday?
a) %if &sysday=Tuesday %then %do;
b) %if &sysday=Tuesday %then %do;
c) %if &sysdate= Tuesday %then %do;
d) %if &sysdate=Tuesday %then %do;

Assume today is Tuesday, August 15, 2006. Which statement, submitted at the beginning of a SAS session, assigns the value Tuesday, August 15, 2006 to the macro variable START?
a) %let start= %eval(today(), weekdate.);
b) %let start= %sysfunc(today(), weekdate.);
c) %let start= %sysexec(today(), weekdate.);
d) %let start= %sysevalf(today(), weekdate.);

The following program is submitted:

    %let value=0.5;
    %let add=5;
    %let newval=%eval(&value+&add);

What is the value of the macro variable NEWVAL?
a) 5
b) 5.5
c) 0.5+5
d) null

The SAS data set ONE has a variable X on which an index has been created. The data sets ONE and THREE are sorted by X. The following SAS program is submitted:

    data two;
        set three;
        set one key=X;
    run;

What is the purpose of including the KEY= option in the program?
a) It forces SAS to use the index X.
b) It re-creates the index X on the output data set TWO.
c) It instructs SAS to do a sequential read of both sorted data sets.
d) It gives SAS the option to use the index X or to do a sequential read of the data set ONE.

The following SAS program is submitted:

    data new(bufsize=6144 bufno=4);
        set old;
    run;

What is the difference between the usage of the BUFSIZE= and BUFNO= options?
a) BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the number of input buffers.
b) BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the number of output buffers.
c) BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the number of input buffers.
d) BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies the number of output buffers.

Given the data set SASHELP.CLASS:

    SASHELP.CLASS
    NAME     AGE
    -------  ------
    Mary     15
    Philip   16
    Robert   12
    Ronald   15

The following SAS program is submitted:

    %let value = Philip;
    proc print data = sashelp.class;

    run;

Which WHERE statement successfully completes the program and produces a report?
a) where upcase(name) = upcase(&value);
b) where upcase(name) = %upcase(&value);
c) where upcase(name) = "upcase(&value)";
d) where upcase(name) = "%upcase(&value)";

The following SAS program is submitted:

    data combine;
        merge one two;
        by id;
    run;

Which SQL procedure program produces the same results?

A.
    proc sql;
        create table combine as
        select coalesce(one.id, two.id) as id, name, salary
        from one full join two
        on one.id = two.id;
    quit;
B.
    proc sql;
        create table combine as
        select one.id, name, salary
        from one inner join two
        on one.id = two.id;
    quit;
C.
    proc sql;
        create table combine as
        select coalesce(one.id, two.id) as id, name, salary
        from one, two
        where one.id = two.id;
    quit;
D.
    proc sql;
        create table combine as
        select one.id, name, salary
        from one full join two
        where one.id = two.id;
    quit;

Given the SAS data sets CLASS1 and CLASS2:

    CLASS1                  CLASS2
    NAME      COURSE        NAME      COURSE
    --------  -----------   --------  ------------
    Lauren    MATH1         Smith     MATH2
    Patel     MATH1         Farmer    MATH2
    Chang     MATH1         Patel     MATH2
                            Hillier   MATH2

The following SAS program is submitted:

    proc sql;
        select name from CLASS1

        select name from CLASS2;
    quit;

The following output is desired:

    NAME
    --------
    Chang
    Lauren

Which SQL set operator completes the program and generates the desired output?
A. UNION
B. EXCEPT
C. INTERSECT
D. OUTER UNION CORR

The following SAS program is submitted:

    %macro loop;
        data one;
            %do I = 1 %to 3;
                var&I = &i;
            %end;
        run;
    %mend;
    %loop

After this program executes, the following is written to the SAS log:

    (LOOP): Beginning execution.
    (LOOP): %DO loop beginning; index variable I; start value is 1; stop value is 3; by value is 1.
    (LOOP): %DO loop index variable I is now 2; loop will iterate again.
    (LOOP): %DO loop index variable I is now 3; loop will iterate again.
    (LOOP): %DO loop index variable I is now 4; loop will not iterate again.
    (LOOP): Ending execution.

Which SAS System option displays the notes in the SAS log?
A. MACRO
B. MLOGIC
C. MPRINT
D. SYMBOLGEN

The following SAS program is submitted:

    data temp;
        array points{2,3} (10, 15, 20, 25, 30, 35);
    run;

What impact does the ARRAY statement have in the Program Data Vector (PDV)?

A. The variables named POINTS1, POINTS2, POINTS3, POINTS4, POINTS5, POINTS6 are created in the PDV.
B. The variables named POINTS10, POINTS15, POINTS20, POINTS25, POINTS30, POINTS35 are created in the PDV.
C. The variables named POINTS11, POINTS12, POINTS13, POINTS21, POINTS22, POINTS23 are created in the PDV.
D. No variables are created in the PDV.

Which SAS integrity constraint type ensures that a specific set or range of values are the only values in a variable?

A. CHECK
B. UNIQUE
C. NOT NULL
D. PRIMARY KEY

The following SAS program is submitted:

    data new (bufsize = 6144 bufno = 4);
        set old;
    run;

What is the difference between the usage of BUFSIZE= and BUFNO= options?

A. BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the number of input buffers.
B. BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the number of output buffers.
C. BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the number of input buffers.
D. BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies the number of output buffers.

The following SAS program is submitted:

    %let first = yourname;
    %let last = first;
    %put &&&last;

What is written to the SAS log?
A. First
B. &&first
C. yourname
D. &yourname

Given the following SAS data set ONE:

    ONE
    REP     COST
    ________________________
    SMITH   200
    SMITH   400
    JONES   100
    SMITH   600
    JONES   100
    JONES   200
    JONES   400
    SMITH   800
    JONES   100
    JONES   300

The following SAS program is submitted:

    proc sql;
        select rep, avg(cost) as AVERAGE
        from one
        group by rep
        having avg(cost) > (select avg(cost) from one);
    quit;

Which one of the following reports is generated?
A. REP AVERAGE: JONES 200
B. REP AVERAGE: JONES 320
C. REP AVERAGE: SMITH 320
D. REP AVERAGE: SMITH 500

The following SAS program is submitted:

    %let value = 9;
    %let value2 = 5;
    %let newval = %eval(&value / &value2);

Which one of the following is the resulting value of the macro variable NEWVAL?
A. 1
B. 2
C. 1.8
D. null
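For the PROC SQL question above on data set ONE (REP and COST), the GROUP BY and HAVING logic can be checked by hand. The sketch below redoes the arithmetic in Python (standard library only) rather than SAS, purely as an illustration; the function name `groups_above_overall_average` is hypothetical, and the rows are copied from the data set shown in the question.

```python
from collections import defaultdict

# Rows of data set ONE as listed in the question: (REP, COST)
rows = [("SMITH", 200), ("SMITH", 400), ("JONES", 100), ("SMITH", 600),
        ("JONES", 100), ("JONES", 200), ("JONES", 400), ("SMITH", 800),
        ("JONES", 100), ("JONES", 300)]


def groups_above_overall_average(rows):
    """Return {rep: average cost} for reps whose average exceeds the overall average."""
    by_rep = defaultdict(list)
    for rep, cost in rows:
        by_rep[rep].append(cost)
    # The subquery: average cost over all rows.
    overall = sum(cost for _, cost in rows) / len(rows)
    averages = {rep: sum(costs) / len(costs) for rep, costs in by_rep.items()}
    # The HAVING clause: keep groups whose average beats the overall average.
    return {rep: avg for rep, avg in averages.items() if avg > overall}


if __name__ == "__main__":
    print(groups_above_overall_average(rows))  # {'SMITH': 500.0}
```

The overall average is 3200 / 10 = 320, JONES averages 1200 / 6 = 200 and SMITH averages 2000 / 4 = 500, so only the SMITH group passes the HAVING condition.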

The SAS data set ONE has a variable X on which an index has been created. The data sets ONE and THREE are sorted by X. Which one of the following SAS programs uses the index to select observations from the data set ONE?

A.
    data two;
        set three;
        set one key = X;
    run;
B.
    data two;
        set three key = X;
        set one;
    run;
C.
    data two;
        set one;
        set three key = X;
    run;
D.
    data two;
        set three;
        set one (key = X);
    run;

The following SAS program is submitted:

    proc sql;
        select rep, area, count(*) as TOTAL
        from one
        group by rep, area;
    quit;

Which one of the following reports is generated?

A.
    REP     AREA    COUNT
    -----------------------
    JONES   EAST    100
    JONES   NORTH   600
    JONES   WEST    500
    SMITH   NORTH   800
    SMITH   SOUTH   200

B.
    REP     AREA    TOTAL
    -----------------------
    JONES   EAST    100
    JONES   NORTH   600
    JONES   WEST    500
    SMITH   NORTH   800
    SMITH   SOUTH   200

C.
    REP     AREA    TOTAL
    -----------------------
    JONES   EAST    1
    JONES   NORTH   2
    JONES   WEST    3
    SMITH   NORTH   3
    JONES   WEST    3
    SMITH   NORTH   3
    SMITH   SOUTH   1

D.
    REP     AREA    TOTAL
    -----------------------
    JONES   EAST    1
    JONES   NORTH   2
    JONES   WEST    3
    SMITH   NORTH   3
    SMITH   SOUTH   1
    SMITH   NORTH   3
    SMITH   SOUTH   1

The following SAS program is submitted:

    data temp;
        array points{3,2} _temporary_ (10,20,30,40,50,60);
        score = points{2,1};
    run;

Which one of the following is the value of the variable SCORE in the data set TEMP?
A. 10
B. 20
C. 30
D. 40
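SAS assigns the initial values of a multidimensional array one row at a time, so the rightmost subscript varies fastest. The sketch below mimics that fill order in Python (standard library only) as an illustration of the question above; the helper `fill_row_major` is hypothetical and uses 1-based subscripts to match the SAS notation.

```python
def fill_row_major(rows, cols, initial_values):
    """Map 1-based (row, col) subscripts to values, filling one row at a time."""
    values = iter(initial_values)
    return {(r, c): next(values)
            for r in range(1, rows + 1)
            for c in range(1, cols + 1)}


if __name__ == "__main__":
    points = fill_row_major(3, 2, [10, 20, 30, 40, 50, 60])
    print(points[(2, 1)])  # 30
```

Row 1 receives 10 and 20, row 2 receives 30 and 40, and row 3 receives 50 and 60, so the element at subscript {2,1} holds 30.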

The following SAS program is submitted:

    %macro execute;

        proc print data = sasuser.houses;
        run;
        %end;
    %mend;

Which of the following completes the above program so that it executes on Tuesday?

A. %if &sysday = Tuesday %then %do;
B. %if &sysday = 'Tuesday' %then %do;
C. %if "&sysday" = Tuesday %then %do;
D. %if '&sysday' = 'Tuesday' %then %do;

Which one of the following SAS integrity constraint types ensures that a specific set or range of values are the only values in a variable?
A. CHECK
B. UNIQUE
C. FORMAT
D. DISTINCT

Which one of the following options displays the value of a macro variable in the SAS log?
A. MACRO
B. SOURCE
C. SOURCE2
D. SYMBOLGEN

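What is the correct syntax to create a macro variable with SQL?

Inside PROC SQL, use the INTO clause, where 'delimiter' stands for a quoted delimiter string such as a comma or a blank:

    select distinct country into :cur separated by 'delimiter' from tablename;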

The following SAS program is submitted:

    options yearcutoff = 1950;
    %macro y2kopt(date);
        %if &date >= 14610 %then %do;
            options yearcutoff = 2000;
        %end;
        %else %do;
            options yearcutoff = 1900;
        %end;
    %mend;
    data _null_;
        date = "01jan2000"d;
        call symput("date", left(date));
    run;
    %y2kopt(&date)

The SAS date for January 1, 2000 is 14610 and the SAS system option for YEARCUTOFF is set to 1920 prior to submitting the above program. Which one of the following is the value of YEARCUTOFF when the macro finishes execution?

A. 1900
B. 1920
C. 1950
D. 2000

Check the syntax: what will happen when we submit this program?

    data aa;
        length x y 5 z;
    run;

The data set will not be created, because the LENGTH statement does not specify a length for z.

Which one of the following statements about compressed SAS data sets is always true?
A. Each observation is treated as a single string of bytes.
B. Each observation occupies the same number of bytes.
C. An updated observation is stored in its original location.
D. New observations are added to the end of the SAS data set.

Given the following SAS data set ONE:

    ONE
    LEVEL   AGE
    ----------------------
    1       10
    2       20
    3       20
    2       10
    1       10
    2       30
    3       10
    2       20
    3       30
    1       10

The following SAS program is submitted:

    proc sql;
        select level, max(age) as MAX
        from one
        group by level
        having max(age) > (select avg(age) from one);
    quit;

Which one of the following reports is generated?

A.
    LEVEL   AGE
    -------------------
    2       20
    3       20
B.
    LEVEL   AGE
    ---------------
    2       30
    3       30
C.
    LEVEL   MAX
    --------------------
    2       20
    3       30
D.
    LEVEL   MAX
    --------------
    2       30
    3       30

The following SAS program is submitted:

    filename sales ('external-file1' 'external-file2');
    data new;
        infile sales;
        input date date9. company $ revenue;
    run;

Which one of the following is the result of including the FILENAME statement in this program?
A. The FILENAME statement produces an ERROR message in the SAS log.
B. The FILENAME statement associates SALES with external-file2 followed by external-file1.
C. The FILENAME statement associates SALES with external-file1 followed by external-file2.
D. The FILENAME statement reads record 1 from external-file1, reads record 1 from external-file2, and combines them into one record.

Which techniques are used to find the unique values in a data set?

FIRST. and LAST. variables with a BY statement
PROC SQL with UNIQUE
PROC SORT (for example, with the NODUPKEY option)

Where can't we use the NOTSORTED option?

With MERGE.

Code

CMARP

    proc print data = dataset-name;
        by code;
    run;

No output will print.

Which statement is used to write data to a file?

The FILE statement.

What options will display macro code and macro execution details in the log window?

MLOGIC and MPRINT.

DATA step with a view: when will the message come to the log?

Both times.