interview topics on sql

Upload: vinsho

Post on 30-May-2018


  • 8/9/2019 Interview Topics on SQL

    1/22

    2009

    Vinay Kotha

    CSC

    11/5/2009

    Interview Topics for SQL & MSBI


    Author: Vinay Kotha Page 2

    Table of Contents

    Recovery Models

    Simple Recovery Model

    Full Recovery Model

    Bulk-Logged

    Back-ups

    Back-up Scopes

    A) Database backups

    B) Partial Back-ups

    C) File Back-ups

    Back-Up Types

    A) Full Backups

    B) Differential backups

    SQL SERVER REPLICATION

    A) Load Balancing

    B) Offline Processing

    C) Redundancy

    A) Publishers

    B) Subscribers

    A) Snapshot Replication

    B) Transactional Replication

    C) Merge Replication

    A) Express edition

    B) Workgroup edition

    C) Standard edition

    D) Enterprise edition

    Difference between Temp tables and Table variables in SQL Server

    Suggestion for choosing between these two

    Stored Procedures

    Advantages of Stored Procedures

    Differences between User Defined Functions and Stored Procedures

    SSAS

    Different Dimension types by Microsoft available in Analysis Services

    Different Types of Dimensions

    Conformed Dimension

    Junk Dimension

    Degenerate Dimension

    Slowly Changing Dimensions

    There are 10 types of dimension Tables

    Differences between Analysis Services 2005 and 2008

    Define temporary and extended stored procedure.

    Differences between SSRS 2005 and SSRS 2008

    Performance Tuning of SSRS: Handling a Large workload

    Steps to Improve Performance

    Control the Size of your Reports

    Use Cache Execution

    Configure and Schedule Your Reports

    Deliver Rendered Reports for Non-browser Formats

    Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports

    Back to Report Catalogs

    Tuning with Web Service

    Memory Limits in SQL Server Reporting Services 2008

    Memory Limit

    Maximum Memory Limit

    Performance Tuning of SQL Server

    Section A

    Microsoft Tips on Performance Tuning

    Not knowing the performance and scalability characteristics of your system

    Retrieving too much data

    Misuse of Transactions

    Misuse of Indexes

    Mixing OLTP, OLAP and reporting workloads

    Inefficient Schemas

    Using an inefficient disk sub-system

    SSIS 10 Best Practices

    SSIS Performance tuning

    Data Flow Optimization Modes

    Buffers

    Buffer Sizing

    Buffer Tuning

    Parallelism

    Extraction Tuning

    Transformation Tuning

    Merge-Join Transformation

    Slowly Changing Dimensions

    Data Types

    Miscellaneous

    Load Tuning

    Differences between SSIS 2005 and SSIS 2008

    Look-up

    Cache Transformation

    Data Profiling Task

    Script Task and Transformation

    Recovery Models:

    There are 3 recovery models in SQL Server:

    1) Simple

    2) Full

    3) Bulk-Logged

    Simple Recovery Model: The simple recovery model allows you to recover data only to the most recent full database or differential back-up. Transaction log back-ups are not available because the contents of the transaction log are truncated each time a checkpoint is issued for the database.

    Or


    The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount of information in the transaction log. SQL Server truncates the transaction log each time the database reaches a transaction checkpoint, leaving no log entries for disaster recovery purposes.

    In databases using the simple recovery model, you may restore full or differential back-ups only. It is not possible to restore such a database to a given point in time; you may only restore it to the exact time when a full or differential back-up occurred. Therefore, you will automatically lose any data modifications made between the time of the most recent full/differential back-up and the time of failure.
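    As a rough illustration of that data-loss window, the Python sketch below (a model, not SQL Server code) treats a simple-recovery database as a list of timestamped full and differential backups. The `Backup` class, its `kind` labels and `latest_restore_point` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    kind: str           # "full" or "diff" (hypothetical labels)
    taken_at: datetime


def latest_restore_point(backups, failure_time):
    """Newest full/differential backup taken before the failure.

    Under the simple recovery model there are no log backups, so this is
    the latest point in time the database can be brought back to.
    """
    candidates = [b.taken_at for b in backups
                  if b.kind in ("full", "diff") and b.taken_at <= failure_time]
    if not candidates:
        raise ValueError("no usable backup taken before the failure")
    return max(candidates)


backups = [
    Backup("full", datetime(2009, 11, 2, 6, 0)),   # Monday morning
    Backup("diff", datetime(2009, 11, 2, 18, 0)),  # Monday evening
]
failure = datetime(2009, 11, 3, 9, 30)             # Tuesday morning
restore_to = latest_restore_point(backups, failure)
print(restore_to)            # 2009-11-02 18:00:00
print(failure - restore_to)  # 15:30:00 worth of changes is lost
```

    Everything committed after the Monday evening differential is unrecoverable in this model, which is exactly the trade-off the paragraph above describes.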

    Full Recovery Model: The full recovery model uses database back-ups and transaction log back-ups to provide complete protection against failure. Along with being able to restore a full or differential back-up, you can recover the database to the point of failure or to a specific point in time. All operations, including bulk operations such as SELECT INTO, CREATE INDEX and bulk-loading data, are fully logged and recoverable.

    Or

    The full recovery model also bears a self-descriptive name. In this model, SQL Server preserves the transaction log until you back it up. This allows you to design a disaster recovery plan that uses database back-ups in conjunction with transaction log back-ups.

    In the event of a database failure, you have the most flexibility restoring databases using the full recovery model. In addition to preserving data modifications stored in the transaction log, the full recovery model allows you to restore a database to a specific point in time. For example, if an erroneous modification corrupted your data at 2:36 AM on Monday, you could use SQL Server's point-in-time restore to roll your database back to 2:35 AM, wiping out the effects of the erroneous modification.
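    To make the point-in-time idea concrete, here is a minimal Python sketch of how a restore chain could be chosen under the full recovery model: the last full backup before the target, the newest differential based on it (if any), then every log backup up to the one that covers the target time. It models only the selection logic; the tuple layout and `plan_point_in_time_restore` are hypothetical, and the real work is done by RESTORE DATABASE / RESTORE LOG, with the final log restored using the STOPAT option.

```python
from datetime import datetime


def plan_point_in_time_restore(backups, target):
    """Choose the backups needed to restore to `target` (full recovery model).

    `backups` is a list of (kind, finished_at) tuples, kind being "full",
    "diff" or "log". Each log backup is assumed to cover the log written since
    the previous log backup, so the chain must be unbroken.
    """
    fulls = [t for kind, t in backups if kind == "full" and t <= target]
    if not fulls:
        raise ValueError("no full backup taken before the target time")
    base = max(fulls)
    plan = [("full", base)]

    # Newest differential taken after that full backup and before the target.
    diffs = [t for kind, t in backups if kind == "diff" and base < t <= target]
    start = max(diffs) if diffs else base
    if diffs:
        plan.append(("diff", start))
    if start == target:
        return plan

    # Roll forward through the log backups; the last one is applied only
    # up to `target` (the role of WITH STOPAT in T-SQL).
    for t in sorted(ts for kind, ts in backups if kind == "log" and ts > start):
        plan.append(("log", t))
        if t >= target:
            return plan
    raise ValueError("the log backups do not reach the target time")


monday = datetime(2009, 11, 2)
backups = [
    ("full", monday.replace(hour=0)),
    ("log", monday.replace(hour=1)),
    ("diff", monday.replace(hour=2)),
    ("log", monday.replace(hour=2, minute=30)),
    ("log", monday.replace(hour=3)),
    ("log", monday.replace(hour=4)),
]
# Roll back to 2:35 AM, just before the erroneous 2:36 AM modification.
for kind, t in plan_point_in_time_restore(backups, monday.replace(hour=2, minute=35)):
    print(kind, t.strftime("%H:%M"))
# full 00:00
# diff 02:00
# log 02:30
# log 03:00
```

    The 01:00 log backup is skipped because the 02:00 differential already contains those changes; the 03:00 log backup is needed because it is the one that spans 2:35 AM.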

    Bulk-Logged: The bulk-logged recovery model provides protection against failure combined with the best performance. In order to get better performance, the following operations are minimally logged and cannot be recovered to a specific point in time: SELECT INTO and bulk load operations.

    Or

    The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery model. The only difference is in the way it handles bulk data modification operations. The bulk-logged model records these operations in the transaction log using a technique known as minimal logging. This saves significantly on processing time, but prevents you from using the point-in-time restore option within any transaction log back-up that contains minimally logged operations.

    Microsoft recommends that the bulk-logged recovery model be used only for short periods of time. Best practice dictates that you switch a database to the bulk-logged recovery model immediately before conducting bulk operations and restore it to the full recovery model when those operations complete.
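    The restriction above can be expressed as a small rule: whether a restore may stop at an arbitrary moment inside a given log backup depends on the recovery model and on whether that log backup contains minimally logged work. The Python sketch below models that rule only. The model names mirror the values used by ALTER DATABASE ... SET RECOVERY; `LogBackup`, `supports_point_in_time` and the operation labels are hypothetical, and treating the listed operations as always minimally logged is a simplification.

```python
from dataclasses import dataclass, field

# Simplification (assumption): these operations are treated as always
# minimally logged under bulk-logged recovery. In practice SQL Server applies
# minimal logging only when certain conditions are met.
MINIMALLY_LOGGED = {"SELECT INTO", "BULK INSERT", "CREATE INDEX"}


@dataclass
class LogBackup:
    label: str
    operations: list = field(default_factory=list)


def supports_point_in_time(recovery_model, log_backup):
    """Can a restore stop at an arbitrary time inside this log backup?"""
    if recovery_model == "SIMPLE":
        return False  # simple recovery has no log backups at all
    if recovery_model == "FULL":
        return True   # every operation is fully logged
    if recovery_model == "BULK_LOGGED":
        return not any(op in MINIMALLY_LOGGED for op in log_backup.operations)
    raise ValueError(f"unknown recovery model: {recovery_model}")


quiet = LogBackup("01:00", ["UPDATE", "INSERT"])
nightly_load = LogBackup("02:00", ["BULK INSERT", "UPDATE"])
print(supports_point_in_time("BULK_LOGGED", quiet))         # True
print(supports_point_in_time("BULK_LOGGED", nightly_load))  # False
print(supports_point_in_time("FULL", nightly_load))         # True
```

    Keeping the bulk-logged window as short as the bulk operation itself limits how many log backups fall into the "False" case.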


    Back-ups

    One of the major advantages that enterprise-class databases offer over their desktop counterparts is a robust back-up and recovery feature set. Microsoft SQL Server provides database administrators with the ability to customize a database backup and recovery plan to the business and technical requirements of an organization.

    In this article, we explore the process of backing up data with Microsoft SQL Server. When you create a backup plan, you will need to create an appropriate mix of backups with varying backup scopes and backup types that meet the recovery objectives of your organization and are suitable for your technical environment.

    Back-up Scopes: The scope of a back-up defines the portion of the database covered by the backup. It defines the database, file and/or file-group that SQL Server will back up. There are three different types of back-up scope available in Microsoft SQL Server:

    A) Database backups: These cover the entire database, including all structural schema information, the entire data contents of the database and any portion of the transaction log necessary to restore the database from scratch to its state at the time of the backup. Database backups are the simplest way to restore your data in the event of a disaster, but they consume a large amount of disk space and take a long time to complete.

    B) Partial Back-ups: These are good alternatives to database back-ups for very large databases that contain significant quantities of read-only data. If you have read-only file-groups in your database, it probably doesn't make sense to back them up frequently, as they do not change. Therefore, the scope of a partial back-up includes all files in the primary file-group, all read/write file-groups, and any read-only file-groups that you explicitly specify.

    C) File Back-ups: These allow you to back up individual files and/or file-groups from your database. They may be used to complement partial back-ups by creating one-time-only backups of your read-only file-groups. They may also play a role in complex back-up models.
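    The partial back-up scope rule in item B can be written down directly. In T-SQL a partial backup is requested with the READ_WRITE_FILEGROUPS option of BACKUP DATABASE; the Python below only models which file-groups end up in scope. The `partial_backup_scope` function and the file-group names are hypothetical.

```python
def partial_backup_scope(filegroups, extra_read_only=()):
    """Return the file-groups a partial backup would cover.

    `filegroups` maps each file-group name to True when it is read-only.
    Covered: the PRIMARY file-group, every read/write file-group, and any
    read-only file-group that is explicitly requested.
    """
    return [
        name
        for name, read_only in filegroups.items()
        if name == "PRIMARY" or not read_only or name in extra_read_only
    ]


# Hypothetical very large database with two archived, read-only years.
filegroups = {"PRIMARY": False, "Sales2009": False, "Archive2007": True, "Archive2008": True}
print(partial_backup_scope(filegroups))
# ['PRIMARY', 'Sales2009']
print(partial_backup_scope(filegroups, extra_read_only={"Archive2008"}))
# ['PRIMARY', 'Sales2009', 'Archive2008']
```

    The read-only archives would typically be captured once with file back-ups (item C) and then left out of the regular partial back-ups.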

    Back-Up Types

    The second decision you need to make when planning a SQL Server database backup model is the type of each backup included in your plan. The backup type describes the temporal coverage of the database backup. SQL Server supports two different back-up types:

    A) Full Backups: These include all data within the backup scope. For example, a full database backup will include all data in the database, regardless of when it was last created or modified. Similarly, a full partial backup will include the entire contents of every file and file-group within the scope of the partial backup.

    B) Differential backups: These include only the portion of data that has changed since the last full backup. For example, if you perform a full database backup on Monday morning and then perform a differential backup on Monday evening, the differential backup will be a much smaller file and will take much less time to create, because it includes only the data changed during the day on Monday.
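    One consequence of "changed since the last full backup" is that differentials are cumulative: a later differential repeats everything an earlier one held. The Python sketch below models that with whole hours and page ids; SQL Server actually tracks changed extents in a differential bitmap, so the page-level view here is a simplification, and `differential_contents` is a hypothetical name.

```python
def differential_contents(changes, last_full_at, now):
    """Pages a differential backup taken at `now` would contain.

    `changes` is a list of (hour, page_id) modifications. Differentials are
    cumulative: they hold every page changed since the last FULL backup,
    including pages already captured by earlier differentials.
    """
    return sorted({page for hour, page in changes if last_full_at < hour <= now})


# Monday: full backup at 06:00, modifications during the day (hypothetical).
changes = [(7, "p1"), (9, "p2"), (13, "p1"), (20, "p3")]
print(differential_contents(changes, last_full_at=6, now=18))  # ['p1', 'p2']
print(differential_contents(changes, last_full_at=6, now=22))  # ['p1', 'p2', 'p3']
```

    Because of this, a restore only ever needs the full backup plus the single most recent differential.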

    You should keep in mind that the scope and type of a backup are two independent decisions made when creating your backup plan. As described above, each type and scope allows you to customize the amount of data included in the backup and, therefore, the amount of time required to back up and restore the database in the event of a disaster.

    SQL SERVER REPLICATION

    SQL Server replication allows database administrators to distribute data to various servers throughout an organization. You may wish to implement replication in your organization for a number of reasons, such as:

    A) Load Balancing: Replication allows you to disseminate your data to a number of servers and then distribute the query load among those servers.

    B) Offline Processing: You may wish to manipulate data from your database on a machine that is not always connected to the network.

    C) Redundancy: Replication allows you to build a fail-over database server that's ready to pick up the processing load at a moment's notice.

    In any replication scenario there are 2 main components:

    A) Publishers have data to offer to the other servers. Any given replication scheme may have one or more publishers.

    B) Subscribers are database servers that wish to receive updates from the publisher when the data is modified.

    There's nothing preventing a single system from acting in both of these capacities. In fact, this is often done in large-scale distributed database systems. Microsoft SQL Server supports three types of database replication. They are:

    A) Snapshot Replication: This acts in the manner its name implies. The publisher simply takes a snapshot of the entire replicated database and shares it with the subscribers. Of course, this is a very time- and resource-intensive process. For this reason, most administrators don't use snapshot replication on a recurring basis for databases that change frequently. There are two scenarios where snapshot replication is commonly used. First, it is used for databases that rarely change. Second, it is used to set a baseline to establish replication between systems while future updates are propagated using transactional or merge replication.

    B) Transactional Replication: This offers a more flexible solution for databases that change on a regular basis. With transactional replication, the replication agent monitors the publisher for changes to the database and transmits those changes to the subscribers. This transmission can take place immediately or on a periodic basis.


    C) Merge Replication: This allows the publisher and subscriber to independently make changes to the database. Both entities can work without an active network connection. When they are reconnected, the merge replication agent checks for changes on both sets of data and modifies each database accordingly. If changes conflict with each other, it uses a predefined conflict resolution algorithm to determine the appropriate data. Merge replication is commonly used by laptop users and others who cannot be constantly connected to the publisher.
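    The common pairing of items A and B (a snapshot to set the baseline, then transactional replication to forward later changes) can be modeled in a few lines of Python. A dictionary stands in for a published table and the change list stands in for what the replication agent reads from the publisher; `take_snapshot` and `apply_changes` are hypothetical names, not SQL Server APIs.

```python
def take_snapshot(publisher):
    """Snapshot replication: copy the entire published table in one go."""
    return dict(publisher)


def apply_changes(table, changes):
    """Transactional replication: replay logged changes in commit order."""
    for op, key, value in changes:
        if op == "upsert":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


publisher = {1: "Widget", 2: "Gadget"}
subscriber = take_snapshot(publisher)  # baseline

# Changes committed at the publisher after the snapshot was taken.
changes = [("upsert", 3, "Gizmo"), ("delete", 1, None), ("upsert", 2, "Gadget v2")]
apply_changes(publisher, changes)
apply_changes(subscriber, changes)  # what the replication agent forwards
print(subscriber)                   # {2: 'Gadget v2', 3: 'Gizmo'}
print(subscriber == publisher)      # True
```

    Only the three changes travel after the baseline, which is why transactional replication suits frequently changing data better than repeated snapshots.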

    Each one of these replication techniques serves a useful purpose and is well-suited to particular database scenarios.
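    Merge replication (item C) differs in that both sides change data independently and conflicts must be settled at synchronization. The Python sketch below models one synchronization under a simple "publisher wins" rule. That rule is an assumption made for illustration; SQL Server provides its own conflict resolvers, and `merge_sync` and `publisher_wins` are hypothetical names.

```python
def merge_sync(base, pub_changes, sub_changes, resolve):
    """Combine changes made independently at the publisher and a subscriber.

    `base` is the data both sides held at the last synchronization;
    `pub_changes` and `sub_changes` map keys to their new values. A key changed
    on both sides to different values is a conflict, settled by `resolve`.
    """
    merged = dict(base)
    conflicts = []
    for key in pub_changes.keys() | sub_changes.keys():
        in_pub, in_sub = key in pub_changes, key in sub_changes
        if in_pub and in_sub and pub_changes[key] != sub_changes[key]:
            conflicts.append(key)
            merged[key] = resolve(pub_changes[key], sub_changes[key])
        else:
            merged[key] = pub_changes[key] if in_pub else sub_changes[key]
    return merged, sorted(conflicts)


def publisher_wins(pub_value, sub_value):
    """Hypothetical conflict rule: the publisher's value always wins."""
    return pub_value


base = {"A": 10, "B": 20, "C": 30}
laptop_edits = {"B": 19, "C": 32}  # made offline on a subscriber
office_edits = {"A": 11, "C": 35}  # made at the publisher meanwhile
merged, conflicts = merge_sync(base, office_edits, laptop_edits, publisher_wins)
print(merged)     # {'A': 11, 'B': 19, 'C': 35}
print(conflicts)  # ['C']
```

    Non-conflicting edits from both sides survive; only "C", edited in both places, needs the resolution rule.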

    If you are working with SQL Server 2005, you'll need to choose your edition based upon your replication needs. Each edition has differing capabilities:

    A) Express edition has extremely limited replication capabilities. It is able to act as a replication client only.

    B) Workgroup edition adds limited publishing capabilities. It is able to serve five clients using transactional replication and up to 25 clients using merge replication. It can also act as a replication client.

    C) Standard edition has full, unlimited replication capabilities with other SQL Server databases.

    D) Enterprise edition adds a powerful tool for those operating in a mixed database environment: it is capable of replication with Oracle databases.

    As you have undoubtedly recognized by this point, SQL Server's replication capabilities offer database administrators a powerful tool for managing and scaling databases in an enterprise environment.

    Difference between Temp tables and Table variables in SQL Server

    1) Table variables do not participate in user transactions: a ROLLBACK does not undo changes made to a table variable, so you can say they are outside the scope of the transaction mechanism. Temp tables, on the other hand, participate in transactions just like normal tables.

    2) Table variables cannot be altered, which means no DDL action (such as ALTER TABLE) is allowed on them once they are declared. Temp tables can be altered.

    3) Stored procedures that use temporary tables tend to be recompiled more often, while the execution plans of procedures that use table variables can usually be compiled once and reused. Avoiding recompilation gives a major advantage to speed of execution; this advantage can be dramatic for long procedures, where recompilation can be too pricey.

    4) Table variables are often assumed to be memory-resident, but like temp tables they are backed by tempdb. Under memory pressure, the pages belonging to a table variable can be pushed out to tempdb on disk.

    5) There can be big performance differences between using table variables and temporary tables. For larger data sets, temporary tables are often faster than table variables. For example, queries using table variables did not generate parallel query plans on a large SMP box, while similar queries using temporary tables (local or global), running under the same circumstances, did generate parallel plans.

    6) Table variables use internal metadata in a way that prevents the engine from using parallel plans for queries that modify a table variable. SQL Server maintains statistics for temporary tables but not for table variables. Without statistics, SQL Server might choose a poor processing plan for a query that contains a table variable.

    Because no statistics are maintained on a table variable, changes to its data will not cause recompilation of the queries that access it.
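    Point 1 is easiest to see with a toy model. In SQL Server you would check it with BEGIN TRAN, an INSERT into both a #temp table and a @table variable, then ROLLBACK; the Python below only imitates that outcome. The `Session` class and its attributes are hypothetical and do not model logging or tempdb.

```python
import copy


class Session:
    """Toy model of a session holding a temp table and a table variable."""

    def __init__(self):
        self.temp_table = []      # stands in for #orders: transactional
        self.table_variable = []  # stands in for @orders: ignores ROLLBACK
        self._saved = None

    def begin_tran(self):
        # Only the temp table's state is remembered for a possible rollback.
        self._saved = copy.deepcopy(self.temp_table)

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no open transaction")
        self.temp_table = self._saved
        self._saved = None


s = Session()
s.begin_tran()
s.temp_table.append("order 1")
s.table_variable.append("order 1")
s.rollback()
print(s.temp_table)      # []
print(s.table_variable)  # ['order 1']
```

    This behavior is sometimes used deliberately, for example to keep diagnostic rows in a table variable after the surrounding transaction has been rolled back.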

    Suggestion for choosing between these two:

    1) Use a table variable when you want to pass a table to a stored procedure as a parameter, because there is no other choice.

    2) Table variables have been found to be slower in SQL Server 2005 than in 2000 on similar data and under similar circumstances, so if you have used table variables extensively in your database and are planning to migrate from 2000 to 2005, make your choice carefully.

    3) Table variables are OK if used in small queries and for processing small amounts of data; otherwise go for temp tables.

    4) If you are using very complex business logic in your stored procedure, it's better to use temp tables than table variables.

    Stored Procedures

    A stored procedure is a group of SQL statements that form a logical unit and perform a particular task. Stored procedures are used to encapsulate a set of operations or queries to execute on a database server. For example, operations on an employee database (hire, fire, promote, lookup) could be coded as stored procedures executed by application code. Stored procedures can be compiled and executed with different parameters and results, and they may have any combination of input, output, and input/output parameters.

    Advantages of Stored Procedures:

    A) Precompiled execution: SQL Server compiles each stored procedure once, caches the execution plan, and then reuses it. This results in tremendous performance boosts when stored procedures are called repeatedly.

    B) Reduced client/server traffic: If network bandwidth is a concern in your environment, you'll be happy to learn that stored procedures can reduce long SQL queries to a single line that is transmitted over the wire.

    C) Efficient re-use of code and programming abstraction: Stored procedures can be used by multiple users and client programs. If you utilize them in a planned manner, you'll find the development cycle takes less time.

    D) Enhanced security controls: You can grant users permission to execute a stored procedure independently of underlying table permissions.
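    The "compile once, reuse" idea in item A can be sketched as a plan cache keyed by procedure name. This is a toy Python model, not SQL Server's plan cache: real plans are created on first execution and can still be evicted or recompiled (for example after schema or statistics changes, or when WITH RECOMPILE is used). `PlanCache` and the procedure name `usp_PromoteEmployee` are hypothetical.

```python
class PlanCache:
    """Toy model of compile-once, reuse-many execution plans."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def _compile(self, proc_name):
        self.compilations += 1
        return f"plan for {proc_name}"  # placeholder for a real execution plan

    def execute(self, proc_name, **params):
        plan = self.plans.get(proc_name)
        if plan is None:  # first call: compile and cache the plan
            plan = self._compile(proc_name)
            self.plans[proc_name] = plan
        # Later calls reuse the cached plan with new parameter values.
        return plan, params


cache = PlanCache()
for emp_id in (7, 8, 9):
    cache.execute("usp_PromoteEmployee", emp_id=emp_id)
print(cache.compilations)  # 1
```

    Three calls with different parameters cost one compilation, which is the source of the performance boost for repeatedly called procedures.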


    Differences between User Defined Functions and Stored Procedures

    Stored procedures are very similar to user-defined functions, but there are subtle differences. Both allow you to create bundles of SQL statements that are stored on the server for future use. This offers you a tremendous efficiency benefit, as you can save programming effort by:

    A) Reusing code from one program to another, cutting down on program development time

    B) Hiding the SQL details, allowing database developers to worry about SQL and application developers to deal only in higher-level languages

    C) Centralizing maintenance, allowing you to make business logic changes in a single place that automatically affect all dependent applications

    At first glance, functions and stored procedures seem identical. However, there are several subtle, yet important differences between the two:

    A) Stored procedures are called independently, using the EXEC command, while functions are called from within another SQL statement.

    B) Stored procedures allow you to enhance application security by granting users and applications permission to use stored procedures, rather than permission to access the underlying tables. Stored procedures provide the ability to restrict user actions at a much more granular level than standard SQL Server permissions. For example, if you have an inventory table that cashiers must update each time an item is sold (to decrement the inventory for that item by 1 unit), you can grant cashiers permission to use a "decrement item" stored procedure, rather than allowing them to make arbitrary changes to the inventory table.

    C) Functions must always return a value (either a scalar value or a table). Stored procedures may return an integer status value, result sets, output parameter values, or nothing at all.
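    The cashier example in item B comes down to granting EXECUTE on one procedure while withholding UPDATE on the table (in SQL Server this works through ownership chaining when the procedure and the table share an owner). The Python below is only a toy model of that permission split; `InventoryDB`, `decrement_item` and the grant sets are hypothetical names.

```python
class InventoryDB:
    """Toy permission model for the cashier example."""

    def __init__(self):
        self.inventory = {"SKU-1": 10}
        self.table_update_grants = {"manager"}        # may UPDATE the table
        self.execute_grants = {"cashier", "manager"}  # may run the procedure

    def update_inventory(self, user, sku, qty):
        """Direct table access: arbitrary changes, so tightly restricted."""
        if user not in self.table_update_grants:
            raise PermissionError(f"{user} cannot UPDATE the inventory table")
        self.inventory[sku] = qty

    def decrement_item(self, user, sku):
        """The "decrement item" procedure: can only subtract one unit."""
        if user not in self.execute_grants:
            raise PermissionError(f"{user} cannot EXECUTE decrement_item")
        self.inventory[sku] -= 1


db = InventoryDB()
db.decrement_item("cashier", "SKU-1")
print(db.inventory["SKU-1"])  # 9
try:
    db.update_inventory("cashier", "SKU-1", 500)
except PermissionError as err:
    print(err)  # cashier cannot UPDATE the inventory table
```

    The cashier can do exactly one thing to the table, and only in the way the procedure allows.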

    Overall, stored procedures are one of the greatest treasures available to SQL Server developers. The efficiency and security benefits are well worth the upfront investment in time.

    SSAS

    Different Dimension types by Microsoft available in Analysis Services

    1) Regular
    2) Time
    3) Organization
    4) Geography
    5) Bill of Materials
    6) Accounts
    7) Customers
    8) Products
    9) Scenario
    10) Quantitative
    11) Utility
    12) Currency
    13) Rates
    14) Channel
    15) Promotion

    Regular: A dimension whose type has not been set to a special dimension type

    Time: A dimension whose attributes represent time periods, such as years, semesters, quarters, months and days

    Organization: A dimension whose attributes represent organizational information, such as employees or subsidiaries

    Geography: A dimension whose attributes represent geographic information, such as cities or postal codes

    Bill of Materials: A dimension whose attributes represent inventory or manufacturing information, such as parts lists for products

    Accounts: A dimension whose attributes represent a chart of accounts for financial reporting purposes

    Customers: A dimension whose attributes represent customer or contact information

    Products: A dimension whose attributes represent product information

    Scenario: A dimension whose attributes represent planning or strategic analysis information

    Quantitative: A dimension whose attributes represent quantitative information

    Utility: A dimension whose attributes represent miscellaneous information

    Currency: A dimension whose attributes represent currency information

    Rates: A dimension whose attributes represent currency rate information

    Channel: A dimension whose attributes represent channel information

    Promotion: A dimension whose attributes represent marketing promotion information

    Different Types of Dimensions

    1) Conformed Dimension
    2) Junk Dimension
    3) Degenerate Dimension
    4) Slowly Changing Dimensions


    Conformed Dimension: A conformed dimension is built once in your model and can be reused multiple times with different fact tables. For example, consider a model containing multiple fact tables that represent different data marts. Look for a dimension that is common to these fact tables. In this example, let's say the product dimension is common, so it can be reused by creating shortcuts to it and joining it to the different fact tables. Common examples are the time dimension, the customer dimension and the product dimension.
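    The sketch below shows one product dimension shared by two fact tables. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (dim_product, fact_sales, fact_inventory) are hypothetical. Both facts can be reported side by side because they point at the same dimension keys.

```python
import sqlite3

def build_model() -> sqlite3.Connection:
    """Create one conformed product dimension and two fact tables that share it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, amount REAL);
        CREATE TABLE fact_inventory (product_key INTEGER REFERENCES dim_product, on_hand INTEGER);
        INSERT INTO dim_product VALUES (1, 'Bike'), (2, 'Helmet');
        INSERT INTO fact_sales VALUES (1, 500.0), (1, 250.0), (2, 40.0);
        INSERT INTO fact_inventory VALUES (1, 12), (2, 30);
    """)
    return conn

def sales_vs_stock(conn: sqlite3.Connection) -> list:
    """Report across both fact tables through the shared dimension key."""
    return conn.execute("""
        SELECT p.product_name,
               (SELECT SUM(s.amount) FROM fact_sales s WHERE s.product_key = p.product_key),
               (SELECT SUM(i.on_hand) FROM fact_inventory i WHERE i.product_key = p.product_key)
        FROM dim_product p
        ORDER BY p.product_key
    """).fetchall()

rows = sales_vs_stock(build_model())
print(rows)  # [('Bike', 750.0, 12), ('Helmet', 40.0, 30)]
```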

    Junk Dimension: Instead of having hundreds of small dimensions that each hold only a few records and clutter your database with mini identifier tables, you load all records from these small dimension tables into ONE dimension table. We call this a JUNK dimension table, since all the junk is stored in this one table. For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and these can be consolidated into one junk dimension table.
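    One common way to populate a junk dimension is to give a surrogate key to every combination of the small code lists. The sketch below assumes that approach. The plant and order-type values are hypothetical. Each fact row then carries one key instead of one key per mini-dimension.

```python
from itertools import product

def build_junk_dimension(**code_lists: list) -> dict:
    """Assign one surrogate key to every combination of the small code lists."""
    names = list(code_lists)
    combos = product(*(code_lists[name] for name in names))
    # Keys start at 1, in the order itertools.product generates the combinations.
    return {combo: key for key, combo in enumerate(combos, start=1)}

junk = build_junk_dimension(
    plant=["Austin", "Dublin"],            # hypothetical plant codes
    order_type=["Web", "Phone", "Store"],  # hypothetical order types
)
# A fact row stores only this single key instead of a plant key plus an order-type key.
fact_key = junk[("Dublin", "Phone")]
print(len(junk), fact_key)  # 6 5
```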

    Degenerate Dimension: An item that is in the fact table but has been stripped of its description is called a degenerate dimension. The description is removed because it belongs in a dimension table. The item looks like a dimension, but it really lives in the fact table and has lost its description, hence the name. A typical example is an order or invoice number stored directly on the fact rows.

    Slowly Changing Dimensions: These are dimensions where the key value remains static but the descriptive attributes may change over time.
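    The text does not say how such changes are stored. A widely used option (often called Type 2) expires the current row and appends a new version, so history is kept. The sketch below assumes that approach, and the class and field names are hypothetical. The business key (customer_id) stays static, while each version gets its own surrogate key.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerVersion:
    surrogate_key: int        # new key for every version of the row
    customer_id: str          # business key: stays static
    city: str                 # descriptive attribute: may change
    valid_from: date
    valid_to: Optional[date]  # None marks the current version

def apply_type2_change(dim: list, customer_id: str, city: str, as_of: date) -> None:
    """Expire the current version and append a new one when the description changes."""
    current = next(
        (row for row in dim if row.customer_id == customer_id and row.valid_to is None), None
    )
    if current is not None and current.city == city:
        return  # no change, nothing to do
    if current is not None:
        current.valid_to = as_of  # close the old version; history is preserved
    next_key = max((row.surrogate_key for row in dim), default=0) + 1
    dim.append(CustomerVersion(next_key, customer_id, city, as_of, None))

dim: list = []
apply_type2_change(dim, "C042", "Leeds", date(2009, 1, 1))
apply_type2_change(dim, "C042", "York", date(2009, 6, 1))
apply_type2_change(dim, "C042", "York", date(2009, 7, 1))  # unchanged: ignored
for row in dim:
    print(row.surrogate_key, row.customer_id, row.city, row.valid_from, row.valid_to)
```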

    Another classification lists 10 types of dimension tables (this classification is not used in most cases):

    1) Primary Dimensions
    2) Secondary Dimensions
    3) Degenerate Dimensions
    4) Conformed Dimensions
    5) Slowly Changing Dimensions
    6) Rapidly Changing Dimensions
    7) Large Dimensions
    8) Rapidly Changing Monster Dimensions
    9) Junk Dimensions
    10) Role-Playing Dimensions

    Differences between Analysis Services 2005 and 2008

    A) Real-time best practice design warnings. These warnings are implemented in AMO and shown in the UI as blue squiggly lines. You can dismiss them individually (a single occurrence) or turn them off altogether. To disable or re-enable a warning, build the project, select the warning message in the warning window, then right-click and choose Disable or Enable.

    B) New Dimension Design Wizard

    C) New Cube Design Wizard

    D) Attribute Relationships tab in the Dimension Designer, which makes attribute relationships easier to define and understand.

    E) CREATE MEMBER syntax extensions to support defining the caption, display folders and associated measure group.

    F) CREATE SET syntax extensions to support defining the caption and display folders, as well as the ability to define dynamic named sets.

    G) CREATE KPI command added.

    H) Backup performance improvements. In SSAS 2005, backup time for big databases grew exponentially. In SSAS 2008, backup time grows linearly. Redesigned backup storage removes backup size limits.

    I) Write-back to MOLAP. Analysis Services 2008 no longer needs to query ROLAP partitions when performing write-backs, which results in large performance gains.

    J) Scale-out Analysis Services. A single read-only copy of an Analysis Services database can be shared between many Analysis Services servers through a virtual IP address. This creates a highly scalable deployment option for an Analysis Services solution.

    K) New UPDATE MEMBER statement. The UPDATE MEMBER statement updates an existing calculated member while preserving the member's relative precedence with respect to other calculations. Therefore, you cannot use the UPDATE MEMBER statement to change SOLVE_ORDER. An UPDATE MEMBER statement cannot be specified in the MDX script for a cube.

    L) Block computation. This eliminates unnecessary aggregation calculations (for example, when the values to be aggregated are NULL) and significantly improves cube performance. Users can therefore increase the depth of their hierarchies and the complexity of their computations.

    M) Aggregation Designer changes. The algorithm that builds aggregations is improved, and you can manually edit, create and delete aggregations and see which aggregations were designed. The Aggregation Designer also has built-in validations that assist with optimal design.

    N) Dynamic Management Views (DMVs). DMVs let you write SELECT-style statements against an SSAS instance to get performance and statistics information (for example, SELECT * FROM $System.DISCOVER_SESSIONS).

    O) SSAS database attach/detach

    P) Analysis Services Personalization Extensions

    Define temporary and extended stored procedures.

    Answer: A temporary stored procedure is stored in the tempdb database. It is volatile and is deleted once the connection is terminated or the server is restarted...

    Differences between SSRS 2005 and SSRS 2008

    1) SSRS 2005 required Internet Information Services (IIS) to run. SSRS 2008 no longer requires IIS: it uses the http.sys driver and listens for report requests through http.sys. This reduces deployment headaches and also reduces server overhead.

    2) SSRS 2005 used more memory and was extremely resource intensive, so much so that many companies installed it on a separate machine from SQL Server. SSRS 2008 uses memory more efficiently, especially with reports that contain large sets of data. SSRS 2008 also often loads the first page of a report much faster than 2005.

    Performance Tuning of SSRS: Handling a Large Workload

    To get the highest performance when handling large workloads that include user requests for large reports, implement the following recommendations:

    Steps to Improve Performance

    1) Control the Size of Your Reports
    2) Use Cache Execution
    3) Configure and Schedule Your Reports
    4) Deliver Rendered Reports for Non-browser Formats
    5) Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports
    6) Back to the Report Catalogs
    7) Tuning the Web Service

    Control the Size of Your Reports

    First determine the purpose of these reports and whether a large multi-page report is even necessary. If a large report is necessary, how often will it be used? If you provide users with smaller summary reports, can you reduce how often users try to open the large multi-page report? Large reports place a significant processing load on the report server, the report server catalog and the report data, so evaluate each report on a case-by-case basis.

    Large reports often contain data fields that are not used in the report, or duplicate datasets. Users often retrieve more data than they really need. To significantly reduce the load on your Reporting Services environment, create summary reports that use aggregates created at the data source, and include only the necessary columns. If you need to provide data feeds, produce them asynchronously with more appropriate tools, such as SSIS.
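    The following sketch illustrates "aggregate at the data source, return only the needed columns". It uses Python's sqlite3 module as a stand-in for the report's data source, and the sales table is hypothetical. The summary query sends one row per region instead of every detail row and column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, notes TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("East", "Bike", "n/a", 100.0), ("East", "Helmet", "n/a", 20.0),
     ("West", "Bike", "n/a", 300.0), ("West", "Bike", "n/a", 50.0)],
)

# Detail approach: every row and every column travels to the report server.
detail = conn.execute("SELECT * FROM sales").fetchall()

# Summary approach: aggregate at the source and return only the columns the report shows.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(len(detail), summary)  # 4 [('East', 120.0), ('West', 350.0)]
```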

    Use Cache Execution

    If the reports do not need live execution, enable the cache execution setting for each appropriate report. With this setting, the report server caches a temporary copy of those reports in memory.
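    The sketch below models the cache-execution idea in process. A rendered copy is kept per report and parameter combination, and it is re-rendered only after it expires. The ReportCache class, the render callback and the expiry value are hypothetical illustrations, not the Reporting Services API.

```python
import time
from typing import Callable

class ReportCache:
    """Keep a rendered copy of each report/parameter combination for ttl_seconds."""

    def __init__(self, render: Callable[[str, tuple], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict = {}  # (report, params) -> (rendered_at, output)

    def get(self, report: str, params: tuple) -> str:
        key = (report, params)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served from cache: no query, no processing
        output = self._render(report, params)  # live execution
        self._entries[key] = (now, output)
        return output

calls = []
def render(report: str, params: tuple) -> str:
    calls.append((report, params))  # count how often the report really runs
    return f"{report}{params}"

fake_now = [0.0]
cache = ReportCache(render, ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("Sales", ("2009",))  # rendered
cache.get("Sales", ("2009",))  # cache hit
fake_now[0] = 700.0
cache.get("Sales", ("2009",))  # expired, rendered again
print(len(calls))  # 2
```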

    Configure and Schedule Your Reports

    For your large reports, use the Report Execution Timeout setting to control how long a report can run before it times out. Some reports simply need a long time to run, so timeouts will not help there. But if reports are based on bad or runaway queries, execution timeouts ensure that resources are not wasted.
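    The effect of an execution timeout can be sketched with Python's sqlite3 module. It stands in for the report's data source, and run_with_timeout is a hypothetical helper, not a Reporting Services feature. A progress handler aborts any query that runs past its deadline, so a runaway query stops using resources.

```python
import sqlite3
import time

def run_with_timeout(conn: sqlite3.Connection, sql: str, timeout_s: float):
    """Run sql, aborting it if it executes longer than timeout_s seconds.

    Returns the fetched rows, or None if the query was cut off by the timeout.
    """
    deadline = time.monotonic() + timeout_s
    # SQLite calls this handler every 10,000 virtual-machine instructions;
    # returning a non-zero value interrupts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 10_000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        if time.monotonic() > deadline:
            return None  # timed out: the runaway query no longer holds resources
        raise  # some other error: surface it
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler

conn = sqlite3.connect(":memory:")
# A deliberately runaway query: an unbounded recursive CTE that never finishes on its own.
runaway = "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) SELECT COUNT(*) FROM c"
print(run_with_timeout(conn, runaway, 0.2))     # None
print(run_with_timeout(conn, "SELECT 1", 0.2))  # [(1,)]
```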

    If you have large reports that create data processing bottlenecks, you can reduce resource contention by using scheduled snapshots. The report is then rendered from a regularly scheduled execution snapshot instead of querying the report data each time. The scheduled snapshot can run during off-peak hours, leaving more resources for live reports during peak hours.

    Deliver Rendered Reports for Non-browser Formats

    Rendering performance of non-browser formats such as PDF and XLS has improved in SQL Server 2008 Reporting Services. Nevertheless, to reduce the load on your SQL Server Reporting Services environment, you can place non-browser format reports on a file share and/or SharePoint. Users can then open the file directly instead of regenerating the report each time.

    Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports

    For your large parameterized reports, you can improve performance by pre-populating the report cache using data-driven subscriptions. Data-driven subscriptions make it easier to fill the cache for a set of parameter value combinations that are used frequently when the report runs. If you choose parameter values that nobody uses, you pay the cost of filling the cache with little value in return. To identify the most frequent parameter value combinations, analyze the ExecutionLog2 view. When a user opens the report, the report server can then use a cached copy instead of creating the report on demand.
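    The step "analyze the execution log to find the frequent parameter combinations" can be sketched as a frequency count. The code assumes the log rows have already been exported as dictionaries with ReportPath and Parameters fields, modelled on the ExecutionLog2 view. The rows and report paths are hypothetical. The top combinations are the candidates to feed into the data-driven subscription.

```python
from collections import Counter

def top_parameter_sets(log_rows: list, report: str, n: int) -> list:
    """Return the n most frequently requested parameter strings for one report."""
    counts = Counter(row["Parameters"] for row in log_rows if row["ReportPath"] == report)
    return [params for params, _ in counts.most_common(n)]

# Hypothetical rows exported from the execution log.
log = [
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=East"},
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=West"},
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=East"},
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=North"},
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=East"},
    {"ReportPath": "/Sales/ByRegion", "Parameters": "Region=West"},
    {"ReportPath": "/HR/Headcount", "Parameters": "Dept=IT"},
]
print(top_parameter_sets(log, "/Sales/ByRegion", 2))  # ['Region=East', 'Region=West']
```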

    Back to the Report Catalogs

    You can also increase the size of your report server catalogs, which lets the database store more of the snapshot data.

    Tuning the Web Service

    IIS and Http.sys tuning helps squeeze the last bit of performance out of the report server computer. The low-level options let you change the length of the HTTP request queue, how long connections are kept alive, and so on. For large concurrent reporting loads, you may need to change these settings so the server accepts enough requests to fully use its resources.

    Consider this only if your servers are at maximum load and you do not see full resource utilization, or if you experience connection failures to Reporting Services.

    Memory Limits in SQL Server Reporting Services 2008

    Memory Limit

    This setting is similar to WorkingSetMinimum in SQL Server 2008 Reporting Services. Its default is 60% of physical memory. Increasing the value helps Reporting Services handle more requests. Once this threshold is reached, no new requests are accepted.

    Maximum Memory Limit

    This setting is similar to WorkingSetMaximum in SQL Server 2008 Reporting Services. Its default is 80% of physical memory. Unlike the SQL Server 2008 version, when this threshold is reached it starts aborting processes instead of rejecting new requests.

    Performance Tuning of SQL Server

    Section A:

    Increase the min memory per query option to improve the performance of queries that use hashing or sorting operations, if your SQL Server has a lot of memory available and many queries run concurrently on the server. The default min memory per query value is 1,024 KB.

    Increase the max async IO option if SQL Server runs on a high-performance server with a high-speed intelligent disk subsystem (such as hardware-based RAID with more than 10 disks).

    Set the network packet size option to an appropriate value. The default packet size is 4,096 bytes. For queries that return large amounts of data, the packet size can be increased accordingly.

    You can increase the recovery interval value.

    Set the priority boost SQL Server option to 1. By default it is set to 0.

    Set the max worker threads option to the maximum number of user connections to your SQL Server box. The default setting for max worker threads is 255 (SQL Server 2000). If there are fewer user connections than the max worker threads value, SQL Server creates a separate operating system thread for each client connection. If there are more, it uses thread pooling. For example, if your SQL Server box has at most 50 user connections, you can set max worker threads to 50, which frees resources for SQL Server to use elsewhere. If it has at most 500 user connections, you can set max worker threads to 500. This can improve performance because thread pooling will not be used.

    Specify the min server memory and max server memory options.

    Specify the set working set size SQL Server option to reserve physical memory space for SQL Server.

    Microsoft Tips on Performance Tuning:

    Not knowing the performance and scalability characteristics of your system: If performance and scalability matter to you, the biggest mistake you can make is not knowing how your important queries actually perform and scale. You also need to know how different queries affect each other in a multiuser system. You achieve performance and scalability when you limit resource use and handle contention for those resources. Contention is caused by locking and by physical contention. Resource use includes CPU utilization, network I/O, disk I/O and memory use.

    Retrieving too much data: A common mistake is to retrieve more data than you actually require. This increases network traffic and the load on server and client resources. It applies to both columns and rows.

    Misuse of transactions: Some transactions cause scalability and performance problems because they lock resources longer than needed. These include long-running transactions, transactions that wait for user input before committing, transactions that never commit because of an error, and non-transactional queries run inside transactions.
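    The sketch below shows how an open transaction blocks other writers. It uses Python's sqlite3 module with a temporary database file. SQLite locks the whole database rather than individual rows or pages, so the blocking is coarser than in SQL Server, but the principle is the same: a transaction left open keeps its locks. The function name is hypothetical.

```python
import os
import sqlite3
import tempfile

def demo_open_transaction_blocks_writers() -> tuple:
    """Show a second writer failing while the first keeps its transaction open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail immediately instead of waiting for the lock.
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)
        other = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer.execute("CREATE TABLE t (x INTEGER)")
            writer.execute("BEGIN IMMEDIATE")          # take the write lock
            writer.execute("INSERT INTO t VALUES (1)")  # ...and keep the transaction open
            try:
                other.execute("INSERT INTO t VALUES (2)")
                blocked = False
            except sqlite3.OperationalError:           # "database is locked"
                blocked = True
            writer.execute("COMMIT")                    # lock released
            other.execute("INSERT INTO t VALUES (2)")   # now the second writer succeeds
            rows = other.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            writer.close()
            other.close()
    return blocked, rows

print(demo_open_transaction_blocks_writers())  # (True, 2)
```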

    Misuse of indexes: If you do not create indexes that support the queries issued against your server, your application's performance suffers. However, if you have too many indexes, insert and update performance suffers. Balance the indexing needs of writes and reads according to how your application is used.
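    To see whether an index supports a query, inspect the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for SQL Server's execution plans. The orders table and index name are hypothetical. The plan changes from a full scan to an index search once a matching index exists. Keep in mind that every such index also adds work to each insert and update.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query-plan descriptions joined into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
query = "SELECT amount FROM orders WHERE customer_id = 42"

before = plan(conn, query)  # no supporting index: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)   # the index now supports the WHERE predicate
print(before)
print(after)
```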

    Mixing OLTP, OLAP and reporting workloads: OLTP workloads consist of many small transactions, and users expect very quick response times. OLAP and reporting workloads consist of a few long-running operations that might consume more resources and cause more contention. The contention is caused by locking and by the underlying physical subsystem. You must resolve this conflict to achieve a scalable system.

    Inefficient schemas: Adding indexes can help improve performance. However, their impact may be limited if your queries are inefficient because poor table design leads to too many joins or inefficient joins. Schema design is a key performance factor, and it also gives the server information it can use to optimize query plans. Schema design is largely a tradeoff between good read performance and good write performance: normalization helps write performance, and denormalization helps read performance.

    Using an inefficient disk subsystem: The physical disk subsystem must give the database server enough I/O capacity to run without disk queuing or long I/O waits.

    SSIS 10 Best Practices:

    1) SSIS is an in-memory pipeline, so ensure all transformations occur in memory
    2) Plan for capacity by understanding resource utilization
    3) Baseline source system extract speed
    4) Optimize the SQL data source, lookup transformations and destination
    5) Tune your network
    6) Use data types (yes, back to data types!) wisely
    7) Change the design
    8) Partition the problem
    9) Minimize logged operations
    10) Schedule and distribute it correctly

    SSIS Performance Tuning

    The SSIS architecture has two engines: the run-time engine and the data flow engine. The run-time engine is a highly parallel control flow engine. It coordinates the execution of tasks, or units of work, within SSIS and manages the engine threads that carry out those tasks. The data flow engine manages the data pipeline within a data flow task.

    Data Flow Optimization Modes: The data flow task has a property called RunInOptimizedMode. When this property is enabled, SSIS automatically disables any downstream component that doesn't use any of the source component's columns, as well as any unused column. As a result, the performance of the entire data flow task improves.

    SSIS projects also have a RunInOptimizedMode property. It overrides the RunInOptimizedMode property of all data flow tasks in the project at design time, so that all of them run in optimized mode during debugging.

    Buffers: A buffer is an in-memory dataset object that the data flow engine uses to transform data. The data flow task has a configurable property called DefaultBufferMaxRows, which is set to 10,000 by default. It also has a configurable property called DefaultBufferSize, which is set to 10 MB by default. Finally, it has a property called MaxBufferSize, which is set to 100 MB and cannot be changed.

    Buffer Sizing: When tuning a data flow task, the goal is to pass as many records as possible through a single buffer while using memory efficiently. What does efficient use of memory mean here? SSIS estimates the size of a buffer row from the data source metadata at design time. The buffer row should be as small as possible, which you achieve by using the smallest possible data type for each column. SSIS multiplies the estimated buffer row size by the DefaultBufferMaxRows setting to decide how much memory to allocate to each buffer in the data flow engine. If this amount exceeds DefaultBufferSize (which itself cannot exceed MaxBufferSize, 100 MB), SSIS automatically reduces the number of rows per buffer to fit within that boundary.

    The data flow task has another property called MinBufferSize, which is 64 KB and cannot be changed. If the estimated memory for a buffer is below 64 KB, SSIS automatically increases the number of rows per buffer until it reaches the MinBufferSize boundary.
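    The sizing rules just described can be written as a small worked calculation. This is a simplified sketch: the function name is hypothetical, and the real engine may account for details not covered here. It only applies the three rules above: start from DefaultBufferMaxRows, shrink to fit DefaultBufferSize, and grow to reach the 64 KB minimum.

```python
import math

DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # DefaultBufferSize: 10 MB
MIN_BUFFER_SIZE = 64 * 1024             # MinBufferSize: 64 KB

def rows_per_buffer(estimated_row_bytes: int,
                    default_buffer_max_rows: int = 10_000,
                    default_buffer_size: int = DEFAULT_BUFFER_SIZE) -> int:
    """Approximate how many rows fit in one data flow buffer."""
    rows = default_buffer_max_rows
    if estimated_row_bytes * rows > default_buffer_size:
        rows = default_buffer_size // estimated_row_bytes        # shrink to fit the cap
    elif estimated_row_bytes * rows < MIN_BUFFER_SIZE:
        rows = math.ceil(MIN_BUFFER_SIZE / estimated_row_bytes)  # grow to the minimum
    return rows

print(rows_per_buffer(100))    # 10000: 1,000,000 bytes fits between the bounds
print(rows_per_buffer(1_200))  # 8738: 12,000,000 bytes would exceed 10 MB
print(rows_per_buffer(4))      # 16384: 40,000 bytes is below 64 KB
```

    This also shows why narrow data types matter: a smaller estimated row lets more rows share each buffer.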

    Buffer Tuning: The data flow task has a custom log event called BufferSizeTuning. When this event is logged, SSIS writes information to the SSIS log showing where it adjusted the buffer size. When tuning buffers, the goal is to fit as many rows into a buffer as possible. So set DefaultBufferMaxRows as large as possible without the total buffer size exceeding DefaultBufferSize (at most 100 MB).

    Parallelism: SSIS natively supports parallel execution of packages, tasks and transformations, so parallelism can greatly improve package performance when it is configured within the limits of system resources. A package has a property called MaxConcurrentExecutables, which sets the maximum number of threads that can execute in parallel per package. By default it is set to -1, which means the number of logical processors on the machine plus 2. All or some of the operations in a package can execute in parallel.

    The data flow task also has a property called EngineThreads, which defines how many threads the data flow engine can create and run in parallel. This property applies equally to the source threads that the engine creates for sources and the worker threads it creates for transformations and destinations. For example, setting EngineThreads to 10 means the data flow engine can create up to 10 source threads and 10 worker threads.
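    The MaxConcurrentExecutables rule can be sketched with a thread pool whose size follows that rule. This is only an analogy: SSIS schedules its own executables. The function names and the "partition" tasks are hypothetical, and rejecting values below 1 (other than -1) is an assumption of this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def effective_max_concurrency(max_concurrent_executables: int) -> int:
    """Translate the setting: -1 means logical processors + 2."""
    if max_concurrent_executables == -1:
        return (os.cpu_count() or 1) + 2
    if max_concurrent_executables < 1:
        raise ValueError("expected -1 or a positive number")
    return max_concurrent_executables

def run_executables(tasks: list, max_concurrent_executables: int = -1) -> list:
    """Run independent tasks with at most the configured number in flight at once."""
    workers = effective_max_concurrency(max_concurrent_executables)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]  # results in submission order

# Hypothetical executables: each one loads one partition of a source table.
tasks: list = [lambda p=p: f"partition {p} loaded" for p in range(4)]
print(run_executables(tasks))
```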

    Extraction Tuning

    a) Increase the connection manager's packet size property. Use separate connection managers: one with the larger packet size for bulk loading, and one with a smaller packet size for OLE DB Command transformations.

    b) Affinitize network connections: this is possible when a machine has multiple cores and multiple NICs.

    c) Tune queries:

    -- Select only needed columns

    -- Use a hint (such as WITH (NOLOCK)) so that no shared locks are taken during the select. The query can then read uncommitted data, so use this only if the query must have the best possible performance

    d) Lookups:

    -- Select only needed columns

    -- Use the shared lookup cache (available in 2008)

    e) Sorting

    Merge and Merge Join transformations require sorted inputs. If the source data is already sorted, no upstream Sort transformation is needed, which improves data flow performance. Configure the following properties on a source component whose data is already sorted:

    a) IsSorted: The outputs of a source component have a property called IsSorted. Set it to True.

    b) SortKeyPosition: Each output column of a source component has this property. It indicates whether the column is sorted, its sort order, and the sequence in which multiple columns are sorted. Set it for each sorted column.
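    Merge Join needs sorted inputs because a merge join reads both inputs in a single forward pass. The sketch below is a plain merge-join algorithm in Python, not SSIS code, and the sample rows are hypothetical. It assumes both inputs are already sorted on their first element, which is what IsSorted and SortKeyPosition declare for a source.

```python
def merge_join(left: list, right: list) -> list:
    """Inner-join two lists of tuples already sorted on their first element (the key)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of equal keys on the right, then pair it with each matching left row.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r[1:])
                i += 1
            j = j_end
    return out

left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
# [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
```

    If either input were unsorted, the forward-only pass would skip matches, which is why an upstream Sort (or ORDER BY in the source query) is required.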

    Transformation Tuning

    Partially Blocking (Asynchronous): Merge, Merge Join and Union All can often be replaced by logic in the source query.

    Use SSIS 2008:

    -- Improved data flow task scheduler

    -- Union All transforms are no longer needed to split up and parallelize execution trees

    Blocking Transformations (Asynchronous): Limit Aggregate, Sort, Pivot and Unpivot to one per data flow on the same data.

    Aggregate Transformation: This transformation has the Keys, KeysScale, CountDistinctKeys and CountDistinctScale properties. They improve performance by letting the transformation pre-allocate the memory it needs for the data it caches. If you know the exact number of groups a Group By operation will produce, set the Keys property. If you know an approximate number, set KeysScale. Likewise, if you know the exact or approximate number of distinct values a Distinct Count operation will produce, set CountDistinctKeys or CountDistinctScale, respectively.

    If a data flow needs several aggregations, consider creating them in one Aggregate transformation instead of using several transformations. When one aggregation is a subset of another, the transformation's internal storage is optimized to scan the incoming data only once, which improves performance. For example, an aggregation that uses a Group By clause and an AVG aggregation runs faster when both are combined in one transformation. However, multiple aggregations within one Aggregate transformation are performed serially. So performance might not improve when the aggregations must be computed independently.
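    The benefit of combining a Group By with an AVG can be shown with a single scan that keeps a running sum and count per group. AVG is derived from those two values, so the input is read only once. This is a plain Python illustration of the idea, not the Aggregate transformation's internals, and the sample rows are hypothetical.

```python
from collections import defaultdict

def aggregate_once(rows: list) -> dict:
    """Compute SUM, COUNT and AVG per group in a single scan of the input."""
    acc = defaultdict(lambda: [0.0, 0])  # group -> [running sum, row count]
    for group, value in rows:
        acc[group][0] += value
        acc[group][1] += 1
    # AVG needs no second pass: it falls out of the SUM and COUNT already collected.
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in acc.items()}

result = aggregate_once([("East", 10.0), ("West", 5.0), ("East", 20.0)])
print(result)
```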

    Merge Join Transformation

    MaxBuffersPerInput: This property sets the maximum number of buffers that can be active for each input at one time. You can use it to tune how much memory the buffers consume, and therefore the performance of the transformation. More buffers means more memory used and better performance. The default value is 5, which works well in most scenarios. You can tune performance by trying a slightly different number, such as 4 or 6. Avoid very small values if possible: performance drops significantly when MaxBuffersPerInput is set to 1 instead of 5. Also, don't set MaxBuffersPerInput to 0 or less. No throttling occurs in that range, so depending on the data load and available memory, the package may not complete.

    Slowly Changing Dimension: This wizard creates a set of data flow transformation components that work together with the Slowly Changing Dimension transformation component. Among them are OLE DB Command transformation components that update a single row at a time. You can improve performance by replacing these components with destination components that write all rows to be updated to a staging table. Then add an Execute SQL Task that updates all those rows at once with a single set-based T-SQL UPDATE statement.
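    The staging-table pattern just described is sketched below with Python's sqlite3 module standing in for SQL Server. The table names are hypothetical. The UPDATE uses a correlated subquery so that it runs in SQLite. In T-SQL you would more typically write an UPDATE ... FROM with a join to the staging table.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
        INSERT INTO dim_customer VALUES (1, 'Leeds'), (2, 'York'), (3, 'Hull');
        CREATE TABLE stage_customer_updates (customer_id INTEGER PRIMARY KEY, city TEXT);
    """)
    return conn

def apply_updates_set_based(conn: sqlite3.Connection, changes: list) -> None:
    # Step 1 (data flow destination): land all changed rows in the staging table.
    conn.executemany("INSERT INTO stage_customer_updates VALUES (?, ?)", changes)
    # Step 2 (Execute SQL Task): one set-based UPDATE instead of one UPDATE per row.
    conn.execute("""
        UPDATE dim_customer
        SET city = (SELECT s.city FROM stage_customer_updates s
                    WHERE s.customer_id = dim_customer.customer_id)
        WHERE customer_id IN (SELECT customer_id FROM stage_customer_updates)
    """)
    # Empty the staging table for the next run (SQLite has no TRUNCATE statement).
    conn.execute("DELETE FROM stage_customer_updates")
    conn.commit()

conn = setup()
apply_updates_set_based(conn, [(1, "Bradford"), (3, "Sheffield")])
result = conn.execute("SELECT customer_id, city FROM dim_customer ORDER BY customer_id").fetchall()
print(result)  # [(1, 'Bradford'), (2, 'York'), (3, 'Sheffield')]
```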

    Data Types

    1) Use the smallest possible data types in the data flow.

    2) Use the CAST or CONVERT functions in the source query if possible.

    Miscellaneous

    1) Sort in the query if possible.

    2) If possible, use the T-SQL MERGE statement instead of the SCD transformation.

    3) If possible, use the T-SQL INSERT INTO statement instead of the data flow task.

    4) A full data reload may perform better than a delta refresh.

    Load Tuning

    Use the SQL Server Destination:

    1) Only helps if the data flow and the destination database are on the same machine

    2) Weaker error handling than the OLE DB Destination

    3) Set Commit Size = 0

    Use the OLE DB Destination:

    1) Set Commit Size = 0

    Drop Indexes Based on the Expected % Load Growth

    1) Don't drop an index if it's the only clustered index. Data in a table is sorted by the clustered index, and by default a primary key is created as a clustered index. Loading will always be faster than dropping and recreating a primary key, and usually faster than dropping and recreating a clustered index.

    2) Drop a non-clustered index if the load will increase the table by 100%: this is the rule of thumb.

    3) Don't drop a non-clustered index if the load increases the table by less than 10%. Between these values there is no rule of thumb, so experiment to find the optimal value.
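    The growth-based rules above can be expressed as a small decision helper for one non-clustered index. Clustered indexes are left out because rule 1 says to keep them. The function name and the returned labels are hypothetical. An empty table is treated as unlimited growth.

```python
def nonclustered_index_action(existing_rows: int, rows_to_load: int) -> str:
    """Apply the 100% / 10% rules of thumb to one non-clustered index."""
    if existing_rows == 0:
        growth_pct = float("inf")  # loading into an empty table
    else:
        growth_pct = 100.0 * rows_to_load / existing_rows
    if growth_pct >= 100:
        return "drop before load, rebuild after"
    if growth_pct < 10:
        return "keep"
    return "measure both options"  # 10% to 100%: no rule of thumb, experiment

print(nonclustered_index_action(1_000_000, 1_500_000))  # drop before load, rebuild after
print(nonclustered_index_action(1_000_000, 50_000))     # keep
print(nonclustered_index_action(1_000_000, 300_000))    # measure both options
```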

    Use Partitions if Necessary

    1) Use SQL Server Profiler to trace the performance.

    2) See The Data Loading Performance Guide.

    3) Use the TRUNCATE statement instead of the T-SQL DELETE statement. DELETE logs every deleted row, whereas TRUNCATE only logs the page deallocations, so DELETE performs slower.

    4) Affinitize the network.

    Differences between SSIS 2005 and SSIS 2008

    The architecture of SSIS 2005 and SSIS 2008 is the same. SSIS 2008 adds some features that 2005 did not have, so 2008 can be described as an enhancement of the 2005 feature set.

    Lookup

    In 2005, the Lookup error output had only 3 options: Fail Component, Ignore Failure and Redirect Row. 2008 adds a No Match Output.

    2005 did not have cache modes. 2008 has 3 cache modes: Full Cache, Partial Cache and No Cache.

    2005 did not offer a choice of connection manager types. 2008 supports the OLE DB Connection Manager and the Cache Connection Manager.
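    The three cache modes and the No Match Output can be modelled in process. In the sketch below, Python's sqlite3 module stands in for the reference table, and all names are hypothetical. No Cache queries the reference table for every row. Partial Cache queries on a miss and remembers the answer (misses are remembered too). Full Cache loads the needed columns of the whole reference set up front. Unmatched rows go to a separate "no match" list.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_code TEXT PRIMARY KEY, product_key INTEGER);
    INSERT INTO dim_product VALUES ('BK', 1), ('HL', 2), ('GL', 3);
""")
LOOKUP_SQL = "SELECT product_key FROM dim_product WHERE product_code = ?"

def no_cache(code):
    """No Cache: query the reference table for every incoming row."""
    row = conn.execute(LOOKUP_SQL, (code,)).fetchone()
    return row[0] if row else None

@lru_cache(maxsize=1024)
def partial_cache(code):
    """Partial Cache: query on a miss, then remember the answer (bounded size)."""
    return no_cache(code)

# Full Cache: load the whole reference set (only the needed columns) before rows arrive.
FULL_CACHE = dict(conn.execute("SELECT product_code, product_key FROM dim_product"))

def full_cache(code):
    return FULL_CACHE.get(code)

def lookup_rows(codes, lookup):
    """Split incoming rows into matched rows and a 'no match' output."""
    matched, no_match = [], []
    for code in codes:
        key = lookup(code)
        if key is None:
            no_match.append(code)
        else:
            matched.append((code, key))
    return matched, no_match

incoming = ["BK", "XX", "HL", "BK"]
print(lookup_rows(incoming, full_cache))  # ([('BK', 1), ('HL', 2), ('BK', 1)], ['XX'])
```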

    Cache Transformation

    2005 did not have this transformation; it was introduced in 2008. It is a data flow transformation that writes data from a connected data source in the data flow to a Cache Connection Manager. A Lookup transformation in the package then performs lookups on that data.

    In a single package, only one Cache transformation can write data to a given Cache Connection Manager. If the package contains multiple Cache transforms, the first one called when the package runs writes the data to the connection manager, and the writes of the later Cache transforms fail. Configure the Cache transformation as follows:

    1) Specify the connection manager.

    2) Map the input columns in the Cache transform to destination columns in the Cache Connection Manager.

    Data Profiling Task

    2005 did not have this task; it was introduced in 2008 as a control flow task. It lets you analyze data in a SQL Server database and generate XML reports from the results, which can be saved to a file or an SSIS variable. By configuring one or more of the task's profile types, you can generate a report with details such as a column's minimum and maximum values, or the number and percentage of null values.
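    The statistics mentioned above (minimum, maximum, and the number and percentage of nulls) are easy to compute for one column. The sketch below only illustrates what such a profile contains. It does not reproduce the task's XML output format, and the function name is hypothetical.

```python
from typing import Any, Optional

def column_profile(values: list) -> dict:
    """Min, max, null count and null percentage for one column (None = NULL)."""
    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": nulls,
        "null_pct": 100.0 * nulls / len(values) if values else 0.0,
    }

print(column_profile([12, None, 7, 30, None]))
# {'min': 7, 'max': 30, 'null_count': 2, 'null_pct': 40.0}
```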

    Script Task and Transformation

    2008 lets you write scripts in either VB or C#, whereas 2005 only allowed scripts in VB.