Pages

Friday, 18 October 2013

Interview Ques

Question : What is the difference between where and having clause? 
 Answer: In SQL Where filters data on lowest row level. Having filters data after group by has been performed so it filters on "groups"
Question : How can you combine two tables/views together? For instance one table contains 100 rows and the other one contains 200 rows, have exactly the same fields and you want to show a query with all data (300 rows). This sql interview question can get complicated.
Answer: You use UNION operator. You can drill down this question and ask what is the difference between UNION and UNION ALL. More tricky question are how to sort the view (you use order by at the last query), how to name fields so they appear in query results/view schema (first query field names are used). How to filter groups when you use union using SQL (you would create separate query or use common table expression (CTE) or use unions in from with () or what happens when you have a combination of UNION and UNION ALL (very rare questions!)

Question : What type of wildcards have you used? This is usually one of mandatory sql interview question.
Answer: First question is what is a wildcard? Wildcards are special characters that allow matching string without having exact match. In simple word they work like contains or begins with. Wildcard characters are software specific and in SQL Server we have % which represent any groups of characters, _ that represent one character (any) and you also get [] where we can [ab] which means characters with letter a or b in a specific place.
Question : How do you find orphans?
Answer: This is more comprehensive SQL and database interview question. First of all we test if the candidate knows what an orphan is. An Orphan is a foreign key value in "child table" which doesn’t exist in primary key column in parent table. To get it you can use left outer join (important: child table on left side) with join condition on primary/foreign key columns and with where clause where primary key is null. Adding distinct or count to select is common precise. In SQL Server you can also you except which will show all unique values from first query that don't exist in second query.
















Question : How would you solve the following sql queries using today's date?


select getdate() as 'currdate'

select dateadd(mm,datediff(mm,0,getdate()),0) as 'First day of current month'
select dateadd(mm,datediff(mm,0,getdate())-1,0) as 'First day of previous month'
select dateadd(mm,datediff(mm,0,getdate()),0)-1 as 'last day of previous month'
select dateadd(mm,datediff(mm,0,getdate())+1,0)-1 as 'last day of current month'



Select Dateadd(d,1-DATEPART(d,getdate()),GETDATE()) as [First Day of the Month]

Select Dateadd(d,-DATEPART(d,DateAdd(m,1,getdate())),DateAdd(m,1,getdate())) as [Last Day of Month]

SELECT DATEADD(m,-1, Dateadd(d,1-DATEPART(d,getdate()),GETDATE())) AS [First Day of Previous Month]

Select Dateadd(d,-DATEPART(d,getdate()),GETDATE()) as [Last Day of Previous Month]


What are the difference between clustered and a non-clustered index?

  1. A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
  2. A non clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.



What are the different index configurations a table can have?

A table can have one of the following index configurations:
  1. No indexes
  2. A clustered index
  3. A clustered index and many nonclustered indexes
  4. A nonclustered index
  5. Many nonclustered indexes


What is OLTP (Online Transaction Processing)?

In OLTP - online transaction processing systems relational database design use the discipline of data modeling and generally follow the Codd rules of data normalization in order to ensure absolute data integrity. Using these rules complex information is broken down into its most simple structures (a table) where all of the individual atomic level elements relate to each other and satisfy the normalization rules.






What's the difference between a primary key and a unique key?

Both primary key and unique key enforces uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn't allow NULLs, but unique key allows one NULL only.



What is difference between DELETE and TRUNCATE commands?

Delete command removes the rows from a table based on the condition that we provide with a WHERE clause. Truncate will actually remove all the rows from a table and there will be no data in the table after we run the truncate command.

  1. TRUNCATE:
    1. TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
    2. TRUNCATE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.
    3. TRUNCATE removes all rows from a table, but the table structure, its columns, constraints, indexes and so on, remains. The counter used by an identity for new rows is reset to the seed for the column.
    4. You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.
    5. TRUNCATE cannot be rolled back.
    6. TRUNCATE is DDL Command.
    7. TRUNCATE Resets identity of the table
  2. DELETE:
    1. DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
    2. If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.
    3. DELETE Can be used with or without a WHERE clause
    4. DELETE Activates Triggers.
    5. DELETE can be rolled back.
    6. DELETE is DML Command.
    7. DELETE does not reset identity of the table.
Note: DELETE and TRUNCATE both can be rolled back when surrounded by TRANSACTION if the current session is not closed. If TRUNCATE is written in Query Editor surrounded by TRANSACTION and if session is closed, it can not be rolled back but DELETE can be rolled back.


When is the use of UPDATE_STATISTICS command?

This command is basically used when a large processing of data has occurred. If a large amount of deletions any modification or Bulk Copy into the tables has occurred, it has to update the indexes to take these changes into account. UPDATE_STATISTICS updates the indexes on these tables accordingly.

What are the properties and different Types of Sub-Queries?

  1. Properties of Sub-Query
    1. A sub-query must be enclosed in the parenthesis.
    2. A sub-query must be put in the right hand of the comparison operator, and
    3. A sub-query cannot contain an ORDER-BY clause.
    4. A query can contain more than one sub-query.
  2. Types of Sub-Query
    1. Single-row sub-query, where the sub-query returns only one row.
    2. Multiple-row sub-query, where the sub-query returns multiple rows,. and
    3. Multiple column sub-query, where the sub-query returns multiple columns





























Question : You have a table that records website traffic. The table contains website name (multiple websites), page name, IP address and UTC date time. What would be the query to show all websites visited in the last 30 days with total number or visits, total number if unique page view and total number of unique visitors (using IP Address)?
 
Answer: This test is mainly about good understanding of aggregate functions and date time. In this we need to group by Website, Filter data using datediff but the trick in here is to use correct time zone. If I want to do that using UTC time than I could use GetUTCDate() in sql server and the final answer related to calculated fields using aggregate functions that I will list on separate lines below:

TotalNumberOfClicks = Count(*) 'nothing special here

TotalUniqueVisitors = Count(distinct Ipaddress) ' we count ipaddress fields but only unique ip addresses. The next field should be in here but as it is more complicated I put it as third field.

TotalNumberOfUniquePageViews = Count(distinct PageName+IPAddress) 'This one is tricky to get unique pageview we need to count all visits but per page but only for unique IP address. So I combined pagename with ipaddress to counted unique values. Just to explain one page could receive 3 vists from 2 unique visits and another page could receive one visit from ip that visited previous page so Unique IP is 2, PageView is 3 (1 visitor 2 pages and 1 visitor 1 page) and visits is 4













Question : How to display top 5 employees with the higest number of sales (total) and display position as a field. Note that if both of employees have the same total sales values they should receive the same position, in other words Top 5 employees might return more than 5 employees. 
 
Answer: Microsoft introduced in SQL Server 2005 ranking function and it is ideal to solve this query. RANK() function can be used to do that, DENSE_Rank() can also be used. Actually the question is ambiguous because if your two top employees have the same total sales which position should the third employee get 2 (Dense_Rank() function) or 3 (Rank() Function)? In order to filter the query Common Table Expression (CTE) can be used or query can be put inside FROM using brackets ().


Question : How to get accurate age of an employee using SQL? 
 Answer: The word accurate is crucial here. The short answer is you have to play with several functions. For more comprehensive answer see the following link SQL Age Function. Calculate accurate age using SQL Server


Question : This is SQL Server interview question. You have three fields ID, Date and Total. Your table contains multiple rows for the same day which is valid data however for reporting purpose you need to show only one row per day. The row with the highest ID per day should be returned the rest should be hidden from users (not returned).  [all questions]
To better picture the question below is sample data and sample output:
ID, Date, Total
1, 2011-12-22, 50
2, 2011-12-22, 150

The correct result is:

2, 2011-12-22, 150
The correct output is single row for 2011-12-22 date and this row was chosen because it has the highest ID (2>1)

Answer: Usually Group By and aggregate function are used (MAX/MIN) but in this case that will not work (unless you use joins as Anton mentioned in his comment, although from readability and potentially performance point of view I would prefer not to do that.)

Removing duplications with this kind of rules is not so easy however SQL Server provides ranking functions and the candidate can use dense_rank function partition by Date and order by id (desc) and then use cte/from query and filter it using rank = 1. There are several other ways to solve that but I found this way to be most efficient and simple.



No comments:

Post a Comment