SQL Introduction
SQL
is a standard computer language for accessing and manipulating databases. SQL
language mainly consists of 3 types of paradigms.
·
DDL (Data Definition Language)
·
DML (Data Manipulation Language)
·
DCL (Data Control Language)
SQL
is the American National Standards Institute (ANSI) language for talking to
Databases (though there are many additions to ANSI SQL). Some people pronounce
it ess-que-ell, but I prefer to say it like sequel, since I can say it faster.
Simply
put SQL is a way of asking questions, or giving orders, to a database. A
database is just a collection of information. I'm a fan of examples, so here's
a (not-so) hypothetical situation:
You keep a pad of paper next to the telephone for
writing down people who call, and new phone numbers. You want to know how many
times the school has called you this month. So you flip through the stack of
paper and pull out all the pages with the school and this month on it. Your
count them.
Your
stack of paper is a database. You are searching and counting is a query. Your
database works for you. But if you suddenly hire a call center, and need to
find out how many times your office called a prospect, you'll probably want to
move to a SQL database.
The
biggest advantage of SQL is that you don't have to go through the stack of
papers to find information you just need to know how to ask the question. And
the question is always asked the same way:
select what you want to
know
from what stack of paper
where something is on the page
from what stack of paper
where something is on the page
Not
to complicated. The above example would read like this
select count(when)
from phonecalls
where caller like '%school%' and date > now() - interval 1 months
from phonecalls
where caller like '%school%' and date > now() - interval 1 months
What is SQL?
·
SQL stands for Structured Query
Language
·
SQL allows you to access a database
·
SQL is ANSI standard computer
language
- SQL can execute queries against a database
- SQL can retrieve data from a database L can insert new records in a database
- SQL can delete records from a database
- SQL can update records in a database
- SQL is easy to learn
SQL DDL
The Data Definition Language (DDL) part of SQL permits database
tables to be created or deleted. We can also define indexes (keys), specify
links between tables, and impose constraints between database tables.
The most important DDL statements in SQL are:
- CREATE TABLE - creates a new database table
- ALTER TABLE - alters (changes) a database table
- DROP TABLE - deletes a database table
- CREATE INDEX - creates an index (search key)
- DROP INDEX - deletes an index
SQL DML
SQL (Structured Query Language) is syntax for executing queries.
But the SQL language also includes syntax to update, insert, and delete
records.
These query and update commands together form the Data
Manipulation Language (DML) part of SQL:
- SELECT - extracts data from a database table
- UPDATE - updates data in a database table
- DELETE - deletes data from a database table
- INSERT INTO - inserts new data into a database table
Common
terms used in SQL commands
·
Primary Key (PK) -
uniquely identifies each tuple in the relation with no redundant attributes
·
Foreign Key (FK) -
an attribute that is a PK in another relation
·
Composite Key -
a PK consisting of multiple attributes
·
Candidate Key (CK) -
a set of attributes that can be used as a PK
·
Alternate Key (AK) - a candidate key not designed as a
PK
·
Super Key -
a set of attributes that includes a CK and may include redundant key attributes
·
Prime Attribute -
any attribute that belongs to a CK
·
Secondary Key -
an attribute that is used as an index
·
Search Keys -
An attribute used in a query
SQL
reference
create
table syntax
create table table_name (column_name1
data_type1, column_name2 data_type2, …, [
CONSTRAINT constraint_name CONSTRAINT_TYPE constraint_condition,
…]);
update statement
syntax
update table_expression_clause set_clause
where_clause
select statement syntax
SELECT * | column_names_list FROM table_name
WHERE_clause group_by_clause | order_by-clause;
delete
statement syntax
delete FROM table_expression_clause
WHERE_clause;
insert
statement syntax
insert into table_expression_clause (column1,
column2, …) values_clause
Normalization
Normalization is the process of non-loss
decomposition of the database into tables so as to minimize the data redundancy
and inconsistencies in the database.
We have to consider various constraints on
the database so as to convert it into standard form viz. Integrity constraints
and dependencies. Normalization is achieved through the series of steps. These
steps are called normal forms.
There are various normal forms like 1NF, 2NF
and 3NF. Generally database normalized at 3NF serves the purpose, but for
complex database representations, more advance level of normalizations, viz.
4NF, BCNF, DKNF etc is done.
The database in one normal form is by default
in previous normal forms. E.g. if it is in 3NF, it must be in 1NF and 2NF.
First Normal Form: A relation is in first normal form if it does not contain any repeating
values or groups.
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Second Normal Form: A relation is in second normal form if it is in 1NF and all non-key attributes are fully functional
dependent on the primary key.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form: A relation is in third normal form if it is in 2NF and every attribute is independent of all other
non-key attributes.
- Remove columns that are not dependent upon the primary key.
BCNF (Boyce-Codd
Normal Form): A
relation is in BCNF is if and only if every determinant is a candidate key. (A
candidate is any attribute on which some other attribute is (fully)
functionally dependent.
Fourth normal form (4NF):
- A relation is in 4NF if it has no multi-valued dependencies.
Finally
we have Fifth Normal Form (5 NF): In some cases there may be no
losses join decomposition into two relation schemas but there may be a losses
join decomposition into more than two relation schemas. These cases are handled
by the join dependency and fifth normal form, and it’s important
to note that these cases occur very rarely and are difficult to detect in
practice.
We can also always define stricter forms that take into account
additional types of dependencies and constraints. The idea behind domain-key
normal form is to specify, (theoretically, at least) the "ultimate
normal form" that takes into account all possible dependencies and
constraints. A relation is said to be in DKNF if all constraints and
dependencies that should hold on the relation can be enforced simply by
enforcing the domain constraints and the key constraints specified on the
relation.
For a relation in DKNF, it becomes very straightforward to enforce
the constraints by simply checking that each attribute value in a tuple is of
the appropriate domain and that every key constraint on the relation is
enforced. However, it seems unlikely that complex constraints can be included
in a DKNF relation; hence, its practical utility is limited.
Consider a following table structure:
Stud_no
Stud_name
Stud_addr
teacher_code
Sub_code
Sub_name
Sub_teacher
In the above table definition, the attribute
groups Sub_code, Sub_name represents repeatable group as one student can take
many subjects, so that information is repeated for the same student.
We can convert above schema into 1NF by separating repeating columns
(multi-valued attributes) into separate subject table.
Student Student_subject
Stud_no Stud_no (CPK)
Stud_name
Sub_code (CPK)
Stud_addr Sub_name
teacher_code
Sub_teacher
In above schema the attribute ‘Sub_name’ is
not fully functionally dependant on composite key (Stud_no+Sub_code) of
Student_subject table.
So we can achieve 2NF by separating the attributes not fully functionally dependant
on primary key into separate tables e.g.
Student Student_subject Subject
Stud_no
(PK) Stud_no (FK) Sub_code (PK)
Stud_name
Sub_code (FK) Sub_name
Stud_addr teacher_code
Sub_teacher
In the above schema, the attribute
Sub_teacher, which represents teacher for that subject, can be derived from
that table itself looking at the two columns Sub_code and teacher_code.
So we can achieve 3NF by removing such attribute, which is independent of non-key
attributes. For ex:
Student Student_subject Subject
Stud_no Stud_no Sub_code
Stud_name
Sub_code Sub_name
Stud_addr teacher_code
In the above schema, the attribute
teacher_code represents the teacher for particular subject. This attribute is
not the candidate key for the primary key of the table. We need to have one
separate table called sub_teacher.
The above normalization is called BCNF (Boyce/Codd Normal form) e.g.
Student Student_subject Subject Sub_teacher
Stud_no Stud_no Sub_code Sub_code
Stud_name
Sub_code Sub_name teacher_code
Stud_addr
BCNF is improved form of 3NF.
Consider the schema where in students
understanding in each subject is mentioned as one of the best, average, poor.
The table looks somewhat like this:
Student_subject
Stud_no
Sub_code
Sub_interest
In addition to above normalization levels, we
also have other levels viz. 4NF, 5NF etc. which are complex are not applied in
normal situations.
DKNF
(Domain Key Normal Form)
Exercise:
Following is the un-normalized database;
normalize it up to BCNF through 1NF, 2NF, 3NF.
Following is structure of order table:
Order
Order_No.
Order_date
Customer_no
Item_no
Item_name
Qty_ordered
Rate_per_unit
Item_value
(Item_value can be calculated using Qty_ordered * Rate_per_unit)
12 CODD Rules
Rule 0: For any system to be called as RDBMS, it must be able to manage
database entirely through its relational capabilities.
Rule 1: All information in a RDBMS is represented explicitly in exactly
one way, by value in the table.
According to this rule anything
that does not exist in a table in the database does not exist at all.
Rule 2: Each and every datum is logically accessible through a combination
of table name, column name, and primary key value.
Rule 3: Inapplicable or missing information can be represented through
null values.
Rule 4: The description of the database is held in the same way as
ordinary data that is in tables and columns and is accessible to authorized
users.
Rule 5: There must be at
least one language which is comprehensive in supporting data definition, view
definition, data manipulation, integrity constraints, authorization, and
transaction control.
Rule 6:
All
views that are theoretically updatable are updatable by the system.
Rule 7:
The
capability of handling a base or a derived table as a single operand applies
not only to the retrieval of data but also to the insertion, update and deletion
of data.
Rule 8: Application programs
and terminal activity remain logically unimpaired whenever any changes are made
in the storage representation or access method.
Rule 9: Application programs
and terminal activity remain logically unimpaired when information preserving
changes of any kind that theoretically permit unimpairment are made to the base
tables.
Rule
10: All integrity constraints must be
definable in the data sub language and storable in the catalogue, not in the
application program.
Rule 11: The system must have a data sub-language
which can support distributed databases without impairing application programs
or terminal activities.
Rule 12: If the system has a low-level language this language cannot be
used to bypass the integrity rules and constraints expressed in the higher
level relational language.
Very nice Bhiya...
ReplyDelete