Monday, April 6, 2009

SQL

SQL (structured Query Language) is a computer language used to retrieve and manage data in relational DBMS. It can also be used to create and manage the different schema used in the database. SQL was first standardized by ANSI, and is now an ISO standard, Most DBMS use the standardized SQL, but also expand upon it and add some proprietary aspects to it. SQL is a declarative language, meaning that the end user will only explain the logic of the operation, and will not explain the algorithms with which the computer will complete the operation, however as stated above some vendors expand upon standardized SQL, and some included procedural elements to SQL, since this aspect is proprietary to the DBMS, we will not go into detail. Most operations that users will do in SQL are queries, which are used to retrieve data.
The following shows a typical SQL statement, and it is structured.

The capitalized words are keywords, reserved by SQL (meaning that users cannot use theses words for anything else than the actual SQL opeartion), the other words are names of fields (columns). As we can see, an SQL statement starts with the operation to do, followed by the field on which to do the operation, and then, the conditions. A single operation can affect multiple fields, in multiple tables. Wild card operator are also permited in sql, such as * for anything, or ? for any one character.

A few examples:
-Retrieve all the fields of the users with the name John.
SELECT * FROM users WHERE user_name = 'John'

- Retrieve all the unique entries in the names of the users
SELECT DISTINCT user_name FROM users

-Multiple conditions are also allowed, such as all the users named John or Jack
SELECT * FROM users WHERE user_name = 'John' OR user_name = 'Jack'

sources/further reading
http://www.w3schools.com/sql/default.asp
http://en.wikipedia.org/wiki/Sql

Metadata

Metadata is data on data, or data about data. Databases are made up of not only the actual data, but also
Information on the data, such as what kind of data is stored, what kind of attributes they have, or how they relate to each other. In DBMS, this information is stored in different schemas.
In relational DBMS the most important schema is the conceptual schema which represents the relations between each entity. For example, in a database storing employee information for a company, we might have relations like:
-each employee has one manager
-each manager has one or more employee
-each manager has one department
-each department has one or more employee
And so on
Even though the schema is in the strictest form in text form, the term is now often used for the graphical representation.

sources/further reading:
http://en.wikipedia.org/wiki/Database_schema
http://en.wikipedia.org/wiki/Metadata

Architecture of a DBMS

There are 2 levels of architecture in a DBMS, the logical level, and the physical level. The logical level deals with how the user interacts and perceives the DBMS, and does not concert itself with the actual (physical) processing and storing of data, processing the queries, and managing users, permissions etc..



The logical architecture
The logical architecture provides a level of abstraction between the end user and the physical storage of the data. This allows the end user to manipulate the data without being concerned with how and where the data is stored. The logical architecture is mostly based on the ANSI/SPARC generalized DBMS architecture. This standardized model divides the DBMS in 3, the internal, the conceptual, and the external level. The internal level (or physical) is how the files are stored on the secondary storage. The conceptual level brings the whole of the database into a single entity, showing the entirety of the data in the database. This level also provides data definition (metadata). The last level, the external level is the level with which the end user interacts, which allows the user to see the data relevant to them.



The physical DBMS architecture

The physical DBMS architecture is the software components used to enter and process the data. Unlike the logical architecture, which uses the ansi/sparc architecture, the physical architecture does not have a standardized architecture. There are, however, a number of functions that are standard, and that appear in all DBMS. The physical DBMS architecture is usually separated into two, the front end, with which the user actually interacts, and the back end, which actually manages the database itself. The front end, is, in a separate application that gives the user an interface and that interacts with the back end throught it’s API (application programming interface).
Since the front end is completely separated from the back end, and interacts with it through it’s API, it is possible to change the front end without changing the underlying DBMS. Because of this the front end can come from the DBMS vendor, or a third party, and separate systems can be used to do various specialized tasks, such as backups, restores, etc…
The back end can be seen as more of the heart of the dbms. We can further subdibide this in 3 main parts. First, the API, which as mentioned above allows the front end to interact with the back end. Then, there is the Qeury language processor, which receives and interprets the queries sent by the user, so that the third part, the DBMS engine can respond to it. The DBMS engine can be seen as the core of the DBMS. It is the part which is responsible for all the data management in for the DBMS.

Some DBMS systems include, some commercial application such as Microsoft SQL Server, or Oracle, but also include free alternatives, such as MySQL, SQLite, or ProstgreSQL

Sources/further reading
http://www.dbmaker.com.tw/reference/manuals/tutorial/tutorial_03.html
http://yaddarabullah.blogspot.com/2008/07/rdbms-architecture-of-database.html

Introducing a DBMS to an organization

Introducing a DBMS to an organization introduces many advantages. First of all, it centralizes the data, and links it together, it might, for example link an organization’s inventory, with its sales and it’s accounting department. This not only enables all data to be linked, and more complete information to be available, it also reduces redundancies from all departments, and minimizes errors in the data, in changes and in new data entry. This, depending on the size of the organization, and the data type, it could save a lot of storage space.

A centralized DBMS also enforces data consistency. Since all columns have a declared data type (int, varchar, text, date) it force the data entry to be valid. It removes unexpected errors from unexpected data returned, such as text instead of a date.

Furthermore, it allows for a standardized way to retrieve data, and to query data.
One last note, which is not a advantage per say, however, the expensive price of a DBMS, which used to be a barrier to get one for small/medium organizations has disappeared with free DBMS.

History Of DBMS

A database management software (DBMS) is software program that organizes, stores, manages and retrieves data in a database. Since DBMS’ primary function is to interact with data, the history and evolution DBMS is closely related to the evolution of storage mediums available.

One of the first steps towards computing, and more specifically data storage was with the Jacquard loom, invented in 1801 by Joseph Marie Jacquard, a loom that used punch cards in order to store patterns. While the loom did not do any computations, it was the first to used punch cards a way to control a series of operations. This is considered an important step in computing, and inspired the first general-purpose computers, Charles Babbage’s Analytical engine. From then on, the punch card was used as the first storage medium for computers, and would influence the first DBMS.


Navigational/Hierarchical

Because of the way that data was stored, first with punch cards, paper tape (like a punch card, but a long strip of paper with holes like punch cards), and magnetic tapes, all data was stored and read sequentially, there was no random access. This physical limitation of the storage medium is reflected in the first type of DBMS, the Navigational DBMS, first created in the 60s. In this type of database, the looking for data is done by navigating through a series of pointers until the needed data was found. In this type of database, queries that are taken for granted today, such as returning all the entries with a specific characteristic would require the DBMS read all the entries one by one and return the data which had the characteristic, there were no searches, as that operation would have been too costly for the support.

Relational

With the advent of hard drives as a suitable storage device, the limitations of sequential reading were removed and DBMS were able to evolve into relational DBMS, which are still being used today. Relational DBMS were designed by Edgar Codd, who at the time was working at IBM, in the offices primarily involved in the development of hard drives. Relational DBMS are designed on a branch of mathematics called tuple calculus. In relational DBMS, data is stored in separate tables, storing different types of related data. In each table, every entry must be unique, with one value, called a primary key indentifies a table’s entry. Tables are linked together by inserting one table’s primary key into an other table. This key becomes in second table (the one where the key is inserted) a secondary key, or foreign key. This sort of database allowed for more efficient storage, eliminating some redundencies, and allowd for more efficient searches, for either single elements or sets of data.

The final type of DBMS is still a relational database; however it includes adds a standardized query language or SQL. This standardized the end user’s interaction with the different databases. Before this each different vendor had their own way of querying their databases.

Sources/further reading:
http://en.wikipedia.org/wiki/Database_management_system
http://www.mountainman.com.au/software/history/intro.html
http://www.comphist.org/computing_history/new_page_9.htm