SQL Overview and Tutorial

Overview
Connecting to and Disconnecting from the Server
Entering Queries
Creating and Using a Database
Selecting Particular Rows
Date Calculations
Using More Than one Table
Getting Information About Databases and Tables
Examples of Common Queries

Overview

Relational database storage is made available in Analytic Information Server via a collaboration with the MySQL open source project and is available in both imbedded, local server, and remote server versions. A great deal of reference documentation and important information is available from the MySQL Project web site. In this AIS reference documentation we provide only the basic reference materials for using SQL.

Access to all SQL services in Analytic Information Server are provided by the builtin sql function. This chapter provides a tutorial introduction to AIS SQL by showing how to use the sql function to create and use a simple database. The sql function allows you to connect to a MySQL server, run queries, and view the results. This chapter assumes that MySql is installed on your machine and that a MySQL server is available to which you can connect. If this is not true, contact your MySQL administrator. (Normally, each copy of Analytic Information Server has at least an imbedded copy of MySQL available for use.) This chapter describes the entire process of setting up and using a database. If you are interested only in accessing an existing database, you may want to skip over the sections that describe how to create the database and the tables it contains. Because this chapter is tutorial in nature, many details are necessarily omitted. Consult the relevant sections of the manual for more information on the topics covered here.

Connecting to and Disconnecting from the Server

To connect to the AIS SQL server, you will usually need to provide an sql user name when you invoke sql and, most likely, a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host name. Contact your administrator to find out what connection parameters you should use to connect (that is, what host, user name, and password to use). Once you know the proper parameters, you should be able to connect like this:

(setq sqlHandle (sql connect:)) ;; Connects to the embedded SQL server.
(setq sqlHandle (sql connect: "" "database")) ;; Connects to the embedded SQL server and selects a database.
(setq sqlHandle (sql connect: "localhost" "" "username" "password")) ;; Connects to the SQL server in localhost.
(setq sqlHandle (sql connect: "localhost" "database" "username" "password")) ;; Connects to the SQL server in localhost and selects a database.
(setq sqlHandle (sql connect: "host" "" "username" "password")) ;; Connects to the SQL server in "host".
(setq sqlHandle (sql connect: "host" "database" "username" "password")) ;; Connects to the SQL server in "host" and selects a database.

Host and user represent the host name where your MySQL server is running and the user name of your MySQL account. Substitute appropriate values for your setup. If that works, the sql function should return an sql handle for use in sending additional SQL commands to this connection. More than one connection may be established between AIS and the MySQL server and held open simultaneously. If, when you attempt to log in, you get an error message such as !sql.connect: Can't connect to local MySQL server!, it means that that MySQL server daemon (Unix) or service (Windows) is not running. However, if the sql function returns an SQL connection handle, then you are ready to send SQL commands to the now established connection. After you have connected successfully, you can disconnect any time like this:

(sql disconnect: sqlHandle) ;; Disconnects from the previously established SQL server connection.

Most examples in the following sections assume that you are connected to the server.

Entering Queries

Make sure that you are connected to the server, as discussed in the previous section. Doing so does not in itself select any database to work with, but that's okay. At this point, it's more important to find out a little about how to issue queries than to jump right in creating tables, loading data into them, and retrieving data from them. This section describes the basic principles of entering commands, using several queries you can try out to familiarize yourself with how SQL works. Here's a simple command that asks the server to tell you its version number and the current date.

(sql sqlHandle {select version(), current_date})

\|VERSION()\|	CURRENT_DATE
"5.1.2-alpha-log"	#Nov,10,2005

This query illustrates several things about SQL: A command normally consists of an SQL statement in a String. (The String may be embedded in double quotes "this is a string" or may be embedded in braces {This "is" a string}) When you issue a command, sql sends it to the server for execution and returns the results in the form of a Record data type. SQL returns query result in tabular form (rows and columns) in the form of a Record data type. For illustration purposes herein, the first row contains labels for the columns. The rows following are the query results. Normally, column labels are the names of the columns you fetch from database tables. If you're retrieving the value of an expression rather than a table column (as in the example just shown), sql labels the column using the expression itself.

Keywords may be entered in any lettercase. The following queries are equivalent:

(sql sqlHandle {SELECT VERSION(), CURRENT_DATE})
(sql sqlHandle {select sin(pi()/4), (4+1)*5})

\|SIN(PI()/4)\|	\|(4+1)*5\|
0.70710678118655	25

Creating and Using a Database

Once you know how to enter commands, you are ready to access a database. Suppose that you have several pets in your home (your menagerie) and you would like to keep track of various types of information about them. You can do so by creating tables to hold your data and loading them with the desired information. Then you can answer different sorts of questions about your animals by retrieving data from the tables. This section shows you how to:

Create a database
Create a table
Load data into the table
Retrieve data from the table in various ways
Use multiple tables

The menagerie database is simple (deliberately), but it is not difficult to think of real-world situations in which a similar type of database might be used. For example, a database like this could be used by a farmer to keep track of livestock, or by a veterinarian to keep track of patient records. A menagerie distribution containing some of the queries and sample data used in the following sections can be obtained from the MySQL Web site. It is available in both compressed tar file and Zip formats at the MySQL Project web site.

Use the SHOW statement to find out what databases currently exist on the server:

(sql sqlHandle {use test})

database changed

You can use the test database (if you have access to it) for the examples that follow, but anything you create in that database can be removed by anyone else with access to it. For this reason, you should probably ask your MySQL administrator for permission to use a database of your own. Suppose that you want to call yours menagerie. The administrator needs to execute a command like this:

(sql sqlHandle {create database menagerie})

Under Unix, database names are case sensitive (unlike SQL keywords), so you must always refer to your database as menagerie, not as Menagerie, MENAGERIE, or some other variant. This is also true for table names. (Under Windows, this restriction does not apply, although you must refer to databases and tables using the same lettercase throughout a given query. However, for a variety of reasons, our recommended best practice is always to use the same lettercase that was used when the database was created.) Note: If you get an error such as ERROR 1044 (42000): Access denied for user 'monty'@'localhost' to database 'menagerie' when attempting to create a database, this means that your user account does not have the necessary privileges to do so. Discuss this with the administrator.

Creating a database does not select it for use; you must do that explicitly. To make menagerie the current database, use this command:

(sql sqlHandle {show tables})

Empty set

The harder part is deciding what the structure of your database should be: what tables you need and what columns should be in each of them. You want a table that contains a record for each of your pets. This can be called the pet table, and it should contain, as a bare minimum, each animal's name. Because the name by itself is not very interesting, the table should contain other information. For example, if more than one person in your family keeps pets, you might want to list each animal's owner. You might also want to record some basic descriptive information such as species and sex. How about age? That might be of interest, but it's not a good thing to store in a database. Age changes as time passes, which means you'd have to update your records often. Instead, it's better to store a fixed value such as date of birth. Then, whenever you need age, you can calculate it as the difference between the current date and the birth date. SQL provides functions for doing date arithmetic, so this is not difficult. Storing birth date rather than age has other advantages, too:

You can use the database for tasks such as generating reminders for upcoming pet birthdays. (If you think this type of query is somewhat silly, note that it is the same question you might ask in the context of a business database to identify clients to whom you need to send out birthday greetings in the current week or month, for that computer-assisted personal touch.)
You can calculate age in relation to dates other than the current date. For example, if you store death date in the database, you can easily calculate how old a pet was when it died.

You can probably think of other types of information that would be useful in the pet table, but the ones identified so far are sufficient: name, owner, species, sex, birth, and death. Use a CREATE TABLE statement to specify the layout of your table:

(sql sqlHandle {show tables})

TABLES IN MENAGERIE
pet

To verify that your table was created the way you expected, use a DESCRIBE statement:

(sql sqlHandle {load data local infile '/path/pet.txt' into table pet ignore 1 lines})

You can specify the column value separator and end of line marker explicitly in the LOAD DATA statement if you wish, but the defaults are tab and linefeed. (On an Apple machine running OS X, you would likely want to use LINES TERMINATED BY '\r'.) These are sufficient for the statement to read the file pet.txt properly. If the statement fails, it is likely that your SQL installation does not have local file capability enabled by default. Note that if you created the file on Windows with an editor that uses \r\n as a line terminator, you should use: When you want to add new records one at a time, the INSERT statement is useful. In its simplest form, you supply values for each column, in the order in which the columns were listed in the CREATE TABLE statement. Suppose that Diane gets a new hamster named "Puffball." You could add a new record using an INSERT statement like this:

(sql sqlHandle {insert into pet values ('Puffball','Diane','hamster','f','1999-03-30',NULL)})

Note that string and date values are specified as quoted strings here. Also, with INSERT, you can insert NULL directly to represent a missing value. You do not use \N like you do with LOAD DATA. From this example, you should be able to see that there would be a lot more typing involved to load your records initially using several INSERT statements rather than a single LOAD DATA statement.

To export the altered table back to the text file pet.txt from the pet table, use the SELECT INTO OUTFILE command:

(sql sqlHandle {select * into outfile 'pet.txt' from pet})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Fluffy	Harold	cat	m	1994-03-17
Claws	Gwen	dog	m	1990-08-27
Buffy	Harold	dog	m	1990-08-27
Fang	Benny	dog	m	1990-08-27
Bowser	Diane	dog	m	1979-08-31	1995-07-29
Chirpy	Gwen	bird	f	1998-09-11
Whistler	Gwen	bird		1997-12-09
Slim	Benny	snake	m	1996-04-29
Puffball	Diane	hamster	f	1999-03-30

This form of the SELECT statement is useful if you want to review your entire table. After you've just loaded the pet table with your initial data set this form of the SELECT statement would allow you to view the data as actually loaded to make sure there are no mistakes. For example, you may happen to think that the birth date for Bowser doesn't seem quite right. Consulting your original pedigree papers, you find that the correct birth year should be 1989, not 1979. There are at least two ways to fix this:

Edit the file pet.txt to correct the error, then empty the table and reload it using DELETE and LOAD DATA:

(sql sqlHandle {load data local infile 'pet.txt' into table pet})
(sql sqlHandle {update pet set birth = '1989-08-31' where name = 'Bowser'})

Selecting Particular Rows

As shown in the preceding section, it is easy to retrieve an entire table. Just omit the WHERE clause from the SELECT statement. But typically you don't want to see the entire table, particularly when it becomes large. Instead, you're usually more interested in answering a particular question, in which case you specify some constraints on the information you want. Let's look at some selection queries in terms of questions about your pets that they answer. You can select only particular rows from your table. For example, if you want to verify the change that you made to Bowser's birth date, select Bowser's record like this:

(sql sqlHandle {select * from pet where birth >= '1998-1-1'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Chirpy	Gwen	bird	f	1998-09-11
Puffball	Diane	hamster	f	1999-03-30

You can combine conditions, for example, to locate female dogs:

(sql sqlHandle {select * from pet where species = 'snake' or species = 'bird'})

NAME	OWNER	SPECIES	SEX	BIRTH
Chirpy	Gwen	bird	f	1998-09-11
Whistler	Gwen	bird		1997-12-09
Slim	Benny	snake	m	1996-04-29

AND and OR may be intermixed, although AND has higher precedence than OR. If you use both operators, it is a good idea to use parentheses to indicate explicitly how conditions should be grouped:

(sql sqlHandle {select name, birth from pet})

NAME	BIRTH
Fluffy	1994-03-17
Claws	1990-08-27
Buffy	1990-08-27
Fang	1990-08-27
Bowser	1979-08-31
Chirpy	1998-09-11
Whistler	1997-12-09
Slim	1996-04-29
Puffball	1999-03-30

To find out who owns pets, use this query:

(sql sqlHandle {select distinct owner from pet})

OWNER
Benny
Diane
Gwen
Harold

You can use a WHERE clause to combine row selection with column selection. For example, to get birth dates for dogs and cats only, use this query:

(sql sqlHandle {select name, birth from pet order by birth})

NAME	BIRTH
Bowser	1979-08-31
Claws	1990-08-27
Buffy	1990-08-27
Fang	1990-08-27
Fluffy	1994-03-17
Slim	1996-04-29
Whistler	1997-12-09
Chirpy	1998-09-11
Puffball	1999-03-30

On character type columns, sorting - like all other comparison operations - is normally performed in a case-insensitive fashion. This means that the order is undefined for columns that are identical except for their case. You can force a case-sensitive sort for a column by using BINARY like so: ORDER BY BINARY col_name. The default sort order is ascending, with smallest values first. To sort in reverse (descending) order, add the DESC keyword to the name of the column you are sorting by:

(sql sqlHandle {select name, species, birth from pet order by species, birth desc})

NAME	SPECIES	BIRTH
Chirpy	bird	1998-09-11
Whistler	bird	1997-12-09
Fluffy	cat	1994-03-17
Bowser	dog	1979-08-31
Claws	dog	1990-08-27
Buffy	dog	1990-08-27
Fang	dog	1990-08-27
Puffball	hamster	1999-03-30
Slim	snake	1996-04-29

Date Calculations

SQL provides several functions that you can use to perform calculations on dates, for example, to calculate ages or extract parts of dates. To determine how many years old each of your pets is, compute the difference in the year part of the current date and the birth date, then subtract one if the current date occurs earlier in the calendar year than the birth date. The following query shows, for each pet, the birth date, the current date, and the age in years.

(sql sqlHandle {select name, birth, curdate(), (year(curdate())- year(birth)) - (right(curdate(),5) < right(birth,5)) as age from pet order by name})

NAME	BIRTH	CURDATE()	AGE
Bowser	1979-08-31	2000-08-31	21
Buffy	1990-08-27	2000-08-31	10
Chirpy	1998-09-11	2000-08-31	1
Claws	1990-08-27	2000-08-31	10
Fang	1990-08-27	2000-08-31	10
Fluffy	1994-03-17	2000-08-31	6
Puffball	1999-03-30	2000-08-31	1
Slim	1996-04-29	2000-08-31	4
Whistler	1997-12-09	2000-08-31	2

To sort the output by age rather than name, just use a different ORDER BY clause:

(sql sqlHandle {select name, birth, death, (year(death)-year(birth)) - (right(death,5)

NAME	BIRTH	DEATH	AGE
Bowser	1979-08-31	1995-07-29	14

The query uses death IS NOT NULL rather than death <> NULL because NULL is a special value that cannot be compared using the usual comparison operators. This is discussed later. See Section 3.3.4.6, "Working with NULL Values". What if you want to know which animals have birthdays next month? For this type of calculation, year and day are irrelevant; you simply want to extract the month part of the birth column. SQL provides several functions for extracting parts of dates, such as YEAR(), MONTH(), and DAYOFMONTH(). MONTH() is the appropriate function here. To see how it works, run a simple query that displays the value of both birth and MONTH(birth):

(sql sqlHandle {select name, birth from pet where month(birth) = 4})

NAME	BIRTH
Slim	1996-04-29

You can write the query so that it works no matter what the current month is, so that you do not have to use the number for a particular month. DATE_ADD() allows you to add a time interval to a given date. If you add a month to the value of CURDATE(), then extract the month part with MONTH(), the result produces the month in which to look for birthdays. Note that MONTH() returns a number between 1 and 12.

(sql sqlHandle {select 1 = null, 1 <> null, 1 < null, 1 > null})

1 = NULL	1 <> NULL	1 < NULL	1 > NULL
null	null	null	null

Clearly you get no meaningful results from these comparisons. Use the IS NULL and IS NOT NULL operators instead:

(sql sqlHandle {select 1 is null, 1 is not null})

1 IS NULL	1 IS NOT NULL
0	1

(sql sqlHandle {select * from pet where name like 'b%'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Buffy	Harold	dog	m	1990-08-27
Bowser	Diane	dog	m	1979-08-31	1995-07-29

To find names ending with "fy":

(sql sqlHandle {select * from pet where name like '%w%'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Claws	Gwen	dog	m	1990-08-27
Bowser	Diane	dog	m	1979-08-31	1995-07-29
Whistler	Gwen	bird		1997-12-09

To find names containing exactly five characters, use five instances of the ?_? pattern character:

(sql sqlHandle {select * from pet where name regexp '^b'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Buffy	Harold	dog	m	1990-08-27
Bowser	Diane	dog	m	1979-08-31	1995-07-29

If you really want to force a REGEXP comparison to be case sensitive, use the BINARY keyword to make one of the strings a binary string. This query matches only lowercase "b" at the beginning of a name:

(sql sqlHandle {select * from pet where name regexp 'fy$'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Fluffy	Harold	cat	m	1994-03-17
Buffy	Harold	dog	m	1990-08-27

To find names containing a "w", use this query:

(sql sqlHandle {select * from pet where name regexp '^.....$'})

NAME	OWNER	SPECIES	SEX	BIRTH	DEATH
Claws	Gwen	dog	m	1990-08-27
Buffy	Harold	dog	m	1990-08-27

You could also write the previous query using the {n} ("repeat-n-times") operator:

(sql sqlHandle {select count(() from pet})

COUNT(*)
9

Earlier, you retrieved the names of the people who owned pets. You can use COUNT() if you want to find out how many pets each owner has:

(sql sqlHandle {SELECT species, COUNT(*) from pet group by species})

SPECIES	COUNT(*)
bird	2
cat	2
dog	3
hamster	1
snake	1

Number of animals by sex (in this output, NULL indicates that the sex is unknown):

(sql sqlHandle {select species, sex, COUNT(*) from pet group by species, sex})

SPECIES	SEX	COUNT(*)
bird	null	1
bird	f	1
cat	m	1
cat	f	1
dog	m	2
dog	f	1
hamster	f	1
snake	m	1

You need not retrieve an entire table when you use COUNT(). For example, the previous query, when performed just on dogs and cats, looks like this:

(sql sqlHandle {SELECT species, sex, count(*) from pet where sex is not null group by species, sex;})

SPECIES	SEX	COUNT(*)
bird	f	1
cat	m	1
cat	f	1
dog	m	2
dog	f	1
hamster	f	1
snake	m	1

Using More Than one Table

The pet table keeps track of which pets you have. If you want to record other information about them, such as events in their lives like visits to the vet or when litters are born, you need another table. What should this table look like? It needs:

To contain the pet name so that you know which animal each event pertains to.
A date so that you know when the event occurred.
A field to describe the event.
An event type field, if you want to be able to categorize events.

Given these considerations, the CREATE TABLE statement for the event table might look like this. As with the pet table, it's easiest to load the initial records by creating a tab-delimited text file containing the information:

(sql sqlHandle {load data local infile 'event.txt' into table event})

Based on what you have learned from the queries that you have run on the pet table, you should be able to perform retrievals on the records in the event table; the principles are the same. But when is the event table by itself insufficient to answer questions you might ask? Suppose that you want to find out the ages at which each pet had its litters. We saw earlier how to calculate ages from two dates. The litter date of the mother is in the event table, but to calculate her age on that date you need her birth date, which is stored in the pet table. This means the query requires both tables:

(sql sqlHandle {create table event (name varchar(20), date date, type varchar(15), remark varchar(255))})

NAME	SEX	NAME	SEX	SPECIES
Fluffy	f	Claws	m	dog
Buffy	f	Fang	m	dog
Buffy	f	Bowser	m	dog

Getting Information About Databases and Tables

What if you forget the name of a database or table, or what the structure of a given table is (for example, what its columns are called)? SQL addresses this problem through several statements that provide information about the databases and tables it supports. You have previously seen SHOW DATABASES, which lists the databases managed by the server. To find out which database is currently selected, use the DATABASE() function:

(sql sqlHandle {select database})

DATABASE()
menagerie

To display the tables in the current database issue the following command:

(sql sqlHandle {show tables})

Tables_in_menagerie
event
pet

The name of the column in the output produced by this statement is always Tables_in_db_name, where db_name is the name of the database. See the Section on "SHOW TABLES Syntax" for more information. If you want to find out about the structure of a table, the DESCRIBE command is useful; it displays information about each of a table's columns. Field indicates the column name, Type is the data type for the column, NULL indicates whether the column can contain NULL values, Key indicates whether the column is indexed, and Default specifies the column's default value. Extra displays special information about columns; for example, if a column was created with the AUTO_INCREMENT option, this is shown here. DESC is a short form of DESCRIBE. See Section, "DESCRIBE Syntax", for more information. You can obtain the CREATE TABLE statement necessary to create an existing table using the SHOW CREATE TABLE statement. See Section, "SHOW CREATE TABLE Syntax". If you have indexes on a table, SHOW INDEX FROM tbl_name produces information about them. See Section, "SHOW INDEX Syntax", for more about this statement.

(sql sqlHandle {describe pet})

Field	Type	Null	Default
name	varchar(20)	YES	NULL
owner	varchar(20)	YES	NULL
species	varchar(20)	YES	NULL
sex	varchar(1)	YES	NULL
birth	date	YES	NULL
death	date	YES	NULL

Field indicates the column name, Type is the data type for the column, NULL indicates whether the column can contain NULL values, Key indicates whether the column is indexed, and Default specifies the column's default value. Extra displays special information about columns; for example, if a column was created with the AUTO_INCREMENT option, this is shown here.

Examples of Common Queries

Here are examples of how to solve some common problems with SQL. Some of the examples use the table shop to hold the price of each article (item number) for certain traders (dealers). Supposing that each trader has a single fixed price per article, then (article, dealer) is a primary key for the records.

(sql sqlHandle {create table shop (
article int(4) unsigned zerofill default '0000' not null,
dealer char(20) default '' not null,
price double(16,2) default '0.00' not null,
primary key(article, dealer))
insert into shop values
(1,'A',3.45),(1,'B',3.99),(2,'A',10.99),(3,'B',1.45),
(3,'C',1.69),(3,'D',1.25),(4,'D',19.95)})

After issuing the statements, the table should have the following contents:

(sql sqlHandle {select * from shop})

ARTICLE	DEALER	PRICE
1	A	3.45
1	B	3.99
2	A	10.99
3	B	1.45
3	C	1.69
3	D	1.25
4	D	19.95

The maximum value for a column can be found with the following:

(sql sqlHandle {select max(article) as article from shop})

ARTICLE
4

The row holding the maximum for a certain column can be found with the following:

(sql sqlHandle {select article, dealer, price from shop order by price desc limit 1)})

ARTICLE	DEALER	PRICE
4	D	19.95

The maximum column for a certain group can be found with the following:

(sql sqlHandle {select article, dealer, price from shop s1 where price=(select max(s2.price) from shop s2 where s1.article = s2.article))})

ARTICLE	DEALER	PRICE
1	B	3.99
2	A	10.99
3	C	1.69
4	D	19.95

The preceding example uses a correlated subquery, which can be inefficient. Other possibilities for solving the problem are to use an uncorrelated subquery in the FROM clause or a LEFT JOIN. The LEFT JOIN works on the basis that when s1.price is at its maximum value, there is no s2.price with a greater value and the s2 row values will be NULL.

(sql sqlHandle {select s1.article, s1.dealer, s1.price
from shop s1
left join shop s2 ON s1.article = s2.article and s1.price < s2.price
where s2.article is null})

ARTICLE	DEALER	PRICE
1	B	3.99
2	A	10.99
3	C	1.69
4	D	19.95

User-defined variables can be employed within a query to provide more efficient selection. You can employ SQL user variables to remember results without having to store them in temporary variables in the client. For example, to find the articles with the highest and lowest price you can do this:

(sql sqlHandle {select @min_price:=min(price),@max_price:=max(price) from shop})
(sql sqlHandle {select * from shop where price=@min_price or price=@max_price})

ARTICLE	DEALER	PRICE
3	D	1.25
4	D	19.95