SQL Overview and Tutorial

 

Overview

Relational database storage is made available in Analytic Information Server via a collaboration with the MySQL open source project and is available in both imbedded, local server, and remote server versions. A great deal of reference documentation and important information is available from the MySQL Project web site. In this AIS reference documentation we provide only the basic reference materials for using SQL.

Access to all SQL services in Analytic Information Server are provided by the builtin sql function. This chapter provides a tutorial introduction to AIS SQL by showing how to use the sql function to create and use a simple database. The sql function allows you to connect to a MySQL server, run queries, and view the results. This chapter assumes that MySql is installed on your machine and that a MySQL server is available to which you can connect. If this is not true, contact your MySQL administrator. (Normally, each copy of Analytic Information Server has at least an imbedded copy of MySQL available for use.) This chapter describes the entire process of setting up and using a database. If you are interested only in accessing an existing database, you may want to skip over the sections that describe how to create the database and the tables it contains. Because this chapter is tutorial in nature, many details are necessarily omitted. Consult the relevant sections of the manual for more information on the topics covered here.

Connecting to and Disconnecting from the Server

To connect to the AIS SQL server, you will usually need to provide an sql user name when you invoke sql and, most likely, a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host name. Contact your administrator to find out what connection parameters you should use to connect (that is, what host, user name, and password to use). Once you know the proper parameters, you should be able to connect like this:

Host and user represent the host name where your MySQL server is running and the user name of your MySQL account. Substitute appropriate values for your setup. If that works, the sql function should return an sql handle for use in sending additional SQL commands to this connection. More than one connection may be established between AIS and the MySQL server and held open simultaneously. If, when you attempt to log in, you get an error message such as !sql.connect: Can't connect to local MySQL server!, it means that that MySQL server daemon (Unix) or service (Windows) is not running. However, if the sql function returns an SQL connection handle, then you are ready to send SQL commands to the now established connection. After you have connected successfully, you can disconnect any time like this:

Most examples in the following sections assume that you are connected to the server.

Entering Queries

Make sure that you are connected to the server, as discussed in the previous section. Doing so does not in itself select any database to work with, but that's okay. At this point, it's more important to find out a little about how to issue queries than to jump right in creating tables, loading data into them, and retrieving data from them. This section describes the basic principles of entering commands, using several queries you can try out to familiarize yourself with how SQL works. Here's a simple command that asks the server to tell you its version number and the current date.

|VERSION()|CURRENT_DATE
"5.1.2-alpha-log"#Nov,10,2005

This query illustrates several things about SQL: A command normally consists of an SQL statement in a String. (The String may be embedded in double quotes "this is a string" or may be embedded in braces {This "is" a string}) When you issue a command, sql sends it to the server for execution and returns the results in the form of a Record data type. SQL returns query result in tabular form (rows and columns) in the form of a Record data type. For illustration purposes herein, the first row contains labels for the columns. The rows following are the query results. Normally, column labels are the names of the columns you fetch from database tables. If you're retrieving the value of an expression rather than a table column (as in the example just shown), sql labels the column using the expression itself.

Keywords may be entered in any lettercase. The following queries are equivalent:

|SIN(PI()/4)||(4+1)*5|
0.7071067811865525

Creating and Using a Database

Once you know how to enter commands, you are ready to access a database. Suppose that you have several pets in your home (your menagerie) and you would like to keep track of various types of information about them. You can do so by creating tables to hold your data and loading them with the desired information. Then you can answer different sorts of questions about your animals by retrieving data from the tables. This section shows you how to:

The menagerie database is simple (deliberately), but it is not difficult to think of real-world situations in which a similar type of database might be used. For example, a database like this could be used by a farmer to keep track of livestock, or by a veterinarian to keep track of patient records. A menagerie distribution containing some of the queries and sample data used in the following sections can be obtained from the MySQL Web site. It is available in both compressed tar file and Zip formats at the MySQL Project web site.

Use the SHOW statement to find out what databases currently exist on the server:

database changed

You can use the test database (if you have access to it) for the examples that follow, but anything you create in that database can be removed by anyone else with access to it. For this reason, you should probably ask your MySQL administrator for permission to use a database of your own. Suppose that you want to call yours menagerie. The administrator needs to execute a command like this:

Under Unix, database names are case sensitive (unlike SQL keywords), so you must always refer to your database as menagerie, not as Menagerie, MENAGERIE, or some other variant. This is also true for table names. (Under Windows, this restriction does not apply, although you must refer to databases and tables using the same lettercase throughout a given query. However, for a variety of reasons, our recommended best practice is always to use the same lettercase that was used when the database was created.) Note: If you get an error such as ERROR 1044 (42000): Access denied for user 'monty'@'localhost' to database 'menagerie' when attempting to create a database, this means that your user account does not have the necessary privileges to do so. Discuss this with the administrator.

Creating a database does not select it for use; you must do that explicitly. To make menagerie the current database, use this command:

Empty set

The harder part is deciding what the structure of your database should be: what tables you need and what columns should be in each of them. You want a table that contains a record for each of your pets. This can be called the pet table, and it should contain, as a bare minimum, each animal's name. Because the name by itself is not very interesting, the table should contain other information. For example, if more than one person in your family keeps pets, you might want to list each animal's owner. You might also want to record some basic descriptive information such as species and sex. How about age? That might be of interest, but it's not a good thing to store in a database. Age changes as time passes, which means you'd have to update your records often. Instead, it's better to store a fixed value such as date of birth. Then, whenever you need age, you can calculate it as the difference between the current date and the birth date. SQL provides functions for doing date arithmetic, so this is not difficult. Storing birth date rather than age has other advantages, too:

You can probably think of other types of information that would be useful in the pet table, but the ones identified so far are sufficient: name, owner, species, sex, birth, and death. Use a CREATE TABLE statement to specify the layout of your table:

TABLES IN MENAGERIE
pet

To verify that your table was created the way you expected, use a DESCRIBE statement:

You can specify the column value separator and end of line marker explicitly in the LOAD DATA statement if you wish, but the defaults are tab and linefeed. (On an Apple machine running OS X, you would likely want to use LINES TERMINATED BY '\r'.) These are sufficient for the statement to read the file pet.txt properly. If the statement fails, it is likely that your SQL installation does not have local file capability enabled by default. Note that if you created the file on Windows with an editor that uses \r\n as a line terminator, you should use: When you want to add new records one at a time, the INSERT statement is useful. In its simplest form, you supply values for each column, in the order in which the columns were listed in the CREATE TABLE statement. Suppose that Diane gets a new hamster named "Puffball." You could add a new record using an INSERT statement like this:

Note that string and date values are specified as quoted strings here. Also, with INSERT, you can insert NULL directly to represent a missing value. You do not use \N like you do with LOAD DATA. From this example, you should be able to see that there would be a lot more typing involved to load your records initially using several INSERT statements rather than a single LOAD DATA statement.

To export the altered table back to the text file pet.txt from the pet table, use the SELECT INTO OUTFILE command:

NAMEOWNERSPECIESSEXBIRTHDEATH
FluffyHaroldcatm1994-03-17
ClawsGwendogm1990-08-27
BuffyHarolddogm1990-08-27
FangBennydogm1990-08-27
BowserDianedogm1979-08-311995-07-29
ChirpyGwenbirdf1998-09-11
WhistlerGwenbird 1997-12-09
SlimBennysnakem1996-04-29
PuffballDianehamsterf1999-03-30

This form of the SELECT statement is useful if you want to review your entire table. After you've just loaded the pet table with your initial data set this form of the SELECT statement would allow you to view the data as actually loaded to make sure there are no mistakes. For example, you may happen to think that the birth date for Bowser doesn't seem quite right. Consulting your original pedigree papers, you find that the correct birth year should be 1989, not 1979. There are at least two ways to fix this:

Selecting Particular Rows

As shown in the preceding section, it is easy to retrieve an entire table. Just omit the WHERE clause from the SELECT statement. But typically you don't want to see the entire table, particularly when it becomes large. Instead, you're usually more interested in answering a particular question, in which case you specify some constraints on the information you want. Let's look at some selection queries in terms of questions about your pets that they answer. You can select only particular rows from your table. For example, if you want to verify the change that you made to Bowser's birth date, select Bowser's record like this:

NAMEOWNERSPECIESSEXBIRTHDEATH
ChirpyGwenbirdf1998-09-11
PuffballDianehamsterf1999-03-30

You can combine conditions, for example, to locate female dogs:

NAMEOWNERSPECIESSEXBIRTHDEATH
ChirpyGwenbirdf1998-09-11
WhistlerGwenbird 1997-12-09
SlimBennysnakem1996-04-29

AND and OR may be intermixed, although AND has higher precedence than OR. If you use both operators, it is a good idea to use parentheses to indicate explicitly how conditions should be grouped:

NAMEBIRTH
Fluffy1994-03-17
Claws1990-08-27
Buffy1990-08-27
Fang1990-08-27
Bowser1979-08-31
Chirpy1998-09-11
Whistler1997-12-09
Slim1996-04-29
Puffball1999-03-30

To find out who owns pets, use this query:

OWNER
Benny
Diane
Gwen
Harold

You can use a WHERE clause to combine row selection with column selection. For example, to get birth dates for dogs and cats only, use this query:

NAMEBIRTH
Bowser1979-08-31
Claws1990-08-27
Buffy1990-08-27
Fang1990-08-27
Fluffy1994-03-17
Slim1996-04-29
Whistler1997-12-09
Chirpy1998-09-11
Puffball1999-03-30

On character type columns, sorting - like all other comparison operations - is normally performed in a case-insensitive fashion. This means that the order is undefined for columns that are identical except for their case. You can force a case-sensitive sort for a column by using BINARY like so: ORDER BY BINARY col_name. The default sort order is ascending, with smallest values first. To sort in reverse (descending) order, add the DESC keyword to the name of the column you are sorting by:

NAMESPECIESBIRTH
Chirpybird1998-09-11
Whistlerbird1997-12-09
Fluffycat1994-03-17
Bowserdog1979-08-31
Clawsdog1990-08-27
Buffydog1990-08-27
Fangdog1990-08-27
Puffballhamster1999-03-30
Slimsnake1996-04-29

Date Calculations

SQL provides several functions that you can use to perform calculations on dates, for example, to calculate ages or extract parts of dates. To determine how many years old each of your pets is, compute the difference in the year part of the current date and the birth date, then subtract one if the current date occurs earlier in the calendar year than the birth date. The following query shows, for each pet, the birth date, the current date, and the age in years.

NAMEBIRTHCURDATE()AGE
Bowser1979-08-312000-08-3121
Buffy1990-08-272000-08-3110
Chirpy1998-09-112000-08-311
Claws1990-08-272000-08-3110
Fang1990-08-272000-08-3110
Fluffy1994-03-172000-08-316
Puffball1999-03-302000-08-311
Slim1996-04-292000-08-314
Whistler1997-12-092000-08-312

To sort the output by age rather than name, just use a different ORDER BY clause:

NAMEBIRTHDEATHAGE
Bowser1979-08-311995-07-2914

The query uses death IS NOT NULL rather than death <> NULL because NULL is a special value that cannot be compared using the usual comparison operators. This is discussed later. See Section 3.3.4.6, "Working with NULL Values". What if you want to know which animals have birthdays next month? For this type of calculation, year and day are irrelevant; you simply want to extract the month part of the birth column. SQL provides several functions for extracting parts of dates, such as YEAR(), MONTH(), and DAYOFMONTH(). MONTH() is the appropriate function here. To see how it works, run a simple query that displays the value of both birth and MONTH(birth):

NAMEBIRTH
Slim1996-04-29

You can write the query so that it works no matter what the current month is, so that you do not have to use the number for a particular month. DATE_ADD() allows you to add a time interval to a given date. If you add a month to the value of CURDATE(), then extract the month part with MONTH(), the result produces the month in which to look for birthdays. Note that MONTH() returns a number between 1 and 12.

1 = NULL1 <> NULL1 < NULL1 > NULL
nullnullnullnull

Clearly you get no meaningful results from these comparisons. Use the IS NULL and IS NOT NULL operators instead:

1 IS NULL1 IS NOT NULL
01
NAMEOWNERSPECIESSEXBIRTHDEATH
BuffyHarolddogm1990-08-27
BowserDianedogm1979-08-311995-07-29

To find names ending with "fy":

NAMEOWNERSPECIESSEXBIRTHDEATH
ClawsGwendogm1990-08-27
BowserDianedogm1979-08-311995-07-29
WhistlerGwenbird 1997-12-09

To find names containing exactly five characters, use five instances of the ?_? pattern character:

NAMEOWNERSPECIESSEXBIRTHDEATH
BuffyHarolddogm1990-08-27
BowserDianedogm1979-08-311995-07-29

If you really want to force a REGEXP comparison to be case sensitive, use the BINARY keyword to make one of the strings a binary string. This query matches only lowercase "b" at the beginning of a name:

NAMEOWNERSPECIESSEXBIRTHDEATH
FluffyHaroldcatm1994-03-17
BuffyHarolddogm1990-08-27

To find names containing a "w", use this query:

NAMEOWNERSPECIESSEXBIRTHDEATH
ClawsGwendogm1990-08-27
BuffyHarolddogm1990-08-27

You could also write the previous query using the {n} ("repeat-n-times") operator:

COUNT(*)
9

Earlier, you retrieved the names of the people who owned pets. You can use COUNT() if you want to find out how many pets each owner has:

SPECIESCOUNT(*)
bird2
cat2
dog3
hamster1
snake1

Number of animals by sex (in this output, NULL indicates that the sex is unknown):

SPECIESSEXCOUNT(*)
birdnull1
birdf1
catm1
catf1
dogm2
dogf1
hamsterf1
snakem1

You need not retrieve an entire table when you use COUNT(). For example, the previous query, when performed just on dogs and cats, looks like this:

SPECIESSEXCOUNT(*)
birdf1
catm1
catf1
dogm2
dogf1
hamsterf1
snakem1

Using More Than one Table

The pet table keeps track of which pets you have. If you want to record other information about them, such as events in their lives like visits to the vet or when litters are born, you need another table. What should this table look like? It needs:

Given these considerations, the CREATE TABLE statement for the event table might look like this. As with the pet table, it's easiest to load the initial records by creating a tab-delimited text file containing the information:

Based on what you have learned from the queries that you have run on the pet table, you should be able to perform retrievals on the records in the event table; the principles are the same. But when is the event table by itself insufficient to answer questions you might ask? Suppose that you want to find out the ages at which each pet had its litters. We saw earlier how to calculate ages from two dates. The litter date of the mother is in the event table, but to calculate her age on that date you need her birth date, which is stored in the pet table. This means the query requires both tables:

NAMESEXNAMESEXSPECIES
FluffyfClawsmdog
BuffyfFangmdog
BuffyfBowsermdog

Getting Information About Databases and Tables

What if you forget the name of a database or table, or what the structure of a given table is (for example, what its columns are called)? SQL addresses this problem through several statements that provide information about the databases and tables it supports. You have previously seen SHOW DATABASES, which lists the databases managed by the server. To find out which database is currently selected, use the DATABASE() function:

DATABASE()
menagerie

To display the tables in the current database issue the following command:

Tables_in_menagerie
event
pet

The name of the column in the output produced by this statement is always Tables_in_db_name, where db_name is the name of the database. See the Section on "SHOW TABLES Syntax" for more information. If you want to find out about the structure of a table, the DESCRIBE command is useful; it displays information about each of a table's columns. Field indicates the column name, Type is the data type for the column, NULL indicates whether the column can contain NULL values, Key indicates whether the column is indexed, and Default specifies the column's default value. Extra displays special information about columns; for example, if a column was created with the AUTO_INCREMENT option, this is shown here. DESC is a short form of DESCRIBE. See Section, "DESCRIBE Syntax", for more information. You can obtain the CREATE TABLE statement necessary to create an existing table using the SHOW CREATE TABLE statement. See Section, "SHOW CREATE TABLE Syntax". If you have indexes on a table, SHOW INDEX FROM tbl_name produces information about them. See Section, "SHOW INDEX Syntax", for more about this statement.

FieldTypeNullKeyDefaultExtra
namevarchar(20)YESNULL
ownervarchar(20)YESNULL
speciesvarchar(20)YESNULL
sexvarchar(1)YESNULL
birthdateYESNULL
deathdateYESNULL

Field indicates the column name, Type is the data type for the column, NULL indicates whether the column can contain NULL values, Key indicates whether the column is indexed, and Default specifies the column's default value. Extra displays special information about columns; for example, if a column was created with the AUTO_INCREMENT option, this is shown here.

Examples of Common Queries

Here are examples of how to solve some common problems with SQL. Some of the examples use the table shop to hold the price of each article (item number) for certain traders (dealers). Supposing that each trader has a single fixed price per article, then (article, dealer) is a primary key for the records.

After issuing the statements, the table should have the following contents:

ARTICLEDEALERPRICE
1A3.45
1B3.99
2A10.99
3B1.45
3C1.69
3D1.25
4D19.95

The maximum value for a column can be found with the following:

ARTICLE
4

The row holding the maximum for a certain column can be found with the following:

ARTICLEDEALERPRICE
4D19.95

The maximum column for a certain group can be found with the following:

ARTICLEDEALERPRICE
1B3.99
2A10.99
3C1.69
4D19.95

The preceding example uses a correlated subquery, which can be inefficient. Other possibilities for solving the problem are to use an uncorrelated subquery in the FROM clause or a LEFT JOIN. The LEFT JOIN works on the basis that when s1.price is at its maximum value, there is no s2.price with a greater value and the s2 row values will be NULL.

ARTICLEDEALERPRICE
1B3.99
2A10.99
3C1.69
4D19.95

User-defined variables can be employed within a query to provide more efficient selection. You can employ SQL user variables to remember results without having to store them in temporary variables in the client. For example, to find the articles with the highest and lowest price you can do this:

ARTICLEDEALERPRICE
3D1.25
4D19.95