UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Linked Table Operations    
Image:NotifWarningWeak.gif This page has been marked as incomplete

Notes:
Some sections needs filling in

If this page has been completed please remove the {{INCOMPLETE}} template from the page's source markup.
Image:NotifInfo.gif This page has been marked as requiring review

Notes:
New article; need to be validated and expansion

If this page has been reviewed please remove the {{REVIEW}} template from the page's source markup.

The purpose of the article is to provide an illustration of how Access works with a ODBC linked table. Note that the example was done with Access 2010 and MySQL 5.1, specifics may vary slightly. MySQL was chosen primarily because of convenient log format that shows a readable SQL statement received from Access. The similar process can be constructed even with SQL Server though with SQL Server, ad hoc procedures are created and re-used. However, intent is primarily on how Access builds the commands and process those commands, rather than the specifics of commands and how ODBC drivers transform them.

Note that in all cases, we are working through Access and therefore passthrough queries are not discussed here. We're mainly interested in how Access' database engine interprets and submit the SQL statement so we can work effectively with the engine. Furthermore, note that the process is generally similar regardless whether we open a table, a query, a form, or a report. In all cases, it involves opening a recordset for Access to navigate through. Thus for the article, we'll refer to "recordset" as a catchall for where process does not depend on specific Access object we're working with.

Contents

Viewing Data from Linked Tables

SELECT

Whenever Access opens an object, it almost invariably submit a SELECT statement to collect the data so it can then populate the object (e.g. form, report or table/query datasheet) with datasheet. Access incorporates several technique to help optimize the requests so though SELECTing a set of records seems simple in theory, the actual behavior are complicated as we'll see soon.

Filling, Navigating, and Caching Behavior

In this section, we will assume that the SQL statement we're using for the access object is simply "SELECT * FROM city;". Note that in case of forms & reports, we may also enter simply the table name (e.g. "city") which has same effect. Though the statement we use is simple, the actual statements Access generate and submit to the backend database server is very different. When Access opens a recordset and the type is dynaset, it needs a way of navigating the individual records. To make this happen, the first statement Access will submit is:

CODE
SELECT `city`.`ID` FROM `city`

ID column is the primary key of city table. At time of creating a new linked ODBC table, Access attempts to ascribe the primary key (and if it fails, it may prompt you with a dialog asking you to select column(s) to uniquely identify a row). Without that information, Access cannot make any updates because it'd be unable to know where to direct the updates. Behind the scenes, Access creates a local index on the columns identified as primary key (or at least unique).

Here's is how Access then fills the recordset:

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 129 OR `ID` = 1 OR `ID` = 2 OR `ID` = 3 OR `ID` = 4 OR `ID` = 56 OR `ID` = 57 OR `ID` = 58 OR `ID` = 59 OR `ID` = 60
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 61 OR `ID` = 62 OR `ID` = 34 OR `ID` = 55 OR `ID` = 33 OR `ID` = 64 OR `ID` = 65 OR `ID` = 66 OR `ID` = 67 OR `ID` = 68
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 69 OR `ID` = 70 OR `ID` = 71 OR `ID` = 72 OR `ID` = 73 OR `ID` = 74 OR `ID` = 75 OR `ID` = 76 OR `ID` = 77 OR `ID` = 78
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 79 OR `ID` = 80 OR `ID` = 81 OR `ID` = 82 OR `ID` = 83 OR `ID` = 84 OR `ID` = 85 OR `ID` = 86 OR `ID` = 87 OR `ID` = 88

Using the primary key Access fetched in previous statement, it has a chain of keys that it "walks". It happened that the order of keys came in 129, 1, 2, 3, 4, 56, 57, 58.... Access followed the same ordering. Ordering will be discussed more in a bit.

Note that Access will fill as many records as is needed to paint the screen. Thus, opening a datasheet or continuous form will request for more records than if we opened a single record form.

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 129

When the user does not interact with Access for a while, Access will continue to fetch records in background.

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 1 OR `ID` = 2 OR `ID` = 3 OR `ID` = 4 OR `ID` = 56 OR `ID` = 57 OR `ID` = 58 OR `ID` = 59 OR `ID` = 60 OR `ID` = 61
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 62 OR `ID` = 34 OR `ID` = 55 OR `ID` = 33 OR `ID` = 64 OR `ID` = 65 OR `ID` = 66 OR `ID` = 67 OR `ID` = 68 OR `ID` = 69

This will occur regardless whether Access needs to paint the screen or not. This is to make navigation appear faster when the user do decide to move to the next record. Note the background fetching will be always interrupted. If the user decides to move to the last record, for example:

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073 OR `ID` = 4073

Again using the same list of keys, Access grabs the last key and thus obtain the "last" record.

When user has moved to a different position that isn't adjacent or close to previous position (e.g. as in moving to last or perhaps by jumping over a large number of records), the "buffer" also moves. After moving to the last record, the background fetching obtains those IDs:

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 3579 OR `ID` = 537 OR `ID` = 4067 OR `ID` = 3769 OR `ID` = 3770 OR `ID` = 3771 OR `ID` = 3772 OR `ID` = 3773 OR `ID` = 3774 OR `ID` = 3775

In this situation, the ID 3579 happens to be the 100th record from the last record, 537 99th from the last, 4067 being 98th and so on. Thus, Access is always building and filling in a buffer. When user moves within the buffer, Access will not fetch a new request; it uses the data already cached in the buffer.

After a long enough while, Access will periodically refresh the data in the buffer, requerying the same keys to ensure that users do not navigate to a stale data.

NOTE: There are settings that may affect the interval of data refresh in Access Options/Client Settings but more investigation are needed whether it's actually relevant here and in what ways.

Snapshot-Type Recordset & Non-updatable Recordset

The previous section discussed the behavior of caching and navigating with a dynaset-type recordset based on the assumption that we'd be working with dynaset-type recordset more often. However, when we use a snapshot-type recordset with same object, Access will actually submit the SQL statement as it is and will do no buffering or refreshing. This is only statement sent:

CODE
SELECT `ID` ,`Name` ,`CountryCode` ,`District` ,`Population`  FROM `city`

Any subsequent navigations do not submit any statements as was the case with dynaset-type recordset. For read-only scenarios, using snapshot-type recordset can mean less network traffic overhead and thus is very appropriate for certain objects such as comboboxes and/or listboxes where data are typically read-only. Note that those also would also benefit from using passthrough queries. One possible reason to use a Access query with snapshot type instead of passthrough queries is to be able to call a VBA or Access function as shown in this sample:

CODE
SELECT ID, [Name], CountryCode, District, Format([Population],"0,000.00") AS Population FROM city;

We want to apply a special formatting to the Population column. What happens is Access will continue to submit this statement:

CODE
SELECT `ID` ,`Name` ,`CountryCode` ,`District` ,`Population`  FROM `city`

and then apply formatting to the population locally. We can just use MySQL's native functions to provide similar output and thus use passthrough but in cases where it's more convenient to be able to use VBA functions, using a snapshot-type recordset instead of a passthrough query can be a valid alternative. This also can help somewhat with concurrency by shifting some of load from server to client, let Access do the work of performing calculation. Whether this is an appropriate thing to do ultimately depends on your final functional requirement.

Furthermore, be aware that when a query is rendered nonupdatable, perhaps because we have an aggregation, distinct selection, union or so forth, the query is automatically a snapshot-type recordset and thus will behave like one regardless of the recordset type setting.

CAUTION: Refer to the WHERE section for caveats with using VBA or Access functions as it can have adverse performance impacts when used in those section. It is generally acceptable to use VBA/Access functions in the SELECT clause because it does not change the impact on network traffic as is the case with using same thing in the WHERE section.

Ordered SELECT

If the object has a statement that includes the ORDER BY statement, such as this:

CODE
SELECT * FROM city ORDER BY [Name];

and we're using dyanset-type recordset, the initial statement will be now:

CODE
SELECT `city`.`ID` FROM `city` ORDER BY `city`.`Name`

The navigation and caching behavior remains unchanged. The only difference is that Access retrieves the list of key in a ordered fashion, so it is able to fetch records that seems to be in same order based on this list of keys.

CAUTION: It is legal to use expressions in ORDER BY, but this must be done with caution. Using VBA or Access functions in an ORDER BY may adversely impact performance. Refer to WHERE section for details.

Master-Detail Linking

Access supports linking between subform and subreport which takes care of filtering the child records based on the parent record for us. When we open a form that uses the country table as its recordsource and contains city as a subform, this is what Access will submit on opening of the form:

CODE
SELECT `country`.`Code` FROM `country`

As before, it first retrieves the keys to walk the parent form. The next statement:

CODE
SELECT `Code`,`Name`,`Continent`,`Region`,`SurfaceArea`,`IndepYear`,`Population`,`LifeExpectancy`,`GNP`,`GNPOld`,`LocalName`,`GovernmentForm`,`HeadOfState`,`Capital`,`Code2`  FROM `country`  WHERE `Code` = _latin1'ABW'

populates the first record of the single-record form. After that:

CODE
SELECT `frmCity`.`ID` FROM `city` `frmCity` WHERE (`CountryCode` =  'ABW' )

Note: Access assigns an alias "frmCity" to the table "city" for this statement. The optional "AS" keyword was not included.

Access collects all primary key for city filtered by the current ID of parent record then fills the subform (which is a continuous subform):

CODE
SELECT `ID`,`Name`,`CountryCode`,`District`,`Population`  FROM `city`  WHERE `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129 OR `ID` = 129

In this case, the country had only one city record. If it had more records, we'd see IDs with different values. Note that if there were more than 10 child records, Access will still batch the requests of 10 records each.

In background, only parent form's recordset is cached. Subform content are not fetched until the user actually navigates to a different parent record. This is true regardless of whether the recordset type for either form are set to dynaset or to snapshot. In case of subreports, Access will send out individual SQL statement for each instance of subreport. Therefore, if a report has 100 parent records, Access will send out 101 SQL statement, one SQL statement to build the parent records and one each for the subreport.

Objects with RowSource

When an Access object contains certain controls such as comboboxes or listboxes which has RowSource property, those will submit a separate SQL statement in addition to the form's recordsource. Those are always treated as a snapshot-type recordset, regardless of the query's setting. Thus the cost of obtaining entries for populating comboboxes and listboxes are normally paid only once and at opening of the form. However, Access appears to submit two SQL statement per control:

CODE
SELECT `Code` ,`Name` ,`Continent` ,`Region` ,`SurfaceArea` ,`IndepYear` ,`Population` ,`LifeExpectancy` ,`GNP` ,`GNPOld` ,`LocalName` ,`GovernmentForm` ,`HeadOfState` ,`Capital` ,`Code2`  FROM `country`
SELECT `Code` ,`Name`  FROM `country`
NOTE: This may be MySQL-specific. Verifications are needed to confirm whether the same behavior occurs with other RDBMS and explanation is needed as to why Access thinks it needs to submit the first statement

Only 2nd statement is actually used for the combobox. First statement seems to have no use. To avoid this extra overhead, it is recommended that a passthrough query be used as a rowsource to ensure that Access does not issue extraneous statements.

WHERE (and GROUP BY, HAVING, ON, ORDER BY)

Generally, Access will attempt to pass the criteria back to the server. However, it is legal for us to specify some criteria that is only expressible in VBA, Access' built-in functions, or Access' SQL dialect. In the latter case, Access has no choice but to pull down everything and filter the results locally, discarding the mismatches. There is no visual feedback when this happens, other than query appears to run slowly. Thus extra care is required to write a effective filter that is also easily portable. Generally speaking, the only functions that are portable are also part of SQL standard.

Though the section focuses on WHERE clause of SQL, it is very important to understand that considerations discussed for the WHERE clause is also true of other clauses that also does filtering such as the join's ON clause, GROUP BY clause and HAVING clause. All those clauses can have expressions and because they play a role in filtering, they also have much bigger impact on the final performance of a query. SELECT clause is a special case because it does not affect the filtering and Access would have had gotten all of the values anyway so the impact of using expressions in a SELECT is typically negligible. After that, careful attention to the details is essential to writing an effective query. Though ORDER BY clause does not actually filter any rows, we cannot be certain that the impact on performance is equally negligible as would be with SELECT clause. The reason is that some database servers may attempt to optimize on ORDER BY clause and thus improve the performance than if we did it locally on Access' side. So for most scenarios, we probably want to pass off the expressions in ORDER BY back to the database server and reap every possible optimization offered by the server.

One frequent mistake to make is to use VBA's IsNull() instead of SQL's IS NULL function. If we created a new Access query with this SQL statement:

CODE
SELECT * FROM city WHERE IsNull(District)=True;

This is what Access will submit:

CODE
SELECT `District` ,`city`.`ID` FROM `city`

Essentially, it pulls down all District values so it can then evaluate whether any one of them are null, then use the remaining ID as the chain of keys to navigate. Contrast this with modified SQL for our Access query:

CODE
SELECT * FROM city WHERE District IS NULL;

Access will submit instead:

CODE
SELECT `city`.`ID` FROM `city` WHERE (`District` IS NULL )

which is much closer to the SQL we wrote in the Access query and allows the database server to do the work of filtering out District and returning only IDs for Access to walk.

However, if the same query has additional criteria that are expressible, Access will try to pass those criteria. Suppose we had this instead:

CODE
SELECT country.Code, country.Name FROM country WHERE country.Name BETWEEN 'a' AND 'azz' AND IsNull(country.Region) = True;

Access will submit:

CODE
SELECT `Name` ,`Region` ,`country`.`Code` FROM `country` WHERE (`Name` BETWEEN 'a' AND 'azz' )

Region is still pulled down but at least only subset of the Region for Access' local evaluation. Therefore, if you have a complex criteria that is not easily expressible in SQL, it may be acceptable to filter a subset using expressible SQL and tolerate the waste arising from downloading additional records that is then discarded prior to showing the users.

As mentioned earlier, certain standard SQL functions can be ported across:

CODE
SELECT * FROM city WHERE ((UCase([Name])="DENVER"));

This is what Access will submit:

CODE
SELECT `city`.`ID` FROM `city` WHERE ({fn ucase(`Name` )}= 'DENVER' )

In this case, UCase() is a part of standard SQL set and thus is expressible via ODBC. Though the query is not recommended because this will negate index optimization MySQL may use to do a case-insenstiive match of a city name. There are better alternatives to do same thing but if there's a scenario that requires use of function, it may help to use one that's SQL-standard rather than a custom VBA or built-in Access function that are not expressible.

Extra care should be given to what data types will be returned as a result of an expression. For example, if we use Val() function to coerce a piece of string into a number as thus:

CODE
SELECT city.Name FROM city WHERE city.Population=Val("0");

Access is smart enough to evaluate Val("0") before submitting it but the data type isn't just what we quite meant:

CODE
SELECT `city`.`ID` FROM `city` WHERE (`Population` =  0.00000000000000000e+000 )

Population is supposed to be an integer and by submitting a double data type, we may be inadvertently forcing the database server to negate any optimizations due to data types being not compatible with what was indexed. Because several VBA functions are designed to be flexible, they tend to return Variant or for numeric functions, Double to accommodate various possible inputs. If in doubt, consult the documentation or explicitly coerce the result of expression into a data type that's more suitable for submission.

Modifying Data in Linked Table

There are two different contexts where we can modify data; via an action query or via a form or datasheet view of SELECT query or table. The main difference between those two contexts is that with latter, Access submits additional SELECT queries as a means to reselect and thus validate the modified row. Thus, executing an action query will execute exactly one SQL statement. The additional SELECTs are mainly used for maintaining the bound forms or datasheets. If Access did not submit the additional statements, it would have no way of verifying that the data it's painting on the user's screen is consistent with what the database server actually has.

INSERT

When working through forms or datasheets, all INSERTS will always be a single-row INSERT. Access will vary its INSERT statement sent to database server based on which columns the users actually dirtied. So if the user entered values for a city name and country code but left the rest blank, the resulting INSERT statement generated by Access will come out as:

CODE
INSERT INTO  `city`  (`Name`,`CountryCode`) VALUES ('Test City','ZWE')

If the user also dirtied the District and Population, the statement would change to thus:

CODE
INSERT INTO  `city`  (`Name`,`CountryCode`,`District`,`population`) VALUES ('Test City','ZWE','Test District',1)

It should be noticed that if the user clears out a dirtied field, that will not be removed from the generated SQL statement. If user follows the exact sequence:

  1. Start to type a value in the Population's control.
  2. Move the focus to another control on the *same* row/record.
  3. Return the focus to the Population's control.
  4. Clear the original values
  5. Save the record
The resulting SQL generated by Access will be:

CODE
INSERT INTO  `city`  (`Name`,`CountryCode`,`District`,`population`) VALUES ('Z-Test','ZWE','Z',NULL)

It is important to note that between step 1 to 4, the same record is dirty through the whole time. Had the user moved focus to another row and thus save the record then later clear the original value, this would be now a separate UPDATE operation and thus we don't observe this. Nonetheless, the fact that NULL is passed in may have undesirable consequence as will be discussed more shortly. The extra NULL may also have ramifications on the server side. For example, if we had a trigger on the city table that reads the population value and population field had zero, rather than null as the default, it may execute statements that should not have been executed because of the explicit NULL being passed in.

By same token, Access does not make any checks whether any fields are actually required, so if we did not fill in CountryCode field, a required field, Access will go ahead and submit this:

CODE
INSERT INTO  `city`  (`Name`,`District`) VALUES ('Test City','Test District')

This will fail on the server's side and server will return an error to Access. If the case occurs when using forms, it can be trapped via form's OnError event. However, as KB article explains, the detailed error description isn't available via Access. If your RDBMS server allows, you may be able to query for the previous errors but not all RDBMS server has this facility.

In case where we use copy & paste to paste in multiple records via forms or datasheets, Access will process each insertion separately. So if we copied a group of 2 records and pasted it back in:

CODE
INSERT INTO  `city`  (`Name`,`CountryCode`,`District`,`population`) VALUES ('Test City','ZWE','Test District',1)
<additional SELECT statements>
INSERT INTO  `city`  (`Name`,`CountryCode`,`District`,`population`) VALUES ('Test City','ZWE','Test District',1)

So though we did only one paste operation, it counts as single insert for each record we pasted.

Volatile Primary Key

The previous section focused entirely on the resulting SQL statement generated by insert operations. In case of forms and datasheets, since we're interacting with data, it's a necessity for Access to re-select the newly inserted row so it can validate that the insert was successful and paint the values on users' screen. At that point, we must then consider the problem of obtaining the primary key. Typically, several RDBMS may have something similar to Access' AutoNumber, but we can break it up into two major classifications: auto-incrementing and sequencing. We are probably already familiar with auto-incrementing, as that is what Access' AutoNumber does in a nutshell. SQL Server and MySQL uses this. With Sequencing, which is used by Oracle and PostgreSQL, there is a separate database object where its sole function is to generate a number and typically requires a call to functions such as nextval() to obtain the next value. Note that depending on implementations, the call may actually made implicitly so even if you never have called it, the RDBMS may do it for you just as it generate a number with autoincrement. One difference between sequences and auto-increment is that sequence can be shared among more than one tables. Nonetheless, in both cases, the number to identify a new record is generated server-side. Access will not have that information until the generated INSERT has been submitted. How then would Access know what was the new ID? Thus lies the crux of problem.

There is no single universal method to retrieve the last inserted identity. Thus with any other ODBC backend, Access will make 2 attempts to retrieve the newly inserted row.

CODE
INSERT INTO  `city`  (`Name`,`CountryCode`,`District`,`population`) VALUES ('Test City','ZWE','Test District',1)
SELECT `ID`,`Name`,`CountryCode`,`District`,`population`  FROM `city`  WHERE `ID` IS NULL
SELECT `city`.`ID` FROM `city` WHERE `Name` = 'Test City' AND `CountryCode` = 'ZWE' AND `District` = 'Test District' AND `population` = 1
SELECT `ID`,`Name`,`CountryCode`,`District`,`population`  FROM `city`  WHERE `ID` = 4099

The first attempt is to select the row based on the value of the primary key at the submission. Because we're using auto-incrementing primary key, the value on Access's side is null. However, Access will not assume that null values aren't permitted -- the local index is merely unique, not an actual primary key. Hence, it submits the seemingly frivolous "ID IS NULL" selection. Because that will return zero records, Access re-tries, this time doing a comparison of each fields. Note that this actually does not guarantee that it'll return only one row. After all, this is perfectly legal to have two rows that has same values for all columns but auto-increment primary key. Access will then evaluate the resultset, eliminating any returned key values that it already has in its local index. If there is only one key left, that is the key that was most recently inserted. However, if it has zero or more than one matches after eliminating any matches with local index, then Access is unable to make an accurate determination and in this case will display the row as #Deleted. There is no way for Access to distinguish between a genuine deletion and an ambiguous identity re-select.

When Access is working with SQL Server tables that has an IDENTITY column, it will make an additional step of calling:

CODE
SELECT @@identity;

immediately after the INSERT statement and before it performs the beforementioned reselect attempts. Thus users will observe less incidences of #Deleted when working with SQL Server tables. Note that, however, when we work with SQL Server views, there is no identified IDENTITY though it may be a part of the view. It simply is not represented as an IDENTITY and appears as a plain integer. In this case, @@identity will not be called and thus SQL Server views has same risk of returning #Deleted as other ODBC sources. At the time of writing, PostgreSQL ODBC Driver is the only other known ODBC driver that provides a "SQL Server emulation mode" and thus cajole Access to generate that special @@identity call which is internally translated by PostgreSQL ODBC Driver to a currval() call.

Known #Deleted Causes

UPDATE

Row Concurrency Check

DELETE

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 7,878 times.  This page was last modified 18:18, 17 April 2011 by Jack Leach. Contributions by BananaRepublic  Disclaimers