UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> To Index A Set Of Unordered Records, Access 2007    
 
   
Kaarlo Tuomi
post Jan 22 2020, 05:59 AM
Post#1



Posts: 62
Joined: 9-November 18



I have been playing with the database analysis tools in ACCESS 2007, and in its analysis of one of my tables the Wizard suggested I add an index to three fields, [sname], [fname], and [birthdate].

I have previously read on this forum that a table is a set of unordered records. this came up in a thread where someone was asking about renumbering their records to fill in gaps in the number sequence. so a poster said that a table is just a set of unordered records so the number itself is irrelevant. all that mattered was that the number was uniquely identifying that particular record. which makes sense.

ok, so if a table is a set of unordered records, why is ACCESS encouraging me to index them. that seems to be a contradiction, an attempt to impose order on a fundamentally unordered set.

in your response, could you also please consider whether there is a threshold, in terms of number of records in the database, at which this might matter. a database with 100 records in it is probably not going to suffer from whatever problem not having your fields indexed causes, but a database with 3 million records probably would (assuming the problem exists). so somewhere between 100 records and 3 million records is a threshold where it starts to matter, I guess.


thank you,


Kaarlo Tuomi
Go to the top of the page
 
GroverParkGeorge
post Jan 22 2020, 07:32 AM
Post#2


UA Admin
Posts: 36,754
Joined: 20-June 02
From: Newcastle, WA


Perhaps you'll get some insight from learning more about indexes and how they work. Link

As a matter of fact, there is a very good reason to use indexes in conjunction with unordered sets of records. Indexes give you a way to find specific records within that big ol' bucket of records much more quickly!

While numbers of records does influence the importance of indexes, there are other factors. For example, if your Relational Database Application will primarily be used for entering transactions, then it's possible that more indexes might actually slow performance because records must be indexed as they are added or modified. Creating records and updating indexes for those new records takes additional time to process. If, on the other hand, it's primarily used for analysis and reporting, then having appropriate indexes can speed processing significantly.

In short, like so many things we encounter in the realm of Relational Database Application design and development, one size does NOT fit all. You will find the optimal solution for YOUR environment by trying it out.

--------------------
My Real Name Is George. Grover Park Consulting is where I did business for 20 years.
How to Ask a Good Question
Beginning SQL Server
Go to the top of the page
 
AlbertKallal
post Jan 23 2020, 01:28 AM
Post#3


UtterAccess VIP
Posts: 2,954
Joined: 12-April 07
From: Edmonton, Alberta Canada


Indexes "allow" you to get data in some order, but they don't in fact re-order the data in the tables!

So there is an issue of semantics here. The tables are still un-ordered buckets of data, but an index allows you to get/grab/retrieve that un-order bucket of data in some order you want. You still not re-organized the bucket of data - but are simply using an index to get that order you want. In other words, because you need + want some order, then you use an index, and that achieves the same goal.

And, if you don’t have an index, then the access data engine will WORK hard, and sort the data for you! – That sort is expensive in terms of computing power.

Now while the above sounds a bit "semantic" here, it actually is an important concept.

The reason why is that for a VERY long time, in data processing, order of data was important, assumed, and was a "big" part of the data processing scene.

In other words:
How you wrote code assumed that input data had SOME order.

You as a general rule had to write code, develop code that assumed order in the data. So for a "considerable" amount of time, order of data was assumed, and you could when processing data ASSUME order.

And if you pulled data, then EVERY single time, the reading of data would ALWAYS no matter what occur in the same order!

Much of the reason for this assumption was this thing: A punched card!



So, you did not have a "on line" or even a random access file system when writing code.

And to store say 100,000 subscribers to Readers digest, or say what the insurance policies are for your customers?

Well, you piled them up on pallets - often in the basement!!!

You get this:



So, some companies’ actually would use fork lifts for their data processing department!

Eventually, then magnetic tape came out, so we went from punched cards to this:



And like always, as customers grew and computer useage?

Then you wound up with this result:



But, again, you can ONLY read the data from the start. You can’t jump in the middle.

I mention the above, since a considerable amount of computing occurred in a time when order of data was fixed, but ALSO assumed by developers.

Even in the early days of desktop computers, often data processing was reading the start of a file, and then reading row by row. dBase and early FoxPro programs were such examples.

So, if I tell you “all of a sudden” that your data is NOT to be considered in order? While you may not give this much thought, a WHOLE generation of developers will say, what are you taking about? That’s like saying a car is not to have a motor anymore!

With the advent of low cost hard drive storage, then the ability to random access data started to occur. And this was REALLY great for multi-user systems, since multiple users could use the same data. And more than one user would read and write data!

However, if I can add names into a database, and so can someone else, then I can’t really assume that the last 3 records added were mine, since in-between that time, someone else might have added some records also!

As a result, the concept of no order had to be adopted and TOLD to developers. In other words, because systems are now random access, and also multi-user, then some order of data (such as invoice number) etc. cannot be assumed anymore.

And it is VERY significant that this message be conveyed to developers, as they now have to get rid of 50+ years of habits from writing software that assumed order of data.

This new “assuming” of no order of data in fact forced developers to THINK a different way. No longer would developers say to the system, please get me some customer file (that WAS in some order).

Now, we use a query language, and say get me some data, and please order the data you PULL from that big vast bucket, but that bucket don’t have order anymore.

And, this means if you write out say 5 records to a table, and then a day later read back the records, they just might not be in the same order anymore! (so, you better tell the system to get those records, and “apply” some order to invoice number, or ID or whatever. So you “apply” order to data with SQL now, but you do not store the data in order.


So, this concept of “order” of data sounds very simple, but the whole idea of “very harsh” stating that data is no longer to be considering in some order is important.

Armed with the above:
You not assume data from a system is in some order.
You not write code like we did in that past that COULD assume order. (so reading invoices or customers from punched cards, or a tape machine would ALWAYS come back in the same order it was stored. And often if new names were added, then a new punched card was created). They would then put the new cards into a big huge mechanical machine called a punched card sorter:

(and they are the most fun to watch!!!)

https://www.youtube.com/watch?v=AyO3n20SpoI
(above is a 15 second video of a card sorter running).

Now, I never really worked with magnetic tapes or punched cards, but I DID know how to merge the two tapes into one tape, and NOT have to ever store the data in the computers system – a cool trick this is, and one known by old timers, but I was OH just so lucky to have read about how to do this!! (and did this in a computer room to help some developers that did not know how to do this!!). The system had a great database, but it also had the ability to read computer tapes - and we needed that data!

Anyway, the un-ordered data concept is “so” common today, we actually have forgotten why we have to state or tell people that data is not ordered anymore!


Today, we just tell SQL to go get the data in some order we want and that is the end of the discussion.

So, today this is a VERY minor issue, but at one time the whole data processing industry was actually built around this idea of order in data.

However, Access being a whopping 28 years old?

If you poke around, you see some old “history” commands or legacy concepts STILL existing in Access! We don’t use such commands anymore, but they are like cobwebs from the past history of our industry.

For example, seek() is such a example and command. The original concept of seek() would usually re-wind the tape drive, and then start at the beginning and read for X number of records.

Today, in place of seek, we use some form of SQL command. Even the “.findFirst” command allows use of SQL “where” syntax.

Seek() in fact does not use SQL. And there also the ability to open a binary file from the hard disk in Access as “random” access.

So that advice about un-ordered data has a lot of history in our industry.

The whole simple answer here is that we in “general” now do not assume any order of data. If you want some order of data, then you tell the database engine to get the data, and put in some order you want or need.

And if the database engine has use of a high-speed index to order that data, then great. (but you not writing out the data to the database in order).

How high-speed indexing works is a subject on to its own, and medical sized textbooks are written about this subject. Access uses what are called balanced binary trees and linked lists to achieve this magical feat. It allows you to add a new name to a table with 1 million rows, but NOT have to push down the table to make a hole or “spot” for this new name in the middle of the list in the correct order.

So, you can always get the list of 1 million rows in order and do so without having to sort the data anymore!!

I hope you found this computer history lesson interesting!

Indexing should not be overused, as it does cost some record writing time. However, if the table grows beyond say 2000 rows, and you often need to grab data in some order, or pull out one record say by invoice, then I would feel safe recommending that you index that column. For rare hardly used columns, then no index is required.

So:
You don’t have to worry about this.
And a index does not actually re-order the big bucket of data, but it certainly lets you grab and get that data in order!


Regards,
Albert D. Kallal (Access MVP 2003-2017)
Edmonton, Alberta Canada
Go to the top of the page
 
Kaarlo Tuomi
post Jan 25 2020, 09:20 AM
Post#4



Posts: 62
Joined: 9-November 18



Albert, thank you very much for taking the time to write all of this. I did read it, and found it very informative, and illuminating, and relevant to the question I asked. in fact the reason I was originally confused was precisely because of your opening point, I had not realised that an index was NOT simply a way of ordering the records in a table, it was a separate object altogether.

QUOTE
I hope you found this computer history lesson interesting!


I did. Very interesting, thank you. I actually still use a very old fashioned database program called VPInfo that not many people have heard of but which I still find very useful.

thank you very much,


Kaarlo Tuomi
Go to the top of the page
 


Custom Search


RSSSearch   Top   Lo-Fi    19th February 2020 - 04:37 AM