My Assistant
![]() ![]() |
|
|
Jul 2 2010, 01:21 PM
Post
#1
|
|
|
UtterAccess Enthusiast Posts: 59 |
I was reading Much ADO about Text Files: http://msdn.microsoft.com/en-us/library/ms974559.aspx
I am trying to make a somewhat universal UI for taking data files (Access, Excel, Fixed Width Text, and Delimited Text) and sending them to MS SQL server in a string. I've already built something but it has file size limitations as it loads the data into a single string. I need to find a way to split the files into chunks so that I can process large files (I was given a 750MB text file as the worst case scenario example). I also think the chunking will add some benefits of being able to report progress to the user, especially with large files. It might even open the door to allowing a pause/resume for really big files! Here is what I'm doing now: Text: open the file using 'Open filename for Input' and then using strOutput = Input$(LOF(FileNum), FileNum) to send the entire file into a string - as/is. As I mentioned - this works pretty well, until I get files that exceed the limitations of a string. This process has served me well for the past year, but it won't work for the new process where there can be significantly larger files. Access/Excel - open an ADO connection and use If Not rs.EOF Then strOutput = rs.GetString(, , strColumnDelimiter, strRowDelimiter) After reading much ADO about text files, it seems to me that perhaps opening the text files in ADO would be my best method - because then Access, Excel, and Text will use the same source (ADO recordset) and I can build a process that splits the recordset into chunks and sends them up to SQL. My problem: First of all, if I understood properly, you have to have a registry entry or an INI file to send these text files to a recordset? Secondly, with Excel and Access it's alright to convert the data into a delimited string, but with Text I want to keep each row intact, exactly as it was received on the input file and send it to SQL where we can handle the parsing and centrally store the file layouts. Is there a way I could build some kind of universal INI file that would keep the rows intact? Perhaps a varchar(MAX) column. Maybe treat it as a delimited file and use some kind of random character that would never appear on a file - a foreign sybmbol, null, or something else? This is just a thought? I just need the user to be able to select ANY text file and send it up to SQL. SQL will piece the chunks back together, and let the user decide how to parse it. We want to keep the source files intact on a somewhat universal table we're building. If anyone thinks there is a better method than ADO, let me know! I had been happy with the basic file i/o stuff since I do want to keep it intact. The benefit of using ADO is that it's the same methodology I use for Access/Excel. However, if it's too hard to keep the rows exactly as they were sent to me, maybe I need to just use the Open file option and use Line Input. |
|
|
|
Jul 3 2010, 08:42 AM
Post
#2
|
|
|
UtterAccess VIP Posts: 569 From: North Augusta, SC |
My problem: First of all, if I understood properly, you have to have a registry entry or an INI file to send these text files to a recordset? Only if you wish. You can also use a Schema.ini file which basically tells ADO the format to expect when dealing with your data. QUOTE Secondly, with Excel and Access it's alright to convert the data into a delimited string, but with Text I want to keep each row intact, exactly as it was received on the input file and send it to SQL where we can handle the parsing and centrally store the file layouts. Is there a way I could build some kind of universal INI file that would keep the rows intact? Assuming your data is rigidly formatted, you should be able to build a Schema.ini file that would parse it correctly. For example, if every row is delimited by the Pipe character ( | ), then you can specify this in your Schema.ini file. QUOTE If anyone thinks there is a better method than ADO, let me know! I had been happy with the basic file i/o stuff since I do want to keep it intact. The benefit of using ADO is that it's the same methodology I use for Access/Excel. However, if it's too hard to keep the rows exactly as they were sent to me, maybe I need to just use the Open file option and use Line Input. In many cases, simply importing the file using standard Access methods is sufficient for this, plus it gets your data into a table where you can work with it more easily. It's hard to say whether this is the "best" way, since each project has their own requirements. The only advice I can give is to try each way with different sources of Text files and see which works best for you. |
|
|
|
Jul 6 2010, 12:13 PM
Post
#3
|
|
|
UtterAccess Enthusiast Posts: 59 |
Scott,
I appreciate your time in replying. You mentioned using standard Access methods. I do not and cannot have direct table access. We also don't want any local tables in the access database file that we distribute to the users. We're 99% likely to keep it in an access data project. I also already have plenty of stored procedures written that will take the string and parse it into it's proprietary format. So the one goal that is not really up for debate is that SQL gets the whole file in a string and it handles all of the parsing from there. I'd like to adapt that process so that rather than sending the whole string I can send it in small chunks. So my goal is mainly to find the best way to build those chunks. |
|
|
|
Jul 7 2010, 04:54 AM
Post
#4
|
|
|
UtterAccess VIP Posts: 569 From: North Augusta, SC |
Do you have control over how the Text file is originally built? If so, then you can use a special character, as you suggested in your original post, to delimit a "record". I've used a combination like [|], which would never occur in "real life" to do this, then build the Schema.ini file to tell ADO how to parse the file. From there, you could send an individual record to your SP.
|
|
|
|
Jul 7 2010, 03:42 PM
Post
#5
|
|
|
UtterAccess Enthusiast Posts: 59 |
No control at all. That is the reason for the project, and why we're trying to leverage the SQL server a little. Basically the goal is to get the files loaded to SQL and let SQL handle it from there. Basically we get hundreds of different formats that need to be imported into a system. Sometimes those files come consistently year after year and other times they do not. Our goal is to load the entire file as-is to SQL. Then map the layout in SQL and right a script for any conversions (filters, concatenations, amount adjustments, etc). The goal is basically to create a simple user friendly reformatting engine where we can keep the original data intact and keep track of exactly what manipulation was done to it. Access will basically be a working prototype which most likely will be rewritten one day by our developers in C# and Silverlight. As much as possible we try to leverage the SQL server and make Access nothing more than a temporary UI. Needless to say, SQL has a lot more power to work with these files. It can work with them much faster than access and in a centralized controlled environment. I don't want access to parse the fields at all. I really just want it to send a string to the SQL server in chunks. For text files I'd like these strings to be the exact same as what the input file contained. For Excel and Access Files I'm creating a recordset and then converting it into a delimited string since it's as close as possible to the original format.
If I create a recordset from the text files what I'd really like to do is load every row into one row/column on the recordset. So using the strange character would just be to trick it into thinking every file - whether it's fixed width or delimited - is a delimited file with a single column. My goal is to break the text files into chunks so that I can send them in pieces in order to be able to track progress and also so the user can cancel if they need to. I also need to do this in order to support large files and get past the maximum string length. I just want to send these segments to SQL and let SQL sort it all out. I already have the process working, but i'm reading the whole file at once. The only reason I'm thinking of using ADO recordset is because it's what I'm already using for Access and Excel. So then I really only need to build one universal process that takes an ADO recordset and sends it to SQL in reasonably sized strings. I don't know what the fastest way to do this is. The present way I'm using for both text and ADO creates the string very fast - but I'm reading the whole file at once and not getting segments. One thing i do know is that some files will be too large to write to one string - even if I wanted to. |
|
|
|
Jul 8 2010, 11:22 AM
Post
#6
|
|
|
UtterAccess Enthusiast Posts: 59 |
I gave up on this idea. It seems like way too much hassle. Instead I stopped using 'Open filename for Input' and switched to Scripting.FileSystemObject.ReadLine. I don't need to do any filtering, don't want any parsing, and don't need to go backwards. I really didn't need any of the benefits of ADO. I only wanted to use them for text because i have to use them for Access/Excel. Creating registry entries or INI files in the file's directory is really more work than it's worth. Then it'll also be less efficient to build the string using ADO. GetString is great, but if I want chunks I'll have to take the arrays from GetRows and convert them into a string so I can send it up to SQL. It seems to me that I should only do that when I have to and use more efficient methods with plain text files.
ADO connections with text files are much more of a pain then working with ADO connections to Access and Excel! They'd be great for consistent defined processes, but I want the user to be able to take ANY text file and put it into a string and send to SQL. |
|
|
|
![]() ![]() |
|
Go to Top · Lo-Fi Version | Time is now: 26th May 2013 - 03:21 AM |