UtterAccess.com


> Can I Whinge Here?    
 
   
haresfur
post Feb 11 2018, 10:03 PM
Post#1



Posts: 277
Joined: 4-April 12
From: Bendigo, Australia


Is this an appropriate place to vent about someone sending out periodic data updates as .csv files where:

  • The field names are not all valid because they contain characters that are not allowed
  • The second row is dashes for each field, like "----"
  • Null fields usually, but not always, contain "NULL"
  • Some text fields are delimited with quotes but some are not
  • The second-to-last row is blank
  • The last row is something like "(445135 rows affected)"
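For what it's worth, a dump with that shape can be tidied up in a few lines before import. Here is a minimal sketch in Python; the function name is made up, and the exact footer and separator text are assumptions based on the list above:

```python
import csv
import io

def read_sqlcmd_dump(text):
    """Parse a 'Results to Text' style dump: a header row, a row of
    dashes under it, data rows, then a blank line and a
    '(N rows affected)' footer."""
    lines = text.splitlines()
    # Drop the trailing "(N rows affected)" footer and the blank line above it.
    while lines and (not lines[-1].strip() or lines[-1].lstrip().startswith("(")):
        lines.pop()
    reader = csv.reader(io.StringIO("\n".join(lines)))
    header = next(reader)
    rows = []
    for row in reader:
        # Skip the "----" separator row directly under the header.
        if all(cell and set(cell) <= {"-"} for cell in row):
            continue
        # Literal "NULL" becomes an actual null; csv.reader already
        # copes with the mix of quoted and unquoted text fields.
        rows.append([None if cell == "NULL" else cell for cell in row])
    return header, rows
```

The csv module handles the inconsistent quoting for free, which is one less thing to fix by hand.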


--------------------
-- Evan
 
WildBird
post Feb 11 2018, 11:14 PM
Post#2


UtterAccess VIP
Posts: 3,362
Joined: 19-August 03
From: Perth, Australia


That's a report, not a CSV. What system are the files coming from?

--------------------
Beer, nature's brain-defragging tool.
 
haresfur
post Feb 12 2018, 12:03 AM
Post#3



Posts: 277
Joined: 4-April 12
From: Bendigo, Australia


Lord only knows. I'm pretty sure it is generated from a govt agency's database by a subcontractor.

They have changed a few things recently and messed up at least one table. But yes, they supply it as a .csv file, however they got to that.

--------------------
-- Evan
 
JonSmith
post Feb 12 2018, 05:46 AM
Post#4



Posts: 3,545
Joined: 19-October 10



Consider yourself lucky that's all you need to deal with.

In the organisation I work for, they wanted to create a standard format for files received from external sources and processed by our SSIS packages. XML was chosen (smart choice), but at the time we had lazy developers on one of the main systems who didn't want to re-write their import routine.
They demanded an XML file with line indents and formatting, with the CSV data simply dumped into a single node as CDATA[] with line breaks at the top and bottom. This meant the lazy developer could use the exact same import scripts as before and just tell them to skip the top 5 rows and the bottom 3. It creates a pretty much unusable file in terms of it being XML, since it puts an entire table of data inside a single field. To make matters worse, some of the embedded CSVs are 'fixed width', only they aren't: they start fixed width, but when numeric values come in they mess it all up.
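If you ever have to crack one of those files open programmatically, the CDATA payload comes back as plain text from a standard XML parser, so the embedded CSV can be pulled out in one go. A rough sketch (the element names here are invented, since the real schema isn't shown):

```python
import xml.etree.ElementTree as ET

# A toy stand-in for the real file: a whole CSV table stuffed into
# one CDATA node, with the exporter's line breaks at top and bottom.
xml_doc = """<Batch>
  <Payload><![CDATA[
id,name,amount
1,alpha,10
2,beta,20
]]></Payload>
</Batch>"""

# ElementTree returns CDATA content as ordinary element text, so the
# "table in a single field" comes back as one big string.
payload = ET.fromstring(xml_doc).findtext("Payload")
csv_lines = payload.strip().splitlines()
```

From there the lines can be fed straight into a CSV parser, skipping the XML layer entirely, which is more or less what the lazy import routine was doing anyway.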

See attached a screenshot with the data obscured (the data is already mocked up for testing, so sensitivity is low as all the values are garbage, but best to keep any codes or data sources private).
Attached File(s)
Attached File  XML_Lite.jpg (306.9K) - Number of downloads: 21
 
 
haresfur
post Feb 12 2018, 05:06 PM
Post#5



Posts: 277
Joined: 4-April 12
From: Bendigo, Australia


Ouch

--------------------
-- Evan
 
DougY
post Feb 28 2018, 01:11 PM
Post#6


Utterly Abby-Normal
Posts: 15,031
Joined: 30-June 04
From: Seattle, WA [USA]


Wildbird is right. It's not a CSV file. It sounds more like a "Results to Text" dump from SSMS (or the equivalent in any other database engine), especially based on the "(56702 row(s) affected)" line with the blank line above it.

Do you have a contact person at the source? Maybe they can change to "Results to Grid" and then save the file for you. That will give you a cleaner recordset.


--------------------
- Doug

When I said the program was foolproof, I hadn't realized there were so many fools.
 
zaxbat
post Feb 28 2018, 01:46 PM
Post#7



Posts: 1,040
Joined: 26-January 06
From: .....the wiregrass (either you know or you don't)


Yeah....looks like the screen output of an executed query piped into a txt file. But I see the problem....they don't want you to process the whole table...they only want you to process the new updates, and this was their fastest way to provide them to you....hmmmmm. In a way, they are doing you a big favor by not asking you to go through the whole table and only process the records that are different from the last run. That's not fun either, and no guarantees that the nulls and quotes problems wouldn't be there too.


Well then.....easy enough to strip off the first two rows and the last two rows, but having to parse through and check each field to fix null vs. non-null and quotes vs. none will really slow this thing down. Simple, but slow. That's a pretty big table.
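If it ever does come down to finding the changed records yourself, a whole-row set difference is a cheap way to do it. A rough sketch, assuming both extracts have already been parsed into lists of field values:

```python
def changed_rows(old_rows, new_rows):
    """Return the rows present in the new extract but not in the old
    one: appends plus any records whose values changed. Rows are
    hashed as tuples, so membership checks stay fast even for a
    pretty big table."""
    old = set(map(tuple, old_rows))
    return [row for row in new_rows if tuple(row) not in old]
```

Note this won't tell you whether a new row is a genuine append or a correction to an existing record; for that you'd need to compare on a key field like the bore ID.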
This post has been edited by zaxbat: Feb 28 2018, 02:37 PM

--------------------
Kindest regards, and Cheers!
ZAX

A picture is worth a thousand words and a zipped DB is worth a thousand pictures.
Oh, and....please don't disappear into the Twilight Zone.... Holler back with your results!
 
haresfur
post Mar 1 2018, 07:31 PM
Post#8



Posts: 277
Joined: 4-April 12
From: Bendigo, Australia


zaxbat,

I believe they are returning the entire table, not just new updates - which will generally only be appends. However, I have no guarantee that there are no changes included. I have my own table where I can flag data points that are off-trend. Unfortunately, if they change a value (say, by correcting a data-entry error), I can't be sure that my flag still applies.

What I ended up doing, with help from UA, is to link the .csv files to Access and set the correct data types. When I replace the .csv files with more recent ones, they stay linked correctly. Then I empty out my Access tables and append all the records back in, blasting the header/footer records and turning "NULL" into actual null values, e.g.

CODE
INSERT INTO tblGW_level_data ( BoreID, [DateTime], METHOD, Condition, QUALITY, Wlmp_mAHD, Dbns_mAHD, RWL_mAHD )
SELECT
    GW_level_data.BORE_ID,
    CDate([DATE] & " " & CStr(Int([TIME]/100)) & ":" & Right([TIME],2)) AS [DateTime],
    GW_level_data.METHOD,
    IIf([GW_level_data].[CONDITION]<>"NULL",[GW_level_data].[CONDITION]) AS Condition,
    GW_level_data.QUALITY,
    GW_level_data.[WLMP_(m)],
    GW_level_data.[DBNS_(m)],
    GW_level_data.RWL_mAHD
FROM GW_level_data
WHERE GW_level_data.BORE_ID Is Not Null
    AND GW_level_data.BORE_ID <> "-------"
    AND GW_level_data.BORE_ID Not Like "(*";


DougY,

Thanks for the suggestion on using "Results to Grid". I'll bring it up with them. My ultimate goal is to be able to get the data through a service and not have to keep my own version.

--------------------
-- Evan
 

