UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> Converting (quite) Wide Public Dataset To Normalized Tables, Access 2016    
 
   
dancingwwaves
post Feb 10 2020, 11:21 AM
Post#1



Posts: 175
Joined: 2-November 09
From: USA


Good morning Access gurus!

I have a puzzle for you. I am working on a project that requires using public records from the US Dept of Education's Special Education data: https://www2.ed.gov/programs/osepidea/618-d...iles/index.html

I want to convert multiple years of the "B Child Count and Educational Environment" tables into long format, and be able to continue to do so as new tables are added each year. (I also want to convert the other tables posted on the website, but I think with some starter code I can do that)

Sample table is here: https://www2.ed.gov/programs/osepidea/618-d...ents2017-18.csv.

As you can see, this is a very wide table with multiple categories in the headings. It is actually a combination of several other tables, but I won't get into that.

I'm comfortable with editing VBA/ SQL code, but need some help getting started. I want to take the csv file above and, after importing into a table, create a new table with columns:
Year
StateName
SEAEducationEnviron
SEADisabCat
ColName: {Current column name}
ColValue: {Value} - text field - I'll handle the special characters in later code

I'm pretty sure I can normalize it from there, but I don't know how to do the initial pivoting. Can anyone help me?

Thanks!


--------------------
~Becca
Go to the top of the page
 
MadPiet
post Feb 10 2020, 11:43 AM
Post#2



Posts: 3,477
Joined: 27-February 09



Do it in Excel an clean it up there?

Would be really nice if it were easier/possible to do all the cleaning in PowerBI and then export that back to Excel, but I'm not sure that's possible. I wouldn't even bother trying to unpivot all this in Access. Excel would be much better.
Go to the top of the page
 
projecttoday
post Feb 10 2020, 11:51 AM
Post#3


UtterAccess VIP
Posts: 11,604
Joined: 10-February 04
From: South Charleston, WV


How would you do that in Excel, Mad? Copy and paste? This isn't a one time thing.

The only way I know of is to import the sheet and construct the normalized tables with code.

This has been asked before (but i don't recall the exact answer.) Might try searching on "normalize Excel Access", something like that.

--------------------
Robert Crouser
Go to the top of the page
 
dancingwwaves
post Feb 10 2020, 12:49 PM
Post#4



Posts: 175
Joined: 2-November 09
From: USA


Thanks MadPiet and ProjectToday!

Mad - doing some of the work in Excel is a possibility. I found this weblink that has helped me do it with smaller projects: https://www.listendata.com/2015/02/excel-fo...om-wide-to.html. However, I was hoping I could get some help coding it into Access instead so I don't have to do all of that by hand every time.


ProjectToday - I did find this previous post, but it wasn't particularly helpful:
http://www.UtterAccess.com/forum/reshaping...o-t1946472.html
Is there another one that I missed? I'm hoping for a VBA code sample that I can tweak. I'm awful at looping through recordsets... smile.gif

Thanks!
Becca

--------------------
~Becca
Go to the top of the page
 
projecttoday
post Feb 10 2020, 12:58 PM
Post#5


UtterAccess VIP
Posts: 11,604
Joined: 10-February 04
From: South Charleston, WV


Ha! Look who posted the answer.

Sorry, I don't recall any other thread specifically.

You would be looping through the sheet. (I don't think that's too difficult.)

--------------------
Robert Crouser
Go to the top of the page
 
MadPiet
post Feb 10 2020, 01:19 PM
Post#6



Posts: 3,477
Joined: 27-February 09



If I had my druthers, I would send it through PowerBI, only to clean it, then pump it back out again. Except MSFT doesn't want people using the PowerBI/M GUI to clean their data and then send it back out to another destination. =( Total bummer, because it's so much better and easier to use than the rest of their data cleaning tools.

Oh cool... just discovered something I hadn't seen before (or they sneaked when I wasn't watching). I did all the transforms in PowerBI and then copied the contents of the sheet and pasted it into Excel.

That was actually ridiculously easy. You could even grab the data for each of the years, add the year to the dataset, merge them all, and output one large data file for all the years. Do you need all the different years?
This post has been edited by MadPiet: Feb 10 2020, 01:39 PM
Attached File(s)
Attached File  TransformedData.zip ( 5.1K )Number of downloads: 1
 
Go to the top of the page
 
MadPiet
post Feb 10 2020, 02:38 PM
Post#7



Posts: 3,477
Joined: 27-February 09



I wouldn't do the transforms in Excel at all. I would do them in PowerBI, and then copy them to Excel. I was at a lecture that Reza Rad gave, and he did a demonstration of combining some ridiculous number of tables from a website though M. I keep hoping MSFT will make the PowerQuery part of PowerBI like a separate application, because it's insane powerful, but pretty easy to use. I think I have to use something like Tally table with a function to generate a list of files to download and merge, and then change my code to loop through that and I'd end up with one big file with all the data in it. And since PowerBI basically creates macro-like transforms, I can just rerun it for more data.

As I said in another message on this discussion, the transforms I did took maybe a couple of minutes. (No code libraries hiding up my sleeve...) Oz du Soleil has some pretty insane tutorials online showing how to use PowerQuery… Gil Raviv wrote a whole book on it. I've been mucking with Access and VBA for like 20 years, and M/PowerQuery just absolutely blows the doors of anything you can do with Access in terms of transformations.
Go to the top of the page
 
MadPiet
post Feb 10 2020, 03:22 PM
Post#8



Posts: 3,477
Joined: 27-February 09



That's the whole point of doing it in PowerQuery. It's basically a macro language, so you can re-run an existing macro to get new data, or modify it a little (or use variables) so it can grab the data from (say) all years of "Child Count" on that webpage, and insert them into your dataset. (Still have to figure out how to export from PowerBI/M), but then you could just open the file to kick off the data grab, and then you can just use the data... no more work required. The macro grabs the data from the web page(s), merges it, and cleans it all up doing the transforms.

Oh, found it... here's an article by Reza Rad. He explains how to loop through a collection of files and process them all at once into a single table.
https://radacad.com/be-fitbit-bi-developer-...h-all-csv-files
This post has been edited by MadPiet: Feb 10 2020, 04:18 PM
Go to the top of the page
 
projecttoday
post Feb 10 2020, 04:14 PM
Post#9


UtterAccess VIP
Posts: 11,604
Joined: 10-February 04
From: South Charleston, WV


Destination of this is an Access database. right? Is this programmable? OP wants to put this into production, not just a one-timer. With Access VBA you can to program it to one-click. Is it expensive?

--------------------
Robert Crouser
Go to the top of the page
 
MadPiet
post Feb 10 2020, 04:28 PM
Post#10



Posts: 3,477
Joined: 27-February 09



https://powerbi.microsoft.com/en-us/

I have a copy of PowerBI… runs on my desktop... It's free. Might require Gil Raviv's book to get your head around the more complicated parts... but there may be videos on how to do pretty much anything you can think of by either Reza Rad/Leila Etaati, and/or GuyInACube (Adam Saxton and Patrick LeBlanc). Once you get your head around PowerQuery (even if it's figuring out which buttons to push), it's mad crazy handy. Oh right... the other guy to check out is OzDuSoleil - he's really good at PowerQuery too.
Go to the top of the page
 
dancingwwaves
post Feb 11 2020, 01:31 PM
Post#11



Posts: 175
Joined: 2-November 09
From: USA


Thanks Mad. I appreciate the links and information. I'll look into Microsoft Power BI, however I don't think it is the solution I'm looking for.

Due to security restrictions I can't install new programs on my computer. I do have R and could use that to transform the datasets, but I was really hoping to keep this in Access.

I also need to keep the data column as a text field due to the special characters. In the sample you sent, I think the special characters were converted to nulls, which is a problem I also found with using the Excel method to transform the tables. Power BI is strong enough I'm sure there is a way around it, but I'm really just looking for some starter VBA code.


Thanks for your help and the resources!

Becca

--------------------
~Becca
Go to the top of the page
 
MadPiet
post Feb 11 2020, 01:45 PM
Post#12



Posts: 3,477
Joined: 27-February 09



You can do all the transforms you need to fix this.

Without doing it all, this is what you could do:
1. put all the files in the same workbook.
2. For each file
a. Remove the top 4 rows
b. promote header row to labels or whatever it's called
c. select first like 5 columns (they're all text), and in PowerPivot UNPIVOT OTHER.
3. then you can merge all the files. (appends one to another... just tidier in PowerBI, but absolutely possible in Excel.)

I *think* you can copy/paste that back out to a normal Excel table.
This post has been edited by MadPiet: Feb 11 2020, 02:39 PM
Go to the top of the page
 
MadPiet
post Feb 11 2020, 02:58 PM
Post#13



Posts: 3,477
Joined: 27-February 09



I was going to edit my response, but this site wouldn't let me.

it looks like you'd have to do the
1. remove top 4 (?) rows
2. promote headers
3. Unpivot other columns (except leftmost 5 or so … they're all text)
4. remove the dashes/convert them to NULL/nothing. (F&R?)

Then you can append all these to get one big dataset for all the years.

If you do this in Excel, it uses PowerQuery - so unless I'm mistaken (given that I only use Excel when they force me to, that's entirely possible), you have to do the above steps for each file, and then append/combine them in PowerQuery. Then you can do whatever analysis you want. The weird part is that there are a bunch of overlapping groups in your dataset.

Pieter
Go to the top of the page
 
dancingwwaves
post Feb 11 2020, 03:08 PM
Post#14



Posts: 175
Joined: 2-November 09
From: USA


Hi Mad,

Wow, I think that worked! I didn't know about PowerPivot in Excel, thank you!

--------------------
~Becca
Go to the top of the page
 
MadPiet
post Feb 11 2020, 03:20 PM
Post#15



Posts: 3,477
Joined: 27-February 09



I haven't worked out the merging of all the data files (I assume you're appending them all so you have one dataset, right?)

Yeah, PQ is super awesome once you get your head around it...
Go to the top of the page
 
MadPiet
post Feb 11 2020, 03:27 PM
Post#16



Posts: 3,477
Joined: 27-February 09



If you find yourself doing a lot of this kind of thing, you might want to check out Gil Raviv's book on PowerQuery. The number of things you can do with it once you get your head around how it works is staggering.

https://www.amazon.com/gp/product/150930795...46964494a1e2bd3

Did you do each of the files separately, or did you figure out how to do them all at once? That's the only part I was having trouble with.
This post has been edited by MadPiet: Feb 11 2020, 03:31 PM
Go to the top of the page
 
dancingwwaves
post Feb 11 2020, 03:39 PM
Post#17



Posts: 175
Joined: 2-November 09
From: USA


I was planning on doing all the merging in Access, but if it turns out to make more sense to do it through PowerPivot (PivotQuery?) I'll do it there.

My plan is to get all the old data into an Access database, restructure it to normalize it, and build code to help with restructuring it/ appending it in future years. This is a good start, thank you!

--------------------
~Becca
Go to the top of the page
 
MadPiet
post Feb 11 2020, 03:50 PM
Post#18



Posts: 3,477
Joined: 27-February 09



Oh, okay... then you can stop after the UNPIVOT, I think. Then you'll only have a few columns. Then you could just link to the data table/query in your database and summarize in Excel again. <g>.
Go to the top of the page
 
MadPiet
post Feb 13 2020, 07:20 PM
Post#19



Posts: 3,477
Joined: 27-February 09



Which columns were you going to keep? There are whole bunch that have either no values or are not discrete ages, but age groups. So, depending on what kind of analysis you intend to do, you may not need those records at all and could filter those out.

If I keep only the [age N] values, I can strip that and convert. It's the stuff like this that's the problem "Hispanic/Latino Age 3-5"
Go to the top of the page
 


Custom Search


RSSSearch   Top   Lo-Fi    22nd February 2020 - 09:30 PM