UtterAccess.com
> Parsing Contract Files In Word To Export To Salesforce, Any Version    
 
   
MadPiet
post May 2 2019, 03:41 PM
Post#1



Posts: 3,043
Joined: 27-February 09



Someone's asking if I want a job that's essentially (as I understand it) collecting information from a ridiculous number of Word documents (they said 100,000!!!) and somehow transferring that to Salesforce. I researched enough to find out that Loader or whatever in SF can read Excel files, so any thoughts on how to make this less painful? I was thinking that it might be bearable if the non-boilerplate parts of the contract were in Bookmarks, because then I could loop over the bookmarks, grab the names and values, dump that in maybe Access, and then work out how to map each bookmark Name to a column in Salesforce.

Has anybody ever done this before? I found Daniel Pineault's code to extract bookmark data, which is fine - I'm just a bit at a loss if they don't use bookmarks. Even if they do, what's the sanest way of harvesting this kind of information? Something like this:

1. In VBA, open the Word doc,
2. Loop over the bookmarks collection, and write the {File Name, bookmark name, bookmark value} to a table.
3. Query to get the values they want to import to SF, and combine / split in the query.
4. Export to Excel.

(right?)
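
Steps 1 and 2 can also be tried without automating Word at all. What follows is only a minimal sketch, in Python rather than the VBA discussed in this thread, assuming the files are .docx (zipped XML) rather than legacy .doc and that each bookmark wraps the text it names. The function names `extract_bookmarks` and `bookmark_rows` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_bookmarks(docx):
    """Return {bookmark name: enclosed text} for one .docx (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    open_ids = {}   # bookmark id -> name, for bookmarks currently "open"
    values = {}     # bookmark name -> text collected so far
    for el in root.iter():                          # walks elements in document order
        if el.tag == W + "bookmarkStart":
            name = el.get(W + "name")
            if name and not name.startswith("_"):   # skip hidden ones such as _GoBack
                open_ids[el.get(W + "id")] = name
                values[name] = ""
        elif el.tag == W + "bookmarkEnd":
            open_ids.pop(el.get(W + "id"), None)
        elif el.tag == W + "t" and el.text:
            # A text run belongs to every bookmark that is open at this point
            for name in open_ids.values():
                values[name] += el.text
    return values


def bookmark_rows(file_name, docx):
    """Rows shaped like step 2: (file name, bookmark name, bookmark value)."""
    return [(file_name, name, value)
            for name, value in extract_bookmarks(docx).items()]
```

The rows from `bookmark_rows` are the {File Name, bookmark name, bookmark value} triples from step 2 and could be appended to a table as-is. Only the main document body is read here; text in headers, footers, and text boxes lives in other parts of the file.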

Has anybody ever done anything like this with Salesforce? Any suggestions on where to start?
Sorry, don't know anything more about the Word files. Haven't spoken to the company that's asking for this stuff yet.

Thanks for any pointers!
Pieter
 
DanielPineault
post May 5 2019, 08:01 AM
Post#2


UtterAccess VIP
Posts: 6,573
Joined: 30-June 11



Your general approach makes perfect sense, assuming the documents use bookmarks. I wouldn't get ahead of myself, though. Wait to see how the documents are actually structured.

  • Do they use Bookmarks?
  • Do they use the newer Content Controls?
  • Are they just plain text?
  • Is the structure standardized?
  • ...

--------------------
Daniel Pineault (2010-2018 Microsoft MVP)
Professional Help: http://www.cardaconsultants.com
Free MS Access Code, Tips, Tricks and Samples: http://www.devhut.net

* Design should never say "Look at me". It should always say "Look at this". -- David Craib
* A user interface is like a joke, if you have to explain it, it's not that good! -- Martin LeBlanc


All code samples, demonstration databases, links,... are provided 'AS IS' and are to be used at your own risk! Take the necessary steps to check, validate ...(you are responsible for your choices and actions)
 
MadPiet
post May 5 2019, 12:23 PM
Post#3



Posts: 3,043
Joined: 27-February 09



Thanks Daniel. I'll ask them if there's something in the documents I can use to harvest the information. A hundred thousand of them means there's just no way that it makes sense to do it completely manually. Is there a tutorial or something somewhere on reading the contents of Content Controls? I just need a quick overview and maybe a sample document or two.

Time to go say hi to Google...
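
As a quick overview of reading Content Controls: inside a .docx each one is stored as a `<w:sdt>` element whose title and tag sit in `<w:alias>` and `<w:tag>`, with the visible text under `<w:sdtContent>`. In Word's own object model the corresponding properties are `ContentControl.Title`, `.Tag` and `.Range.Text`. Below is a minimal sketch in Python (not VBA), assuming .docx files; the function name `extract_content_controls` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_content_controls(docx):
    """Return a list of (title, tag, text), one per content control in a .docx."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    def prop(sdt_pr, local_name):
        # Title lives in <w:alias w:val="...">, tag in <w:tag w:val="...">
        node = sdt_pr.find(W + local_name) if sdt_pr is not None else None
        return node.get(W + "val") if node is not None else None

    controls = []
    for sdt in root.iter(W + "sdt"):                # each <w:sdt> is one content control
        sdt_pr = sdt.find(W + "sdtPr")
        content = sdt.find(W + "sdtContent")
        text = ""
        if content is not None:
            text = "".join(t.text or "" for t in content.iter(W + "t"))
        controls.append((prop(sdt_pr, "alias"), prop(sdt_pr, "tag"), text))
    return controls
```

Untitled controls come back with `None` for title and tag, and a control nested inside another one contributes its text to both entries, so real documents would likely need some clean-up on top of this.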
 
AlbertKallal
post May 5 2019, 05:06 PM
Post#4


UtterAccess VIP
Posts: 2,810
Joined: 12-April 07
From: Edmonton, Alberta Canada


Well, just obtaining a sample of 5 documents would tell you what format those documents are in now, in a matter of minutes.

But this begs the question: how and where did they make all those documents?

If they have that many documents, then surely some process must be in place now that created the documents from some data source.

Some means and system is in place to find and retrieve a document. No one just has 100,000 documents in some folder. There HAS to be a set of procedures in place that created these documents and, MORE important, some existing means to find and pull a document.

So, one would be wise to figure out how the documents are being created since, you can bet your bottom dollar, some data and information system is in place that helped create all those documents in the first place.

Those documents did not appear out of thin air.

I would consider going and pulling from the actual database and system that created the documents.

You may well be able to avoid looking at the documents. You simply go to the data source and database system that was used to create the documents, and transfer that data to the SF system.

It would be rather surprising if 100,000 documents were created by hand and were not part of some database and existing information system(s) that created these documents. These documents are part of WELL established information systems and workflows that are in place now.

So, perhaps you don't need to load or look at all these documents. You go to the data source that was used to create them and transfer that information to SF.

Regards,
Albert D. Kallal (Access MVP 2003-2017)
Edmonton, Alberta Canada

 
MadPiet
post May 5 2019, 05:32 PM
Post#5



Posts: 3,043
Joined: 27-February 09



Okay. I guess I'm going to have a lot of questions in the interview. =) I can't imagine creating 100K documents manually, so they must be generated by code. Makes sense to figure out where the values are being generated from. Only way to find out is to ask, I suppose. Or see their documents. =)

Thanks, Albert.
 
MadPiet
post May 5 2019, 07:42 PM
Post#6



Posts: 3,043
Joined: 27-February 09



Well well well.... if I look hard enough, even I can find it...
https://developer.salesforce.com/docs/atlas...actLineItem.htm

(basically shows the Contract objects and their properties... so at least now I know what kinds of things I'd have to get out of the Word docs...)
 
MadPiet
post May 13 2019, 08:11 PM
Post#7



Posts: 3,043
Joined: 27-February 09



Phew! Lucky me! Someone else got elected to do the job! So no longer my problem. =)
 
DanielPineault
post May 13 2019, 08:20 PM
Post#8


UtterAccess VIP
Posts: 6,573
Joined: 30-June 11



LOL!

 
MadPiet
post May 13 2019, 08:54 PM
Post#9



Posts: 3,043
Joined: 27-February 09



I sat down and figured if there are 100,000 documents, and like 8 people working on them.. If I could fully automate the process, that's 12,500 docs per computer at say 1 minute apiece?
That's still over 8 days of round-the-clock running - even if it required zero human intervention.
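
The estimate is easy to re-run with other numbers. A small sketch in Python, assuming the work splits evenly and every machine runs around the clock; the function name `unattended_days` is made up for illustration.

```python
def unattended_days(total_docs, machines, minutes_per_doc):
    """Docs per machine, and days of nonstop running for each machine."""
    docs_per_machine = total_docs / machines
    minutes = docs_per_machine * minutes_per_doc
    return docs_per_machine, minutes / 60 / 24


docs_each, days = unattended_days(total_docs=100_000, machines=8, minutes_per_doc=1)
print(f"{docs_each:,.0f} docs per machine, about {days:.1f} days each")
# prints: 12,500 docs per machine, about 8.7 days each
```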

Glad I didn't get the job. Not worth the $15 or so an hour they were offering. I'd go crazy after about two hours. I had that experience the last time I worked in healthcare and they told me at the interview that "they were doing basic frequency counts and having trouble with their databases".

So I think I dodged a bullet.
 
WildBird
post May 13 2019, 09:39 PM
Post#10


UtterAccess VIP
Posts: 3,534
Joined: 19-August 03
From: Auckland, Little Australia


Sorry, did you say $15 per hour?

--------------------
Beer, nature's brain defragging tool.
 
MadPiet
post May 13 2019, 09:49 PM
Post#11



Posts: 3,043
Joined: 27-February 09



Yes, I did. And since that's WAAAAY too much like work for money like that, my initial response was to automate absolutely everything I could and log any part that failed. Then a person could clean up the "messy parts" - you know, because I'm disgustingly lazy and impatient like that. Even if it took maybe 6 minutes per document to read and process, that's an insane amount of manual labor.

100,000 docs X 6 mins apiece = 600,000 minutes = 10,000 hours / 2,000 hours/yr = 5 man years.

So either these documents must be really easy to process (and reading the various bits should be relatively easy) or it's pure insanity. And I had a hard enough time dealing with people who couldn't explain normalization and were creating "databases" once. That one-day job lasted 8 months.
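
The "automate everything and log any part that failed" idea above could look roughly like the loop below. This is only a sketch in Python, not VBA; `process_all` and the `parse_one` callback are hypothetical names, and `parse_one` stands in for whatever actually reads one document.

```python
import logging
from pathlib import Path

logger = logging.getLogger("contract_import")


def process_all(folder, parse_one, pattern="*.docx"):
    """Run parse_one on every matching file; return (results, failures)."""
    results, failures = {}, []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            results[path.name] = parse_one(path)
        except Exception as exc:        # any failure goes on the manual clean-up pile
            logger.warning("could not parse %s: %s", path.name, exc)
            failures.append((path.name, str(exc)))
    return results, failures
```

The `failures` list is the set of "messy parts" a person would then have to work through by hand, so its length after a trial run on a sample gives a first idea of how much manual labor is really left.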
 
WildBird
post May 14 2019, 01:09 AM
Post#12


UtterAccess VIP
Posts: 3,534
Joined: 19-August 03
From: Auckland, Little Australia


I used to get paid in fish, and I don't even eat seafood, yet I wouldn't consider doing that job for $15 per hour :-)

I think you dodged a bullet. Also agree with Albert, that sounds like it must have come from a system somehow; hard to imagine 100,000 documents in a folder. Possible though, I had to write and print about 30,000 letters at one job. But that was data from a system, out to print, so you wouldn't read from the documents, but rather the system of course. By the sounds of it, they don't really know much about the systems they have. I would guess a business unit who perhaps can't get the IT dept to do anything for them? I have worked for many orgs like that.

 
MadPiet
post May 14 2019, 02:26 AM
Post#13



Posts: 3,043
Joined: 27-February 09



It's a healthcare company. So the whole thing boggles my mind. I think what happened is that the original files were created by a merge of some kind, but then I think people modified them. But like I said, not my problem anymore. Just seems completely insane to let it get that far out of hand... (Or maybe it was from a company that they bought and they want the data?)

Only sane way I could think is if they used Controls or whatever and I could map those to fields in a table. Would have to be somewhat backward, because I'd have to do something like

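Dim ctl As ContentControl
For Each ctl In ActiveDocument.ContentControls
    ' rs is assumed to be an open recordset already in AddNew/Edit mode.
    ' Assumes a field named like the control's Title exists in my table (check first!)
    rs.Fields(ctl.Title) = ctl.Range.Text
Next ctl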
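
The "see if there's a matching field" check can be made explicit. Below is a minimal sketch in Python with SQLite standing in for the Access table, assuming the control names and values have already been read into a dict and that the table has a `FileName` column. The function name `insert_matching` and the table layout are hypothetical.

```python
import sqlite3


def insert_matching(conn, table, file_name, controls):
    """Insert one row, keeping only controls whose name matches a column in `table`.

    Returns the control names that had no matching column, so they can be reviewed.
    """
    # Read the column names from the table itself; `table` must be a trusted name.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = {d[0] for d in cur.description}

    matched = {k: v for k, v in controls.items()
               if k in columns and k != "FileName"}
    unmatched = sorted(set(controls) - columns)

    names = ["FileName"] + list(matched)
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [file_name] + list(matched.values()),
    )
    return unmatched
```

Returning the unmatched names rather than silently dropping them makes it easier to spot controls that need a new column or a renamed one.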

This post has been edited by MadPiet: May 14 2019, 02:27 AM