UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Access and the web HTML Object Library    
Access and the web HTML Object Library

Written originally in Spanish and published at the Foro de Access y VBA, converted by original author for contribution to the UA Wiki.

Nowadays, a common task for Access users is to get data from or put data in to web pages.

In the following article, presented are three different approaches for achieving this task using the Access 2010 Native WebBrowser Control, the ActiveX WebBrowser Control and the method .createDocumentFromUrl.

The whole article is based on the Microsoft HTML Object Library, so that library needs to added to your references in the VBA editor.

The article covers the most common HTML elements a user may need to work with, how to dimension them, get their attributes and use them inside Access VBA'.

Finally, there are some real examples in order to illustrate each of the aforementioned approaches.

Contents

Generalities


Dimensioning

CODE

' Document
Dim HTML    As HTMLDocument
Dim tHTML   As New HTMLDocument  ' --> For being used with .createDocumentFromUrl, NOT with the WebBrowser control

' Textboxes
Dim txt    As HTMLInputTextElement  ' OR As HTMLTextElement... it depends, but in general terms both work in a similar fashion.

' Command Buttons
Dim btn    As HTMLButtonElement

' Checkboxes
Dim chk    As HTMLInputElement

' Option groups
Dim opt    As HTMLOptionButtonElement

' Comboboxes
Dim sel    As HTMLSelectElement

' Tables
Dim tbl    As HTMLTable
Dim row    As HTMLTableRow
Dim cell   As HTMLTableCell
Dim col    As HTMLTableCol

' All elements
Dim elc   As HTMLHtmlElement

Late Binding can be used for dimensioning all the elements (As Object).

However, by using early binding we expose the properties related to the specific type of object we are using, enabling the intellisense support.

Setting the WebBrowser or the createDocumentFromUrl

CODE

' Set the Access 2010 Native WebBrowser Control
Set HTML = Me.WebBrowser0.Object.Document

' Set the ActiveX WebBrowser Control
Set HTML = Me.WebBrowser0.Document

' Set the .createDocumentFromUrl
Set HTML = tHTML.createDocumentFromUrl("http://web page address.html", vbNullString)

The decision of using a WebBrowser or the .createDocumentFromUrl method depends basically on whether we are going to dynamically interact with the web page (WebBrowser) or not (.createDocumentFromUrl).

Some Selected Methods

.createDocumentFromUrl With this we create a new HTML document without using a WebBrowser Control!!! The remaining syntax is the same as the one used with the WebBrowser Control.

.getElementByID For looking for an element by its "id". It is the preferred method because the "id" has to be unique, when the "id" attribute exists. However this is not always the case.

.getElementsByClassName For looking at elements by their "class".

.getElementsByName For looking for an element by its "name".

.getElementsByTagName For looking at elements by their "tag".

In order to decide the method we are going to use, it's necessary to get an idea about the objects contained at the webpage we are going to work with.

Checking the HTML elements in a web page

For that, simply right click on the web page (in any browser) and select the "View Page Source" option.

That opens the HTML page source. There we need to explore:

  • Tags, and their properties: "name", "id", "value" and associated events (i.e. "onClick", "onChange", etc.)
  • Command buttons INPUT TYPE=SUBMIT
  • Comboboxes SELECT NAME= and their values OPTION VALUE=
  • Option buttons INPUT TYPE=RADIO
  • Checkboxes INPUT TYPE=CHECKBOX
  • Textboxes INPUT TYPE=TEXT
  • Tables TABLE, their headings TH, their rows TR, etc
Heading 1 Heading 2 Heading 3 Heading 4
Row1 Value1 Value2 Value3
Row2 Value1 Value2 Value3
  • etc.

This way we can locate the desired element and interact with it.

Another commonly used option is looping through the element collection and getting the attributes needed into the immediate window in the VBA editor.


Interacting with a web page using Access VBA


Looping through the element collection for getting element's attributes

For looping through all the Tags of a web page:

CODE

For Each elc In HTML.all
   Debug.Print elc.tagName
Next

For getting all the IDs

CODE

For Each elc In HTML.all
   Debug.Print elc.getAttribute("id")
Next

... and so on.

On the other hand, we can get the attributes of just the type of element we want to work with, by iterating through all the element collection and using the 'tags' for filtering.

For getting the ID of all tables:

CODE

For Each elc In HTML.all.tags("table")
   Debug.Print elc.getAttribute("id")
Next

Finally, while the aforementioned methods work well, we can use the more specific methods .getElementsByTagName, .getElementsByName, and .getElementByID for getting the attributes of an element.

Therefore, the above code can be rewritten the following way, using the method .getElementsByTagName:

CODE

For Each btn In HTML.getElementsByTagName("Input")
   Debug.Print btn.getAttribute("id")
Next

As mentioned before, the "id" attribute of an element is a great tool for interacting with HTML elements when it exists, because it has to be unique.

In such cases where the "id"is unavailable we can use the "name" attribute of the element.

However this attribute does not have to be unique.

Whenever there are 2 or more HTML elements with the same name, we can use the Item number in order to define the element we are interested in.

CODE

HTML.all.Item("An_element").Item(1).Checked = False
HTML.all.Item("An_element").Item(2).Checked = True
HTML.all.Item("An_element").Item(3).Checked = False

The important thing here is that the attributes we get this way will work when we write our VBA code, avoiding the need of using specialized tools for this task.

Located the elements and knowing its attributes ("id", "name", etc), we can start interacting with them

Interacting with the HTML elements

A complete description of the HTML Object Library methods, elements, attributes and properties, would be as long and complex as the object itself.

Therefore, the aim of this short exposition is just to bring some useful introductory tips to the reader for interacting with web pages from Access VBA.

A complete list of the HTML elements can be found at the msdn web page:http://msdn.microsoft.com/en-us/library/ie/hh772721

In the following examples, the methods previously discussed will be used at random for illustration purposes only.

The reader should choose the most appropriate method that fits his/her needs better.

Each example prints in the immediate window the values of the attributes of the element considered.

This approach may be very useful since it allows a fast and accurate inspection of the web page we are interacting with.

For dimensioning the elements please refer to the Dimensioning section.

Textboxes

Attributes:

CODE

For Each txt In HTML.getElementsByName("name of the textbox")
   Debug.Print txt.Value, txt.id, txt.className, txt.sourceIndex, txt.isTextEdit, txt.disabled, txt.outerHTML, txt.readOnly, txt.Type
Next

Filling textboxes in a webpage:

CODE

HTML.getElementsByName("name of the textbox").innerText = "Our text"

Command Buttons

Find the button by using its value attribute and then click:

CODE

For Each btn In HTML.all.tags("Input")
   If btn.Value = "get output" Then
       btn.Click
   End If
Next btn

OR

CODE

HTML.all.Item("Name of the command button").Click

Checkboxes

Attributes:

CODE

For Each chk In HTML.getElementsByName("Name of the Checkbox")
   Debug.Print chk.Value, chk.id, chk.sourceIndex, chk.disabled, chk.Type, chk.Checked, chk.outerHTML
Next

Check a CheckBox:

CODE

HTML.getElementsByName("Name of the Checkbox").Item.Click

Option Buttons

Get the number of Option Buttons in an Option Group

CODE

Debug.Print HTML.getElementsByName("Name of the Option Button Group").length

Attributes:

CODE

For Each opt In HTML.getElementsByName("Name of the Option Button Group")
   Debug.Print opt.Value, opt.id, opt.className, opt.sourceIndex, opt.Checked, opt.defaultChecked, opt.status, opt.Type, opt.disabled
Next

Select an Option:

CODE

HTML.getElementsByName("Name of the Option Button Group").Item(Option Number).Checked = True

Comboboxes

Attributes:

CODE

For Each sel In HTML.getElementsByTagName("select")
   Debug.Print sel.id, sel.Name, sel.className, sel.sourceIndex, sel.Value, sel.selectedIndex, sel.Type, sel.isMultiLine, sel.length, sel.multiple
Next

Explore the associated events:

CODE

For Each sel In HTML.getElementsByTagName("select")
   Debug.Print sel.OnChange
Next

Select an index and execute the event (in this case onChange):

CODE

HTML.getElementsByName("Name of the combobox")().options(combobox index).Selected = True
HTML.getElementsByName("Name of the combobox")().FireEvent ("onchange")

Tables

General structure for looping through a table:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   For Each row In tbl.rows
       For Each cell In row.cells
           Debug.Print cell.innerText
       Next
   Next
Next

Table attributes:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   Debug.Print tbl.id, tbl.rows.length, tbl.scopeName, tbl.sourceIndex, tbl.cols, tbl.isMultiLine, tbl.localName, tbl.nodeName, tbl.nodeType
Next

Extract in a single string all the text contained in a table:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   Debug.Print tbl.textContent
Next

Obtain the outterHTML text:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   Debug.Print tbl.outerHTML
Next

Obtain the innerText:

CODE

Debug.Print HTML.getElementById("Table ID").innerText
Rows

Row attributes:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   For Each row In tbl.rows
       Debug.Print row.cells.length, row.ie9_nodeName, row.nodeType, row.rowIndex, row.sectionRowIndex, row.innerText, row.outerText, row.sourceIndex
   Next
Next

It's worth to highlight the difference between row.rowindex and row.sectionRowIndex.

  • With row.rowindex we enumerate all rows of the table independent of the section they belong to.
  • With row.sectionRowIndex, we enumerate all rows of the table according to the section they belong to.

This is of particular importance when we work with tables which have merged headers because if we calculate the number of columns based on the first row (heading) we will get a miscalculation of the real number of columns of the body of the table. If we build an array from this, we are going to get an Error 9.

Cells/Columns

Despite in some cases we can work with the column object, this is unusual and more often we access to the contents/properties of a column by using the attributes of the cells in a row.

Attributes:

CODE

For Each tbl In HTML.getElementsByTagName("table")
   For Each row In tbl.rows
       For Each cell In row.cells
       Debug.Print cell.innerText, cell.nodeName, cell.nodeType, cell.parentNode.nodeName, _
       cell.parentElement.sourceIndex, cell.rowSpan, cell.scopeName, cell.sourceIndex
       Next
   Next
Next

Finally, we can equal the parent elements of a row with the elements of a column by using indexes. Since the indexes in a web table work in a similar fashion to those of recordsets, arrays, listboxes and comboboxes, we can equal their indexes and pass the contents of a web table to any of these objects.

Body Inner Text

Sometimes we need to capture information from a web page that is not embedded in tags. For that we capture the whole text into a string that we can process further as we do with any string

Capture the body inner text into a string:

CODE

Dim str As String
str = Trim(HTML.body.innerText)


Working Examples


Textbox & Command Button --> ActiveX

For this example the WebBrowser ActiveX Control is used and loads the Google Maps web page pointing to Chicago IL.

  • Add a WebBrowser ActiveX Control to the form
  • Add a TextBox ("Text2" in my form) to the form
  • Add a Command Button ("Command1" in my form) to the form
  • In the OnLoad event of the form put the following:
CODE

Private Sub Form_Load()
   Dim strURL As String
   Dim HTML As Object

   strURL = "http://www.nationsonline.org/oneworld/map/google_map_Chicago.htm"
   Set HTML = Me.WebBrowser0.Object
   HTML.Navigate strURL
End Sub
  • In the OnClick event of the command button put the following:
CODE

Private Sub Command1_Click()
   Dim HTML As HTMLDocument
   Dim txt As HTMLInputTextElement
   Dim btn As HTMLButtonElement

   Set HTML = Me.WebBrowser0.Document

   For Each txt In HTML.getElementsByName("address")
       txt.innerText = Me.Text2 & ", "
   Next
   DoEvents

   For Each btn In HTML.all.tags("Input")
       If btn.Value = "Go!" Then
           btn.Click
       End If
   Next btn
End Sub
  • In your textbox write the name of another city, i.e. New York

The map changes and now is displaying the map of New York

Table --> Native Access 2010

For this example, the WebBrowser control native of Access 2010 is used.

Again, using Google Maps, this time for capturing the coordinates of a city (Bogotá - Colombia)

For comparison purposes, here the WebBrowser control is unbound.

  • Add a WebBrowser control ("WebBrowser0" in my form) to the form.
  • Add a command button ("Command1" in my form) to the form.
  • In the OnLoad event of the form put the following code:
CODE

Private Sub Form_Load()
   Dim strCS As String ' Web Address
   
   strCS = "=(""http://www.nationsonline.org/oneworld/map/google_map_Bogota.htm"")"
   
   Me.WebBrowser0.ControlSource = strCS
End Sub
  • In the OnClick event of the command button put this code:
CODE

Private Sub Command1_Click()

Dim HTML        As HTMLDocument
Dim tbl         As HTMLTable
Dim row         As HTMLTableRow
Dim cell        As HTMLTableCell
Dim strCoord    As String


Set HTML = Me.WebBrowser0.Object.Document

Do While HTML.ReadyState <> "complete": DoEvents: Loop

On Error Resume Next

For Each tbl In HTML.getElementsByTagName("table")

If tbl.innerText Like "*Coordinates*" Then
   For Each row In tbl.Rows
       If row.innerText Like "*Coordinates*" Then
           For Each cell In row.Cells
               strCoord = Trim(Replace(Replace(Replace(cell.innerText, vbCrLf, " | "), " (of Map center):", ""), "City Coordinates: ", ""))
               MsgBox strCoord
           Next
       End If
   Next
End If

Next

Set HTML = Nothing
End Sub
  • Zoom in a little bit the map
  • Click the button "Command1"

A message box displaying the coordinates will appear.

Tables --> .createDocumentFromUrl

This example uses the .createDocumentFromUrl method. For the purposes of the example, the fifa standings table for Spain soccer teams are used.

The code does the following in order:

  • Opens a new HTML document
  • Captures the standings table into an array
  • Pass the array to Excel, so I don't have to worry with data types
  • Imports the table from Excel to Access
  • Deletes the Excel table


CODE

Option Compare Database
Option Explicit

Private Sub Command0_Click()
'----------------------------------------------------------------------------------
' Description: Import all the tables from an HTML document into Access
' Author: Diego F. Pereira-Perdomo
' Date Created: 29 de Marzo de 2012
'----------------------------------------------------------------------------------

' HTML Elements
Dim HTML    As HTMLDocument
Dim tHTML   As New HTMLDocument
Dim tbl     As HTMLTable
Dim row     As HTMLTableRow
Dim cell    As HTMLTableCell

' Path
Dim strPath   As String
Dim strFile   As String
Dim strTable  As String

' Array
Dim w
Dim i    As Integer
Dim j    As Integer

' Excel
Dim xlApp   As Object
Dim xlWb    As Object
Dim xlWs    As Object

' Path of my database
strPath = CurrentProject.Path & "\"

' Set the HTML document
Set HTML = tHTML.createDocumentFromUrl("http://www.fifa.com/associations/association=esp/nationalleague/standings.html", vbNullString)

' Wait until the web page loads
Do While HTML.ReadyState <> "complete": DoEvents: Loop

' Loop through the table collection of the web page
For Each tbl In HTML.getElementsByTagName("table")
   strTable = tbl.getAttribute("id") ' Get the table "id". Since it is unique, I use it for naming my Access tables
   strFile = strPath & strTable & ".xlsx" ' Path and filename for exporting to Excel
   
   ' Excel
   Set xlApp = CreateObject("Excel.Application") ' Set Excel application
   Set xlWb = xlApp.Workbooks.Add ' Creates a new Workbook
   Set xlWs = xlWb.Worksheets("Sheet1") ' Creates a new WorkSheet
   
   ' HTML
   i = tbl.Rows.Length ' Get the number of rows of the HTML table
   j = tbl.Rows(0).Cells.Length ' Get the number of columns of the HTML table
   ReDim w(i - 1, j - 1) ' Redimension my dynamic array

   For Each row In tbl.Rows ' Loop through the rows of the HTML table
       For Each cell In row.Cells ' Loop through the cells/columns of the HTML table
           w(row.rowIndex, cell.cellIndex) = Trim(cell.innerText) ' Equal the HTML table indexes to the array indexes
       Next ' Next cell/column
   Next ' Next row

   ' Excel
   With xlWb
       xlWs.Cells(1, 1).Resize(i, j).Value = w ' Pass the array to Excel
   End With

   xlWb.SaveAs (strFile) ' Save the Excel Workbook
   xlApp.Quit ' Quit Excel

   ' Cleaning
   Set xlWs = Nothing
   Set xlWb = Nothing
   Set xlApp = Nothing

   ' Access
   DoEvents
   DoCmd.TransferSpreadsheet acImport, 10, strTable, strFile, True ' Import the WorkSheet from Excel to Access
   DoEvents

   Kill strFile ' Delete the Excel Workbook

Next ' Next HTML table

' Cleaning
Set HTML = Nothing


MsgBox "Ok"
End Sub

ftp, http textfiles--> .createDocumentFromUrl

For reading .txt files stored in an ftp folder, the usual way is to do it by using the API.

By using .createDocumentFromUrl, we can accomplish the same task.

CODE

Private Sub Command0_Click()
Dim HTML    As HTMLDocument
Dim tHTML   As New HTMLDocument
Dim str     As String

Set HTML = tHTML.createDocumentFromUrl("http://Your_Web_Page/files.txt", vbNullString)

Do While HTML.ReadyState <> "complete": DoEvents: Loop

str = Trim(HTML.Body.innerText)

Debug.Print str

Set HTML = Nothing

End Sub


Translated for UtterAccess June 23, 2012

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 91,826 times.  This page was last modified 23:50, 27 December 2014 by DanielPineault. Contributions by MargaretBartley, Jack Leach, Mark Davis and dipetete  Disclaimers