XPath and XPath Expressions
Earlier, I told you about xmllint and about using xmllint with html files. But what if you only want to pull out the <span> tags within your html file, or only your <span lang="el"> tags?
Enter: Xpath.
Xpath is yet another option available in the xmllint tool. Remember, Xpath is used to navigate through elements and attributes in xml and html documents. Xpath uses Xpath expressions to select nodes or node sets within a document.
Example 1. Looking for all of the <span> tags within an html document.
xmllint --html --xpath "//span" StedmanLesson10.html
xmllint = This tells the command line that we are going to run the xmllint program.
space = because we always put a space between the parts of a command
-- = Remember, these are the two hyphen-minus characters that we need to tell the command line that we are going to use an xmllint option.
html = This is the xmllint option we want to use because our file is an html file.
space
-- = We are using yet another option, so we need these before the next option we are using.
xpath = Xpath is the other xmllint option we are going to use. Why? Because we want to write an xpath expression that tells the command line that we ONLY want ALL of the span tags (<span>) within this document. So we must first tell the command line that we are going to write an xpath expression.
xpath expression ---> The expression will be contained in quotation marks ("").
// = These double slashes tell Xpath to look anywhere in the document, at any depth. We want ALL of the span tags, no matter where they sit, so we need to type this first.
span = this is the name of the tag that we want to parse
space
StedmanLesson10.html = The name of the file you want to parse goes here.
See the picture below to see what this command looks like once executed:
Notice the command on the first line: you see the xmllint command followed by the html option, then the xpath option. Then the xpath expression is given, followed by the html file name. Everything under that line is what xmllint returned. It returned ONLY the <span> tags (including the span tags with attributes) within the StedmanLesson10.html file.
Pretty cool huh?
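If you don't have the lesson file handy, here is a small sketch you can try instead. It builds a tiny stand-in html file (the file and its contents are made up for illustration; they are not StedmanLesson10.html) and runs the same "//span" expression against it. It also uses count(), a standard Xpath function, to report how many span tags matched. This assumes xmllint is installed on your machine.

```shell
# Build a tiny stand-in html file in a temporary folder.
# (Hypothetical sample content, not the lesson file.)
sample="$(mktemp -d)/sample.html"
cat > "$sample" <<'EOF'
<html>
  <body>
    <p>Plain: <span>hello</span></p>
    <p>Greek: <span lang="el">logos</span> and <span lang="el">agape</span></p>
    <p>Latin: <span lang="la">verbum</span></p>
  </body>
</html>
EOF

# Print every <span> element, wherever it sits in the document.
xmllint --html --xpath "//span" "$sample"

# count() reports how many <span> elements the expression matched.
span_count=$(xmllint --html --xpath "count(//span)" "$sample")
echo "span elements found: $span_count"
```

How the matched tags are laid out on screen can differ a little between xmllint versions, so don't worry if yours are run together on one line.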
Now, what if we want xmllint to be more specific? What if we just want the span tags with the lang attribute that has a value of "el"? Then we would type the command like this:
xmllint --html --xpath "//span[@lang='el']" StedmanLesson10.html
As you can see, the attribute for the span tag is typed within square brackets and begins with the @ character. The value of that attribute, "el", is placed within SINGLE quotes, because the whole expression is already wrapped in double quotation marks. Then you have your closing square bracket and closing quotation mark.
When I give this command, it returns the following:
As you can see, only the <span lang="el"> tags within this document have been returned.
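Here is the same idea as a sketch you can run on your own. Again, the sample file is a made-up stand-in, not the lesson file, and it assumes xmllint is installed. Notice how the double quotation marks wrap the whole expression for the command line, while the single quotes wrap the attribute value for Xpath.

```shell
# Build a tiny stand-in html file in a temporary folder.
# (Hypothetical sample content, not the lesson file.)
sample="$(mktemp -d)/sample.html"
cat > "$sample" <<'EOF'
<html>
  <body>
    <p>Plain: <span>hello</span></p>
    <p>Greek: <span lang="el">logos</span> and <span lang="el">agape</span></p>
    <p>Latin: <span lang="la">verbum</span></p>
  </body>
</html>
EOF

# Print only the <span> elements whose lang attribute equals "el".
xmllint --html --xpath "//span[@lang='el']" "$sample"

# Count them, using the same predicate inside count().
el_count=$(xmllint --html --xpath "count(//span[@lang='el'])" "$sample")
echo "spans with lang=el: $el_count"
```

The plain span and the lang="la" span are left out, because neither one satisfies the condition inside the square brackets.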
This is just one example of how you can get really specific with xmllint for html using xpath expressions.
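To give you a feel for a few other directions you could take, here is a short sketch of some more expressions. These use standard Xpath pieces (a bare [@lang] test, not(), and string()) on a made-up sample file, so treat them as starting points to experiment with, not as part of the lesson file. It assumes xmllint is installed.

```shell
# Build a tiny stand-in html file in a temporary folder.
# (Hypothetical sample content, not the lesson file.)
sample="$(mktemp -d)/sample.html"
cat > "$sample" <<'EOF'
<html>
  <body>
    <p>Plain: <span>hello</span></p>
    <p>Greek: <span lang="el">logos</span> and <span lang="el">agape</span></p>
    <p>Latin: <span lang="la">verbum</span></p>
  </body>
</html>
EOF

# How many spans have a lang attribute at all, whatever its value?
with_lang=$(xmllint --html --xpath "count(//span[@lang])" "$sample")

# How many spans have NO lang attribute?
no_lang=$(xmllint --html --xpath "count(//span[not(@lang)])" "$sample")

# string() gives the text of the FIRST matching node only.
first_el_text=$(xmllint --html --xpath "string(//span[@lang='el'])" "$sample")

echo "spans with a lang attribute: $with_lang"
echo "spans without a lang attribute: $no_lang"
echo "text of the first lang=el span: $first_el_text"
```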
Photo by Caleb Jones on Unsplash