How can I claim HTMLParser is a teaching tool about how browsers are built?

It’s such a grandiose claim. I might as well claim I invented the Internet.

I’m just saying that if you look at the code and figure out how it works, it should be easy to pick out a new HTML5 <address> tag and pull the addresses from an HTML file. That tag isn’t implemented in my code. It didn’t exist as a tag when I wrote the HTML Parser. But if you wanted to add it, it shouldn’t take more than 25 to 30 lines of code just to recognize that there is a tag named <address>.

In fact, H1 to H5 aren’t implemented explicitly, either. Not really. I have one class for all five, because I never thought I needed to distinguish between them. A browser needs to, because each one gets displayed differently. Some elements are displayed as text; some are displayed as a box, like a <table> tag. Eventually, I’ll probably add IRender and IFit interfaces to control what each element draws on a Bitmap and how big it is, based on the elements above it in the tree, if I ever want to render the HTML graphically.

The .BuildTree() method creates something similar to the DOM document that JavaScript uses to read things from HTML. You know the JavaScript element.childNodes property? Well, if you understand my code, you now know one way of implementing it.
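
To make that idea concrete, here is a minimal sketch in Python. It is not the HTMLParser source: the names Node, build_tree, and child_nodes, and the token format, are all assumptions made up for illustration. It shows one common way to turn a flat stream of open and close tags into a tree with a childNodes-style list, using a stack of currently open elements.

```python
class Node:
    """A tree node with a parent link and an ordered list of children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.parent = None
        self.child_nodes = []  # plays the role of childNodes

    def append(self, child):
        child.parent = self
        self.child_nodes.append(child)


def build_tree(tokens):
    """Build a tree from ("open", tag, attrs) and ("close", tag) tokens."""
    root = Node("#document")
    stack = [root]  # elements that are currently open, innermost last
    for token in tokens:
        if token[0] == "open":
            node = Node(token[1], token[2])
            stack[-1].append(node)  # plug it in under the innermost open element
            stack.append(node)
        elif token[0] == "close":
            # Pop back to the matching open tag, if one is on the stack.
            # The root at index 0 is never popped.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == token[1]:
                    del stack[i:]
                    break
    return root


# Hypothetical token stream for: <div><a href="10"></a><p></p></div>
tokens = [
    ("open", "div", {}),
    ("open", "a", {"href": "10"}),
    ("close", "a"),
    ("open", "p", {}),
    ("close", "p"),
    ("close", "div"),
]
root = build_tree(tokens)
div = root.child_nodes[0]
print([child.tag for child in div.child_nodes])  # ['a', 'p']
```

A real parser also has to deal with unclosed tags, void elements like <br>, and text nodes, which this sketch leaves out.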

If you have ever seen a CSS selector…

DIV > A[href='10']
{
    background-color:red;
}

… what is it doing? Keep that thought in the back of your mind, as I describe what I call the HPath query in HtmlParser.

HPath was created to mimic XSL’s XPath query, which selects tags in an XML document. My HTML analog, HPath, is just an element test run against the tree as it is created. As the parser plugs each tag object into the right place in the tree, it runs hpath.Test(Node), which just tests whether the Node has certain attributes or is of a particular kind, like Hyperlink. The tests are nested, so an HPath of “DIV/Hyperlink[@href=10]” creates a tree of test objects that asks…
1. Is the current node a Hyperlink object?
2. Does it have an attribute named “href”?
3. Does that attribute equal 10?
4. Does it have a parent on the tree (because when it’s first parsed, it isn’t on a tree yet) that is a DIV object?

If so, then the tests pass.
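
The four checks above can be sketched as small test objects combined into one. This is a hypothetical Python illustration, not the actual HPath classes: KindTest, AttributeTest, ParentTest, and AllOf are invented names, and the only detail taken from the post is the order of the checks.

```python
class Node:
    def __init__(self, kind, attrs=None, parent=None):
        self.kind = kind
        self.attrs = attrs or {}
        self.parent = parent  # None until the node is plugged into a tree


class KindTest:
    """Passes when the node is of the given kind, e.g. "Hyperlink"."""

    def __init__(self, kind):
        self.kind = kind

    def test(self, node):
        return node.kind == self.kind


class AttributeTest:
    """Passes when the attribute exists and, if a value is given, equals it."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    def test(self, node):
        if self.name not in node.attrs:
            return False
        return self.value is None or node.attrs[self.name] == self.value


class ParentTest:
    """Passes when the node has a parent and that parent passes the inner test."""

    def __init__(self, inner):
        self.inner = inner

    def test(self, node):
        return node.parent is not None and self.inner.test(node.parent)


class AllOf:
    """Passes when every nested test passes; stops at the first failure."""

    def __init__(self, *tests):
        self.tests = tests

    def test(self, node):
        return all(t.test(node) for t in self.tests)


# The test tree for DIV/Hyperlink[@href=10], in the same order as the list.
hpath = AllOf(
    KindTest("Hyperlink"),         # 1. is it a Hyperlink?
    AttributeTest("href"),         # 2. does it have an href?
    AttributeTest("href", "10"),   # 3. does href equal 10?
    ParentTest(KindTest("DIV")),   # 4. is its parent a DIV?
)

div = Node("DIV")
link = Node("Hyperlink", {"href": "10"}, parent=div)
orphan = Node("Hyperlink", {"href": "10"})  # parsed, but not on a tree yet

print(hpath.test(link))    # True
print(hpath.test(orphan))  # False
```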

Now consider how CSS selectors are written. Notice the similarity in the order? If the HPath has an event handler for a match, then it can just do something when the tests pass. If this is how the CSS selector works, then the browser paints the background red. See? That is the fewest lines of code I can imagine to illustrate how CSS selectors and XSL queries could share the same codebase. I use it to extract text.
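
Here is a small Python sketch of that point, under stated assumptions: Matcher, on_match, and offer are hypothetical names, and the test function is a hand-written stand-in for the nested tests of DIV/Hyperlink[@href=10]. The same match can drive a CSS-like handler that records a style and a handler that extracts text.

```python
class Matcher:
    """Pairs one node test with any number of match handlers."""

    def __init__(self, test):
        self.test = test
        self.handlers = []

    def on_match(self, handler):
        self.handlers.append(handler)

    def offer(self, node):
        """Run the test on a node; call every handler if it passes."""
        if not self.test(node):
            return False
        for handler in self.handlers:
            handler(node)
        return True


def link_to_10_inside_div(node):
    # Hypothetical stand-in for the nested tests of DIV/Hyperlink[@href=10].
    parent = node.get("parent")
    return (
        node["kind"] == "Hyperlink"
        and node["attrs"].get("href") == "10"
        and parent is not None
        and parent["kind"] == "DIV"
    )


div = {"kind": "DIV", "attrs": {}, "text": "", "parent": None}
nodes = [
    {"kind": "Hyperlink", "attrs": {"href": "10"}, "text": "ten", "parent": div},
    {"kind": "Hyperlink", "attrs": {"href": "11"}, "text": "eleven", "parent": div},
]

styles = {}     # what a browser-like consumer might record
extracted = []  # what a text-extraction consumer might record

matcher = Matcher(link_to_10_inside_div)
matcher.on_match(lambda node: styles.update({id(node): {"background-color": "red"}}))
matcher.on_match(lambda node: extracted.append(node["text"]))

for node in nodes:
    matcher.offer(node)

print(extracted)  # ['ten']
```

The test itself doesn't know or care what happens on a match, which is why one matching mechanism can serve both styling and text extraction.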

If you notice that the HPath is a string, then you’ll realize there is a rudimentary parser behind it, which will probably give you some guidance on how to write a language parser of your own. I didn’t take a programming languages course in college, but I imagine it’s a very similar idea.
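
As a rough idea of what such a rudimentary parser can look like, here is a Python sketch. The grammar is an assumption based only on the one example in this post, “DIV/Hyperlink[@href=10]”; parse_hpath, matches, and the dictionary node format are invented for illustration and are not the HTMLParser API.

```python
import re

# One step looks like: Kind  or  Kind[@attr]  or  Kind[@attr=value]
STEP = re.compile(
    r"^(?P<kind>\w+)(?:\[@(?P<attr>[\w-]+)(?:=(?P<value>[^\]]*))?\])?$"
)


def parse_hpath(hpath):
    """Split an HPath-like string on "/" and parse each step.

    Limitation of this sketch: attribute values containing "/" are not supported.
    """
    steps = []
    for part in hpath.split("/"):
        m = STEP.match(part.strip())
        if m is None:
            raise ValueError(f"cannot parse step: {part!r}")
        value = m.group("value")
        if value is not None:
            value = value.strip("'\"")  # allow quoted or bare values
        steps.append({"kind": m.group("kind"), "attr": m.group("attr"), "value": value})
    return steps


def matches(steps, node):
    """Test a node against the steps, last step first, walking up the parents."""
    for step in reversed(steps):
        if node is None or node["kind"] != step["kind"]:
            return False
        if step["attr"] is not None:
            if step["attr"] not in node["attrs"]:
                return False
            if step["value"] is not None and node["attrs"][step["attr"]] != step["value"]:
                return False
        node = node["parent"]
    return True


steps = parse_hpath("DIV/Hyperlink[@href=10]")
div = {"kind": "DIV", "attrs": {}, "parent": None}
link = {"kind": "Hyperlink", "attrs": {"href": "10"}, "parent": div}
orphan = {"kind": "Hyperlink", "attrs": {"href": "10"}, "parent": None}

print(matches(steps, link))    # True
print(matches(steps, orphan))  # False
```

A regular expression per step is enough for a path this simple. A real language parser would normally use a separate tokenizer and a recursive grammar, but the shape is the same: turn the string into a structure, then evaluate that structure against the data.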

And all of this fits in just a few hundred lines of code, as opposed to the thousands, if not millions, of lines in many open-source projects, which you have to analyze and distill down to what is important before you can even begin to modify them.

TictAwf.com
