Parser

From ThorstensHome
 
Latest revision as of 10:39, 19 October 2008

There are two different approaches to writing a parser with KDE:

  • use a QXmlInputSource (http://doc.trolltech.com/4.4/qxmlinputsource.html)
  • use the DOM model

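The difference between the two approaches can be sketched outside Qt. The following is only an illustration using Python's standard library, not Qt code: `xml.sax` stands in for the event-driven reader that a QXmlInputSource feeds, and `xml.dom.minidom` stands in for the DOM model. The names `TagCollector`, `tags_via_sax`, `tags_via_dom` and the sample document are made up for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, well-formed XHTML sample (note the quoted attribute values).
XHTML = ('<html><body lang="DE"><h1>Title</h1>'
         '<p>Text <a href="a.html">link</a></p></body></html>')


class TagCollector(xml.sax.ContentHandler):
    """Event-driven style: the parser calls back for every opening tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def tags_via_sax(text):
    """Collect element names while the document streams through the parser."""
    handler = TagCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.tags


def tags_via_dom(text):
    """Build the whole tree first, then query it."""
    doc = minidom.parseString(text)
    return [element.tagName for element in doc.getElementsByTagName("*")]


if __name__ == "__main__":
    print(tags_via_sax(XHTML))
    print(tags_via_dom(XHTML))
```

The event-driven version never holds the whole document in memory, while the DOM version can be queried and modified after parsing. Both require well-formed input.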
Note that you cannot write an HTML parser using QXmlInputSource unless the input is strict XHTML, that is, well-formed XML. A line like

<body lang=DE link=blue vlink=purple bgcolor=#eeeeff>

stops a QXmlInputSource-based parser completely because the attribute values are not enclosed in quotation marks. To convert an HTML file into an XHTML file, use

  • tidy
  • QTextEdit
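The failure on unquoted attribute values is a property of any strict XML parser, not only of Qt's. As a quick check, and assuming Python's standard `xml.sax` module as a stand-in for the Qt reader, the sketch below feeds the line from above to a strict parser, once as written and once with quotes added. The helper name `is_well_formed` and the closing `</body>` tags are additions made for this example.

```python
import xml.sax


def is_well_formed(fragment):
    """Return True if a strict XML parser accepts the fragment."""
    try:
        xml.sax.parseString(fragment.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        # Raised on the first well-formedness error; parsing stops there.
        return False


# The line from the text, closed with </body> so that the missing quotes
# are the only problem.
UNQUOTED = '<body lang=DE link=blue vlink=purple bgcolor=#eeeeff></body>'
# The same line after conversion to XHTML-style quoting.
QUOTED = '<body lang="DE" link="blue" vlink="purple" bgcolor="#eeeeff"></body>'

if __name__ == "__main__":
    print(is_well_formed(UNQUOTED))
    print(is_well_formed(QUOTED))
```

The parser gives up at the first unquoted value, which is why the HTML has to be converted before a strict parser can read it.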

You can use a parser for:

  • converting HTML syntax into MediaWiki syntax, see html2mediawiki
  • programming a spider that follows all links in an HTML file, see spider
  • automatically creating a table of contents for an HTML file, see add_toc
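As an illustration of the spider use case, the following minimal sketch collects the link targets of a page. It is not the code behind the spider page; it assumes Python's standard `html.parser` module, which is lenient and therefore also accepts unquoted attributes. `LinkCollector`, `extract_links` and the sample page are hypothetical names and data.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Remember the href value of every <a> tag that is seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical page mixing unquoted and quoted attribute values.
PAGE = ('<body lang=DE link=blue><a href=page1.html>One</a> '
        '<A HREF="page2.html">Two</A></body>')

if __name__ == "__main__":
    print(extract_links(PAGE))
```

A real spider would additionally resolve relative links against the page URL and keep a set of already visited pages to avoid loops.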