xpathgrep
Every once in a while I (or someone else) need to scan through a bunch of XML files looking for a particular pattern. Usually it's something like: find all of the modules that have <code> tags. If the pattern is simple enough you can usually get by with grep. Today, however, the topic of the <tgroup> tag came up, along with the fact that you can actually have multiple <tgroup> elements in a single <table>. Unfortunately, the number of tgroup children of a table is not something easy to test with grep.
It is, however, easy to test with an XPath: //cnx:table[count(cnx:tgroup) > 1] so I modified my xpath evaluator to create xpathgrep. Feed it an XPath expression and a list of CNXML files and it will tell you which files match the pattern. For example:
    <167 yoda:~/tmp/xmlpages > xpathgrep "//cnx:table[count(cnx:tgroup) > 1]" */index.cnxml
    m10184/index.cnxml: 1 matches
    m10511/index.cnxml: 1 matches
    m12131/index.cnxml: 1 matches
What do you know? Three modules make use of this. Of course our stylesheets currently turn them into 3 separate tables but that's a problem for another day.
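For anyone without access to xpathgrep, here is a rough sketch of the same check in Python. It is not the xpathgrep script itself. It uses only the standard library's ElementTree, which does not support count() in its XPath subset, so the counting is done in Python instead. The namespace URI and the function names count_multi_tgroup_tables and report are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed CNXML namespace URI; change it if your files declare a different one.
CNXML_NS = "http://cnx.rice.edu/cnxml"
NS = {"cnx": CNXML_NS}


def count_multi_tgroup_tables(root):
    """Count cnx:table elements that have more than one cnx:tgroup child.

    Roughly equivalent to //cnx:table[count(cnx:tgroup) > 1].
    """
    matches = 0
    # iter() walks the whole tree, including the root element itself.
    for table in root.iter(f"{{{CNXML_NS}}}table"):
        if len(table.findall("cnx:tgroup", NS)) > 1:
            matches += 1
    return matches


def report(paths):
    """Return 'path: N matches' lines for files with at least one match."""
    lines = []
    for path in paths:
        n = count_multi_tgroup_tables(ET.parse(path).getroot())
        if n:
            lines.append(f"{path}: {n} matches")
    return lines
```

Calling report() with a list of index.cnxml paths returns lines in the same shape as the output above, and files with no matching table are left out.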
If you want to make use of xpathgrep you'll find it in our Subversion repository. Just check out the scripts:

    svn co svn+ssh://software.cnx.rice.edu/scripts/trunk scripts
Notes:
- I taught it about CNXML and MathML namespaces but not MDML, QML, or BibTeXML. Feel free to modify as you need.
- You need to specify cnx as the prefix for CNXML tags because our XPath evaluator doesn't understand default namespaces.
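
The prefix requirement is typical of XPath 1.0 evaluators in general: an unprefixed name in an expression matches only elements in no namespace, even when the document declares a default namespace. A small sketch using Python's standard ElementTree (not our evaluator) shows the effect. The namespace URI here is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed CNXML namespace URI, declared as the document's default namespace.
CNXML_NS = "http://cnx.rice.edu/cnxml"
doc = ET.fromstring(
    f'<document xmlns="{CNXML_NS}"><table><tgroup/></table></document>'
)

# The unprefixed name looks for a 'table' in no namespace and finds nothing.
unprefixed = doc.findall(".//table")

# Binding the cnx prefix to the namespace URI makes the same path match.
prefixed = doc.findall(".//cnx:table", {"cnx": CNXML_NS})

print(len(unprefixed), len(prefixed))  # prints: 0 1
```

The prefix itself is arbitrary. What matters is that it is bound to the same namespace URI the document uses.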
