[MLton-user] Second try at PDF version of guide

Adam Goode adam@evdebs.org
Mon, 14 Nov 2005 01:28:10 -0500


Here is my second attempt at a PDF version of the Guide (automatically
generated):

http://www.cs.cmu.edu/~agoode/mlton2.pdf

I have stopped using html2ps, and switched to HTMLDOC (www.htmldoc.org)
instead. HTMLDOC is in Debian.

Features of this version of the PDF guide:
  * Looks pretty nice
  * Pages are in sane order
  * Page names are prominent
  * Intro page
  * Fast to generate

Bugs:
  * I could not get the table of contents feature of HTMLDOC to work;
    it kept generating page numbers that were off by varying amounts!
  * The "A | B | C | D | ..." on the Index page links instead to the
    same named anchors in the References page! (Anchor name collision?)
  * The picture of the developers is too large and gets cut off

Bug #2 above worries me the most, since it means we could have
collisions in the future between pages in the Wiki. It also means that
we need to rename the #A #B #C #D ... anchors in References to something
else (like #Aref, #Bref, etc.) to work around this bug. (I've held off on
this for now.) But even with this bug, I think we can ship the PDF along
with the release (if there's still time).
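
A minimal sketch of the anchor-renaming workaround described above, for
illustration only. It assumes the References page marks its letter anchors
as name="A" and links to them as href="#A"; the real MoinMoin markup may
differ. The helper name rename_ref_anchors and the sample page are
hypothetical and not part of grab-wiki. It should be run on the References
page only, so the Index page keeps its own #A, #B, ... anchors.

```shell
# Hypothetical sample standing in for the References page.
refs=$(mktemp)
cat >"$refs" <<'EOF'
<a href="#A">A</a> | <a href="#B">B</a>
<a name="A"></a><h2>A</h2>
<a href="#Appel">Appel</a>
<a name="B"></a><h2>B</h2>
EOF

# rename_ref_anchors FILE: print FILE with single-letter anchors A..Z
# renamed to Aref..Zref, together with the links that point at them.
# Longer anchor names such as #Appel are left alone because the pattern
# requires the closing quote right after the letter.
rename_ref_anchors() {
    sed -e 's;name="\([A-Z]\)";name="\1ref";g' \
        -e 's;href="#\([A-Z]\)";href="#\1ref";g' "$1"
}

rename_ref_anchors "$refs"
```

The expressions use only basic sed syntax, so they would slot into a
sed script file of the kind grab-wiki already builds.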

As for making changes in the Wiki, I've already added line breaks to
code in a few pages to make the PDF come out correctly. Long lines were
being truncated and reported by HTMLDOC as errors. The changes were easy
enough; only about 6 were needed in the entire Wiki. (I can't remember
all the pages. Does MoinMoin have a "RecentChanges" page?)
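
One way to find over-long code lines before HTMLDOC complains is to scan
the grabbed pages for long lines inside <pre> blocks. This is only a
sketch: the helper name find_long_pre_lines is made up, the 80-character
limit is an assumption (the width at which HTMLDOC truncates is not stated
above), and the length counted includes any HTML markup on the line.

```shell
# find_long_pre_lines LIMIT FILE...: report lines inside <pre> blocks
# that are longer than LIMIT characters, as file:line: N chars.
find_long_pre_lines() {
    limit=$1
    shift
    awk -v limit="$limit" '
        FNR == 1   { inpre = 0 }          # reset state for each new file
        /<pre/     { inpre = 1 }          # entering a code block
        inpre && length($0) > limit {
            printf "%s:%d: %d chars\n", FILENAME, FNR, length($0)
        }
        /<\/pre>/  { inpre = 0 }          # leaving the code block
    ' "$@"
}

# Hypothetical sample page: one long line outside <pre>, one inside.
sample=$(mktemp)
{
    echo '<p>'
    printf '%0120d\n' 0     # 120 characters, outside <pre>: ignored
    echo '</p>'
    echo '<pre>'
    echo 'val x = 1'
    printf '%0120d\n' 0     # 120 characters, inside <pre>: reported
    echo '</pre>'
} >"$sample"

find_long_pre_lines 80 "$sample"
```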



To generate the PDF, use the patch to grab-wiki below. It does 3 things:
  1. Adds <h1> to each page with the page's name. Thanks to stylesheet
     trickery (ignored by version 1.8.x of HTMLDOC), this is hidden in
     regular browsers.
  2. Creates mlton.book, the input file to HTMLDOC.
  3. Creates title.html, for generating the title page in the PDF.
     I just wrote some text for this which can easily be changed.
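
To illustrate step 2, here is a rough sketch of how the page order in
mlton.book comes about: the title page first, then Home and Index, then
every remaining page from the index file. The helper name book_page_order
and the sample page names are made up for this example; the HTMLDOC header
and option lines that the patch also writes are left out.

```shell
# Hypothetical index file: one page name per line, as grab-wiki collects them.
index=$(mktemp)
printf '%s\n' Bugs Home Index Installation References >"$index"

# book_page_order INDEXFILE: print the input files in book order.
book_page_order() {
    echo 'title.html'
    echo 'Home'
    echo 'Index'
    # -x matches whole lines only, so a page merely containing "Home"
    # or "Index" in its name is kept.
    grep -vx -e 'Home' -e 'Index' "$1"
}

book_page_order "$index"
```

Listing Home and Index explicitly and filtering them out of the rest keeps
each of them from appearing twice in the book.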

My sed skills are sorely lacking, so there's probably a better way to
extract the title from each page and generate the <h1> tags. I do 2
passes over each file, along with the initial sed cleanup pass.
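
As one possible alternative to the two sed passes, a single awk pass could
both pick up the title and emit the hidden <h1>. This is a sketch under
assumptions, not the method used in the patch: it relies on <title>
appearing before <body> and on the opening <body> tag ending its line. The
helper name add_hidden_h1 and the sample page are hypothetical.

```shell
# Hypothetical sample page; real MoinMoin output will differ in detail.
page=$(mktemp)
cat >"$page" <<'EOF'
<html>
<head>
<title>Installation - MLton</title>
</head>
<body lang="en">
<p>Some text.</p>
</body>
</html>
EOF

# add_hidden_h1 FILE: print FILE with a hidden <h1> holding the page name
# inserted right after the line containing the opening <body> tag.
add_hidden_h1() {
    awk '
        # Remember the page name: the text between <title> and the last " - ".
        title == "" && match($0, /<title>.* - /) {
            title = substr($0, RSTART + 7, RLENGTH - 10)
        }
        { print }
        # Directly after the opening <body ...> line, emit the hidden heading.
        /<body[ >]/ {
            printf "<h1 style=\"display: none\">%s</h1>\n", title
        }
    ' "$1"
}

add_hidden_h1 "$page"
```

Unlike the sed version in the patch, this does not depend on "\n" being
understood in the replacement text, which is a GNU sed extension.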

Once grab-wiki is run, you can do
  $ htmldoc --batch mlton.book
to generate mlton.pdf. Or, leave out "--batch" to load up the GUI.

Patch is below for grab-wiki. As usual, comments and suggestions are
welcome.


Thanks,

Adam




Index: grab-wiki
===================================================================
--- grab-wiki   (revision 4214)
+++ grab-wiki   (working copy)
@@ -27,6 +27,8 @@

 base='http://mlton.org'
 version=`date +%Y%m%d`
+book='mlton.book'
+titlepage='title.html'

 index='.index'
 script='.script'
@@ -91,13 +93,54 @@

 for f in $(cat $index); do
        echo $f
-       head -n -19 <$f >$tmp
+       pagetitle=$(sed -ne 's;<title>\(.*\) - .*</title>;\1;p' $f)
+       head -n -19 $f | sed -e "s;\(<body .*\);\1\n<h1 style=\"display: none\">$pagetitle</h1>;" > $tmp
        (
                sed -f $script <$tmp
                echo '</body></html>'
        ) >$f
 done

-rm -f $tmp $index $script

+echo "Generating PDF titlepage:"
+
+cat >$titlepage <<EOF
+<html>
+<head><title>MLton $version Guide</title></head>
+<body>
+<h1>MLton Guide</h1>
+<p>
+This is the guide for MLton, an open-source, whole-program,
+optimizing Standard ML compiler.
+</p>
+
+<p>
+This guide was generated automatically from the MLton wiki,
+available online at <a href="http://mlton.org/">http://mlton.org</a>.
+It is up to date for MLton $version.
+</p>
+
+</body>
+</html>
+EOF
+
+echo $titlepage
+
+
+echo "Generating htmldoc script:"
+
+cat >$book <<EOF
+#HTMLDOC 1.8.24 Open Source
+-t pdf13 -f mlton.pdf --webpage --title --linkstyle underline --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --footer h.1 --nup 1 --tocheader .t. --tocfooter ..i --portrait --color --no-pscommands --no-xrxcomments --compression=9 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont Helvetica --bodyfont Times --headfootsize 11.0 --headfootfont Helvetica --charset iso-8859-1 --links --embedfonts --pagemode document --pagelayout single --firstpage p1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all  --owner-password "--user-password"  --user-password "" --browserwidth 680 --no-strict
+$titlepage
+Home
+Index
+EOF
+grep -v '^Home$' $index | grep -v '^Index$' >> $book
+
+echo $book
+
+
+rm -f $tmp $index $script
+
 cp Home index.html

