Uploading texts to the library with a crude cut-and-paste is a bad idea, because you lose the emphasis and the logical markup (headings, blockquotes, lists). Unless the original piece is very simple, with no emphasized words or chapter divisions, the uploaded text will be broken, because pieces were lost in the process.
Having a *nix box (such as Linux in whatever flavour) is strongly recommended. The days when GNU/Linux was a geek-only thing are long gone. Set up a dual boot with Ubuntu if you want to do some serious work. If not, you're on your own and you'll find yourself doing repetitive tasks (such as search and replace) for no reason.
The instructions below have been tested. You should be able to copy and paste them without errors.
Get an editor
The web form is not adequate for working comfortably on the texts. You can install these two pieces of software:
* https://addons.mozilla.org/en-US/firefox/addon/4125/ an add-on for Firefox that lets you edit the text area with a real editor (or grab its content without pain).
sudo apt-get install xul-ext-itsalltext
- And, more importantly:
* Emacs: http://www.gnu.org/software/emacs/ This is a huge and complex program, but since it has menus and icons you won't feel lost. See also EditingTips. To get highlighting of the tags, type M-x sgml-mode (hold the Alt key down, press x, release the Alt key, type sgml-mode and then press Enter).
sudo apt-get install emacs23 emacs23-common-non-dfsg emacs23-el
Using other editors like gedit, kate, scite, vim, whatever, is fine. Be sure to set the highlighting to HTML.
Install the library tools
Download and install the stuff:
mkdir $HOME/anarcholib
cd $HOME/anarcholib
git clone git://gitorious.org/the-anarchist-library-tools/the-anarchist-library-tools.git
sudo apt-get install libdate-calc-perl libhtml-parser-perl tidy
cd the-anarchist-library-tools
cd AnAnarchistLibrary
perl Makefile.PL
make
sudo make install
mkdir -p ~/bin
cd ~/bin
for i in ~/anarcholib/the-anarchist-library-tools/{bin,utils}/* ; do ln -s "$i" . ; done
cd ~
Usually smart GNU/Linux distributions also add ~/bin to the user PATH (the list of directories searched for executables), if that directory exists. So open another terminal and check:
which talimporter.pl
If it returns the full path of the script, you're good. If not, you need to tweak some files. Execute the following commands:
echo 'if [ -f ~/.bashrc ]; then source ~/.bashrc; fi' >> ~/.bash_profile
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
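To see whether a directory such as ~/bin really is listed in the PATH, a small check like the one below may help. It is only a sketch: path_contains is a hypothetical helper name chosen for this example, not something shipped with the library tools.

```shell
# Hypothetical helper (not part of the library tools):
# succeed if the directory given as $1 is one of the entries in $PATH.
path_contains() {
    # Wrapping both sides in colons lets us match whole entries only.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is in PATH"
else
    echo "$HOME/bin is NOT in PATH"
fi
```

If the check says the directory is missing even after editing ~/.bashrc, the most likely cause is that the terminal was opened before the change; a fresh terminal should pick it up.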
Check again whether which talimporter.pl returns something. If not, ask your local unix guru (or come to IRC). You have to get the executables in ~/anarcholib/the-anarchist-library-tools/utils and ~/anarcholib/the-anarchist-library-tools/bin somewhere in your PATH. If nothing else seems to work, do this (guaranteed to work):
sudo cp ~/anarcholib/the-anarchist-library-tools/{bin,utils}/* /usr/local/bin
Optional ConTeXt installation
If you want to produce the PDF locally, you need to install ConTeXt.
mkdir -p $HOME/usr/context
cd $HOME/usr/context
rsync -av rsync://contextgarden.net/minimals/setup/first-setup.sh .
sh ./first-setup.sh --context=current
source $HOME/usr/context/tex/setuptex
echo "export PATH=$(dirname $(which context)):\$PATH" >> ~/.bashrc
echo "export TEXROOT=\$HOME/usr/context/tex" >> ~/.bashrc
echo "export TEXMFOS=$TEXMFOS" >> ~/.bashrc
source ~/.bashrc
If in doubt, check http://wiki.contextgarden.net/ConTeXt_Standalone
Be sure that context is in your path (open another terminal):
which context
Go back to the code directory and install the auxiliary files for ConTeXt, into texmf-local:
(Assuming that ConTeXt is installed in ~/usr/context)
rsync -avh ~/anarcholib/the-anarchist-library-tools/texmf-local/ \
~/usr/context/tex/texmf-local/
mktexlsr
mtxrun --script fonts --reload
context --generate
mtxrun --script fonts --reload
Now everything should be set up.
Retrieve the page
talimporter.pl
As of September 2011, the preferred way to import texts from other sources is talimporter.pl. From the help:
Usage: talimporter.pl [ options ] <url or file>
The url or the file provided is cleaned up, interesting tags are preserved and normalized. It outputs to the standard output (so if you want to save it you have to redirect the output to a file).
Example:
talimporter.pl http://url.org/file.html > myfile.xml
Options:
--encoding (default utf-8): the default will work for a lot of
site. If you see garbage in the output, often this means the
encoding is something else then utf-8 (which is what is used
nowadays). Look at the source and specify the encoding. See man
Encode::Supported for the possible options. Very often is just
latin1 or iso-8859-1 (for western countries).
If it's still messed up, contact the author
marco -at- theanarchistlibrary.org
--lang
(default "en") specify the language of the document. (like "en",
"it", "ru"). This actually doesn't do much. It just set the
language property in the output file.
Another example (with the encoding):
talimporter.pl --encoding latin1 \
http://www.marxists.org/reference/archive/guillaume/works/bakunin.htm > my.xml
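When the output looks garbled and you need to look at the source to find the encoding, a quick search for the charset declaration usually answers the question. The snippet below is a rough sketch that works on a page you have already saved locally; detect_charset is a hypothetical helper name, and it only finds encodings that the page declares in a charset=... attribute.

```shell
# Hypothetical helper (not part of the library tools):
# print the first charset declared in a saved HTML page, in lowercase.
detect_charset() {
    # Find the first "charset=..." occurrence, strip the key and an optional
    # opening quote, then lowercase what is left.
    grep -io "charset=[\"']\{0,1\}[a-z0-9_-]*" "$1" \
        | head -n 1 \
        | sed "s/^[^=]*=[\"']\{0,1\}//" \
        | tr 'A-Z' 'a-z'
}

# Demonstration on a throwaway file.
sample=$(mktemp)
printf '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head></html>\n' > "$sample"
detect_charset "$sample"    # prints iso-8859-1
rm -f "$sample"
```

The value it prints can then be passed to --encoding, provided it is one of the names listed in man Encode::Supported.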
The text should be almost ready for upload, but you still need to remove menus, titles and comments from the page, and fix notes, chapters, typos and so on. Still, talimporter.pl does a nice job: it fixes annoying things like quotes, preserves italics and similar markup, and adds the header.
Notably, if you want to compile the file, you have to fill in the various fields in the header of the output file.
Finally, you can use the webform to upload the cleaned and fixed file.
properly-rename-file.pl
Provided that you filled in the author and title fields in myfile.xml, you should rename it to conform to the library standard:
properly-rename-file.pl myfile.xml
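Judging only from the file names that appear on this page (Author__Title.xml, with underscores in place of spaces), the naming pattern can be illustrated as follows. This is an assumption drawn from those examples, not a description of what properly-rename-file.pl actually does; the real script reads the fields from the file and may treat punctuation or accented characters differently. library_filename is a hypothetical name.

```shell
# Hypothetical illustration of the Author__Title.xml pattern seen on this page.
# Assumption: spaces become underscores and a double underscore separates
# author and title. The real script may apply further rules.
library_filename() {
    author=$(printf '%s' "$1" | tr ' ' '_')
    title=$(printf '%s' "$2" | tr ' ' '_')
    printf '%s__%s.xml\n' "$author" "$title"
}

library_filename "Wolfi Landstreicher" "Critical Thinking as an Anarchist Weapon"
# prints Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml
```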
rearrangethefootnotes.pl
Another useful script is utils/rearrangethefootnotes.pl, a Perl script that gets the numbering of the footnotes right, assuming that they follow the Guidelines. (Say you added a note in the middle and want to get the numbering right again.)
rearrangethefootnotes.pl file.xml
Will output file.xmlfixed with the footnotes reordered.
Create the output formats
This is normally done on the server, but you could do it locally to check that everything is fine.
alprocessor-ng.pl --force --prefix="/tmp/" --formats "a4" Author__Title.xml
tidy -quiet -utf8 -e /tmp/HTML/Author__Title.html
Optionally, you can output and check the epub.
First, install this java piece of code: http://code.google.com/p/epubcheck/, say in ~/usr/epubcheck:
mkdir -p ~/usr/epubcheck
cd ~/usr/epubcheck
wget http://epubcheck.googlecode.com/files/epubcheck-1.2.zip
unzip epubcheck-1.2.zip
Build and test:
tal2epub-ng.pl Author__Title.xml
java -jar $HOME/usr/epubcheck/epubcheck-1.2.jar Author__Title.epub
This testing routine is wrapped in the testalentry.sh script, which at this point should be in your path.
Example of a successful testing session:
user@machine:/tmp$ testalentry.sh Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml
HTML output in /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html
Epubcheck Version 1.2
No errors or warnings detected
ePUB output in /tmp/epub/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.epub
* Building the pdfs for Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon
** Compiling the a4
context --batchmode --noconsole --arguments=minsignature=30,maxsignature=84 --mode=a4 Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.tex
** Overfull boxes: 36
** I needed to add 3 pages
** Compiling the a4_imposed
context --batchmode --noconsole --arguments=minsignature=30,maxsignature=84 --mode=a4imposed Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.tex
** Overfull boxes: 20
** I needed to add 3 pages
maxpage: 44 pages: 44 signature: 44
Output in Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.pdf
PDFs in /tmp/pdfs/
Example of a broken file that will jam the library system if it makes its way to the master branch of the library git:
user@machine:/tmp$ testalentry.sh Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml
HTML output in /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html
line 138 column 1 - Warning: missing </em> before </p>
line 141 column 1 - Warning: inserting implicit <em>
line 138 column 1 - Warning: trimming empty <em>
line 137 column 1 - Warning: trimming empty <p>
There are markup errors. So open /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html, find the line and the column reported by the checker, and find out the error. Most likely, there's an open tag that wasn't properly closed.
Fix it in the source file (the .xml) and retry. Repeat the procedure until you get 0 errors.
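Since the usual culprit is an opening tag that was never closed, comparing the number of opening and closing tags can narrow things down before you read through the file. The following is a rough sketch, not part of the library tools: count_tag is a hypothetical helper, and it only counts occurrences, so it cannot tell you where the imbalance is or catch badly nested tags.

```shell
# Hypothetical helper (not part of the library tools):
# print how many opening and closing tags of one kind a file contains.
count_tag() {
    file=$1
    tag=$2
    # "<tag>" or "<tag " counts as an opening tag, "</tag>" as a closing one.
    open=$(grep -o "<$tag[ >]" "$file" | wc -l | tr -d ' ')
    close=$(grep -o "</$tag>" "$file" | wc -l | tr -d ' ')
    echo "$tag: $open opened, $close closed"
}

# Demonstration on a throwaway file with one unclosed <em>.
sample=$(mktemp)
printf '<p><em>one</em> and <em>two</p>\n' > "$sample"
count_tag "$sample" em    # prints em: 2 opened, 1 closed
count_tag "$sample" p     # prints p: 1 opened, 1 closed
rm -f "$sample"
```

If the two numbers differ for a tag, that tag is a good place to start looking around the line reported by the checker.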
Advanced
If you have git access, check out GitMaintainerRoutine
