Uploading texts to the library with a plain cut-and-paste is a bad idea, because you lose the emphasis and the logical markup (headings, blockquotes, lists). Unless the original piece is very simple, with no emphasized words or chapter divisions, the text uploaded to the library will be broken, because pieces were lost in the process.

Having a *nix box (such as GNU/Linux, in whatever flavour) is strongly recommended. The days when GNU/Linux was a geek-only thing are long past. If you want to do some serious work, set up a dual boot with Ubuntu. If not, you're on your own, and you'll find yourself doing repetitive tasks (such as search and replace) by hand for no reason.

The instructions below have been tested. You should be able to copy and paste them without errors.

Get an editor

The web form is not adequate for working comfortably on the texts. You can install these two pieces of software:

* It's All Text!: https://addons.mozilla.org/en-US/firefox/addon/4125/ A Firefox add-on that lets you edit a text area with a real editor (or grab its content without pain).

sudo apt-get install xul-ext-itsalltext

* Emacs: http://www.gnu.org/software/emacs/ This is a huge and complex program, but it has menus and icons, so you won't feel lost. See also EditingTips. To get tag highlighting, type M-x sgml-mode (hold the Alt key down, press x, release Alt, type sgml-mode and press Enter).

sudo apt-get install emacs23 emacs23-common-non-dfsg emacs23-el

Other editors (gedit, kate, scite, vim, whatever) are fine too. Just be sure to set the syntax highlighting to HTML.

Install the library tools

Download and install the tools:

mkdir $HOME/anarcholib
cd $HOME/anarcholib
git clone git://gitorious.org/the-anarchist-library-tools/the-anarchist-library-tools.git
sudo apt-get install libdate-calc-perl libhtml-parser-perl tidy
cd the-anarchist-library-tools
cd AnAnarchistLibrary
perl Makefile.PL
make
sudo make install
mkdir -p ~/bin
cd ~/bin
for i in ~/anarcholib/the-anarchist-library-tools/{bin,utils}/* ; do ln -s $i ; done
cd ~

Many GNU/Linux distributions add ~/bin to the user's PATH (the list of directories searched for executables) if that directory exists. So open another terminal and check:

which talimporter.pl

If it prints the full path of the script, you're done. If not, you need to tweak some files. Run the following commands:

echo 'if [ -f ~/.bashrc ]; then source ~/.bashrc; fi' >> ~/.bash_profile
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
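If you run the echo commands above more than once, ~/.bashrc and ~/.bash_profile collect duplicate lines. A small helper like the following only appends a line when that exact line isn't already in the file. It's a sketch: append_once is a made-up name, not one of the library tools.

```shell
# append_once LINE FILE: add LINE to FILE unless FILE already contains
# exactly that line. A made-up helper, not part of the library tools.
append_once() {
    local line="$1" file="$2"
    touch "$file"                       # make sure the file exists
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Used in place of the two echo commands (the single quotes keep $HOME and $PATH unexpanded, as in the original lines):

append_once 'if [ -f ~/.bashrc ]; then source ~/.bashrc; fi' ~/.bash_profile
append_once 'export PATH=$HOME/bin:$PATH' ~/.bashrc
source ~/.bashrc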

Check again whether which talimporter.pl returns something. If not, ask your local Unix guru (or come to IRC). What you need is for the executables in ~/anarcholib/the-anarchist-library-tools/utils and ~/anarcholib/the-anarchist-library-tools/bin to be somewhere in your PATH. If nothing else works, do this (guaranteed to work):

sudo cp ~/anarcholib/the-anarchist-library-tools/{bin,utils}/* /usr/local/bin

== Optional ConTeXt installation ==

If you want to produce the PDF locally, you need to install ConTeXt.

mkdir -p  $HOME/usr/context
cd $HOME/usr/context
rsync -av rsync://contextgarden.net/minimals/setup/first-setup.sh .
sh ./first-setup.sh --context=current
source $HOME/usr/context/tex/setuptex
echo "export PATH=$(dirname $(which context)):\$PATH" >> ~/.bashrc
echo "export TEXROOT=\$HOME/usr/context/tex" >> ~/.bashrc
echo "export TEXMFOS=$TEXMFOS" >> ~/.bashrc
source ~/.bashrc

If in doubt, check http://wiki.contextgarden.net/ConTeXt_Standalone

Make sure that context is in your PATH (open another terminal):

which context

Now install the library's auxiliary files for ConTeXt into texmf-local (this assumes ConTeXt is installed in ~/usr/context, as above):

rsync -avh ~/anarcholib/the-anarchist-library-tools/texmf-local/ \
       ~/usr/context/tex/texmf-local/
mktexlsr
mtxrun --script fonts --reload
context --generate
mtxrun --script fonts --reload

Now everything should be set up.
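Before going on, you can check in one go that the commands used on this page are reachable. The function below is only a sketch (check_tools is a made-up name): it relies on the shell's own command -v and reports whatever is missing. Leave out context and mtxrun if you skipped the ConTeXt step.

```shell
# check_tools CMD...: report, for each command, whether the shell can find
# it in PATH; returns the number of missing commands (0 means all found).
# A made-up helper, not one of the library tools.
check_tools() {
    local missing=0 cmd where
    for cmd in "$@"; do
        if where=$(command -v "$cmd"); then
            printf 'ok       %s -> %s\n' "$cmd" "$where"
        else
            printf 'MISSING  %s\n' "$cmd"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example:

check_tools talimporter.pl properly-rename-file.pl rearrangethefootnotes.pl alprocessor-ng.pl tal2epub-ng.pl testalentry.sh tidy context mtxrun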

Retrieve the page

talimporter.pl

As of September 2011, the preferred way to import texts from other sources is talimporter.pl. From its help:

Usage: talimporter.pl [ options ] <url or file>

The url or the file provided is cleaned up, interesting tags are preserved and normalized. It outputs to the standard output (so if you want to save it you have to redirect the output to a file).

Example:

     talimporter.pl http://url.org/file.html > myfile.xml

Options:

   --encoding (default utf-8): the default will work for a lot of
     site. If you see garbage in the output, often this means the
     encoding is something else then utf-8 (which is what is used
     nowadays). Look at the source and specify the encoding. See man
     Encode::Supported for the possible options. Very often is just
     latin1 or iso-8859-1 (for western countries).

     If it's still messed up, contact the author
     marco -at- theanarchistlibrary.org

   --lang 

     (default "en") specify the language of the document. (like "en",
     "it", "ru"). This actually doesn't do much. It just set the
     language property in the output file.
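To find out which encoding a page declares, you can save it and look for the charset in its markup. A rough sketch (declared_charset is a made-up helper; it only reports what the page claims, and pages sometimes lie):

```shell
# declared_charset FILE: print the first charset declared in FILE's markup,
# e.g. from <meta charset="utf-8"> or a Content-Type meta tag.
# A made-up helper, not one of the library tools.
declared_charset() {
    grep -oiE "charset=[\"']?[A-Za-z0-9_.:-]+" "$1" \
        | head -n 1 \
        | sed -E "s/^[^=]*=[\"']?//"
}
```

For example, save the page first and then inspect it:

wget -q -O page.html http://url.org/file.html
declared_charset page.html

Then pass what it prints to --encoding (talimporter.pl also accepts a local file); if the name isn't accepted, look it up in man Encode::Supported.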

Another example (with the encoding):

talimporter.pl --encoding latin1 \
  http://www.marxists.org/reference/archive/guillaume/works/bakunin.htm > my.xml

The text should be almost ready for upload, but you still need to remove menus, titles and comments left over from the page, and fix notes, chapters, typos and so on. Still, the script does a nice job: it fixes annoying things like quotes, preserves italics and similar markup, and adds the header.

In particular, if you want to compile the file, you have to fill in the various fields in the header of the output file.

Finally, you can use the webform to upload the cleaned and fixed file.

properly-rename-file.pl

Once you have filled in the author and title fields in myfile.xml, rename it according to the library's naming standard:

properly-rename-file.pl myfile.xml
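The resulting names follow the Author__Title.xml pattern visible in the examples further down (e.g. Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml). Purely to illustrate that pattern, and NOT as a description of what properly-rename-file.pl does internally (it may well handle punctuation and accented letters differently), here's a sketch with a made-up helper:

```shell
# library_filename AUTHOR TITLE: illustrate the Author__Title.xml pattern
# (spaces become underscores, author and title joined by "__").
# A made-up helper; properly-rename-file.pl may do more than this.
library_filename() {
    local author title
    author=$(printf '%s' "$1" | tr ' ' '_')
    title=$(printf '%s' "$2" | tr ' ' '_')
    printf '%s__%s.xml\n' "$author" "$title"
}
```

For example, library_filename 'Wolfi Landstreicher' 'Critical Thinking as an Anarchist Weapon' prints Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml.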

rearrangethefootnotes.pl

Another useful script is utils/rearrangethefootnotes.pl, a Perl script that fixes the numbering of the footnotes, assuming they follow the Guidelines. This is handy when, say, you added a note in the middle and want the numbers to be right again.

It outputs file.xmlfixed, with the footnotes reordered.
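Before replacing your file with the fixed one, it's worth looking at what changed. A sketch, assuming the script has already written file.xmlfixed next to file.xml (footnote_fix is a made-up helper, not one of the library tools):

```shell
# footnote_fix FILE [--apply]: compare FILE with the FILEfixed written by
# rearrangethefootnotes.pl; with --apply, back FILE up as FILE.bak and
# replace it with the fixed version. A made-up helper.
footnote_fix() {
    local src="$1" fixed="${1}fixed"
    if [ ! -f "$fixed" ]; then
        echo "no $fixed found: run rearrangethefootnotes.pl first" >&2
        return 1
    fi
    if [ "${2:-}" = "--apply" ]; then
        cp -- "$src" "$src.bak"     # keep the original, just in case
        mv -- "$fixed" "$src"
    else
        diff -u -- "$src" "$fixed"  # exits 1 when the files differ
    fi
}
```

For example:

footnote_fix file.xml
footnote_fix file.xml --apply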

Create the output formats

This is normally done on the server, but you can do it locally to check that everything is fine.

alprocessor-ng.pl --force --prefix="/tmp/" --formats "a4" Author__Title.xml
tidy -quiet -utf8 -e /tmp/HTML/Author__Title.html
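tidy's exit status tells you whether the HTML is clean: 0 means no warnings or errors, 1 means warnings, 2 means errors. A sketch of a wrapper (check_html is a made-up name) that runs the same tidy command and adds a one-line verdict:

```shell
# check_html FILE: run tidy in errors-only mode, as in the command above,
# and print a verdict based on its exit status. A made-up helper.
check_html() {
    local rc=0
    tidy -quiet -utf8 -e "$1" || rc=$?   # tidy's messages go to stderr
    case "$rc" in
        0) echo "clean: $1" ;;
        1) echo "warnings in $1: fix them before uploading" ;;
        *) echo "errors in $1: fix them before uploading" ;;
    esac
    return "$rc"
}
```

For example: check_html /tmp/HTML/Author__Title.html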

Optionally, you can also build and check the EPUB.

First, install epubcheck, a Java tool (http://code.google.com/p/epubcheck/), for example in ~/usr/epubcheck:

mkdir -p ~/usr/epubcheck
cd ~/usr/epubcheck
wget http://epubcheck.googlecode.com/files/epubcheck-1.2.zip
unzip epubcheck-1.2.zip

Build and test:

tal2epub-ng.pl Author__Title.xml
java -jar $HOME/usr/epubcheck/epubcheck-1.2.jar  Author__Title.epub 

The whole testing routine is wrapped in the testalentry.sh script, which should be in your PATH by now.

Example of a successful testing session:

user@machine:/tmp$ testalentry.sh Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml 

HTML output in /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html
Epubcheck Version 1.2

No errors or warnings detected
ePUB output in /tmp/epub/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.epub
* Building the pdfs for Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon
** Compiling the a4
context --batchmode --noconsole --arguments=minsignature=30,maxsignature=84 --mode=a4 Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.tex

** Overfull boxes: 36 **
I needed to add 3 pages
** Compiling the a4_imposed
context --batchmode --noconsole --arguments=minsignature=30,maxsignature=84 --mode=a4imposed Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.tex

** Overfull boxes: 20 **
I needed to add 3 pages
maxpage: 44
pages: 44 
signature: 44

Output in Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.pdf
PDFs in /tmp/pdfs/

Example of a broken file, which would jam the library system if it made its way into the master branch of the library's git repository:

user@machine:/tmp$ testalentry.sh Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.xml 

HTML output in /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html
line 138 column 1 - Warning: missing </em> before </p>
line 141 column 1 - Warning: inserting implicit <em>
line 138 column 1 - Warning: trimming empty <em>
line 137 column 1 - Warning: trimming empty <p>

These are markup errors. Open /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html, go to the line and column reported by the checker, and work out what is wrong. Most likely there's an open tag that wasn't properly closed.
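To look at the reported spot without opening an editor, you can print the offending line with its neighbours and a caret under the column. A sketch (show_error is a made-up name; the caret only lines up if the line contains no tabs):

```shell
# show_error FILE LINE COL: print LINE of FILE with one line of context on
# each side, and a caret under column COL. A made-up helper.
show_error() {
    local file="$1" line="$2" col="$3"
    awk -v l="$line" -v c="$col" '
        NR < l - 1 { next }            # skip everything before the context
        NR > l + 1 { exit }            # stop after the line following LINE
        {
            printf "%6d  %s\n", NR, $0
            if (NR == l) {
                pad = ""
                for (i = 1; i < c; i++) pad = pad " "
                printf "%6s  %s^\n", "", pad
            }
        }
    ' "$file"
}
```

For example, for the first warning above:

show_error /tmp/HTML/Wolfi_Landstreicher__Critical_Thinking_as_an_Anarchist_Weapon.html 138 1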

Fix it in the source file (the .xml) and retry. Repeat the procedure until the checker reports no errors or warnings.

Advanced

If you have git access, check out GitMaintainerRoutine.

EditingHowTo (last edited 2011-09-15 10:22:11 by marco)