Here is a representation of Soho's process:
Soho lets you define filters that the page builder applies either before the reST-to-HTML conversion (pre-filters) or after the page has been rendered through the template (post-filters).
Filters can change the content of the page: beautify the text by applying typographic rules, add or modify content, etc.
A filter is a function that takes one parameter (the text itself) and returns a (possibly) modified version of this text. For example, we could define a filter that fixes the spelling of words, using British English rules. Putting this piece of code in a myfilters.py file would do it:
    def useBritishSpelling(text):
        text = text.replace('licence', 'license')
        ## ... (other replacements)
        return text

    pre_filters = (useBritishSpelling, )
    post_filters = ()
This file is a normal Python module, so you can use other Python modules and packages (e.g. the re module), as usual. Just make sure that it defines a pre_filters variable, a post_filters variable, or both.
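To illustrate the use of the re module in a filters file, here is a minimal sketch written for Python 3. The function name useBritishSpellingRe and the replacement table are hypothetical and not part of Soho; the point is only that word boundaries make the replacement safer than a plain str.replace:

```python
import re

# Hypothetical replacement table (American -> British); extend as needed.
_REPLACEMENTS = {
    'color': 'colour',
    'center': 'centre',
}

# One pattern that matches any key of the table, as a whole word only.
_PATTERN = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, _REPLACEMENTS)))


def useBritishSpellingRe(text):
    """Replace whole-word matches; longer words containing a key are left alone."""
    return _PATTERN.sub(lambda match: _REPLACEMENTS[match.group(1)], text)


pre_filters = (useBritishSpellingRe, )
post_filters = ()
```

The match is case-sensitive, so capitalized words would need their own entries in the table.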
When you are done with this file, you can include it in the configuration file with the following statement:
filters = /path/to/myfilters.py
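How Soho itself loads and runs the filters module is not shown here. The following Python 3 sketch only illustrates one way a builder could do it, assuming the module is loaded from its path and each filter is called in order. The names load_filters and apply_filters are hypothetical:

```python
import importlib.util


def load_filters(path):
    """Load a filters module from a file path and return (pre, post) tuples."""
    spec = importlib.util.spec_from_file_location('user_filters', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Either variable may be missing, so fall back to an empty tuple.
    pre = tuple(getattr(module, 'pre_filters', ()))
    post = tuple(getattr(module, 'post_filters', ()))
    return pre, post


def apply_filters(text, filters):
    """Pass the text through each filter in order and return the result."""
    for filter_func in filters:
        text = filter_func(text)
    return text
```

With such helpers, a builder would call apply_filters(source, pre) before the reST conversion and apply_filters(html, post) on the rendered page.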
Soho comes with built-in filters.
>>> from soho.filters import *
As always, there is a dummy filter, which does nothing:
    >>> dummy('While my guitar gently weeps')
    'While my guitar gently weeps'
If you want to use this filter (or any other built-in filter), just use this in your custom filters module:
    from soho.filters import dummy

    pre_filters = (dummy, )
The somewhat misnamed useHTMLentity filter replaces some characters with their equivalent HTML entity:
    >>> useHTMLentity('Once upon a time in the West...')
    'Once upon a time in the West&hellip;'
reST offers no convenient way to insert non-breaking spaces. Fortunately for typography maniacs (and I am one), there is a filter that applies French typography rules:
    >>> ## Opening and closing guillemets
    >>> applyFrenchTypographyRules(u"\xab Mes souliers sont rouges \xbb, s'exclama-t-il !")
    u"\xab&nbsp;Mes souliers sont rouges&nbsp;\xbb, s'exclama-t-il&nbsp;!"
    >>> applyFrenchTypographyRules("C'est extraordinaire ! N'est-ce pas ?!")
    "C'est extraordinaire&nbsp;! N'est-ce pas&nbsp;?!"
    >>> applyFrenchTypographyRules('Oui ; et jamais deux sans trois')
    'Oui&nbsp;; et jamais deux sans trois'
    >>> applyFrenchTypographyRules('Oui : je le ferai.')
    'Oui&nbsp;: je le ferai.'
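The idea behind such a filter can be sketched in a few lines of Python 3. This is not Soho's implementation: the function name frenchSpacingSketch is hypothetical, and the sketch assumes that non-breaking spaces are written as the HTML entity &nbsp; and that only guillemets and the marks ; : ! ? are concerned:

```python
import re

# Assumption: non-breaking spaces are emitted as an HTML entity.
NBSP = '&nbsp;'

# An ordinary space followed by a closing guillemet or by ; : ! ?
_SPACE_BEFORE_MARK = re.compile(r' ([\xbb;:!?])')


def frenchSpacingSketch(text):
    """Replace ordinary spaces next to French punctuation with NBSP."""
    # Space right after an opening guillemet.
    text = text.replace('\xab ', '\xab' + NBSP)
    # Space right before a closing guillemet or a double punctuation mark.
    return _SPACE_BEFORE_MARK.sub(NBSP + r'\1', text)
```

A sketch this simple also rewrites matching spaces inside code samples or attribute values, so a real filter would need to be more careful.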
When you write documentation (for a program, say), you often link to other files that are themselves reStructuredText sources. When you generate the HTML site, it is convenient to have those links converted automatically so that they point to the generated HTML files. The changeLinksFromTxtToHTML filter does this:
    >>> text = '''\
    ... This is a `link`_. This is `another link`_.
    ...
    ... .. _link: linked.txt
    ... .. _another link: linked2.txt
    ... '''
    >>> print changeLinksFromTxtToHTML(text)
    This is a `link`_. This is `another link`_.
    <BLANKLINE>
    .. _link: linked.html
    .. _another link: linked2.html
    <BLANKLINE>
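For readers curious about how such a conversion can work, here is a minimal Python 3 sketch. It is not the code of changeLinksFromTxtToHTML: the name txtTargetsToHtml is hypothetical, and the sketch assumes that only explicit hyperlink targets of the form ".. _name: file.txt" need rewriting:

```python
import re

# Matches a reST hyperlink target such as '.. _name: path/file.txt' and
# captures everything up to (but excluding) the '.txt' extension.
_TXT_TARGET = re.compile(r'^(\.\. _[^:]+: \S+)\.txt[ \t]*$', re.MULTILINE)


def txtTargetsToHtml(text):
    """Point '.txt' hyperlink targets to the corresponding '.html' file."""
    return _TXT_TARGET.sub(r'\1.html', text)
```

Note that this sketch would also rewrite an external URL ending in .txt, which may or may not be what you want.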
Docutils and other tools generate XHTML-style tags that close themselves (a.k.a. short tags). This can be a problem if you want to produce plain HTML, since this syntax is not HTML-compatible. Fortunately, you can use the replaceXHTMLShortTags filter:
    >>> replaceXHTMLShortTags('<img src="foo.png" />')
    '<img src="foo.png">'
    >>> replaceXHTMLShortTags('<br/>')
    '<br>'
Note that you should use this function as a post-filter, since it processes HTML code.
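As an illustration of what such a post-filter involves, here is a minimal Python 3 sketch. It is not Soho's implementation: the name shortTagsToHtmlSketch is hypothetical, and the sketch simply assumes that every "/>" sequence in the page ends a short tag:

```python
import re

# Optional whitespace followed by the '/>' that ends a short tag.
_SHORT_TAG_END = re.compile(r'\s*/>')


def shortTagsToHtmlSketch(html):
    """Turn '<br/>' or '<img ... />' into '<br>' or '<img ...>'."""
    return _SHORT_TAG_END.sub('>', html)


# The filter works on HTML, so it is registered as a post-filter.
pre_filters = ()
post_filters = (shortTagsToHtmlSketch, )
```

Because the sketch does not parse the page, a "/>" sequence appearing inside a script or an attribute value would be rewritten as well.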