Mediawiki2HTML machine
I’d like some help here.
I’m trying to build a MediaWiki-to-HTML converter as a PHP function. I’ve reached some goals already, but I’d like to ask the community to help perfect it, because mediawiki2html is of public interest and other projects out there (e.g. the Mediawiki2pdf export) could benefit from it. My main problem is the tables.
Links and other projects
- http://meta.wikimedia.org/wiki/Help:MediaWiki-Werkzeuge (is there an English version somewhere?)
- http://meta.wikimedia.org/wiki/HTML2FPDF_and_Mediawiki HTML2FPDF and Mediawiki: converting wiki markup to pdf with PHP, but using the mediawiki engine.
- http://meta.wikimedia.org/wiki/WINOR Offline reader project, but it is dead.
- http://cvs.sourceforge.net/viewcvs.py/wikipedia/phpwiki/newcodebase/OutputPage.php OutputPage, but it uses MySQL and the MediaWiki engine.
- http://meta.wikimedia.org/wiki/Alternative_parsers
I haven’t looked through this one yet.
What’s special about this project
- I don’t want to run the Mediawiki engine. This is intended to be a true alternative parser.
- It’s in PHP, without MySQL, Perl or some other strange stuff...
This project should convert MediaWiki markup to HTML without the use of the MediaWiki engine.
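To make the "no engine" idea concrete, here is a minimal sketch of how a small subset of inline markup (headings, bold, italic, internal links) could be converted with plain regular expressions. It is not the project's actual code: the function name `wiki_inline_to_html` and the relative link targets are assumptions made for illustration, and it needs PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not the project's code): converts a small subset of
// MediaWiki inline markup to HTML with regular expressions only.
function wiki_inline_to_html(string $text): string
{
    // Escape <, > and & first; quotes stay untouched so '' and ''' survive.
    $html = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

    // Headings: "== Title ==" up to "====== Title ======", one per line.
    $html = preg_replace_callback(
        '/^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/m',
        function (array $m): string {
            $level = strlen($m[1]);
            return "<h$level>{$m[2]}</h$level>";
        },
        $html
    );

    // Longest quote runs first: bold+italic, then bold, then italic.
    $html = preg_replace("/'''''(.+?)'''''/", '<b><i>$1</i></b>', $html);
    $html = preg_replace("/'''(.+?)'''/", '<b>$1</b>', $html);
    $html = preg_replace("/''(.+?)''/", '<i>$1</i>', $html);

    // Internal links: [[Target]] or [[Target|label]].
    // Assumption: the link target is used as a relative URL.
    $html = preg_replace_callback(
        '/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/',
        function (array $m): string {
            $target = str_replace(' ', '_', trim($m[1]));
            $label = $m[2] ?? $m[1];
            return '<a href="' . $target . '">' . $label . '</a>';
        },
        $html
    );

    return $html;
}

// Usage example
echo wiki_inline_to_html("== Einsteinium ==\n'''Einsteinium''' is a ''synthetic'' [[chemical element|element]]."), "\n";
```

Templates, lists, external links and `<nowiki>` are deliberately left out; each of them would need its own pass.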
The Code
- See code to browse the source online.
- You can also download a zip file of it here.
Open TODO items
I need some help with the table conversion function, especially for the wiki pages of chemical elements. You can find the table markup definition at http://meta.wikimedia.org/wiki/Help:Table
Here is some test table data:
| Table test data | Current rendering result | Test description |
|---|---|---|
| table-example-1.txt | rendering 1 | Einsteinium: 1 simplified table (css stripped) |
| table-example-2.txt | rendering 2 | Einsteinium: 3 tables cascaded |
| table-example-3.txt | rendering 3 | Einsteinium: 1 table |
| table-example-4.txt | rendering 4 | Isotope table (more columns) |
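As a starting point for the table problem, the sketch below walks the markup line by line and keeps one state entry per open table, so a nested `{|` simply stays inside the cell that is currently open. It is only a sketch under assumptions, not the project's converter: the function names are hypothetical, all cell and table attributes (including styles) are dropped rather than translated, and it needs PHP 7.1 or later.

```php
<?php
// Hypothetical sketch (not the project's code): line-based conversion of
// MediaWiki table markup ({|, |+, |-, !, |, |}). Attributes are discarded.

function cell_text(string $cell): string
{
    // "attributes | content": keep the content, unless the pipe is part of a [[link|label]].
    $parts = explode('|', $cell, 2);
    $text = (count($parts) === 2 && strpos($parts[0], '[[') === false) ? $parts[1] : $cell;
    return htmlspecialchars(trim($text), ENT_NOQUOTES, 'UTF-8');
}

function close_cell(array &$t, array &$out): void
{
    if ($t['cell'] !== '') {
        $out[] = '</' . $t['cell'] . '>';
        $t['cell'] = '';
    }
}

function close_row(array &$t, array &$out): void
{
    close_cell($t, $out);
    if ($t['row']) {
        $out[] = '</tr>';
        $t['row'] = false;
    }
}

function wiki_table_to_html(string $wiki): string
{
    $out = [];
    $stack = []; // one entry per open table: row open? which cell tag is open?

    foreach (preg_split('/\R/', $wiki) as $line) {
        $trim = trim($line);
        if (strncmp($trim, '{|', 2) === 0) {          // table start (possibly nested)
            $stack[] = ['row' => false, 'cell' => ''];
            $out[] = '<table>';
            continue;
        }
        if ($stack === []) {                           // outside any table: pass through
            $out[] = htmlspecialchars($line, ENT_NOQUOTES, 'UTF-8');
            continue;
        }
        $i = count($stack) - 1;
        if (strncmp($trim, '|}', 2) === 0) {           // table end
            close_row($stack[$i], $out);
            $out[] = '</table>';
            array_pop($stack);
        } elseif (strncmp($trim, '|-', 2) === 0) {     // new row
            close_row($stack[$i], $out);
            $out[] = '<tr>';
            $stack[$i]['row'] = true;
        } elseif (strncmp($trim, '|+', 2) === 0) {     // caption
            close_row($stack[$i], $out);
            $out[] = '<caption>' . cell_text(substr($trim, 2)) . '</caption>';
        } elseif ($trim !== '' && ($trim[0] === '|' || $trim[0] === '!')) {
            $tag = $trim[0] === '!' ? 'th' : 'td';
            $rest = substr($trim, 1);
            // Several cells on one line: "||" for data, "!!" or "||" for headers.
            $cells = $tag === 'th' ? preg_split('/!!|\|\|/', $rest) : explode('||', $rest);
            foreach ($cells as $cell) {
                close_cell($stack[$i], $out);
                if (!$stack[$i]['row']) {              // implicit first row
                    $out[] = '<tr>';
                    $stack[$i]['row'] = true;
                }
                $out[] = "<$tag>" . cell_text($cell);
                $stack[$i]['cell'] = $tag;
            }
        } else {                                       // continuation of the open cell
            $out[] = htmlspecialchars($trim, ENT_NOQUOTES, 'UTF-8');
        }
    }
    return implode("\n", $out);
}

// Usage example
echo wiki_table_to_html("{| class=\"wikitable\"\n|+ Einsteinium\n! Symbol !! Number\n|-\n| Es || 99\n|}"), "\n";
```

Keeping the per-table state on a stack is what lets the cascaded Einsteinium tables close in the right order; preserving a whitelist of attributes instead of dropping them all would be the next step.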
Any help is appreciated. This project is now licensed under the GPL.
Your comments / tips
- Maybe I should strip out all the style attributes, but that breaks the design... Anyway, we can agree that not *all* Wikipedia pages are in excellent condition as far as wiki markup / HTML is concerned. -- Johannes Buchner 2006/04/04 10:22
- Visit mediawiki2pdf, a new web service which uses MediaWiki HTML to generate nice-looking PDF documents -- robert@blogpaper.com 2008/04/08 13:54