I have a similar problem: I need to extract the text of a magazine composed in QuarkXPress (I don't know which version) for use in an ebook project.
The publisher works "design first", which means the latest, up-to-date content lives in the QuarkXPress layout and not in the original .doc or .rtf files. Scraping the text from the output PDF is slow, painful, and error-prone. I don't have QuarkXPress myself, so I must instruct my contact at the publisher to export the magazine in some sort of intermediate format.
In this forum post I found that there is an HTML/XHTML export function. Is that correct?
http://forums.quark.com/t/20543.aspx
If you just need HTML or XHTML, then it is easily created out of QuarkXPress 7 or 8:
- Convert your print layout into a web layout (layout > duplicate).[...]
- Export as HTML or XHTML (file > export > HTML)
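
If the export works as described in that post, the ebook text could then be pulled out of the resulting HTML with a short script. Below is a minimal sketch in Python using only the standard library. I am assuming the export is ordinary HTML; the class name, the function name, and the list of block-level tags are my own choices and may need adjusting to whatever markup QuarkXPress actually produces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, starting a new line at block-level tags."""

    # Assumed set of tags that should break lines; adjust to the real export.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Content of these tags is never reader-visible text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello &amp; <b>world</b></p></body></html>"
    print(html_to_text(sample))
```

This only recovers plain text in document order; styles, images, and the reading order of text boxes would still have to be checked by hand against the printed layout.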