
Citizens With Disabilities - Ontario


Structured Accessible HTML From PDF

There are hundreds of millions of PDF files on the web. The two main reasons for this popularity are:

Unfortunately PDF files on the web can be problematic for people with disabilities, especially users of screen-readers because:

Adobe have defined an extension to the PDF format that gives screen-readers more information, such as alternative text for images and heading levels to aid navigation around the document. Most existing PDF files were not created as accessible PDF, and converting existing documents is complex and not always achievable. Creating new documents as accessible PDF is perfectly possible and straightforward, but it requires specific tools and the understanding and cooperation of the document originators. It is therefore inevitable that many new documents will be produced that are not accessible.

PDF documents are an ideal format for downloading from the web and printing out, but for all of the reasons above there is a need to provide these documents in an alternative format. The obvious alternative is to make the document available in HTML that is designed for users who are blind or have a vision impairment. Such a user is not interested in the document looking identical to the original but needs a document that can be read efficiently using a screen reader; to do this the document must:

There are a number of PDF-to-HTML converters available, but I believe that the recently announced RiverDocs Converter is the first aimed specifically at creating structured, accessible HTML documents that are optimised for screen-reader usage.

The converter will take any PDF document, analyse it to recognise multi-column pages, headings, tables, images and other formatting, and convert it all into XHTML. Correctly recognising text that wraps around a picture, or the cells in a table, requires sophisticated artificial intelligence algorithms.
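To give a feel for why structure recognition is hard, here is a deliberately naive sketch of one small part of the problem: guessing heading levels from font size. It is an illustration only and not how RiverDocs Converter works. The function name `assign_heading_levels` and the input format, a list of `(text, font_size)` pairs already extracted from a page, are assumptions made for this example.

```python
from collections import Counter


def assign_heading_levels(runs):
    """Map (text, font_size) runs to 'h1'..'h6' or 'p' by relative size.

    Hypothetical helper for illustration; real converters need far more
    evidence (position, weight, spacing, numbering) than font size alone.
    """
    if not runs:
        return []

    # Treat the font size that covers the most characters as body text.
    chars_per_size = Counter()
    for text, size in runs:
        chars_per_size[size] += len(text)
    body_size = chars_per_size.most_common(1)[0][0]

    # Distinct sizes larger than body text, largest first, become h1, h2, ...
    # Anything beyond six distinct sizes is clamped to h6.
    heading_sizes = sorted({size for _, size in runs if size > body_size},
                           reverse=True)
    level_for_size = {size: min(rank + 1, 6)
                      for rank, size in enumerate(heading_sizes)}

    tagged = []
    for text, size in runs:
        level = level_for_size.get(size)
        tagged.append((f"h{level}" if level else "p", text))
    return tagged
```

A heuristic like this breaks down quickly, for example on a pull quote set in large type or a heading set in bold at body size, which is why the real task calls for something more sophisticated.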

Having completed the conversion, it checks the output for accessibility issues that could not be fixed automatically. The most obvious issue is images that lack a description in the alt attribute.
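The kind of check described here can be sketched in a few lines. The example below uses Python's standard `html.parser` module to list images that have no alternative text attribute at all. It is a minimal illustration, not the product's own checker; the names `MissingAltChecker` and `find_images_without_alt` are made up for this example.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collect <img> elements that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.issues = []  # (line, column, src) for each offending image

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <img ... />
        # through this method by default.
        if tag != "img":
            return
        attr_map = dict(attrs)
        # An empty alt="" is legitimate for purely decorative images,
        # so only a completely absent attribute is reported.
        if "alt" not in attr_map:
            line, column = self.getpos()
            self.issues.append((line, column, attr_map.get("src")))


def find_images_without_alt(xhtml):
    """Return a list of (line, column, src) for images lacking alt."""
    checker = MissingAltChecker()
    checker.feed(xhtml)
    checker.close()
    return checker.issues
```

A check like this can only flag the gap. Writing a useful description still needs a person who can see the image and understands its role in the document.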

The product's user interface shows the list of issues alongside the relevant sections of the original PDF file, the generated XHTML and a preview of the document in a browser. Clicking on an issue moves the preview to the place where the issue occurs, so that the user can fix the problem in context.

The final output will be a well-structured and annotated document that will give a blind user an excellent experience whilst reading the document.

The UK Disability Equality Duty, which I discussed in a recent blog, has put significant pressure on public authorities and their suppliers to ensure that all the content of their web sites is accessible. Providing structured, accessible XHTML versions of all the PDF files is considered to be the only way to comply with the Duty.

The volume and size of the files that need to be converted have meant that the authorities have outsourced this task to specialist web agencies. RiverDocs Converter automates most of the conversion process, which means that an agency using it should be able to submit a very competitive bid.

RiverDocs Converter should appeal to any organisation that has a large number of existing documents that need to be made accessible, or that publishes new documents that are not created to be accessible and will need conversion.

Thursday, March 29, 2007
By Peter Abrahams

Peter Abrahams, Practice Leader, Accessibility and Usability, Bloor Research
Published: 29th March 2007

