Channel: html2openxml Discussions Rss Feed

New Post: OpenXML Validation patch

First, a bit of background.

We have taken an interest in getting our OpenXML Word documents to pass schema validation. In other words, we want them to pass this check:
    public static void AssertThatOpenXmlDocumentIsValid(WordprocessingDocument wpDoc, string message)
    {
        Check.RequireNotNull(wpDoc);
        var validator = new OpenXmlValidator(FileFormatVersions.Office2010);
        var errors = validator.Validate(wpDoc).ToList();
        ReturnAssertErrorsForOpenXmlValidation(message, errors);
    }
OpenXmlValidator is provided in the Open XML SDK (in the DocumentFormat.OpenXml.Validation namespace).
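For anyone who wants to reproduce the check outside our test harness, here is a minimal sketch of a console program that runs the same validator and prints each error. The class name ValidationDump and the command-line argument are made up for illustration; only the SDK calls (WordprocessingDocument.Open, OpenXmlValidator.Validate, and the ValidationErrorInfo properties) come from the SDK itself.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical helper for illustration; not part of HtmlToOpenXml or the patch.
public static class ValidationDump
{
    public static int Main(string[] args)
    {
        // Assumption: args[0] is the path to a .docx produced by the converter.
        using (var wpDoc = WordprocessingDocument.Open(args[0], false))
        {
            var validator = new OpenXmlValidator(FileFormatVersions.Office2010);
            var errors = validator.Validate(wpDoc).ToList();

            foreach (var error in errors)
            {
                // Description says what the schema expected; Path points at the offending element.
                Console.WriteLine("{0}: {1}", error.ErrorType, error.Description);
                Console.WriteLine("  part: {0}", error.Part != null ? error.Part.Uri.ToString() : "(none)");
                Console.WriteLine("  path: {0}", error.Path != null ? error.Path.XPath : "(none)");
            }

            // Non-zero exit code when the document is not schema valid.
            return errors.Count == 0 ? 0 : 1;
        }
    }
}
```

The XPath in each error is usually enough to tell which parent element has its children in the wrong order.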

The current output of HtmlToOpenXML fails this validation for our not-terribly-complicated HTML input, and we'd like it to pass. The failures are all nitpicky things which the Word application accepts but which fail schema validation, and they revolve around the order of child elements. This article covers it in a bit of detail:

http://blogs.msdn.com/b/brian_jones/archive/2009/01/12/open-xml-sdk-the-basics.aspx

--Quote---
There's even more useful functionality in those four lines. Here's the equivalent without using the first class properties, can you spot what's wrong?
RunProperties rPr = new RunProperties(); 
rPr.AppendChild(new Italic()); 
rPr.AppendChild(new Bold()); 
rPr.AppendChild(new NoProof()); 
This snippet actually creates a schema invalid document. The schema specifies the children of the rPr element as a sequence, so order matters. Bold (w:b) must come before Italics (w:i) for the file to be valid according to its schema. The code snippet using the property assignments gets this right (because the code behind those assignments knows about the order, which the second one just obeys the calls).
--Quote---

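The key is to set the typed child properties instead of just calling AppendChild or passing the children to the constructor, because the property setters place each child at its schema-defined position. For example this: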
new StyleRunProperties(
    new Bold(),
    new BoldComplexScript(),
    new DocumentFormat.OpenXml.Wordprocessing.Color() { Val = "4F81BD", ThemeColor = ThemeColorValues.Accent1 },
    new FontSize { Val = "18" },
    new FontSizeComplexScript { Val = "18" }
)
becomes this:
new StyleRunProperties
{
    Bold = new Bold(),
    BoldComplexScript = new BoldComplexScript(),
    Color = new DocumentFormat.OpenXml.Wordprocessing.Color() { Val = "4F81BD", ThemeColor = ThemeColorValues.Accent1 },
    FontSize = new FontSize { Val = "18" },
    FontSizeComplexScript = new FontSizeComplexScript { Val = "18" }
}
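A quick way to see the difference is to build the same RunProperties both ways and print the child order. This is only a sketch (the class name ChildOrderDemo is made up), and it deliberately assigns the properties in the "wrong" order to show that the order of assignment should not matter.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical demo for illustration; not part of the patch.
public static class ChildOrderDemo
{
    public static void Main()
    {
        // AppendChild keeps the call order: italic ends up before bold.
        var appended = new RunProperties();
        appended.AppendChild(new Italic());
        appended.AppendChild(new Bold());
        appended.AppendChild(new NoProof());

        // Same elements, same call order, but assigned through the typed properties.
        var assigned = new RunProperties();
        assigned.Italic = new Italic();
        assigned.Bold = new Bold();
        assigned.NoProof = new NoProof();

        // Print the local names of the children in document order.
        Console.WriteLine("AppendChild: " + string.Join(", ", appended.ChildElements.Select(e => e.LocalName)));
        Console.WriteLine("Properties:  " + string.Join(", ", assigned.ChildElements.Select(e => e.LocalName)));
    }
}
```

The first line lists the children exactly as appended (i, b, noProof). Going by the quoted article, the second line should list them in schema order, with b ahead of i.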
I would like to tell you this is a 100% complete fix, but there are undoubtedly problem areas in the HtmlToOpenXml code we haven't found yet. We are hoping to get these initial changes included in the trunk to at least make a start on this, and we pledge to bring you any more such changes as we encounter them. Alternatively, we would be happy to work against a test suite, but I didn't see one in the project file we have, and I don't know if you have anything like it already.

It seems I can't attach files here, so here's a link to the patch:

https://www.dropbox.com/s/1zk7r4jqjvn6yu4/HtmlToOpenXML_unified_validation_patch.diff
