Page MenuHomePhabricator

Parsoid outputs <figure> inside of <b>
Closed, ResolvedPublic

Description

Wikitext: '''foo[[File:Image.jpg|thumb]]bar'''

Resulting DOM: <body><b><figure><a><img /></a></figure>bar</b>

Block nodes like figure aren't supposed to occur inside of inline-ish (annotation) tags like b. When confronted with this, VisualEditor alienates the image to preserve its sanity (data model consistency), so bolded images can't be edited in VE.

Reported by mattk_ on IRC.

$ echo "'''foo[[File:Image.jpg|thumb]]bar'''" | node tests/parse.js

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta property="mw:parsoidVersion" content="0"/><link rel="dc:isVersionOf" href="en.wikipedia.org/wiki/Main_Page"/><title></title><base href="en.wikipedia.org/wiki/Main_Page"/><link rel="stylesheet" href="en.wikipedia.org/w/load.php?modules=mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&amp;only=styles&amp;debug=true&amp;skin=vector"/></head><body data-parsoid='{"dsr":[0,37,0,0]}' lang="en" class="mw-content-ltr mw-body-content" dir="ltr"><b data-parsoid='{"dsr":[0,36,3,3]}'>foo<figure class="mw-default-size" typeof="mw:Image/Thumb" data-parsoid='{"optList":[{"ck":"thumbnail","ak":"thumb"}],"dsr":[6,30,2,2]}'><a href="./File:Image.jpg" data-parsoid='{"a":{"href":"./File:Image.jpg"},"sa":{},"dsr":[8,28,null,null]}'><img resource="./File:Image.jpg" src="upload.wikimedia.org/wikipedia/en/thumb/7/78/Image.jpg/220px-Image.jpg" height="30" width="220" data-parsoid='{"a":{"resource":"./File:Image.jpg","height":"30","width":"220"},"sa":{"resource":"File:Image.jpg"}}'/></a></figure>bar</b>
</body></html>


Version: unspecified
Severity: normal

Details

Reference
bz66749

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:20 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz66749.

Not sure this is a bug:

div = document.createElement('div')

<div>​</div>​

div.innerHTML="<b><figure></figure></b>"

"<b><figure></figure></b>"

div.outerHTML

"<div><b><figure></figure></b></div>"

div.innerHTML="<b><div></div></b>"

"<b><div></div></b>"

div.outerHTML

"<div><b><div></div></b></div>"

Note that HTML5 is perfectly happy to let the block element live inside the <b> tag.

At a minimum, we should not create output that has the figcaption inside a figure wrapped into a formatting element.

Yet that's what I seem to get with this input:

<p>'''foo[[File:Image.jpg|thumb|caption]]bar'''

(Taking the bug, since if it ends up being a real bug with our image spec, I'll fix it in Parsoid. It appears that, to the extent this is a real bug, it's a dup of bug 63642 though.)

What Gabriel points out in comment 2 is exactly the same as bug 63642. The DOM I get from <p>'''foo[[File:Image.jpg|thumb|caption]]bar''' looks like <p><b>foo</b></p><figure><b><a><img /></a><figcaption>caption</figcaption></b></figure><p><b>bar</b></p>

(In reply to C. Scott Ananian from comment #1)

Not sure this is a bug:

div = document.createElement('div')

<div>​</div>​

div.innerHTML="<b><figure></figure></b>"

"<b><figure></figure></b>"

div.outerHTML

"<div><b><figure></figure></b></div>"

div.innerHTML="<b><div></div></b>"

"<b><div></div></b>"

div.outerHTML

"<div><b><div></div></b></div>"

Note that HTML5 is perfectly happy to let the block element live inside the
<b> tag.

Yeah, HTML lets you do it, but I don't think it's very sensible, and VE doesn't like it. I also see no reason why the behavior should be radically different depending on whether Parsoid has decided that the text is in a paragraph or free text. In the paragraph case, the bold tag is interrupted (and reapplied inside the figure, but that's bug 63642; really it should be reapplied in the caption instead, or not at all) and the figure isn't bolded, but in the free text case the figure is bolded. Given that the distinction between free text and paragraph text is pretty arbitrary, I don't think this makes much sense.

Speaking of, this paragraph vs free text stuff is also pretty broken. The following wikitext:

'''foo[[File:Image.jpg|thumb|caption]]bar'''

'''foo[[File:Image.jpg|thumb|caption]]bar'''

produces a DOM that looks like <body><b>foo<figure><a><img /></a></figure>bar</b>\n\n<b>foo<figure><a><img /></a></figure>bar</b></body> which is wrong: bar and foo will be on the same line but shouldn't be.

Change 162819 had a related patch set uploaded by Subramanya Sastry:
(Bug 66749) Fix the test for DU.isGeneratedFigure

https://gerrit.wikimedia.org/r/162819

Change 162819 merged by jenkins-bot:
(Bug 66749) Fix the test for DU.isGeneratedFigure

https://gerrit.wikimedia.org/r/162819

(In reply to Gerrit Notification Bot from comment #6)

Change 162819 merged by jenkins-bot:
(Bug 66749) Fix the test for DU.isGeneratedFigure

https://gerrit.wikimedia.org/r/162819

This makes it better, but the resulting HTML still violates the Parsoid HTML spec.

Old output: <body><b>foo<figure><a><img /></a></figure>bar</b></body>

New output: <body><p><b>foo</b></p><figure><a><b><img /></b></a></figure><p><b>bar</b></p></body>

At least the figure isn't bolded any more, but there's still a <b> tag between the <a> and the <img> whereas according to the spec, the structure should be <figure><a><img>

Roan: see below .. I don't see a <b> inside the <a> tag. Can you provide the wikitext where you are seeing the problem html?

[subbu@earth lib] echo "'''foo[[File:Foobar.jpg|thumb|caption]]bar'''" | node parse --normalize

<p><b>foo</b></p>
<figure><a href="File:Foobar.jpg"><img src="//upload.wikimedia.org/wikipedia/commons/3/3a/Foobar.jpg" height="26" width="220"/></a><figcaption><b>caption</b></figcaption></figure>
<p><b>bar</b></p>

(In reply to ssastry from comment #8)

Roan: see below .. I don't see a <b> inside the <a> tag. Can you provide the
wikitext where you are seeing the problem html?

[subbu@earth lib] echo "'''foo[[File:Foobar.jpg|thumb|caption]]bar'''" |
node parse --normalize

<p><b>foo</b></p>
<figure><a href="File:Foobar.jpg"><img
src="//upload.wikimedia.org/wikipedia/commons/3/3a/Foobar.jpg" height="26"
width="220"/></a><figcaption><b>caption</b></figcaption></figure>
<p><b>bar</b></p>

Sorry, I'm blind. You're right, the output is correct now.

Arlolra subscribed.

From the commit message of https://github.com/wikimedia/parsoid/commit/d473791ea982178af7a0fe15aff5cf8e21aaa5e8

However, in T68749, there's a request/argument to avoid nesting blocks
in formatting tags, since VE can't handle that data model violation and
marks the nodes as uneditable. In the past, we've supported that
inconsistently. For example, the case in "1. List embedded in a
formatting tag" is uneditable, despite it being a popular pattern.
In "2. Treebuilder fixup of formatting elt", only one of two images was
editable.

We should consider reopening T68749 and adding a pass to solve the issue
generally. Note that the Parsoid output now matches Remex so any
changes to the DOM is explicitly introducing a difference to support
editing clients, and should be noted as such.

Let us open a fresh task since this is specifically about figure in b that was addressed. You are talking about a generic html content model violation.

Let us open a fresh task since this is specifically about figure in b that was addressed.

Ok, but the only thing that was addressed here was T68749#702519 (similarly T65642).

As quoted in T68749#4452211, the output of T68749#702575 was really case dependent.

Let us open a fresh task since this is specifically about figure in b that was addressed.

Ok, but the only thing that was addressed here was T68749#702519 (similarly T65642).

As quoted in T68749#4452211, the output of T68749#702575 was really case dependent.

Got it .. but, did you reopen to only fix the figure in block tags case or something more general? But, I think either way ... a fresh task with clear description of what needs fixing is better since some of this history is too old and stale .. especially since you will have to edit the description and title.