
New paragraph before section heading becomes line break
Closed, Resolved (Public)

Description

As reported on the German Wikipedia, whenever one tries to add a new paragraph before a section heading (by pressing Enter either with the cursor before the first character of the heading or with the cursor after the last character of the preceding paragraph), VisualEditor (VE) apparently converts it into an HTML line break. I guess that is not supposed to happen, is it?


Event Timeline

matmarex subscribed.

I suspect this is related to T184755 (in some cases we need to generate <br> tags internally when there are empty paragraphs in wikitext).

CC @ssastry does this look like a Parsoid issue?

A similar problem was observed on frwiki; see diffs 1 and 2.

> CC @ssastry does this look like a Parsoid issue?

I don't know offhand. Can someone reproduce this and report the VE-generated HTML for the edited area in question?

It's reproducible. I just added an empty paragraph between the lead and section heading in this edit: https://en.wikipedia.org/w/index.php?title=User:Matma_Rex/sandbox&diff=885235533&oldid=885235196&diffmode=source

There should be no <br /> in the wikitext output, only newline characters.


VE HTML (ve.init.target.surface.getHtml()) when loading the page:

<p id="mwAg">Foo.</p>

<h2 id="Bar">Bar</h2>
<p id="mwBA">Baz.</p>

VE HTML after adding the empty paragraph:

<p id="mwAg">Foo.</p><p id="mwAg"></p>

<h2 id="Bar">Bar</h2>
<p id="mwBA">Baz.</p>
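To check whether a given VE HTML snapshot (such as the ve.init.target.surface.getHtml() output above) hits this case, one can count the empty paragraphs directly preceding each heading. The following is a minimal sketch using only the Python standard library; `empty_paras_before_headings` is a hypothetical helper, not part of VE or Parsoid, and it assumes a <p> is "empty" only when it has no text and no child elements.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class EmptyParaCounter(HTMLParser):
    """Records, for each heading, how many empty <p> elements precede it."""

    def __init__(self):
        super().__init__()
        self.run = 0               # consecutive empty <p>s seen so far
        self.in_p = False
        self.p_has_content = False
        self.results = []          # (heading tag, empty <p> count) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.p_has_content = False
        elif tag in HEADINGS:
            self.results.append((tag, self.run))
            self.run = 0
        elif self.in_p:
            # Any child element (e.g. <br>) makes the paragraph non-empty here.
            self.p_has_content = True
        else:
            # Some other block element breaks the run of empty paragraphs.
            self.run = 0

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
            self.run = 0 if self.p_has_content else self.run + 1

    def handle_data(self, data):
        # Whitespace between elements is ignored; real text fills the <p>.
        if self.in_p and data.strip():
            self.p_has_content = True


def empty_paras_before_headings(html):
    parser = EmptyParaCounter()
    parser.feed(html)
    parser.close()
    return parser.results
```

For the snapshot after the edit above, this returns [("h2", 1)]; for the snapshot at load time it returns [("h2", 0)]. In the parse.js transcripts in this task, one empty paragraph before a heading produced a <br /> line while two did not.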

I know about that task (I even linked it here). There should be <br /> generated in the HTML output for multiple newlines in wikitext. There must not be any in the wikitext output for multiple empty paragraphs in HTML, though.

> I know about that task (I even linked it here). There should be <br /> generated in the HTML output for multiple newlines in wikitext. There must not be any in the wikitext output for multiple empty paragraphs in HTML, though.

See the commit message of the linked Gerrit patch. An empty <p></p> disappears during html->wt. If VE wants Parsoid to insert newlines in wikitext and inserts multiple <p></p>, Parsoid normalizes them to match what wikitext would produce, inserting <br/> tags to mimic that. See T184755#4656731 specifically.

But you do not need <br> in wikitext to emit empty paragraphs in HTML? You just need newlines. I don’t understand why the <br> tags end up in wikitext.

> But you do not need <br> in wikitext to emit empty paragraphs in HTML? You just need newlines. I don’t understand why the <br> tags end up in wikitext.

The <br/> is an edge case (literally) because of the single newline needed before headings (and I suppose Parsoid's html->wt figures that is the only way to preserve that newline). We could probably figure out what to do there if the <br /> is undesirable. In general, though, if multiple empty lines are inserted in VE, those newlines will make their way into the wikitext (I mention this because T217205 got closed as a duplicate of this task).

[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p></p><p>b</p>" | parse.js --scrubWikitext --html2wt
a



b
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p>b</p>" | parse.js --scrubWikitext --html2wt
a


b
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p></p><h2>x</h2>" | parse.js --scrubWikitext --html2wt
a


== x ==
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><h2>x</h2>" | parse.js --scrubWikitext --html2wt
a

<br />

== x ==
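The four transcripts above can double as regression vectors. A minimal sketch, assuming the symptom to look for is a line containing only a <br> tag whose next non-blank line is a wikitext heading; `br_lines_before_headings` is a hypothetical helper, not a Parsoid API.

```python
import re

# A line consisting only of a <br> tag, with or without the slash.
BR_LINE = re.compile(r"^\s*<br\s*/?>\s*$", re.IGNORECASE)
# A wikitext heading line such as "== x ==" (same number of '=' on each side).
HEADING_LINE = re.compile(r"^(={1,6}).+\1\s*$")


def br_lines_before_headings(wikitext):
    """Return 1-based line numbers of standalone <br> lines whose next
    non-blank line is a heading."""
    lines = wikitext.split("\n")
    hits = []
    for i, line in enumerate(lines):
        if not BR_LINE.match(line):
            continue
        following = next((l for l in lines[i + 1:] if l.strip()), None)
        if following is not None and HEADING_LINE.match(following):
            hits.append(i + 1)
    return hits
```

Run against the last transcript's output ("a\n\n<br />\n\n== x ==") it reports [3]; the other three outputs yield an empty list.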

Example edit (with link corrected).
And today there is a new message on the village pump: https://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_(tekniikka)#%22Br-merkinn%C3%A4t%22

I think this should be given a higher priority.
Users are blaming VisualEditor when it leaves a mess behind.

ssastry triaged this task as Medium priority. (Edited Aug 14 2019, 11:42 AM)

We are currently in a feature freeze for Parsoid because of the ongoing port of Parsoid to PHP. As for bug fixes, we are only addressing critical bugs at this time. We should be out of this freeze soon (probably in October) and can start working on bug fixes again after that.

JTannerWMF subscribed.

Moving this to External, since the Parsing team will work on it once they lift the feature freeze.

@matmarex The latter seems really similar to this task. TemplateData makes VE add a newline before the template where the user already has two newlines, and the resulting odd number triggers this issue. The first one might be different, since only one newline (not three) should be added there, but VE somehow adds <br /> anyway (perhaps it adds three newlines where it should not?).

Any updates here? A similar problem was noticed on sr.wiki (village pump).

This needs more investigation, but I bet this is related to the fact that neither Parsoid nor the core parser can distinguish between the two scenarios mentioned in the commit message of https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/416845. It looks like this is not as esoteric a use case, and we need to figure out how to handle this problem when generating the right wikitext for edited HTML.

On cswiki we periodically fix excess <br>s in categories and at the top/bottom of articles with a bot.
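A bot-style cleanup for the before-heading case could look like the sketch below. It is an illustration under stated assumptions, not the cswiki bot's actual rule: it only handles a standalone <br> line between blank lines directly before a heading (not the category or top/bottom-of-article cases), and collapsing that span into a single blank line is an assumed target format.

```python
import re

# Assumed pattern: blank line, a line holding only <br> or <br />, blank line,
# then (checked via lookahead, not consumed) a heading line like "== x ==".
STRAY_BR = re.compile(
    r"\n[ \t]*\n[ \t]*<br\s*/?>[ \t]*\n[ \t]*\n"
    r"(?=(?:={1,6})[^\n]*={1,6}[ \t]*(?:\n|$))",
    re.IGNORECASE,
)


def strip_br_before_headings(wikitext):
    """Replace a stray <br> line before a heading with one blank line."""
    return STRAY_BR.sub("\n\n", wikitext)
```

Applied to "a\n\n<br />\n\n== x ==", this yields "a\n\n== x =="; a <br> line that is not followed by a heading is left untouched.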

ssastry claimed this task.

The fix for this will be deployed this week.