As reported on German WP, whenever one tries to add a new paragraph before a section heading (by pressing enter either with the cursor before the first character of the heading or with the cursor after the last character of the preceding paragraph), VE apparently converts that to an HTML line break. I guess that is not supposed to happen, is it?
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T250184 Non-semantic (whitespace, formatting) issues w/ Parsoid output on Talk pages
Resolved | | ssastry | T251920 Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p>) introduced by VE
Resolved | | ssastry | T215002 New paragraph before section heading becomes line break
Event Timeline
I suspect a relation to T184755 (in some cases we need to generate <br> tags internally when there are empty paragraphs in wikitext).
I don't know offhand .. can someone repro this and report the VE generated HTML for the edited area in question?
It's reproducible. I just added an empty paragraph between the lead and section heading in this edit: https://en.wikipedia.org/w/index.php?title=User:Matma_Rex/sandbox&diff=885235533&oldid=885235196&diffmode=source
There should be no <br /> in the wikitext output. Only a bunch of newline characters.
Recording:
VE HTML (ve.init.target.surface.getHtml()) when loading the page:
<p id="mwAg">Foo.</p> <h2 id="Bar">Bar</h2> <p id="mwBA">Baz.</p>
VE HTML after adding the empty paragraph:
<p id="mwAg">Foo.</p><p id="mwAg"></p> <h2 id="Bar">Bar</h2> <p id="mwBA">Baz.</p>"
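For anyone else trying to confirm the same pattern in their own dump of the VE HTML, a rough sketch like the following could flag empty paragraphs that sit directly before a heading. It only uses Python's standard `html.parser`; the class and function names are made up for illustration and are not part of VE or Parsoid.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class EmptyParagraphBeforeHeading(HTMLParser):
    """Hypothetical helper: records headings directly preceded by empty <p></p>."""

    def __init__(self):
        super().__init__()
        self.hits = []            # (heading tag, number of empty paragraphs before it)
        self._in_p = False
        self._p_has_content = False
        self._pending_empty = 0   # empty paragraphs seen since the last real content

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._p_has_content = False
        elif tag in HEADINGS:
            if self._pending_empty:
                self.hits.append((tag, self._pending_empty))
            self._pending_empty = 0
        elif self._in_p:
            # Any element inside the paragraph counts as content.
            self._p_has_content = True
        else:
            self._pending_empty = 0

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            if self._p_has_content:
                self._pending_empty = 0
            else:
                self._pending_empty += 1

    def handle_data(self, data):
        if not data.strip():
            return  # whitespace between blocks is ignored
        if self._in_p:
            self._p_has_content = True
        else:
            self._pending_empty = 0


def find_empty_paragraphs_before_headings(html):
    parser = EmptyParagraphBeforeHeading()
    parser.feed(html)
    parser.close()
    return parser.hits
```

Feeding it the two HTML snippets above should report nothing for the page as loaded and one empty paragraph before the `h2` after the edit.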
I know about that task (I even linked it here). There should be <br /> generated in the HTML output for multiple newlines in wikitext. There must not be any in the wikitext output for multiple empty paragraphs in HTML, though.
See the commit message of the linked gerrit patch. <p></p> will disappear during html->wt. If VE wants Parsoid to insert newlines in wikitext and inserts multiple <p></p>, Parsoid normalizes them to the wikitext-output form by inserting <br/> tags to mimic that. See T184755#4656731 specifically.
But you do not need <br> in wikitext to emit empty paragraphs in HTML? You just need newlines. I don’t understand why the <br> tags end up in wikitext.
The <br/> is an edge case (literally) because of the single newline needed before headings (and I suppose Parsoid's html->wt figures that is the only way to preserve that newline). But we could probably figure out what to do there if the <br /> is undesirable. In general, though, if multiple empty newlines are inserted in VE, those newlines will make their way to wikitext (mentioning that because T217205 got closed as a dupe of this).
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p></p><p>b</p>" | parse.js --scrubWikitext --html2wt
a b
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p>b</p>" | parse.js --scrubWikitext --html2wt
a b
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><p></p><h2>x</h2>" | parse.js --scrubWikitext --html2wt
a == x ==
[subbu@earth:~/work/wmf/parsoid] echo "<p>a</p><p></p><h2>x</h2>" | parse.js --scrubWikitext --html2wt
a <br /> == x ==
example edit (with link corrected)
And today a new message in village pump: https://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_(tekniikka)#%22Br-merkinn%C3%A4t%22
I think this should be made a higher priority to work on.
Users are blaming the visual editor when it leaves a mess behind.
We are currently in a feature freeze for Parsoid because of the ongoing porting of Parsoid to PHP. As for bug fixes, we are only addressing critical bugs at this time. We should be out of this freeze soon (probably October) and can start working on bug fixes again after that.
Moving this to External, since the Parsing team will work on it once they lift the feature freeze.
@matmarex The latter seems really similar to this task. TemplateData makes it add a newline before the template, where there are already two newlines from the user; the odd number triggers this issue. The first one might be different, as only one newline (not three) should be added, but VE somehow adds <br /> anyway (perhaps it adds three newlines where it should not?).
This needs more investigation, but I am going to bet this is related to the fact that we (Parsoid or the core parser) cannot distinguish between the 2 scenarios mentioned in the commit message of https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/416845. It looks like this is not that esoteric a use case, and we need to figure out how to handle this problem when generating the right wikitext for edited HTML.
On cswiki, from time to time we fix excess <br>s in categories and at the top/bottom of articles by bot.
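For the specific case in this task (a lone <br /> line directly before a section heading), a bot-style cleanup could look roughly like the sketch below. This is not the cswiki bot's code; the function name and the regex are assumptions for illustration, and it deliberately leaves alone any <br> that shares a line with other text.

```python
import re

# A line holding only a <br> tag, directly followed by a wikitext heading line
# such as "== x ==". The pattern is an illustrative assumption, not Parsoid's logic.
_BR_BEFORE_HEADING = re.compile(
    r"^[ \t]*<br\s*/?>[ \t]*\n(?=[ \t]*=+[^=\n].*=+[ \t]*$)",
    re.MULTILINE | re.IGNORECASE,
)


def strip_br_before_headings(wikitext):
    """Replace a standalone <br> line before a heading with an empty line."""
    return _BR_BEFORE_HEADING.sub("\n", wikitext)
```

A bot using something like this would still need to skip <nowiki>, <pre>, and similar blocks, which the sketch does not attempt.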