Wiktionary:Beer parlour: difference between revisions

From Wiktionary, the free dictionary
Latest comment: 17 years ago by Muke in topic Uncountable
Muke (talk | contribs)
Muke (talk | contribs)
m →‎Ancient Greek: oops, colon in link
Line 2,278: Line 2,278:
:# On the Latin Wiktionary I've been giving both the Romanization and [[w:Beta code|Beta code]], linking to the Beta Code without the accents and the Romanization as is (e.g. [[:la:ἕκτος|ἕκτος]] = ''[[hectos]]'' and <tt>[[ektos|E(/KTOS]]</tt>).
:# I should expect dialect forms to be treated the same way as dialects in any other language (I'm not sure what the current policy is; but cf. [[armor]] and [[armour]], [[octante]] and [[huitante]]).
:# The use of a template can be explained on the page itself, using &lt;noinclude&gt; tags. Example at [[la:Formula:grc-declinatio-adj-oxy]]. The ''meaning'' of a template (i.e. what all the cases and such are for) might be better on a separate page, e.g. [[Appendix:Greek first declension]]. —[[User:Muke|Muke Tever]] 02:46, 20 October 2006 (UTC)
:# The use of a template can be explained on the page itself, using &lt;noinclude&gt; tags. Example at [[:la:Formula:grc-declinatio-adj-oxy]]. The ''meaning'' of a template (i.e. what all the cases and such are for) might be better on a separate page, e.g. [[Appendix:Greek first declension]]. —[[User:Muke|Muke Tever]] 02:46, 20 October 2006 (UTC)


== Uncountable ==

Revision as of 04:02, 20 October 2006

Wiktionary:Beer parlour/header

Policies in development

Full list

  1. Wiktionary:Policies and guidelines
  2. Wiktionary:Assume good faith
  3. Wiktionary:Civility
  4. Wiktionary:No personal attacks
  5. Wiktionary:About Japanese-English bilingual
  6. Wiktionary:Neutral point of view
  7. Wiktionary:Obsolete and archaic terms
  8. Wiktionary:Entry layout explained/POS headers
  9. Wiktionary:Redirections
  10. Wiktionary:Spelling variants in entry names
  11. Wiktionary:Translations & /Wikification
  12. Wiktionary:Transliteration
  13. Wiktionary:Usage notes
  14. Wiktionary:Bots

Summarized sections

Except for those four subpages below, I've moved all listed pages here to Wiktionary talk:Beer parlour, temporarily. I'm going to see how they can be recycled or stored more efficiently, so in the meanwhile they don't take up space here. — Vildricianus 15:40, 16 June 2006 (UTC)Reply



August

Han characters

In doing a bit of work on templating CJKV languages to get entries much closer to our standard style, I ran into a large mess ... you may know of ... the pages for Han characters are completely different, due mostly to a user "Nanshu" who really wanted the Wiktionary to be an XML database ...

See . Which is just barely a dictionary definition of the Han character for "word". I'd like to see these become real pages, while retaining the useful information (but see note below on "Morobashi").

I think the entry for (e.g.) should have a Translingual header, under that Han Character (like Symbol is used), with some of the information (radical/stroke number, etc), References, with some of the dictionary information, code points in Unicode, etc. This should (IMHO) be in two templates so we can mess with the formatting and categorization.

Then language sections for the 17 languages that use Han characters and either use the instant one as a word or as a combining form with definitions. Each would include language specific templates with romanizations and readings for that language.

Each entry will be categorized (by the template) in Category:Han characters in proper radical/stroke sort key order. Then also (by the templates) in language categories as established for each language.

Note: the information loaded by "NanshuBot" is apparently useful. But one wonders when one of the dictionaries referred to is named "Morobashi". The dictionary in question is the Dai Kanwa Jiten, compiled by Morohashi ...

Comments? Robert Ullmann 15:46, 9 August 2006 (UTC)

I will wait to see an example. It is easier for me to react if I can see something concrete. In general however, it sounds like a worthy cause, if somewhat daunting. I am a little concerned about the translingual header because I'm not sure we can count on individual characters always meaning exactly the same thing across languages. For example, one meaning of (Mandarin, Pinyin: kòng) is "free time" or "leisure time" but it is not used this way in Japanese or Min Nan so far as I know.

A-cai 13:04, 10 August 2006 (UTC)

I'm going to set up an example presently. The character should be defined under the language headers; the only thing that might be part of the Translingual section is the "common meaning" listed now, which is not a definition. If we want to make it go away later, we can just remove it from the template ... Robert Ullmann 16:04, 10 August 2006 (UTC)Reply
I've set up the promised example. It is -- of course -- . (What else?!) The character info is in the templates, but I haven't sorted out the language sections or tried to add proper definitions. Japanese is already fairly good. Cantonese and Mandarin need to be separated, and Min Nan added (presumably). I don't know enough to separate out the "compounds"; and we should decide what to call them; it is usually just "Related terms" I think. Anyway, plenty to look at. I carefully made sure the templates are close enough in structure to the NanshuBot stuff so that they could be modded in by bot (this point is critical; we aren't doing this by hand!) Then we can go from there to wherever we want. The language definitions of course have to be edited, no help for that, they all need definitions. Robert Ullmann 14:59, 11 August 2006 (UTC)
I will work on the Chinese sections in the next day or so. BTW, I have not liked the compounds section in individual entries for a while. It seems like excessive work to put compounds in there. Don't get me wrong, we need that information, but wouldn't Special:Allpages/字 suffice? If we are serious about putting compounds into individual character entries, prepare for me to dump 453 Mandarin compounds into (based on 國語辭典 on-line Chinese dictionary)!

A-cai 15:43, 11 August 2006 (UTC)


Robert, I am finished with my initial stab at . Please take a look, and let me know what you think. There are several format points that I'm unsure of. Where exactly should the archaic meanings go? I couldn't decide, so I put them under both Translingual and Mandarin for the time being. When I tried to do the Bronze script and Seal script images as thumbnails, they were too big, so I shrank both down to 100px each. Unfortunately, we lose the caption because of this. Is there a way to make the images both small and have the caption?

A-cai 05:19, 12 August 2006 (UTC)

I think the archaic meaning should just be under Mandarin? As you noted, even the "common meaning" bit is confusing; anything that is a definition should be under a language header. The Etymology is in just the right place. I took out the IPAfont template; IMHO in Wiktionary (unlike the 'pedia) IPA occurs all over, so it isn't necessary to keep mentioning it. Besides: if you've loaded East Asian character support, I think by that time you have IPA?
Notice what happens to the images when you click on "hide" in the contents box. Wiki just doesn't handle images very well when there are more images than text. Are there images like this for lots of characters? Perhaps we go add bronze=, seal=, and something for stroke order (I used so=, but I could change that) to the Han char template (as optional). I still need to learn a bit more about what we can do with images. Looks good. Robert Ullmann 11:07, 12 August 2006 (UTC)Reply
Look at it now. In general, you only want to use something small like the zh-forms template(s) (or, e.g. interwiktionary, etc.) at the very top of the page, else it looks terrible with the contents box collapsed (hide). I put the image references right after the translingual header. Good place I think. Left stroke order at default width, forced the bronze and seal scripts down to 70px. Robert Ullmann 11:23, 12 August 2006 (UTC) Inlined stroke order in the template. Robert Ullmann 12:01, 12 August 2006 (UTC)Reply
The image files are part of a Wikimedia project to create a complete set of SVG images depicting ancient Chinese characters. Several dozen have already been scanned in. It would be terrific if we could eventually include images in each individual Han character entry that would be representative of at least the main styles of calligraphy. These include:
  1. seal script: Template:zh-ts
  2. clerical script: Template:zh-ts
  3. regular script: Template:zh-ts
  4. running script: Template:zh-ts
  5. grass script: Template:zh-ts

Of course, all this will take many years, but you have to start somewhere. Someday, when we have much faster computers and networks, we may be able to turn Wiktionary into a calligraphy dictionary by adding images of all major calligraphy variations of a given glyph.

Question, given that the root meaning of does not closely match the modern day common meaning, do you think that the etymology section is sufficiently detailed? Should we be more explicit in tracing the evolution of its meaning over the centuries or are the provided archaic meanings sufficient for this purpose?

More etymology would be very good! Robert Ullmann 11:36, 13 August 2006 (UTC)

Other than that, I think the only other major thing we're missing are the sound files for pronunciation. Again it will take quite some time to get them all in, but well worth it. A-cai 14:56, 12 August 2006 (UTC)Reply

Comments from the peanut gallery:
  1. I plan to 'bot replace the Morobashi/Morohashi mess at some point. Is this becoming a priority?
  2. Yes, please add audio. I mean, PRETTY PLEASE add audio. The current technique is spelled out at Help:Audio pronunciations.
--Connel MacKenzie 17:22, 12 August 2006 (UTC)
I'm working on something like a bot to move all the NanshuBot stuff into templates as in this example. The template (Han ref) at the moment uses Dai Kanwa Jiten instead of "Morobashi". I'm saying "something like a bot" because I don't think it should run on its own; there are too many little variations to tweak. I'm thinking I can run some Python code on each entry, then check it by hand. But that's just the present line of thinking. Robert Ullmann 11:36, 13 August 2006 (UTC)Reply
Um, the problem being that there are 17,970 of these entries (down from 17,971 ;-). (Oh, Connel: please don't fix Morobashi! This is a very useful flag for these entries. Although there are other ways to figure out what to feed a hungry bot ;-) So I'm going to have to have the "bot" code know enough to identify the troublesome ones ... Robert Ullmann 11:55, 13 August 2006 (UTC)Reply
I don't want to seem ignorant, but what exactly does Morobashi refer to? Is that a dictionary or something? Also, what are the numbers for? (ex. Hanyu Da Zidian: 21010.020) Are those a page number or something? I have an abridged version of this dictionary, but those numbers don't seem to apply to my copy, or if they do, I can't figure out how.

A-cai 12:10, 13 August 2006 (UTC)

"Morobashi" is a bad Kanji reading of the name of the compiler of the Dai Kanwa Jiten: Morohashi Tetsuji 諸橋轍次; it was introduced by user "Nanshu" the author of "NanshuBot" that loaded all of these entries into the wiktionary; it occurs nowhere else on the web according to google. The number is the position in the unabridged dictionary, e.g. Hanyu Da Zidian: 21010.020 means volume 2, page 1010, line 02, 0 means it is the character on that line, 1 would mean a character that would appear between that line and the next, but isn't in the dictionary. The other dictionaries have similar numbering schemes. (These were not invented by "Nanshu", they are the standard reference numbering for these dictionaries.) Robert Ullmann 15:32, 13 August 2006 (UTC)Reply

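The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration of the description in this thread; the names `DictPosition` and `parse_position` are made up for the example, and the field called `line` simply follows the wording used above.

```python
import re
from typing import NamedTuple


class DictPosition(NamedTuple):
    """One reference such as 21010.020 (hypothetical helper type)."""
    volume: int
    page: int
    line: int
    in_dictionary: bool  # True when the trailing digit is 0


# One digit for the volume, four for the page, then two for the line
# and one trailing flag digit after the dot.
_POSITION = re.compile(r"([0-9])([0-9]{4})\.([0-9]{2})([0-9])")


def parse_position(ref: str) -> DictPosition:
    """Split a reference like '21010.020' into its parts."""
    match = _POSITION.fullmatch(ref)
    if match is None:
        raise ValueError(f"unrecognized position: {ref!r}")
    volume, page, line, flag = match.groups()
    return DictPosition(int(volume), int(page), int(line), flag == "0")


print(parse_position("21010.020"))
# DictPosition(volume=2, page=1010, line=2, in_dictionary=True)
```

A reference ending in 1, such as 21010.021, would come back with `in_dictionary=False`, matching the case of a character that would fall between two lines but is not in the dictionary.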
I have a number of observations.

  1. While Nanshu's work may be subject to valid criticism, it would not be proper to disparage his efforts. At the time he was the only one working on this kind of material, and the most common complaint was why there should be so much Chinese on the English Wiktionary. The term "translingual" was not yet invented in a Wiktionary context.
  2. I'm glad to see that you have not overworked the templates to serve multiple purposes as has happened in English.
  3. The reference to the radical should show the standard radical number. Many dictionaries are organized that way and that will make it easier for the person looking up material.
  4. Although we know that is not the radical for we should find some way to accommodate those people who find this intuitive.
  5. "Dumping" the 453 compounds of into that article (or perhaps into a sub-page) is still a worthwhile exercise. This will show a series of red links for things that still need work.
  6. I agree with removing the IPA font template for the reasons given. I know that Steve likes these things, but I don't find that they accomplish much.
  7. Etymology and core meanings in the translingual section are still important, as long as people take them for what they are. Elsewhere, every time we tag something as "archaic" we are expressing a point of view that should be substantiated. In the short term this is highly impractical, but ideally we should eventually have quotations with dates for each meaning.
  8. The Mandarin entries should have the pinyin wikified. This can be the basis for a lot of cross referencing.
  9. The Mandarin entries should show the Wade-Giles romanization, and probably some other old ones. It is now obsolete but a lot of books from the past used it.
  10. Is it necessary to have separate categories for simplified and traditional characters when there is no difference?
  11. The Mandarin section currently divides the meanings into "Noun" and "Verb"; should we consider whether this is appropriate?

Eclecticology 22:31, 29 August 2006 (UTC)

Special:Allpages/字 ?

Would it be better if we use Special:Prefixindex/字 instead of Special:Allpages/字?


-- Hiòng-êng 02:09, 29 August 2006 (UTC)

Apparently yes; there's no point to listing a lot of irrelevant material. Eclecticology 21:38, 29 August 2006 (UTC)
Eclecticology: In general, I agree with your observations. With respect to your question about characters that are the same in both traditional and simplified. Yes, these should be placed in both categories. There are several good reasons to do so. The most basic reason is that a character which is the same in simplified and traditional belongs to both the traditional character set and the simplified character set. This is particularly important for people who only know one or another script. If I only know simplified, I should be able to go to the simplified category and find all my words there. This way, I would not have to wonder if any traditional characters have been slipped in, which I would not want to see. I would also not have to worry about a word not being in the simplified category because it happens to be identical in traditional script.
Hiòng-êng, I like the Special:Prefixindex, I would have suggested it myself had I known about it.

A-cai 08:22, 30 August 2006 (UTC)

Ec, one more factoid for you to help frame the issue. One of the most common characters, 人, has the following in Guoyu Cidian Chinese dictionary:
  • 2,662 - number of words and phrases which contain 人
  • 465 - number of words and phrases that start with 人

A-cai 08:30, 30 August 2006 (UTC)

I edited Template:zh-lookup to use Special:Prefixindex instead of Special:Allpages. Hope no one objects.
-- Hiòng-êng 01:11, 4 September 2006 (UTC)
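
For anyone unsure why the prefix index lists less irrelevant material, here is a rough Python model of the two listings. It is a simplification under stated assumptions: the title list is invented, the function names are hypothetical, and plain code-point ordering stands in for whatever sort order the wiki really uses.

```python
from bisect import bisect_left


def allpages_from(titles, start):
    """Model of an all-pages listing: every title from `start` onward."""
    ordered = sorted(titles)
    return ordered[bisect_left(ordered, start):]


def prefix_index(titles, prefix):
    """Model of a prefix listing: only titles that begin with `prefix`."""
    return [title for title in sorted(titles) if title.startswith(prefix)]


# Invented sample of page titles.
titles = ["字", "字典", "字母", "学", "安", "大"]

print(allpages_from(titles, "字"))  # ['字', '字典', '字母', '学', '安']
print(prefix_index(titles, "字"))   # ['字', '字典', '字母']
```

The first listing keeps going past the compounds into unrelated titles, while the second stops at the last title that starts with the character.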


Category:English irregular plurals

See Category talk:English irregular plurals#Category name. bd2412 T 02:23, 14 August 2006 (UTC)

Since no one has popped up on the page linked above, I'll copy my proposal here: Let's move Category:English irregular plurals (which currently contains the singular of words that have an irregular plural) to Category:English nouns with irregular plurals and make this Category:English irregular plurals a category for the irregular plurals themselves, with subcats for Category:English irregular plurals ending in "-i", Category:English irregular plurals ending in "-ae", Category:English irregular plurals ending in "-en", Category:English irregular plurals ending in "-ves". bd2412 T 18:21, 16 August 2006 (UTC)Reply
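
As a rough sketch of how the proposed subcategories could be assigned, the following Python picks a subcategory from the ending of a plural. It assumes the word passed in is already known to be an irregular plural (a regular word such as "taxi" would be misfiled), and the function name `plural_category` is invented for this example.

```python
PARENT = "Category:English irregular plurals"

# Endings named in the proposal above, longest first so that the most
# specific ending is tried before shorter ones.
SUBCATEGORY_ENDINGS = ("ves", "ae", "en", "i")


def plural_category(plural: str) -> str:
    """Return the proposed category for a known irregular plural."""
    for ending in SUBCATEGORY_ENDINGS:
        if plural.endswith(ending):
            return f'{PARENT} ending in "-{ending}"'
    # No listed ending matched, so the plural stays in the parent category.
    return PARENT


for word in ("cacti", "larvae", "oxen", "wolves", "cherubim"):
    print(word, "->", plural_category(word))
```

Plurals with none of the four endings, such as cherubim, stay in the parent category itself.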

It seems to me a good starting point. My question as a newbie is: who decides these moves, and how are they carried out? I ask because I observed that the category Grammar has subcategories for a number of things, including "Part of speech", which in turn hosts many things, among them "Conjunctions", but

  • in "English conjunctions" I couldn't find the expected branch for "English subordinating conjunctions" (hosting for example "in so far as"), and
  • in "French conjunctions" (my basic language reference), I couldn't find the due differentiation tree between "French coordinating conjunctions" and "French subordinating conjunctions" (all were put in one place, on same level).

I would have been more than happy to contribute to Wiktionary by adding some such differentiation, but could not work out how to do it. Please help. PhL (philippe.lebourg at st.com)

Support. Makes much more sense. Jeffqyzt 00:19, 19 September 2006 (UTC)
Thanks, but this topic has not really gotten enough attention, so I'm going to pull an old lawyer's trick: Does anyone object to the fact that I will make the above proposed change if no one says otherwise within the next 24 hours? bd2412 T 02:53, 20 September 2006 (UTC)Reply
NO!! ... um, yes, of course the category should contain the irregular plurals themselves .... ;-) cherubim and cherubims are good candidates. Robert Ullmann 05:59, 20 September 2006 (UTC)Reply
Well, the singulars are all recategorized - now for the plurals... bd2412 T 02:12, 21 September 2006 (UTC)

Category:English nouns

First section of discussion

Considerable confusion has today resurfaced regarding this category. The historic objections (as far as I remember) to it were:

  1. Large categories cannot be navigated
  2. Additional categories in the wikitext are distracting

Both of those complaints have since been addressed.

Category navigation has better linking available. And direct navigation is not the only use of a category. Categories are used for building indexes, random entry tools, as well as building things for sister projects.

Category linking text no longer appears in the wikitext, by merit of being hidden properly in the inflection templates. While this does raise the issue of "hidden magic is happening" type of confusion, it is offset by the use of ever-simpler templates. Given enough leeway to proceed, the template simplification/automation can only make things easier for contributors, especially lost newcomers.

If there is something I've forgotten from the previous few years' conversations, or something that wasn't complained about previously, please list it here. --Connel MacKenzie 17:13, 22 August 2006 (UTC)Reply

Connel has responded to this issue in several places, but I will limit my responses for now to this place alone. The question of whether Category:English nouns should be used has indeed been a matter of debate for some time, and is likely to remain debatable for some time yet. It is no secret that I find it utterly useless, and have held that view for a long time; that being said I have never engaged in its wholesale deletion. I have in most cases, out of respect for the variety of views in the community, only engaged in removing it when I replaced it with a more informative option. Burying this categorization in a complicated template that has nothing to do with categories can only be viewed as a way of making sure that that category remains the same no matter what anyone else thinks. As Scs said above: "In particular, there seem to be several efforts to use them to impose additional bits of structure, and enable additional bits of automated processing, beyond what MediaWiki natively supports."
The events that led to this imposition went quite quickly. The proposal to put the category in the inflection templates came in the middle of the "unnecessary adjective senses?" thread above; it did not even appear in the title of the thread. The thread was begun on August 5, the proposal was made on August 6, and by August 9 a bot which made massive changes was fully operational and soon managed to change many thousands of articles. There was also some discussion on Grease Pit but that is no place for policy decisions because non-technical people do not spend time there. All approval processes for the bot were ignored, and an inexperienced bureaucrat authorized the bot without so much as an attempt to discuss it with his colleagues. (Dvortygirl and I were both at Wikimania.) There was no emergency. Connel has been through the bot approval process before, and he will vouch that it can be long and tedious, but I see no complaints from him about the process that was used. I have made the bot inoperative, but I'm afraid that it has already done its damage. I will only reinstate it temporarily if someone can use it to undo the damage.
I can and have shown some tolerance for the inflection templates, but it needs to be emphasized that these are entirely optional, and any wholesale attempt to impose them on all articles, or to remove them from all articles should be viewed with extreme concern. If you want to change one in either direction, go ahead, but that should probably only be done at the same time that you are making other more substantial edits to the article. Similar comments can be made for the {{m}} templates to represent m. for masculine gender, and its related templates. It is hard to imagine a more pointless use of a template.
We need to remember that Wiktionary was not devised to be a playground for techies, even though the rest of us appreciate their service in the vast majority of circumstances. The core interest of Wiktionarians is words, not technical tricks. They like to see and edit real and understandable material, not puzzle over templates whose meaning is far from clear. In exchange for this they can accept the burden of having to type a little more than would be needed with a template. There is some value to standards and uniformity, but there is also value to a wiki markup whose total basics can be explained in very few lines. That simplicity has no doubt been one of the important factors in the success of wikis in general. Eclecticology 08:25, 23 August 2006 (UTC)
I for one agree with Ec in finding English Nouns a pointless category, nor do I think any visitors make use of it. However, it doesn't offend me especially and I don't object to it if there are really valid technical uses for it (but are there?). As for inflection templates, I like them more than Ec seems to, but again I agree that casual users should not feel they are compulsory. Widsith 12:12, 23 August 2006 (UTC)Reply
Whether there are valid technical uses for it has never been established. For me the biggest problem is that the two have been merged effectively preventing the category from being removed or changed on any individual page. Eclecticology 10:32, 25 August 2006 (UTC)Reply
I'm sorry, but I must make an attempt at humor (always difficult via text messages :) Wiktionary is replete with pointless information and pointless categories! How many people really care about the etymology of the word nunchaku? I have no idea, but let's not get overly pious about useful vs non-useful. Having said that, I agree that it is valid to debate about the inclusion of absolutely ridiculous information. However, I don't think the nouns category quite reaches that level ;)
A-cai 13:37, 23 August 2006 (UTC)

Scs's first response

I agree with just about everything Ec said above. Here's the key place where I think you're wrong, Ec: in imagining that these templates cause any significant problems for the non-techy users who don't care about them.

Yes, the templates are techy. Yes, the automatic categorization tags that might be lurking within them are techier still. Yes, Category:English nouns is useless if you're not a robot. But -- so what? No one's forcing people to use the templates; no one's forcing people to step through each page of Category:English nouns until their eyes glaze over.

I'm a pretty pathetic tech-head myself, and I don't even bother to use {en-noun} (or whatever it's called) when I create a new entry. (I simply don't care.) But I know that Rod or Hippietrail will be along to add it shortly.

I suppose it could be argued that those templates are a hindrance to editors working on existing pages, but in this case, I suspect there's a pretty good correlation between editors who don't care much about (and would be happy to ignore) inflection templates, and editors who don't care much about (and would be happy to ignore) inflection lines at all. (Ergo, they can happily ignore them either way).

It would be good if we could somehow get some input from actual, ordinary, in-the-field editors on this. Do they really care one way or the other? I don't know; all I know is that ivory-tower speculation is unlikely to yield a valid answer. (I'm speculating that most editors don't care; you may well speculate that they do care; I won't argue with you, but I wish we knew for sure.) —scs 14:30, 23 August 2006 (UTC)Reply

I wouldn't speculate either way. I chose not to make this an argument about templates in general, or about whether inflection templates should be used at all. I don't and won't use the templates, but when I choose not to use one I expect that to be respected without someone coming behind me to replace what I have done. Eclecticology 10:32, 25 August 2006 (UTC)Reply
"If you don't want your writing to be edited mercilessly and redistributed at will, then don't submit it here." — Vildricianus 10:18, 3 September 2006 (UTC)Reply
(reply to EC, if it isn't clear) I don't understand why you hate templates so much. As Scs said above: "In particular, there seem to be several efforts to use them to impose additional bits of structure, and enable additional bits of automated processing, beyond what MediaWiki natively supports." (as you quoted). This is absolutely essential for Wiktionary. We are building a very highly structured data base on software that simply does not, without "additional bits of structure", provide what is needed.
Some templates are indeed essential, but when equivalent material can be as easily included with basic wiki markup that template is not "needed". With nouns, whether plain markup or the template is used the result is exactly the same. Eclecticology 10:32, 25 August 2006 (UTC)Reply
Templates allow new users (and old) to get the structure and style exactly consistent, without twiddling every ' and constantly hand-checking that each entry is in exactly the desired categories. It also, critically, allows us to make changes in structure and style without massive bot runs that can be hard to revert (easy to revert the template), and allows per-user presentation options. Of course we don't demand people use them, but we certainly must encourage it.
As Cunctator said on the WikiEN mailing list, "some degree of consistency is good but too much is the hobgoblin of little minds." Getting structure and style so exactly right is of minor importance. And what is "exactly the desired categories?" If one chooses to subdivide a category amending the template will be of no help whatsoever. If there is real consensus about the bot run who would want to revert it?
This is an aside as it doesn't really address this thread directly. I'm sure you find at least some bots useful, so you'll understand that a lot of very minor details have been standardized that could be confused with hobgoblin though these standardizations were only for the sake of the bot's mechanical output. If anyone runs across these issues, it is a second standard for bots that is not intended to be imposed on contributors. Spacing for instance is negligible and random coming from contributors, but unfortunately consistent and exact from any bot. A grayer area is the standardization of headings, which as you know is vital for languages, key for parts of speech, and standard though more open as the ranking decreases. DAVilla 23:22, 29 August 2006 (UTC)Reply
It would be unrealistic for me to say that no bots should be allowed. When misused bots have a great potential for damage, or if not damaging at least can be a way for someone to impose a particular vision or idea where the community is ambivalent about the idea. Once imposed, such a vision can be very difficult to undo without another bot. In the present circumstances it is very easy to remove the category from the template, but that does not put it back as a simple category in the articles that had it before. We can't easily carry on as if the bot had never been activated. When the current tempest is cleared up I hope to go into more detail about bot approvals, but the process should make it clear that a bot is being requested, and that request should not be mixed in with a decision about the underlying task. If the community does not accept that a process is good if done manually, there should be no question of doing it with a bot. At the same time it should make it easier and clearer to deal with completely non-controversial tasks like the spacing that you mention. Eclecticology 00:42, 30 August 2006 (UTC)
Keep this in mind: the MediaWiki software is not sufficient for this task in the long term; we will have to go to some kind of relational DB-based system (whether WiktionaryZ or something else). When that happens, templated entries will convert automatically. Almost every entry that doesn't use inflection templates, etc, will have to be converted or checked by hand. By using templates now we are making that work easier. It may be it is the only thing that will make it possible. Robert Ullmann 14:38, 23 August 2006 (UTC)Reply
The understanding is that WiktionaryZ will not replace any existing Wiktionary unless the people there want it. WiktionaryZ is still far from being able to do what you want. I'm working for this project, not WiktionaryZ. How can you be sure that they will want the inflection templates? They haven't even decided how they will handle parts of speech. Based on the conversations that I have had with GerardM there is a lot of room for accommodation. There are ways that the MediaWiki software is not appropriate to what we do here, particularly in handling the translations, but in many ways it's doing a much better job than what some techies would have us believe. Eclecticology 10:32, 25 August 2006 (UTC)
To bring this to the original point: you insist that the category doesn't belong in the POS template? Correct? Consider this: with the Category in the POS template (en-noun), and with the template used consistently, we can:
  • add the category English Nouns
  • change the name of the category to (say) Nouns
  • or delete the category
Each takes a one line change, easily revertable (as you know) to Template:en-noun and a wait of a day or so for the background sweep to update all categorizations (this happens automatically). Without the templates, each would be a massive bot run. See why templates are useful? Robert Ullmann 15:15, 23 August 2006 (UTC)Reply
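
A toy Python model may make the one-line-change argument concrete. It does not reproduce how Template:en-noun or MediaWiki actually work; `en_noun`, `rendered_categories`, and the page list are hypothetical stand-ins, assuming only that every noun page calls the same template.

```python
# Hypothetical page store: title -> categories written directly in the page.
pages = {"cat": [], "dog": [], "idea": []}


def en_noun(category):
    """Stand-in for the category a shared noun template would emit."""
    return [f"Category:{category}"] if category else []


def rendered_categories(pages, template_category):
    """Categories each page ends up in once the template output is merged."""
    return {
        title: own + en_noun(template_category)
        for title, own in pages.items()
    }


# Adding, renaming, or deleting the category is a change to one value,
# and every page that uses the template follows along.
print(rendered_categories(pages, "English nouns"))
print(rendered_categories(pages, "Nouns"))
print(rendered_categories(pages, None))
```

In this model the single value applies to every page alike, so the sketch also shows that the change is all or nothing across the pages using the template.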
Again I'm not talking about templates in general but about this template. The changes that you mention are all or nothing. If, for example, I wanted to subdivide the nouns into concrete and abstract nouns, how would you propose doing that? Eclecticology 10:32, 25 August 2006 (UTC)
Japanese verbs are divided into Type 1, 2 and 3, and Japanese adjectives into い or な declined forms, as well as others. Works just fine with or without the templates. When you use the templates, you specify the type or declension. Robert Ullmann 13:00, 25 August 2006 (UTC)
Good, then either way should be usable, and you don't have categories buried in the template. Eclecticology 09:04, 28 August 2006 (UTC)
The templates and categories are useful (e.g. [1], [2], using Google), even if EC believes otherwise. They give readers a consistent product whose style we can easily change. All of EC's edits in the main namespace today were reverts of my additions of the headword/pos/inflection templates. That behavior is very counter-productive.
You conveniently fail to note that in that batch of words there were at least some where I had added a plain inflection line where nothing at all existed before, and you had previously reverted that. Please make sure you tell the whole tale if you're going to tell any.
Wiktionary belongs to this community, not to EC. His "tolerance" comment is inappropriately authoritative. He is in no position to declare the primary focus of all Wiktionarians as he attempts by saying, "The core interest of Wiktionarians is words, not technical tricks". He is in no position to ban community-endorsed edits made to improve Wiktionary, as he attempts to do by reverting me and saying, "If you want to change one in either direction, go ahead, but that should probably only be done at the same time that you are making other more substantial edits to the article." He is in no position to stagnate our automated cleanup efforts, as he attempted by blocking my bot and decreeing, "I will only reinstate it temporarily if someone can use it to undo the damage."
Better authoritative than authoritarian as you seem to want to be. If a dictionary is not about words, what is it about? What was the "ban"? I understand that your POV is that any approach to edits that differs from yours is not done to improve Wiktionary. My comment about changing the inflection line in either direction was directed to avoid wholesale changes and promote mutual respect. You're confusing automated cleanup with automated dictatorship. The bot was not properly authorized. You made no effort to seek community consensus about using it. Please look at the discussions regarding Connel's earlier bots that were approved.
EC speaks of imposition, but his reverts over the past two days are the only attempts of imposition here. Rod (A. Smith) 16:12, 23 August 2006 (UTC)
I don't see how a handful of reverts can be more of an imposition than systematically burying your preferences in a template where it will be nearly impossible to change something at the individual article level. Eclecticology 10:32, 25 August 2006 (UTC)

DAVilla's response

On the subject of large categories, there has been a recent change in attitudes due to their potential utility. This is a new development, and despite wanting to add to that conversation I won't do it here.
There's a big difference between a real utility and the potential one that you imagine. Eclecticology 10:32, 25 August 2006 (UTC)
I suppose the plural you is meant, because I've always fought against the creation of large categories. DAVilla 22:33, 29 August 2006 (UTC)
When I am responding to multiple comments in a complicated thread, it is safer to treat "you" as generic. ;-) Eclecticology 22:42, 29 August 2006 (UTC)
On the subject of templates, in particular the speed with which their use was implemented, I think the reason it developed so quickly is precisely because there were no objections to it. Frankly you're the only person in my memory who has raised any flags (granting I've been here only shortly), and others if they're not entirely supportive are arguing the terms, e.g. "they're better, just don't require me to use them". Yes, there were probably too many techies around, and I'm sorry if you were away, but I don't blame Connel or anyone for not playing devil's advocate with himself, asking what possible objections there might hypothetically be to every little issue beyond the real if few objections raised here, not to mention that many dictionary communities for other languages have already accepted and implemented the use of templates wholesale.
Again we aren't talking about templates in general. There were no objections because no time was given for objections. This is hardly a little issue when at least 10,000 articles were affected. Each Wiktionary sets its own policies. Eclecticology 10:32, 25 August 2006 (UTC)
On the overall issue of burying categories in templates it was noted that there had been question in the past, but that things had resolved now and the conclusion was that it would be easier for a newcomer to use a single template than having to remember the code for styling (which templates have in fact diversified), the names of the specific categories, etc. which were more commonly just omitted. This change came before August, so by the time you saw things moving in high gear there weren't really even any questions to raise. This is meant to address the speed at which things are moving, and not so much to support the conclusions themselves. Certainly the questions can be put to review, but if you really think the bot you stopped "has already done its damage" then I'd hate to see your reaction when, after reviewing all of these issues again, the bot is reactivated under consensus.
Remembering the category is not difficult when there is only one at issue. It's much easier than remembering all the template varieties. Where in the past was there a consensus for burying the categories in such a template, especially when it's a controversial category? Eclecticology 10:32, 25 August 2006 (UTC)
My understanding is that a lot of the templates have been rolled into each other with fairly consistent naming like {{en-noun}} and {{en-verb}}, and where naming isn't consistent a redirect is a completely transparent patch. If this discussion is narrowly dealing with Category:English nouns then, extending arguments above, the templates are more useful in changing the use of that category if, for instance, we decide not to include all nouns in that category (and then, perhaps, change our minds, and twice again). I don't think anyone is arguing that subcategorization shouldn't be allowed, regardless of the template. Of course that takes human intervention, which is expensive, and so the question we should be asking is the best method of subcategorization. DAVilla 22:33, 29 August 2006 (UTC)
I think that the discussion has become more focused on the narrower issue. It is dealing with including Category:English nouns (with parallel arguments for adjectives and verbs) in the inflection templates. Our categorization and sub-categorization scheme is nowhere near to being developed enough as to reasonably allow categories that are so consistent that the use of bots or templates would be a benefit. I believe that categories should be scalable enough that reasonable subdivisions can be applied. They should also be collapsible in that a higher level category can optionally be made to show its immediate members only or its submembers to whatever depth the user requests. A category that cannot be easily scaled is probably poorly chosen. Eclecticology 00:59, 30 August 2006 (UTC)
Unfortunately the software for categories is not developed to the point you envision, and I can see a snag in trying to do so. In a system like yours a basic simplifying assumption is that the graph structure of categories is a tree with no cycles; that is, no category can contain itself. While this is a sound policy and a fairly easy restriction for our mental construction, the software is not written this way, nor would it be feasible to impose this restriction in the wiki syntax since no error message could be given on such attempt, nor the edit rejected. It would be better to leave out this simplifying assumption, in which case collapsing is possible but probably not in the same folder-style GUI you imagine.
The simplest extension I can think of is this: The basic option we currently have of "entries in this category" would be superseded. For me, "entries in this category but not in any subcategory at any depth" is the most essential option for browsing, and "in this category and all subcategories to any depth" for a random page utility if not both. Of course there could be endlessly many more requested features.
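A minimal sketch of those two options, assuming a category graph that is allowed to contain cycles, might look like the following. The `CategoryGraph` class and its method names are hypothetical and only illustrate the idea; nothing here is taken from the MediaWiki software. The traversal keeps a set of visited categories, so a category that (indirectly) contains itself does not cause an endless loop.

```python
from collections import defaultdict


class CategoryGraph:
    """Toy category graph; cycles are tolerated rather than forbidden."""

    def __init__(self):
        self.subcats = defaultdict(set)   # category -> direct subcategories
        self.members = defaultdict(set)   # category -> entries placed directly in it

    def add_subcategory(self, parent, child):
        self.subcats[parent].add(child)

    def add_entry(self, category, entry):
        self.members[category].add(entry)

    def descendants(self, category):
        """All subcategories at any depth, excluding the category itself."""
        seen = set()
        stack = list(self.subcats[category])
        while stack:
            current = stack.pop()
            if current in seen or current == category:
                continue  # already visited, or a cycle back to the start
            seen.add(current)
            stack.extend(self.subcats[current])
        return seen

    def all_entries(self, category):
        """Entries in this category and all subcategories to any depth."""
        result = set(self.members[category])
        for sub in self.descendants(category):
            result |= self.members[sub]
        return result

    def unsorted_entries(self, category):
        """Entries in this category but not in any subcategory at any depth."""
        covered = set()
        for sub in self.descendants(category):
            covered |= self.members[sub]
        return self.members[category] - covered


# Hypothetical data, including a deliberate cycle between two categories.
graph = CategoryGraph()
graph.add_subcategory("Chemistry", "Organic chemistry")
graph.add_subcategory("Organic chemistry", "Chemistry")
graph.add_entry("Chemistry", "benzene")
graph.add_entry("Chemistry", "catalyst")
graph.add_entry("Organic chemistry", "benzene")
graph.add_entry("Organic chemistry", "alkane")

everything = graph.all_entries("Chemistry")
leftovers = graph.unsorted_entries("Chemistry")
```

Here `leftovers` contains only "catalyst", the one term listed in the parent category that no subcategory has picked up.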
What would be possible to do at present is to modify all templates that include organic chemistry as a category, for instance, to also include chemistry and the sciences, as the most inclusive option. The most exclusive option might also be possible if we restrict ourselves to using templates for categorization, but it would be incredibly rigid, to your disliking, and incredibly tricky, to everyone else's. The third idealized option is the somewhere in the middle laissez-faire: standardized categories where standards allow, plus hand-crafted sub-categorization. This wastes less of our time if in the long run the software we're imagining comes to fruition.
Of course there are many ways to achieve the last option, and here I'm really pushing my own opinions. As many of us see it, standardized categories are easiest to do by automation, because of process, because of consistency, because of the reasons above. Given that the templates would already be there, sub-categorization is also easy to do making use of these templates with an extra parameter. Another option would be to leave the category information in the file, although buried in the template as well, only as a reminder for the purpose of potential sub-categorizations. DAVilla 15:16, 30 August 2006 (UTC)
I'm sorry if I don't follow everything that you say, but this is a fundamental problem for this project between those who attach priority to content and those who attach it to structure. This is not a matter for blame, but it does mean that people are often talking at cross-purposes. If I look at the recent template work by someone like User:Fabartus I find his efforts totally incomprehensible, although it does seem to have something to do with correlating categories and templates between projects. As much as I don't like the inflection templates, at least I can see what they are trying to accomplish. If I don't understand what a template is trying to accomplish, it becomes very difficult to evaluate its implications on a social level. Sorry too if this became a little rantish.
Yes I do see categories in a treelike structure, and I do see each category in the main namespace as being traceable back to Category:*Topics. I'm still ambivalent about whether a category should contain itself, and I haven't at all considered graphical interfaces. Nor did I consider the use of error messages, by which I presume to include something like, "The category you have chosen does not exist."
Whatever you think I said, ignore it, practically all of it. It's not really your fault either. I actually have a difficult time getting ideas across even to other tech-savvy people because I jump right into them rather than tediously describing the boring backdrop, which I think should be obvious. For you I think I'll back up an additional step. Here's what I was trying to say:
You suggested that categories might some day have extra utility. I was analyzing what kind of utility they could and could not have in the long term so as to help determine the best policy in the short term. My conclusion was that the software would be more difficult to write than desired, (especially for a lazy programmer like myself,) but still feasible. Therefore categories do have a lot of potential use. So, how do our actions in the short term affect that future use? (By the way, a clean tree-like structure of categories I would take as a given since it's actually difficult to construct counter-examples in our minds, but ask me if you want one.)
Anyway, as it pertains to these megacategories, there's no problem with listing a word in both the category and in subcategories. What will happen down the road is that the subcategorized terms will be automatically struck out from a view of the higher-level category itself. Thus if a particular area such as Chemistry is very well sub-categorized into Organic etc., then despite having several hundred thousand terms in the actual Chemistry category itself, fewer than 200, potentially, will actually appear as being "in this category but not in any subcategory at any depth".
By the way, the best reason to populate the various Parts-of-speech categories would be precisely what you'd want, to catch all of the words that fall through the cracks, that are not yet sub-categorized or that under unique circumstances (there are always exceptions) could never be sub-categorized. The reason you're so averse to the category buried in the template is that you're trying to sub-categorize now when the tools to help you do that are not yet available. But really the number of categorizations added by the bot must be several orders of magnitude greater than the number of words that have part-of-speech subcategorizations, so I don't see how this would impede your work for quite a while. DAVilla 17:38, 31 August 2006 (UTC)
Some kind of standardized categories could be acceptable, but are we anywhere near to establishing those standards? Thus far there have been a lot of ad hoc categories where I don't think that very many people have thought out how their category of the moment fits into the structure. A thesaurus is a kind of structure. I also refer to my copy of the Library of Congress Subject Headings (nearly 7,000 pages with 3 columns each). The latter suggests 14 subdivisions for "Organic chemistry" which is one of 39 for "Chemistry" alone. I doubt that we would ever use them all. ... and concrete nouns are the easiest to categorise.
I don't completely discount the possibility of automation, but one huge and highly debatable category is not the place where we should be looking for consensus at this stage. Nor is it even decided that categories will give us the best way of organizing the data. We have indexes and appendixes; we have Wikisaurus which deserves more attention than what it gets when someone discovers 5,000 synonyms for "penis". What will be the relation between categories and the Wikisaurus?
In general I like to use my seniority on this project to look for common ground that brings things together, and to try to build a global view of Wiktionary. I make no apologies for insisting from the beginning that the project include all words in all languages. That alone has opened some incredible and unique challenges which, if overcome, will lead to unparalleled results. We can't let technical solutions run ahead of the problems that they purport to fix. Eclecticology 22:24, 30 August 2006 (UTC)
I can't find fault with this assessment. It looks like we're all looking at the same problems from different angles, which can only be constructive. I don't think the megacategories are trying to tackle the issues you're raising though, or if so barely half a step. It might be easier for us to reach the same conclusions considering the motivation to be a do-what-we-can-now-to-have-the-most-impact sort of attitude. The difference is that you consider the potential utility of unproven modifications shaky and worry it will all have to be undone, while the rest of us have supreme confidence that it is leading in the right direction. DAVilla 17:38, 31 August 2006 (UTC)
I'd like to bring some of these issues to vote. Certainly not all of them have been discussed, but a number have. I take objection to the use of "only" in your statement that "burying this categorization...can only be viewed as a way of making sure that that category remains the same." That is not the only opinion of the issue. The point is not to make the category inalterable, it's to make the category name consistent, to minimize the potential for a mess that requires so much effort as to evade all but the most massively organized attempts. Even if the category seems unrelated, provided any page with that template should necessarily belong in the category then it's okay by me to put it there. This ensures that the categories are populated with fewer omissions, and in actuality makes it many times easier to alter the name if necessary.
Maybe easier to change overall, but clearly more difficult to subdivide the category or to use a different category for certain items. Eclecticology 10:32, 25 August 2006 (UTC)
The use of a different category would indicate that "any page with that template should necessarily belong in the category" does not hold, in which case I would oppose categorization by template. As to subdividing categories, that needs discussion. The clean techie solution is to pass a parameter to the template, or to another template such as {{en-pos|noun|color|verb|phrasal|irregular}}. The dirty techie solution is to append a term to the template name. The third option is more viable when there isn't a consistent structure, where uniformity is not necessary and therefore not desirable. DAVilla 22:33, 29 August 2006 (UTC)
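The "pass a parameter" approach can be sketched roughly as below. The function name `categories_for`, its parameters, and the category naming scheme ("English abstract nouns" and so on) are assumptions made up for this illustration; they are not existing templates or agreed category names.

```python
def categories_for(language, pos, subtype=None, include_parent=True):
    """Return category names for one headword line (hypothetical naming scheme)."""
    parent = f"{language} {pos}s"
    if subtype is None:
        # No subdivision given: fall back to the broad part-of-speech category.
        return [parent]
    child = f"{language} {subtype} {pos}s"
    # Optionally keep the entry in the broad category as well as the subdivision.
    return [parent, child] if include_parent else [child]


plain = categories_for("English", "noun")
both = categories_for("English", "noun", subtype="abstract")
child_only = categories_for("English", "noun", subtype="abstract", include_parent=False)
```

Whether an entry should stay in the broad category once it has a subdivision is exactly the policy question under discussion, which is why the sketch leaves it as a switch rather than deciding it.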
I am therefore also opposed to the replacement of a template on any page if there are no corrections made. I don't know how to use all the templates that are provided and I don't think anyone should be expected to. If there's another plural form then any way a contributor finds to add it should be sufficient. But when and where they work properly templates should be preferred, and so I would oppose any effort to undo them just for the sake of undoing them. DAVilla 15:59, 23 August 2006 (UTC)
Okay then accept too that the same should apply when people have not used templates to start with.
I absolutely agree that making and modifying entries should be as easy as possible for new users and for users who are more interested in the substance. Even for regular contributors, who are expected to follow some basic guidelines, we don't want to impose too many regulations. Even the templates themselves should be fairly straightforward so that they can be modified at later dates. Even the trickiest templates, however, are not necessarily difficult to use, and I would be very interested in discussing conventions for templates like {{italbrac}} which is not clearly linked to synonyms by its name. DAVilla 22:33, 29 August 2006 (UTC)

Connel MacKenzie's response

  • For questions of process I wish to say a few things:
  1. Regular contributors have no excuse for ignoring WT:GP, especially when they are very well aware that it exists. Yes, the more important issues are brought to WT:BP (if it is clear the topic is "policy"-ish and not "techie"-ish.)
    Regular contributors should not be required to wade through a lot of technobabble that they can't understand. Eclecticology 10:32, 25 August 2006 (UTC)
    You clearly understand it and have no reason for disobeying the community consensus decisions reached there. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
    I may very well have understood this one, but that doesn't mean that I will understand every techie argument on that page. Eclecticology 09:04, 28 August 2006 (UTC)
  2. The fundamental complaints Ec has raised are all "techie"-ish, and belong on WT:GP, not WT:BP.
    Whether Categories should be included in noun-templates is not a techie issue, but how to do it is. Eclecticology 10:32, 25 August 2006 (UTC)
    Your complaint was about the "how." Furthermore, all the extended uses of categories are techie issues, therefore, according to you, should only be discussed there. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
    If they are merely techie issues they should not affect what non-techies do in any way. Eclecticology 09:04, 28 August 2006 (UTC)
  3. Claiming that discussions were inadequate is rather silly. The concept of using the category has been tossed around for over a year.
    And still mostly unresolved; so what if it's tossed around for another year? Eclecticology 10:32, 25 August 2006 (UTC)
    Completely untrue. The English Wiktionary community has resolved the issue, to everyone's satisfaction (except for one person who chose not to participate in any of the numerous discussions, until now, after-the-fact.) --Connel MacKenzie 18:27, 27 August 2006 (UTC)
    Some who have commented don't necessarily see this category as a good thing; they just didn't feel like arguing about it. Even conceding that there have been numerous discussions on this does not mean that any of those discussions are conclusive. The more there are the less certain the alleged decision. And why should any decision be so final? Maybe we need to review the decision making process so that it becomes perfectly clear to everybody when a decision has really been made. A casual understanding among those who happen to be around to comment on BP does not seem enough to commit a whole community. Eclecticology 09:04, 28 August 2006 (UTC)
  4. Jumping to a random entry within a category is a feature of navigation that has been requested numerous times, but remains disabled currently.
    References for those requests? Eclecticology 10:32, 25 August 2006 (UTC)
    Wikimedia IRC channels are not allowed to be publicly logged. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
    Ahhh! That proves my point. There is no evidence. Eclecticology 09:04, 28 August 2006 (UTC)
    100% false. --Connel MacKenzie 17:16, 31 August 2006 (UTC)
  5. Categories have other uses than navigation. Still. One way to use the categories would be to generate an appendix (but then, the other main Bureaucrat Paul has suggested the opposite several times; that appendices should be migrated to categories!) My opinion is that both are useful for very different reasons.
    Evidently more discussion of this is required. Eclecticology 10:32, 25 August 2006 (UTC)
    Because of one after-the-fact objection (that ignores things like dynamicPageList and other extensions?) A category is not an appendix; they serve very different functions. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
    Templates:
    1. Previous "template wars" were primarily over appearance. With the initial "preferences" page the proof of concept has been demonstrated. Lower-level MediaWiki changes are being discussed (mainly on irc://irc.freenode.net/wiktionary) and seem quite likely.
    2. Templates need to be encouraged strongly. Consistent, parsable entries are impossible in the free-text scheme; we've seen that demonstrated for years, here, now.
      • This reflects a change of opinion, for me. While I strongly supported having text inflections in the past, the new features of the templates (making their appearance cater to user preferences) outweigh my previous concerns.
    3. The natural place to correctly categorize entries is in templates, to make (as Jimbo said at WikiMania) the process easier for newcomers.
      I don't recall his saying anything about putting categories in templates, and I would not draw that inference from his desire to make things easier for newcomers. Nevertheless, he has previously expressed concerns about instruction creep. Eclecticology 10:32, 25 August 2006 (UTC)
      Your primary objection so far has been the inclusion of the category (in the appropriate template!) Which inference do you dispute? That Wikimedia edits should be easier for newcomers? --Connel MacKenzie 18:27, 27 August 2006 (UTC)
      Your rhetorical question distorts the situation. Making things easier for newcomers to edit is not the same as making it easier for them to comply with your vision of the way things should look. Eclecticology 09:04, 28 August 2006 (UTC)
    4. In combination with further preload-automation, the templates will become only easier than they currently are.
  6. You've missed the conversations on 'bot policy reform, it seems. I do not think the bot approval was a mistake by a new bureaucrat. Rather, it was a reflection of the change of practice with regard to the pointlessly onerous bot policy.
    The bot policy should be onerous. Eclecticology 10:32, 25 August 2006 (UTC)
    The opposite is true. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
  7. Your out-of-process removal of the 'bot flag needs to be reverted, if for symbolic reasons only. As you said, the "damage" has already been done.
  8. Months have passed since 'bot reform was first discussed. Nothing was rushed into. To not participate in the discussions, then complain about the results, seems quite strange. FWIW, 'bot reform has been tossed about (and needed) longer than Category:English nouns.

--Connel MacKenzie 19:55, 23 August 2006 (UTC)

On the last point, I will review the links and comment later. Eclecticology 10:32, 25 August 2006 (UTC)
Please do. The point is not to be adversarial; having you start wheel-warring because you don't "like" a certain category (that the rest of the community has agreed is not only good, but necessary,) is very counter-productive. You have a lot of information retained about categories that can really help how CJKV categories are ultimately arranged. But I don't see any reasonable objection to Category:English nouns. --Connel MacKenzie 18:27, 27 August 2006 (UTC)
The first reference had to do with approval of a different bot. I saw it mentioned in the bot approval log, and chose not to argue about it. The other raised the question of whether a bot should be restricted to a single task; I have no problem with that though I think the low participation rate makes the result inconclusive. Sometimes your claims of community support are on shaky grounds, and it is often not very clear as to just what the community agreed to. Furthermore, please don't misrepresent my position as an argument that is simply against the English nouns category; it is to exclude the category from the inflection template. Eclecticology 09:04, 28 August 2006 (UTC)

Scs's second response

Wiktionary (like any Wiki project) is many things to many people. Some just want to write prose for other people to read, while some want to create a structured database that can be used or extended in more "interesting" ways. The challenge, as ever, is to arrange that the various parties can work together in harmony (or simply ignore each other), despite their divergent motivations and goals.

I think it's important to recognize (though I'm admittedly quite biased here) that the goals of imposing more structure and enabling automatic processing are important. Not everyone is interested in that structure, but plenty are. More importantly, I think it's safe to suggest that the future of Wiktionary depends on it. The current Wikimedia software is woefully inadequate for what we're ultimately trying to do here. Someday, Wiktionary will want to migrate to something that's truly more structured (that is to say, inherently more structured, not "structured" by manually-maintained ad-hoc consensus). If that eventual migration is to be anything like successful (or possible), without requiring massive amounts of manual labor then or abandonment of massive amounts of today's content, we've got to do what we can today to maintain some uniformity and structure.

The question, then, is how to maintain the balance between those who are keenly interested in this structure already versus those who are not so interested, without imposing a lot of, as Ec puts it, "technobabble" on the latter. To that point, my attitude is similar to Wikipedia's infamous "ignore all rules" policy: if the templates are too complicated to remember, or if they change every week, I just won't use them. But where Ec and I part is that I do not mind if someone comes along and "fixes" my inflection lines later. Because, remember, they're not "my" inflection lines; they're the project's. I was not making some value judgement when I chose not to use the template; I was just being lazy.

The next question (for the current inflection-line and noun-category question, anyway) is whether there's any loss when a non-templated inflection line is replaced by a fancy, technical template. Did the editor who initially declined to use the template do so because

  1. He was a lazy bastard like Scs
  2. He wanted a different look and feel to the inflection line than the template would give
  3. He wanted to ensure that the noun in question did not appear in Category:English nouns?

Obviously (1) is no problem, but if there are non-inflection-template-users who are trying to assert (2) or (3), we've got a problem.

Personally, I'm not too interested in look and feel either way. If the consistency we were worried about here were merely visible, it might well be (as Emerson put it and someone quoted Cunctator as paraphrasing) somewhat foolish and perhaps a hobgoblin of little minds. But the consistency we're most worried about (or at least I think we ought to be most worried about) is not merely visible.

Cunc did not credit Emerson. Thanks for pointing this out. Eclecticology 09:04, 28 August 2006 (UTC)

The automatic uses of category-based processing that people have in mind work only if the categories are properly comprehensive. When the people doing that processing notice that a noun is "missing", they're going to need to add it to the category, presumably by changing to (or reverting to) the category-containing inflection template. So, on this point, we do need wide consensus: if the categories are to work for these purposes at all, everyone has to agree that this is a good idea. We would need to discourage people from changing inflection templates back to free-form inflection lines, or otherwise removing words from their part-of-speech categories.

(Sorry this was so long. I'll stop now.) —scs 14:12, 25 August 2006 (UTC)

You are right in distinguishing between those of us who see content as important, and those who see structure as important.
We also differ in terms of migrating this project to something else. Where do you see us migrating? For the things that matter to me, an historical and verifiable record of language that also discusses the subtleties of linguistic change, I do find the software generally adequate for my purposes, except as it relates to the long tables of translations, where I don't participate much anyway. It could very well end up that WiktionaryZ could absorb that part. Shouldn't we know where we are migrating to before we start planning for it?
"Ignore all rules" is one of the more progressive policies on Wikipedia, but it cuts both ways, as does "be bold". "Fixing" can work both ways.
Of the three enumerated reasons for resisting the templates, the first is clearly trivial, the second is not applicable because identical results can be achieved with or without templates, but I do assert the third. The process is completely inflexible. It does not allow for more refined categories which could themselves be subsets of the English noun category, and might themselves be picked up in an "include subsets" extension. There is already at least one entry which should be marked as a "noun phrase", but is already categorized as a noun.
Where do we distinguish between nouns and noun phrases?
ice cream, guinea pig, legal tender, milk of magnesia, high spirits, prisoner of war, batchelor's son, palm tree justice, apples and oranges, law of diminishing returns, one of his majesty's bad bargains
DAVilla 22:43, 30 August 2006 (UTC)
I don't know what kind of automatic processing anyone has in mind, and without knowing that it is impossible to know whether there are any alternative ways of accomplishing the same thing. Even if there is some need for processing based on this category, why should it be a part of the inflection, when having it separate would do just as well and be more flexible? I cannot agree that it is a good thing to have them together.
Certainly diligence in approving bots has its place. Nobody here questions that we should evaluate and test bots before turning them loose in the database, but that is already part of the process. To make the process deliberately "onerous" or "tedious" and to leave them explicitly in administrative limbo for a wikiternity while we ponder (and then ignore) their appropriateness is worse than to reject unwanted bots outright. If the process works as it should, then a bot that is truly inappropriate to the needs of the community will be stopped at the door in a timely fashion, and the volunteer programmer who proposed it may simply shelve it and proceed to another project. It can be revived later if needs change. The appropriate ones should be approved in an equally timely fashion, to improve the consistency of articles and spare us all the tedium of making sweeping, repetitive changes manually.
The programmers and the "inexperienced" bureaucrats who approved these bots are not newcomers to Wiktionary, nor to programming. They are not some interlopers out to destroy the project. Rather, they are trying to improve matters, with the general support of the community. I believe we may trust them to keep an eye on one another and to run and maintain bots in a reasonable and responsible manner. Let us write the policy with reasonable time limits and safeguards, and then use it. —Dvortygirl 05:43, 26 August 2006 (UTC)
I certainly support writing the policy in a way that does this. The other policy for which there is an evident need is one that states when a policy has been adopted. Casual agreements in the Beer parlour among those who happen to be around should not mean that the policy in question has been adopted. A discussion begun here a mere week ago will already be difficult to spot when other unrelated discussions begun later have become lengthy. Adopting policies by default is also not the way to go; a minimum level of support within a reasonable time frame should be essential. Eclecticology 09:04, 28 August 2006 (UTC)

Widsith's response

Some observations:

  1. Ec should not be criticised for objecting to decisions ‘after the fact’. There is no fact; previous decisions are not set in stone; the consensus opinion is open to continual debate and revision, depending on the introduction of new users or indeed changes of opinion.
  2. The output of templates may look the same as Wiki markup for Ec and others, but it looks very different for (eg) me, because I have set my .css to display inflection templates as boxes. This is the beauty of them. If templates don't look any different, there shouldn't be any objection to their use – except where it affects the next point.
  3. Clearly (from this discussion), templates are much more widely accepted than the use of Category:English nouns, and we should perhaps separate the two discussions.

Widsith 18:53, 28 August 2006 (UTC)

Your first point is very good. Although there are some things we clearly need to discuss, anything can be brought up for re-discussion. Your second point is great and good to state clearly, although I'd have to say that the customizations have more an effect of quelling disputes than their effect on nearly all users who are not regular contributors, at least until the mirrors take advantage.
Your third point is also excellent. For one, we need to decide if any noun belongs in Category:English nouns even if it's in a subcategory. We also need to decide what tools, if any, will help us better categorize nouns. And probably a few other things. DAVilla 23:05, 29 August 2006 (UTC)
It's always easier to find common ground when someone is not trying to defend a past action. On the second point there seems to be some understanding that there will be no wholesale attempt to replace all the templates, but for a wide variety of reasons some will need to be replaced at the level of the individual article. The outstanding question has been narrowed to one of how to deal with a certain class of categories that are now in templates. Eclecticology 01:16, 30 August 2006 (UTC)

scs's third response

Wow. This has certainly turned into a discussion!

We've danced around (if not trampled into the ground) at least three separate questions, and I'm not sure which ones people still find open and/or interesting:

  1. Should Category:English nouns exist?
  2. Should category tags be buried inside of templates, in general?
  3. Should inclusion into Category:English nouns be buried inside of {en-noun}, in particular?

To that third point, Ec suggested that "Burying this categorization in a complicated template that has nothing to do with categories can only be viewed as a way of making sure that that category remains the same no matter what anyone else thinks", and this was quite an illuminating comment, because I had no idea it could be viewed that way. (So I think the word "only" needs to be stricken from that sentence.)

Honestly, I don't believe there was any POV-pushing agenda here; the idea of tying noun categorization to that template was a result of several separate, individually-rational-seeming steps:

  1. It's potentially useful to have a list of every English noun, and
  2. A category is a good way under the current Mediawiki software to generate such a list, but
  3. It's a nuisance to manually categorize every noun; people will tend to forget, and then our list won't be complete. But also
  4. It's equally a nuisance to format inflection lines consistently, so
  5. It'd be good to have a template for that. Finally,
  6. Since the entry for every noun will obviously want to use the template to generate its inflection line conveniently and consistently,
  7. We can just put the categorization tag inside the template invocation, and
  8. Hey, presto, we automatically get a much more definitively comprehensive list of English nouns, with much less tedium and duplication of effort to maintain it.
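
The chain of steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `en_noun` is a hypothetical stand-in for the template, the naive "add s" default plural and the exact output format are invented for the example, and none of it is the real template's code.

```python
def en_noun(headword, plural=None):
    """Hypothetical stand-in for an inflection-line template.

    It returns the formatted inflection line and the category tag
    together, so any entry that uses it is categorized automatically.
    """
    if plural is None:
        plural = headword + "s"  # naive default, assumed for illustration
    inflection_line = "'''{0}''' (plural '''{1}''')".format(headword, plural)
    category_tag = "[[Category:English nouns]]"
    return inflection_line + "\n" + category_tag


print(en_noun("dog"))
print(en_noun("box", "boxes"))
```

The point of the sketch is that the editor supplies only the headword and, where needed, the plural; the categorization is a side effect that cannot be forgotten.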

Now, it's true, there are a few assumptions lurking in there: (1) that a list of all the nouns is useful and worth all this; (2) that "the entry for every noun will obviously want to use the template to generate its inflection line conveniently and consistently"; (3) that putting the categorization tag inside the template invocation is a "we can just do it" thing. To the gearhead, all three of these are just blindingly obvious; the possibility that anyone could feel otherwise just doesn't even register. But this discussion reminds us that there are more things in heaven and earth than are dreamt of in the gearhead's philosophy.

Up above, Ec asked, "Shouldn't we know where we are migrating to before we start planning for it?" My answer is, actually, not necessarily. This is, I strongly believe, a case in which even if we have no idea where we specifically might end up some day (and it's true, we don't have any idea), there are yet several points on which we can make some very accurate educated guesses.

In particular, I think everyone who is interested in structure would agree that the thing we migrate to, whatever it is, will have more definitive ways of specifying the vital properties of an entry. Regardless of how it's implemented, it will almost certainly have the flavor of

word: dirt
language: English
p-o-s: noun
...
word: schmutzig
language: German
p-o-s: adjective
...

Now, even under a more structured scheme, definitions and other "prose" sections would certainly remain much more free-form, which is why I didn't try to illustrate them in my example. But the point here is that vital properties like "language" and "part of speech" are specified in a structured, formal way, as opposed to the current Wiktionary scheme, in which both of these properties are indicated only by the presence of various level-2 or level-3 headers, which any editor is free to leave out, or rearrange, or spell inconsistently, or whatever.

In other words, right now we tag language and part-of-speech in an ad-hoc way, enforced only by consensus. Under a more-structured scheme, the specification of these properties would be inherent, and it wouldn't even need to be "enforced", because it simply wouldn't be possible to not have the language tag, or to spell it wrong.
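
As a rough sketch of what "inherent" could mean in practice, the records above might be modelled as follows. All names here (`Entry`, `Language`, `PartOfSpeech`) are hypothetical and chosen only for illustration; no claim is made about how any future software would actually be built.

```python
from dataclasses import dataclass
from enum import Enum


class Language(Enum):
    ENGLISH = "English"
    GERMAN = "German"


class PartOfSpeech(Enum):
    NOUN = "noun"
    ADJECTIVE = "adjective"


@dataclass(frozen=True)
class Entry:
    # All three fields are required: an entry cannot be built without them.
    word: str
    language: Language
    pos: PartOfSpeech


entries = [
    Entry("dirt", Language.ENGLISH, PartOfSpeech.NOUN),
    Entry("schmutzig", Language.GERMAN, PartOfSpeech.ADJECTIVE),
]

# A misspelled language name is rejected instead of silently accepted.
try:
    Language("Englsh")
except ValueError as err:
    print("rejected:", err)
```

Leaving out the language raises a `TypeError` when the entry is constructed, and a misspelled language raises a `ValueError`, which is the kind of built-in enforcement described above.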

(At this point, I realize, all sorts of secondary questions and objections spring up: how does the hypothetical database structure tie multiple senses in multiple languages to one "word"? What if you've got a word in an obscure language that isn't on the official list yet? What if a word is in more than one language, or is a symbol that's "interlingual?" Those are questions we'd have to answer in actually setting up the hypothetical much-more-structured scheme, but I think they do have answers. Let's not get sidetracked by them now.)

Anyway, the whole point of this discussion is that we'd like to do what we can, today, to arrange that the eventual migration to some more-structured scheme can be at least somewhat automated. So, for those word properties that the eventual scheme is likely to treat as inherent -- such as language and part-of-speech -- we'd like to start simulating that "inherentness" today; we'd like to find ways to remove the ad-hocness and free-formness from those attributes and the ways we specify them. And we'd like to do this in some way that combines both convenience and accuracy. If our mechanisms aren't convenient people won't use them, but if they're not accurate they won't work and the eventual conversion won't be automatable and so it's all a waste of time.

So the next question is, how "convenient" can these mechanisms be? Up above, Ec said, "simplicity has no doubt been one of the important factor in the success of wikis in general." This is very, very true, and not to be overlooked or brushed aside. But we're at an awkward spot here because we are definitely not, on a couple of points, as simple as Wikipedia is. We've got a particular, highly stylized entry layout which is very nearly mandatory. You can't just come over from one of the other wikis and bang out a "dictionary definition" using any old level-2 and level-3 headers you like. Well, you can, but one of the regulars will be along shortly to clean up your offering and structure it according to WT:ELE.

So what about those templates? Well, yes, they take a certain amount of additional effort to learn about, and to remember to use. No, they're not as easy as just banging out a free-form inflection line. (But of course easier still is not banging out an inflection line at all!) So any decision we reach on template usage is necessarily going to be a compromise.

I believe that our attitude towards inflection-line templates (and other uniformity-assuring templates) should be very similar to our attitude towards the proper Wiktionary usage of level-2 and level-3 headings. You're free to ignore them if you haven't learned about them yet or can't be bothered to remember; you're free to bang out any free-form prose you want to in a new entry you're composing. But, anyone else is just as free to come along later and rejigger the entry to use them. When this happens, it is not some POV war which the original writer should be incensed by, it is rather an improvement which the original writer should be thankful for. It's a win-win situation, really: you get to be lazy and not use the "official" templates or layout structure (and I don't even mean this pejoratively; laziness can be a virtue), but we get the additional structure which the larger project needs, and we all get to share the newly-added information.

Finally, to a couple of other points:

Up above, Ec said, "Once imposed, such a vision can be very difficult to undo without another bot. In the present circumstances it is very easy to remove the category from the template, but that does not put it back as a simple category in the articles that had it before." But this is really no objection, because there's no need to assume the nonexistence of that other bot. If we decided that Category:English nouns were to be kept but that manual category tags were the way to go, we could and would use a bot to re-add the tag to every entry that had used the template.
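
A minimal sketch of the core of such a rollback bot, assuming the only job is to append an explicit category tag to any entry text that uses the template and does not already carry the tag. The function name `restore_category` is hypothetical, and fetching and saving pages is deliberately left out.

```python
import re

# Matches an invocation such as {{en-noun}} or {{en-noun|boxes}}.
TEMPLATE_RE = re.compile(r"\{\{en-noun\s*[|}]")
CATEGORY_TAG = "[[Category:English nouns]]"


def restore_category(wikitext):
    """Append the explicit category tag to entries that used the template.

    Text that does not use the template, or that already has the tag,
    is returned unchanged, so running the function twice is harmless.
    """
    if TEMPLATE_RE.search(wikitext) and CATEGORY_TAG not in wikitext:
        return wikitext.rstrip("\n") + "\n" + CATEGORY_TAG + "\n"
    return wikitext


page = "==English==\n===Noun===\n{{en-noun}}\n"
print(restore_category(page))
```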

Explicit in my suggested additions to WT:BOT is a statement to the effect that bot operators are expected to roll things back (via a new bot if necessary) if later consensus decides that the bot-imposed changes were unwanted. (If we can't be reassured by that contingency, we really can't afford to use bots at all, for anything.)

Finally, one last observation about templates. Up above, Ec spoke of the {{m}} and {{f}} templates and said "It is hard to imagine a more pointless use of a template." I have to disagree: for me, anyway, this is a perfect use of a template, in fact it's hard to imagine a better one! I don't want to remember whether we like to use m and f  versus masc and fem; I don't want to remember whether they're in italics or bold; I don't want to remember whether a period goes after them. By using a template, I don't have to remember any of these things -- and that laziness alone is enough for me to want to use the template. As a bonus, {{m}} is exactly as many keystrokes as ''m'', and fewer than ''m.''. As another bonus, having used the template, we can change our minds about those formatting details later. As another bonus, we can rig it up so that different viewers see the gender tag differently depending on their preference. As another bonus, we can use the presence of the template as a definitive tag for (say) generating lists of masculine and feminine nouns, or driving the eventual transition to a more-structured database in which this property, too, is inherently and definitively specified. Even if I weren't a gearhead, I think I'd still have to like these templates.
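
The keystroke comparison and the "change our minds later" argument can both be checked with a small sketch. The style table and the function `gender_tag` are hypothetical; they only mimic the idea of a template whose formatting is defined in one place.

```python
# Hypothetical central style table; editing it restyles every use at once.
GENDER_STYLE = {"m": "''m''", "f": "''f''"}


def gender_tag(code, style=GENDER_STYLE):
    """Render a gender code according to the given style table."""
    return style[code]


# Keystroke comparison from the paragraph above.
print(len("{{m}}"), len("''m''"), len("''m.''"))  # 5 5 6

# Swapping the style table changes the output without touching any entry.
bold_style = {"m": "'''m.'''", "f": "'''f.'''"}
print(gender_tag("m"))              # ''m''
print(gender_tag("m", bold_style))  # '''m.'''
```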

--scs 16:06, 1 September 2006 (UTC)

[P.S. "I apologize for the length of this message; I didn't have time to write a shorter one."]

Vild's response

Since it was I who proposed the concerned category be included in the template, I need to comment here. Reading the above statements, I think I can summarize that there are two opinions regarding Category:English nouns.

  • Eclecticology states that its classification should be more refined where possible, much like the structure employed by the topical categories. That means that an entry should not be included in English nouns if it can be included in a more specific category.
  • The proponents of English nouns state that, regardless of any more specific classification, all English nouns should be part of Category:English nouns.

I think this clearly states the main point of contention, and someone mentioned a vote to establish which assertion has the majority of supporters. I'm not pro-votes, though, and I'll simply state why I personally adhere to option 2:

  • Categories are not aligned in a tree-like structure, but broader categorization can coexist with a narrower lineup. The reasons why a broad category should be kept in place even though the classification is at the same time more specific have both technical and non-technical roots. The former is more important, since the latter only comes down to getting a list of English nouns. But simple technical tricks and use of the available tools (see m:DynamicPageList) provide a lot of opportunities for the concerned category. They can generate specific lists, filter Recent changes, allow automatic operations, and so forth. For example, give me all English nouns that are also verbs, or chemistry terms, or that are derived from Russian, etc. Some may question the usefulness of this, but then, some may question the usefulness of a free dictionary as well. A disadvantage is, apart from the cluttering of an otherwise mostly ideologically conceived categorization scheme, that a given page may take many categories. That will already be the case on multi-language pages, though, and is inevitable in many other cases as well. This should be solvable or customizable at the software level.
  • I also assert that Template:en-noun is to this day the best solution for the layout and treatment of English noun inflections. I find it hard to think of disadvantages here, the only one being that one needs to spend about two minutes looking for how to use it. Is it even two minutes? With some additional clarification and explanation of the documentation, it may be a lot less. Template:en-noun is, whether you like it or not, both a structural and aesthetical improvement over the previous stuff that cannot be denied. And still everyone is free to ignore it, so where's the problem?
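
The kind of query mentioned in the first point ("all English nouns that are also verbs") is a category intersection, and it only works if the broad categories are comprehensive. A toy sketch in Python, with invented sample data and a hypothetical helper `in_all`; it does not use or imitate the DynamicPageList syntax.

```python
# Invented sample memberships; real categories would be far larger.
categories = {
    "English nouns": {"impact", "beer", "zero", "dirt"},
    "English verbs": {"impact", "zero", "run"},
}


def in_all(categories, *names):
    """Return the sorted titles that belong to every named category."""
    member_sets = [categories[name] for name in names]
    return sorted(set.intersection(*member_sets))


print(in_all(categories, "English nouns", "English verbs"))  # ['impact', 'zero']
```

If a noun were filed only under a narrower subcategory and not under "English nouns", it would silently drop out of such a result, which is the technical argument for the broad category.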

— Vildricianus 13:11, 3 September 2006 (UTC)


The English Index - a proposal

There have been discussions in the past about deleting Index:English. Certainly it has lots of rubbish in it, and it is far from complete. We also have many other lists of words that need to be added (see intro to Wiktionary:Requested articles:English for example).

I propose that its contents be replaced with a list of all those English words that we actually have, making it a true index. I'm sure that one of our computer-literate people could generate it automatically from all those entries that have a ==English== section, possibly limiting it to single, or hyphenated words (so as not to include proverbs and the like). It could be updated from time to time, as needed.
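
A minimal sketch of how such a list might be generated, assuming the page titles and their wikitext are already available as a mapping (for example, extracted from a database dump, which is not shown). The function name `build_english_index` and the sample pages are hypothetical.

```python
import re

# A level-2 English header on a line of its own, e.g. "==English==".
ENGLISH_HEADER_RE = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)
# A single word, optionally hyphenated: letters only, no spaces or digits.
SINGLE_WORD_RE = re.compile(r"^[^\W\d_]+(?:-[^\W\d_]+)*$")


def build_english_index(pages):
    """Return sorted titles that have an English section and are one word.

    `pages` maps each page title to its wikitext.
    """
    return sorted(
        title
        for title, text in pages.items()
        if ENGLISH_HEADER_RE.search(text) and SINGLE_WORD_RE.match(title)
    )


pages = {
    "dirt": "==English==\n===Noun===\n",
    "schmutzig": "==German==\n===Adjective===\n",
    "well-known": "==English==\n===Adjective===\n",
    "a stitch in time saves nine": "==English==\n===Proverb===\n",
}
print(build_english_index(pages))  # ['dirt', 'well-known']
```

The sketch depends entirely on the header being spelled and formatted consistently, so entries with a malformed language header would be missed.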

What do you think? SemperBlotto 10:46, 30 August 2006 (UTC)

What is this index for, or should it be for, anyway?
If it's where a reader might go to "look up a word", it's clearly nonsense and an utterly unnecessary relic, an inappropriate holdover from paper-based dictionaries. In an electronic dictionary, the right way to look up a word is to use the search box. If we want to make it easier on readers who aren't sure how to spell the word they're looking for, one way (rather than spending time maintaining a redundant index) is to work at improving the search function's ability to do fuzzy matches.
If we want to make it possible to scan a list of nearby words (which some readers may well want to do anyway, especially if they have no idea how to spell the word they're looking for), we could do that by... oh, lookit that, it's there already: the "search from here" link on the search results page (which, to be sure, we could also stand to spend some time improving).
If the index is supposed to be a downloadable list for users who need such a list, it clearly needs to be in a different format (i.e. as one big downloadable list, not with all those "user friendly" subpages and intermediate headings).
But if there is a desire to have such a list, then yes, clearly it cries out to be generated automatically. Trying to maintain it manually is a preposterous duplication of effort, a big waste of time, and guaranteed to be perpetually out-of-date.
Finally, however, if a key aspect of the English index is that it's limited to English words (as opposed to all the foreign words which the English Wiktionary somewhat paradoxically contains), then maintaining it automatically is at least a little bit problematic, because of the ad-hoc ways in which we currently tag languages. (See threads elsewhere, e.g. the huge one on #Category:English nouns up above.) (But yes, I do know why the foreign-words-in-the-English-Wiktionary paradox exists.) —scs 11:24, 30 August 2006 (UTC)
There's no doubt that the English index is a little klunky, and it will be perpetually out of date. Automated revision will update it for the words that we have, but will do nothing for the words we don't have, and which should appear in red in the index.
Requested pages in one form or another tend to be a favorite of people who come with great ideas for work that other people should be doing. This exhausts them, and they disappear. A better idea might be to merge all of these "requested pages" lists into the English, and where needed subdivide the index pages.
At worst, the index pages are harmless since they do not interfere with the work of others. Eclecticology 20:39, 30 August 2006 (UTC)
This may sound like heresy, but I still find paper dictionaries charming. The delight of paging through words and finding a word I'd forgotten or never knew is, well, nice.
(Oh, hey, no argument there at all -- I feel exactly the same way! —scs 14:09, 1 September 2006 (UTC))
That curiosity is what I was hoping to duplicate with the Gutenberg page rankings. Some have expressed that the "rank" things tend to do the opposite - in that they cause confusion because they aren't related at all.
Index:English does seem to be the "best" place to consolidate the various request lists (as Eclecticology suggests) in addition to the terms we have already. Unfortunately, it is a larger task than perhaps people realize. The Gutenberg concordance thing (alone) has always been too large to be meaningful. Each list that would be automatically added to the index would need its own criteria.
  1. For example, looking at the current English Requested Entries, that list seems to be done...the only entries remaining there are typos or "protologisms" that are not likely to meet our criteria for inclusion.
  2. The Gutenberg cutoff could be either 100 or 1000 hits (in the entire corpus of Project Gutenberg texts) depending on how archaic/picky we want to be. (In response to Ec's barb: yes, I do use my generated lists of top 1,000 to enter terms here - others sometimes help, if they feel like it.)
  3. The list of three major dictionaries sorted together remains something that I do not wish to touch, as I feel it is copyright-tainted/unnecessary exposure. And my concerns about that list's legal status have never been countered.
  4. The Project Gutenberg copy of Webster's comes with a copyright notice. The other version of it (without the odious copyright restrictions) is only the first one hundred pages. I don't know the status of the (even more outdated) Century Dictionary.
These are each pretty significant problems.
So I think SemperBlotto's original request remains the most reasonable. Having the Index rebuilt on a regular basis will provide the foundation for the enhancements that User:Scs suggests above. Especially when the entries themselves need to be inspected to determine language. --Connel MacKenzie 00:35, 1 September 2006 (UTC)
I've been doing some of my own work at extracting word lists from various corpuses (corpii?),
You're looking for "corpora". — Vildricianus 09:51, 3 September 2006 (UTC)
and the impression I'm getting is that strict hitcount limits don't work very well. If you set the cutoff high enough to filter out things like one-time personal names some author has invented, you end up missing plenty of interesting, truly-worthy-of-inclusion, real words which just don't get used that often. For example, I see from Concordance:Holmes_G that gasogene -- clearly an interesting word! -- is only used there once.
So I'm afraid there's no substitute for a lot of hand filtering and subjective decisions.
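
The trade-off can be seen in a toy example. The token list below is invented purely for illustration (it is not taken from any concordance), and `words_above_cutoff` is a hypothetical helper.

```python
from collections import Counter


def words_above_cutoff(tokens, cutoff):
    """Return the set of words that occur at least `cutoff` times."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n >= cutoff}


# Invented sample: one rare real word and one made-up name, each used once.
tokens = ["the"] * 5 + ["holmes"] * 3 + ["gasogene", "xqzt"]

print(sorted(words_above_cutoff(tokens, 2)))  # ['holmes', 'the']
print(sorted(words_above_cutoff(tokens, 1)))  # ['gasogene', 'holmes', 'the', 'xqzt']
```

A cutoff of 2 removes the made-up name but also drops the rare real word; a cutoff of 1 keeps both. No threshold separates the two, which is why manual review remains necessary.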
On the question of copyrightability of word lists, that's an interesting question. If I take a published, copyrighted dictionary and strip off all the "interesting" content -- definitions, pronunciations, etymologies, everything -- is the resulting list of headwords still copyrighted, and is it a violation if I use it in building my own free dictionary? It's tempting to argue (and plenty of people would) that such a list isn't copyrightable, but as we've just seen, since a fair amount of work can go into selecting a set of words that's "interesting" enough to go into a dictionary of a certain size, I think there's a certain amount of intellectual capital left in such a list.
(But what if I take that list, mechanically diff it against the complete list of words I've already got in my free dictionary, and manually scrutinize the delta for candidates I might want to add? I think I'm on much stronger, pretty much unassailable ground there.)
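
The mechanical diff described here is a plain set difference. A minimal sketch, with a hypothetical function name and invented sample lists; it says nothing about the legal question, only about the mechanics.

```python
def candidate_headwords(external_list, our_list):
    """Return words in the external list that we do not yet have, sorted.

    The result is only a worklist for manual scrutiny, not a list of
    entries to create automatically.
    """
    return sorted(set(external_list) - set(our_list))


external_list = ["gasogene", "dirt", "beer", "tantalus"]
our_list = ["dirt", "beer", "impact"]
print(candidate_headwords(external_list, our_list))  # ['gasogene', 'tantalus']
```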
scs 14:06, 1 September 2006 (UTC)

I don't like the idea, especially in the long term. A list of existing words is what Category:English language is for. That has all the mechanisms in place to keep it up-to-date, although it can't be browsed as an index as of yet. And I would extend this argument for any Index: page that purportedly shouldn't have any red links. Long-term, I don't see the Index: space as being very useful, or at least not in this capacity. Short-term, do what you like. I'm certainly for deleting or overriding the current content, as per Connel. DAVilla 03:22, 1 September 2006 (UTC)

Category:English language has never been for listing all English words. In a sense Category:*Topics does that with an implicit "en" sub-namespace. Long term no index should have any red links, but when that happens we might as well all pack up and do something else because Wiktionary will have been completed. Indexes and categories are where the top-down and bottom-up perspectives on sorting data interface. If you're thinking of Wikipedia instead of Wiktionary read "lists" instead of "indexes". The interface is not perfect until the two give the same result. Thus far this only happens on close-ended lists like "Days of the week" (at least as we commonly understand that term in English; who knows what would happen if someone invented a metric calendar).
If you're responding to me then your logic is a bit backwards. I never said that indexes shouldn't have red links; that was SemperBlotto's suggestion for this index. I said I don't like the idea precisely for the reasons you gave: a page with no red links isn't an index (unless the project is complete in the ultra-long-term, if that's even possible).
My vote for what to do with this page isn't very strong. It seems like what's there now isn't very useful, so I would like to see it either deleted or changed under some proposal, as per anyone, even SemperBlotto if nothing else can be agreed to.
By the way, is there any word that shouldn't be somewhere in some subcategory of Category:English language? At minimum I would think they all at least have a part of speech. Remember I'm talking long-term. Maybe I should have said that "a list of existing words is what Category:English language will be for." DAVilla 14:28, 1 September 2006 (UTC)
While I've always been happy to shoot barbs at Connel I wasn't aware that I had done so in this thread. I've consistently supported the Gutenberg concordance, and feel that an expansion of that has enormous possibilities. I also am well aware of the tremendous amount of work that would be needed to have that doing what I would want it to do using statistical techniques. At some point I believe that we should have an article on every word in that Gutenberg corpus, but there is no priority to such depth. The three dictionary merger may be difficult, but not because of copyright. The presence of a copyright notice is not what makes something copyright. If that right exists it does so with or without the notice. If you read the copyright notice attached to the Gutenberg Webster you will see that it is clearly stated that it is in the public domain; their copyright applies only to the "small print". Copyright in dictionaries, even modern ones, is trickier than for other writings because most individual entries may not be copyrightable. This is partly because much of what is contained repeats earlier editions which have since gone into the public domain, and partly because of the merger principle, which states that if the information can only be expressed in one way that expression is not copyrightable. See http://www.law.pitt.edu/madison/copyright/supplement/lexmark_v_static_control.htm Eclecticology 08:46, 1 September 2006 (UTC)

Nouns used as adjectives: policy / format.

English is quite wanton about using nouns as adjectives, for example beer parlor. In the spirit of multilingualism, there should be a standard way to place the adjectival usage as a place to anchor translations of the adjectival usage. If there is no adjectival usage or it is otherwise non-standard to mark this (e.g. zero takes both plural and singular nouns). Rmo13 01:54, 31 August 2006 (UTC)

Not everything that modifies a noun is necessarily an adjective. --Ptcamn 02:26, 31 August 2006 (UTC)
That nouns may be used attributively is a morphological fact of English nouns. It does not make them adjectives. It makes them attributive nouns. Theoretically any noun may be used attributively, but in practice some are much more common as such. This may warrant a usage note on the noun. --sanna 07:26, 31 August 2006 (UTC)
This is a simplification. What part of speech a word belongs to is largely an artificial tradition, having been invented by grammarians analysing languages a long time ago. Morphology plays no part in the matter and deals with how individual words are built from parts and how they are inflected. If you take how a word is used to be primary and what category it belongs to as secondary, calling the use adjectival is accurate. If you take the category as primary only then does it seem to make sense to say some particular word is never an adjective. Using this kind of thinking just doesn't work at all for many languages including Chinese and Polynesian languages where individual words just don't have any meaningful part of speech category until they are within the context of a sentence. For English the case also is fuzzier than you expect. — Hippietrail 07:44, 31 August 2006 (UTC)
Actually, many linguists believe that parts of speech are in fact a very accurate representation of the way the human brain processes language (Steven Pinker for one). But that is beside the point. I don't think we need to list every noun's attributive use as ‘place to anchor translations’, because this is a function not of vocabulary but of grammar, and as such does not belong here but on Wikibooks or something. E.g. if you have a compound noun in English XY, a French-speaker knows that it will generally be rendered as Y de X. These are grammatical details and not to do with translations of words. Widsith 08:18, 31 August 2006 (UTC)
Except that it is very usual that an English attributive noun translates to an adjective in other languages. Dealing with clear indications of how to translate terms in all contexts is a major function of Wiktionary and it does need to be addressed. — Hippietrail 08:33, 31 August 2006 (UTC)
See also the (inconclusive) "unnecessary adjective senses" thread above.
But in general I think we have this situation pretty well in hand. Since the adjectival uses of nouns are usually pretty specific and idiosyncratic (is this true?), and since we're pretty liberal in having whole separate entries for compound words and set phrases such as beer parlour and sister city, we can always list the appropriate translations and other information there. And we're also pretty good at linking from base words such as beer and sister to derived terms involving them such as beer parlour and sister city (although as usual we can never quite decide what to call the section containing those links). —scs 13:36, 31 August 2006 (UTC)

All of this tends to support my view that the Category:English nouns should not be a part of the inflection template. Maybe that category shouldn't even exist at all. The part of speech is often determined by the usage; it is not inherent in its inflections. In theory any root form can be any of the major parts of speech (noun, verb, adjective, and possibly adverb) by simply varying the inflections. If a word is ordinarily viewed as being a noun, and there is no record in the corpus of its being used as a verb or adjective, I can still use it as an adjective or verb and a reader can understand it. Grammatical purists have objected to using impact as a verb, but it works; the hearer understands perfectly well when that happens. He also understands what it means when I use it adjectivally in "impact statement". As languages mature they simplify and become more syntactic. This is most evident in Chinese, and only somewhat less so in English. Chinese has been stable for a very long time.

Someone added the inflections "more Parisian" and "most Parisian" to the adjective "Parisian". This is structurally correct, but semantically questionable. I don't think I have ever seen "Parisian" used as a verb, but if I said, "The interior designer Parisianed the apartment," a certain impression would be conveyed. I can take this further and devise a sentence that gives meaning to "more Parisianed". Maybe we should look into the possibility of not using parts of speech at all! Eclecticology 07:20, 1 September 2006 (UTC)Reply

You're joking about that last bit, right? I'm one who loves all manner of verbing nouns and nounificating verbs and engaging in nounificatory behavior with adjectives, but I think those hoary old part-of-speech tags still have some benefit (if only so that we can talk about the liberties we're taking with them...) —scs
Why should I be joking about it? Hippietrail hit upon a very interesting point about linguistics, one that goes beyond what is merely convenient or conventional. Our concept of parts of speech is rooted in the thinking of Renaissance grammarians who saw Greco-Latin structures as the ideal basis for all language structures, and so proceeded to impose those structures on other languages. It would be impractical to suddenly say we're going to stop using parts of speech, but we should at least be open to that eventual possibility. Eclecticology 16:53, 1 September 2006 (UTC)
On this specific question, I haven't heard it in English, but trop Parisien became common in the 1990s as a description of surly, snooty service, which the French (and many others of us) associate with Paris, rather than France generally. --Enginear 10:05, 2 September 2006 (UTC)Reply
This is a very interesting question. Would it be more appropriate to use language headers in the linguistic senses rather than the ones we know in English or developed in the history of another language? While probably superior in a theoretical framework, unfortunately I don't think this approach is practical for either contributors or dictionary users, as it would be needlessly duplicative and simply overwhelming, aside from departing from traditional dictionary style, which is only possible with massive momentum. However, we shouldn't be dismissive of some day incorporating this sort of classification.
Another question is how low we've set the bar. Right now there's a tremendous concentration on drawing the boundary between words that are accepted or not, rather than on senses like the use of amazing as a noun or some of the examples I've recently (re-)raised in the Tea room. Eventually it may be questions along the lines of EC's examples that are more commonly put to RfV, and I have a suspicion that many of them will pass quite easily. DAVilla 18:07, 1 September 2006 (UTC)Reply
The question is also relevant to Latin (which is supposed to be the root of all this wrangling). In theory, any Latin adjective may be used as a substantive (as a noun), though the word retains at least some of the inflection of an adjective. Likewise, Latin verbs have participle forms that are used as adjectives and gerundive forms that are used as nouns. For English, I would tend to prefer the headers of Adjective and Noun for these participle and gerund cases (with an additional Verb/participle form header), but in Latin this becomes more of a headache. In Latin, I think I would prefer to list adjectival verb forms with Participle as the part of speech, since the inflective grammar and meaning lie in the verb. I haven't decided yet how substantive senses of Adjectives in Latin ought to be listed, since calling them Nouns makes misleading assertions about the inflection, but the substantive sense may have subtle distinctions from the adjective sense that need to be explained. Arrgh! --EncycloPetey 20:22, 4 September 2006 (UTC)


September

collective nouns - appendices in general.

From WT:TR

Below is the discussion from the Tea room relating to collective nouns. Certain policy considerations arise. I will not add to the collective nouns appendix articles until we have sorted this out.

  1. Do we want to link appendices to the article word à la Widsith and Andrew massyn?
  2. Do we want to create a new inflection line for articles usually reserved for appendices e.g. ====Collective nouns==== à la Hippietrail and DAVilla?
There are pros and cons to both solutions. I have listed some here. The third solution is to have a combination of the two.
  1. Do we want to link appendices to the article word ?
Pro
  • It's harmless and promotes use of appendices.
  • It removes a lot of dross from the article page, while still giving information.
Anti
  • Appendices are exactly that. They should be away from the main article.
  • Appendices are not reliable in terms of usage, currency and the like.
In the collective nouns appendix, there is a disclaimer regarding usage and currency.
  • Appendices often deal with things which do not form part of the main business of a dictionary. e.g. A list of Presidents of the United States.
  1. Do we want to create new inflection lines with info underneath?
Pro:
  • It gives the necessary information on the page where it should be.
  • Appendices can be difficult to edit. see e.g. Appendix: Animals.
Anti:
  • The information is often not reliable.
  • The information is often not part of the main business of the dictionary.

---

Discussion from tea room

Do we have (and do we want) an appendix for collective nouns? I looked up passel yesterday and was surprised not to see a passel of brats. This got me thinking.... Andrew massyn 05:36, 2 September 2006 (UTC)

Well we have Appendix:Animals for animal ones. These appendices are pretty good actually, and should probably be linked to from more pages than they currently are. Widsith 07:47, 2 September 2006 (UTC)Reply
By all means start one, Andrew, if you'd like to see one! It might be a good idea to cross-reference Appendix:Animals rather than reproducing the content on that page. — Paul G 07:51, 2 September 2006 (UTC)Reply

OK I've created one at Appendix: Collective nouns. Please add to it. Andrew massyn 09:25, 2 September 2006 (UTC)Reply

Well, it looks like it's going to be a huge page, so I hope the format is all right. It has already turned up some interesting results. Did you know, for example, that you get a bike of ants and a kettle of hawks; or that we don't have a plural entry for porpoises? I certainly didn't. Andrew massyn 12:36, 2 September 2006 (UTC)
Would it be better not to include the terms on the animals page? DAVilla 14:39, 2 September 2006 (UTC) Strike that. DAVilla 20:14, 3 September 2006 (UTC)Reply
Yeah. But editing the animal page is a bitch (I am talking about Appendix: Animals, not each animal page). What I am doing is putting the Appendix: Animals at each of the animals I get to as well as the Appendix: Collective nouns. (Also, there are collective nouns for non-animals in the normal sense. What do you call a bunch of lawyers, by the way?) Andrew massyn 16:01, 2 September 2006 (UTC)
An expense? Or maybe a quibble? Widsith 16:21, 2 September 2006 (UTC)Reply
  • Putting a link to the collective nouns category or appendix from the word for every animal which has a collective noun is very wrong, especially under See Also. What you should do is add the collective terms under See also and put the links to the appendix or category only on the articles for the collective terms. If editing the articles properly is a bitch it's better if you don't edit them at all than editing them badly. — Hippietrail 20:14, 2 September 2006 (UTC)Reply
Yes and No. I understand your objection, but if someone is going to look for the collective noun for e.g. cats, they are going to look under cat or cats, and not under clowder. What is the point of making any appendix impossible to use? P.S. I meant the Appendix: Animals not each animal article page. Perhaps that clarifies? I agree that the collective noun should be added to the article page under "see also", and will do that in future.
Also, I was here for months before I even realised that there was an Appendix: Animals. If it is not going to be linked, it is not going to be used. Andrew massyn 20:23, 2 September 2006 (UTC)Reply
It might not be good to link to collective nouns from each animal page, but certainly you could link to the animals appendix from each, and from the animal category, and to collective nouns from that. DAVilla 05:34, 3 September 2006 (UTC)Reply
Instead of adding the whole category which doesn't really make anything particularly easier, why not add the collective term under a new heading or as part of the headword/inflection line? — Hippietrail 06:32, 3 September 2006 (UTC)Reply
O.K. I will then make an inflection line called ====collective nouns====. That should satisfy? I'll get to the changes in due course. I still think the appendices should be added to the words. Not only for collective nouns or animals, but for all appendices. I know when I browse, I get sidetracked onto various appendices. For e.g. Say I wanted to know in what year George III reigned, I might at the nonce want to find out at the same time when William and Mary reigned. The only way to do that simply is to follow a link. Andrew massyn 07:26, 3 September 2006 (UTC)Reply
Oh no, I hate that solution! There are a million related words that could be added under their own headings but why make things complicated? I don't see what the objection is to linking to our own Appendix, which is very good, and has plenty of useful information that is sensibly kept out of the main dictionary entry. Widsith 10:35, 3 September 2006 (UTC)Reply
The Appendix: Animals is not comprehensive and is a nightmare to edit. It would be better to have the info at a more accessible place. Andrew massyn 13:10, 3 September 2006 (UTC)
If an appendix for collective nouns works out well then maybe we could just hack up the animals appendix into an animal genders appendix, a young animals appendix, etc. The appendix space might prove better at handling some of the other associations that we've been wondering how best to list. DAVilla 20:14, 3 September 2006 (UTC)Reply
I've been wondering if many of the derived terms should be moved to a new heading so that related terms under them, such as these collective nouns, become more prominent. I'm thinking something along the lines of ====Compound terms==== e.g. at time since derived terms like timely get drowned out and related terms like temporal are pushed down. DAVilla 20:14, 3 September 2006 (UTC)Reply
  • A couple of thoughts:
    Using appendices in the manner suggested here just sounds like trying to reproduce categories but in a broken way
Not really. Categories link words and appendices have other info attached. If collective nouns were put in a category, you would get unrelated words like bunch and banana but no linking unless we put in set phrases such as a bunch of bananas, which I don't think we want to do in a category. Similarly, any appendix contains more info than you would want in a category: Presidents of the United States would presumably have dates attached; the Animals appendix has lots of information which can't be conveniently slapped into a category. I think that both categories and appendices have merit; it's just how to deal with them on the article page that is the problem. Take for e.g. the word king. It could have a category [heads of state] with words like king, president and dictator in it, and an appendix [kings of England] with the names of the kings and when they ruled. - Andrew massyn 14:04, 4 September 2006 (UTC)
  • ====Collective nouns==== would not be an "inflection line" but a "heading", and we already have a heading where such semantically related but not etymologically related terms belong, and that is ===See also===. The terms should be added there whether appendices are linked to or not, unless ====Collective nouns==== goes ahead, since the more specific heading would overrule the catchall heading. — Hippietrail 01:58, 4 September 2006 (UTC)
No quibbles except for what to do with oddities. See below. Andrew massyn 14:04, 4 September 2006 (UTC)Reply
I think it's fine for them to go under =See also=, I just don't think they merit a heading of their own. Widsith 08:19, 4 September 2006 (UTC)Reply
What about the really bizarre ones like a shrewdness of apes or a superfluity of nuns? I don't think they ever had real currency, although they might have appeared in a book in the 1400s and have been on many lists of collective nouns ever since. Do they get listed, and how do we deal with them? A Usage note? Andrew massyn 10:22, 4 September 2006 (UTC)
Yes, in such cases, some comment is needed in the page. Without a comment, "a superfluity of nuns" may be wrongly considered as anti-religious vandalism or (and it's not better) it might even be used in real life, with unpredictable reactions... Lmaltier 17:05, 6 September 2006 (UTC)

OK. Here's what I propose. If there are no serious objections in the next week or so, then this is how I intend to implement the discussion above.

  1. On the object of the collective noun - cats I will put under ====see also==== clowder (collective noun) but no linking to the appendix.
  2. At clowder, under ====Derived terms==== I will add A clowder of cats (collective noun) but again no linking to the appendix.
This should then broadly satisfy most people and can be adapted with minor modifications if necessary for any peculiar pages. Andrew massyn 15:57, 30 September 2006 (UTC)Reply

bot policy updated

There was good support for, and no objection to, the suggestions I had made at Wiktionary talk:Bots, so I have folded them in to Wiktionary:Bots. Please give the new policy a read and make sure you agree. (If you want to see just the changes, here's the diff.)

For the most part I have expanded the policy with additional language clarifying an aspiring bot owner's responsibilities. I have relaxed, but not completely eliminated, the former language restricting a Wiktionary bot to one narrowly-defined task.

Please list your support or your (brief) reservations here. But please divert any significant discussion (pro or con) to the talk page at Wiktionary talk:Bots. —scs 15:07, 3 September 2006 (UTC)Reply

Support:


Object / Have reservations or concerns:

  • I apologize for not submitting my rewrite (collected from last week's IRC conversations) yet.
Please post your thoughts soon! I'd rather this didn't languish for another month.
I think each revision should have a request for comments/further revisions, before a premature vote is called. --Connel MacKenzie 15:28, 3 September 2006 (UTC)
Realistically, I'm not sure how many more comments we're likely to get. After a month of inactivity at Wiktionary:Bots and three weeks at Wiktionary talk:Bots, I really didn't think this rewrite and final call for support (I didn't think of it as a vote) was "premature". —scs 14:01, 4 September 2006 (UTC)Reply

Connel MacKenzie brought up the question of etymologies being copyrightable in #wiktionary. Etymologies are often based on extensive research and sometimes creative leaps, so a certain degree of protection would naturally be expected. The question is whether the details thus uncovered— that English interdiction is from Old English enterditen (to place under a church ban)— are copyrightable (as creative concepts), as opposed to simply the presentation thereof being copyrightable (as in tables or data). The Foundation's intellectual property lawyers are offline at the moment, but I'll point them to the discussion when I see them. // Pathoschild (editor / talk) 04:50, 4 September 2006 (UTC)Reply

I'm no lawyer, but it sure seems like a stretch to me. It reminds me of how Columbus "discovered" America. What about the fact that the vast majority of words, and their etymologies, predate modern copyright laws, and that much of the etymology research mentioned above probably involved regurgitating earlier etymologies written by people who did not have modern copyright laws available to them?
In general terms, I think knowledge should be shared free-of-charge, but application of knowledge should be protected. Lawyers, does my philosophy have legal merit or am I misinformed?

P.S. I understand about putting in the time to do research on these things (take a look at the etymology for 傾國). A-cai 08:15, 4 September 2006 (UTC)Reply

Is not everything here copyrighted already (by the corresponding users, who wrote the text) and then licensed under GFDL? -Yyy 06:55, 5 September 2006 (UTC)
Bare facts are not copyrightable, so I consider that etymologies and dictionary definitions and translations are copyrightable only when something creative is made. The contents of very old dictionaries have no more copyright protection.--Jusjih 07:37, 7 September 2006 (UTC)Reply


Categories that contain lists

Many categories contain hand-crafted lists of words that someone wants added to that category (See Category:Telugu language as an example). Is this acceptable, or is it OK to delete the lists? It seems a shame to delete people's work. SemperBlotto 07:24, 6 September 2006 (UTC)Reply

It is a handy way of temporarily entering a list, to seed a category. --Connel MacKenzie 07:25, 6 September 2006 (UTC)Reply
Once the entries have been added, I usually remove these lists (unless there is an important sequence to the list). However, it is important to actually check each blue link to be sure that the definition applicable to the category is on the page. --EncycloPetey 18:17, 6 September 2006 (UTC)Reply


Random page

How does the Random page function work? What type of algorithm does it use?--User:Hurray MH 08:22, 7 September 2006 (UTC)Reply

In a nutshell: Every page, when it's created, gets a random number assigned to it, between 0 and 1. When you ask for a random page, the system generates a random number X, then asks the database, "give me the first article whose random number field is greater than X." (This is a fine example of the grand tradition of letting the database do all the work.)
The advantages of this scheme are that it's simple to implement and very efficient. The disadvantages are that it isn't perfectly random (though I believe the discrepancies are slight),
You'd better believe otherwise! How informative. DAVilla 06:18, 10 September 2006 (UTC)Reply
Hard to say if you're agreeing with me or not! If not, do tell. —scs 13:33, 10 September 2006 (UTC)Reply
The discrepancies would be more than slight. If the article numbers are never reassigned, the probability of hitting some words could easily be several times higher than that of others, and in all likelihood is. The distribution of probabilities depends on the differences between article numbers, which are by no means uniform. There are probably words that have never been hit by random search, for instance, an article with a random number very close to zero, or where there is another article just a tiny fraction below. DAVilla 15:21, 21 September 2006 (UTC)
and that it doesn't lend itself immediately to potentially more useful features like "give me a random page that isn't a stub or redirect" or "give me a random page in English".
(Various people here on Wiktionary have experimented with alternative random-page implementations that do let you, say, restrict the choice to a certain language. I think Connel has one that basically works.)
scs 13:31, 7 September 2006 (UTC)Reply
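For anyone who wants to see the point about unequal odds concretely, here is a rough Python sketch of the scheme as described above. It is not the actual MediaWiki code: the function names are made up for illustration, and the wrap-around to the first page when X lands above every stored number is an assumption, since the description above does not say what happens in that case. Under those assumptions, each page's chance of being picked is simply the gap between its number and the next lower one, which is why the odds differ from page to page.

```python
import bisect
import random


def build_index(titles, rng):
    """Give each page a fixed random key in [0, 1) and sort the pages by key."""
    return sorted((rng.random(), title) for title in titles)


def random_page(index, rng):
    """Draw X and return the first page whose key is greater than X."""
    x = rng.random()
    keys = [key for key, _ in index]
    pos = bisect.bisect_right(keys, x)
    # Assumption: if X is above every key, wrap around to the first page.
    return index[pos % len(index)][1]


def hit_probabilities(index):
    """Exact chance of each page being returned: the gap below its key."""
    probs = {}
    prev = 0.0
    for key, title in index:
        probs[title] = key - prev
        prev = key
    # The stretch above the largest key goes to the first page (wrap-around).
    probs[index[0][1]] += 1.0 - prev
    return probs
```

Calling hit_probabilities on a hand-made index such as [(0.1, "a"), (0.5, "b"), (0.9, "c")] gives "a" a chance of about 0.2 and the other two about 0.4 each, so even three pages are not equally likely.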
Please don't feed trolls. --Connel MacKenzie 19:38, 7 September 2006 (UTC)Reply
And I was supposed to know it was a troll how? —scs 19:46, 7 September 2006 (UTC)Reply
The obnoxious sig. --Connel MacKenzie 19:52, 7 September 2006 (UTC)Reply
There is nothing obnoxious about "Hurray MH". And I am not going to check where every sig on every question leads before deciding whether to answer the question. (But yes, after the fact I noticed the odd link in that sig. I'm removing it now, in case it's as dangerous as it looks; interested parties will now have to check the history to see. Would merely clicking on that link have blocked you, if I were a sysop? That sounds just as bad as reading mail using Outlook!) —scs 20:02, 7 September 2006 (UTC)Reply
No, it would take a sysop to the blocking page. If they then accidentally blocked me, I of course could undo their mistake in short order. It has no effect other than being obnoxious. --Connel MacKenzie 20:30, 7 September 2006 (UTC)Reply
I do think that including the option to gain access to a randomly generated page by language would be an important feature. It would provide users with the pleasure of "browsing" through a dictionary without often coming across pages in (for example) Chinese. Thanks if this can be done. Syrius 12:42, 13 September 2006 (UTC)
You are welcome to help me test http://tools.wikimedia.de/~cmackenzie/rnd-en-wikt.html (set a bookmark.) Note that it is subject to my 8-yr old linux box remaining online. The electric company has scheduled maintenance for my neighborhood for tomorrow; it may be offline for eight hours or so.  :-(   --Connel MacKenzie 15:19, 13 September 2006 (UTC)Reply

bot policy in limbo

So, being bold, I made some pretty significant changes to Wiktionary:Bots and asked for comments here. Unfortunately (perhaps due to the somewhat chilling effects of the one comment posted so far), nobody else has commented either way, at all. So, unless you believe that "silence = assent", the changes don't have any support, and should arguably be rolled back until such time as they do. Naturally, I'm reluctant to do that, but I won't complain if someone else does. —scs 13:38, 7 September 2006 (UTC)Reply

I suspect that silence means that people have (rightly or wrongly) had other priorities and have trusted the small band of bot farmers -- after all, if we trust you to run bots, we can surely trust you to edit your own policy.
However, the thought that "All it takes for evil to triumph is for all good men to do nothing" has prompted me to check. In case I was too subtle for some, the evil I had in mind was staying in limbo, not releasing bots. --Enginear 12:13, 9 September 2006 (UTC) I think your alterations improve the policy by indicating the sort of responsibilities a bot runner is taking on. I support them.
However, they still do not answer my niggling worry that someone may overestimate his competence and, whatever responsibilities he has signed up to, perform some wrong edits which it is impracticable to undo. I am thinking, for example, of wrongful merging of two categories [in the general sense, not just things in curly brackets] without keeping a log of which items were altered. If a log is kept, then almost anything should be reversible by bot, but if not, manual inspection may be required, and if thousands of entries were affected, that would be a considerable task.
So my advice would be that each bot task should be peer-reviewed in advance by an experienced bot runner, to confirm that there is a valid back-out strategy if the changes needed to be reversed. The test run of a bot should then include a test of the backout (unless trivial). If you are prepared to add that to the policy, then I will be much less nervous about supporting bots in the future (at present I nearly always support them, but remain quietly nervous as I do so).
Obviously, there's a risk that the peer reviewer is a sock puppet of the task proposer, or that the bot is coded to do something underhand. However, if such a person failed to arouse the suspicion of other bot runners, it's unlikely any of the rest of us would notice either.
In short, community consensus is important in agreeing what tasks we want done, and may be helpful in judging whether someone is trustworthy enough to be allowed to run bots in the first place, but I believe the details of how you do it are best checked by another bot expert without interference from the rest of us.
I suspect I am speaking for many when I say thanks to the bot runners (and template writers) for your good work. Sorry we haven't joined the discussion, but we're not always sure we know enough to make useful contributions. --Enginear 19:28, 8 September 2006 (UTC)Reply
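To make the back-out idea above a little more concrete, here is a minimal Python sketch of a bot run that keeps a log and can be reversed from it. It is only an illustration under simple assumptions: the wiki is modelled as a plain dictionary of page title to text, and apply_edit and back_out are hypothetical names, not part of any real bot framework.

```python
def apply_edit(pages, log, title, new_text):
    """Change one page, first recording its previous text in the log."""
    log.append((title, pages.get(title)))
    pages[title] = new_text


def back_out(pages, log):
    """Undo every logged edit, most recent first."""
    while log:
        title, old_text = log.pop()
        if old_text is None:
            # The page did not exist before the run, so remove it again.
            pages.pop(title, None)
        else:
            pages[title] = old_text
```

A test run along the lines suggested above would apply a handful of edits, call back_out, and confirm that every page is back to its earlier text. Without the log, a merge of two categories leaves no record of which pages came from which, and that is the case where manual inspection becomes necessary.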

  • Sorry, the silence meant that I sent my IRC logs to Scs via e-mail. I don't think he's had time to adequately digest the points of that debate, yet. But he did acknowledge receipt.
(Indeed. I'm about halfway through them. —scs 21:03, 8 September 2006 (UTC))Reply

  • It is my belief that users, not tasks, should be given the bot flag. Due to GPL/GFDL incompatibility, the code review requirement as stated can't work. And certainly, minor tasks (say, fewer than a thousand edits) should require no approval at all. All 'bot edits are just edits; the only effect of the 'bot flag is to not clog Special:Recentchanges. Nothing else. Did anyone block User:Connel MacKenzieBot or User:BD2412 during the recent spate of en-noun-reg changes to en-noun? No. Did anyone discuss it? No. (Well yes, but months ago and only hypothetically, then.)
  • I agree that trusted and competent users should have permission to do all non-contentious tasks by bot without seeking further permission. I would be happy to vote all current bot runners into this category, and would trust them to seek approval before making contentious changes. --Enginear 12:13, 9 September 2006 (UTC)Reply
  • Trust is the issue. Lack of trust has an extraordinary chilling effect. Bot operation has technical throughput/volume concerns. But WP:AGF should still apply. These concerns have been shown time and again to be completely unfounded: no bot has yet brought any WMF wiki down, anywhere.
  • You want to have all bot tasks peer reviewed? Then PAY someone to peer review them. If not, then you should consider these edits to be edits just like any user editing Wiktionary. Anyone using any tool to assist their editing implicitly takes responsibility for their edits.
  • I also (more or less) trust the current bot runners to discuss with each other when they are unsure of their proposed methodology or coding. However, I work for an organisation where some activities can cause serious consequences and we therefore have a peer-review requirement. I have to admit that from time to time the peer-review of my work has picked up problems I missed and which could have had serious results. (And I am less bold than you!) I said I advised peer-reviews, particularly of back-out strategy, and that their use would make me less nervous. In view of the next paragraph, they could be limited to the rare occasions (if any) where a test run is not practicable. --Enginear 12:13, 9 September 2006 (UTC)Reply
  • All users' contributions are accessible from Special:Contributions, whether they be bot-assisted, bot-driven, AWB, Javascript, offline editor, external editor, or wiki-input form.
  • Do 'bots make mistakes? Certainly. The interwiki bot is notorious for putting interwiki links on Main Page and redirect pages. Does RobotWiktGM get blocked? I hope not!
  • The only point in someone courteously asking for a 'bot flag is to not annoy people who patrol Special:Recentchanges. Any discussion beyond that courtesy is specious and nefarious. It is only an act of tremendous kindness for someone to request a bot flag. Having the upside-down and backwards policy based only on irrational fear has stifled the English Wiktionary project considerably.
  • It is very hard to even see from the "fear perspective" as a bot operator. Having been here a long time, I do partly understand the concerns. But the backlash from years of onerous, ridiculous 'bot policies needs to begin.
  • Fear is usually based on lack of knowledge, eg that I didn't understand about Special:Contributions. Yes the backlash should begin -- as my edit summary said: Onwards and Upwards! --Enginear 12:13, 9 September 2006 (UTC)Reply
  • Off the top of my head:
    1. Webster 1913 should be imported
    2. GMET should be imported
    3. 1st line defs from Wikipedia should be imported
    4. iSpell dictionaries should be imported as stubs
    5. translations from other language Wiktionaries should be imported
    6. translations from here should generate entries
    7. inflections should generate (more) entries
    8. artificial-voice pronunciations should be imported
    9. TV slang should be stubbed
    10. etymologies from W1913 should be inserted
    11. reference links should be inserted
    12. synonyms and antonyms sections should be imported from various thesauri
    13. Wikisaurus and Index namespaces should be auto-generated
    14. abbreviations, acronyms, initialisms should be imported
    15. technical jargon should be imported
    16. medical references should be imported
    17. clinical lexicons should be imported
  • These things might already have been done, if we hadn't had the idiotic policy in practice for the past couple of years. Would we have over three million entries already? Probably. Would the entries we have be more consistent? Certainly. Would the collections make en.wikt: a more useful resource? Of course. Is there any sane reason for continuing the witch-hunt mentality currently in place?
No reason I know of for a witch hunt, though I think you may be over-optimistic re "more consistent"! Of course, the irrational fear which leads to witch hunts is usually from lack of knowledge, so education may help -- it may be hard for you to appreciate how little most of us understand of what you do, and frustrating for you to explain in words of one syll, but it has certainly alleviated most of my nervousness. --Enginear 12:13, 9 September 2006 (UTC)Reply
So, Connel, do you act this truculent and hyperbolic in real life, or is this just your net persona? :-)
I'm as frustrated by foot-dragging and delays as the next guy, but I would hardly characterize any negative attitude towards bots that might exist here as a "witch hunt".
That's a very interesting list of potential auto-import tasks you've got there. But the hard questions are not "How do I write a bot to reformat and import this particular content?", nor "Can I get permission to bot-import this content?". No, the hard questions are "Is this data set high-quality enough that importing it would retain or improve Wiktionary's overall quality level?", and "Is Wiktionary well-served, and are our readers well-served, by having all this information mechanically integrated within Wiktionary?"
I'm not saying that the answers to either of those questions is necessarily "no", but they're certainly not automatic, knee-jerk yesses, either. —scs 21:03, 8 September 2006 (UTC)Reply
Prior to your ad-hominem characterization of me, I would have put you firmly in the "pro-bot policy reform" category.  :-) Seriously, there is one individual that I know of, only, who is responsible for the current "policy." And that is not you.
Smiley noted, as you noted mine, but in case anyone else misses them: it would have been ad hominem if I'd said that you were truculent or hyperbolic. But all I suggested was that you tend to act those ways, here. —scs 21:23, 8 September 2006 (UTC)
Still, any negative comment towards what I've said is certain to be harped on, by the stubborn element that has fostered the current atmosphere. One needs only to look to the "English nouns" specious complaints, to see examples of that. --Connel MacKenzie 21:40, 8 September 2006 (UTC)Reply
ALL of those imports, insertions or corrections would benefit Wiktionary tremendously. A collective resource may contain many entries that on their own would not stand. But as a collective repository they do have value...even the ones "of lower quality."
Generated translation entries had the most specious of all arguments used to stifle it: that "having a stub entry is worse than having a 'proper' entry." I have seen hundreds (perhaps thousands) of times now, where the opposite is true.
Are they all knee-jerk "yeses?" Yes, beyond any doubt.
--Connel MacKenzie 21:14, 8 September 2006 (UTC)Reply
  • Others:
    1. Generate stub entries for idioms
    2. Generate stub entries for all items in each appendix
    3. import language definition sets
    4. generate disambiguations for tone characters
    5. generate stubs for romanizations
    6. import grammar and usage guides to appendixes
  • --Connel MacKenzie 21:40, 8 September 2006 (UTC)Reply

There seems to be some confusion here, namely the distinction between bots and tasks. A bot is simply an account with a flag, and it should not be a big deal to get such an account (for a trusted user). That's what the WT:BOT policy should deal with. The automated tasks done with such an account are different; unless they're absolutely uncontroversial (and that's what the bot owner is trusted to distinguish) they need some sort of pre-approval by the community. All of the above-mentioned proposals (quite obviously) need community approval, but that should not interfere with the management of bot accounts and bot owners. The current idea is that each task needs a separate account; the proposed change says that many tasks can be run under one account (which saves time and trouble). Whether and how tasks are to be approved should not be dealt with by the bot policy, or if it is, it should be clearly mentioned as distinct from the actual bot account/flag/rubbish rules. — Vildricianus 22:05, 8 September 2006 (UTC)


I'm sorry, but that sounds like it would simply be adding another hoop to jump through, on one's way to getting approval (for what we, today, call a "'bot".) I can't even imagine what bizarre "tests" would be devised to show one's trustworthiness.

I read it differently: a one-off approval after which you would have freedom to do anything you believed to be non-controversial. I haven't noticed any bizarre test in the approval of administrators (or indeed of non-automated contributors, unless "give him enough rope to hang himself" counts as cruel and unusual punishment). I would expect approval of bot runners to be similarly laid back. --Enginear 12:13, 9 September 2006 (UTC)Reply

No. Come off it. The whole setup is perversely wrong. The current concept held by the Wiktionary community (engendered by a single person's POV) needs to be swept clean. There is no reason for any of the delays.

When Primetime was uploading via a bot, did anyone demand that he get a bot flag? No, he was blocked just the same.

When the X*cnt vandal was uploading using a bot, did anyone demand that he get a bot flag? No, he (and all of AOL) was blocked just the same.

When WillyOnWheels wrote his own page-moving bot, did anyone demand that he get a bot flag? No, he was blocked just the same.

That current group of bot-regulars have proved time and again their willingness to correct any problems caused. Why then, should people who not only have never "done bad things" with the technology, but in fact have relieved tremendous amounts of tedious cleanup, be made to walk the coals for each and every petty task?

--Connel MacKenzie 06:34, 9 September 2006 (UTC)Reply

No reason, for petty tasks -- but nor, as far as I can see, has anyone here suggested it. Certainly, Vild and I have both agreed that non-contentious tasks should NOT require approval, and that we trust you with the decision of what is contentious.
For contentious tasks though, it is surely near-essential wikiquette to discuss first, rather than release a Terminator against regular human defenders! For lesser cases, it is surely easier to agree a change quietly first, and point to the consensus later if someone objects, rather than to argue after the event, when you will always be on the back foot, and all manner of objectors will come out of the woodwork. --Enginear 12:13, 9 September 2006 (UTC)Reply
I simply don't get what you mean to change then, or was your post not directed to me, with the weird ---- in between? Now I could only repeat what I said above, but I'll clarify that the confusion is on my part. Are you talking about bot flags or about automated tasks? Do you mean it should be possible to do the stuff of that list without prior discussion? Or what? — Vildricianus 07:53, 9 September 2006 (UTC)Reply

I don't know much about bots, but I am extremely doubtful about importing wordlists wholesale. The material we already have from Webster, for one, is little more than a big clean-up job. Widsith 08:09, 9 September 2006 (UTC)Reply

Good sir, the only reason no one has cleaned them up is because they aren't in the main namespace! Instead, Wiktionary languishes with frighteningly incomplete coverage of the language. The same is more exaggerated for translation. --Connel MacKenzie 08:42, 9 September 2006 (UTC)Reply
Better incomplete than incorrect like some, but that is no reason to languish when there are good quality (if differently formatted) non-copy-vio entries which can be imported. --Enginear 12:13, 9 September 2006 (UTC)Reply
I apologize for my sleep-deprived rants of yesterday. I'm not suggesting (or at least, should not be) that some anarchistic "revolt" of bot-operators begin. Yes, I do take concerns such as those raised by Widsith quite seriously. And yes, I do of course comprehend that controversy surrounds each of the import tasks I listed. (Even if similar or identical imports are accepted out-of-hand on other language Wiktionnaires.)
I do wish it were easier for people to understand just how arbitrary the existing complaints are though. The current community mindset places a certain bias against reasonable progress. Rather than continue in what is probably perceived as an argumentative tack, I'll rewrite WT:BOTS in a manner I think is appropriate, and send it to Scs via e-mail as a starting point. Enginear's warning about inaction is very relevant. --Connel MacKenzie 13:34, 9 September 2006 (UTC)
One big problem, of course, is that the word "reasonable" can be rather slippery! One man's "perfectly reasonable" can be another's "dangerously scary".
Remember (as we've all said in various ways above) that a big part of the bot policy is about building trust. Those who care about the project want to make sure (among other things) that some bot-assisted revolution isn't going to drag the project off violently in some other direction. So they need to be assured that bot operators aren't going to make sweeping changes that a bot operator thinks are "perfectly reasonable" but that others in the community don't. If they're unsure about that assurance, they may insist that all bot tasks be formally approved in advance. And if they're unsure about that assurance, they may drag their feet about allowing bots at all.
Therefore, those of us who want to see the bot policy liberalized have to take this notion of trust- and consensus building seriously. We can't give the impression that, once we manage to get the bot policy liberalized and our own general-purpose bots approved, we're going to race off and make a whole bunch of big, sweeping, "perfectly reasonable" changes that, in fact, not everybody might agree with.
scs 14:36, 9 September 2006 (UTC)Reply

Classification of slang

There's been some debate over whether taboo words that are considered vulgar, but that have been around for a very long time, such as fuck, cunt or cock should automatically be described as slang. There's been some discussion of the issue at talk:cunt#Slang and User talk:Stephen G. Brown#Motivate and discuss. Personally, I don't feel that classifying any definition of a word that is considered to be somehow substandard as (slang) is informative. Especially not when the definition would be recognized by practically all speakers.

I think we need a consistent guideline as to how to comment articles. Which policy documents regulate the use of comments at the moment?

Peter Isotalo 12:15, 9 September 2006 (UTC)Reply

Offhand, I'd say that any term that cannot be used in formal writing (i.e. a speech or presentation) should be classified as slang. Perhaps it is our definition of slang that should be refined? --Connel MacKenzie 13:22, 9 September 2006 (UTC)Reply

It's probably another one of those taxonomy/orthogonality things. There's a formal/informal axis, and a bland/offensive axis, and also an established/cutting-edge axis. Certainly fuck is not at all formal, is fairly (if not extremely) offensive, but is also very well established; it's not some rad new koinage that popped up on urbandictionary yesterday. So is it slang? Connel's right; it depends on your definition of slang, and different definitions of slang address different aspects of the three axes I mentioned. My own definition of "slang" would probably be that it describes terms that are both informal and relatively new, but without necessarily coming down on either side of the bland/offensive line. So by my definition, no, fuck isn't really slang any more, but that's just me. I think real dictionaries tend to call it "vulgar slang", which is probably about right.

The Jargon File has a nice little essay on "Slang, Jargon, and Techspeak" (which brings in yet another axis: "everyday" versus "specialized"). That link isn't working at the time I write this, but a google search turns up several mirrors. —scs 14:58, 9 September 2006 (UTC)Reply

See also Wiktionary:Grease pit/2-level dictionary#Two-level too arbitrary, I think. —scs 00:31, 10 September 2006 (UTC)Reply
...where (in case I've confused people) I say I would welcome more, not fewer tags. It's just that my view (which seems to be a minority) is that, while vulgar was appropriate, slang might not be. --Enginear 12:30, 10 September 2006 (UTC)
In my perception of what slang is, "fuck" doesn't classify as it. Vulgar, yes, but slang? I used to think that slang is pretty obscure language, usually quite new or used by only a certain group of people. (That's probably too narrow, but then, I don't usually spend my waking nights trying to define "slang".) — Vildricianus 20:27, 10 September 2006 (UTC)Reply
  • Right: slang in a specialized sense means "unconventional."[3][4] Jargon is technically a type of slang unique to a certain group of people. If you want to take a very broad definition of it, then you can say slang therefore is informal, but the reasoning behind this approach is flawed, as one can use engineer's slang at an engineering conference without being informal. Ironically, the people who use the word slang to mean "informal" tend to be speaking in an informal, off-hand manner. In the example cited above, putting "vulgar slang" while defining slang as informal would be redundant. It would be the same as tagging a word as a "vulgar vulgarity."--Frem 22:39, 10 September 2006 (UTC)

I disagree. fuck is a slang word, because it is extremely informal. That is, and has always been, the primary definition of slang. The sense of ‘jargon’ is a separate, somewhat later, sense of the word which in no way negates the original meaning. Nor is ‘vulgar slang’ redundant, since ‘vulgar’ is used to mean ‘coarse, offensive’. Widsith 05:29, 12 September 2006 (UTC)Reply

The sense of slang you are describing is historical, and I think we should keep our usage labels as precise as possible. Among people studying language, slang almost always means "unconventional." If we broaden our definitions too much, slang will assume the same meaning as vulgar. Further, vulgar can sometimes mean "popular." On the other hand, the definition you just gave for vulgar is very accurate. Vulgarisms are by modern definition informal (i.e., out of proper form). But we could add the word informal to that entry in place of slang to avoid ambiguity. In any case, the word slang in that entry is causing too much confusion.--Frem 09:20, 12 September 2006 (UTC)Reply
There is nothing whatever imprecise or historical about it. A word is slang if it is used only in very informal speech, and vulgar if furthermore it is likely to be seen as offensive or distasteful. fuck qualifies for both of these. Widsith 14:33, 12 September 2006 (UTC)Reply
Is slang then another entry where we should have two definitions, one tagged Template:italbrac and one tagged Template:italbrac [or, if we want to wind people up, Template:italbrac]? (This sort of discussion re tagging was part of Wiktionary:Grease pit/2-level dictionary#Two-level too arbitrary, I think. It is less trivial when applied to, say, medical terminology.) --Enginear 14:26, 12 September 2006 (UTC)Reply
No, that wouldn't be accurate. I agree completely with Widsith; slang meaning "not formal" is in no way archaic nor obsolete. --Connel MacKenzie 17:20, 12 September 2006 (UTC)


Oldest red links?

Is there a way to generate a list of the oldest redlinks on the site (or someplace where such a list exists)? Cheers! bd2412 T 18:28, 14 September 2006 (UTC)Reply

Also (completely unrelated question) can we get a bot to add 'pedia and wikiquote links to articles with entries under the same name on those sites?

I've been thinking of doing that. It can't be fully automated, though (i.e., it needs some oversight), because there are too many reasons why the spelling can be identical but the underlying concept not. —scs 20:37, 14 September 2006 (UTC)Reply

Slightly even more unrelated question, is it appropriate to add 'pedia links to related entries, e.g. to add a link to the 'pedia's Atheism article from atheist, atheists, atheistic, atheistically? bd2412 T 20:14, 14 September 2006 (UTC)Reply

My opinion is that it's fine to have a link from our "central" entry on a term, even if Wikipedia's title is slightly different (i.e. a different form of the word). But for all our little stubby articles like "form of foo" or "one who is foo" or "state of being foo", a Wikipedia link is superfluous, I think. —scs 20:37, 14 September 2006 (UTC)Reply
I think it'd depend on each term; I wouldn't want a blanket prohibition on pedia links from stubs. --Connel MacKenzie 20:57, 14 September 2006 (UTC)Reply
Depends how you define a stub - an article on caterpillars should ideally never say more than that it is the plural of caterpillar, but I think a 'pedia link to caterpillar is just as useful in either article. bd2412 T 00:32, 17 September 2006 (UTC)Reply
I very strongly disagree with your conclusion about that example. The entry caterpillars ultimately should have translations, pronunciation(s), citations, example sentences, synonyms, an image, a video of the sign language for the plural and perhaps a gloss indicating what the singular form refers to (to save our readers a 10-15 second page load, chasing a link.) --Connel MacKenzie 18:36, 17 September 2006 (UTC)
Pronunciation, absolutely. Synonyms, maybe as a Wikisaurus link. Examples and citations... would that be of any inflected form or of the plural only? An image, fine... so long as that picture actually has more than one caterpillar in it. Sign language... would that be of the American variety? British? The international? Maybe even whatever is used in Singapore or India? DAVilla 15:10, 21 September 2006 (UTC)Reply
I do not know of any tool that lists redlinks by age. While that would be a very nice thing to have, I can't think of a decent way to generate such a list. --Connel MacKenzie 20:57, 14 September 2006 (UTC)Reply
It'd be easy enough to start building one going forward: just fetch the redlink list periodically (once a day or once a week) and tag each new word you find on it with the date you first found it. But doing so retroactively, no, I can't think of a way to do that, either. (If you had a lot of historical database snapshots lying around and a lot of CPU and your own time to spare, maybe...) —scs 21:24, 14 September 2006 (UTC)Reply
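The going-forward approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two sample snapshots are hypothetical, and fetching the actual redlink list from the site is left out entirely. Dropping titles that have left the list is a design choice here, not something the proposal requires.

```python
from datetime import date

def update_first_seen(first_seen, current_redlinks, today):
    """Return a new mapping of redlink title -> date first observed.

    Titles already known keep their earlier date; titles no longer on the
    current redlink list are dropped (an assumption of this sketch).
    """
    updated = {}
    for title in current_redlinks:
        updated[title] = first_seen.get(title, today)
    return updated

def oldest_first(first_seen):
    """Return (title, date) pairs sorted from oldest to newest sighting."""
    return sorted(first_seen.items(), key=lambda item: (item[1], item[0]))

# Two hypothetical weekly snapshots of the redlink list.
seen = update_first_seen({}, {"foo", "bar"}, date(2006, 9, 14))
seen = update_first_seen(seen, {"bar", "baz"}, date(2006, 9, 21))
print(oldest_first(seen))
```

Run once a day or once a week, the mapping would need to be saved between runs (for instance to a file), which this sketch does not attempt.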
Well, just one enwiktionary-latest-pages-meta-history.xml.7z has all the needed history, but that is a lot of CPU crunching. One problem is redlinks that are no longer there (e.g. vandalism) have to be filtered out from such a list. That does become rather tricky. The other problem is that most of them will be translations for the oldest entries dictionary, free, dog etc. So, a third pass would be needed to give preference to English red links, somehow. (Note: it is pretty hard to guess what language a redlink is "for.") (Note: simply decompressing the 8.72GB file is a bit of a challenge itself.) --Connel MacKenzie 14:25, 18 September 2006 (UTC)Reply
Oh! Right. I forgot about pages-meta-history. That'd do it, wouldn't it?
And filtering out the ones that are "no longer there" isn't that tricky, is it? Just use the current redlink list as a basis. (Though there's a mildly interesting epistemological question here: I'm wondering what it means for a redlink to be "no longer there", since a redlink is something that wasn't there in the first place. So a missing redlink is something that's no longer no longer there, I guess.) —scs 01:43, 20 September 2006 (UTC)
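A rough Python sketch of the retroactive idea: scan revision history for the earliest revision linking to each title, then keep only titles on the current redlink list. Everything here is an assumption for illustration. The revisions are supplied as plain (timestamp, wikitext) pairs rather than parsed from the real pages-meta-history dump, the function names are made up, and "earliest link" is only an approximation of "earliest time the link was red", since the target may have existed and been deleted in between. It also does nothing about guessing which language a redlink is "for".

```python
import re

# Matches the target of [[target]] or [[target|label]]; stops at '|', '#' or ']'.
LINK_RE = re.compile(r"\[\[([^\]\|#]+)")

def earliest_link_dates(revisions):
    """Map each link target to the earliest timestamp of a revision linking to it.

    `revisions` is an iterable of (timestamp, wikitext) pairs. ISO-8601
    timestamps in the same format compare correctly as plain strings.
    """
    earliest = {}
    for timestamp, wikitext in revisions:
        for match in LINK_RE.finditer(wikitext):
            target = match.group(1).strip()
            if target not in earliest or timestamp < earliest[target]:
                earliest[target] = timestamp
    return earliest

def redlinks_by_age(revisions, current_redlinks):
    """Return (timestamp, title) pairs, oldest first, for current redlinks only."""
    earliest = earliest_link_dates(revisions)
    return sorted((earliest[t], t) for t in current_redlinks if t in earliest)

# Hypothetical revision history and current redlink list.
revisions = [
    ("2004-03-01T00:00:00Z", "A [[dog]] is a [[canine]]."),
    ("2003-01-15T00:00:00Z", "See [[dog]] and [[zzzvandal]]."),
    ("2005-06-01T00:00:00Z", "[[free]] means [[gratis|for nothing]]."),
]
current_redlinks = {"canine", "gratis"}
print(redlinks_by_age(revisions, current_redlinks))
```

Using the current redlink list as the filter means links that only ever appeared in reverted vandalism ("zzzvandal" above) fall out without special handling.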

appendix idea: signs and commands

I had an idea for another appendix: signs and other traditional commands, so that we can start collecting their canonical translations in all languages. Examples:

  • No Smoking
  • Keep Off The Grass
  • Curb Your Dog
  • No Entry
  • Authorized Personnel Only
  • etc., etc.

A reasonable idea, or too goofy for words? —scs 20:32, 14 September 2006 (UTC)Reply

If you want to start a list of directives, wouldn't a category be a better place to build it from the ground, up? --Connel MacKenzie 20:59, 14 September 2006 (UTC)Reply
You mean, have all the sign texts be idiomatic main namespace entries, i.e. No Smoking, Keep Off The Grass, etc.? Would those meet WT:CFI? Would people freak? I wasn't ready to take that plunge yet; that's why I was thinking appendixly. —scs 21:28, 14 September 2006 (UTC)Reply
No, I don't think they'd meet CFI in its present state. Yes, I think they have translation value. Witness, for instance, signs like some in the collection here that could clearly have benefited from such an index. Standard menu commands and dialog box contents to aid the localization of software would likewise be well worth translating, but not meet CFI.
At the risk of fanning some flames, I'd suggest that WiktionaryZ is (or will be) the better location for such matter, since entries should eventually be able to be identified as lexemes as such versus other phrases with translation value and sorted accordingly. Dvortygirl 00:49, 15 September 2006 (UTC)Reply

I like the idea of an appendix, although I agree with Dvortygirl that WiktionaryZ may ultimately be the better place for such a list. However, that said, I think such an appendix here would be a nice-to-have, particularly with translations. The argument against having it is that it is not part of the business of a dictionary, but rather has the elements of a phrase book. Nevertheless, my view is that, as we are not a paper dictionary and space is not at a premium, the argument is not valid. I think go for it and see if it works. If it doesn't, it can be abandoned without too much fuss. If it does work, then it will be a useful appendix (or is that an oxymoron?). Andrew massyn 18:38, 15 September 2006 (UTC)

If you mean is it a contradiction in terms (not the same as an oxymoron, although the term is very commonly misused in that way), then the answer is no. I wouldn't call it a tautology either, as appendices can just as easily be useless. — Paul G 16:18, 21 September 2006 (UTC)Reply

template capitalization

Does anyone have strong preferences as to whether template names should be initial-caps or not? I'm about to create a couple new ones (for help in maintaining the above-mentioned appendix for signs I'm about to create), and I can't decide whether to name them "Template:sign..." or "Template:Sign...". —scs 12:53, 15 September 2006 (UTC)Reply

I personally prefer upper cases. Since upper and lower cases are no longer interchangeable here, we may need redirects.--Jusjih 14:11, 15 September 2006 (UTC)Reply
Please do not use upper-case first character template names on en.wiktionary. --Connel MacKenzie 18:08, 15 September 2006 (UTC)Reply
Hmm. Looks like I get to flip a coin, or go with my own personal preference, since no one has come up with any hard reasons or actual arguments one way or the other. Ah, well. (No biggie.) —scs 02:01, 20 September 2006 (UTC)Reply
If the purpose of the template is merely to produce the text "signs" or "Signs" within the page, as a label template for instance, then upper-case is incorrect. Otherwise I would say that the lower-case is only/at least/probably/not unlikely preferred. DAVilla 14:58, 21 September 2006 (UTC)Reply

Phrase List

Dear Wiktionary Beer Parlour,

I have developed a free, no-advertisement, noncommercial web site, www.fraze.info, which is intended to be a major language resource with many uses, from academic work to personal enjoyment and the workplace. To do this requires participation by a wide variety of those who speak American English. At the moment, I have provided almost all the input of 109,000+ phrases, persons, places and things. So far, the site is not easily accessed via search engines (perhaps I have learned how to avoid their spiders?). My objectives have many similarities with Wikipedia's (interactive and open to everyone without personal identifying information, no fee) and some differences (quarantine for suitability of entries and the ability to classify the registered user without their having to give personal identification: no addresses, phone numbers, ID numbers, birth dates). The value of the site is dependent on having a wide variety of active users. I am interested in knowing if and how our sites can be linked to meet our goals. Check out www.fraze.info and comment. The site has a detailed Q & A section that should answer most of your questions.

Sincerely,

Martin MacIntyre

Perhaps your question would be better directed to the Wikimedia Foundation, somewhere on http://meta.wikimedia.org/. Since your project is copyright oriented (not copyleft) I don't think it will be a very good match. The GFDL is the basis for this site existing. --Connel MacKenzie 18:05, 15 September 2006 (UTC)


Part of speech headings

Discussion copied to Wiktionary_talk:Entry layout explained/POS Headers. Continue discussion there.

In the entry layout explained (from Community portal > Entry layout) you can read:

====The part of speech or other descriptor====
This is basically a level 3 header but may be a level 4 or higher when multiple etymologies or pronunciations are a factor. This header most often shows the part of speech, but is not restricted to "parts of speech" in the traditional sense. Many other descriptors like "Proper noun", "Idiom", "Abbreviation", "Phrasal noun", "Prefix", etc.

I couldn't find a link to further details on POS headings, and more specifically to questions such as: which POS headings are accepted, and what do they mean? Connel MacKenzie told me that POS headings were discussed last year and again this year, and that an agreement was reached over these headings. Unfortunately I couldn't find the outcome of these discussions, though I looked for them in the Grease Pit, as Connel suggested.

A few examples:

  1. A traffic light was a Noun, till someone called it a Noun phrase. Then Rodasmith changed it back to Noun, but without explaining why or without referring to any guideline.
  2. Many verb or noun forms seem to have a Verb form or Noun form heading, but apparently these headings are deprecated. Probably only Verb or Noun can be used in these cases, though - as several people have written - the inflection templates are inappropriate for non-lemmata.
  3. What about the heading Plural noun? See http://en.wiktionary.org/wiki/Talk:stadia.
  4. Why is Romanian a Proper noun and Russian a Noun?

There seems to be a need for an accepted and easy-to-find guideline on these POS headings. That could surely avoid discussions like the one on http://en.wiktionary.org/wiki/Talk:traffic_light.

—Jan, 16 September 2006

Yes, you are quite right - this is something we do not yet have written policy on. "Russian" should be listed as a proper noun, but presumably whoever created the entry just wrote "Noun" and that has remained.
It can also be argued that the header "Adjective" for "Russian" should be "Proper adjective". We had a discussion some time back about restricting POS headers as they seemed to be proliferating unnecessarily.
Perhaps we should discuss and agree on a fixed set of POS headers. Some points to consider:
  • Ancient or modern? Traditional POS's are noun, adjective, verb, adverb, preposition, pronoun, interjection and article. Some modern dictionaries use terms such as "determiner", and "modifier". For example, "my" is traditionally a pronoun, but some dictionaries describe it as a possessive adjective. Some words do not fit conveniently into the boxes used by traditional grammarians: numerals are an example. "Two" (as in "two people") can be variously described as a numeral, a number, a cardinal number or an adjective, the last of these being the traditional POS. We generally use one of the other terms here, but it is debatable whether these headings are actually parts of speech.
  • Simple or precise? "Running" is an adjective (as in "running water" and "a running sore") but a more descriptive POS is "participial adjective", as "running" is also the present participle of "to run". In the verbal sense, "running" can be described as a "verb", "verb form", "verbal noun", "gerund" or "present participle". Similarly, nouns can be proper, common, abstract or collective, although dictionaries that make any distinction at all do so for the first of these only and label the other kinds as just "noun". "The" is an article, but is also the definite article.
Paul G 15:46, 16 September 2006 (UTC)Reply
Perhaps not a huge issue, but I cannot find any justification for declaring demonyms (e.g. "Russian") to be proper nouns. They do not identify any specific individual, but rather one of a class of individuals, i.e. they are common nouns. English just seems to have been courteous enough to extend them capitalization from their proper noun roots. Rod (A. Smith) 23:55, 16 September 2006 (UTC)Reply

Yes, we need a policy. This should probably go in the policy discussion itself, but after calling things "noun phrase" and so on for a while, I stopped including the word "phrase". I think most people can easily see that it's a phrase, and it just clutters up the list. In general, I think transitive/intransitive should use the templates {{transitive}}, {{intransitive}} in the definition line, so there should be no need for a "Transitive verb" heading. Countable and uncountable fits neatly these days into the inflection templates or the definition line and likewise shouldn't take up heading space. Likewise, I think there should not be a heading "verb form", just "verb", for simplicity and consistency.

I can offer a couple of arguments for having such a standardized list of headers. First, if you hover the mouse cursor over a header like "Noun" you'll get a tooltip that says "part of speech" or some such. If you hover it over a heading that's not on the list, you'll get some notice about "not a standard header". The list of headers for which tooltips exist might provide an excellent starting point for this discussion, and the list should be updated when our policy is agreed upon. Secondly, a standard list of headers will allow better automated access to the data, both for bot cleanup efforts and for exporting. It'll make things look cleaner and more consistent, besides. Also, consistent formatting should help propagate consistent formatting, since anybody copying from another article will be copying the correct thing.

Incidentally, when we do standardize on preferred headings, it will be the perfect task for our team of bot-runners to go help tidy up the inconsistencies. If we settle on "Alternative forms", say, rather than "Alternative form" or "Alternative spellings", the extra variations will be quick and easy to consolidate using bots. Perhaps we could try before then to establish the bot guidelines we want. —Dvortygirl 16:10, 16 September 2006 (UTC)Reply
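As a rough illustration of how mechanical such a consolidation would be, here is a minimal Python sketch. It is not the code of any existing bot. The variant-to-preferred mapping is hypothetical, taken only from the "say" example above, and the sketch works on a plain wikitext string without touching the site.

```python
import re

# Hypothetical mapping of header variants (lower-cased) to the preferred form.
PREFERRED = {
    "alternative form": "Alternative forms",
    "alternative spellings": "Alternative forms",
}

# A header line: the same run of 2-6 '=' signs on both sides of the title.
HEADER_RE = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)

def normalize_headers(wikitext):
    """Rewrite known header variants; leave every other line untouched."""
    def fix(match):
        marks, title = match.group(1), match.group(2)
        preferred = PREFERRED.get(title.lower())
        if preferred is None:
            return match.group(0)  # not a known variant: keep as written
        return f"{marks}{preferred}{marks}"
    return HEADER_RE.sub(fix, wikitext)

entry = "==English==\n===Alternative spellings===\n* foo\n\n===Noun===\n# A thing.\n"
print(normalize_headers(entry))
```

Only headers found in the mapping are changed, so anything the community has not agreed on is left exactly as written.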

And in fact this task is precisely what User:ScsHdrRewrBot is intended to do, and -- lookie that! -- it's already been approved. I haven't run it much yet, but you can look at its contributions to see examples of the relatively few header cleanups it's done so far. —scs 12:52, 17 September 2006 (UTC)Reply
Here are the parts of speech in the tooltips list that Dvortygirl is referring to:
Adjective, Adverb, Conjunction, Interjection, Noun, Prefix, Preposition, Pronoun, Proper noun, Suffix, Verb, Verb form.
—Jan, 18 September 2006
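For what it's worth, a list like this lends itself to a simple automated check. The Python sketch below flags level-3-and-deeper headers that are neither on the tooltip list quoted above nor on a small set of other section names. The second set (OTHER_KNOWN) is purely an assumption made for the example, as is the function name; nothing here reflects how the tooltips themselves are implemented.

```python
import re

# The part-of-speech headers from the tooltip list quoted above.
STANDARD_POS = {
    "Adjective", "Adverb", "Conjunction", "Interjection", "Noun", "Prefix",
    "Preposition", "Pronoun", "Proper noun", "Suffix", "Verb", "Verb form",
}

# Hypothetical: non-POS section headers that should not be flagged.
OTHER_KNOWN = {"Etymology", "Pronunciation", "Translations", "Synonyms"}

# Level 3 and deeper only; level-2 headers are language names.
HEADER_RE = re.compile(r"^(={3,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)

def nonstandard_headers(wikitext):
    """Return header titles that are on neither list, in order of appearance."""
    flagged = []
    for match in HEADER_RE.finditer(wikitext):
        title = match.group(2)
        if title not in STANDARD_POS and title not in OTHER_KNOWN:
            flagged.append(title)
    return flagged

entry = (
    "==English==\n"
    "===Noun phrase===\n# A signal.\n\n"
    "===Verb===\n# To signal.\n\n"
    "====Translations====\n"
)
print(nonstandard_headers(entry))
```

A report like this only lists candidates; deciding what each flagged header should become remains a policy question.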

The renewed conversations from this year can be found at WT:GP#Normalization of articles / User talk:Connel MacKenzie/Normalization of articles. Comments are still welcome. --Connel MacKenzie 22:09, 16 September 2006 (UTC)Reply

Other classics, for those who like to read a lot: (Yes, I had trouble finding it because it was moved five times): Wiktionary talk:Entry layout explained/archive 2005BP#Uniform headings, and about eight other (more?) relevant sections of that same page. --Connel MacKenzie 22:21, 16 September 2006 (UTC)Reply
On a side note, I think we need a policy regarding archiving. Most of the 2005 archive of talk:ELE is relevant, as is the 2004 archive. Ironically, the majority of those conversations were from this Beer Parlour, but were vandalously moved without leaving links behind. I don't think any of the conversations that were removed from WT:BP were completely finished with (as is evident by the same questions resurfacing one or two years later.) --Connel MacKenzie 22:25, 16 September 2006 (UTC)Reply
On a side note, one thing that would help to (a) make these discussions easier to find and (b) not keep having them over and over again would be if we could all try to (c) centralize them on the talk pages for the relevant policies in the first place and (d) actually update the policy pages once we reach an actual consensus! —scs 13:48, 17 September 2006 (UTC) [Memo to self: wander on by to WT:ELE sometime soon and be bold in altering it to fit reality.]

Miscellaneous notes and opinions:

  • It would be useful to keep in mind why we're tagging words with their part of speech at all. Is it
    1. For the benefit of readers who are learning English or grammar
    2. To separate the definitions for entries that have senses in multiple parts of speech, and/or
    3. To satisfy our deep inner craving to rigorously categorize things?
For my own part, I'd like to focus on 1 and 2 (although I'm the first to admit that I've got the categorization bug, too; it's just one I try to keep in remission). There shouldn't be any shame in saying, for the really weird and hard-to-categorize words, that their part of speech is "other". (Of course, there's a significant logistical difficulty here in that our entries don't say "Part of speech: _____". A hypothetical "Other" as a part-of-speech heading under our current scheme would be confusing and wouldn't really work at all.)
  • I'm probably starting to sound like a broken record on orthogonality, but part of speech is really orthogonal to qualifications like "phrase" and "abbreviation". That is, many phrases, abbreviations, and contractions have meaningful parts of speech (though of course many do not). An interesting example I came across recently is HEPA, which you see on more and more vacuum cleaners and air filters, which stands for High Efficiency Particulate Air, and which is therefore pretty much an adjective. My point here is that, strictly speaking, things like "Phrase", "Initialism", "Abbreviation", "Contraction", and "Idiom" are not parts of speech at all, and a mechanism which specifies or categorizes parts of speech should arguably not be overloaded with trying to capture these distinctions, too.
  • If the distinguishing quality of "Noun phrase" versus "Noun" is "has a space in it", that's a pretty useless distinction, because any reader can see this for themselves. If we maintain a distinction for "noun phrase", it should be for longer, true phrases, like "the weather in London". Things like "lawn mower" are, I believe, pure and simply nouns. (In this case it's easy to prove, given that the spelling "lawnmower" also exists.)
  • Personally, I agree with Dvorty and others that the transitive/intransitive distinction is of secondary interest and should appear (if it appears at all) in tags on the definition lines for individual senses, not prominently in the Verb header. Similarly for countable/uncountable (which we do tend to do that way), and for concrete/abstract nouns (which we don't tend to try to capture, which is probably a good thing, 'cos it ends up being not such a clear-cut distinction after all).
  • Yet another distinction is for proper nouns. Those I don't mind being called out in the p-o-s heading, though I could go either way.
  • A somewhat trickier case is for the several words we've currently got listed using variations on "Adjective and adverb", such as quite. I'm not sure what the best way to handle those is.
  • As came up in the "nouns used as adjectives" thread, it can be argued that parts of speech in English are not nearly as rigid as we think they are, such that their use in a dictionary like ours could profitably be abandoned or drastically reworked, although that's probably too radical a proposal for today. (But the idea, I think, would be that instead of saying "moo: noun: 1. the sound made by a cow. verb: 1. to make a mooing sound", we could instead say "moo: 1. The sound made by a cow. 1a. (noun) an instance of this sound. 1b. (verb) to make this sound.")
    Nice attempt, and could work for the nouns that are also verbs, but makes other types of semantic relations more complicated to state. Whatever we change (if anything at all), it should remain both workable for all cases (problematic), and simple as feck (w:KISS principle). — Vildricianus 08:02, 18 September 2006 (UTC)
  • Yes, I have just completely ignored the suggestion I myself just made to keep long screeds like this one centralized on the relevant policy page's talk page...

scs 14:48, 17 September 2006 (UTC)

Discussion copied to Wiktionary_talk:Entry layout explained/POS Headers. Continue discussion there.

— This comment was unsigned.

The main problem with moving conversations around is that the instant they are no longer available on WT:BP, no one notices that they exist anymore. This has now become a rather critical policy-ish issue. Snippets from the previous conversations should probably be sprinkled in here. Annihilating this page is not the correct answer. --Connel MacKenzie 23:43, 17 September 2006 (UTC)
No one is proposing that we annihilate this page. The proposal is to organize this now-critical issue in order to make progress on it. Note that the discussion here is drifting to points of order and discussions over where the discussion should happen. All that is just bureaucratic quagmire. Let's just have the discussion so we can get a working decision to begin from. --EncycloPetey 22:39, 18 September 2006 (UTC)

Like Connel once suggested somewhere (or was it someone else?), the logical thing to do is abolish the POS headers and make a ===Definitions=== heading instead, moving POS mention elsewhere (God knows where). That's from a structural viewpoint; we have standardized headings that mention which information follows in that section (Etymology, Pronunciation, Synonyms, etc.). The broken chain in there is that in each entry, one or more headings bring the information themselves and don't mention what follows (readers are supposed to know that definitions follow in the section marked by the POS header). That's illogical and a basic structural flaw in our entry layout.

Interestingly, the reason why we would mention POS at all is different for everyone. As I see it, to fully comprehend the meaning of an English word, one first needs to know which POS it has. That's because English words don't morphologically distinguish per POS (languages like Latin or Russian are the opposite), which means that speakers or learners (both native and non-native) need to 'feel' which POS a word has in order to know its meaning. I myself had trouble long ago with "cunning" (why is it a noun? and then, why is it also an adjective??), but natives, too, have this kind of problem. — Vildricianus 07:58, 18 September 2006 (UTC)

By this logic, we should abolish the language headers as well, since they communicate the information rather than telling what information follows. But I don't see this as a viable alternative. There are two very good reasons to have information contained in these particular headers themselves. First, it allows for ease of cross-linking. If I want to link clam as a Latin adverb, then I can use clam#Latin|clam to do that, because Latin is a header. Likewise, if we want to link to a particular part of speech sense between pages, we can do so because it's built in as a header. Second, the Language and POS headers allow long pages to be scanned in the Contents for a desired use. If the headers only said "Language" and "Definitions", I wouldn't find it nearly as easy to navigate some of the longer pages. I don't just need to know that definitions exist on a page, because I take it for granted that a dictionary will have those. What I need is to know where the particular definitions I'm looking for are placed. This also has a secondary benefit of allowing a quick scan of the contents to see a list of all the parts of speech that a word is used for. This information is terribly useful when learning a language. And for languages other than English, the POS is just as needed, since anyone not intimately familiar with Latin may not recognize an inflected form, and all the information relevant to the use is tied up in what POS the word is. The inflection, definitions, translations, and so forth all hinge on which part of speech is intended. --EncycloPetey 22:39, 18 September 2006 (UTC)
  • EncycloPetey, I don't understand where you're going with this. First, you attack an active conversation (long overdue, at that) by moving parts of it out while it is finally being discussed. (I'd liken that to an act of war.) Then you make convoluted arguments about abolishing language headings, which, I'm pretty certain, no one has suggested. Then you attack the idea of abolishing language headings by presenting examples of why it is a bad idea? (Since you bring it up though, I think Wikipedia-style disambiguation would be more efficient for spellings shared between languages. e.g. clam (en), clam (la) or even clam (English), clam (Latin). But, even I can appreciate that a task that huge would probably not be as beneficial as it might seem at first. And no, I do not suggest we try this.)
  • First, I did not move any part of anything out of anywhere. If you think otherwise, you can check the edit history and see that I deleted nothing at all. I copied out all the relevant comments I could find to begin a discussion on an issue that needs focus and cohesion. I then inserted pointers to the new location. The result is that we now have a draft describing known existing practices, which we may now use as a point of reference for discussion. Frankly, many of the headers people have identified I didn't know existed because they're being used in languages whose pages I never investigate. Such progress has not happened in any discussion previously occurring on this topic in the BP that I have seen.
  • The language heading was an analogy. Vildricianus noted that the logical option is to replace POS headers with "Definitions" for structural reasons (whether or not he supports that view, which isn't altogether clear). I pointed out that the natural extension of that reasoning leads to infeasible and undesirable ends. The point being that logical consistency in heading structure is not the only consideration, because any considered change has to be weighed against loss of utility. Does that help to clarify?
  • The thing that Vildricianus paraphrased above was something I saw an early contributor here do. They had experimented using a ===Definitions=== header to consolidate the different parts of speech so that they would appear grouped together (like you'd see in a normal dictionary) with things like {{pos_n}} or {{pos_vti}} at the start of each line. As I recall, Eclecticology deleted the entry on sight, because of the experimental formatting. I do not know how feasible a wholesale change like that is, at this point. Certainly, the different automation technologies available have been getting more attention (from people other than me!) lately. Obviously, such a dramatic change is possible. But not without a tremendous amount of discussion, and a very clear majority of contributors understanding it, and desiring it, first.
  • I have toyed with ideas pertaining to the introduction of "Definitions" as a header, but haven't found any options that work beyond lemma forms of English entries. After all, we don't put definitions on non-English pages in most cases, and we don't put definitions on non-lemma forms of English words either, but point to the main entry. That said, I could almost see introducing the "Definitions" as a level-4 subheader between the inflection line and the start of the definitions, but only for lemma forms of English words as I said. --EncycloPetey 21:52, 19 September 2006 (UTC)
    • The secondary effect of you removing the relevant conversation here, is that it encourages people to become vigilantes, who are now making changes to WT:ELE without first gaining consensus here, to reflect their own POV. --Connel MacKenzie 14:06, 19 September 2006 (UTC)
    • Examples?? Who exactly are these vigilantes who have altered the ELE in the last 72 hours as a direct result of this discussion occurring on a draft POS header page? The whole point of having the separate POS header page and associated talk page was to discourage alterations to the ELE until the issue was hammered out in discussion. The separate page keeps any discussion and changes segregated to a new page that is not yet linked from the ELE, and so will not be interpreted as part of it at this time. --EncycloPetey 21:52, 19 September 2006 (UTC)

User:TheDaveBot - Spanish verb Conjugations

Archived at User_talk:TheDaveBot.

Scots Verb present participles and AWB

Conversation of general interest moved from User talk:BD2412. bd2412 T 18:46, 17 September 2006 (UTC)

Hello there, I can see now from the conversation above this one that you've had some problems with the Verb form heading stuff. The articles below are Scots words but I think you inadvertently fixed them with AWB to use the {{present participle of|}} template which would automatically categorise to category:English verb present participles and of course this isn't correct for a Scots verb form. I've fixed the articles in question and I'll check my other Verb form articles and move them to the Verb heading. Just a note to let you know. Regards --Williamsayers79 10:49, 16 September 2006 (UTC)

airtin
gaun
greetin
walin
sleekin
tynin
mindin
Perhaps we should have a "Template:sc.present participle of"? bd2412 T 16:32, 16 September 2006 (UTC)
Why oh why is the template assuming that the language is English? (I think I understand one of Connel's comments now.) If it is really going to do this it needs a lot more magic. (lang= and several conditionals) This is a big issue; we've never sorted out what the heading, inflection line, and definition lines should be for verb forms. (also noun forms, adjective forms ...) Robert Ullmann 19:31, 16 September 2006 (UTC)
Rather than the template assuming the language is English, I'd like to take the English category out of the template and have it separately occur in the articles. bd2412 T 19:42, 16 September 2006 (UTC)
If you look at Template talk:infl I think that Connel was suggesting that the form-of templates take a language parameter. It could then (if the parameter was present) categorise in (lang) (specific form), or in (lang) (POS) form (e.g. English verb forms) or not at all, depending on the existence of the categories for the language. This would make the form-of templates language independent. Robert Ullmann 20:07, 16 September 2006 (UTC)
To be clear, the template would default to English, but inserting |sc| would, for example, make the word categorize as Scottish? That would be quite brilliant! bd2412 T 20:10, 16 September 2006 (UTC)
Done. Robert Ullmann 20:54, 16 September 2006 (UTC)
Exactly. With the lang= parameter in those templates, the categorization becomes cleaner also, as the correct Category:fr:present participle of would be in the template (and would only need to be corrected there if the category layout changes.) --Connel MacKenzie 18:29, 17 September 2006 (UTC)
Please, can this be moved to somewhere more relevant, like the beer parlour? --Connel MacKenzie 18:29, 17 September 2006 (UTC)
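
The language-parameter scheme Robert Ullmann describes above (categorise in the specific-form category, fall back to the broader forms category, or not categorise at all, depending on which categories exist) can be sketched roughly as follows. This is only an illustration in Python, not the actual template code: the function name `pick_category`, the `existing` set, and the exact category-name patterns are assumptions made for the example.

```python
def pick_category(lang, form, pos, existing):
    """Return the category a form-of template would add, or None.

    All names here are hypothetical:
    lang     -- full language name, e.g. "Scots"
    form     -- specific form, e.g. "present participles"
    pos      -- part of speech, e.g. "verb"
    existing -- set of category names assumed to exist already
    """
    # First choice: the most specific category, "(lang) (specific form)".
    specific = f"{lang} {form}"
    if specific in existing:
        return specific
    # Fallback: the broader "(lang) (POS) forms" category.
    general = f"{lang} {pos} forms"
    if general in existing:
        return general
    # Neither exists: do not categorize, so no redlinked category appears.
    return None


# Hypothetical set of categories that already exist.
existing = {"English present participles", "Scots verb forms"}

for lang in ("English", "Scots", "Sicilian"):
    print(lang, "->", pick_category(lang, "present participles", "verb", existing))
```

With this ordering, a language that has no suitable category is simply left uncategorized, which matches the concern raised later in the thread about redlinked categories for minor languages.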

Okay: now look at airtin. Both the inflection line and the definition line categorize the entry the same way. Which should we prefer? I'm inclined to think that the form-of templates shouldn't be categorizing at all. Or should they?

  • where the [expletive deleted] did "English verb present participles" come from? Not "English present participles"?
  • I like the "Verb form" POS heading, and in (e.g.) French it is used a lot! Do we really want to do away with it?
  • I just wasted 20 minutes because some ninny wikilinked "Scots" in the sco template ... (sc is Sicilian)

What do you think? Robert Ullmann 20:54, 16 September 2006 (UTC)

    • Even though I just knocked out several hundred "Verb form" headings, yes I think that should be the standard for verb forms. I can easily change the "English verb present participles" to "English present participles" as well. sco is Scottish? Will keep that in mind! bd2412 T 21:07, 16 September 2006 (UTC)

Hi, I don't know if you noticed, but I left the template and doc and the airtin example out of sync last night. I had edited one, and the network connection went away. And at 1 AM Sunday morning in Nairobi, there isn't much you can do about it ... ;-) I put the language parameter in more like you suggested. See airtin as I mentioned. I think having the cat in the template is good, as long as it has the conditionals. (lots of redlinked cats for minor languages would be no good) Robert Ullmann 11:53, 17 September 2006 (UTC)

I took the language parameter out for now, and manually inserted the category into all the actual English articles. The template is in Template:new en verb pres part, which is probably the best solution for the moment. bd2412 T 17:42, 17 September 2006 (UTC)

If I understand this correctly I should be using the heading ===Verb form=== and we have to add the category into the verb form article separately or use the infl template. --Williamsayers79 17:01, 17 September 2006 (UTC)

I don't think so - I believe that ===Verb=== is the proper header, and there is no consensus (as yet) to make ===Verb form=== a header... I've undone most of the ones I added (still hunting a few here and there). bd2412 T 17:42, 17 September 2006 (UTC)

My commentary to all of the above: as noted I have removed the category from Template:present participle of and put it in the cheat template to create new English present participle entries. I've also created Category:Scots present participles and Category:French present participles (it was my understanding from an earlier discussion that categories for parts of speech would use the full name of the language instead of the abbreviated form). Each category is in Category:Present participles and in the Category:Foo verb forms for the appropriate language. I think this is the most logical organization, but am open to any ideas. I have a question also about the entries themselves - should an entry for, e.g., lâchant say "present participle of lâcher" (which it is), or simply indicate the English equivalent, which is releasing - or should it have both? bd2412 T 18:46, 17 September 2006 (UTC)
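
For readers who find the category arrangement above easier to follow as code, here is a minimal Python sketch of it. It only restates the naming pattern given in the comment; the helper name `parent_categories` is made up for the illustration, and nothing here reflects how the wiki software stores categories.

```python
def parent_categories(lang):
    """Parents of 'Category:<lang> present participles' under the scheme above.

    `lang` is the full language name (e.g. "Scots"), not an abbreviation.
    """
    return [
        "Category:Present participles",   # cross-language parent
        f"Category:{lang} verb forms",    # per-language parent ("Foo verb forms")
    ]


for lang in ("Scots", "French"):
    child = f"Category:{lang} present participles"
    print(child, "is in", parent_categories(lang))
```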

  • I am of the opinion that the following are the only things that should be used as part-of-speech headings for English entries: Symbol, Noun, Verb, Adverb, Adjective, Pronoun, Interjection, Article, Conjunction, Abbreviation, Initialism, Acronym and the x phrase derivations (noun phrase, verb phrase etc). This excludes "Verb form" because I think that everything which is labelled as "Verb" is a "verb form" whether it is the infinitive or the second-person plural of the past participle. - TheDaveRoss 18:59, 17 September 2006 (UTC)

English POS

Continued from comment above by TheDaveRoss, repeated below

  • I am of the opinion that the following are the only things that should be used as part-of-speech headings for English entries: Symbol, Noun, Verb, Adverb, Adjective, Pronoun, Interjection, Article, Conjunction, Abbreviation, Initialism, Acronym and the x phrase derivations (noun phrase, verb phrase etc). This excludes "Verb form" because I think that everything which is labelled as "Verb" is a "verb form" whether it is the infinitive or the second-person plural of the past participle. - TheDaveRoss 18:59, 17 September 2006 (UTC)


Um, For English I would add Preposition, as well as Cardinal number, Ordinal number, Idiom and possibly Phrase (though I haven't seen a clear example of the latter yet that couldn't be classified as something else). While it is true that headings like Noun form and Verb form have little utility in English, they have tremendous utility in highly inflected languages. I use them in Latin and Spanish when I am writing an entry for a non-lemma entry so that other editors will have a cue that the information about the word is not on that entry page and should not be added there.
Hello, don't forget Prefix and Suffix! bd2412 T 22:43, 20 September 2006 (UTC)
People keep saying that "there has been discussion" but all the links that I can find to such discussion seem not to have reached a conclusion with even a partial list of acceptable POS headers. Could we create an entry layout page (and corresponding talk page) where a list of accepted, debated, and rejected options could accrue? --EncycloPetey 22:06, 17 September 2006 (UTC)
Discussion copied to Wiktionary_talk:Entry layout explained/POS Headers. Continue discussion there.
I think removing the conversation from the beer parlour would be very detrimental. Moving conversations around is exactly the approach many of your predecessors have taken, which is exactly why you cannot find the previous discussions now. --Connel MacKenzie 15:09, 18 September 2006 (UTC)
As a general principle for discussion, I agree with you fully. But for this particular issue, I think we need a page and corresponding discussion. The topic comes up intermittently, with no apparent resolution each time it is discussed. I am trying to copy all the relevant discussion to a single location, which will archive it separately from all the other discussions that have happened here. It should be very easy to find in future, since it has a shortcut of WT:POS which is quite intuitive. Once some points have been fleshed out and agreed upon, the corresponding page would be summarized on the WT:ELE, with the full page linked from there directly. All this should make the past discussion much easier to find, rather than the reverse. --EncycloPetey 21:41, 18 September 2006 (UTC)
With the only exception being the archives, I'd say that every conversation that has ever been moved out of the beer parlour was moved "to make it easier to find." I maintain that none of those are "easier to find" as a result. Ironically, the BP archives have the only usable, searchable index of topics. (The irony is that the archives exist only in an effort to reduce the page size.) --Connel MacKenzie 02:27, 19 September 2006 (UTC)
This issue is distinct and important enough that I'm going to create a new section for it, below. —scs 00:40, 20 September 2006 (UTC)
We clearly need the page, and that has a talk page which should be used. Connel, I think the operative word here is copy. Discussion is in order in either place. EncycloPetey may have just stated that a bit imperatively ... Robert Ullmann 15:48, 19 September 2006 (UTC)
What is the benefit of fragmenting a conversation in progress? To illicitly sneak in changes elsewhere because they are out of the spotlight? So you can sneak in "form" because you don't wish to find the earlier conversations that compellingly argued against your POV? --Connel MacKenzie 06:41, 20 September 2006 (UTC)
If you can find these compelling arguments, please link them or copy them into the WT:POS discussion page/archives. I have never seen any such arguments, but would sincerely like to. --EncycloPetey 17:44, 21 September 2006 (UTC)
Excuse me? Please refer to the second paragraph of the Beer parlour. What the hell, I'll copy it here:
Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to the relevant policy page, or a brand new one may be created. See Category:Policies - Wiktionary Top Level for identified policy pages. Some of these may be inactive. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page.
That is, I believe, Beer Parlour policy? And the new page is properly identified as a policy think-tank draft; following Wiktionary:Policies and guidelines. There is nothing "sneaky" going on. And certainly no "sneak[ing] in changes", WT:POS is a very early draft, just as en.wikt policy requires. Connel, what is upsetting you so much that you are conducting a personal attack? We both know that isn't you!
I have read all the previous discussion I can find. I find very little of it "compelling". Whatever was "resolved" was apparently so un-important that no-one bothered to actually write a policy document (see scs comments below). I'm fairly agnostic on "form" in the header; although I do think it is good in the categories. The WT:POS draft is reflecting what is being used, what is not controversial, and some intermediate "take" on what is being discussed. Robert Ullmann 07:07, 20 September 2006 (UTC)
It seems I have completely misunderstood the purpose of that sub-page. I thought that it was part of WT:ELE already. As for the second paragraph of this page, I do not know when that was reworded that way, nor precisely why. It does seem reasonable enough. --Connel MacKenzie 19:13, 20 September 2006 (UTC)
I reworded that a while ago. — Vildricianus 08:22, 28 September 2006 (UTC)
I would like to clarify my initial statement (which I didn't realize was going to spark a new discussion): I think there should be a finite list of things which we all use as "part of speech" headers, and that that list should be clearly spelled out somewhere. That list I wrote is clearly lacking, but we should have one common list that we all rely on. - TheDaveRoss 17:25, 21 September 2006 (UTC)
A draft list is developing at WT:POS, inspired by your statement. There have already been found a number of headers in use that most of us probably didn't know existed. Once we have the list, we can use it as a reference point for discussions, proposals, and the setting of a "standard" which can then be incorporated into the WT:ELE. Thanks for sparking the discussion! --EncycloPetey 18:09, 21 September 2006 (UTC)

on consensus, and policy, and the beer parlour

There's been frustration evidenced in several of the threads above about the proper form and venue for these policy debates. They can be overwhelming if carried out in full here in the Beer Parlour, but they can get lost, or languish unresolved, if relegated anywhere else.

To some extent these observations are symptoms of what may be a larger problem. We don't currently seem to have a fully worked out way of developing and then promulgating new policy. We're good at debating issues and bringing up good points and counterarguments, and we're almost good at reaching consensus. (Sometimes we do actually reach consensus, but sometimes we can't decide and fall back on "this will be left to each individual editor's discretion"). But we don't seem to be very good at taking the final step, when we actually do reach consensus, of actually calling our new consensus a "policy" that we can write up and point newcomers at.

With respect to the specific metaissues that have been raised above: it's clear to me that, logically at least, the Beer Parlour is not the right place for these extended policy discussions. There are two reasons: one is that, once a debate gets sufficiently abstruse, it's no longer interesting to the majority of Beer Parlour readers, and it ought to instead take place in a more focused space where those who care can concentrate on it. (Two possibilities for such a focused space are the talk page for an existing policy page, or a "working group" or "think tank" page.)

The second reason is that, even though the Beer Parlour may be well-indexed and the regulars who remember past debates may be able to find them in the Beer Parlour's archives, no newcomer is ever going to be able to do that. A responsible newcomer who wants to review the policy debate behind, say, our entry layout is naturally going to start at Wiktionary talk:Entry layout explained, so that really ought to be where most of the interesting, meaty debate about entry layout issues ends up.

Finally, though, we need to ask ourselves why debates which aren't on the Beer Parlour languish and go nowhere, and why those debates that do go somewhere and which achieve (or come tantalizingly close to achieving) consensus don't manage to turn into citable policy. It may be that we don't have a critical mass of people who care about policy. (That's not necessarily a bad thing, of course, given that time spent worrying about policy is time not spent actually writing the dictionary.) It may be that the people who do care about policy are being too deferential to their opponents and not pushing policy forward as long as there's still any opposing viewpoint or dissent. It may be that we're simply too lazy (or too otherwise occupied) to do the boring work of writing a policy page once consensus has been reached, so we go off and do more interesting things (like actually writing the dictionary) until a newcomer comes along and starts asking questions about all the things the existing policy documents don't say, and then we have to take a step back and scratch our heads and try to remember what that consensus we thought we had was, and where the debate about it might be.

I hope I'm not sounding judgemental or accusatory here. (Really, that's not my intent.) And I don't actually have any grand prescription for improvement here, either -- this is all just food (er, beer :-) ) for thought. —scs 01:04, 20 September 2006 (UTC)

I believe you are greatly overstating the problem.
Well, I didn't say it was a huge problem. But when a well-meaning newcomer tries to understand something as basic as our scheme for classifying parts of speech, and even after reading WT:ELE can't find the information and has to ask here instead, and when after considerable roundabout discussion we discover that our best approximation of our preferred header list is buried in the tooltip code in Monobook.js, then yes, I'd say we have at least a little bit of a problem. :-) —scs 17:37, 20 September 2006 (UTC)
People can ask before doing something that seems like it doesn't follow convention. To blithely ignore existing conventions is another thing entirely. The general purpose for not having policies is not, as you assert, laziness, but instead is an effort to maintain some flexibility. That doesn't mean that some things haven't been agreed on, one way or the other. Room exists for lots of experimentation. That doesn't mean we should throw away years of "heading folding" efforts, on a whim.
If you wish to start building up the Wiktionary policies, that is a Good Thing. But it is very adversarial in nature. It is a thankless task which fosters arguments over the most minute details.
I'd appreciate you joining some of the conversations in irc://irc.freenode.net/wiktionary regarding this topic. There are lots of ideas being tossed about, to help the situation.
--Connel MacKenzie 06:56, 20 September 2006 (UTC)

em dash and em-dash

We have both em dash and em-dash, with basically identical entries. How do we handle it when we have words like this? Just put a note on each page that the other is an alternative spelling? RJFJR 15:54, 20 September 2006 (UTC)

If there are two pages which have, and always will have, identical content, there is the possibility of transcluding one page into the other. This is rarely used because it is rarely applicable. More often alternate spellings simply have a simpler entry pointing at a fuller one, so an indicator in each of them pointing out the other would be appropriate. - TheDaveRoss 16:09, 20 September 2006 (UTC)

WT:CU - Checkuser policy

I have drafted an initial set of policy and procedure guidelines for Wiktionary's version of CheckUser; they can be found at WT:CU or Wiktionary:Requests for checkuser. Discussion of policy and procedure should take place on the talk page there, which will make it easier in the long run to find it again I think. Have at it! - TheDaveRoss 07:53, 21 September 2006 (UTC)

rfap template in article or talk space?

Does an rfap template (request for audio pronunciation) belong in main space or talk space? (I just put one at talk:hegemony because I didn't want to clutter up hegemony.) RJFJR 16:10, 21 September 2006 (UTC)

The pronunciation section of an article is where I see it most, so I would put it there. The talk page will also get the job done, so that is fine too. - TheDaveRoss 16:13, 21 September 2006 (UTC)

Anti-consensus changes - can recent Policy movement help?

One of the longest problems that en.wiktionary has had, at an organizational level, is the discussion-less changes to fundamental pages such as WT:ELE or WT:CFI. Lately, there have been more people willing to take a "policy" approach to problems.

Several things come to mind immediately. One is that changes to such pages that don't point to a relevant conversation somewhere should be reverted on sight, right? Another is that we probably need a Wiktionary:Votes/WT:VOTE page, where items can be brought to the community's attention for something more solid than "perceived consensus."

Not necessarily. Some changes are points of clarification, such as the minor changes I just added noting that "Related terms" and "Derived terms" link words in the same language, to clarify against "Descendants" which clearly notes that it's for terms in other languages. This is a case where an addition did not lead to the necessary clarification of similar text elsewhere, but should have. --EncycloPetey 17:49, 21 September 2006 (UTC)

Are we ready to take some of these steps now? --Connel MacKenzie 17:40, 21 September 2006 (UTC)

Perhaps. We could certainly try having a VOTE page. If it doesn't work, we could always vote it away later ;) --EncycloPetey 17:49, 21 September 2006 (UTC)Reply
Support the creation of a voting venue, support the solidification of policy, support of most everything. - TheDaveRoss 18:39, 21 September 2006 (UTC)Reply
Support the voting page; support the clarification of decision policies, support the revisiting / revision / update of the ELE as a community. --EncycloPetey 18:53, 21 September 2006 (UTC)Reply
Support the voting page; support the clarification of decision policies, support the revisiting / revision / update of the ELE as a community. --Enginear 21:08, 21 September 2006 (UTC)Reply
Support the voting page; support the clarification of decision policies, support the revisiting / revision / update of the ELE as a community. (what they said :-) BTW, consider that some kind of gatekeeping mechanism for vote proposals might be required, such as a vote proposal page, requiring a specified number of seconds either by other users or admins before listing on the vote page proper. But I guess that's getting ahead of things... --Jeffqyzt 21:58, 21 September 2006 (UTC)
Support the clarification of decision policies, support the revisiting / revision / update of the ELE and other policy pages as a community, oppose the institution of a formal voting mechanism at this time. —scs 17:23, 22 September 2006 (UTC)Reply
Seeing as I am in the "starting things" mood, WT:VOTE and Wiktionary:Votes have begun. The header might need some revision before it is...complete. - TheDaveRoss 03:47, 22 September 2006 (UTC)Reply
I've made some changes to what you had there. (Rebuilt it entirely, but left the purple.) I added a test vote thing; two other people have already voted on it...so I guess this was perhaps overdue. I stopped short of re-arranging the ongoing WT:C page to have a sub-page arrangement (so that the votes could appear in two places) for several reasons. Better to let that run its course, before getting too fancy. --Connel MacKenzie 09:19, 22 September 2006 (UTC)Reply

This vote is mainly a test, to let people experiment with how voting here will work. TheDaveRoss put the "50 minimum" thing in the original WT:VOTE page, but I figured I'd start things off with an opposition vote, as I think the waivers for sister-language (and sister projects) are critical. --Connel MacKenzie 07:38, 22 September 2006 (UTC)

I noticed that V-ball had reordered the sections on the current WOTD thaumaturgy, so that the Related terms were listed as a level-4 header under the POS (and before the translations), instead of as level-3 at the end. The cited reason in the edit history was the ELE. Looking at the ELE, I found no statement requiring this structure, and in fact the Related terms and Derived terms sections are discussed in sequence after the Translations. However, I did notice that the (complex) example included on the page has this format.

Personally, I think we should change this example for three reasons. (1) Lists of related words and derived words are not directly relevant to the word itself, because they often belong to other parts of speech. This makes the content of these sections very different from what is listed under synonyms, quotations, or translations. They do not pertain to the POS, and should not be listed in such a manner as to imply that they are. (2) Placing them before the Translations section creates a physical and thematic separation between the definitions/quotations and the translations. We want users adding translations to be able to look back and forth between definitions, quotations, and translations without having intrusive material in between. (3) The current example implies a particular format that is not actually described or advocated anywhere in the ELE.

My own practice has been to place the Related/Derived terms as a level-3 header following the translations. This makes more sense to me. Now, I'm not saying that this is always the preferred location, since there are cases in which we want to tie these entries to a particular part of speech or particular sense, but I think for the more general case, this is the most logical sequence. Thoughts? --EncycloPetey 18:03, 21 September 2006 (UTC)Reply

I guess I don't have strong feelings about the level of the headings, although I think having them 4th level makes more sense as they are related to the entry, although, like you say, not always directly (I would think, though, that derived words are directly related).  However, I'm all for having these entries after the translation section as you suggest.  Most importantly, I'm for a standard, a standard described in the ELE for all to see, and a standard that is then used.  —  V-ball 21:05, 21 September 2006 (UTC)Reply
The translations should come after all other English language relationships (synonyms, derived term, etc.) to the term. If there are multiple etymologies for a word, but the synonyms apply to all etymological definitions (unlikely,) then the synonyms should be at level three, with the etymology, which the part of speech heading would be at four, translations at five. --Connel MacKenzie 21:11, 21 September 2006 (UTC)Reply
The relevant sections in ELE that say to me that, e.g. Derived terms, should be one level below the P.O.S. that spawns them (if known) are from WT:ELE#Additional headings, "A key principle in ordering the headings and indentation levels is nesting. The order shown above accomplishes this most of the time. A heading placed at one level includes everything that follows until an equivalent level is encountered. If a word can be a noun and a verb, everything that derives from its being the first chosen part of speech should be put before the second one is started. Nesting is a key principle to the organization of Wiktionary, but the concept suffers from being difficult to describe with verbal economy. If you have problems with this, examine existing articles, or ask questions of a more senior person.", (emphasis mine) and from WT:ELE#Derived terms, "If it is not known from which part of speech a certain derivative was formed it is necessary to have a "Derived terms" header on the same level as the part of speech headings." The example shows the breakout, and furthermore the direction to place derivations of unknown specific provenance at the same level of POS implies that the opposite is true if provenance is known.
That said, it's not quite explicit, and I don't really care all that much; knowing that a descendant derives from a word+Etymology is probably more important than knowing which particular part of speech it derived from. In any case, cross pollination is going to shade any derivative meanings. I am in support of revising (making?) policy that Derived/Related terms should be at the same level as Part of Speech, and all of those be children of Etymology. Like V-ball, though, I'm more interested in standardization than the particulars. --Jeffqyzt 21:51, 21 September 2006 (UTC)
Did I word that wrong above? There are several possible structures:
  1. All of these headings at level three (if there is only one POS, and one etym) with translations coming last.
  2. Etymology and POS at level three, everything else nested below at level four (again, with translations coming last in each section.)
  3. Etymology, POSes and derv/syns/etc at level three, all others nested below a POS.
  4. Etymology and derv/syns/etc at level three, POS at level four, all others nested below at level five.
  5. Etymology at level three, POSes and derv/syns/etc at level four (for cross-pollinated synonyms), all else nested lower.
In each case, the amount of cross-pollination going on is what should determine which/where/what level the derv/syns/etc end up at. But in all cases, the translations are supposed to come after the English language relevant parts. If the ELE doesn't say that clearly, with brevity, then perhaps we should think of rewording it (with caution!) --Connel MacKenzie 23:12, 21 September 2006 (UTC)
But current practice has Translations at level-4 nested under POS, so are you saying that POS should come last, after Related terms and the like? --EncycloPetey 23:59, 22 September 2006 (UTC)Reply
From what I've seen (and I've seen a lot of pages briefly), there are three standard level-3 headers in widespread use for languages with a single Etymology: (1) Etymology, (2) Pronunciation, (3) POS, in that order most often. There is also a hefty percentage of pages for which (4) Related terms (and similar headers) are placed as level-3 following the POS section, but as Connel has noted (and is currently modeled in the ELE) there is also a hefty percentage for which these headers are level-4 under POS.
Now, if we put them as level-4, then I understand Connel's position completely about having all the English-specific information precede the Translations. What I can't reconcile is the logic behind listing derivatives and related terms under the POS (as opposed to Etymology). My own feeling is that these are links to less-related pages than synonyms, antonyms, or even translations, so they don't belong in the POS section at all. Putting them in the Etymology section would make more sense, but then we end up beginning each page with lists of peripherally related terms, pushing the inflection, POS, and definitions far down the page in some cases.
I think we ought to have them as level-3 after the POS, treating them almost like external links. It might even be worth creating a 4th grouping header at level-3 to include as subheaders things like Related terms, Derived terms, and Derivatives. --EncycloPetey 23:56, 22 September 2006 (UTC)Reply

Statistics oddities

The Wiktionary Statistics WT:STATS were just updated, and I note three points of interest:

  • There are 243 Slovenian words, but 112 Slovene words. I can't remember which we decided was the correct language header, but it ought to be uniform. My two cents is that all the dictionaries and grammars I have for the language at home (about 7) all use the term "Slovene" rather than Slovenian to name the language.
  • There are 22 pages whose language is Etymology.
  • There are 42 pages with a language header of References.

If someone knows how to search for the miscreant pages and repair them, it would probably be a GOOD THING(tm). --EncycloPetey 23:46, 22 September 2006 (UTC)Reply

Those should come up in Connel's analysis, and then in his todo pages. - TheDaveRoss 23:56, 22 September 2006 (UTC)Reply
Both Slovene and Slovenian are correct, but some time ago I noted that most Slovene contributors used the word Slovene. I believe most of the Slovenian entries were added by User:Drago, a Hungarian. Generally it doesn’t make much difference except when someone links a word to the particular language, as in [[дом#Slovene|дом]]. If the language header on the дом page reads ==Slovenian==, the link does not work right. For this reason, whenever I encounter Slovenian, I change it to Slovene. —Stephen 02:29, 23 September 2006 (UTC)Reply
It certainly seems easier (to me) to make the correction using $python replace.py -file:slovenian.txt "==Slovenian==" "==Slovene==". It takes me ~30 seconds to select them all (since I have to search the whole text of the wiki) and another two to ten minutes for the bot to run. (Not under a 'bot account though: User:Connel MacKenzieBot is not an official 'bot.) Shall I proceed? --Connel MacKenzie 02:35, 23 September 2006 (UTC)Reply
Go for it. SemperBlotto 07:20, 23 September 2006 (UTC)Reply
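For readers unfamiliar with what such a header replacement amounts to, here is a minimal sketch in plain Python. It is not the replace.py bot mentioned above; the function name `rename_language_header` is hypothetical, and it works on one page's wikitext held in a string rather than on the live wiki. It assumes language headers sit alone on a line in the form ==Name==.

```python
import re

# Hypothetical helper (not the actual replace.py bot): rename a level-2
# language header in one page's wikitext.
def rename_language_header(wikitext, old="Slovenian", new="Slovene"):
    # Match only a whole line of the form ==Old== (optional spaces inside),
    # so deeper headers such as ===Slovenian=== are left alone.
    pattern = re.compile(
        r"^==[ \t]*" + re.escape(old) + r"[ \t]*==[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub("==" + new + "==", wikitext)

page = "==Slovenian==\n===Noun===\n'''dom'''\n# house\n"
print(rename_language_header(page))
```

Anchoring the pattern to a whole line is the point of the sketch: a bare text substitution would also touch any prose that happens to mention the old header name.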
I can find only 36 main-namespace entries with "References" as a level-2 header, but I've been working from the 9/13 dump (as opposed to the 9/22 dump which WT:STATS is currently based on), so that may account for the difference. Anyhow, here they are. Anybody who wants to help, please <strike> these out as you fix them. —scs 23:00, 23 September 2006 (UTC)Reply

Etymology:

References:

One more oddity: I was analyzing the breakdown of languages and noticed that Old English does not appear on the list. I then noticed further that Ancient Greek and Old Prussian weren't either. Now, I know that we have many entries for these languages, so I'm wondering if somewhere in the dump or statistics crunching we're losing track of languages whose name includes a space within the header. --EncycloPetey 00:32, 24 September 2006 (UTC)
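As an illustration of how a header count can avoid dropping names that contain a space, here is a rough Python sketch. It is not the script that generates WT:STATS (that code is not shown in this thread); `LEVEL2` and `count_language_headers` are hypothetical names, and the input is assumed to be a list of page texts already extracted from a dump.

```python
import re
from collections import Counter

# Level-2 header: exactly two '=' on each side. The captured name may
# contain spaces, so "Old English" is kept whole.
LEVEL2 = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)

def count_language_headers(pages):
    counts = Counter()
    for text in pages:
        # Count each language at most once per page.
        counts.update(set(LEVEL2.findall(text)))
    return counts

pages = [
    "==Old English==\n===Noun===\n# example\n",
    "==English==\n===Noun===\n\n==Old English==\n===Noun===\n",
]
for name, n in sorted(count_language_headers(pages).items()):
    print(name, n)
```

A pattern that captures only a single word (for instance `\w+`) between the equals signs would lose exactly the multi-word languages described above, which is one plausible cause of the gap.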

Excellent catch. I shall fix that error shortly and regenerate those statistics. --Connel MacKenzie 01:46, 24 September 2006 (UTC)Reply
Scs, if you look at these:
  1. User:Connel MacKenzie/todo possibly bogus language headings
  2. User:Connel MacKenzie/todo2 probably very bogus third level headings
  3. User:Connel MacKenzie/todo3 pages with no "#" lines
  4. User:Connel MacKenzie/todo4 no level two heading at all
  5. User:Connel MacKenzie/todo5 pages bereft of wikification
and make corrections, then I'd appreciate it if you strike them out or remove them from those pages' sections. I have a plethora of other cleanup lists I compile semi-automatically after each XML dump. (You are welcome to suggest other things I should check for, and/or create these lists yourself, as well, of course.) On my main user page, I try to keep a fairly coherent list of items that need cleanup, albeit somewhat fragmented.
Also of note, is Patrick Stridvall's toolserver page, which is a little bit more of a dynamic approach to identifying similar entries. I don't know if he still needs to update it after each XML dump anymore, or not. I haven't seen him around in a while, so I don't know if he is keeping that up to date.
--Connel MacKenzie 01:46, 24 September 2006 (UTC)Reply
Good catch. I don't know who generated the WT:STATS statistics, or how. My own crunch (again, based on the 9/13 dump, not 9/22) came up with these counts for the three you mention:
Old English 1937
Ancient Greek 581
Greek, Ancient 2
Old Prussian 166
And also these:
Technical Information 21301
Dictionary Information 21299
Chinese Hanzi 20615
Korean Hanja 8727
Japanese Kanji 1893
Old High German 823
Biblical Hebrew 325
Japanese kanji 323
Spanish (Castilian) 310
Old Norse 240
Scots Gaelic 210
(But take these numbers with a grain of salt; the script I used to generate them is still pretty rough. For comparison, it generated English 90814, Japanese 15538, French 5827, German 5541, Italian 5357, and Spanish 5087.)
See also Stridvall's header tool (though that page is currently still based on the 7/4 dump).
scs 01:40, 24 September 2006 (UTC)Reply
Vild noticed that in my XML dump analysis, I was generating most of these numbers already, so he asked me to consolidate them onto WT:STATS. If you want to take that (or any part of the process) over using your tools, please do! Just let me know, so I don't duplicate the effort. --Connel MacKenzie 02:01, 24 September 2006 (UTC)
Technical information, Dictionary information, Chinese hanzi, Japanese kanji, Korean hanja are all the NanshuBot entries for single Han characters. I've been working on what to do with them; but there are some issues holding it up. Robert Ullmann 17:00, 24 September 2006 (UTC)
I've been excluding all Nanshubot entries from these "/todo" pages for a very long time. For the first year that I did them, we really didn't have anyone that dealt with CJKV stuff, active here...so I had removed the clutter to make the lists more usable. I'm very hesitant to turn it on again, as that means my various "/todo" lists will automatically be filled with an extra ~17,000 "bad" entries. OTOH, we have at least eight or nine regular contributors now, so maybe it is time. Yes/No? --Connel MacKenzie 20:48, 24 September 2006 (UTC)Reply
If you do include them, I'd put them in separate lists; they will have a large number of instances of a limited set of problems, and as you say, there is a different set of contributors interested. Like maybe run your s/w twice, with that predicate reversed? Another thing I thought of is checking for "conjugation" used under noun or adjective, or (less frequently) declension occurring under verb. Robert Ullmann 14:12, 25 September 2006 (UTC)
That is a very good idea; thank you. I'll try to make a thing that generates /todo6 by the next XML dump. --Connel MacKenzie 16:51, 27 September 2006 (UTC)Reply
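The check suggested above (an inflection header under the wrong part of speech) could be sketched roughly as follows. This is only an illustration in Python, not the code behind the /todo lists; the names `HEADER`, `MISMATCH` and `find_mismatched_inflection` are hypothetical, and the header titles are assumed to be spelled exactly "Noun", "Adjective", "Verb", "Conjugation" and "Declension".

```python
import re

# Any wiki header from level 2 to 6; the same run of '=' must close it.
HEADER = re.compile(r"^(={2,6})[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)

# Assumed rule set: which inflection headers look wrong under which POS.
MISMATCH = {
    "Noun": {"Conjugation"},
    "Adjective": {"Conjugation"},
    "Verb": {"Declension"},
}

def find_mismatched_inflection(wikitext):
    problems = []
    current, current_level = None, 0
    for m in HEADER.finditer(wikitext):
        level, title = len(m.group(1)), m.group(2)
        # A header at the same or a shallower level closes the POS section.
        if current is not None and level <= current_level:
            current = None
        if title in MISMATCH:
            current, current_level = title, level
        elif current is not None and title in MISMATCH[current]:
            problems.append((current, title))
    return problems

page = (
    "==Spanish==\n===Noun===\n# example\n====Conjugation====\n"
    "===Verb===\n====Conjugation====\n"
)
print(find_mismatched_inflection(page))
```

The sketch only looks at nesting, so an inflection header placed at the same level as the part of speech would not be reported; whether that case should be flagged is a policy question rather than a coding one.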
[see more on this thread below. scs 20:55, 27 September 2006 (UTC)]

Han character entries (Nanshubot)

Clearly, those 17,000 Nanshubot-ized Han characters don't want to be put on any lists other than the one driving the eventual bot that cleans them all up. (I haven't participated in that debate, but it seems to me that a decent approach would be to put them in a "Language" of Han or Han ideograph, a "part of speech" of character or ideograph, with all the rest of the info moved to level 3 or below. And "all the rest of the info" -- it's the stuff that's lifted straight from the Unicode unihan.txt file, right?) —scs 23:44, 25 September 2006 (UTC)Reply

I'm not sure of that. They have sat around for ages with no one that really even knows what they are. Now, perhaps, there is enough critical mass of contributors to probably even explain to me what a unihan.txt file is.  :-) But either way, I don't see any way for all of them to be "corrected" by bot. Is the problem simpler, than it appears? --Connel MacKenzie 16:51, 27 September 2006 (UTC)Reply
I haven't studied them or even read much of the discussion about them, but my impression is that we've got boatloads of entries that were mechanically generated from the data in ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip (warning: zip file), which is linked to from http://www.unicode.org/charts/unihan.html and documented in http://www.unicode.org/Public/UNIDATA/Unihan.html. Sample data from that file (which I have reformatted slightly):
U+3485
kCantonese kai2
kDefinition to unbind the collar
kHanYu 10221.030
kIRGHanyuDaZidian 10221.030
kIRGKangXi 0117.060
kIRG_GSource 5-3270
kIRG_KPSource KP1-3690
kIRG_KSource 3-2159
kIRG_TSource 4-422E
kKPS1 3690
kMandarin QI3
kRSUnicode 9.12
kSBGY 269.52
kTotalStrokes 14
(Those inscrutable tags like "kKPS1" and "kIRG_GSource" are all documented in the file, and at [5].)
It's useful data, to be sure, although there's some concern that uploading all of it here might have been a copyright violation, especially in regards to the "kDefinition" lines. But at any rate, you can totally see where, say, the bulk of the data in our entry came from. (All I'm imagining in terms of bot-aided "cleanup" is perhaps rearranging the headers. But I'm sure Robert Ullmann has more to say on this.) —scs 18:53, 27 September 2006 (UTC)Reply
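To make the shape of that data concrete, here is a small Python sketch that groups such records by code point. It assumes the raw file uses one tab-separated line per field (code point, tag, value) with `#` comment lines, which is how the Unihan text file is laid out; the function name `parse_unihan` is hypothetical, the sample lines are taken from the excerpt above, and this is not NanshuBot's actual code.

```python
from collections import defaultdict

def parse_unihan(lines):
    """Group tab-separated 'U+XXXX<TAB>kField<TAB>value' lines by code point."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # ignore anything that is not a three-field record
        codepoint, field, value = parts
        records[codepoint][field] = value
    return records

sample = [
    "# sample lines based on the excerpt quoted above",
    "U+3485\tkCantonese\tkai2",
    "U+3485\tkDefinition\tto unbind the collar",
    "U+3485\tkTotalStrokes\t14",
]
rec = parse_unihan(sample)["U+3485"]
print(rec["kTotalStrokes"], rec["kDefinition"])
```

With the fields held in a dictionary per character, rearranging headers or filling a template becomes a formatting step, which is roughly what a bot-aided cleanup would need.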
Take a look at . This represents my (with some help) first pass at what the format should be if we keep all the NanshuBot information.
That looks great! I like the way you've put "Han Character" as a level-3 under "Translingual". —scs 20:57, 27 September 2006 (UTC)Reply
It uses two templates, one of which puts the entry in Category:Han characters. There are 21,300 of these entries (+ or - a few). Yes, it is from the Unihan database. I imagined a bot could stuff all the info into the templates. This is the technical side of it. Then there is the copyright. See below. Robert Ullmann 20:37, 27 September 2006 (UTC)Reply
If a decent "transformed" version can be knocked together by people who have some idea of what these should look like, I will attempt to write a custom bot to transform all of them into the new format, hopefully reducing the manual cleanup. - TheDaveRoss 18:57, 27 September 2006 (UTC)
If these are all copyright, or of questionable copyright, they should be deleted. We'd do better to start a fresh import for technical reasons as well. --Connel MacKenzie 18:59, 27 September 2006 (UTC)Reply
See User:NanshuBot, the copyright is very permissive when it comes to use, Amgine and legal counsel gave us the go ahead a few weeks ago. It may be better to reimport them in a new format, but we don't have to based on copyright. - TheDaveRoss 19:02, 27 September 2006 (UTC) apparently I was mistaken, or the discussion I recall was wrong...something. - TheDaveRoss 21:26, 27 September 2006 (UTC)

Actually, that copyright says explicitly "Unicode, Inc. specifically excludes the right to re-distribute this file directly to third parties or other organizations whether for profit or not." I.e. it was a pure copyright violation. The en.wikt is exactly such a redistribution. Each en.wikt entry is (was) just a reformatting of a record (all of the records!) in the Unihan database. In no way did Nanshu have permission to place the data under the GFDL! That's the bad news, the good news is that Unicode's Terms of Use are better now. But they still don't grant permission to place the entire DB under GFDL, which we absolutely require. Similar observations apply to the "Four Corner" and "Canjie Input" data: it is not sufficient to "use with permission" from a copyright holder: there must be no copyright holder.

I don't know who "Amgine and legal counsel" are; I have been talking to Brad Patrick, General Counsel of the WikiMedia Foundation, and he considers it—at this time—an open and potentially troublesome issue. Robert Ullmann 20:37, 27 September 2006 (UTC)Reply

Without weighing in with an actual opinion ('cos IANAL, and on this issue I could be swayed either way), let me point out that Unicode's copyright is somewhat more lenient on derived works than it is on redistributing verbatim files. We may have all of the same information from their files in our Nanshu-botted entries, but we're clearly not redistributing Unicode's files as-is.
There's also the question of whether pure information (which is what many of these CJK dictionary and character set correspondences are) is copyrightable at all. (If the information's not copyrightable, then "releasing" it under the GFDL isn't an issue.) There's also the question of whether Unicode holds or deserves any sort of "compilation copyright" on it all. (And then there's the question of whether it's worth having all this information propagated into Wiktionary at all, or whether any reader who needs this particular information will always go to the official Unicode files anyway.) —scs 21:05, 27 September 2006 (UTC)Reply
  • With deference to whatever Brad Patrick may eventually say on the topic, I think we should delete them all now. There is absolutely no reason for us to have questionable material at all. We would have sucked in all content from the OED by now, if we were trying to be something polluted. But we're not. The fact that the question can even be raised in seriousness, in my mind, is reason enough to delete these 17k+ entries. If and when, a newer version with GFDL compliant licensing is available, we can try re-importing them in a more usable format. --Connel MacKenzie 21:20, 27 September 2006 (UTC)Reply
I will not comment about legality issues since I am not a lawyer. However, I would like to provide my thoughts with respect to the format and accuracy of the information contained in the 17,000 files. First of all, I find it rather silly that this information is copyrighted at all given the number of errors in the entries. As a speaker of Chinese, I find only two pieces of information to be consistently reliable about the files:
  1. the radical/stroke information
  2. the encoding information
The romanization information is at best incomplete, and sometimes wildly inaccurate (ex. two alternate romanizations, one valid and one either bogus or archaic, without any information about proper usage). The common meanings section is basically useless for anything other than a very superficial rough idea as to the basic meaning of a given character. It is rather moot anyway, since the common meanings section is often blank.
The current format of the information is also problematic. For example, simplified or traditional written forms are listed under a heading called alternate forms. The problem with this label is that you're not told which one is which. This may seem obvious, but it is not. To illustrate my point, I will use the following character(s): and . Technically, is the traditional form of , but is the standard form used in "traditional" as well as "simplified" Chinese. We need the entry to tell us such things.
In my opinion, the key to wiktionary's success is the accuracy and completeness of its entries, not the raw number of entries. I am unconvinced that a bot will create the type of entries that would truly be useful as a learning aid, research tool or professional translation reference work. In order to fill wiktionary with the type of entries that I would like to see, I am afraid that it will be a process of human language experts contributing and editing words over a period of many years. This may sound like I am being negative or unoptimistic, quite the contrary. I believe that Wiktionary will eventually be one of the most valuable language references ever created ... even if it takes us 50 years :)

A-cai 07:29, 28 September 2006 (UTC)Reply

I'm surprised and sorry to hear that the data's of such poor quality, because I'd gotten the impression that the Unicode consortium and the researchers who contributed to the Han unification effort had put a huge amount of work into the unihan.txt file. But I just realized something which may explain the discrepancy: Nanshubot did its work a couple of years ago, and I'm pretty sure the unihan.txt file has evolved quite a bit since then. I'll take a look at the discrepancies A-cai has noted and see if they're reflective of errors in older versions of unihan.txt which have since been corrected.
If the Unihan data which Nanshubot imported is now obsolete, that'd be another reason to do a reimport (assuming we decide to keep the data at all) as opposed to futzing with what we have now. But on the other hand, this is also another argument not to try to replicate such data here at all, since if our copy has this tendency to become stale, our readers are better served by going to the up-to-date Unicode consortium files anyway. —scs 23:14, 28 September 2006 (UTC)Reply
The Han Unification effort was a tour de force; extremely impressive (I say this as one who was tangentially involved); the Unihan database is—and was—solid. The problem is that Nanshu derived romanizations and readings for kanji not directly from the database that are very suspect. The raw data (e.g. what I stuffed into the templates in the example) is OK. Robert Ullmann 23:22, 28 September 2006 (UTC)Reply
Hi there. IAAL - in fact, IAAIPL. Where can I see the original database from which this material was gathered (please email directly to me). My initial inclination is "GET RID OF 'EM". That may change once I see the source and read the TOU... but don't gamble on it! bd2412 T 07:49, 28 September 2006 (UTC)Reply
The raw data file in question is ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip . Its context is (i.e. the public link to it is at) http://www.unicode.org/charts/unihan.html, and there is further documentation at http://www.unicode.org/Public/UNIDATA/Unihan.html. —scs 22:50, 28 September 2006 (UTC)Reply
P.S. A-cai, how is the Wiktionary:Chinese Pinyin index - accurate or far from? bd2412 T 07:49, 28 September 2006 (UTC)Reply
I'm guessing that the Wiktionary:Chinese Pinyin index is based on the Nanshu bot files. For example, if you take a look at , it lists qiāng (which is wrong, I checked several of the largest on-line and off-line dictionaries), which also shows up in Wiktionary:Chinese Pinyin index. On the other hand, it does not list under á, ǎ, à, ā or ē (all valid romanizations for ), which also matches the individual entry for . The entry should indicate that ā is standard Mandarin, and then explain the situations in which the others are used (see: ). The romanization for this character demonstrates the type of inaccuracies that I regularly observe in Wiktionary individual Chinese character entries.

A-cai 08:26, 28 September 2006 (UTC)Reply

We certainly don't want to delete the entries. First of all, the existence of the entries isn't a copyright problem, all they are is one entry for each consecutive code in the IS 10646 "Unified Han" code range; some 20K+ code points. (FYI: what we call "Unicode" is ISO standard 10646, "UTF-8" is an annex to that standard; Unicode was one of the inputs to the IS 10646 process.) The problem is the content loaded by NanshuBot. It is possibly/probably a copyvio, and as observed, what isn't directly from the Unihan DB is suspect or simply wrong. An indication of the quality of the derived information is the use of the name "Morobashi", a misreading of the kanji for Tetsuro Morohashi.
However, there has been lots of good information added by many people since the entries were created; we don't want to discard that. A possibility is to strip all the identifiable Nanshu information from the entries, adding a Translingual/Han character section at the top, and then continuing from there. Robert Ullmann 12:03, 28 September 2006 (UTC)Reply
As to your last point, clearly all such entries are "derived works" and therefore also copyvios, no matter who made the subsequent edits. For that reason, they pretty certainly should be removed as well, IF the allegations turn out to be true (and/or no further redeeming information is provided.) But, IANALE. --Connel MacKenzie 13:40, 28 September 2006 (UTC)
Certainly the radical/stroke information could not be copyrightable! Also is anyone claiming that (not the collection but) the few isolated pages that have been edited thus far were under copyright? I say remove the bulk of the data, leaving only the altered pages and otherwise the essential components like the radical and the number of strokes. What do you legals have to say about that idea? DAVilla 13:32, 29 September 2006 (UTC)Reply

I have requested for comment the juriwiki-l mailing list, hopefully they will have time to look into the issue and advise us in some way about what is best to do. - TheDaveRoss 16:07, 17 October 2006 (UTC)Reply

Ok, speaking as an IP attorney, I think this is most akin to the situation in Kregos v. Associated Press, 510 U.S. 1112 (1994). There, a baseball reporter came up with a set of nine statistics that he thought were particularly important to determine which pitchers would win the day's games. The Supreme Court held that although Kregos could receive protection in the arrangement and presentation of the statistics, this protection would be very narrow. In essence, all that could be protected was the exact presentation. Any other paper that chose to publish similar statistics could do so as long as the alternate presentation differed from that created by Kregos in "more than a trivial degree", specifically finding it unlikely that the AP's form infringed where it included only 6 of Kregos' 9 stats and included 4 additional stats that Kregos did not.
In our case, we are attempting to provide as much information as possible about every character available. The actual information we seek to present is in the public domain. It is only, therefore, the particular selection and arrangement to which another party could lay claim. Hence, if we strip all non-essential information originating with the other source, include additional information (which we are going to do anyway), and change the arrangement to suit our purposes, we should be in a position to prevail over any challenge to our use of this information.
Cheers! bd2412 T 16:35, 17 October 2006 (UTC)Reply
Having not heard anything back from the Wikimedia General Counsel, having asked repeatedly ... sigh. Given what you say, I think we should do this:
  • Format the info at the top of the entry, which Unicode can't really claim any copyright on (compilation, derivation or whatever) into Template:Han char under a Translingual header so that the format meets our standard.
  • Delete the "Dictionary information" section; it is page and line references to the unabridged versions of large dictionaries most people don't have anyway. (One of them is ~10,000 pages.) This information was developed by Unicode, they might have some claim, but so what; we strip it.
  • Delete the "Technical information" section; this is the Unicode/IS 10646 code point in hex and decimal, ditto the Big5 code point. Unicode has no claim on this, but there is no reason to have it. We don't give the JIS codes (or ASCII, or whatever).
  • We already have additional information on a large number of entries.
Comments? Objections to my doing this? (It isn't a bot run for all of them, there are so many variations people have introduced; it will take runs with a bot or AWB skipping entries with variants, then going back to collect sets of them.) Is anyone going to see this here, or does it need to be moved to the bottom? Robert Ullmann 13:43, 18 October 2006 (UTC)Reply
I think we should await the reply from jurywiki-l before we do anything; if the verdict is delete, there is no need to format them before we nuke them. - TheDaveRoss 03:22, 19 October 2006 (UTC)

Well, phase one of the voting has ended. Apparently, the person driving the vote effort now, is not a Wiktionarian, nor ever was. I'm not sure how appropriate most of the remaining "semi-finalist" logos would be for Wiktionary. My objections to the vote timing, vote conduct, etc. have fallen on deaf ears. I suppose it is water under the bridge, for now, at least until someone tries to tell en.wiktionary that it has to use one of the new (inappropriate) logos.

I would like others from this community to review the meta: "semi-finalists" and begin discussing here whether or not to even consider using one of them on en.wiktionary. It seems to me like a lot of people's time and effort has come to naught. Should we wait a month or two, then start our own logo contest (since trying to satisfy so many other factions has left us with such poor results)?

--Connel MacKenzie 01:31, 24 September 2006 (UTC)

I'm not sure what the problem is. I participated (lightly) in the discussion and vote over there on meta, and I saw several other names from here I recognized, so it's not like there was no representation.
Me, I rather liked several of the candidates, and the one I liked best made it into the next round, so I'm happy. :-) —scs 01:44, 24 September 2006 (UTC)Reply
There are several there which I would actively oppose if someone tried to make them the en.wikt logo, anything in a speech bubble for instance...what does that have to do with a dictionary? The calligraphy one isn't too bad, but I don't like many of them all that much. - TheDaveRoss 01:49, 24 September 2006 (UTC)
I agree the calligraphy one seems good. But that is very different from saying that it is better than our current logo. --Connel MacKenzie 01:54, 24 September 2006 (UTC)
The problem, as I see it, is that en.wiktionary.org has about 300 good, regular contributors. Over 50 people contribute more than a hundred edits each month! To have only three people from here, comment there, is a problem. --Connel MacKenzie 01:52, 24 September 2006 (UTC)Reply
Does WMF require that we use the same logo as all of the other wikts? - TheDaveRoss 01:56, 24 September 2006 (UTC)
I have no idea. I don't want to think of how ugly it could get, if push came to shove. --Connel MacKenzie 20:50, 24 September 2006 (UTC)
If people didn't vote, that means they don't care about the logo. If you don't like the proposals, there was an option "keep current logo" (which didn't make it to the second round). So, I don't really see where the problem is with the current vote, and when I look at [6] I feel like changing the current logo... Maybe we could make some javascript so that anyone can choose to display the logo xe likes? Kipmaster 08:49, 25 September 2006 (UTC)
Excuse me, but that is a wildly different thing from meta assuming they can force-feed a logo choice. The same person that ignored my complaints about the irregularities in voting procedure is the one who unilaterally decided the cutoff for "round 2" after the fact. The fact also remains that whatever en.wikt: adopts becomes the de facto default that the other language Wiktionaries will then adapt to their needs.
The fact that no one was able to put in a javascript to rotate the proposed logos in (as suggested a while back) implies that our current javascript "resources" are better spent elsewhere, anyhow. --Connel MacKenzie 16:29, 27 September 2006 (UTC)Reply

Special:Policy pages to which TheDave is the sole contributor

There have been a few new things popping up around here in an effort by some of the population to firm up policy and procedure, as well as organize certain things we do. However, I (and possibly others) have created some pages which haven't been scrutinized nearly enough for my liking, and so I thought I would point out the pages I have made which are intended to establish policy for new features, but have only been edited by me. Please visit them, comment on them, change them, whatever; if someone else shows up and wonders what our checkuser policy or voting policy is, they might only have my hastily drawn up first draft to go on...and that is scary!

WT:VOTE
WT:CU
Wiktionary:Why_create_an_account
MediaWiki:Signupend

These are the ones that I have made, if anyone else has some "policy" pages which they feel haven't gotten the attention they need to get underway, go ahead and link them too. - TheDaveRoss 21:28, 24 September 2006 (UTC)Reply

There's a typo on Wiktionary:Why create an account where it says At the, for example which seems to be missing a link to the German wiktionary (which I would have filled in if I knew what to put here). RJFJR 13:31, 25 September 2006 (UTC)
Thanks for pointing that out, I had linked the German wikt [[de:Main_page|German Wiktionary]] and it should have been [[:de:Main_page|German Wiktionary]]. - TheDaveRoss 16:48, 25 September 2006 (UTC)

Highlighted new entries

When I view the new entries, some of them are highlighted in yellow. Can someone tell me what this means? — Paul G 16:13, 27 September 2006 (UTC)

If you are a sysop, "not-patrolled" edits appear with a different colored background (usually yellow.) --Connel MacKenzie 16:19, 27 September 2006 (UTC)
I thought it might be something like that. Thanks. — Paul G 11:06, 29 September 2006 (UTC)

Pinyin number system transliteration entries

I've completed the entries for about 1,400 Pinyin number system transliterations, i.e. an alternative spelling of the usual Pinyin transliteration, but substituting a number (1 through 4, or rarely 0 or 5) for the diacritic. Right now, virtually all 1,400+ entries contain only an unsubst'ed template. Any changes that need to be made universally should be carried out before those templates are subst'ed.
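
(Illustrative aside: the substitution described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method actually used to create the entries. The function name `to_numbered_pinyin` and the `neutral` parameter are hypothetical; the only assumption about Pinyin itself is the standard mapping of macron, acute, caron and grave to tones 1 through 4.)

```python
import unicodedata

# Combining tone marks used in Hanyu Pinyin, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute accent
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave accent
}


def to_numbered_pinyin(syllable, neutral=""):
    """Replace the tone diacritic of one syllable with a trailing digit.

    `neutral` is the (hypothetical) suffix used when no tone mark is
    found, e.g. "", "0" or "5" depending on the convention chosen.
    """
    # Decompose so that each tone mark becomes a separate code point.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = neutral
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps the diaeresis of "ü" intact
    # Recompose what is left (e.g. u + diaeresis becomes "ü" again).
    return unicodedata.normalize("NFC", "".join(kept)) + tone
```

With this sketch, `to_numbered_pinyin("xuě")` gives `xue3`, and an unmarked syllable is returned unchanged unless a `neutral` suffix is supplied.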

For example, the current content of the entry for xue3 is:

{{Pinyin-n|xuě}}

The current layout of the page appears as:

Mandarin

Pinyin

xue3
  1. Alternative spelling of xuě, a transliteration for several Chinese characters.
Category:Mandarin pinyin

The xuě in the definition is piped to xuě#Mandarin, which means nothing in this case, but is important for terms which appear in multiple languages, such as . Also, the category entry is piped so the number transliteration will generally show up in the Mandarin pinyin category right after the transliteration using the diacritic. As of now, if the entries are subst'ed, each will have the following setup:

==Mandarin==
===Pinyin===
'''xue3'''
# {{cmn-alt-pinyin|xuě}}
[[Category:Mandarin pinyin|xue3*]]
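
(Illustrative aside: if the layout changes before subst'ing, regenerating the text from one place is easier than editing 1,400+ pages by hand. The Python sketch below simply reproduces the five lines shown above for any syllable. The function name `numbered_pinyin_entry` is hypothetical; the header names, the `cmn-alt-pinyin` template and the `*` sort-key suffix are copied from the example above, not taken from any documented policy.)

```python
def numbered_pinyin_entry(numbered, diacritic):
    """Build the subst'ed wikitext for one number-system pinyin entry.

    numbered  -- the page title, e.g. "xue3"
    diacritic -- the usual spelling it points to, e.g. "xuě"
    """
    lines = [
        "==Mandarin==",
        "===Pinyin===",
        "'''" + numbered + "'''",
        "# {{cmn-alt-pinyin|" + diacritic + "}}",
        # Sort key copied from the example: the title plus "*".
        "[[Category:Mandarin pinyin|" + numbered + "*]]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(numbered_pinyin_entry("xue3", "xuě"))
```

Changing the level 3 header or the category would then be a one-line edit in the list, which is the kind of wholesale change being asked about here.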

Does anyone have ideas for improvements that should be made before subst'ing complicates the process of changing the entries wholesale?

  1. Should these transliterations be in a separate category or subcategory?
  2. Should the level 3 header say something other than "Pinyin" - I was thinking of "Transliteration" actually, or something like that.
  3. Should the reference to the transliteration with the diacritic be under a separate heading for "Alternative spellings"?
  4. Should the entry be worded differently?

Cheers, bd2412 T 23:28, 27 September 2006 (UTC)

Note that if you look at the entries right now, you will see a possible version of the above (yi2) that provides for putting the characters with glosses in both the versions with tones and with diacritics. (It also uses "syllable" at this exact moment, but that is one word in a template ;-) Robert Ullmann 17:01, 29 September 2006 (UTC)Reply

Comments on BD2412's section above

(section header changed to L3 to make it part of prev section Robert Ullmann)

Due to the included template, I can't seem to edit the above section.  :-(

I do not "like" the ===Pinyin=== heading; I'd much prefer the English language part-of-speech equivalent be identified. Likewise, the "see also" or "alternative spelling" shouldn't just say the foreign language term, but its translation/gloss in English. Couldn't this be done? I understand that these can be helpful to someone who knows what they mean already. But I had hoped that they would become much more complete, somehow. Does that next step have to wait for the templates to be transcluded/subst:'ed?

Also, please refresh my memory: when did ==Chinese== (the common English term) get replaced with ==Mandarin==?

--Connel MacKenzie 07:08, 28 September 2006 (UTC)

We discussed this a few months back; Chinese is a language group, and the word "Chinese" is used loosely to mean Mandarin sometimes, and the group at other times. The languages in the group are spoken and written differently. (There is a myth that written Chinese is universal, and only the spoken languages vary; this is entirely incorrect. A native speaker of Min Nan may be able to decode written Mandarin, in the way an educated native speaker of English can decode, say, French. But that's it. Frequently speakers of the other languages will learn written Mandarin as an acquired language, with no idea how it is pronounced.) The languages are Cantonese, Hakka, Gan, Jinyu, Mandarin, Min Bei, Min Dong, Min Nan, Min Zhong, Wu, and Xiang. (And numerous dialects.) In this particular case it is important because this is Mandarin Pinyin, not (e.g.) Hakka Pinyin. Robert Ullmann 12:20, 28 September 2006 (UTC)
The template is not included above - there's an actual section header (or two) in there for realism. With respect to the part-of-speech, that doesn't work well here because each transliteration may stand for as many as dozens of Chinese characters (have a look at ), with different meanings and in all different parts of speech. Many of those individual characters have multiple meanings and can be used as different parts of speech - the language is very loose about that; some characters have no meaning at all apart from their inclusion in other words. It's almost as if we had an entry for a symbol representing a sound like "fō", which can be combined with other symbols to make fōtō or fōrest or camfōr, and so forth.
Also, these can't rightly be under the second level heading of ==Chinese== because they are specific to one dialect. Cantonese, for example, uses altogether different transliterations. Making them more complete would probably entail adding the actual characters, as in xuě. We're still not settled on how to do that in the diacritic entries, which should probably be completed before adding that level of detail to the number-system entries. Cheers! bd2412 T 07:28, 28 September 2006 (UTC)Reply
I think (zh:yì) very clearly depicts my primary complaint. In English we have entries like un- which have many derived terms. But we do explain each of them. Clearly, I'm missing a big part of the picture here. We also have entries like run where each individual meaning is spelled out. I assume that zh:'s entry for zh:run will eventually be equally detailed. But it probably won't be spelled out in English on the zh: Wiktionary. Am I out to lunch here? --Connel MacKenzie 08:11, 28 September 2006 (UTC)
Well, what I'm saying is to hash those out on the entries for the actual diacritics (e.g. as opposed to yi4, which says the same thing with numbers). Eventually we'll get them all sorted out like that, but I'd rather have the number-system entries the way they are until we have all the diacritic entries done in a manner that comports with what I think you have in mind. bd2412 T 08:16, 28 September 2006 (UTC)Reply
Just to clarify a tiny bit: what I have in mind is that looking at an entry, no matter what language it describes, should convey something meaningful to an English reader (which is the premise of having separate language Wiktionaries, after all.) Having a tone transliteration pointing only to a foreign symbol with no description in English is something we should be moving away from, not towards. --Connel MacKenzie 15:34, 28 September 2006 (UTC)Reply
Right now yi4 is essentially just an alternate spelling, so these arguments apply more to and/or apply with equal weight to all alternate spellings. As for foreign symbols, there is no way to distinguish homophones based on their sound, and there are no written symbols other than foreign ones that do this. DAVilla 13:04, 29 September 2006 (UTC)Reply
Your point is well taken - I'd just prefer to have one set of pages on which those entries are created, worked over, and perfected before they are copied over wholesale to a second set of pages which are, effectively, alternate spellings. Perhaps some kind of transclusion could be worked, but I'm leery of that. bd2412 T 18:43, 28 September 2006 (UTC)Reply
As to the L3 Pinyin heading, I don't like it much either. But we need some standard non-POS L3 heading for these, like Letter or Symbol. Maybe it should be Pinyin syllable? Ideas? Robert Ullmann 12:20, 28 September 2006 (UTC)Reply
I was thinking maybe Transliteration or Phoneme. bd2412 T 14:11, 28 September 2006 (UTC)Reply
Well, not phoneme, is two phonemes, one syllable. Robert Ullmann 14:42, 28 September 2006 (UTC)Reply
That's a good idea. But are they syllables? (Perhaps therein lies most of my confusion?) Perhaps "Pinyin abbreviations" or "Pinyin notation"? Or "Language notation"? "Tone marking for foreign characters"? There really is no easy way to say this; simply calling it "Pinyin" may be accurate, but conveys no information at all to our typical "readers." (Nor me, really.) --Connel MacKenzie 15:34, 28 September 2006 (UTC)Reply
They are almost always syllables in Mandarin Chinese, with at least ㄦ (or something related to it?) as an exception, and as I understand it with many more exceptions in Japanese. So ===Syllable=== isn't really appropriate either. ===Transliteration=== or the name of the system like ===Pinyin=== (or something containing it) are the best options so far. In the former case we would have to find a place to name the system, since there are multiple transliterations for many languages. DAVilla 12:43, 29 September 2006 (UTC)Reply
Okay, the test cases as far as format goes are as follows: in a single language like Mandarin, a word that is a natural collision, the same transliteration of Chinese under two slightly different systems, and a word that is a coincidental collision, the transliteration of completely different Chinese words under significantly different systems. The ===Transliteration=== header would not work well for the second case because it would have to be listed twice, separating the entirely different meanings under different systems. But with ===Pinyin=== and a similar flavor as headers, there will be a lot of duplication. DAVilla 18:26, 7 October 2006 (UTC)
We don't list "==English (American)==" nor "==American English==" nor "==American==" nor "==British English==" nor "==India (country) English==" nor "==Brooklyn English==" nor "==Texas English==" nor any of the dozen other "very, very big" dialects. In English, the various dialects of Chinese are only understood as "Chinese" (the target audience again: English readers) which is why the heading we've used has been "==Chinese==". I remember seeing a scheme where the dialect was identified below the language heading - at least that way our readers would have some clue as to what they were dealing with. "Mandarin" seems like too unfamiliar a term, to our target audience. --Connel MacKenzie 15:34, 28 September 2006 (UTC)Reply
But (e.g.) Min Nan and Mandarin are not by any stretch "dialects" of a "language" called Chinese. They are mutually incomprehensible. If they are dialects, then (as I mentioned above) English and French and Italian are "dialects" of (what? "European"?). Texas English and Brooklyn English share 98% of the vocabulary, and 95% of the pronunciation. Min Nan and Mandarin share 15-20% of the (common, written) vocabulary and none of the pronunciation. They are not dialects. We owe our readers the understanding that "Chinese" is not a language. (No matter how much they have been confused before ;-) A-cai? Robert Ullmann 16:13, 28 September 2006 (UTC)Reply
As I'm beginning to understand it, the right way to think about this is that the relationship from Mandarin, Cantonese, and Min Nan to "Chinese" is the same as the relationship from English, German, Danish, and Swedish to "Germanic languages", and from French, Spanish, and Italian to "Romance" or "Italic" languages. (Or perhaps the same as the relationship from English, German, Danish, Swedish, French, Spanish, and Italian to "Proto-Indo-European", if you believe in such things.) —scs 03:18, 29 September 2006 (UTC)Reply
I understand the concern, but this was already debated and decided upon, or so I thought (Wiktionary:Beer_parlour_archive/July_06#Min_Nan). Connel, I believe you originally suggested giving the idea several weeks before making changes (that was two months ago). Robert is essentially correct. As Davilla pointed out in the July discussions, the words dialect and language are imprecise terms. The ultimate question is whether two forms of communication are mutually intelligible. In July, I provided an example of how Min Nan and Mandarin are unintelligible (for more info, see w:Chinese language#Classification of variations within the Chinese language). I agree with Connel that perhaps the word Mandarin is not as well understood in English as the word Chinese. However, it is not that obscure ("beginning Chinese" google hits vs. "beginning Mandarin" google hits). Besides, there are many languages that are not well known or understood to a monolingual English speaking readership. These might include: Chuvash, Dacian, Kadiwéu (all of these and more are already language headers on Wiktionary. See: Category:All_languages).

A-cai 18:02, 28 September 2006 (UTC)

That is a very unfair comparison though. Can you name even one person who speaks English, that hasn't heard of China? --Connel MacKenzie 23:13, 28 September 2006 (UTC)
But a fair comparison is that many Americans believe that the people of Mexico speak "Mexican", which is not a language. --EncycloPetey 23:22, 28 September 2006 (UTC)
Connel, part of the confusion is that when we say "Chinese" in colloquial English, the majority of the time, we are actually referring to Standard Mandarin. The United Nations lists six official languages. One of those six languages is Standard Mandarin of the PRC (official written correspondence is in Simplified Chinese). However, if you visit the United Nations website, they frequently use the term Chinese. Cantonese is a dialect of Chinese, but if a person could only speak Cantonese and English, but not Mandarin, that person would stand no chance of gaining employment with the U.N. as a Chinese interpreter. The problem is that the association of Chinese to Standard Mandarin is not an absolute. That same Cantonese speaker may very well consider himself to be a Chinese speaker (in other words, he may regard Cantonese and Chinese as being synonymous), despite the fact that Cantonese and Mandarin are not mutually intelligible. Further complicating this is the fact that a huge number of Mandarin speakers are also fluent in at least one other Chinese dialect (see: w:diglossia, and w:code switching). In spoken and written English, we can often determine the intended form of Chinese by looking at the context. An individual entry on English wiktionary often does not provide the necessary context (especially since wiktionary is attempting to document all languages), hence the need for greater precision in the level two header.

A-cai 01:33, 29 September 2006 (UTC)

Question: would it be appropriate to use a compromise notation in the level-2 language headers, something like Chinese/Mandarin, Chinese/Cantonese, Chinese/Min Nan, etc.? The intent would be to simultaneously:
  1. convey that these are not mere dialects (precisely because there would not be any dialectical level-2 headers like "English/American" anywhere else on the wiki to falsely compare them to), but also
  2. reassure and offer some instruction to the dumb Americans like Connel and me, who have had a lifetime to learn the false fact that "Chinese" is one language, and are having trouble letting go of the "fact".
Now, I do realize that issues like this can be very, very sensitive. I realize that the speakers and partisans of some of these languages might not want to be associated with the word "Chinese" in any way, or might be afraid that the intent stated in #1 above would not be understood by readers, that readers would get the unwanted and wrong impression that the languages are mere dialects. Anyway, please, don't shoot me for suggesting this; it's just an idea. —scs 03:07, 29 September 2006 (UTC)Reply
Alternatively, what we really need to do is figure out some good, standard way of dealing with "language groups" at all, because as Robert Ullmann pointed out above, that's the right way to think about what "Chinese" is. —scs 03:10, 29 September 2006 (UTC)
The point is that our list of level-two headings aren't all languages or all dialects, rather languages or dialects, with mutual intelligibility the best yardstick we have for managing the master list of these languages/dialects. Clearly (to anyone who speaks it) some like "Chinese" must be split, whereas I question the wisdom of, on the flip side, maintaining language splits upheld by politics only. DAVilla 12:37, 29 September 2006 (UTC)
Classification of Chinese as a language or language family is problematic because whether you classify it as one or the other has more to do with your political beliefs than with linguistic precision, as has been pointed out before. The question for us is: do we at Wiktionary need to take a point of view on the issue, or can we come up with a descriptive label that both satisfies our need for precise classification of words, while still remaining neutral? I believe the term Mandarin is sufficient for this purpose, since it is a known commodity in English (albeit not as well known as Chinese). However, if Mandarin is thought of as being too obscure, I have no objection to ==Chinese Mandarin==. This matches the ISO 639-3 code of cmn. Cantonese and Min Nan can still stay as is (in other words, no need to say ==Chinese Cantonese== or ==Chinese Min Nan==, which sounds awkward in English anyway).

A-cai 04:34, 29 September 2006 (UTC)

Please note that I was referring to our target audience, more so than myself. I will admit that I was under the impression that "Min Nan" had nothing to do with Chinese. A-cai, your proposed solution sounds elegant. But it does sidestep the thorny issue raised by Robert Ullmann that does need to be addressed at some point. Thank you for pointing out that it isn't America where the language name confusion originated, but China itself (Cantonese speakers calling a "different language" Chinese.) Don't forget, there are worse places than the US that speak English but are not likely to have any clue which Chinese language is which.
"Mandarin Chinese" sounds more natural to my ear, than "Chinese Mandarin." I would describe the off-and-on participation in these conversations (no, not just mine) as erratic. Perhaps we should put the issue to a WT:VOTE? There are several possibilities that I see. 1) The A-cai suggested headings. 2) The historic Wiktionary "Chinese" only for all languages in the Chinese language group, to keep a standard heading, 3) Language group prefix before each (e.g. Chinese Mandarin, Chinese Cantonese, Chinese Min Nan,) 4) Use the very-taboo templates with the language headings, so that people can choose to see just "Chinese" for all three or the name with "Chinese" or just the name (or even just the language code, if they really wanted.) Are we nearing a point where we should start voting and writing policies, or do we need a few more days for brainstorming? --Connel MacKenzie 08:56, 29 September 2006 (UTC)Reply
There is a lot of argument against #2, and for those logistical reasons it really doesn't stand a chance IMO. I doubt much need or support for #3, but I'm not opposed to it or to using ==Mandarin Chinese== for clarity, although really ==Mandarin== should suffice, as per #1. Can't comment on #4. Is anyone really pushing for it? DAVilla 12:37, 29 September 2006 (UTC)Reply

Personally I would rather cause temporary confusion to someone who knows nothing about it, than extreme irritation to anyone that has any familiarity with Chinese languages. They are (after all) the people that will be making most use of it. You might expect to see ‘Chinese’ used in some places and on some sites, but on a language website like this one it would be pretty pathetic. Widsith 08:39, 29 September 2006 (UTC)Reply

Widsith, Chinese speakers/readers (all flavors) are using one of the zh: Wiktionaries. The people reading this one are English speakers. So, to facilitate learning, you'd want them to recognize unfamiliar terms? And if they can't then shoo them away? Where should we send people, to go learn Chinese, before they are permitted to look up Chinese entries in en.wikt:? --Connel MacKenzie 09:02, 29 September 2006 (UTC)Reply

No, to facilitate learning I think we should use the correct terms. Lumping Mandarin and Cantonese together as ‘Chinese’ only facilitates ignorance. Also, it is not the case that you have to speak the language to have some understanding of it. Chinese speakers may use the zh:Wiktionary, but people learning or studying Chinese languages will use this one and they require decent treatment. People who don't study it, aren't learning it, and have no interest in it could hardly care either way. Widsith 10:11, 29 September 2006 (UTC)Reply

I created the entry for 草地 so that we can have a concrete example to look at. In particular, note the example sentences. It would be very cumbersome to combine the example sentences, because the sense meanings do not coincide in this case (see false friend). Even if the sense meanings did match, the wording in the Min Nan sentence would need to be completely reworked in order for it to make sense as a Mandarin sentence (I know this is not obvious by looking at it, you'll have to trust me on this point). This is why I ultimately turned away from a generic label of Chinese. As soon as we start to introduce Chinese dialects other than Mandarin into the equation, we run into serious problems.

A-cai 10:58, 29 September 2006 (UTC)

I think the comments earlier about places like Mexico miss the pragmatic point about the number of native "Chinese" speakers -- well over 1 billion, roughly the same size as Europe, or the Americas, or India. I don't remember hearing anyone arguing that Europeans speak European or that all people in the Americas speak American. Certainly, some people believe that Indians speak Indian, but we don't pander to them here -- we say Hindi, etc, etc. Similarly, we should speak of Mandarin, etc, etc.
I think most people in England would recognise that the word Mandarin relates to China (we know the word from history lessons, and also oranges and operettas). I think that most Londoners would also know that Mandarin was an important Chinese language (and the same is probably true in most cities with a significant Chinese population). I can't speak for other English-speaking countries. --Enginear 12:11, 29 September 2006 (UTC)
Hey, he admitted we speak "English" on this side of the pond!! ;-) DAVilla 13:04, 29 September 2006 (UTC)
No he didn't. He was talking about countries like Kenya, where we speak English ;-) Robert Ullmann 13:36, 29 September 2006 (UTC)
Resuming the color flamewar does nothing to lessen the impression that all Brits are pompous idiots. I don't see these underhanded insults as at all helpful or productive. --Connel MacKenzie 08:48, 30 September 2006 (UTC)
Underhanded insults? Did I miss something? —scs 11:10, 30 September 2006 (UTC)
Yeah, you missed the <invisible><font size="zero" color="bgcolor"><!-- ;-| --></font</invisible>DAVilla 18:33, 7 October 2006 (UTC)Reply
I'll note some of the politics here: The United Nations tends to equate "Chinese" with "Mandarin" because the PRC is a charter member, permanent member of the security council. The PRC pushes the political POV that there is one "Standard Written Chinese", which we would call Mandarin written in Simplified Characters. (I'm not knocking the PRC here; there isn't anything wrong with a political body pushing its political POV; that's what it is there for, right? ;-). Look at WT:AC, the first part could have been written by the PRC. They've been doing this for decades; that's why you learned about the Chinese language in school.
The ISO is part of the United Nations; representatives (rapporteurs) to the technical committees represent the national members of the UN. The reason we have just one 2-letter code in IS 639-1 (zh) is that the PRC insisted there be exactly one code, for SWC. This led to awful hacks like zh-tw for Mandarin in Traditional Characters. In IS 639-3 we now have a more reasonable set of codes for the major Chinese languages, and Mandarin is cmn.
One reason that I suggested Mandarin, Min Nan, Cantonese, etc. without identifying one or any as "Chinese", besides technical accuracy, is that it side-steps the political POV(s). Robert Ullmann 13:29, 29 September 2006 (UTC)Reply
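
(Illustrative aside: the per-language codes mentioned above lend themselves to a simple lookup from code to level-2 header. The Python sketch below is only an illustration. Only `cmn` and `nan` are cited in this discussion; the other codes are filled in from the ISO 639-3 code tables as an assumption and should be checked against the standard before anyone relies on them. The names `ISO_639_3` and `level2_header` are hypothetical.)

```python
# Language names as listed earlier in this thread, keyed to ISO 639-3 codes.
# "cmn" and "nan" are cited in the discussion; the rest are assumptions
# taken from the ISO 639-3 tables and should be verified before use.
ISO_639_3 = {
    "Cantonese": "yue",
    "Gan": "gan",
    "Hakka": "hak",
    "Jinyu": "cjy",
    "Mandarin": "cmn",
    "Min Bei": "mnp",
    "Min Dong": "cdo",
    "Min Nan": "nan",
    "Min Zhong": "czo",
    "Wu": "wuu",
    "Xiang": "hsn",
}

# Reverse lookup: code -> language name.
CODE_TO_NAME = {code: name for name, code in ISO_639_3.items()}


def level2_header(code):
    """Return the level-2 wikitext header for an ISO 639-3 code."""
    return "==" + CODE_TO_NAME[code] + "=="
```

Here `level2_header("cmn")` yields `==Mandarin==` and `level2_header("nan")` yields `==Min Nan==`, so the header never has to say which language is "the" Chinese one.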
I absolutely agree. There is no single language called "Chinese", and there should be no such header. Although the majority of "Chinese" people speak Mandarin, significant populations speak Cantonese and Min Nan. bd2412 T 18:09, 29 September 2006 (UTC)Reply
I agree with the use of headers like Chinese Mandarin and Chinese Cantonese as a solution for this particular case. I prefer putting the word Chinese first, so that the languages and their headers will be grouped together in alphabetical order. As others have noted, many English speakers are ignorant of the different varieties implied in "Chinese", so grouping them will alert users to this difference. As to the use of "mutually intelligible" / "mutually unintelligible" as a yardstick for distinguishing languages and dialects -- this doesn't work. The languages Macedonian and Russian are mutually intelligible according to a teacher of English I know (who is a native Russian, and realized she could understand the Macedonian that her students were speaking to each other). --EncycloPetey 18:38, 29 September 2006 (UTC)
I did a quick search on google. Your suggestion is not unprecedented. It appears that if we included the word Chinese along with the dialect name, we would have the following choices:
  1. ==Chinese, Mandarin== ==Chinese, Cantonese== ==Chinese, Min Nan==
  2. ==Chinese (Mandarin)== ==Chinese (Cantonese)== ==Chinese (Min Nan)==
  3. ==Chinese/Mandarin== ==Chinese/Cantonese== ==Chinese/Min Nan==
  4. ==Chinese-Mandarin== ==Chinese-Cantonese== ==Chinese-Min Nan==
  5. ==Mandarin Chinese== ==Cantonese Chinese== ==Min Nan Chinese==

The ethnologue report for nan lists it as Chinese, Min Nan. I think the "cleanest" is Chinese Min Nan. This reminds me of the saying by Confucius: 工欲善其事,必先利其器. Any opinions? (about the header, not the proverb ;) A-cai 22:13, 29 September 2006 (UTC)Reply

Not dialects. Min Nan should be Min Nan. We can have Chinese Min Nan the day we have European French and Indian Hindi. (Not to mention American English and English English.) This is why a lot of linguists avoid the word "dialect"; preferring "group" and "language" and "variation". Robert Ullmann 22:51, 29 September 2006 (UTC)Reply
That's what I've been doing, sharpening tools. Think about this: our objective is to have the entire written and spoken vocabulary, with regional variations, for every language in the Chinese group, and 7000+ others. Sounds like a huge job, but if a tiny percentage of the speakers of Min Nan, and Wu, and whatever, worked on it, it would be done in very little time. (But then, of course, not done at all, since the languages will continue to change.) Let us sharpen our tools and be precise; and if someone comes here and learns that "Chinese" is not a language, that is a good thing. Robert Ullmann 23:22, 29 September 2006 (UTC)
I can give my own sense of what can be accomplished by one person toiling away. I have not kept close track of exactly how many words I've entered into Wiktionary thus far, but if I were to venture a guess, it would be somewhere close to 1,000 individual phrases (not counting simplified/traditional duplicates) for Mandarin alone. Not bad for six months by a single contributor. I calculate that if I were to continue at this pace, within five years, we would have over 10,000 entries for Mandarin! My personal hope is that one day, we will attract the attention of language scholars capable of doing what I do and more. We don't seem to have a shortage of programming talent at Wiktionary, but true language experts have not yet arrived in droves (at least, not for Asian languages).
As to the first point, I am trying to not take a firm position on whether or not Chinese should be in the header. Since I am the one adding almost all of the Chinese entries at this point (and thus may be too close to the problem to be objective), it makes more sense for me to provide the community with relevant information, and then let others attempt to reach a consensus (if possible).

A-cai 00:35, 30 September 2006 (UTC)Reply

I've added 25 Chinese phrases. :) ...but then A-cai had to fix most of 'em. :( bd2412 T 03:45, 30 September 2006 (UTC)Reply
I have just found this discussion. I do not agree that the creeping removal of "Chinese" from articles is a good thing. I support the proposal that Mandarin, Min Nan, and Cantonese remain grouped together as before. Another possibility is to use the proposal "Chinese, Cantonese"; "Chinese, Min Nan," etc. The languages are closely related and should be grouped together. Badagnani 22:27, 7 October 2006 (UTC)
Badagnani, allow me to fill in the others on the backdrop of your comments. Badagnani disagreed with my edits to the words 表演 and 木耳. I will therefore use 表演 to refute Badagnani's claim that Cantonese is closely related to Mandarin. The Cantodict website lists the following example sentence in its entry for 表演:
  • English: I was totally speechless with surprise seeing his performance.
  • Cantonese: 表演O哂嘴![7]
The above Cantonese sentence makes absolutely no sense in Mandarin!!! Based on the English definition (and a little internet research), the closest Mandarin equivalent would be:


Granted, the hyperlinked characters are cognates, and mean the same thing in both Mandarin and Cantonese. However, the pronunciation is considerably different.
Badagnani, according to your babel template, you speak neither Cantonese nor Mandarin (which seems strange to me, given your strong opinions over issues related to these two languages ;-), so I won't ask you to provide example sentences to contradict the ones I've provided above. However, can you cite credible evidence from a reputable academic source that corroborates your claim that Mandarin and Cantonese are closely related and thus should be lumped together?
P.S. Before you do a bunch of research, please read: Political views on the Macedonian language. Now take a look at a typical Macedonian word on Wiktionary, such as календар. This should help you to put the issue into a broader context.

A-cai 00:12, 8 October 2006 (UTC)Reply

Course in Lexicography

Feel free to move this to a more appropriate place.

The following was posted by Adam Kilgarriff on the CORPORA list.

                        
LEXICOM-ASIA 2006
       A Workshop in Lexicography and Lexical Computing

Venue:     Kowloon, Hong Kong
Hosts:     Language Centre, Hong Kong Univ of Science and Technology
Dates:     December 11th-15th, 2006


Led by Adam Kilgarriff and Michael Rundell of the Lexicography MasterClass,
Lexicom is an intensive one-week workshop, with seminars on theoretical
issues alternating with practical sessions at the computer. There will be
some parallel 'lexicographic' and 'computational' sessions. Topics to be
covered include:

*          corpus creation
*          corpus analysis:
     o        software and corpus querying
     o        discovering word senses, recording contextual information
*          writing dictionary entries
*          dictionary databases and writing systems
*          using web data

Applications are invited from people with interests and experience in any of
these areas.  

Over the last six years Lexicom has attracted 200 participants from 28 countries including lexicographers, computational linguists, professors, research students, translators, terminologists, and editors, managers and technical support staff from dictionary publishers and information management companies.

The venue, HKUST, is beautifully situated on Clearwater Bay in Kowloon, only 30 minutes from central Hong Kong.

To register for Lexicom, go to: http://lc.ust.hk/~centre/lexi2006/ Early registration is advised (the Workshop has been oversubscribed in previous years), and registrations received before 7th October 2006 carry a discounted fee.

Further details, including reports of past events, can be found at: http://www.lexmasterclass.com.

Michael Rundell & Adam Kilgarriff The Lexicography MasterClass --BrettR 13:28, 28 September 2006 (UTC)Reply

Thank you BrettR. Is this going to have on-line participation as well? --Connel MacKenzie 13:44, 28 September 2006 (UTC)Reply
Afraid not.--BrettR 01:53, 14 October 2006 (UTC)Reply

Category:Psychology

Category:Psychology contains a manually maintained list of phobias and words ending in "-philia". Surely these should be in subcategories so that the lists are updated automatically? — Paul G 10:57, 29 September 2006 (UTC)Reply

Indeed, there is already a "Phobias" category - is there one for "philias" (there is no such word, by the way)? — Paul G 10:58, 29 September 2006 (UTC)Reply
Oooh, there is indeed such a word!
Bruce A. Arrigo, Catherine E Purcell, The Psychology of Lust Murder: Paraphilia, Sexual Killing, and Serial Homicide (2006) p. 15
  • He noted that "these philias have a sexual association attached to them".
Nils K Oeijord, Why Gould Was Wrong (2003) p. 68:
  • Phobias, philias, manias, perversities, and mental disorders are abnormal instincts. ... Phobias, philias, manias, perversities, and mental disorders teach us how normal instincts work (=how the mind works).
Raymond J Corsini, The Dictionary of Psychology (1999) p. 719:
  • [Defining "Philia"] The near-opposite of phobia, except that only a few phobias have a specifically sexual context whereas most philias (called paraphilias) are erotic attachments experienced almost exclusively by men, often termed fetishes.
David E. Young, Jean-Guy (EDT) Goulet, Being Changed by Cross-Cultural Encounters: The Anthropology of Extraordinary Experience (1994) p. 262:
  • Philias and phobias can also be included in this category of behavioral traits. The correspondence of the child's philias and phobias to those of the previous personality, or which could be explained on the basis of the previous personality's mode of death, can be assessed.
Rumack H. Rumack, David G. Spoerke, Barry H. Rumack, Handbook of Mushroom Poisoning: Diagnosis and Treatment (1994) p. 11:
  • Wasson traces the movements of certain groups and identifies pockets of philias and phobias.
Gaston Bachelard, Psychoanalysis of Fire (1987) p. 6:
  • Everyone must destroy even more carefully than his phobias, his “philias,” his complacent acceptance of first intuitions.
Cheers! bd2412 T 04:04, 1 October 2006 (UTC)Reply
Thanks for the quotations. This looks like a new coinage, and would be a back-formation from words ending in -philia. Good work. The plural is clearly "philias" (and not "philiae", as the category page gives - this is a Latin plural and "philia" derives from Greek). — Paul G 09:30, 10 October 2006 (UTC)

List of protologisms - too long

Wiktionary:List of protologisms is up to 182 Kbytes. And it seems to be one of the most active pages. There is currently a proposal on the talk page about separating out the large number of number definitions to a separate page. (First question: what would we call it? If we can determine that we want to do it, I'll do the splitting.) Are there any other things we want to do to try to keep the size manageable? Or do we just accept it is going to be big and be grateful it isn't scattered across the main article space? RJFJR 13:59, 29 September 2006 (UTC)

  • Something like Wiktionary:Requested_articles:English/DictList would probably be a good way to go for splitting. If any particular letter's list of entries got too big, it would be subdivided on that page. As far as reviewing for later inclusion, the problem is that there is no notation of date added by the entry, so it becomes an all or none thing (or at least an "all until whoever starts it gets tired thing") :-) --Jeffqyzt 16:47, 29 September 2006 (UTC)Reply
If you want to chunk it, just do it by letter of the alphabet. Wiktionary:List of protologisms/A ... and let it grow. Robert Ullmann 16:54, 29 September 2006 (UTC)Reply

I just split off all the numbers from the list of protologisms to a subpage. The subpage is 63Kbytes long, so it represents about 30% of the protologism list. RJFJR 16:56, 29 September 2006 (UTC)Reply

UNTIL it is split, the list can be filtered for entries over a year old by clicking on the appropriate date in the History (it's about 1200 revisions ago!). About 250 are older than that, out of about 1000 total. If anyone can be bothered, those entries can then be checked to see if they are citeable. Once the list is split, then the same will be possible in a year's time for new entries.
I therefore suggest that, before the list is split, the pre-Oct 05 entries are tagged with a "pre-Oct 05" category, and perhaps the later ones tagged with "pre-Jan 06", "pre-Apr 06", "pre-Jul 06", and "pre-Oct 06". This will enable anyone interested to check for cites more efficiently. I would do this categorising myself, but I don't have the knowledge to automate it.
Apart from moving to the main dictionary any words which now satisfy CFI, we could perhaps have a rule that any protologism which does not have at least one (or two) fully independent cites within two years is deleted (or perhaps moved to a list of failed protologisms, which some might consider an intriguing historical record in itself). --Enginear 18:41, 29 September 2006 (UTC)Reply
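For anyone who wants to automate the quarterly tagging suggested above, here is a minimal sketch, not a working bot. It assumes the date each entry first appeared has already been recovered from the page history by some other means, and it assumes each cutoff falls on the first day of the named month; the function name `cutoff_tag` is hypothetical.

```python
from datetime import date

# Cutoff tags proposed in the discussion, oldest first.
# Assumption: each cutoff is the first day of the named month.
CUTOFFS = [
    (date(2005, 10, 1), "pre-Oct 05"),
    (date(2006, 1, 1), "pre-Jan 06"),
    (date(2006, 4, 1), "pre-Apr 06"),
    (date(2006, 7, 1), "pre-Jul 06"),
    (date(2006, 10, 1), "pre-Oct 06"),
]


def cutoff_tag(first_seen):
    """Return the earliest tag whose cutoff lies after first_seen, or None."""
    for cutoff, tag in CUTOFFS:
        if first_seen < cutoff:
            return tag
    return None  # added on or after the last cutoff: no tag yet


if __name__ == "__main__":
    # Hypothetical entries paired with the date they first appeared in the list.
    for name, seen in [("entry-a", date(2005, 3, 14)), ("entry-b", date(2006, 2, 1))]:
        print(name, cutoff_tag(seen))
```

Each entry gets only its earliest applicable tag, so the categories do not overlap and the oldest batch can be checked for cites first.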
Um, this is a much better idea than alphabetical, as I suggested above. We could just start a new list periodically. And not remove anything; if they become blue links, fine. We don't need to move them or anything. Once a year would be good, and I think sufficient. LOP/2005, LOP/2006 etc. Robert Ullmann 22:34, 29 September 2006 (UTC)Reply
We'll have to sweep them all occasionally to avoid repeats. bd2412 T 04:05, 30 September 2006 (UTC)Reply

Noun or Noun Form?

Last I noted there was still dispute on whether the POS heading for things like plurals should be Noun or Noun form. (Similarly for verb/verb form). Has consensus been reached? RJFJR 13:27, 30 September 2006 (UTC)

See WT:POS and the talk page. We seem to be pretty much there; the current revision of the draft policy seems to be acceptable. Short summary is that there are/were people on both sides, but all of the really strong feelings came down against "X form". In any case a plural in a language that doesn't decline nouns other than plurals should not use "Noun form". But please go see, and comment there. Robert Ullmann 13:55, 30 September 2006 (UTC)Reply

help with crossword

i am looking for an answer to a crossword. the clue is "it ends with chalypsography in the Oxford English Dictionary". the answer has 9 letters and i believe the 2nd letter is O and the 5th letter is M. any help would be greatly appreciated. i could not find the word "chalypsography" in the online Oxford. david

VOLUMETWO (BBC-chalypsography) Robert Ullmann 14:49, 30 September 2006 (UTC)Reply

example sentences from wiki?

In the last day or two, it occurred to me that I could be using wiki much more for example sentences than I have in the past (see: 物换星移). I started doing this with Min Nan, because, as it turns out, Min Nan wikipedia is now one of the largest repositories of written Min Nan on the internet (Min Nan is not usually written down)! Has a policy been formalized on this? As I see it, we have many choices for example sentences (in no particular order):

  1. make something up on the fly (I personally don't like this choice)
  2. cite a printed source that is not available on the internet (books).
  3. cite a printed source that is available on the internet (preferably from wikisource)
  4. cite an internet resource (on-line magazines, chat rooms etc)
  5. cite text from a non-English wikipedia
  6. cite something from a movie or tv show (I have not yet done this, but I am thinking of doing this more often for Min Nan, which is rarely written down. There are a number of Min Nan language tv shows and movies now on dvd that I could use as material for example sentences)

I know that some of this has been outlined in WT:ELE. Two questions: how does everyone feel about using non-English wikipedia articles as a source for example sentences? The advantage would be: no worries about copyright issues, or it disappearing from the web ... that is unless wiki disappears. Another advantage is that I can use the version from English wikipedia as a translation (if it is close enough to the original, which is not always the case). Also, should there be some kind of pecking order for example sentences? In other words, something from wikisource is the most desired, followed by printed source that is available elsewhere on the internet, followed by ... ?

P.S. The current guidelines are at Wiktionary:Quotations#How to choose a quotation. As you may surmise from above, I think it should be more detailed. A-cai 23:41, 30 September 2006 (UTC)Reply

Actually, it is good to have both published quotations and sentences made up for purposes of the page. The published quotations provide documentary evidence of the word used. It is therefore good to have such quotations from a variety of dates for each sense of the word, and it is good to quote from literature, journals, major newspapers, or other sources likely to be widely available or at least reliably archived in major libraries. However, it is also very good to have sentence examples made up for Wiktionary on the page. It is then possible to craft a sentence to demonstrate a particular sense of the word more carefully and in simple examples. These are usually more useful for people learning the word (or the language!). --EncycloPetey 00:38, 1 October 2006 (UTC)
Indeed, for the example sentence the only criterion is that it represents the usage clearly. For the /Citations pages we need durably archived resources. - TheDaveRoss 00:40, 1 October 2006 (UTC)

I added an example sentence to a-soaⁿ to demonstrate what I think a quote from a tv series might look like. Any opinions about the format? A-cai 01:19, 1 October 2006 (UTC)Reply

I'm not sure why you chose the Wikipedia-footnote style, there, where you did. In general, we don't use footnotes the same way Wikipedia does; I haven't found a "good" use for ref/references yet, on Wiktionary. Is your indentation-level significant? It seems fine (other than the ref weirdness) at first glance. --Connel MacKenzie 05:02, 1 October 2006 (UTC)Reply

The indentation-levels are intended to follow the format in Wiktionary:Quotations#Between_the_definitions. In this case, I have provided the original Min Nan text in three different scripts (which are all at the same indentation). The Mandarin subtitles come from the dvd, and can be thought of as a translation (into Mandarin). This is why I put it at the same indentation as the English translation (I included it to make it easier for native Mandarin speakers who may wish to find the scene on the dvd, which lacks English subtitles). The ref/references tags are the best way that I have found, so far, to make these kinds of notes when doing translations. I find it particularly useful when translating classical texts in the etymology sections (see the etymology section of 金屋藏娇). If you can think of a better way to present this kind of information, I am open to suggestions. My goal is to provide enough information for a student of the language to comprehend the original, while still maintaining readability for an English speaking readership. Sometimes, a notes section seems to be the best approach. A-cai 06:26, 1 October 2006 (UTC)Reply

I think it's awesome, as usual. A-cai's entries are consistently excellent. Using example sentences from Min-Nan wikipedia is definitely a good idea. However I still believe the most important citation is one from a referenced, printed source (although that is perhaps more important for English entries than for foreign language ones). Widsith 07:37, 1 October 2006 (UTC)Reply
I don't see any problem with making up example sentences. A poor example will be altered or replaced, and a good example will pass the test of time. The best way to come up with example sentences might be to Google the term and then mix some of the clearer or contextually more easily extracted hits. I've found this to be a great way to come up with ideas, and since you're paraphrasing it's completely legit. It's not that far off from what you're proposing, either, surprisingly enough, just a little less formal. DAVilla 13:39, 1 October 2006 (UTC)Reply

October

User:Ncik vandalism (Again)

Based on his own POV, Ncik has been removing category items.

I fully expect User:Eclecticology to again support Ncik. I have issued a 15 minute block so he will at least stop for now.

Unlike his similar antics last year, this time he actually replied on a discussion page before engaging in vandalism. However, the lack of discussion is alarming.

Even if some basis for Ncik's arguments can be found, the removal of a category from numerous items (as opposed to renaming the category to the liking of Ncik's prescriptivist, regional POV) is the worst kind of vandalism en.wiktionary.org enjoys. The various goatse vandals lack the subtlety of Ncik's vandalism, as en.wiktionary.org still has not recovered from his last round of similar vandalism (while on the other hand, image vandalism is usually reverted long before anyone views an image even once.)

All sysop assistance is appreciated. The unilateral decision by Ncik to vandalize the category is unacceptable. The lack of discussion on such an obviously controversial issue is alarming. To have acted in opposition to the initial feedback is inexplicable.

--Connel MacKenzie 13:28, 1 October 2006 (UTC)Reply

I request to desysop Connel. His banning me without even attempting to discuss the issue is not acceptable. I was merely removing items from a category which, according to long standing rules on Category:English nouns with irregular plurals, didn't belong there. Ncik 13:40, 1 October 2006 (UTC)
You want to desysop Connel because he blocked you for vandalism? I don't think so. The topic has been discussed here: http://en.wiktionary.org/wiki/Category:English_irregular_plurals. Where are these rules to which you refer?
These rules:
This category lists English nouns whose plural is formed irregularly with respect to spelling: It includes the singulars of all English nouns except those that:
  • are symbols or letters and form their plural by adding -s or -’s
  • are not proper nouns, end in a consonant + y, and form their plural by removing the -y and adding -ies
  • end in a sibilant (one of [s], [ʃ], [z], [ʒ]) and form their plural by adding -es
  • are not subject to one of the above rules and form their plural by adding -s
And I know that the topic was discussed at Category talk:English nouns with irregular plurals (I moved it to its original place) because I was involved in these discussions as you can see. Ncik 14:12, 1 October 2006 (UTC)Reply
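The four rules quoted above amount to a small decision procedure, and a rough sketch may make it easier to see which nouns they exclude from the category. This is an illustration under stated assumptions, not the category's actual definition: the rules speak of sibilant sounds, while the code below only approximates them by spelling, so a word like "stomach" (spelled -ch, pronounced with [k]) is misjudged. The function name and the spelling list are hypothetical.

```python
VOWELS = set("aeiou")

# Assumption: approximate "ends in a sibilant" by common spellings.
# The quoted rule is about sounds, so this is only a rough stand-in.
SIBILANT_SPELLINGS = ("s", "sh", "ch", "x", "z")


def is_regular_plural(singular, plural, proper_noun=False):
    """Return True if the pair follows one of the four quoted rules."""
    # Rule 1: symbols or single letters take -s or -'s.
    if len(singular) == 1:
        return plural in (singular + "s", singular + "'s", singular + "\u2019s")
    # Rule 2: non-proper nouns in consonant + y take -ies.
    if (not proper_noun and singular.endswith("y")
            and singular[-2].isalpha() and singular[-2].lower() not in VOWELS):
        return plural == singular[:-1] + "ies"
    # Rule 3: nouns ending in a sibilant take -es.
    if singular.lower().endswith(SIBILANT_SPELLINGS):
        return plural == singular + "es"
    # Rule 4: everything else takes plain -s.
    return plural == singular + "s"


if __name__ == "__main__":
    for pair in [("cat", "cats"), ("berry", "berries"), ("box", "boxes"),
                 ("child", "children")]:
        print(pair, is_regular_plural(*pair))
```

Under this reading, a noun belongs in the category exactly when the function returns False for its singular and plural.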

(Ncik, I might be inclined to agree, but your request to desysop Connel discredits everything you say ...)

If the argument has merit, the proper action would be to recat the entries to Category:English plurals or Category:English plurals ending in "-es" (or whatever). Uncategorizing entries is vandalism that requires someone manually fix them all later (though in this case we have a convenient list). Robert Ullmann 15:18, 1 October 2006 (UTC)Reply

I don't know what the merit of a category Category:English plurals ending in "-es" would be, but I don't object to having it. Uncategorising wrongly categorised items is hardly vandalism! Also, adding a new category is faster than renaming one (simply use CTRL+v). You should consider a more neutral (or critical) attitude towards Connel. You haven't been around for long, so you probably won't know that this is not the first desysop request brought forward against him. Ncik 16:29, 1 October 2006 (UTC)
Moot point - it's done. The word "irregular" is removed from the offending category names - I think we can all agree that this is an acceptable solution. Cheers! bd2412 T 16:48, 1 October 2006 (UTC)
Thank you BD2412. --Connel MacKenzie 18:58, 1 October 2006 (UTC)Reply
Ncik, I obviously hold the opinion that you are the worst type of vandal Wiktionary encounters; that is true. But a de-sysop request for a 15 minute ban to stop active vandalism in progress, from a 'contributor' who already ignored the conversation? Indeed. --Connel MacKenzie 18:58, 1 October 2006 (UTC)Reply
Ncik, it is irresponsible and rude to continue with an action you have been asked to stop, until some discussion has taken place. I wouldn't have characterized it as vandalism, but it certainly isn't a way to make friends. Whether or not the blocking was warranted, when another user asks you to stop and discuss the changes you are making you ought to do so; that is the spirit of community that we are aiming for. There are not many ways to deal with someone who refuses to cooperate and who refuses to discuss things, so you can resolve that by not being one of those people. - TheDaveRoss 19:17, 1 October 2006 (UTC)
This bias is outrageous. When and who by was I asked to stop removing those items from the category? By nobody until after Connel blocked me. A quick post on my talk page is sufficient to immediately draw my attention to any objection. Blocking a user to achieve this is clearly a gross abuse of admin powers. Ncik 22:47, 1 October 2006 (UTC)Reply
  • Lastly, with regard to the vandalism of this page, I can think of no better endorsement of today's 15 minute block, than Primetime stepping up in defense. Perhaps he feels his vandal status is threatened or otherwise outclassed?
    Primetime's vandal status is completely irrelevant to Ncik's actions and yours. Arguments can only add weight, if they do anything. "Because he's a vandal, he can be ignored" is about the best truth that can be drawn. DAVilla 04:26, 7 October 2006 (UTC)
  • For those of you confused by the vandalism, Ncik and I have disputed (for over a year) the classification of various English irregulars. The inexplicable support he got for his conduct at that time caused much bad blood all around. In a poorly thought out compromise, I have not pressed this issue, either way. This certainly is not a new topic; the "obvious" recategorizing does not, in fact, satisfy all concerns. But that can wait for another day, with ample discussion beforehand. --Connel MacKenzie 22:43, 1 October 2006 (UTC)Reply

There Are No Heroes Here

Jesus Christ, what a mess. What a stupid, puerile, unnecessary little mess.

I can't begin to disentangle the claims and counterclaims surrounding the obviously broken "Category:English irregular plurals ending in '-es'", but there is nothing about that minor little category that's so important that it's worth bans and invective and bad blood and bitter recriminations at this level. Guys: Wiktionary is just a dictionary. That category is just a category. Get a grip, will you?

If Ncik waltzed in after a long absence and started making changes against long-ago consensus (which is what Connel seems to be alleging, although I can't see it) that's wrong.

If Connel blocked someone without warning, that's wrong too. And calling it "vandalism", or comparing it to goatse, is an overreaction hyperbolic enough that I hear Mike Godwin waiting just around the corner.

Here's how crazy and nonsensical this all is: someone claiming to be Primetime popped in to point out (rather accurately) how crazy and nonsensical this all is. But of course you can't see it now, because Primetime is PNG here, and gets reverted no matter what. (But check the page history if you're curious.)

There are quite a few things which the principals in this pathetic little feud seem not to realize. They don't realize how relatively unimportant the issue at hand is, and how badly they're overreacting to it. They don't realize how childish they're being. Most importantly, they don't realize how damaging their incessant truculence and vituperation is to the project. I don't know about anybody else, but I can not deal with this nonsense. If that's the way you want to run things, fine, but it proves the place to be a looney bin that sane people will prefer to stay far away from. I've got better things to do than try to contribute to, or even make sense out of, a project where people's priorities are so bizarrely unbalanced that overblown tempests like this one can regularly erupt out of the tiniest of teapots.

I'll come back in a few days and see if you people have calmed down at all. Do try to.

scs 00:29, 2 October 2006 (UTC)Reply

Noddy Suits

How does Wiktionary handle non-vulgar slang words and phrases? For example, I was looking for a definition of "Noddy Suit". It turns out it's a very common slang word within the British armed forces for a [NBC suit]. Should this have its own page, or is Wiktionary strictly for words similar to those I'd find in the Oxford dictionary? Renski 15:58, 3 October 2006 (UTC)

As long as it is attested, usually with print citations, and not just a neologism or protologism, it deserves its own entry. Google books has at least one citation; you should be able to find a couple of good web citations as well. Also note that "vulgar" has different senses. Just put (British military slang) on the definition line. Sounds fine to me. Robert Ullmann 16:38, 3 October 2006 (UTC)
The Oxford dictionary does include slang terms. --Ptcamn 00:47, 4 October 2006 (UTC)Reply

Alphabet pages

It looks like Devanagari alphabet was redirected to Devanagari back in April. User:Taxman explained (in the page history) that it's not actually an alphabet. This got me to thinking: shouldn't every "X alphabet" page in the main namespace redirect elsewhere? After all, a phrase like Greek alphabet, for example, is not idiomatic: it is easily understood as meaning Greek + alphabet. Hence it shouldn't be an article in the main namespace. The logical place for such information is the Appendix: namespace, which already contains several "X script" pages. Unless anyone objects, I'll start to make such changes "soon". - dcljr 22:47, 4 October 2006 (UTC)Reply

I very, very strongly object. Idiomacy is not the only criterion. --Connel MacKenzie 11:03, 5 October 2006 (UTC)
Absolutely not. These are set phrases that belong in the main namespace. And Taxman is correct, Devanagari is not actually an alphabet, but a syllabary. Most of the scripts that developed out of Phoenician to the west became true alphabets, while most scripts that moved eastward became syllabaries. Some people now claim that Thai has become a true alphabet, but this is debatable. Korean is an alphabet, but it did not evolve from Phoenician. Many of the North American Indian languages are written in syllabaries (e.g., Cherokee and Ojibwe). —Stephen 11:27, 5 October 2006 (UTC)Reply
Oh, I get it! It's not an alphabet unless it doesn't make any cens. Stuff that makes sense iz inferior. :-P DAVilla 13:43, 5 October 2006 (UTC)Reply
I don’t understand what you are trying to say. Syllabaries are not inferior to alphabets, they are simply different. For some languages, syllabaries make much better sense, and other languages are more suited to alphabets. —Stephen 15:36, 5 October 2006 (UTC)Reply
Google print shows a number of usages of "Devanagari alphabet" even in modern works that know better, so it's clear the term is used inaccurately to get the point across to people that don't know the difference between an abugida and an alphabet. So I created the definition as such. The problem is it either obscures the more detailed information in Devanagari or must duplicate it. - Taxman 15:19, 5 October 2006 (UTC)
I think your point of including the previous contents in the Appendix namespace was overlooked, and I quite agree with it. As for the page itself, the title in the main dictionary space needs a definition even if the phrase is misconstructed, as you and Stephen claim, assuming it can be attested as such. DAVilla 14:41, 5 October 2006 (UTC)Reply
Oh my. I agree with DAVilla; the addition of Appendix:___ alphabet would be useful. I did seem to completely misunderstand the original question, then. --Connel MacKenzie 01:51, 6 October 2006 (UTC)

Sorry, I tried to make my comment more concise at the expense of clarity, apparently. Originally I had pointed out that Taxman's edit summary was quite right and that I wasn't objecting to his redirect at all. But then I removed the remark, thinking it wasn't necessary. I was, OTOH, questioning whether Greek alphabet and similar entries should be in the main namespace. I disagree with Stephen: "Greek alphabet" is not a "set phrase". It's a regular noun phrase referring to the alphabet used to write the Greek language. This is no "Good morning" or "Long time no see". If Greek language doesn't warrant its own entry, I don't see why Greek alphabet should. <time passes> Uh, oh. I see that Greek language does have an entry, even though French language and English language do not (they're redirects, just like I was suggesting Greek alphabet should be). Connel, am I to understand you're saying Appendix:Greek alphabet is a good idea, but we should keep Greek alphabet as an article? That seems unnecessarily redundant to me... - dcljr 23:32, 17 October 2006 (UTC)Reply

WT:BLOCK

It has recently come up, via 15MinuteBlockGate(tm), that we do not have any Official policy about who to block, for what, and for how long...infact, we have no Official policies at all. No where does it say that Connel was wrong for blocking Ncik. My thoughts on what policy is, and should be, as it pertains to WMF communities such as ours, are that loose guidelines are a Good Thing(tm) and should be written down, so as to avoid incidents of offense and allow everyone to work from the same page. Currently we have some 40 policies in various stages of completion and acceptance, from rejected policies, to drafts, think tanks and semi-official ones. There is a hypothetical "Official" status of which we have none, but I think there are plenty of unwritten or unconfirmed policies which could be formalized and written down, yet would affect the day-to-day workings of Wiktionary very little, as they are practices we already adhere to. The purpose of writting them down and stamping them with our approval would be to let new people know what the long established guidelines are, and to have something to point at if we feel someone is being out of line. I propose that we start working on these policies, and promoting the ones we support, and rejecting the ones we don't, until we have something which resembles actual Wiktionary policy. Starting with WT:BLOCK, as the most recent policy it would have been handy to have.

I wrote down some duration guidelines in there; they can and should be amended to reflect a broad consensus and to be more or less specific, so as to reflect community opinion. I also changed some parts so they no longer referred to vandals (per WP:DENY). Have a go, discuss it here or there, and once we have something which most people agree with we will upgrade it to OFFICIAL and have a party...then move on to the 39 others. - TheDaveRoss 20:05, 6 October 2006 (UTC)

My initial take on WT:BLOCK is that it is complex and very technical. Much of it is completely beyond my ability to understand, especially the abbreviations. —Stephen 02:35, 7 October 2006 (UTC)Reply
The only portions which are technical are the ones which pertain to IP (internet protocol) addresses, which all admins should have a basic handle on if they are going to block an IP address. I would say that the parts which seem too technical wouldn't apply to people who find them so. No one has to do ARIN lookups nor rangeblocks, and those of us who are comfortable with those features do understand what those things mean. The people who are most likely to do much work with such things are checkusers, who all know what is going on there. As for the non-technical side...please help us clean it and make it clear, complexity is the opposite of our goal, we want it to be as user friendly and reflective of community opinion as possible. If there are specific parts which you think need a lot of work, please indicate them and we will work on them if you would rather not. - TheDaveRoss
I have reformatted the page a bit, maybe some things are a little clearer. - TheDaveRoss 07:18, 7 October 2006 (UTC)Reply
Good, it's not an acronym anymore! Bullies lacking ostensible criterial knowledge... to RfV anyone? :-P DAVilla 14:59, 7 October 2006 (UTC)Reply
I half-jokingly started adding a "glossary" line to the range-block section. Much to my dismay, 10 out of 10 terms were redlinks. I'll add them shortly. --Connel MacKenzie 17:11, 7 October 2006 (UTC)Reply
I started a Wiktionary:Range blocks page to help with the techie side of adminship. - TheDaveRoss

Werdnabot

I'm enabling Werdnabot (archiving the grease pit and any discussion page that has it set up) on Wiktionary. Dvortygirl has temporarily flagged the bot to bypass the MediaWiki antispam captchas. It will be deflagged in four days, and I'll wait for consensus to flag it. If you want it to archive a discussion page, get consensus on that page if it's not your user talk page, and add {{User:Werdnabot/Archiver/Linkhere}} <!--Werdnabot-archive Age-x Target-y-->, where x is the maximum age in days that a dormant thread will stay on the page, and y is where to place the archived threads. If you want a section index, add <!--Werdnabot-index z-->, where z is where you want the section index placed. This bot is approved on en.wikipedia and has run without error there for a number of months. All yelling to my talk page. Werdna 05:12, 7 October 2006 (UTC)

Wow, superuseful! Thanks! DAVilla 18:38, 7 October 2006 (UTC)Reply
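
For anyone setting this up, here is a minimal sketch of how the archive directive described above could be read out of a page. It is only an illustration: the function name, the regular expression, and the assumptions that Age is a whole number of days and that Target runs up to the closing "-->" are mine, not a description of how Werdnabot itself parses pages.

```python
import re
from typing import NamedTuple, Optional


class ArchiveDirective(NamedTuple):
    age_days: int  # maximum age, in days, of a dormant thread
    target: str    # page that receives the archived threads


# Hypothetical pattern (an assumption, not Werdnabot's own code):
# Age is a whole number of days and Target runs up to the closing "-->".
_DIRECTIVE = re.compile(
    r"<!--\s*Werdnabot-archive\s+Age-(\d+)\s+Target-(.+?)\s*-->"
)


def parse_archive_directive(wikitext: str) -> Optional[ArchiveDirective]:
    """Return the first archive directive found in wikitext, or None."""
    match = _DIRECTIVE.search(wikitext)
    if match is None:
        return None
    return ArchiveDirective(int(match.group(1)), match.group(2))
```

Calling it on a page that contains a directive with Age-14 and a target of "User talk:Example/Archive 1" (a made-up page name) would give back those two values; a page without the comment gives None.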

flavor/flavour of English definitions (奶嘴)

I'm sure others have encountered this kind of thing if they have dealt with Wiktionary long enough. I entered the word 奶嘴 into Wiktionary just now. As you will note from clicking on the link, it actually goes by a different name depending on what part of the English speaking world you come from. In this case, I provided a picture in order to quickly convey the meaning. What would be the proper way to annotate this? Does it need to be annotated? For some reason, it seems as though I should put something like:

  1. (US) pacifier; (UK and Australia) dummy; (Canada and Ireland) soother

However, according to WT:ELE rules, the stuff in parentheses would imply the specific dialect or accent of the defined word (not the defining word or words). Am I obsessing too much over this or is there a proper approach for such an entry? A-cai 03:35, 9 October 2006 (UTC)Reply

I'm pretty sure we just list them all, then on the target pages, explain what locales they are specific to. --Connel MacKenzie 06:39, 9 October 2006 (UTC)Reply
The example you gave seems very reasonable to me, whatever ELE says. Widsith 07:57, 9 October 2006 (UTC)Reply
But what it means is that the US usage of 奶嘴 means pacifier, and the UK usage of 奶嘴 means dummy ... I had thought (while editing something, I think widdershins ;-) that this case should be (e.g.)
  1. pacifier (US); dummy (UK and Australia); soother (Canada and Ireland)
But that isn't really that distinctive. Connel's idea is probably best. Robert Ullmann 16:46, 9 October 2006 (UTC)Reply
Hm, I don't see why the "and"s are not italicised... what is wrong with UK and Australia or even just UK, Australia? — Paul G 09:24, 10 October 2006 (UTC)Reply

IPA

At the moment, {{IPA}} creates an automatic link to w:IPA chart for English. But this template is used for all languages here. Maybe it should link to something a bit more generic, like w:IPA. Widsith 08:57, 10 October 2006 (UTC)Reply

Better yet, change the link depending on the language, where unknown or unspecified languages go to the most general page. Adding a language parameter to existing entries is something a bot could do. DAVilla 08:55, 11 October 2006 (UTC)Reply
Actually, I have been meaning to raise this issue. I think it would be really cool if the template gave readers (who don't necessarily know IPA symbols) an intuitive way to understand the sound represented by each IPA symbol. One way would be something like: [an IPA transcription in which each symbol is a separate wikilink], then make sure each symbol has an entry that contains enough information (i.e. sound files, rhyming words etc) for a reader to equate the correct sound to the symbol (the entry for one such symbol isn't too bad). Can the IPA template be modified to do this (without the user having to hyperlink each and every letter)? Is there another way that the same result could be obtained? A-cai 12:07, 11 October 2006 (UTC)
I can think of a way of doing it in Javascript, but not wikitemplates. To me, this seems like a marvelously useful idea. Actually, if we had the StringFunctions extension, this could be done in wikitemplates. --Connel MacKenzie 18:04, 11 October 2006 (UTC)Reply
I also think we should have local copies of IPA and SAMPA charts, as complete as possible. I also think we should have subpages of the chart with information about each phonetic, but that may just be me. - TheDaveRoss 18:32, 11 October 2006 (UTC)Reply
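
To make the per-symbol linking idea in this thread concrete, here is a rough sketch of the string handling involved, written in Python purely for illustration. It is not how {{IPA}} works, and it is neither the Javascript nor the StringFunctions approach mentioned above; the function name and the choice to attach combining diacritics to the preceding symbol are my own assumptions.

```python
import unicodedata


def link_ipa_symbols(ipa: str) -> str:
    """Wrap each symbol of an IPA string in its own wikilink.

    Assumption: a combining diacritic belongs to the symbol before it,
    so it is folded into that symbol's link; whitespace is left unlinked.
    """
    pieces = []
    for ch in ipa:
        if unicodedata.combining(ch) and pieces and pieces[-1].startswith("[["):
            # Fold the diacritic into the previous link: [[a]] -> [[ã]]
            pieces[-1] = pieces[-1][:-2] + ch + "]]"
        elif ch.isspace():
            pieces.append(ch)
        else:
            pieces.append(f"[[{ch}]]")
    return "".join(pieces)
```

With this, an editor would type the transcription once and the linked form would be generated, so nobody has to hyperlink each letter by hand. Multi-character symbols such as affricates written with a tie bar would need extra handling that this sketch does not attempt.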

Template en-noun for uncountable nouns

Template en-noun currently displays a hyphen where the plural would be for uncountable nouns. To me, this makes it look like there is something missing or something that could not be displayed by the browser. In fact, there is something missing, namely any indication that the noun is uncountable. I understand that the hyphen is meant to mean "no plural" but that is not quite the same as uncountable, which means "not used in the plural".

Could someone remedy this please, so that "uncountable" is displayed? I would if I knew how to do this.

Thanks. — Paul G 09:23, 10 October 2006 (UTC)Reply

Several of the templates appear to be broken at the moment. See discussion in the Grease Pit. --Jeffqyzt 13:49, 10 October 2006 (UTC)
Thanks. I've just noticed the same with en-verb. — Paul G 15:09, 10 October 2006 (UTC)Reply
I think it is a laudable victory for open communication, that this was resolved as quickly as it was, yesterday. --Connel MacKenzie 17:49, 11 October 2006 (UTC)Reply

Checkuser

Do we have a "probable cause" clause for Checkuser? DAVilla 15:16, 10 October 2006 (UTC)Reply

I think in meta yes, it depends on the situation. Do you have a few more specifics or is it the usual WF socks... -- Tawker 17:56, 10 October 2006 (UTC)Reply
More of a general "rights" issue than anything specific, prompted by WF as far as false accusations go, and no reason that I could immediately find to even raise the question. DAVilla 18:13, 10 October 2006 (UTC)Reply
I don't really understand why all of WF's sockpuppets are immediately blocked, since most of the time they are just making perfectly reasonable contributions to the site. Widsith 07:52, 11 October 2006 (UTC)Reply
Probably something to do with using new socks to evade indefinite blocks on his other IDs; that is still against the rules, right? --Versageek 08:33, 11 October 2006 (UTC)
You are probably right, but I don't feel very comfortable about it. The vast majority of his edits were, and are, constructive. Widsith 09:03, 11 October 2006 (UTC)Reply
Do you have a better solution to propose? His MO has been to submit legitimate edits long enough to gain trust (i.e. sysop) then to wreak havoc. Yes, of course he is trying the same thing again; what gain can we possibly get from encouraging that now? --Connel MacKenzie 17:45, 11 October 2006 (UTC)Reply
Fair enough. Widsith 20:09, 11 October 2006 (UTC)Reply
Do accounts get any special privileges after so many edits, like 200 or something? Most of the pages that are protected I think are editable by sysops only, but if there are large areas of semi-protected pages, or for the case of casting a vote or the like, I could see enforcing this to be necessary even without probable cause. DAVilla 08:51, 11 October 2006 (UTC)Reply
Could you restate your question more specifically, please? The only edit-count based privilege I know of, is board voting. --Connel MacKenzie 17:45, 11 October 2006 (UTC)Reply
The thing is, every time you block WF, you let him know you're onto him. Right now he probably knows how to evade all the tricks we have against vandalism, and so we're going to need new ones sooner than we should. If you waited until he started making questionable contributions to check him, or until he had 200 edits or something, then the learning curve would be stretched out that much more. DAVilla 21:46, 11 October 2006 (UTC)Reply
Our tricks aren't supposed to be hidden, the idea is to block all vandals, block all anonymous means of editing, then have cake. - TheDaveRoss 21:50, 11 October 2006 (UTC)Reply
For no reason should Wonderfool, the person, not the account, ever be allowed to edit again. He has shown twice that his intent is not good, regardless of the number of quality edits he makes in the mean time. - TheDaveRoss 18:29, 11 October 2006 (UTC)Reply

The <tt> Function is for what?

Why do you use the <tt> in the pronunciation? Why do you include haUs? My userpage is on wikipedia, user:100110100. Please reply there. Thanks. 198.166.59.152 09:02, 11 October 2006 (UTC)

<tt> changes the code to monospace. The pronunciation in {{IPA}} uses font code that's much more complicated. In addition to that template, the pronunciation for house should probably use the {{SAMPA}} template rather than <tt>. DAVilla 09:14, 11 October 2006 (UTC)Reply

Stop me if this sounds familiar

Additions by regular contributors far outweigh spotted edits by passers by, and that's not counting the revisions necessary to revert the spotted edits. Many times when fixing an edit I start to reword it completely, and it feels like I could have done a better job if I had done it myself the first time. This is apart from concerns about learning the standard format which merely makes the burden lighter on us. Now not all of the regular contributors are registered. There are a few anonymous IPs who never bothered to register (or perhaps only didn't bother to log in?) who make excellent contributions, but, and this is the point, who make them in batch.

Newcomers, not counting numerous vandals, make mistakes because they don't understand the implications of a multilingual online dictionary, as opposed to any dictionary they've seen before, be it a regular or even a translating dictionary. This single aspect has many facets. Each spelling gets its own page: we do not redirect inflections or alternate spellings. We write foreign entries in English, and we don't translate them. That funny text next to a language script you don't understand isn't the pronunciation. Even simple things like using lower-case page titles seem to trip people up.

At Wikipedia aside from two occasions I refused to revert vandalism because I did not agree with allowing completely anonymous editors. Granted that was a short-lived first experience with a wiki (I don't have much encyclopedic knowledge to contribute) but I felt the same way coming here. Now being a sysop I feel obligated if I see it. But doesn't everyone else having to patrol all these edits feel the same way? Can we actually measure how much good anonymous editing does, or even if it does any net good at all? Again, not counting formatting, and even excluding vandalism which we can't say would be diminished by how much.

I mean, goodness, it's not like registering even requires an email address or opening a new window or anything. If we can't expect people to take that basic step, then can we expect them to make any effort at all to understand what they're editing? DAVilla 20:09, 11 October 2006 (UTC)Reply

Did I write that? Oh wait, that wasn't me. It was my rant, in someone else's words.  :-)
One thing I've noticed since we've had "patrolled" edits, is that the "good" edits get filtered out very quickly. The side effect of this, is that patrolling edits (with "hide patrolled edits" turned on) becomes a sysop-burning-out vandalfest of crappy contributions. With none of the "good" edits mixed in, it looks like all anon contributions are crap. (This really isn't the case, but with the extra focus of patrolling edits, it seems that way.)
Also of note, is that according to Alexa, we are still experiencing growing pains. The lack of additional sysop nominations recently may be starting to catch up with us. --Connel MacKenzie 20:26, 11 October 2006 (UTC)Reply
Ooops. Also see m:GAY. --Connel MacKenzie 20:28, 11 October 2006 (UTC)Reply
Well yes indeed. The majority of anonymous contributions (not anonymous edits to existing words) are very poor. What I would like is for people to take a simple test (construct some sort of dummy entry with reasonable format) and, only if they make a decent fist of it, then get a flag set that allows them to be a contributor. Probably not feasible though. But seriously, it might be a good thing to put to the vote - we haven't had a vote for a while. SemperBlotto 21:19, 11 October 2006 (UTC)
There are a few (maybe 15) IPs which contribute prolifically and constructively, and while I would hate to lose their support, I have a feeling that they would finally register an account if their hand was forced. Let's do it. - TheDaveRoss 21:33, 11 October 2006 (UTC)Reply
I have very good reason to seriously doubt that. I tried forcing a name on one rather prolific anon contributor, and he stopped contributing here, until I undid the change. (He's the #1 anon on WP.) For the other fourteen, you may be correct. --Connel MacKenzie 00:00, 13 October 2006 (UTC)Reply

Interesting point about good edits being filtered quickly. Actually, I'm feeling better about the anon-editing thing because, since writing it, I've noticed a lot of foreign-language contributions by anonymous IPs.

If we can get translation sections divided sense-wise by default, that just might be enough to tip the scales. Translations are the kind of edits that I feel really good seeing because I know I couldn't add them myself. I see anonymous IPs and anonymous-to-me new contributors still marking numbers, and I feel bad because the contribution is basically nil. Some day a contributor who would be able to add the translation him/herself anyways is going to have to come by and check it.

But that doesn't always work. What happens is that a vandal can hit us in a language we don't speak, so that it's much harder to detect it. I reverted one "translation" today that I am 99% sure was vandalism, but it's hard to know when you don't know the word. This case was at least for a language I kind-of speak, so I could evaluate it somewhat, but were someone to add Telugu words for genitalia all over entries, I wouldn't be able to make such a determination. --EncycloPetey 23:12, 13 October 2006 (UTC)Reply
You will note that my argument barely concerned vandals at all. I was very seriously suggesting that, vandals aside, the only people who contribute anything of worth are those who come back time and again. The group of contributors who could be characterized as spotted editors with good intentions do not, on the whole, contribute to the project. That was my assertion, initially. The number of legitimate translations is potentially an argument against. DAVilla 08:20, 14 October 2006 (UTC)Reply

However, that doesn't really negate what I said before. It might be that anon IPs are registered with other Wiktionaries and don't bother to take the time and register at each one. If we ever get cross-project and/or cross-language accounts, then I would very seriously consider studying this. Despite the "friends of gays" humor, this is not Wikipedia and the differences are pronounced as I've laid them out above.

Very true - the humor was not meant to detract from your points, only to provide some levity. The differences are quite pronounced, indeed. --Connel MacKenzie 20:43, 13 October 2006 (UTC)Reply
I'm certain I've read that long ago, and it was no less funny this time around, probably more so. DAVilla 08:20, 14 October 2006 (UTC)

In the meantime, I propose semi-protecting some of the, um, more carnal Wikisaurus pages. This does very little but it might make lighter work for a couple of people. I know this has been brought up before, and at the time in fact I was against it because I felt they were vandal magnets. But the truth is that the contributions to these pages aren't pure vandalism, they're just sketchy, and it's just as well to have people editing them whom we have a little more confidence in. DAVilla 22:16, 11 October 2006 (UTC)Reply

I support the protection of the sketchy WS pages. But I think it should be WT:VOTEd on, as there was so much dispute about consensus, last time a major/minor change was attempted with something in the WS namespace. --Connel MacKenzie 23:47, 12 October 2006 (UTC)Reply
I protected (auto/sysop) Wikisaurus:homosexual about a month ago, and either no one noticed, no one minded, or no one cared, because no one complained. I am in favor of protecting, purging, formatting, cleaning and making these pages more in line with something one might consider...um...valid? I am willing to do the work, but last time I hit so much resistance that I gave up. If there were a vote and consensus indicated that cleanup would be allowed (last time I cleaned for 3 hours and then had it reverted) I will clean them up again. All of them. - TheDaveRoss 23:52, 12 October 2006 (UTC)Reply
OK, I have started such a vote at WT:VOTE. --Connel MacKenzie 20:41, 13 October 2006 (UTC)Reply

I have an additional suggestion. For each of the problematic Wikisaurus pages, that would be semi-protected under this proposal, put a simple note at the top saying, "This page is full. Please add additional synonyms to [[/overflow]]." That way the kids can continue to have their fun thinking of new words for penis and breasts, but nobody else has to look at them. —scs 00:38, 16 October 2006 (UTC)Reply

Oh! Lookit that. Wikisaurus:penis and Wikisaurus:breasts are already doing just that. —scs 00:40, 16 October 2006 (UTC)Reply
I still find fault with the notion that we have to act as a repository for any swill that the twelve year olds can come up with. Why is that? - TheDaveRoss 01:48, 16 October 2006 (UTC)Reply
It's an unexpected result which, arguably, we have to live with. The proof is not direct:
1. We are, like Wikipedia, the free dictionary anyone can edit.
2. Our success is (like Wikipedia's) due to our accessibility, to the extremely low barrier-to-entry for editing.
3. Anything which detracts from our openness (mandatory registration, page protection, etc.) has an unknowable but significant deleterious effect on our openness and must, therefore, be avoided if at all possible.
4. If we're open, we're open to everybody; we can't say "The free dictionary anyone can edit as long as they make only edits we like."
5. Demonstrably, there are ample numbers of editors who are irresistibly driven to add new synonyms for penises, breasts, sex, and masturbation.
6. If we worked too hard to prevent them (and no matter how seemingly important the short-term gain of diminishing the embarrassingly puerile content of those pages), we would conflict with point 3.
7. Also, if we worked too hard to prevent them, there would be a backlash: some fraction of the frustrated aspiring penis synonym adders would waste our time arguing about the restrictions, or would turn to vandalism out of spite or frustration.
8. Therefore, giving them some reasonably painless and low-key "out" is a low-cost compromise which, as I said, lets them "continue to have their fun" without pissing anyone off.
Now, I freely concede that any "proof" with that many steps in it may well have some errors or unsound inferences along the way. I'm not prepared to defend that "proof" to the death; I don't expect everyone to be swayed by it. But it's why I conclude that "overflow" pages like Wikisaurus:penis/more and Wikisaurus:breasts/more are a reasonable and appropriate solution to the problem.
In the argument above, point 3 is the most important. If an aspiring first-time editor comes to our supposedly open project and discovers that he has to go through some registration process first, or that too many pages are protected and can't be edited after all, he may say to himself, "eh, never mind" and wander away again. Moreover, this can happen just as well when the aspiring editor was not here to add Yet Another synonym for penis, but was rather here to make some quite useful change, perhaps the first of many, perhaps as a toe-in-the-water prelude to registering and becoming a valued long-term contributor. Ergo, we can't (for example) discourage anonymous editors too severely, even though it's anonymous editors who cause most of our annoying nuisance edits and vandalism, because there's no way a priori to distinguish between the well-meaning first-time editors and the annoying ones.
Or, in a nutshell, we have to (sometimes) act as a repository for any swill that the twelve year olds can come up with so that we can be as open as we have to be to also attract the editors who will actually write the open dictionary. The occasional pockets of swill are, unquestionably, among the prices we pay for our openness, but it does seem that the result (i.e. the rest of our non-swill content) is worth that price. —scs 02:54, 16 October 2006 (UTC)Reply
I am more concerned about the folks who want a resource than the ones who want a project. Yes, I think that it is important that anyone can edit, but you are wrong about number 4 there, we do have plenty of restrictions on what can and can't be included, why doesn't this cover Wikisaurus? I am not willing to put effort into that portion of Wiktionary anymore because of the uselessness that I see in it, there is no reason to add valid content because no one will ever trust it as a resource of merit while the criteria for inclusion there is limited to the imagination and the ability to click the edit button. I would just as soon lose a few potential editors and do the work myself, if the result is something useful and accurate, to gaining those potential editors at the cost of the usefulness and validity of the project as a whole. It makes Wiktionary look bad to even have that portion of it, let us fix it. - TheDaveRoss 07:24, 17 October 2006 (UTC)Reply
If it doesn't make me sound like a namby-pamby fence-sitter, I don't disagree with anything you've said. A couple of those pages are, indeed, an embarrassment. But it's tough to find just the right balance to strike between openness and control. —scs 18:27, 17 October 2006 (UTC)Reply
Note that there's nothing preventing cleanup/verification of the spill-over pages, or from someone requesting moves of content from the spill-over into the main page, if anyone's so inclined. By the way, I like the current /more sub-page vs. /overflow, as a name, FWIW. The term overflow trivializes the content, which is undesirable, even if much of it turns out to be in fact trivial. --Jeffqyzt 12:51, 18 October 2006 (UTC)Reply

(can I go back to the left margin?) I'd just as soon dump Wikisaurus entirely. In all the times I have ever looked at Recent changes, I have never seen an edit that wasn't to breasts, penis or some such. (Are there any other words in Wikisaurus?). I looked at the stats once and Wikisaurus:breasts (IIRC) was the second highest page, probably from Google when people look up all those slangy words. Is this really how we want to present to the world? Robert Ullmann 13:16, 18 October 2006 (UTC)Reply

I and several others have put a fair amount of work into the legit side of Wikisaurus, including formatting and content; we got the anatomy pages down to less than half of the total number of WS:pages. User:TheDaveRoss/to_saurus/cleanup#Articles has a list of articles that existed around the time I left, divided by whether or not they were content that I considered valid. - TheDaveRoss 15:33, 18 October 2006 (UTC)
Indeed. I went to the vote page to support protecting the pages, and saw you were very eager to fix it. Go for it! Robert Ullmann 16:23, 18 October 2006 (UTC)Reply

Current events

Can anyone think of meaningful content for this page, or shall we nuke it from the sidebar? - TheDaveRoss 21:36, 11 October 2006 (UTC)Reply

It is supposed to redirect to WT:AN, right? --Connel MacKenzie 22:06, 11 October 2006 (UTC)Reply
Now it does, unless someone can think of something better. It has also been protected. - TheDaveRoss 22:09, 11 October 2006 (UTC)Reply

Personal matter

I'd like to direct your attention to a vote going on here. Naturally, only regular contributors could be counted towards such an important decision, or at least those who can prove their trustworthiness. Vote will close within a day of the nineteenth legitimate submission. DAVilla 05:04, 12 October 2006 (UTC)Reply

Bot category move requests page?

Looking at Category:German idioms, I was about to move it to the correct location Category:de:Idioms, then paused. Since I'm not 100% certain, I'll let it linger, but what I really wonder, is, is there a good place for me (or anyone) to request/discuss this kind of move? Should we have a Wiktionary:Requests for category moves page? Something like a 24 hour wait period for discussion, then one of the (growing pool of) bot operators could just zap it over?

Good idea/bad idea/comments? (About the request page, not this one individual move.)

Thanks in advance, --Connel MacKenzie 23:44, 12 October 2006 (UTC)Reply

How is it not the correct location? We put POS (and ~POS;-) cats under (language) (POS); it should be German idioms. The de: cats are for topics; de:Idioms would be German words about idioms ... Robert Ullmann 08:45, 13 October 2006 (UTC)Reply
Apparently I lapsed into momentary stupidity there. All the more reason to have a "Requests" page, for the added sanity checks provided there. --Connel MacKenzie 16:40, 13 October 2006 (UTC)Reply

Collapsible translations sections

As a result of a suggestion in WT:GP about hiding translations sections, I put together templates to do it. Like this:

{{trans-top|discussion space}}
*Language 1: one
*Language 2: two
{{trans-mid}}
*Language 3: three
*Language 4: four
{{trans-bottom}}


With the idea that if people liked it, we might add this to {{top}} or do something like that. There are examples at get, orange and book. There has been a bit more discussion at Wiktionary talk:Translations. Aaronsama then added them to WT:ELE ... I've (at least temporarily) reverted his edit ... it isn't bad, but I think it ought to be raised here first of course. Quite a number of people have thought along these lines.

The templates also need some technical work, but of course that can always be done. Robert Ullmann 17:00, 13 October 2006 (UTC)Reply

I am obviously enthusiastic about switching over to a system like this. I think it would be one significant step closer to making Wiktionary more usable as a translation dictionary. However, I would advise against adding it to {{top}} for three reasons:
  • The syntax for {{top}} is different than the syntax for {{trans-top}}.
  • {{top}} is likely being used for things besides just translations.
  • I think it should be very clear that the template is specifically for translations.
I know the switchover would be a significant amount of work, but doing it right is always better in the long run. In any case, I'm pleased we have a possible solution to a longstanding problem. --Aaronsama 17:27, 13 October 2006 (UTC)Reply
{{top}} is not used for anything other than translations. Or, any place it is used that isn't a translation table, is routinely corrected to {{top2}}...but I haven't seen this particular error in quite some time now. --Connel MacKenzie 17:50, 13 October 2006 (UTC)Reply
This question is actually irrelevant to the discussion. Because we currently put a summary on a separate line, each and every page would need to be altered anyways. DAVilla 18:42, 13 October 2006 (UTC)Reply

I don't see any reason not to use the same framework for all sections, that is, to use the same {mid} and {bottom} for any {top}. The difference for translations is that there is a parameter to {top}, so if you want to make only translations collapsible and if you want to distinguish them in color etc. all of this can go into the {top} code, which can easily distinguish translations from other sections simply based on the existence of a pipe | and following text. In the future we could always add special parameters as well.

Can we see an example of what an uncollapsible frame might look like, that is, a {top} using code similar to WikiNews but showing a simpler box, without a visible frame or anything? DAVilla 18:36, 13 October 2006 (UTC)Reply

I like this option, does it mean we can finally change the ugly yellow background? - TheDaveRoss 18:48, 13 October 2006 (UTC)Reply
That's tangential of course, but your point is made. DAVilla 19:20, 13 October 2006 (UTC)Reply
But not very clearly. Could someone please put together more coherent examples of the different flavors of what is being proposed? Clearly stating just what is and what isn't being proposed would help too. --Connel MacKenzie 20:33, 13 October 2006 (UTC)
Basically there are three completely independent discussions going on. First, how should the templates be named? Second, which code should be used where? Third, how should they look? Every possible decision for each of those questions can be accommodated with minimal impact on the others, with the exception that {{bottom}} cannot be overloaded with two different code types.
(Bottom can be overloaded that way, if we have it close two divs, and have all top variants open two divs, needed or not. (minor technical point, if anyone doesn't understand this, don't worry!) Robert Ullmann 21:47, 13 October 2006 (UTC))Reply
Right... though that's not the same code that's there now, my point, but I guess your point is it's pretty close. The thrust is we really can think of these as independent. DAVilla 22:08, 13 October 2006 (UTC)Reply
For the first question, do we want something that's easy for anyone to pick up and remember, or do we want something that's specific to the use? The latter is an important direction for templates in general, such as the structural names for {{italbrac}} and {{italbrac-colon}} rather than an abstract name related to the use in {{synonyms}} for instance. In that case the abstract name is easier to remember, but in this case I think the structural name {{top}} will do.
My comments above apply to the second question. TheDaveRoss's comments apply to that idea, at least what he understood of it, but also specifically the third question, which can be altered even now without the new code. DAVilla 20:59, 13 October 2006 (UTC)Reply
I think this should go back to the grease pit for a couple days, to address (or at least understand better) DAVilla's concerns. --Connel MacKenzie 04:42, 14 October 2006 (UTC)

Am I right in thinking that this isn't supposed to replace ALL translation sections, only to help with those pages which are very crowded/cluttered etc? Or are we planning to hide all translation sections? Widsith 06:25, 14 October 2006 (UTC)Reply

I would be in favor of doing this for all translation sections, and having a preferences option to default them open or closed. The reasons for doing all of them are twofold: first, it is good to have consistency; second, all translation sections will hopefully grow very large. - TheDaveRoss 20:13, 14 October 2006 (UTC)
I had assumed that this was related to the WT:PREF setting for "hiding translation sections," but apparently this is very much the same thing...perhaps a pretty version, right? Perhaps simply customizing that feature to do this, would be better? The goal is to make it a user preference, is it not? --Connel MacKenzie 20:26, 18 October 2006 (UTC)Reply

If we're going to try this for the translations sections (and I do like the idea), then we should also be thinking about doing Quotations the same way. Right now, long Quotations sections are being shunted to a separate Citations page. This makes it very difficult to coordinate definition changes with the quotations, since it is not always obvious that the citations exist, and in some cases I've found citations pages for entries that didn't even exist yet. --EncycloPetey 22:13, 18 October 2006 (UTC)

Has this started? I'd think this would require some community agreement before moving beyond one or two examples. I think the parameter in {{top}} is quite clever. Hopefully, not too clever. Bot-converting these 26,000 entries might be advisable, if this gains strong support. --Connel MacKenzie 15:22, 19 October 2006 (UTC)Reply

From time to time people ask why our translations link to entries here in the English wiktionary, rather than to the appropriate other-language wiktionaries. We all know the answer to that, but I've been meaning to ask why we don't do what de.wiktionary.org does, namely link to both.

I just came across a page of ours that does do this, although with an odd convention of hiding the links behind an unobtrusive degree symbol, °. See bouquet#Translations. That's probably not the best or most obvious way to do it, but while we're talking about translations, do we want to pursue something like this, too? —scs 14:38, 16 October 2006 (UTC)Reply

I've noticed that in other wikts, and it turns out to be quite useful. It works well to have a template for the translation line with the code, then each word paired with transliteration or whatever. Then it generates all the links: wikilink the word, sister-link the word in the other wikt, generate the other info, then the next word. For an example, see rw:Template:isemura which I've just done. (;-) (and see rw:mudasobwa and rw:kuwa mbere, note that I'm in the middle of adding a bunch of the language name templates) Robert Ullmann 15:01, 16 October 2006 (UTC)Reply
Oh, one thought I may try in the Kinyarwandan wikt (that would not be good here!) is to link to the word in rw.wikt if it exists, else link to the sister wikt. Since most words in most languages (other than fr and en) will not be in the rw.wikt for a while (unless we do some massive imports or something). Not a good idea here, we want the redlinks so people can see what to add. But there, automatic links especially to fr and en would be very useful. (Those are the other two national languages in Rwanda) Robert Ullmann 16:10, 16 October 2006 (UTC)Reply
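The fallback described above for rw.wikt (link locally when the entry exists, otherwise link to the sister wiktionary) could be sketched roughly as follows. This is only an illustration in Python, not the actual template code: `translation_link` and the `local_titles` set are hypothetical names, and a real template would have to test page existence inside MediaWiki itself.

```python
def translation_link(word, lang_code, local_titles):
    """Return wikitext that links to `word`.

    Hypothetical helper: `local_titles` stands in for "pages that exist
    on this wiki"; a real template would ask MediaWiki instead.
    """
    if word in local_titles:
        # The entry exists locally, so a plain wikilink is enough.
        return f"[[{word}]]"
    # Otherwise fall back to an inline interwiki link to the sister wiki.
    return f"[[:{lang_code}:{word}|{word}]]"


existing = {"mudasobwa"}
print(translation_link("mudasobwa", "en", existing))   # [[mudasobwa]]
print(translation_link("traduction", "fr", existing))  # [[:fr:traduction|traduction]]
```

On en.wikt the first branch would presumably always be taken, since the redlinks are wanted here so people can see what to add.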
It would be better to use a common symbol for these links, such as (*) or (^). The language codes are confusing...most of them are unfamiliar to most people, and many of them look like they could be definite or indefinite articles, prepositions, or abbreviations. —Stephen 04:32, 17 October 2006 (UTC)Reply
The language codes may be confusing, but it seems to me that an obscure symbol is even worse. It's not obvious what it's for; it's easy to overlook; it's hard to click on even if you do notice it and know what it's for. —scs 17:44, 17 October 2006 (UTC)Reply
There is some clever Monobook.js code floating around that finds all valid interwiki links on a page, and adds them to be within the translations section also (if it can.) This seems like a more elegant approach, to me, rather than filling the entries with links that may or may not work. Shall I add this? --Connel MacKenzie 07:47, 17 October 2006 (UTC)Reply
Yes, we should try that to see how it works. —Stephen 07:56, 17 October 2006 (UTC)Reply
We have already in quasi-use the template Template:t, which works as {{t|fr|traduction|f}}, comes out as traduction f. I think Paul G, Polyglot and Wonderfool have all used it, some more than others. --DWarF 08:17, 17 October 2006 (UTC)
Sorry, WF, but that just clutters things. Adding something like http://bs.wiktionary.org/wiki/MedijaViki:Monobook.js#interwikiExtra has the same effect for the reader, without the clutter for the editors (and updated automatically, with each run of the interwiki bot.) --Connel MacKenzie 08:28, 17 October 2006 (UTC)
Pardon me, you are talking about two different things; scs and I (etc.) are talking about links to the foreign word in the foreign wikt. The interwiki links are to the English word in the foreign wikt. Robert Ullmann 11:43, 17 October 2006 (UTC)Reply
Right you are. Sorry about that. Shall I just forget the other thing, then? Or should I modify that code to assume that if a translation exists on the other language Wiktionary, that the word(s) in that language also exist on that Wiktionary? --Connel MacKenzie 05:38, 18 October 2006 (UTC)Reply

WT:BLOCK upgraded

I have just upgraded WT:BLOCK to semi-official, please continue to look it over, and watch out for a vote to raise it to official policy status later this week. Next on the list is Wiktionary:Page deletion guidelines so have a look and make changes you think it needs. - TheDaveRoss 20:53, 16 October 2006 (UTC)Reply

It should be pointed out that all users should look at this and comment and/or vote. It isn't some private sysop thing; it is our common policy. Robert Ullmann 11:37, 17 October 2006 (UTC)Reply
Absolutely, the more people who look at it and comment the better we can feel about it reflecting consensus. This is one of those cases where silence indicates support. Thanks Robert. - TheDaveRoss 16:15, 17 October 2006 (UTC)Reply
Most of what’s on that page is incomprehensible to me. What little I can figure out (if I understand it correctly) seems to have a lot of unnecessary repetition. What, for example, is the difference between "blatant vandalism" and "vandalism only accounts"? What’s the difference between "blatant vandalism" and "vandalism"? —Stephen 21:27, 18 October 2006 (UTC)Reply
Blatant vandalism applies more to anonymous IPs, whereas a vandalism only account would be an account which is created with the sole intent of vandalizing. I am pretty sure that removing the repetition would cause it to be less clear, but it is a wiki, feel free to cut out what you find redundant. This is the time to make modifications, please, please do so. - TheDaveRoss 03:25, 19 October 2006 (UTC)Reply
In my experience, 99.99% of cases of vandalism are done by accounts created in order to commit vandalism. About the only vandalism accounts that were created for legitimate purposes have been Primetime, Wonderfool, EddieSegoura, and some of the Jahbulon detractors. So it seems to me that a simple category of vandalism should be sufficient to cover the topic. I can’t really offer any other suggestions on the page since I don’t know what it’s talking about for the most part, especially everything in RangeBlock and BlockDuration. What’s the difference in vandalism and pure stupidity? How do you recognize "bad" sockpuppets? What’s the difference between random spam and vandalism? —Stephen 04:39, 19 October 2006 (UTC)Reply
Ah ha! These are things I can address. The simple answer to range blocking is that if you don't understand it, don't use it. I would be more than happy to go into more depth than WT:BLOCK and Wiktionary:Range_blocks if you would like, but the easier solution is to ask someone who knows what is going on with those blocks and with ARIN checks to do the actual blocking. Yes, most vandalism is from accounts created for the purpose, or anonymous accounts; however, the idea is for the policy guidelines to be comprehensive enough to still apply when those rarer (and often more contentious) situations arise. The reason why we subdivide the page is that there are many degrees of vandalism (or so I and the folks I discussed it with feel) and one block type and duration isn't universally applicable. There should be a different block for an IP which spams "Dave is a dork" than there should be for a logged-in, long-time user who persistently reverts someone else's good-faith edits but refuses to discuss the changes. The intent of the page is stated as clearly as I could state it at the top of that page. As for the difference between vandalism and stupidity, that has a lot to do with perceived intent, persistence, type of vandalism... it is tricky, but mostly we know it when we see it, I think. Bad sockpuppets are the kind which are used to vandalise, evade blocks, etc., while a good sockpuppet doesn't, simple as that. - TheDaveRoss 05:04, 19 October 2006 (UTC)

Proto-Polynesian

I've found about 30 words in various translation tables given for "Proto-Polynesian". All entries begin with an asterisk, which suggests they are hypothetical reconstructions rather than actual words. However, I can't find anything written in Translations Policy pages that spells out our criteria for "proto"-language translations. So, do we remove them all and amend Wiktionary:Translations? --EncycloPetey 22:59, 16 October 2006 (UTC)Reply

I added them. They're interesting and useful as are words for all proto-languages. In particular they're useful for putting in the etymology sections of entries which don't yet exist for languages such as Hawaiian, Maori, Samoan, Tahitian, and Tongan. Naturally all words in all protolanguages are reconstructions and hence hypothetical but this does not reduce their status since years of research by trained experts goes into them. I cannot think of any reason you would want to remove them. Instead concentrate on removing junk words the kids made up yesterday. — Hippietrail 00:24, 17 October 2006 (UTC)Reply

The last time this came up, they were all moved to Appendix:Proto- namespacing, with an asterisk preceding the root. --Connel MacKenzie 07:58, 17 October 2006 (UTC)

Han characters

Copied from above; the issue is to clean up the Han character (CJKV) entries and make sure we have no problem with copyright. Robert Ullmann 18:56, 18 October 2006 (UTC)

Ok, speaking as an IP attorney, I think this is most akin to the situation in Kregos v. Associated Press, 510 U.S. 1112 (1994). There, a baseball reporter came up with a set of nine statistics that he thought were particularly important to determine which pitchers would win the day's games. The Supreme Court held that although Kregos could receive protection in the arrangement and presentation of the statistics, this protection would be very narrow. In essence, all that could be protected was the exact presentation. Any other paper that chose to publish similar statistics could do so as long as the alternate presentation differed from that created by Kregos in "more than a trivial degree", specifically finding it unlikely that the AP's form infringed where it included only 6 of Kregos' 9 stats and included 4 additional stats that Kregos did not.
In our case, we are attempting to provide as much information as possible about every character available. The actual information we seek to present is in the public domain. It is only, therefore, the particular selection and arrangement to which another party could lay claim. Hence, if we strip all non-essential information originating with the other source, include additional information (which we are going to do anyway), and change the arrangement to suit our purposes, we should be in a position to prevail over any challenge to our use of this information.
Cheers! bd2412 T 16:35, 17 October 2006 (UTC)
Having not heard anything back from the Wikimedia General Counsel, having asked repeatedly ... sigh. Given what you say, I think we should do this:
  • Format the info at the top of the entry, which Unicode can't really claim any copyright on (compilation, derivation or whatever) into Template:Han char under a Translingual header so that the format meets our standard.
  • Delete the "Dictionary information" section; it is page and line references to the unabridged versions of large dictionaries most people don't have anyway. (One of them is ~10,000 pages.) This information was developed by Unicode, they might have some claim, but so what; we strip it.
  • Delete the "Technical information" section; this is the Unicode/IS 10646 code point in hex and decimal, ditto the Big5 code point. Unicode has no claim on this, but there is no reason to have it. We don't give the JIS codes (or ASCII, or whatever).
  • We already have additional information on a large number of entries.
  • Fix up the headers so that we don't have == Korean Hanja == and so on.
  • Cat them in Category:Han characters, sorted by radical and stroke. If and only if all this is done!
I've set this up on AWB. It can't be run automatically, there are just too many variations that people have introduced. I have matched most of the common patterns. See the entries that are in Category:Han characters, I've run a few. (Easy to roll back if there are concerns!)
Comments please? Robert Ullmann 18:56, 18 October 2006 (UTC)
My two cents, or yen, or yuan, or whatever:
  • I think it's worth keeping some of the technical information, at the very least, the Unicode code point. It's a useful "handle" and a very useful cross-reference (if for nothing else, to the rest of the Unihan information we're thinking of deleting).
  • Personally (though IANAL), I think the only information whose use Unicode might object to on copyright grounds is the "Common meaning" phrasing. So if we're worried about copyright, I'd say we should delete all of those, or delete them if they're identical to the phrasing in Unihan.txt (older or newer versions).
  • I could go either way on the "Dictionary information" subsection. This, too, is potentially useful as a cross-reference for our readers. The original use for those references within Unihan.txt was, I suspect, mostly to validate the CJK unification work and to cross-check its coverage. Those uses don't apply to us, of course, so the question is, how often will those listings help one of our readers look up an ideograph in one of those other dictionaries?
scs 13:03, 19 October 2006 (UTC)
I would like to keep the code point information as well. I think it is useful, when dealing with so many coding schemes on the computer, to have a single place to look up this info. Might as well be wiktionary!
I strongly agree with dumping the common meanings section. There are simply too many problems associated with it. As a contributor, let me give you one of my main beefs with the common meanings/pronunciation section for individual Han characters. In Chinese, some Han characters can be pronounced in several different ways. The pronunciation usually is associated with a specific meaning. For the Chinese students, we call this 多音字 (PRC) or 破音字 (Taiwan). For example, the character: Template:zh-ts can be a noun or a verb. As a noun, it is pronounced shù ([ʂu˥˩]), and has a root meaning of number. As a verb, it is pronounced shǔ ([ʂu˨˩˦]), and has a root meaning of to count. Now take a look at the common meanings section for this character (while you're at it, take a look at the pronunciation section as well). I can't for the life of me figure out how I would indicate the above, given the constraints imposed by the common meanings section. On top of which, you're not given the part of speech (unlike the rest of wiktionary)! Incidentally, this is the entry from one of our competitors. I think we should shoot for at least as good as, if not better than, our competition.
Finally, I agree with getting rid of the "dictionary with page numbers" section. The referenced dictionaries are mainly for scholars. If a scholar can't find a character in a dictionary without first looking up the page number on wiktionary ... (you make up your own punchline!) :)

A-cai 13:34, 19 October 2006 (UTC)

Okay, I rolled back the ones I had done, and made a number of changes, then ran a few more. See .
  • I'm using two templates (as I was before), one for the info under the Han character header, one under References.
  • All the data in the page is stuffed into the templates (except alternate form, which goes into {{see}} where it belongs); this means the process isn't losing any information. (In a formal sense, it is reversible, except for white-space, and the presence or absence of headers for unused fields.)
  • The unicode code point is in the second template.
  • The common meaning is stuffed into the first template as a parameter, but not used.
  • The dictionary references are stuffed into the 2nd template, and not used.
  • The second template generates a link to the Unihan database, where all that can be found. It can also do other things with the codepoint if we like. (Remember the codepoint IS the page title, coded in hex; no copyright problems there!)
  • We can always choose to display/hide things by modifying the template(s). If and when we run into trouble with having one of the data fields in the wikitext, we can always bot-strip just that field from the template call.
I think that addresses the points so far. What else should we look at? Robert Ullmann 17:42, 19 October 2006 (UTC)
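Since the code point is just the page title expressed as a number, the hex and decimal forms that the "Technical information" section lists can be derived from the title itself. A minimal Python sketch, assuming a single-character page title (`codepoint_info` is a made-up name for illustration and is not part of either template):

```python
def codepoint_info(char):
    """Describe one character's Unicode code point in hex and decimal."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    value = ord(char)  # the code point as an integer
    return {
        "char": char,
        "hex": f"U+{value:04X}",  # conventional U+XXXX notation
        "decimal": value,
    }


info = codepoint_info("A")
print(info["hex"], info["decimal"])  # U+0041 65
print(codepoint_info("多"))          # works the same way for a Han character
```

Nothing here depends on Unihan.txt; it only restates the page title in another notation.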

New vote on numeral headers

I've started a new vote on Wiktionary:Votes#Number versus Numeral, pertaining to the use of headers like ===Cardinal Numeral=== in place of the current ===Cardinal Number===. The discussion should happen on the linked page at Wiktionary_talk:Entry_layout_explained/POS_headers#Number_versus_Numeral, rather than here, as that is the location linked from the vote itself and is where the discussion began. --EncycloPetey 22:17, 18 October 2006 (UTC)

Acceptable entries?

I am unclear about the policies for acceptable entries. Where do you draw the line when dealing with lesser used and virtually unused words? For example, I have seen "obsolete" word entries, taken from old dictionaries, which seem to me not to fall under the policy of acceptable words being ones which people are apt to want to look up. Are these not supposed to be here? I see that there is a policy that says words that have limited regional use are not acceptable. Why not?

I long looked for a word we used when I was a kid, and I finally found it only in D.A.R.E., which said it is used primarily in New England. I have sometimes looked for words with specific meanings, and there turn out to be obscure words which fit. Considering the policy that there need to be published uses, sometimes the only places you can find certain words are in special dictionaries, as in the case of obsolete or regional words, or those almost exclusively used orally. Are these dictionaries acceptable as published references?

It seems to me that there should be very little limitation on acceptable entries in the Wiktionary. Exclusion is the domain of commercial printed dictionaries catering to particular markets and constrained by limited space. Here, there is virtually no limitation on space and no commercial consideration. The presence of any word is harmless. If there is a qualification concerning its status as a word, that qualification can be included in the entry, such as its limited use. The principle of inclusion is what could eventually take the Wiktionary beyond the scope of Oxford. Abstrator 06:55, 19 October 2006 (UTC)Reply

Yes. All real words are acceptable. If they have a local use you might have some difficulty in providing citations to prove their definitions though. Basically, every word is judged on its own merits. SemperBlotto 07:37, 19 October 2006 (UTC)Reply
Not to state the obvious, but have you had a chance to look at our Criteria for Inclusion page yet? If not, that might clear up some confusion. --Jeffqyzt 18:06, 19 October 2006 (UTC)Reply

Ancient Greek

I've been trying to find ways to systematically categorize and present Ancient Greek terms (at this point, mainly nominal; I hope to move on to verbs soon, but they are prone to greater disparities in accentuation paradigms!); it has been a bit difficult due to the fact that not only am I new to Wiktionary, but the presentation of the language on Wiktionary is in the earliest stages of development. I thus have several concerns:

  1. What is the appropriate manner to present the orthographic Romanization of these terms? I understand that Japanese terms written in Rōmaji are often wikified separately from the same terms presented in Kana/Kanji. As keyboards do not represent the polytonic orthography of Ancient Greek well, I feel it would not be inappropriate to likewise wikify the Romanization of the Greek terms so that searching for them is simplified.
  2. Ancient Greek is highly dialectal and words often vary greatly from one dialect to another. I would propose that each dissimilar term have its own page. (For example, in the Attic dialect, young man is νεανίας, while in Ionic, νεηνίης.) Were separate pages not utilized, would a simple header "Alternate spellings" or "Other dialects" be the best possible way to represent these terms?
  3. The nominal templates I've created, while very specific, would most likely be extremely confusing to casual scholars of the language or those new to it. Would it be proper to create a page regarding these inflection templates? Conversely, ought I simply explain each template on its respective talk page?

Thank you. Medellia 16:52, 19 October 2006 (UTC)

  1. On the Latin Wiktionary I've been giving both the Romanization and Beta code, linking to the Beta Code without the accents and the Romanization as is (e.g. ἕκτος = hectos and E(/KTOS).
  2. I should expect dialect forms to be treated the same way as dialects in any other language (I'm not sure what the current policy is; but cf. armor and armour, octante and huitante).
  3. The use of a template can be explained on the page itself, using <noinclude> tags. Example at la:Formula:grc-declinatio-adj-oxy. The meaning of a template (i.e. what all the cases and such are for) might be better on a separate page, e.g. Appendix:Greek first declension. —Muke Tever 02:46, 20 October 2006 (UTC)Reply
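The "Beta Code without the accents" link target in point 1 can be derived mechanically. A rough Python sketch, under the assumption that every non-letter character in the Beta Code string is a diacritic or other marker that should be dropped (`beta_code_link_target` is a hypothetical name, not an existing template or script):

```python
import re


def beta_code_link_target(beta):
    """Drop Beta Code marks (breathings, accents, etc.) and lowercase the rest.

    Assumption: anything that is not an ASCII letter is a mark to strip.
    """
    return re.sub(r"[^A-Za-z]", "", beta).lower()


print(beta_code_link_target("E(/KTOS"))  # ektos
```

This reproduces the example given above, where E(/KTOS links to ektos.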

Uncountable

Okay, I'm getting worried. Either I don't understand uncountable or someone else doesn't. Recently I ran across the entry Wikipedia claiming that it had an uncountable sense. I objected on the talk page and someone agreed with me, so I removed all mention of "uncountable" from the entry. Then I saw an example on Template talk:en-proper noun saying America could be uncountable (see that talk page to discuss whether any proper nouns can be uncountable — maybe there are some). Finally, I looked at Special:Whatlinkshere/Template:uncountable and noticed that many language names are marked as being uncountable. Examples: "Hebrew ... (uncountable) The language of the Hebrew people" and "Japanese ... (uncountable) the main language spoken in Japan". The most egregious of these is Chinese, which lists no fewer than 5 "uncountable" senses, only one of which (#7) I agree is definitely uncountable. Just because there is, in reality, only one of something doesn't necessarily mean it's uncountable. As I point out on Template talk:en-proper noun, for example, one can talk about "two Americas". I guess I can't think of a good example of using "a Chinese" (referring to the language) or "two Chinese(s)" (again, in the language sense) in a sentence, but does that really make it uncountable (I know, according to our definition, it would seem that the answer to that is yes)? Wouldn't it be better to just call such words proper nouns, or collective nouns, or whatever they are, and leave it at that? Maybe I'm just thinking too much in the mathematical sense of uncountable vs. countable.... - dcljr 17:51, 19 October 2006 (UTC)Reply

There are (at least) three different reasons for a sense to be uncountable:
  1. It is a truly proper noun, (i.e. not just something conventionally capitalized in English) ex. North America, John. These are generally pluralizable, and pluralizing this kind of uncountable produces the sense "things or people called X" (I know a lot of Johns; Mark Twain's America is more romantic than the other Americas of history and fable; the Chinese I learned is not a Chinese one finds spoken around here)
  2. It is a mass noun, referring to a material rather than a concrete item, ex. water, wood. These are also generally pluralizable (though perhaps not as much); pluralizing this kind of uncountable produces the sense "kinds of X" (Many different hardwoods are processed by this plant) or "instances of X" (That'll be three large waters).
  3. It is an abstract noun, which is sort of a subclass of the preceding; ex. justice, transitivity. Pluralizing this (which is often harder) generally also produces the sense of "kinds of X".
I agree it is better to be more specific, thus on la: I stopped using innumerabile, replacing it with materiale or abstractum or proprium as the case may be. —Muke Tever 03:01, 20 October 2006 (UTC)Reply
Don't worry, it is probably the other people. There is a simple test, like for it's/its (use it's where "it is" makes sense). Just try counting: zero Japanese, one Japanese, two Japanese ... makes sense for people, doesn't for language. So Japanese (the person/people) is countable, Japanese (the language) is uncountable.
Zero Wikipedias, one Wikipedia, two Wikipedias ... yup, countable! (Sure, Wikipedia as an adjectified noun, "wikipedia editing" doesn't seem countable, but that is always the case. And it would be the edits that are counted anyway. And are.)
And then don't let poetic exceptions like "two Americas" confuse you ;-) Robert Ullmann 18:06, 19 October 2006 (UTC)

This has come up before, without any definitive resolution. I hope that Wiktionary can decide to use the common use of the term in explaining countability to its readers. For example, the sands of Southern California may be different from the sands of Hawaii, but we should explain that sand is a mass noun (or whatever the current favorite term is) that refers to millions of individual grains of sand. Emotions should also be clearer about singular/plural usage. If a term can be used as a plural but very rarely is (and only in certain grammatical cases) then we should have a better way of indicating it. AFAIK, we do not have a consistent method of doing so today. --Connel MacKenzie 18:11, 19 October 2006 (UTC)

Here's my take on countability.
  1. As was mentioned in an earlier rendition of the discussion, it's senses that are countable or uncountable, not words.
  2. If none of a word's senses is countable, there's no need to mention a plural on the inflection line.
  3. Trying to assert what the plural would be if the word had one (i.e. if we were to discover a countable sense we're missing, or if the language were to evolve a new, countable sense somewhere down the road) is quite unnecessary. We're descriptive, not prescriptive. If/when that countable sense is discovered, the world's English speakers are perfectly capable of discovering or inventing a plural form without our help. (And then we can list it, when we've got some live usage to cite.)
  4. As Connel says, there's also little point in devising really speculative example sentences just so that a plural can be listed. (Though I'm not sure "sand" is the best counterexample, since that poetic usage "the sands of time" is so prevalent and so mellifluous. "The snows of yesteryear" gets me going, too.)
  5. To me, the biggest reason to list a word (or sense) as "uncountable" is as a flag reminding us it's okay that the word has no plural listed. Words which don't have a plural listed, but which aren't explicitly marked as uncountable, may be in need of attention. But words which are marked as uncountable can be passed over (on that score) so that one can devote one's time to the words that do need attention.
Given that it's the senses that are countable or uncountable, it's arguably somewhat wrong to say "uncountable" in the inflection line (i.e. as if it applies to the word as a whole). In the same earlier incarnation of the discussion, someone suggested having the template display something like "no countable/plural senses attested", which is a fine idea.
scs 02:44, 20 October 2006 (UTC)

Dahl

I happened to look at this definition and noticed it was incorrect.

I posted a corrected entry of similar length. I later noticed that this entry had been expanded, but that some of the information was incorrect. I left it mostly as it was but edited it so that it was correct and consistent (i.e. I left the format much the same but corrected the facts).

Shortly afterwards it was vandalised with the incorrect facts being reinserted. I changed it back again and left the reasons for doing so in the talk page. Minutes later the vandal "SemperBlotto" had spoiled the page again.

He clearly knows nothing about the word. The edits he makes are inconsistent with:

a) Themselves b) Wikipedia c) Normal usage

Dahl is a generic name for a husked pulse. The fact that it's husked is what makes it dahl, and not simply a pulse - at least the last time he edited the page he left that change in, whether by accident or design I don't know.

A Dahl in the sense of a meal is a meal made with any dahl (in the sense of an ingredient), not just lentils to which he keeps changing the text. He's not even consistent because in the expansions he lists other types of dahl.

Finally he keeps inserting a spurious claim that "pigeon pea" is dahl, when it has no such special claim.

The vandal keeps making changes but does not leave any indication in the talk page of why he's doing so. He just arrogantly changes the page so that it displays the incorrect information. — This unsigned comment was added by 87.112.18.51 (talk).

If you want to dispute a sense, there is a procedure; see WT:RFV. If you simply delete information, you will get reverted really fast. That's the way it works. If you added the word "husked" or changed "lentils" to "pulses" it would probably stand, but because you are blanking information (which is vandalism, btw!) the whole entry gets reverted. Robert Ullmann 19:45, 19 October 2006 (UTC)

Hmmm, interesting.

Well, he's now added references and I can see why he's made the changes he has.

I thought it was odd that an expert on Indian foodstuffs and cookery happened to appear a few minutes after I edited the entry originally.

The problem is that the references are simply wrong (or absurdly incomplete)!

The first reference is not one I know, but its definition is laughably wrong. Lord alone knows where it came from.

The problem with the second reference, the OED2, is that its entry is far too small to cover such a large subject. It's not wrong per se, but it's based on a single example and is woefully inadequate.

I'll check the WT:RFV procedure.


Oh, also: create an account, log in, and sign your talk page contributions and you will have much more credibility. Robert Ullmann 19:48, 19 October 2006 (UTC)
As I have often been cautioned against, do not inappropriately abuse the v-word. Accusing the most prolific editor of it won't get you very far.
Certainly the very rare (or very British?) term "pulses" to mean vegetables is less coherent than "lentils." Since that seems to be the most common one used, it makes perfect sense to describe the dish as using them.

No, a pulse is a specific type of vegetable, such as a pea or a bean. It's a subcategory of vegetable, not a British word to mean all vegetables.

Any pulse, when dried, husked, and (usually) split, is a dahl.

The problem is that you and the other guy are trying to work this out from explanations by people who have clearly got hold of the wrong end of the stick.

A lentil is a type of pulse. It is a lentil whether it has been husked and split or not.

Dahl is any pulse, but it is only dahl when it has been dried, husked and split.

A dahl in the sense of a meal is something made with any ingredient that qualifies as dahl.

Still, you guys obviously have the technology to get your own ways so I suppose I'll just have to leave you to it.


You can use the template {{unreferenced}} to request sources for "pigeon pea" (if they haven't since been added.) Use {{rfv}} as described above. (edit) 19:53, 19 October 2006 (UTC)
--Connel MacKenzie 19:50, 19 October 2006 (UTC)