July 2011

How to treat participles on Wiktionary

I'd like to continue the discussion started above at Inflected German participles, but this time not only for German, but cross-linguistically, since it turned out to be a problem that concerns many languages. OK, so the basic question is how participles are best treated on Wiktionary. An example of participles in English would be playing as the present participle and played as the past participle of play. Let me sum up the previous discussion. Traditionally, participles are treated as verb forms, so they normally appear in inflection tables of verbal infinitives (see here for German spielen), in Wiktionary too (German, Dutch, French...). The tricky point is: participles are often used as adjectives in sentences (and can then be declined like normal adjectives). This goes so far that, for example, the German present participle cannot be used as a verb, only as an adjective (or adverb). This might apply to other languages and really calls into question whether such participles should be put under "Verb" headers (as is currently done in German and probably most other languages), and even whether they should appear in verb inflection tables.

All in all it seems that the current "German" way of treating participles is rather bad. I know of two other possible solutions. One was proposed by Dan Polansky above. When participles only appear as adjectives (such as German present participles) they don't get a Verb but an Adjective header. When participles are used both as verbs and as adjectives (such as most German past participles), they get both headers. Personally, I think this solution makes sense except for two problems: First, for almost any verb we'd have a verb as well as an adjective section for its past participle. To me this seems redundant, but I also understand the contrary attitude that it's more clear-cut. A more serious problem would be that there appear to be cases where participles are used ambiguously so one cannot tell for sure whether they are verbs or adjectives -- e.g. German das Haus ist gebaut, Dutch het huis is gebouwd (thanks to CodeCat), French Il a sucré son café, puis a bu le café sucré (thanks to Lmaltier). If it's true that participles are something in between verbs and adjectives here, another solution might be appropriate, and that solution is already being used in Latin. For this language, there's a separate Participle header which subsumes the different Latin participles. See āctus for an example. What's the downside of such an approach? As I said, participles can be inflected, and such inflected participle forms (such as āctī) are also under a Participle header. This misses the fact that ~~those forms are completely unambiguously used as adjectives~~ (ambiguous cases can still be inflected, as in Spanish la casa está construida, thanks CodeCat), and "participle" is probably not a proper part of speech either.

That's quite complicated, and if anything's unclear or if I got something wrong, I'm looking forward to your comments. So, how do participles behave in other languages? How are they treated on Wiktionary, and do you think it makes sense? What do you think about the Latin way? Is there possibly a uniform way to represent participles on Wiktionary independent of language, or should we continue to have language-dependent ways of treating them? But, as I said, all the current ways I know of have flaws. At the end of the discussion, of course I'd like to have a good solution for German, but if other languages benefit, so much the better. Longtrend 10:37, 9 July 2011 (UTC)

The inflected forms are not always unambiguously adjectives either. In French, for example, when a past participle is used to form the perfect tense, it still inflects based on the gender and number of its direct object. So they could arguably be considered 'declined verb forms'. --CodeCat 11:40, 9 July 2011 (UTC)

Thanks for the notice, I missed that. In languages that inflect for case, it would be more appropriate to say "non-nominative past participle forms are used unambiguously as adjectives". Longtrend 12:11, 9 July 2011 (UTC)

That may not be accurate either. In early Old Norse, the agreement in the perfect tense was actually the accusative, which later became specifically neuter accusative, but still agreed in gender in earlier texts. This example is found in Völuspá (the agreeing participles are blandit and gefna): hverir hafði lopt alt lævi blandit, eða ætt iotuns Óðs mey gefna, with the first agreement being neuter nominative/accusative, but the second is feminine accusative. This is because the combination of participle and object was still considered an object of 'to have' in that language, and was therefore placed in the accusative case. That is, a sentence like 'I have painted a door' was not distinguished grammatically from 'I have a painted door' or 'I have a door painted'. --CodeCat 12:33, 9 July 2011 (UTC)

That's interesting. I think I'd better not try another generalization :) But your example seems to be a strong argument in favor of the thesis that participles are (or can be) something in between verbs and adjectives -- that is, if we are going to treat participles uniformly across languages; otherwise it's at best an argument for Old Norse. Longtrend 12:57, 9 July 2011 (UTC)

Hungarian has present, past, future, and adverbial participles. The Etymology section contains the information that this entry is the participle of a verb. There can be adjective and noun sections to illustrate the appropriate usage and declension. See for example nevelő, the present participle of nevel (to educate). --Panda10 13:11, 9 July 2011 (UTC)

Latin participles also come in past, present, and future, and have voice (active or passive) as well. There are some Latin participles that were used as adjectives, but since Classical Latin did not always clearly distinguish between adjectives and nouns (they had the same inflectional endings), this means that some participles were used as substantive nouns. In fact, the future passive participle eventually came to replace gerunds and infinitives to function as a noun. However, it still had a verb function in the passive periphrastic conjugation, and was never used in the nominative (you had to use a verbal infinitive for that). In other words, the situation was rather complicated as to what part of speech these things were. For Latin, we've chosen simply to recognize "Participle" as a separate part of speech because it simplifies everything. Other languages are free to make a similar choice in how they handle their parts of speech, but I don't think there's a single way to handle everything that will work across all languages. --EncycloPetey 14:12, 9 July 2011 (UTC)

We should not invent anything: words should be addressed according to the traditions of each language. In French, it's clear that participles are verb forms, not adjectives, and that adjectives are neither participles nor verb forms. I provided an example of a sentence with an ambiguous meaning. This sentence shows that this is not always an easy distinction, and this is a good reason to make it as clear as possible here; it is not a reason to blur the difference. Lmaltier 15:22, 9 July 2011 (UTC)

"words should be addressed according to traditions of each language" -- so in your opinion, we should treat German present participles as verb (form)s, even though they are never used as such, just because they are traditionally regarded as verb forms? "In French, it's clear that participles are verb forms, not adjectives" -- how come past participles inflect for gender in predicative use then, a behavior you can only find in adjectives otherwise? Longtrend 12:03, 10 July 2011 (UTC)

French past participles are inflected in some cases, yes, but this does not make them adjectives. Actually, I think that the distinction between participles and adjectives is exactly the same in English and in French. I also think that all German verbs have compound tenses, and, therefore, that all German participles are actually used as verb forms. Am I wrong? Lmaltier 13:56, 10 July 2011 (UTC)

Something that's still not clear to me is just when something is a verb and when it's an adjective. I can understand that finite verb forms are verb forms... but what about non-finite forms? Why are they verb forms? Etymologically they are often not verb forms at all (like in the Old Norse example; Romance participles have a similar history), so why do we call them verb forms now? --CodeCat 14:11, 10 July 2011 (UTC)

Because (1) we speak English, (2) English has become less inflected and so its grammar has changed, (3) the original categories for parts of speech were set up by the Romans and Greeks, and (4) we have a better understanding of grammar in the 21st century. --EncycloPetey 14:29, 10 July 2011 (UTC)

That still doesn't answer my question though. Why are they verb forms now when they were not originally? What about them makes us consider them verb forms? --CodeCat 14:35, 10 July 2011 (UTC)

In English, our classification of -ing forms and -ed forms and specific senses thereof depends on such things as whether there is a corresponding base form, and whether the forms behave like adjectives or nouns. The verb form is assumed to exist because it is hardly ever possible to find such forms never modified by any adverb. If derived from transitive verbs, they usually take complements just like other forms of the verb.

The conversion process of denominal verbs seems to sometimes begin with -ing and -ed forms. For example, one can be coffeed out or coffeed up, but instances like "He coffees himself up every morning" are more rare.

The answer to the question seems to be simple: when you think of the verb when using the word (when you think of the action expressed by the verb), it's a participle, a verb form (even when an ellipsis blurs this fact); when you don't think of any action, only of a characteristic of the thing (not of how the thing got this characteristic), then it's an adjective. See adjective and verb for definitions. Lmaltier 16:09, 10 July 2011 (UTC)

@Lmaltier: I still don't quite understand your analysis of French participles. You probably know what I was talking about, but to be sure here's an example: Le café est sucré vs. La sauce est sucrée (excuse me if those sentences are wrong -- I just have some very basic knowledge of French, but you get what I mean). Correct me if I'm wrong, but sucré(e) behaves just like an adjective and nothing like a verb here -- you can perfectly well replace it by a "proper" adjective but not by a "proper" verb. So what makes you think it's a verb other than 1) tradition and 2) the fact that it's obviously derived from a verb (which is not sufficient, as Dan Polansky convincingly demonstrated above -- the fact that in English almost every verb can be "agentivized" (for lack of a better word) by -er doesn't make the new forms verbs)? As for German: yes, you are wrong in your assumption that all participles are used in compound tenses. Present participles are never used in such constructions. In English, I perfectly agree with the analysis that present participles are (or can be) verbs, since there are such cases as I am playing -- however, there is no equivalent form *Ich bin spielend in German, nor any other complex verbal construction with a present participle. Longtrend 17:00, 10 July 2011 (UTC)

You misunderstand me. In your examples, used alone, they are not verbs: very clearly, both sucré(e) are adjectives. They refer to a characteristic of the thing. In Il a sucré le café or (passive form) La sauce a été sucrée avant d'être servie, it's also very clear that they are not adjectives, they are verb forms. The same applies to present participles (this is an easy case, as present participles are never inflected in French: when they can be inflected, then the words are not present participles, they are adjectives). For German, I was thinking of past participles. But, for German too, I think that the criterion should be: do you think of the action expressed by the verb or not? The difference between an adjective and a verb is not related to a suffix or anything of the kind, it's related to how it is used and what is meant by people using it; do people want to use the verb (to refer to an action), or do they want to use an adjective (to refer to a characteristic)? Lmaltier 18:43, 10 July 2011 (UTC)

Your criterion is semantics, which is not valid. Expressing an action is neither a necessary nor a sufficient condition for being a verb. Verbs can also express characteristics ("shine") and nouns can express actions (just take the word "action") -- whether in some language there are adjectives that express actions I can't tell, but probably there are. We define parts of speech not semantically, but syntactically. Back to Le café est sucré, couldn't that also be a passive sentence (perhaps continued by "par...")? In this case the participle could be analysed as a verb, couldn't it? Longtrend 19:14, 10 July 2011 (UTC)

But the definition of verbs and adjectives includes important semantic considerations! If you forget them, you won't be able to make the distinction in difficult cases. Of course, some verbs are not action verbs, but they probably don't cause problems. You are right: in Le café est sucré, sucré is an adjective, but in Le café est sucré par mes soins, it's a verb. It's exactly like sugared in English. Lmaltier 19:30, 10 July 2011 (UTC)

Many East Asian languages have verbs that express states or properties rather than actions, as does Esperanto ("mi estas blua" and "mi bluas" both mean 'I am blue', "mi estas bluinta" means 'I have been blue'). Several old Indo-European languages also have stative verbs, which are semantically very much like a copula and a participle in English. --CodeCat 19:40, 10 July 2011 (UTC)

English and French too have verbs that express states. But are there examples of an unclear status (verb (participle) or adjective?) for these verbs? Lmaltier 19:53, 10 July 2011 (UTC)

There is when dealing with Latin. There is a whole set of Latin deponent verbs whose meaning can only be conveyed in English using adjectives. A Latin scholar would identify the Latin translation as a verb, but only because it has verb endings and not because of any functional or semantic distinction. Latin participles are likewise not always verbs, but primarily for the reason that they take the endings of an adjective, inflecting for gender, which Latin verbs don't do. And yet the "participial form" is listed as a verb form in most texts and conjugation tables, and forms part of certain compound conjugations. So, in Latin the "verbness" of a participle comes from its tense and context, but its "adjectiveness" comes from its gender and inflectional endings. --EncycloPetey 20:21, 10 July 2011 (UTC)

AEL

· [de-indenting] I agree with Lmaltier about sucré. Let me give a similar example, but in English. Take the sentence “At 3:00 PM, the window was closed”: it can mean either “At 3:00 PM, someone closed the window”, or else “At 3:00 PM, the window was not open”. When it has the former sense, it's a use of the participle: “was closed” is just “closed” cast into the passive voice. When it has the latter sense, it's a use of the adjective: “was closed” means “was a closed window”. The important point is that this ambiguity is specific to the word closed. English has a lot of participial adjectives, but it also has a lot of participles that do not double as adjectives. “At 3:00 PM, the window was opened” has only one meaning. (The analogous alternative meaning would be expressed as “At 3:00 PM, the window was open.”) So it's hard to imagine a solution that uses just a single POS header for words like closed: even though participles are often called "verbal adjectives", we still must distinguish between those that double as real adjectives and those that do not. The former clearly need an ===Adjective=== POS header in addition to whatever POS header the latter have; and I think it's clearly a bad idea to use ===Adjective=== for words like "opened".
· I agree also with Lmaltier that we should generally follow language-specific traditions. That doesn't necessarily mean following two-hundred-year-old theories of grammar; there are current active linguistic traditions for all of these languages. If all of the linguists working on German describe the present participle as a verb form, then we should at least figure out why that is, before just deciding that we know better!
--Ruakh_TALK 20:46, 10 July 2011 (UTC)

Thanks for your input. Actually, I agree with you on almost all points. sucré was probably a very bad example to argue for my position, since it has developed a new adjective meaning and usage independent of the participle. Just like closed, it falls under the category of what I dubbed "lexicalized participles" in the initial discussion above (which, surprisingly for me, seemed to be rather unintuitive to many). I absolutely agree with you that such lexicalized participles indeed need two sections -- one Adjective section for the lexicalized usage, and one for the participle, and I think we only need to discuss the latter, since (from my point of view) in many cases it's really unclear whether we are dealing with verbs or with adjectives here (or perhaps even with ===Participle===s?). As an example, imagine English had gender, and in the sentence At 3:00 PM, the window was opened the word opened agreed in gender with the subject window. Would we still be so sure that opened was a verb? It would be declined for gender, after all. It's more than a hypothetical situation; this is exactly what we find in Spanish and French and probably many other languages: La respuesta está obviada "The reply is avoided" -- obviada has feminine gender here, which comes from the feminine respuesta, and as far as I know it is not the case that "obviado" has developed adjectival meaning and usage. So what about cases like that?

As for current German linguistic tradition, it's certainly not the case that all linguists describe the present participle as a verb form. It's what you learn at school, and in many cases present participles are listed in verb conjugation tables. For example, the Institut für Deutsche Sprache describes past participles as inflected forms of elements of the word class verb, and present participles as adjectives formed from verbs by word formation. canoo.net, on the other hand, lists present participles in its grammar as non-finite verb forms, but then says that "all present participles have the form and the function of adjectives" and also lists them as adjectives in its dictionary (e.g. spielend). I'll see if I can consult some printed grammars. Longtrend 09:16, 11 July 2011 (UTC)

@Longtrend, re: verb forms and gender: A word form's having a gender that matches the subject of the sentence does not speak against the form's being a verb form. Czech simple past tenses of verbs show the gender of the subject of the sentence, as in the verb dělat (to do) with its masculine simple past tense dělal, its feminine simple past tense dělala, and its neuter simple past tense dělalo. The same thing is seen in Russian, in its делать, де́лал, де́лала, and де́лало. Unlike these languages, German simple past tense machte does not show gender. --Dan Polansky 10:06, 11 July 2011 (UTC)

Sorry, I was unclear here. Of course claiming that verbs cannot inflect for gender would be wrong. My point is that in the languages under consideration, verbs do not otherwise inflect for gender (I hope this is correct), except for the dubious cases of participles, so we'd have to assume that for some reason verbs inflect for gender in that kind of construction and only there. But maybe that's not too good an argument, since in Czech gender agreement on verbs only seems to happen in simple past forms, too. Still: even if there is no strong evidence that we are dealing with adjectives here, is there any evidence that they are verbs? Or is it possibly adequate to say that participles in such positions are something "in between"? Longtrend 10:24, 11 July 2011 (UTC)

Czech forms that have a similar function to English past participles (called Czech "passive participles" per W:Czech conjugation, for whatever reason) show gender just as Czech simple past tense forms do: dělán m, dělána f, děláno n, of dělat. They resemble their corresponding adjectival forms: dělaný m, dělaná f, dělané n. For example, "je dělán" corresponds to German "wird gemacht" and English "is made" or "is being made". --Dan Polansky 11:33, 11 July 2011 (UTC)

@Longtrend, re gender: I see no contradiction whatsoever in saying that a participle (a "verbal adjective", as they're often called) is a non-finite verb form that (often) has various adjective-like properties, including (often) agreeing in gender/number/case/definiteness/etc. with a modified noun. And there's no need to imagine a hypothetical English-With-Gender; in actual English, verbs do not agree with their subject at all ("I/we/you/he/she/it/they went") -- except for present-tense verbs, which display a bit of agreement, and be, which displays a bit more agreement. Do we therefore say that be is a different part of speech -- say, ===Copula=== rather than ===Verb=== -- and that present-tense verbs are a weird in-between form that has properties both of a ===Verb=== and of a ===Copula===? --Ruakh_TALK 12:20, 11 July 2011 (UTC)

As you already said, participles are sometimes called "verbal adjectives", and some experts don't even give a POS for them but say simply that they are "lexical items" that have "characteristics and functions of both verbs and adjectives" (see here). So discussing a ===Participle=== header is not as absurd as your analogy with English present tense verbs suggests (nobody doubts their verbal status). Of course inflection is only one criterion; there are other criteria that solidly confirm that English present tense verbs are verbs, such as position in the sentence. But I still don't see any such criteria for past participles, let alone for present participles. I could agree very well with the approach of treating participles as verbs if they are used to form complex tense or voice constructions. This is the case in English with both present and past participles, so personally I would not change anything about the "English way" (unless we are going to find one solution for all languages). But this doesn't help for German present participles, since they are used neither as stand-alone verbs nor to form complex constructions. So what are they? Longtrend 16:57, 11 July 2011 (UTC)

If, when you use them, you think of the verb, of the meaning of the verb, and you feel you are using the verb, then they are verb forms. In French too, the phrase adjectif verbal is used by some authors, but it's misleading, because they are not verbs at all; their only relationship with verbs is etymological. And these authors don't use this phrase for participles... Lmaltier 18:15, 11 July 2011 (UTC)

That doesn't always work either. When I think of verwarring in Dutch, I definitely think of verwarren. The form with -ing is very predictable like this in Dutch. But it's not a present participle like in English, it's a verbal noun. I've never heard of this form being considered a verb form at any time, but I still think of the verb when the word is mentioned. --CodeCat 18:23, 11 July 2011 (UTC)

Well, in some languages (Bulgarian...), such forms, even nouns, are traditionally mentioned in conjugations. This is why traditions of the language are important. Your reference is right when stating that participles share characteristics of verbs and adjectives. Actually, they are verb forms with some characteristics of adjectives. But it's wrong when stating "In English, participles may be used as adjectives" (cf. opened, see above). Lmaltier 18:29, 11 July 2011 (UTC)

Are there participles that cannot be used as adjectives? Or can all participles behave as adjectives in all languages that have them? It seems more economical to me to say 'participles are adjectives that may sometimes be used as verb forms' than 'participles are verb forms that can always be used as adjectives'. --CodeCat 18:52, 11 July 2011 (UTC)

I just answered: opened is a participle, and is not an adjective. And, in French too, corresponding adjectives don't exist for all participles: they're rather common, but not systematic at all, for past participles, and much less common for present participles (note that, for present participles, derived adjectives often have the same pronunciation as the participle, but not the same spelling, e.g. intriguant is a participle, intrigant is the adjective derived from the participle). Lmaltier 19:07, 11 July 2011 (UTC)

I'm sorry, that's not what I meant. By 'adjective' I meant 'showing adjective-like behaviour', not necessarily having 'adjective' as its part of speech. Opened can be used like an adjective: the opened door. So my question is, are all participles able to be used as adjectives? Are they all able to be used in non-adjectival ways (which apparently implies 'as a verb form')? --CodeCat 19:10, 11 July 2011 (UTC)

By that approach, we might as well list all words as ===Adjective===, since all words show adjective-like behavior. --Ruakh_TALK 19:20, 11 July 2011 (UTC)

@Lmaltier: You honestly think that English participles cannot be used as adjectives? So all these are wrong? Do you have any reason for asserting that apart from your "emotional" analysis? If not, what is that analysis you're proposing based on? Even if we accept a semantic analysis, it's really fuzzy. When I say watcher as a nominalization of watch, I certainly think of the action expressed by the verb. So, is watcher a verb in your opinion? All the syntactic evidence suggests it's a noun, and we treat it as a noun. Longtrend 19:24, 11 July 2011 (UTC)

In my opinion, in the opened door, opened is used as a participle, not as an adjective. I think that it can be considered as an ellipsis for the door which has been opened. But you probably know better than me.

@ CodeCat: I already answered your first question just above. I add that, in French, past participles of 100 % intransitive verbs are never inflected, it would be quite absurd to consider that they behave as adjectives. Second question: yes, in French, participles can always be used as verb forms (as they are verb forms). In English too. Most typical uses in French (not the only ones) are in compound tenses for past participles, and in the "en + participle" form for present participles. These forms are clearly verb forms.

About watcher: of course, you don't feel that you use a verb when you use watcher, you feel that you use a noun derived from the verb. Of course, it's not a verb form. Lmaltier 19:34, 11 July 2011 (UTC)

I just fixed intrigant: I removed the verb form section for French (it was a Tbot mistake). As you can see, considering that participles = verbal adjectives leads to serious mistakes. Lmaltier 19:34, 11 July 2011 (UTC)

Do you care to explain why "of course" watcher is not a verb form but participles undoubtedly are? I'm sorry, but your criterion just seems to be circular and fuzzy. Why does a word belong to a certain POS? Because you feel it. Why do you feel that it belongs to the POS? Because it does. Longtrend 19:59, 11 July 2011 (UTC)

I never claimed that all words directly derived from verbs are verb forms. I even explained that adjectives derived from participles are not verb forms, and that verb forms are not adjectives, even if they share some characteristics. Lmaltier 21:19, 11 July 2011 (UTC)

@ Longtrend: I don't speak German, so it's impossible for me to judge; but there are other things you can look for. For example, in English, a transitive verb's present participle can take a direct object even outside of explicit progressive/continuous constructions: “while heating the milk, continue checking the temperature and consistency”. (There are a few adjectives that take directly construed complements, as in “it was worth every penny”, but that's very unusual among adjectives and absolutely universal among transitive verbs' present participles.) --Ruakh_TALK 19:20, 11 July 2011 (UTC)

Yes, this is possible in German as well. The Institut für Deutsche Sprache already quoted above states, in my translation: "The present participle -- unlike the past participle -- is never used as a part of analytical verb forms but only in contexts where adjectives occur otherwise. However, present participles show a verbal 'heritage' through their valency". So on the one hand, valency is an argument for verb status of present participles, but on the other hand, both inflection and distribution are arguments for adjective status. (Besides, I'm not sure why you accept valency as an argument for verb status of present participles [there are only few other adjectives taking direct objects] but at the same time reject gender agreement as an argument for adjective status of past participles [there are no other verbs inflecting for gender in French or Spanish].) Longtrend 19:46, 11 July 2011 (UTC)

There are also many languages in which past participles can inflect as adjectives even if they are from an intransitive verb. I think Latin is an example, and so is modern Icelandic: hann er kominn (he has come) but hún er komin (she has come), the endings of 'come' differ based on the gender of the subject. This is apparently unlike French (it would literally translate as il est venu and elle est venue), but it just shows how much variation there is in each language. --CodeCat 20:00, 11 July 2011 (UTC)

Sorry if I'm misunderstanding you, but « il est venu » and « elle est venue » are exactly how you say it in French. I guess you're thinking that in French it would be *« il/elle a venu »? Most French verbs form the perfect by using avoir and an uninflected past participle, and that's the case we were talking about above, but a bunch of common ones, including venir, form it using être and an inflected one. (Lmaltier erred when he wrote that "past participles of 100 % intransitive verbs are never inflected", unless he was rounding to the nearest percent. :-) ) Some verbs, by the way, can go either way, depending on syntax or semantics or speaker preference. And some use être and an uninflected past participle, for reasons that make sense if you know French but aren't worth going into if you don't. --Ruakh_TALK 20:46, 11 July 2011 (UTC)

If that's the case, then it seems to me that such a sentence is just a subject, copula and an adjective, much like 'elle est verte'. Venu is simply an adjective that means 'in a state of having come' (also etymologically), parallel to 'in a state of being green'. --CodeCat 20:49, 11 July 2011 (UTC)

Yes, I was wrong, venir is an intransitive verb with an inflected past participle (but I meant verbs that are always intransitive, not 100% of intransitive verbs). What I had in mind was only verbs using avoir, the common case. And, no, in this sentence, venue is not an adjective; no Francophone would consider it an adjective, it's part of the "passé composé" of the verb. Lmaltier 21:10, 11 July 2011 (UTC)

[after e/c] @CodeCat: No, sorry. I see why you would say that, and that may well be the origin of the construction; but in everyday Modern French « elle est venue » can simply mean "she came", without any implication about present circumstances. (And even in literary French, which retains a separate preterite construction « elle vint » for that sense, one can write something like « elle est venue trois fois », meaning "she has come three times", where I think it's a bit farfetched to posit a state of "having come three times". Certainly in English you can't say "the window is open three times".) —Ruakh_TALK 21:20, 11 July 2011 (UTC)

@Longtrend: I'm not rejecting gender agreement as an argument for adjective status, I just don't see it as conclusive. In French and Spanish, it is not only adjectives and sometimes past participles that show gender agreement, but also determiners (la femme, la mujer) and many pronouns (elle, ella; la tienne, la tuya); and many animate nouns come in masculine–feminine pairs that resemble gender agreement (japonais(e)_ADJ → un(e) Japonais(e)_N, japonés/esa_ADJ → un(a) japonés/esa_N). And of course, many Slavic, Afro-Asiatic, and other languages have gender agreement even in finite verb forms, so it's not like it's unheard-of. —Ruakh_TALK 21:57, 11 July 2011 (UTC)

AEL 2

I'm not sure about languages other than English, but in English, there are some simple syntactical clues to tell whether a participle form has split off and become a full adjective. If it can be modified by very, it certainly exists as an adjective (and continues to exist as a participle). You can't say, for example, that the sandwich was *very eaten, that the letter was *very typed, or that the world was *very created. I suspect a similar test would work in French. Would très créé, très dactylographié, or très mangé be acceptable? Of course, this doesn't work all the time because not all adjectives are gradable. Another test is to see whether it can be the complement of certain linking verbs other than be (particularly become), for example he became closed, the movie became interesting, and the muscles became bruised, but not *the letter became typed, *the sandwich became eaten, or *the world became created.--Brett 01:45, 12 July 2011 (UTC)

Yes, the sense of adjective is exactly the same in English and in French. Lmaltier

The test with 'became' only works for English, because in Dutch de boterham werd gegeten (the sandwich became/was eaten) is not just valid, it's very common. The test with 'very' doesn't always work either, because there are certain verbs that indicate a progressive action. These are especially common in Dutch, where they begin with ver- (although not all verbs in ver- have this progressive aspect). In these verbs, very would simply indicate that the progress had continued to an exceptional degree. decomposed is a good example: it was very decomposed. This does not necessarily indicate an adjective, since you could easily imagine that the decomposition process had progressed to a significant degree. There are probably a lot of other verbs like this. I'm not arguing that this means decomposed is a verb form in such cases, I'm just saying that the test is ambiguous. —CodeCat 10:26, 12 July 2011 (UTC)

As I said, I was making the specific point for English, but it seems likely that, in Dutch or other languages, there would be certain modifiers that will modify verbs and not adjectives or adjectives and not verbs. It might not be the equivalent of very, but there may be something. Similarly, while the Dutch word for become may take both verbs and adjectives as complements, there is likely some verb that will take only adjectives (or AdjPs) as complements.--Brett 11:09, 12 July 2011 (UTC)

I know nothing about Dutch, but would de boterham schijnt/lijkt gegeten be grammatical?--Brett 12:33, 12 July 2011 (UTC)

It would be grammatical even though it sounds a little strange, mostly because people would not say it that way. Dutch has a separate verb opeten which is used when something is eaten completely. It's also more usual to add te zijn after 'schijnen' or 'lijken' and an adjective: de boterham schijnt/lijkt opgegeten te zijn (the sandwich seems to be eaten up), just like de boterham schijnt/lijkt rood te zijn (the sandwich seems to be red). But de boterham schijnt/lijkt gegeten is not really wrong, because people will understand 'gegeten' as an adjective. —CodeCat 12:52, 12 July 2011 (UTC)

That's true in English as well; participles can productively be turned into adjectives. (Just as you can reply to "Are you inside yet?" with "Very inside", even though "inside" is a preposition rather than an adjective and "very inside" doesn't have a single specific meaning, you can reply to "Is it eaten yet?" with "Very eaten", even though "eaten" is a participle rather than an adjective and "very eaten" doesn't have a single specific meaning. For example, it could mean that even the crumbs got eaten; or it could just mean that it was eaten a long time ago: "Am I too late? Is the cake eaten yet?" "Very eaten. You're about a week too late." That doesn't mean that eaten is normally an adjective, only that participles can be stretched into use as adjectives.) —Ruakh_TALK 13:37, 12 July 2011 (UTC)

The discussion is going a bit in circles right now. If they can be used as adjectives in all cases (not including cases that some 'known' adjectives lack, such as comparison), then why are they not adjectives after all? It doesn't really matter if they have extra properties that most other adjectives don't. Do they meet all the minimum requirements to qualify as adjectives? —CodeCat 14:45, 12 July 2011 (UTC)

All words can be used as adjectives. The point of parts of speech is not "is it remotely possible to use this word in this way?", but rather, "is this how this word is normally used?" It is possible to press participles into service as adjectives, and this is a fairly productive process: plenty of normal adjectives (tired, interesting, closed) began life as participles. But most participles are not normally used this way. —Ruakh_TALK 14:58, 12 July 2011 (UTC)

Historically it's actually the opposite. The oldest participles in English actually began life as adjectives and only later became used as verb forms. Proto-Indo-European had no periphrastic tenses (or even tenses at all!), and even in Proto-Germanic participles were still mostly adjectival (compare the Old Norse and Icelandic examples above, which closely reflect the PG situation). I realise this doesn't really change the situation for English as it is currently spoken, but it does point out that the question of 'which was first' is definitely 'adjective'. The productive process eventually came to be reversed, but it was not always so. I think if you go back far enough in history, you'll find that many old English participles were originally adjectives, then became participles, and (maybe?) had adjectives formed from them again. —CodeCat 15:05, 12 July 2011 (UTC)

You'll forgive me for not just taking your word for that, given that you also think that participles today are definitely adjectives. Just because they're not used in any periphrastic verb constructions, doesn't mean they're not verb forms. (I'm certainly not saying you're wrong. I'm just not confident that, if I knew more about those languages, I would agree with you.) —Ruakh_TALK 15:55, 12 July 2011 (UTC)

In PIE, the distinguishing feature between verb forms and verbal adjectives is that the former are based on aspect stems (stative, perfective and imperfective) while the latter are based directly on roots. Strictly, only aspect stems form verbs in PIE, since they are conjugated while roots are not (unless it's an athematic root verb, but those are rare). The English weak past participle and the Latin perfect passive participle both derive from a verbal adjective in *-tos which was attached directly to the root and had no aspect-forming infix originally. Irregular weak participles like brought are still remnants of that. —CodeCat 16:11, 12 July 2011 (UTC)

@Ruakh: Isn't there a third group of "original" participles between those that you just mentioned (participles that cannot be used as adjectives or just in such a way that all words can, and lexicalized participles -- tired etc. -- that are now true adjectives independent of the original verb): participles that are regularly used as adjectives and are not in any way peculiar in such constructions? I'm thinking of such cases as the opened window (I'm not even sure whether this is grammatical -- please correct me if it's not!). It is not lexicalized as an adjective here (compare open), but it's not just a weird way to use an adjective either (compare *the cried child). Longtrend 15:11, 12 July 2011 (UTC)

I wonder why 'cried child' is strange but 'fallen child' is fine, especially since both cry and fall are intransitive. There must be something inherent in the meanings of these participles that makes them different somehow. Maybe some participles like fall are active by nature while cried is passive? —CodeCat 15:14, 12 July 2011 (UTC)

Is fallen child really acceptable in the sense "child that fell"? Or is it rather only acceptable under a lexicalized interpretation of fallen? Longtrend 15:24, 12 July 2011 (UTC)

@Longtrend: I believe "the opened window" is a reduced passive; you can also say "the just-opened window", for example, meaning "the window that had just been opened", or "the next-opened window", meaning "the window that had been opened next". It's not really an adjective; you can't say *"the very opened window", even though semantically that would make sense. —Ruakh_TALK 15:55, 12 July 2011 (UTC)

Okay, I think this makes sense for English. In German there is the exact same kind of construction (das geöffnete Fenster) and you can also say das gerade (just) geöffnete Fenster but not *das sehr (very) geöffnete Fenster. Here, however, the participle inflects just like an adjective. That is, unlike in the discussion we led above, it doesn't just take one category typical of adjectives (gender), but inflects according to a whole adjective paradigm. Would you still say the participle is a verb there, given that info? Longtrend 16:35, 12 July 2011 (UTC)

Yes, that's what I'd say: lexically speaking, it's a non-finite verb form, and grammatically speaking, it differs in consistent ways from true lexical adjectives, so it's best thought of as a ===Verb=== rather than as an ===Adjective===. But I'd say it very cautiously, doing my best to make very clear that (1) this is my tentative opinion based on almost no knowledge of the language at all and (2) I mean, I'm not a linguist or anything. I'm just doing my best to understand what linguists have figured out. —Ruakh_TALK 17:28, 12 July 2011 (UTC)

Okay, I appreciate your assessment anyway. What I don't like about that solution is that we'd weirdly have an adjective declension table under a Verb header. I wouldn't even know how to handle this. Longtrend 14:04, 14 July 2011 (UTC)

When the word is not an adjective, it's not an adjective declension table, it's a verb form declension table... This may be included in the conjugation table. Lmaltier 19:46, 15 July 2011 (UTC)

In Greek, μετοχή is one of the ten parts of speech, at least according to school grammars. Its special character of being something that shares qualities of both verb and adjective makes it worth distinguishing it from other POS. On el.wiktionary we follow this distinction and use μετοχή as an L3 header for Greek words. I see that there is in use a Participle L3 header for "some Russian, Lithuanian, and many Latin entries" (Wiktionary:Entry_layout_explained/POS_headers). So I think that we could also discuss the possibility of a more extended use of this header. --flyax 15:29, 12 July 2011 (UTC)

That's what I originally considered the best possibility (or rather after Prince Kassad's comment in the initial discussion) since there appear to be cross-linguistic problems of assigning participle forms to parts of speech, but at the moment I tend to a language-specific approach (I'll give my arguments later). That doesn't mean, though, that it's impossible that more languages use a Participle header, let alone that the header is wrong for the languages that already use it. Longtrend 15:39, 12 July 2011 (UTC)

Since this discussion is currently inactive (thank you all for your contributions!), I'll try to sum it up and draw my personal conclusions from it. If there is one thing that we all agree on, I think it's the fact that the matter is very complicated and not easy to handle. Put more concretely, it is not desirable to simply have a linguistically universal Participle header for everything that is traditionally called a participle. Even if there seem to be cross-linguistic problems of assigning participles to a POS, each language should be considered separately and carefully.
For German, after this discussion and checking out some grammars, my personal impression is that the introduction of a Participle POS header should be taken into consideration. I'll give my arguments for that impression, which might also be relevant for other languages.

First of all, it should be questioned whether different kinds of participle in one language even form a more or less homogeneous class, or if they should be treated separately: e.g. for German, should present (pr.p.) and past participles (pa.p.) be treated the same or differently? Opinions differ slightly here: Peter Eisenberg's grammar Grundriss der deutschen Grammatik treats only pa.p. as non-finite verb forms, but pr.p. as adjectives. But most grammars agree in putting pr.p. as well as pa.p. into the same class (mostly non-finite verb forms). There is an interesting article by Heinrich Weber (unfortunately in German) discussing the classification of German participles on the basis of twelve criteria that help distinguish verbs from adjectives (such as including a verbal lexeme, governing accusative and/or oblique cases, usability as an adverbial, gradability). He comes to the conclusion that of those, pr.p. and pa.p. have eight characteristics in common, pr.p. and the infinitive six characteristics, pa.p. and infinitive also six, but only five common characteristics for pr.p. + adjectives / pr.p. + finite verbs and four common characteristics for pa.p. + adj. / pa.p. + finite v. So present and past participles have more characteristics in common both with each other and with the infinitive than with either finite verbs or adjectives. This is an argument in favour of treating German pr.p. and pa.p. basically the same, whatever that solution may look like.
So what header should we use for German participles: Verb, Adjective or what? All grammars I checked out agree that pa.p. are to be treated as a verb form, but that most can also be used as an adjective. For pr.p., there is less of a consensus: For Peter Eisenberg and the Institut für deutsche Sprache (IDS), pr.p. are not (non-finite) verb forms but adjectives that are merely formed from verbs. All other grammars I know of classify them roughly as verb forms, but some then weirdly say that they are used only as adjectives (such as canoo.net or the Duden-Grammatik which states that pr.p. aren't conjugational forms of verbs). Since pa.p. are also used to form complex tenses, I think we can agree that putting both pr.p. and pa.p. solely under an Adjective header makes no sense.
What solid arguments are there against using a Participle header for German (for both pr.p. and pa.p.)? Traditionally, "participle" was considered a separate part of speech. This has changed, now they are often regarded either as verbs or as adjectives, so this might be an argument against the Participle header. But I believe this to be simply due to basic differences between grammars and such dictionaries as the Wiktionary. We here at Wiktionary are forced to assign each word form to a POS. This is not the case for grammars. If we can't decide on a POS after considering all relevant aspects, why not recognize that what we need may be a separate POS? It might seem that pr.p. in German can be perfectly treated as adjectives, according to syntactic distribution and morphological inflection. But then they govern arguments like verbs, are generally not prefixable by un- or gradable, etc. They simply don't fit either category. And the same is true for pa.p., which might seem to be clearly verbs. But then they can be used attributively, decline like adjectives, are sometimes governed by other verbs (unlike finite verbs, but like adjectives), etc. Let's assume we use a Verb header for German participles despite the adjectival characteristics. How would we solve the dilemma of needing to have an adjective declension table under a Verb header?

For those reasons, it seems to me that introducing a Participle header would be the best option for German. We could put declension tables there without a contradiction (as there would occur for declined "verbs") and at the same time link to the verbal origin. Just for clarification, lexicalized participles such as wütend or verrückt that are now true adjectives would of course be unaffected. All those who disagree with me: on which points exactly do you think I'm wrong or I drew wrong conclusions? I'd be very glad to hear your comments, especially since I really want to reach a consensus. I'm well aware that introducing a new POS to a language needs more justification than keeping the status quo -- but the status quo in this case is not an option, since currently we have no way at all to treat declined participles (AFAIK there is not a single such entry on Wiktionary yet). Longtrend 18:54, 15 July 2011 (UTC)

I imagine Dutch will be treated the same, because its participles are more or less identical to German ones. Is the situation for the Romance languages much like German as well (apart from the fact that they show gender agreement in predicates, which German doesn't)? —CodeCat 19:04, 15 July 2011 (UTC)

French and Spanish (the only Romance languages I speak) differ from German in important ways: (1) French distinguishes blatantly and obviously between present participles, which are very restricted in their uses and which do not inflect for gender or number, and adjectives derived therefrom, which are normal adjectives and often spelled differently from their participles; (2) Spanish has two different constructions that could be called "present participles", of which one (the gerundio; here we call it the "adverbial present participle") is considered to be a verbal adverb and does not inflect for gender or number, and the other (the participio presente; dunno if we have a name for it here) is no longer productive, but rather survives only as various nouns and adjectives; (3) neither French nor Spanish requires the declension tables that have Longtrend so bothered, since their adjectives and past participles inflect only for gender (masc/fem) and number (sing/pl), not for definiteness or position or case. The closest thing to that is Spanish forms like dándo-, which we're currently not worrying about SFAICT, and which anyway are further evidence for ===Verb===ness. Personally I still suspect that ===Verb=== is the way to go for German as well, but many of Longtrend's reasons for using ===Participle=== for German don't apply to French and Spanish anyway. —Ruakh_TALK 20:17, 15 July 2011 (UTC)

Thanks for the analysis and information you provide. It clarifies things much for German. My conclusion is that a Participle POS is no more justified in German than in English or in French. Why? Because specialists call them either verb forms or adjectives. It's possible to treat the declension of adjectives in an adjective declension table, and the declension of verb forms in the conjugation table. Lmaltier 19:57, 15 July 2011 (UTC)

But that would mean that some verb forms can have an adjective declension section. Do we want that? —CodeCat 20:05, 15 July 2011 (UTC)

@Lmaltier: Since when do you listen to specialists' analyses rather than the speakers' emotions? Or is it just because it's a convenient way to prove my point wrong? I already said why I think it is that participles are often treated either as verb forms or as adjectives. Until you respond to my arguments, I see no reason to take over your point of view instead. While responding, keep in mind that experts by no means agree on whether participles are verbs or adjectives. Longtrend 20:12, 15 July 2011 (UTC)

I always said that we should not invent anything (and this is one of the basic principles of the Foundation), and that we should follow specialists, traditions of the language. And verb forms cannot have an adjective declension section, as they are not adjectives. The best place for the declension of these forms is the conjugation table. Also note that I don't propose anything on how to deal with the question in German (this is not easy if opinions differ among specialists, and it's true that a decision should be taken). I only think that, in German, we can do with the verb and adjective POS, according to what you explain. Lmaltier 06:30, 16 July 2011 (UTC)

You still haven't responded to my argument about the difference between grammars and Wiktionary. What linguists do agree on is that German participles have characteristics of both verbs and adjectives, so I have a bad feeling about just squeezing them in one of these groups. (And putting them in both groups would suggest that in one usage they are clearly verbs, while in the other they are clearly adjectives, which does not seem to be the case either.) I don't really see a problem about a Participle header, which to the contrary would solve those problems. This is my impression specifically for German, my proposal would not affect any other languages, since I know too little about them -- we should refer to linguists' analyses there as well. If you worry that Participle is not a proper POS, well, "proper noun", "prefix" and "symbol" aren't either, as is even noted in the ELE. Do you think the Participle POS is inappropriate in Latin, too?
You probably know what I meant by "verbs would have adjective declension sections": this was short for "verbs would have a declension section that would include exactly the same forms as adjective declension templates". And this would be a problem in my opinion. You propose to include those forms in conjugation tables. Just to make sure I understand you correctly: you want to change the verb conjugation template so it includes all the declined forms? But that's declension, not conjugation as the header would suggest. The contradiction remains. Participles inflect for completely different categories than normal verbs. Longtrend 09:56, 16 July 2011 (UTC)

For Latin, I don't know: when I was learning Latin, participles were considered verb forms, but this tradition may be different in different countries, and may change with time. The tradition to be adopted is the one currently used for Latin in English-speaking countries. In French, nobody considers that it's a problem to consider aimée, aimées, aimés as conjugated forms of aimer. I don't see why it is a problem to decline a verb form. Lmaltier 11:19, 16 July 2011 (UTC)

Discussing this topic would be a lot easier if you replied to my arguments and all my questions... Longtrend 12:21, 16 July 2011 (UTC)

Maybe someone else wants to answer my questions and concerns then. To be honest, it's not that important for me to have a Participle header for German. I don't think a Verb header would be really wrong or anything. I just want a solution that works for German (and of course I want that solution to be as good as possible, so I still like the Participle solution best), and I can't imagine how a Verb header could work for declined participles. Any concrete suggestions? Even if so, why bother when "Participle" could do the job effortlessly and is obviously not really wrong (to say the least)? Longtrend 17:12, 19 July 2011 (UTC)

It seems strange to have certain verb forms listed under ===Participle=== rather than ===Verb===, given that we don't generally use different POSes for different inflected forms. I mean, assuming you're still planning on definitions like “Past participle of spielen.”? And I really don't see why the verb's ====Conjugation==== section, at the lemma (infinitive) entry, can't provide all forms of the participle. It just doesn't seem like ===Participle=== buys us anything. —Ruakh_TALK 17:35, 19 July 2011 (UTC)

Thanks for your reply. Well, IMO the advantage of ===Participle=== over ===Verb=== is that with the latter, we would say "this word is a verb" despite all its adjectival characteristics, while with the former we would admit that the issue is more complicated than that. AFAICT, it simply reflects the linguistic facts better.
My problem with listing all participle forms under the infinitive entry is the following: a form like spielendem (dative, masculine or neuter, singular) is clearly declined, not conjugated, it can clearly be traced back to a base form spielend. I'm not aware of any other case where there is a word form which on the one hand is an inflected form of some lemma, but simultaneously serves as the base form for a group of differently-inflected items. The latter in this case is declension, the former (allegedly) conjugation. Or is it just a terminological issue?
I also don't think it's so clear we're talking about "verb forms" here as you seem to assume tacitly. Saying that they are not verb forms but, well, participles, seems to work just fine for Latin, see amāns: the "definition" is a translation, while the "present participle" part is in the Etymology section. The only problem I see is the following: German past participles, unlike present participles, all appear as part of complex tenses (which arguably makes them verbs, at least in this usage). And some intransitive ones cannot even appear in non-verbal positions or be declined, i.e. they show no adjectival characteristics. Of course, we might also use ===Participle=== in such cases and just omit the declension part, but this might fail to capture the fact that what we find here are quite unambiguously verbs.
Oh well. I certainly learned a lot about participles during this discussion, but regarding my initial question "How to list declined participles on Wiktionary?" I'm as perplexed as before. Longtrend 20:17, 19 July 2011 (UTC)

"Tacitly", my foot; I explicitly said I was assuming it, and added a question mark for good measure! Regardless, from everything you've said, it seems clear that German past participles, at least, are certainly verb forms, even if they are also adjective-like. Re: "I'm not aware of any other case where there is a word form which on the one hand is an inflected form of some lemma, but simultaneously serves as the base form for a group of differently-inflected items": It happens. For example, in Hebrew, especially Classical Hebrew, if a verb-form has a personal pronoun as a direct or indirect object, then that pronoun can be incorporated into the verb-form as an additional nominal inflection; tishkakhénu, for example (in Lamentations 5:20; KJV "dost thou forget us"), is tishkákh ("thou dost forget", verb) + -énu ("us", object pronoun), where tishkákh is the second-person masculine singular imperfect/future/prefix-conjugation of shakhákh ("forget", verb). —Ruakh_TALK 04:07, 20 July 2011 (UTC)Reply

Okay, but if Wiktionary's basic policy indeed is "Don't invent anything" (as Lmaltier claims, and as you might assume tacitly? SCNR...), then that's no option for us. Saying that spielendem is an inflected form of spielen (verbal infinitive) rather than spielend (present participle) is something I've never heard before. Compared to that, putting ===Participle=== as the POS is, if at all, just a ridiculously tiny "invention", and definitely not wrong (since they're definitely participles, we just don't agree if it's a proper POS). Longtrend 17:46, 20 July 2011 (UTC)

AEL 3

Sorry for not going through all this tl;dr discussion - has any kind of agreement been reached by now, or are people still arguing about tiny details? -- Liliana • 15:29, 29 July 2011 (UTC)

There's no agreement, but at the moment there's no discussion either. I also don't think we ever argued about "tiny details". There is no established way to treat German inflected participles on Wiktionary, and no imaginable way seems perfect. If you have any input, please feel free to contribute. Longtrend 22:37, 30 July 2011 (UTC)

Just throwing in that on German Wiktionary, they do use POS headers called "Partizip I" (which is German for "present participle") and "Partizip II" (which is "past participle"). However, I couldn't find any discussion that led to the introduction of those headers (didn't search too long, though), and they don't seem to have any entry for an inflected participle, either. Note that inflected participle forms don't appear in verb conjugation tables. Longtrend 16:16, 1 August 2011 (UTC)

I'm not a linguist, but I added some 10,000 Swedish entries to the English Wiktionary, including many of the most commonly used words. In order to get things done in a limited time, I systematically treated past participles as adjectives, giving their role as a verb form in the Etymology section. See for example arresterad, bekräftad, debatterad. This works fine with the existing templates used for Swedish adjectives. So far, this has not been controversial at all. Some future linguist may perhaps argue that these are not actually adjectives, but if they have the time to change my edits, they will find the job easy to automate by the fact that I followed a single pattern. --LA2 10:26, 11 August 2011 (UTC)

Thanks for your input. Sounds like a workable solution, but the problem is that in many languages, including German, (past) participles are often clearly used as verbs, i.e. in compound tenses. Maybe that's not true for Swedish. Longtrend 22:31, 11 August 2011 (UTC)

For Swedish verbs, the form that is used with verbs is the supine, which was originally the neuter form of the past participle. But they are not always identical anymore, since verbs with participles in -en have a neuter form -et but the supine has -it (this distinction is not original, though). —CodeCat 22:47, 11 August 2011 (UTC)

I've just seen this discussion so thought I'd chip in with a note about Luxembourgish. It's pretty much the same as German; the Luxembourgish participle (only one, rather than two in German) can be used as an adjective (either attributive or predicative), but it is also used in many compound tenses. Most verbs in Luxembourgish only have conjugations for the present tense, so for those every other tense (past, future, conditional, etc.) is formed using the participle. Therefore just having the entry as either an adjective or a verb form would be inaccurate. BigDom (t • c) 08:42, 5 September 2011 (UTC)

August 2011

earliest-attestation categories

In such cases as we can say with some certainty — perhaps through research or by appeal to the OED and other authorities — that the earliest a word can be attested is 1922, or circa 1922, do we want to categorize it as such? Say, category:English words first attested 1900-40, with corresponding categories for "...1940-60", "...1960-80", "...1980-2000", "...2000-20", and, working down, "...circa 1900", "...1860-1900", "...circa 1850", "...1800-1850", "...circa 1800", "...1750-1800", and so on (with, for earlier centuries, perhaps fewer than the four categories per century I've envisioned for the 19th and 18th. Anyway, my choice of specific categories here was off the top of my head. I'm asking about the general idea. Too, specific categories will vary by language, with Esperanto, say, having pre-1887, 1887-1904, and other categories, perhaps).—msh210℠ on a public computer 06:07, 4 August 2011 (UTC)

I like the general idea. --Daniel 23:41, 7 August 2011 (UTC)Reply

It would be a lot of work, and we would find that many authorities disagreed on the earliest attestation of a word (I discovered that making this list; for quartz, for example, there's a range of more than a century). On the other hand, it could be quite useful for some things and to some people. I don't oppose the idea. - -sche (discuss) 23:46, 7 August 2011 (UTC)Reply

Special:NewMessages

What's the deal with New messages, which appears in the upper right-hand corner of every page between "My watchlist" and "My contributions"? If I had new messages, wouldn't they be on my talk page? —Angr 13:38, 4 August 2011 (UTC)

We have installed here, for use by those users who want it on their talkpages, LiquidThreads. See, e.g., [[user talk:Yair rand]]. If you post something using that system and get a reply, it will show up on your watchlist as "You have new messages" or something like that with a link to [[special:newmessages]], which latter (is also linked to from the top of each page, as you've seen, and) lists all the replies you've gotten using LiquidThreads. If you want for any reason to hide the "New messages" link atop each page, add #pt-newmessages{display:none!important} to your CSS ([[special:mypage/vector.css]] if you use Vector).—msh210℠ (talk) 15:51, 4 August 2011 (UTC) 16:03, 4 August 2011 (UTC)Reply

OK, thanks. It does seem like it would be better to call it something other than "New messages", since that's exactly what new messages on one's user talk page are called. —Angr 15:57, 4 August 2011 (UTC)

Yes, the whole thing is somewhat poorly executed.—msh210℠ (talk) 16:03, 4 August 2011 (UTC)Reply

Updating anagram format

I've actually only avoided raising this issue as I consider it so minor in relation to other areas where we could make progress. Wiktionary:Votes/pl-2009-12/Modify anagram section of ELE is now out of date as {{alphagram}} displays nothing unless the first parameter isn't a valid page name. So this example:

* {{alphagram|opst}}: [[opts]], [[pots]], [[spot]], [[stop]], [[tops]]

displays in fact

* : opts, pots, spot, stop, tops

That is, an isolated colon with a space either side of it, preceded immediately by a bullet point. I'd simply like to amend this vote to exclude {{alphagram}} and delete it (or RFDO it and let the community make that decision separately). Not necessarily delete it to never come back again, but it shouldn't be allowed to be used whilst it's blank, and Conrad.Bot, which added it in the first place, is inactive, and Conrad.Irwin hasn't said whether he intends to use it ({{alphagram}}) again. --Mglovesfun (talk) 14:07, 4 August 2011 (UTC)
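For readers unfamiliar with the term, an alphagram is just a word's letters in alphabetical order, so every anagram of a word shares the same alphagram. A minimal sketch of that idea follows; the function names are made up for illustration and this is not how {{alphagram}} or Conrad.Bot worked.

```python
from collections import defaultdict


def alphagram(word):
    """Return the letters of `word` sorted alphabetically."""
    return "".join(sorted(word.lower()))


def group_anagrams(words):
    """Map each alphagram to the sorted list of words that share it."""
    groups = defaultdict(list)
    for word in words:
        groups[alphagram(word)].append(word)
    return {key: sorted(members) for key, members in groups.items()}


groups = group_anagrams(["stop", "pots", "tops", "spot", "opts", "play"])
print(groups["opst"])  # ['opts', 'pots', 'spot', 'stop', 'tops']
```

The words under one key are exactly what an Anagrams section would list, which is why the alphagram itself adds little once the list is shown.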

Sounds good to me.—msh210℠ (talk) 15:48, 5 August 2011 (UTC)Reply

Romanizations of languages in ancient scripts

This point has been brought up before but it has never really been properly solved. Many old languages on Wiktionary were written in scripts that are no longer common and the texts in which they appear are more commonly published in romanized form than in the original script. The situation would be as if ancient Chinese texts were now almost exclusively published in pinyin. So although the original script was the only script used in contemporary attestations, modern readers will almost exclusively read texts in that language in Latin script. Grammars and dictionaries are written in Latin script as well, and this is the script that people will most likely want to look up words in. So I think using Latin script as the main script of these languages would have far more practical value for users than the original script ever will. I'm not saying that the words should not be present in the original script, but I would prefer it if we turned the tables: that the entries in original script link to the modern Latin-script versions of the terms. —CodeCat 14:15, 4 August 2011 (UTC)

Sounds like a good idea in principle, but there may be a spectrum with no clear boundary here. For example, Sanskrit is usually written in Devanagari in India, but usually in romanization in the West, so it's not clear which should predominate. (Of course, Devanagari is a script that's still widely used for modern languages too, so that may tilt the tables in its favor.) Some languages' scripts aren't even encoded in Unicode yet, like Tocharian, so everything in Category:Tocharian A language and Category:Tocharian B language is already necessarily in romanization. But I think you have a good point for, say, Gothic and Primitive Irish. No one really goes around reading Gothic script or Ogam nowadays; instead, romanization is practically universal. Definitely worth thinking about. —Angr 14:42, 4 August 2011 (UTC)

To be honest, I'm a bit tired of you bringing up that topic once again. Last time, it was shown that the community doesn't want this; repeating the whole discussion will not change a thing. You can try starting a vote if you really want to push this through, but whether or not this will pass is entirely up to the people here. -- Liliana • 12:39, 5 August 2011 (UTC)

It bears consideration. Earlier this year somebody deleted our once considerable collection of romanized Sumerian, Akkadian, etc. I think these ancient languages that used inadequate or little-known scripts deserve at least the treatment that we permit for Chinese and Japanese. If an ancient language is usually studied in the English-speaking countries using the Latin script, then we should have the romanized spelling just like we do with Pinyin. Whenever entries can be created in the original ancient script (whether cuneiform, Devanagari, hieroglyphics, Mayan logographic script, Linear B, or whatever), then the romanized spelling could be made to redirect to the ancient script. Deleting ancient words that are added in the Roman alphabet was a dreadful loss, and maintaining all the entries in these ancient languages strictly in their lesser used and rather inaccessible scripts makes them not very useful to the people who want to study those languages. —Stephen ^(Talk) 13:24, 5 August 2011 (UTC)Reply

I'm not at all sure about making the Latin alphabet the main script for these languages, as CodeCat suggests, i.e. having the Latin-alphabet entry be the primary one, while the original-alphabet entry merely says "<Original alphabet> spelling of <Latin alphabet>" or the like. But we should definitely have listings for the romanizations of such entries. For example, qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link. —Angr 14:13, 5 August 2011 (UTC)

Last time it came up, I saw no definitive resolution, certainly not that "the community doesn't want this". I find it silly to have entries only in scripts that they literally have never been published in.--Prosfilaes 21:32, 5 August 2011 (UTC)Reply

Romanizations are generally mentions, not uses. However if you romanize a whole text, or a whole series of texts in Gothic, this would be Gothic in Latin script, right? So the uses of the words are uses and not mentions. Am I missing something here? So you don't need an exception to CFI. Mglovesfun (talk) 11:22, 6 August 2011 (UTC)Reply

Yes, you’re missing something here. Ancient dead languages are not like modern living languages. Books and magazines are not being published in Sumerian or Akkadian. The vast majority of the known texts are in the form of images. Words of Spanish are used, but words in Sumerian are only mentioned. A couple of ancient languages are in the process of being revived, which is why we have an Old English Wikipedia; and a couple (Old Coptic, Ge'ez, Old Church Slavonic) are still in limited use liturgically, but most of these languages are simply studied, compared, and referenced. —Stephen ^(Talk) 12:36, 6 August 2011 (UTC)Reply

I agree with Angr: ... qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link. — I support the idea of romanisation of some languages, Gothic and others, for practical reasons. I've been thinking about Gothic script for a while (see [1] and here) but I guess that most users won't be able to type any Gothic characters, so there should be some kind of romanisation. Most dictionaries and grammars use romanised Gothic, so we shouldn't be "more catholic than the pope". --MaEr 14:17, 6 August 2011 (UTC)

I agree with User:Angr and MaEr, "qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link". - -sche (discuss) 23:31, 6 August 2011 (UTC)

I agree too, it makes no sense to have entries that nobody can search for. Romanised entries would be very helpful. BigDom 09:48, 7 August 2011 (UTC)Reply

I don't know if you're missing anything, but it doesn't seem to have been our practice to record Gothic in the Latin script, even though it is so published.--Prosfilaes 18:37, 6 August 2011 (UTC)

Maybe I'm missing something, discussing is not my strong side. — Indeed, it isn't practice to record Gothic words in Latin script in this wiktionary, but in my opinion there should be some romanised entries that link to the Gothic script entries (as Angr suggested). Otherwise users have nearly no chance of looking up a Gothic word.

Imagine you find a word like aþþan in an etymological dictionary: how would you look up this word in Gothic script? --MaEr 09:22, 7 August 2011 (UTC)Reply
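To make the lookup problem concrete, here is a rough sketch of what a romanization-to-script converter would have to do. It assumes one common letter-for-letter romanization and the letter order of the Unicode Gothic block (starting at U+10330); the function name is made up, romanizations that write "hw" for ƕ are not handled, and nothing like this is claimed to exist on Wiktionary.

```python
import unicodedata

# Latin letters of one common Gothic romanization, listed in the order of the
# Unicode Gothic block (U+10330 onward). None marks the two signs that are
# used only as numerals (90 and 900) and have no letter value.
_ROMAN = ["a", "b", "g", "d", "e", "q", "z", "h", "þ", "i", "k", "l", "m",
          "n", "j", "u", "p", None, "r", "s", "t", "w", "f", "x", "ƕ", "o",
          None]

LATIN_TO_GOTHIC = {
    letter: chr(0x10330 + offset)
    for offset, letter in enumerate(_ROMAN)
    if letter is not None
}


def to_gothic(romanized):
    """Convert a romanized Gothic word to Gothic script, ignoring macrons."""
    # Decompose so that length marks (the macron in "ō") become separate
    # combining characters, then drop them: the script does not write them.
    decomposed = unicodedata.normalize("NFD", romanized.lower())
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Raises KeyError for any character outside the romanization table.
    return "".join(LATIN_TO_GOTHIC[c] for c in plain)


print(to_gothic("qinō"))   # the Gothic-script spelling quoted above
print(to_gothic("aþþan"))
```

Even with a table like this, the reader still needs a Gothic font to see the result, which is part of the objection raised in this thread.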

Appendix:Gothic script -- Liliana • 11:39, 7 August 2011 (UTC)Reply

Do you really expect people to copy and paste each individual letter for every word they look up in Gothic?? —CodeCat 12:05, 7 August 2011 (UTC)

No. This is why the Gothic script is featured in the edittools, so you can just click on all the letters you need to create your words. -- Liliana • 12:22, 7 August 2011 (UTC)Reply

If I had to do that just to look up one word, I'd probably find a better dictionary instead... —CodeCat 12:29, 7 August 2011 (UTC)

I would think you could integrate the edittools toolbar onto the Main Page somehow, given good enough JavaScript skills. The setting on WT:PREFS doesn't seem to work for me, but in my opinion, this solution would be much easier to implement than a policy change. It wouldn't be a hassle for readers at all if it were implemented this way, as very many non-Latin dictionaries feature such a system. -- Liliana • 12:34, 7 August 2011 (UTC)Reply

This may work for some scripts like Gothic that are still superficially similar to Latin. But it would not work for cuneiform which is very different. Should we expect students of Hittite to learn cuneiform? —CodeCat 12:49, 7 August 2011 (UTC)

This would not work even for Gothic. If somebody wanted to look up the Gothic words eyz or noicz, how would he be able to transliterate into the Gothic script unless he knew the alphabet? (And, of course, he would need a Gothic font installed or he wouldn’t see anything but boxes.) Many Gothic transliterations are even more cryptic than these two. The edittools are there for our editors, and not really for someone trying to look up a word. People who study ancient dead languages have completely different goals than those who study modern languages, and most of them have little need to learn the ancient scripts, particularly if it is a difficult one like cuneiform or hieroglyphics, and more often than not will carry out most of their studies on words in the Roman alphabet. —Stephen ^(Talk) 13:13, 7 August 2011 (UTC)Reply

It's not a hassle for readers at all to force them to transliterate words into an archaic script that isn't used any more? I think you have a different definition of the word hassle than I do, because I think that would be a PITA.--Prosfilaes 20:04, 7 August 2011 (UTC)

I count six people in this discussion who (seem to) want or tolerate Romanisations, and one person who doesn't want them; I presume more supporters and opponents have commented in other discussions. (I hope one of the other discussions explained why pinyin and romaji are allowed.) So, let's set up a page for a vote (but not start the vote yet), so that we can begin working out how the vote should be set up, and ultimately decide this issue. As to how to set up the vote: I suggest having (on the same page) different votes for different languages, so users can (for example) vote to allow Romanisations of Gothic but oppose allowing Romanisations of Hittite, if they want. - -sche (discuss) 20:35, 7 August 2011 (UTC)Reply

I've created a vote: Wiktionary:Votes/pl-2011-08/Romanization of languages in ancient scripts. —CodeCat 21:34, 7 August 2011 (UTC)

AWB

Hi, I wondered if I could be put on the approval list to use AutoWikiBrowser. I used it over at Wikipedia even before becoming an admin there so I know how to use it. At the moment, I need it to fix a small error I found in the conjugation table {{lb-conj-regular}} which has affected a few pages and using AWB would be far quicker than going through the individual entries. Cheers, BigDom 16:51, 5 August 2011 (UTC)Reply

I don't see a problem with it. Granted. -- Liliana • 16:54, 5 August 2011 (UTC)Reply

Thanks, appreciated. BigDom 16:58, 5 August 2011 (UTC)Reply

Glosses in old languages

Some words in old languages, such as Old High German, are attested only as glosses with translations in foreign-language (usually Latin) texts, instead of in running texts. In theory this is a mention and not a use, so as far as I know those words would fail CFI. However, because of the special situation of old languages, especially those that are sparsely attested, should there be an exception for these cases? —CodeCat 21:46, 5 August 2011 (UTC)

This is also a problem for rare and recently extinct languages and dialects, such as the Vegliot dialect of Dalmatian, for which nearly all information comes from a German translation of an Italian text written by the scholar Matteo Giulio Bartoli and based on an interview with the sole surviving speaker of the language (and he was old, polylingual, partly deaf, and hadn't spoken the language in 20 years at the time of the interview).

For Classical languages and languages known only from scholarly publications, the attestation criteria are normally relaxed. --EncycloPetey 04:51, 8 August 2011 (UTC)Reply

Liliana (Prince Kassad) is a fan of allowing mentions for otherwise unattested languages. I'm not really a fan myself, but I doubt I would actually object to it if it were voted on. But I also doubt I'd support it. --Mglovesfun (talk) 11:00, 9 August 2011 (UTC)

For clarity's sake, I'd object to mentions being allowed for all dead languages, particularly as some dead languages are quite well attested, better attested in writing than some living languages! But for otherwise unattested languages such as Dalmatian, I would neither oppose nor support it (what I said above). --Mglovesfun (talk) 11:01, 9 August 2011 (UTC)

Regional distribution of colloquial terms

For colloquial terms it is often difficult to find information about regional distribution. Evidence for pan-UK and pan-US usage is not too hard, but print evidence for other usage seems more difficult. How can we accumulate evidence on other colloquial use? Is there a tag-and-category arrangement that would help? Can we accumulate votes or opinions somehow, perhaps using an entry's talk page?

AFAICT, we have never had a systematic effort to address this. We have had individuals who advocated particular dialects (Ireland, Canada, Australia, and Singapore come to mind). No catch-all category addresses this problem. Do we need a project page for each region to provide focus for potential contributors who may have some familiarity with a particular region or dialect? What would be good regions or dialects for experimenting? Scotland? Ireland? Australia? AAVE? Canada? Southern US? India? DCDuring TALK 20:21, 6 August 2011 (UTC)Reply

Two entries that illustrate issues are toey (See also WT:RFV#toey.) and stupid-head (which bears an invisible remark in {{attention}}). DCDuring TALK 20:43, 6 August 2011 (UTC)Reply

Reviewing entry Talk pages that contain "black" yields some candidates for an AAVE page (or one otherwise named) that might also improve some definitions. DCDuring TALK 20:43, 6 August 2011 (UTC)Reply

Compound tenses in conjugation templates

I have recently been creating conjugation templates for Luxembourgish verbs; example {{lb-conj-regular}}. I was wondering if there is really any need to have the compound tenses in there, as the patterns don't change between verbs so just showing the relevant auxiliary verb should be enough. Currently, the number of parameters required to use the templates is getting out of hand. This is mainly due to the Eifel Rule, which means that -n or -nn endings are removed if the following word begins with certain consonants. Would anyone object if I removed these tenses and just included the forms that are actual conjugations of the verb, as in the Dutch templates (e.g. {{nl-conj-wk}})? The only problem would be that I would have no idea how to convert the existing template calls into the new form. BigDom 20:34, 6 August 2011 (UTC)
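As a rough illustration of why the Eifel Rule multiplies template parameters, the sketch below applies a simplified version of it to a word given the word that follows. The comment above only says "certain consonants"; the specific set used here (final -n/-nn kept before vowels and before n, d, t, z, h, dropped otherwise) is an assumption taken from the usual textbook statement of the rule, the function name is invented, and real usage has further exceptions that are not modelled.

```python
# Assumption: final -n/-nn is kept before vowels and before n, d, t, z, h,
# and dropped before any other consonant. Simplified; exceptions are ignored.
KEEP_BEFORE = set("aeiouäëéöü" "ndtzh")


def apply_eifel_rule(word, next_word):
    """Drop a final -n or -nn from `word` if `next_word` triggers the rule."""
    if not word.endswith("n") or not next_word:
        # Nothing to drop, or no following word (e.g. end of the sentence).
        return word
    if next_word[0].lower() in KEEP_BEFORE:
        return word
    return word.rstrip("n")


print(apply_eifel_rule("hunn", "gesot"))
print(apply_eifel_rule("hunn", "den"))
```

Because every compound tense puts an auxiliary form in front of another word, each cell of the table can need both variants, which is the parameter growth described above.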

P.S. the new template would look like this, which has parameters to add a line for preterite indicative and simple conditional if needed (only a few verbs have these conjugations). BigDom 23:09, 6 August 2011 (UTC)Reply

You could take a look at the tables on the Galician verb cantar and Latin verb amō, to see how this has been handled for some compound tenses in Romance languages. I'm not familiar enough with Luxembourgish to offer an opinion to you. --EncycloPetey 04:43, 8 August 2011 (UTC)

That's fair enough, not many people are too familiar with Luxembourgish! I had a look at those, and came up with User:BigDom/Template:lb-conj, which is based on the Latin and French templates. BigDom 18:13, 8 August 2011 (UTC)Reply

Including compound tenses in tables is a help to readers, and there is no good reason to exclude it (in a paper dictionary, the good reason would be the paper used). Lmaltier 19:31, 18 August 2011 (UTC)Reply

d, di, de

It is not "bullshit", d and di are the alternatives of de (的) and de (地). Please see Basic Rules of Hanyu Pinyin Orthography Chapter 7.4. And see here and here. Engirst 11:28, 7 August 2011 (UTC)Reply

That site (pinyin.info) does support your claim. How reliable is it? Do any printed reference works make the same claim? - -sche (discuss) 20:42, 7 August 2011 (UTC)Reply

Please see the original Chinese edition Basic Rules of Hanyu Pinyin Orthography (The national standard of the People's Republic of China). Engirst 22:47, 7 August 2011 (UTC)Reply

uhha... What exactly is your point? It obviously says de is the accepted form and the others are secondary (i.e. COULD be used) in 4.7.4, which comes back to my point, why are you putting in translations with secondary forms instead of the accepted primary form? Or maybe you just want to be difficult? Jamesjiao → ^{T ◊ C} 22:52, 7 August 2011 (UTC)Reply

Also please see printed reference works here and here. Engirst 04:36, 8 August 2011 (UTC)Reply

And Chinese Romanization: Pronunciation and Orthography. Engirst 05:01, 8 August 2011 (UTC)Reply

Right, "d" and "di" are clearly secondary forms. Do you think we should use them rather than the primary forms? Why? - -sche (discuss) 05:08, 8 August 2011 (UTC)Reply

"But it may be desirable in certain situations to differentiate the three. In this case, they may be assigned different written forms: 的, the most commonly used, as "d"; 地 as "di"; and the third, 得, as "de"." (Please see here)

Anyway, the entry "di" shouldn't be deleted (Please see here). Engirst 05:56, 8 August 2011 (UTC)Reply

Have you lived in China at all? Try using pinyin with people, yes that's not going to turn out good for ya. So no, pinyin will never replace characters and will always STAY a pronunciation scheme. Anyway the pdf you linked, it states at the very end of the section: "Note: when necessary for technical purposes, the characters (referring to the 3 discussed here) may be spelled as d, di, and de respectively." What technical purposes? What was your purpose to prefer di over de in your translations? It makes no sense whatsoever to do that. Jamesjiao → ^{T ◊ C} 23:15, 8 August 2011 (UTC)

The point of this topic is only to let everyone know that it is not "bullshit". Engirst 01:36, 9 August 2011 (UTC)

de is the dominant form and will always be. You can list di and d as alternative forms, but should never use them in translations. It's misleading. By the way, stop creating pinyin entries until an agreement has been reached on how we will go about creating in the future. You will suffer a block again if you persist in your singleminded approach. Jamesjiao → ^{T ◊ C} 20:52, 7 August 2011 (UTC)Reply

We should follow Wiktionary's current rules, and so should you. Some entries in the new format were created for experimental purposes, just following your edit (please see here). Engirst 23:08, 7 August 2011 (UTC)

Alright, my understanding of the above-cited references seems to mirror Jamesjiao's: "d" and "di" are secondary forms which exist, and which should definitely be mentioned in the main entries ([[的]] and [[地]], I presume), but which should be disused elsewhere in favour of the primary forms. - -sche (discuss) 04:48, 8 August 2011 (UTC)Reply

The pronunciation "di" (with no tone) exists for two of the three particles that have the normal reading "de" - 的 and 地. This pronunciation is still common in songs and poems as the alternative to "de". It's seldom used in dictionaries and in my observation, it's discouraged in China like everything non-standard. I wouldn't include "d" at all. It must be incorrect pinyin, standard hanyu pinyin NEVER uses consonants on their own (without a vowel), apart from "r" (as a final only).

It is encouraged in China and is a National Standard. Please see the National Standard of the People's Republic of China for your reference. Engirst 06:45, 8 August 2011 (UTC)Reply

YES, it is a standard. It is a standard for PRONUNCIATION for Mandarin speakers just like IPA is an international standard for pronunciation. A pronunciation standard is such that it doesn't contain any ambiguity (that's why English words themselves cannot be used as a pronunciation guide because their pronunciations are ambiguous!). It does not replace Chinese characters and never will. Jesus, how far would you go to twist and turn words like that to fuel your vain attempt at degrading this dictionary into a pinyin dictionary? I am not sure what I have to do to drill this into your brain. Why don't you just make an IPA dictionary as well for all the languages on this website? Go ahead. Jamesjiao → ^{T ◊ C} 23:15, 8 August 2011 (UTC)Reply

Where do you see "d" on its own? Hanyu pinyin is a national standard for romanisation and as the learning tool, not a replacement for the proper script - hanzi. --Anatoli 20:54, 8 August 2011 (UTC)Reply

Pinyin entries are convenient for users to learn Chinese. Only you said that Pinyin are for replacing Hanzi. Engirst 00:44, 9 August 2011 (UTC)Reply

BTW, pinyin.info is an interesting site using a lot of pinyin, but their objective is to replace hanzi with pinyin as the standard Chinese Mandarin script, same as our ill-famed abc123 aka Engirst, etc. --Anatoli 05:33, 8 August 2011 (UTC)

You have also chosen articles that favour your arguments. See this: w:zh:汉字改革, w:zh:汉语拼音, especially the section on 汉语拼音化 (pinyinisation). Oh yeah, another interesting thing to note is that everything on the zh wp is written in characters, not pinyin!! Does that tell you something? Jamesjiao → ^{T ◊ C} 23:29, 8 August 2011 (UTC)

Don't depart from the topic. The point of this topic is only to let everyone know that it is not "bullshit". Engirst 01:16, 9 August 2011 (UTC)

Alright, let's put usage notes at [[的]] and [[地]] explaining that "d" and "di" exist as (nonstandard? uncommon?) secondary romanisations of the characters, noting (if desired) which authorities/references give them as secondary romanisations. (Unless someone has a specific argument against providing this information, e.g. that the information is invalid. Even then — even if it is invalid — if it is in printed reference works, it would seem helpful to users to have a usage note like "XyzReference lists "d" as a secondary romanisation of this character, but this is wrong...") Consensus, however, is not to use those romanisations anywhere else. There furthermore appears to be an argument about whether Chinese is written in characters (such as 革) or in pinyin, which is spilling over into this thread from elsewhere; consensus on that issue is clearly that Chinese is written in Chinese script characters. - -sche (discuss) 23:55, 8 August 2011 (UTC)Reply

Vote: Attestation of extinct languages 2

FYI, I have opened the vote: Wiktionary:Votes/pl-2011-05/Attestation of extinct languages 2. --Dan Polansky 09:57, 8 August 2011 (UTC)Reply

The problems of Mandarin entries

What is your suggestion for solving these problems? Engirst 10:11, 9 August 2011 (UTC)Reply

Untoned pinyin is not allowed. We should follow rules. :) —CodeCat 11:20, 9 August 2011 (UTC)

We are talking about the search and redundancy problems. Please read these problems clearly first. Engirst 12:29, 9 August 2011 (UTC)Reply

Entries are already searchable by pinyin on Wiktionary. Type in "yinyue" into the Wiktionary search bar and you'll see pinyin and characters are all searchable. ---> Tooironic 13:38, 9 August 2011 (UTC)Reply

Please read these problems clearly first. Engirst 17:26, 9 August 2011 (UTC)Reply

Engirst --

It is clear that Tooironic has already "read these problems clearly". As previously noted, entries are already searchable by pinyin on Wiktionary. Try it. Seriously. Enter toneless pinyin into the Wiktionary search bar, and the results you get are quite close to the "good solution" you link to on Jamesjiao's Talk page. WT effectively already implements what you are suggesting, obviating any need for toneless pinyin entries. -- Eiríkr Útlendi | Tala við mig 18:10, 9 August 2011 (UTC)Reply
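For illustration only, the kind of folding that lets a toneless query such as "yinyue" match a toned headword can be sketched as below. This is not a description of how the Wiktionary search engine is actually implemented; the function name is made up, and handling tone digits (numbered pinyin) is an added assumption.

```python
import unicodedata

# Combining marks used for the four Mandarin tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Return `pinyin` without tone marks or tone digits, keeping ü intact."""
    # Decompose so each tone mark becomes its own combining character.
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = [c for c in decomposed if c not in TONE_MARKS and c not in "12345"]
    # Recompose so that u + diaeresis becomes a single ü again.
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_tones("yīnyuè"))  # yinyue
```

Only the tone marks are removed, rather than every diacritic, so that ü is not silently turned into u.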

Thanks for your response. But the problems are these: one is the problem with yapo mentioned by Contributions/71.66.97.228; the other, which I am discussing with Jamesjiao, is the duplication of traditional and simplified character entries. Engirst 20:33, 9 August 2011 (UTC)

Perhaps you could restate the exact issues, then? Reading User_talk:Jamesjiao#yapo, the primary issue appears to be about searching, which is already addressed, and about page overlap between toneless pinyin entries and other languages, which is moot since toneless pinyin pages are not needed and should be (are in the process of being?) removed.

I see your mention of duplication issues, but you do not give enough detail there for me to understand what you mean. Is your concern about duplication that the same entry content is duplicated across multiple heading words, such as 馬 and 马? This is an issue for multiple languages, even English (cf. color vs. colour -- the content should be mostly identical, as these are essentially the same word, only spelled differently -- just as, for example, 呪い and 詛い in Japanese).

Please explain. As it is, the main concern of yours that I can understand has already been dealt with. -- Cheers, Eiríkr Útlendi | Tala við mig 20:50, 9 August 2011 (UTC)Reply

Yes, we are talking about the duplication such as 馬 and 马 (Please see here as Jamesjiao mentioned).

This should be a good solution. The dictionary in this "good solution" has no duplicate entries. Please see the search results for "蘋果", "苹果", "ping2guo3" and "pingguo": there is indeed no duplication. Engirst 03:57, 10 August 2011 (UTC)

Again, just ignore him, he's trolling again. It is true that the trad/simp entries could be synchronised in a way to make it easier to contribute, but so far no one has come up with any kind of solution. ---> Tooironic 00:46, 10 August 2011 (UTC)Reply

For all that, Engirst has apparently hit upon a real issue that has been a conceptual niggling thorn in my side as well. However, the crux of the issue -- the need to have multiple index fields having the same descriptor content -- touches on one of the core limitations of the wiki structure: you can transclude, but you can't have more than one index field (i.e., headword) per page. Dictionaries like the one that Engirst points to as potential solutions use very different back-end database structures, something that is just not possible on the current generation of wiki software (and probably won't be possible for the foreseeable future). This structure works fine for an encyclopedia, but it has real shortcomings when people try to apply it to a dictionary.

Several months back, I recall participating in a similar discussion about how to unify English-language entries such as color and colour. There just doesn't seem to be an elegant way to do it; labeled section transclusion presents itself as one option, as does fancy selective transclusion using {{#ifeq:}} calls, but then the trouble is still that the content must reside under just one headword and then be referenced by the alternate spellings. Another option might be redirects, but then the destination of the redirects must include some way of explaining the alternate spellings and the reason for the redirection. The Semantic MediaWiki extension seems the most promising, and some folks have built interesting tools using this that might do the kind of many-headwords-to-one-entry structure that Engirst seems to desire, but I don't think this extension is enabled for WT, and it would require a gargantuan amount of work to support here.

So Engirst, if you're reading this, I do feel your pain -- but there's nothing for it, unfortunately, as the reason that Wiktionary needs separate pages for 蘋果 and 苹果, or 馬 and 马, or colour and color, comes down to the core fundamentals of how the wiki software is designed -- and that's not going to change any time soon. -- Cheers, Eiríkr Útlendi | Tala við mig 06:31, 10 August 2011 (UTC)Reply
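The many-headwords-to-one-entry structure described above is easy to state outside a wiki. The toy sketch below (class and method names invented for illustration, not any existing dictionary's back end) keeps one stored entry and lets several headwords index it, which is exactly what a one-title-per-page wiki cannot do directly.

```python
class Lexicon:
    """Toy store in which several headwords point at one shared entry."""

    def __init__(self):
        self._entries = {}  # entry id -> entry content
        self._index = {}    # headword -> entry id

    def add_entry(self, headwords, content):
        """Store `content` once and index it under every headword given."""
        entry_id = len(self._entries)
        self._entries[entry_id] = content
        for headword in headwords:
            self._index[headword] = entry_id
        return entry_id

    def lookup(self, headword):
        """Return the single shared entry for `headword`."""
        return self._entries[self._index[headword]]


lex = Lexicon()
lex.add_entry(["馬", "马"], {"gloss": "horse"})
lex.add_entry(["蘋果", "苹果"], {"gloss": "apple"})

# An edit made through one spelling is visible through the other,
# because both headwords resolve to the same stored object.
lex.lookup("马")["pinyin"] = "mǎ"
print(lex.lookup("馬"))
```

On a wiki the nearest equivalents are the options already listed above (transclusion or redirects), each of which still puts the content under exactly one page title.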

Don't feel too sympathetic, if you don't know the full story. The technical limitations were always there but the work on Mandarin and Serbo-Croatian was continued nevertheless, despite the necessity to maintain duplicate entries. People like Engirst slow down the work by not following the accepted rules and creating further redundancy, completely out of synch with existing simplified/traditional Mandarin entries, causing a lot of extra work for others. All the requests and blocks were ignored and he continued to do what he wanted using multiple anonymous accounts. --Anatoli 06:44, 10 August 2011 (UTC)Reply

The problem doesn't really come down to "the core fundamentals of how the wiki software is designed", exactly, only to how the Wiktionary editing system works. If we were to switch to using javascript tools as the primary way to edit entries, synchronizing data would be pretty simple. Side question: Is there any specific reason that the pages with toneless pinyin titles don't get {{also}} added to them, pointing to the actual entries, or is it just that nobody bothered? --Yair rand 06:54, 10 August 2011 (UTC)Reply

hypocorism vs diminutive

All the sources I've found that list hypocorisms agree on some entries which we currently label as diminutives here. For example, our definition of that latter term is:

A word form expressing smallness or youth

And the article Johnny says:

A diminutive of the male given name John

And I doubt that all the Johnnies were named from an older or bigger John...

That's why I suggest making these etymologies uniform by replacing the various labels below with Template:hypocorism:

Alec: diminutive
Alex: shortened form
Lex: pet form
Kat: short form
Joe: common nickname
Deb: abbreviated form

JackPotte 18:01, 9 August 2011 (UTC)Reply

A diminutive also means a hypocorism. Isn't it simpler to add that definition to diminutive? Is the Wikipedia your only source? The diminutive definition is built in Template:given name. All diminutives/hypocorisms used to be defined as "given names", hence the confusion of terms above. Pet forms of given names are used in a different way in every language so strict standardization might not be a good idea. --Makaokalani 15:08, 10 August 2011 (UTC)Reply

Wikipedia is reliable, and the boundary is clear, there as in all the dictionaries I've read, including French ones, whose translations (hypocoristique & diminutif) are fully transparent.

"hypocoristic diminutive" isn't a pleonasm.

The origins of many surnames are obscured by one characteristic of the hypochoristic forms of many personal names, that is, the pet forms, diminutives, or 'short' forms of names.

Jacko is a diminutive (informal) whereas Jacky is an hypocorism...

I'll add these findings to our two articles once everyone has made up their mind. JackPotte 20:56, 12 August 2011 (UTC)Reply

Common nouns and proper nouns

I seem to be unclear on the difference between common nouns and proper nouns. Why, for example, is German a common noun when it means "a person from Germany" but a proper noun when it means "the German language"? It's capitalized in both meanings. —An gr 13:49, 10 August 2011 (UTC)Reply

Capitalization does not make a noun proper, nor does lower case make it common. The difference between common and proper nouns is intrinsic and lexical, with the decision of whether to capitalize or not being secondary. Capitalization in most languages is more by convention than by type. Spanish does not capitalize the names of languages, even though they are proper nouns. English capitalizes the days of the week, but does not really use them as proper nouns. German capitalizes all nouns. Further, capitalization even in English has varied through time, so that abstract nouns like socialism and liberty were once regularly capitalized even though they are not capitalized today. This reflects a change in style of writing, and not a change in grammar.

For more than you want to know about the difference between the two categories of noun, see the draft I started at User:EncycloPetey/English proper nouns. --EncycloPetey 14:36, 10 August 2011 (UTC)Reply

Thanks for the link to your draft. It's the first time I've ever seen a definition of proper noun that wasn't circular. Usually when I try to pin someone down on why something is a proper noun, they say "Because it's capitalized." And when I ask why it's capitalized, they say "Because it's a proper noun". What will be easiest for me to remember is that proper nouns are always definite and don't get pluralized (although some proper nouns are pluralia tantum, like "the Netherlands" and "the United States"). As for weekdays, I think they can be both. If I say "I'll do it on Friday", it's a proper noun as it's referring to a single unique day, but if I say "There are five Fridays in this month", it's a common noun because it's referring to members of a class. Noticing that your draft is called "English proper nouns", I wonder if it's possible to come up with a cross-linguistic definition of proper noun. Other parts of speech like "noun", "verb", "adjective", and "preposition" can be defined without reference to the language they occur in (though of course not all languages have all parts of speech). —An gr 14:52, 10 August 2011 (UTC)Reply

It is possible, but pushes into the realm of abstract linguistic philosophy, which will be understandable by few people. My choice was to work on a page treating English as exhaustively as possible, with enough examples and discussion to allow people familiar with other languages to make the extrapolation of the principles themselves. Even the quality of "definite" doesn't work across all European languages because there are shades of difference in what that means. Some languages have a "definite" and "indefinite" form for all their nouns. --EncycloPetey 14:55, 10 August 2011 (UTC)Reply

CGEL makes a distinction between proper nouns and proper names which we never have in English PoS headings, AFAICT, though former versions of CFI did.
"The central cases of proper names are expressions which have been conventionally adopted as the name of a particular entity - or, in the case of plurals like the Hebrides, a collection of entities."

"Proper nouns, by contrast, are nouns which are specialized to the function of heading proper names."

As I would apply these definitions "German" is both a common and proper noun. It is a proper name when referring to the language. This usage seems to make it a proper noun. When referring to the people "(the) Germans" would seem to be the proper name. When referring to an individual "German", it is a common noun (capitalized). It also seems to function as a full adjective, being gradable, comparable, and able to serve as a predicate without any article or determiner. DCDuring TALK 15:09, 10 August 2011 (UTC)Reply

The CGEL distinction between "proper noun" and "proper name" hinges on the fact that they define a word as a cohesive unit lacking internal spacing. Since Wiktionary works with terms as its units, and permits internal spacing in these terms, the distinction between a "proper name" and "proper noun" becomes moot. But, your summary of what CGEL says is spot on. --EncycloPetey 15:24, 10 August 2011 (UTC)Reply

It is also true that typically we do not have Proper noun PoS sections at entries like [[Germans]] for "the Germans". Shouldn't we? DCDuring TALK 19:11, 10 August 2011 (UTC)Reply

Also, aren't informal demonyms also proper names and, therefore, proper nouns for Wiktionary purposes, eg, "(the) Brits"? Even derogatory ones would be. DCDuring TALK 19:18, 10 August 2011 (UTC)Reply

Informal demonyms have some properties of a proper noun, but that's true of most substantive biological nouns, and not just demonyms. Compare: "We planted a conifer." to "The conifers grow in boreal climates." In the former sentence, you are speaking of a member of a group, but in the latter, you are referring to the category as a whole. We generally do not create a separate entry for these collective senses. --EncycloPetey 19:23, 10 August 2011 (UTC)Reply

Nei Mongol - Why is it locked?

"Mongol" of Nei Mongol shouldn't be an "abbreviation for Mongolia" (please see here for reference). Anyhow, the entry shouldn't be locked. Engirst 02:33, 11 August 2011 (UTC)Reply

Why is it locked?

The etymology of Nei Mongol seems to have a problem. Engirst 00:57, 1 October 2011 (UTC)Reply

Library of Congress vocabularies

Over at http://id.loc.gov/ you will find search and download entries to the Library of Congress' subject headings, name authorities and other vocabularies. For example, in the Geographic Areas file, you will find that "Sweden" has a broader term "Europe" and narrower terms such as "Lapland". This means library books tagged as Lapland may contain information about Sweden and Europe. It might be a hint that the Wiktionary entry Sweden should contain a pointer to the Wiktionary entries Europe and Lapland. I don't know if this is a useful source of ideas, but you can download it and play with it. Wiktionary has 81 links to loc.gov but none yet to id.loc.gov. --LA2 10:52, 11 August 2011 (UTC)Reply
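The idea of turning broader/narrower relations into candidate links could be sketched roughly as follows. This is only an illustration: the `HEADINGS` dictionary hard-codes the single Sweden example from the comment above rather than reading the real id.loc.gov download, and the function name `suggested_links` is hypothetical.

```python
# Illustrative data only: the one record described above, not the real id.loc.gov file format.
HEADINGS = {
    "Sweden": {"broader": ["Europe"], "narrower": ["Lapland"]},
}


def suggested_links(term, headings):
    """Return the broader and narrower terms of `term` as candidate entry links."""
    record = headings.get(term, {})
    related = set(record.get("broader", [])) | set(record.get("narrower", []))
    return sorted(related)
```

For the example record, `suggested_links("Sweden", HEADINGS)` yields the two entries mentioned above, Europe and Lapland; a term with no record yields an empty list.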

It might be great as an authoritative substitute for our current unreferenced, whimsical topical category structure. It would allow the reversal of the hijacking of the usage context labels. DCDuring TALK 12:17, 11 August 2011 (UTC)Reply

Admin-only definition editing options trial

It was suggested in the earlier discussion on enabling the definitions editing tool for a trial period that it would be better to first have opt-out trials for only administrators. So what do people think about turning it on for two weeks for admins? --Yair rand 18:54, 11 August 2011 (UTC)Reply

Support. —Ruakh_TALK 18:04, 12 August 2011 (UTC)Reply

Support. DCDuring TALK 18:36, 12 August 2011 (UTC)Reply

Okay.—msh210℠ (talk) 17:53, 14 August 2011 (UTC)Reply

Support, sounds good to me. --Neskaya … gawonisgv? 07:04, 15 August 2011 (UTC)Reply

Okay, trial started. I'll put a disabling button in this section. Anywhere else that should have a disable button? Maybe in WT:News for editors, or is that not really the kind of thing that goes there? --Yair rand 21:06, 17 August 2011 (UTC)Reply

The trial is now over. --Yair rand 20:58, 31 August 2011 (UTC)Reply

What counts as a "derived term"?

A string of dubious and excessive edits to Japanese entries leads me to wonder, what counts as a "derived term"? Do simple compounds warrant listing as "derivations"?

By way of example, have a look at the 魔法#Japanese page. The list of "derived terms" includes things like 魔法カード "magic card" and 魔法能力 "magic ability", among others. Both of these are just plain old compounds -- one word plus another -- and I could just as validly say 魔法茄子 "magic eggplant" or 魔法鉛筆 "magic pencil". Note that these terms are not customary set phrases, like magic carpet, but just plain old compounds.

Do compounds like this, of the exceptionally prosaic and unremarkable sort, merit inclusion in lists of "compounds" or "derived terms" on entry pages? -- Cheers, Eiríkr Útlendi | Tala við mig 06:14, 12 August 2011 (UTC)Reply

To clarify, now that my brain has picked up some speed, how do we decide if a combination of words is just a sum of parts, or if it counts as something more? -- Eiríkr Útlendi | Tala við mig 06:16, 12 August 2011 (UTC)Reply

Have you looked at WT:AJA? There is an associated talk page. Note the existence of an "Idioms" header. DCDuring TALK 18:40, 12 August 2011 (UTC)Reply

Thank you DCDuring, yes, I have looked at that page. What I'm wondering about here is not quite about idioms, but rather what counts as a "derived term" or "compound". The WT:AJA subsection Derived terms doesn't quite answer the question. (But thank you for prompting me to read through that page again, as it clarifies that only kanji headwords should have a "Compounds" section.) -- Cheers, Eiríkr Útlendi | Tala við mig 19:01, 12 August 2011 (UTC)Reply

I was hoping yours was a Japanese-specific issue.

I don't think this is a settled question at the margins - and the margins are ample. What can be under Derived terms would include all morphological or historically derived terms that meet WT:CFI. (I personally would prefer to put terms that are historically derived from other languages despite there being a morphological process to which the etymology could be ascribed in Related terms.) But there has also been inconclusive discussion about the desirability of inserting common collocations under Derived terms. I personally prefer having certain collocations illustrated in usage examples rather than in Derived terms, but, especially for large entries, citations appear on the Citations page where they are not searchable by default. DCDuring TALK 19:46, 12 August 2011 (UTC)Reply

unified Serbo-Croatian... by bot

Would it be acceptable to convert the subpages of Category:Croatian parts of speech to Serbo-Croatian by bot, as opposed to only by hand? It would be risk-free; the bot couldn't add the Cyrillic spellings, but that's about the only thing a bot can't do. Specifically:

Convert ==Croatian== to ==Serbo-Croatian==
Convert |hr}} to |sh}} (etyl templates)
Convert {{hr-decl-noun| to {{sh-decl-noun|
Convert |lang=hr to |lang=sh
Convert [[Category:Croatian to [[Category:Serbo-Croatian
Convert [[Category:hr: to [[Category:sh:
Convert {{infl|hr| to {{infl|sh|

This would leave {{hr-adj}}, {{hr-noun}} and {{hr-noun-coll}}. {{hr-noun-coll}} has few enough transclusions that it can be done by hand in a few minutes; not so for {{hr-noun}} and {{hr-adj}} though; these as a temporary measure could categorize in [[Category:Serbo-Croatian <adjectives|nouns>]] while waiting for them to be removed; or, depending on your taste, AWB can skip any pages featuring these two templates. I'm pretty sure you can set up AWB (AutoWikiBrowser) to skip if it finds a certain sequence of characters on a page, such as {{hr-noun|. Mglovesfun (talk) 12:59, 14 August 2011 (UTC) IFYPFY.—msh210℠ (talk) 00:37, 14 November 2011 (UTC)Reply
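The conversion steps and the skip rule proposed above could be sketched roughly as follows. This is a plain-Python illustration, not AWB configuration or an actual bot: the replacement pairs are taken from the list above, while `REPLACEMENTS`, `SKIP_MARKERS` and `convert_entry` are hypothetical names.

```python
# Hypothetical sketch of the proposed Croatian -> Serbo-Croatian conversion.
# The pairs mirror the list in the proposal; everything else is illustrative.

REPLACEMENTS = [
    ("==Croatian==", "==Serbo-Croatian=="),
    ("|hr}}", "|sh}}"),                      # etyl templates
    ("{{hr-decl-noun|", "{{sh-decl-noun|"),
    ("|lang=hr", "|lang=sh"),
    ("[[Category:Croatian", "[[Category:Serbo-Croatian"),
    ("[[Category:hr:", "[[Category:sh:"),
    ("{{infl|hr|", "{{infl|sh|"),
]

# Pages using these templates are left alone. The prefix "{{hr-noun" also
# matches {{hr-noun-coll}}, which the proposal handles by hand anyway.
SKIP_MARKERS = ("{{hr-noun", "{{hr-adj")


def convert_entry(wikitext):
    """Return the converted wikitext, or None if the page should be skipped."""
    if any(marker in wikitext for marker in SKIP_MARKERS):
        return None
    for old, new in REPLACEMENTS:
        wikitext = wikitext.replace(old, new)
    return wikitext
```

Note that this sketch rewrites the whole page text; a real run would presumably need to restrict the replacements to the Croatian section so that other language sections on the same page are not touched.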

The work of editors that have chosen Croatian headers, etc. must be respected. Lmaltier 19:14, 18 August 2011 (UTC)Reply

Even if I bought into this argument (which I don't) Ivan Stambuk created the majority of the Croatian entries, and he supports converting them to Serbo-Croatian. Mglovesfun (talk) 08:58, 19 August 2011 (UTC)Reply

I have no objections. —Internoob (Disc•Cont) 19:02, 21 August 2011 (UTC)Reply

It looks interesting and I support it, but I'm not sure what to do with the existing Serbian, Bosnian and Croatian translations: they may coincide (same words) with or differ from existing Serbo-Croatian translations, and many of them are not formatted with {{t}} and only have square brackets. Serbian may have nested Cyrillic and Roman (occasionally Latin) translations and sometimes no nesting. If they coincide with existing Serbo-Croatian translations, they should not add duplicates. --Anatoli 00:26, 22 August 2011 (UTC)Reply

User:Mglovesfun/vector.js converts Serbian to Serbo-Croatian in translation tables, but does not convert Bosnian and Croatian to avoid possible duplication. Note: translation templates don't appear in the above proposition. The reason the vector converts Serbian is that they're the closest alphabetically and that only Serbian uses both scripts. Also, some Serbian translations only use the Cyrillic script and use a transliteration into the Latin script, this despite the fact the Latin script is official in Serbian. I read a book on the matter while I had no Internet, and the book wasn't even a recent print! Mglovesfun (talk) 07:58, 22 August 2011 (UTC)Reply

Serbian uses both Cyrillic and Roman. --Anatoli 09:50, 22 August 2011 (UTC)Reply

Yes... Latin and Cyrillic. Mglovesfun (talk) 10:12, 22 August 2011 (UTC)Reply

Current votes

These are the current votes:

--Daniel 04:04, 15 August 2011 (UTC)Reply

Thanks! --Neskaya … gawonisgv? 07:03, 15 August 2011 (UTC)Reply

native-languages.org

Hello, do you think we could get information from this website? On their FAQ, we can read:

Q: May I reprint information from your website on my own website or blog?

A: Yes, as long as you link back to our website from the page where you have used our information.

Yet, there is also

Q: I am a teacher. May I use information from your website in my classroom?

A: Yes. All of the materials on our website may be freely used for noncommercial educational purposes.

The problem is that the first statement seems to mean we can use this information if we cite them, but the second one implies that commercial use is forbidden, which is not compatible with Wiktionary's licence. Maybe we could write to them to ask whether we can exceptionally import these data into Wiktionary? What do you think? Pamputt 07:41, 17 August 2011 (UTC)Reply

To be honest I don't really trust their information, so it's probably not worth asking. -- Liliana • 09:31, 17 August 2011 (UTC)Reply

including context tags in inflected forms (of sh entries)

This vote (which links to kolovoza) raises for me a point worth discussing: for Serbo-Croatian entries, should we allow dialect/sublanguage context tags not only in main entries (kolovoza), but also in form-of entries? That would have the benefit of clarifying that the series of letters kolovoza is only used in Croatian; it might have the disadvantage of making readers think (until they clicked through to the main entry) that kolovoza was a Croatian genitive of a pan-Serbo-Croatian word and Serbian used a different genitive, like *kolovozu. Note that I say allow, not necessarily require (uniformity is good, but the work could be left to the editors who wanted to do it). - -sche (discuss) 07:55, 18 August 2011 (UTC)Reply

Dunno, as an analogy I dislike something like:

==English==

===Noun===
'''favors'''

# {{US}} {{plural of|favor}}

As my initial reaction reading this is 'what is the non-US plural of favor'? -Mglovesfun (talk) 10:17, 18 August 2011 (UTC)Reply

Inflected form entries are just glorified redirects, not mirrors of lemma entries. We should limit such additional content to cases for which the inflection has a context different from that of the lemma. The Serbo-Croatian aspect of this is a result of the vote on that matter. I'm surprised we haven't gotten more pushback on that vote. DCDuring TALK 12:26, 18 August 2011 (UTC)Reply

I agree with Mglovesfun that this part of your comment: "it might [… make] readers think […] that kolovoza was a Croatian genitive of a pan-Serbo-Croatian word and Serbian used a different genitive, like *kolovozu" hits it exactly on the nose. Form-ofs shouldn't duplicate lemmata's context-tags. That said, what if (deprecated template usage) kolovoza really were a Croatian-specific genitive of a pan–Serbo-Croatian word? What about rare/archaic/dialectal plurals of ordinary English nouns? I think context-tags are potentially useful in those cases. [After e/c: this also seems to be what DCDuring is saying.] —Ruakh_TALK 12:40, 18 August 2011 (UTC)Reply

I agree with this, I can find one specific example:

====Verb====
'''spelt'''

# {{chiefly|British}} {{past of|[[spell#Verb|spell]]}}

This to me seems to be correct. Mglovesfun (talk) 19:22, 18 August 2011 (UTC)Reply

In addition to that type, with an explicit context tag, we have entries like [[boyz]], which uses {{form of}} with the in-template equivalent. Arguably it should have a better register indication than "informal". DCDuring TALK 20:05, 18 August 2011 (UTC)Reply

Pinball category?

What up homies. I've been adding some pinball terms. I don't know whether it merits a category, and don't know or care enough about categories to create one, but if anyone is so inclined then the following terms might possibly qualify: autoplunger, backbox, backglass, flipper, flipperless, gobble hole, kickback, knocker, multiball, rollunder, rollover, rolldown, outhole, outlane, pinballer, playboard, plunger, silver ball, sinkhole. Equinox ◑ 21:46, 18 August 2011 (UTC)Reply

I don't see why not. Mglovesfun (talk) 08:13, 19 August 2011 (UTC)Reply

Did you miss digit counter (it's in "Pinball Wizard"). SemperBlotto 08:19, 19 August 2011 (UTC)Reply

Latest comment: 13 years ago3 comments2 people in discussion

Could I please continue adding etymology to verb forms ending in -t‽ --Pilcrow 22:40, 18 August 2011 (UTC)Reply

I think you'd better avoid adding "===Etymology=== {{suffix|verb|t}}" to the likes of "dreamt", and avoid adding "===Etymology=== {{temp|suffix|verb|ed}}" to the likes of "dreamed". In general, I think verb forms should better have no etymology section, with exceptions in those cases where the etymology is unusual and of special interest.

For the record, there exist the following categories:

Category:English words suffixed with -ed - 342 members
Category:English words suffixed with -t - 21 members

--Dan Polansky 10:14, 20 August 2011 (UTC)Reply

But the -t suffix is irregular. If etymology sections should not be included, could I at least categorize those forms‽ --Pilcrow 16:40, 20 August 2011 (UTC)Reply

hand and 手

Note that the hand picture links for these Dutch, Swedish and Mandarin entries point to #English - how to fix this? ---> Tooironic 01:15, 21 August 2011 (UTC) Also note that Category:Visual_dictionary contains both English and LOTE entries - is it supposed to be mixed up like that? ---> Tooironic 01:20, 21 August 2011 (UTC)Reply

{{picdiclabel}} has a language parameter (lang). Mglovesfun (talk) 10:09, 21 August 2011 (UTC)Reply

How do you use it? I'm a newbie about this kind of thing. ---> Tooironic 12:46, 22 August 2011 (UTC)Reply

Like this, as you can see it took me two goes to get it right. Mglovesfun (talk) 12:48, 22 August 2011 (UTC)Reply

Awesome. Thanks. How about the Dutch and Swedish at hand? ---> Tooironic 21:34, 23 August 2011 (UTC)Reply

Was just about to do this for you, but Mglovesfun has beaten me to it! It's fixed now anyway. BigDom 22:32, 23 August 2011 (UTC)Reply

Preferred forms for Japanese lemmata

Haplology and I wound up conversing a bit on the subject of lemma forms for Japanese, as relating to the keiyōdōshi part of speech (also known as "quasi-adjectives", and better known among Japanese learners as "な (na) adjectives"). So far, every Japanese dictionary that I've ever seen uses the uninflected base form of a な adjective as the headword -- except Wiktionary. For reasons lost in the mists of history, Wiktionary alone uses an inflected form of な adjectives as the headword, by including the な on the end. This causes some odd inconsistencies, such as the base uninflected forms being mostly just stub entries, sometimes being missing, sometimes being redirects to the inflected forms with な, and also to the base forms sometimes being classified as nouns (which is never correct AFAICT).

I was under the general impression that, while Wiktionary happily includes inflected forms of a word, the main entry should be under the uninflected form, with the inflected forms mostly just pointing to the main entry. (I cannot find an explicit description of this policy, neither at WT:ELE nor at WT:AJA; perhaps this could be added?) This holds true at least for English, German, Spanish, Latin, Korean, and Navajo terms, and for Japanese verbs and い (i) adjectives, to the best of my knowledge. If this understanding is correct, would anyone object to Japanese editors keeping the main entries for な adjectives under the uninflected, な-less base forms? -- Eiríkr Útlendi | Tala við mig 16:14, 23 August 2011 (UTC)Reply

I second this. It's easier as well, otherwise you would need to keep both entries - with な and without it. The same is true for の (-no) adjectives. You can add an adjective section to existing noun entries, e.g. Template:Jpan. --Anatoli 20:28, 23 August 2011 (UTC)Reply

@Anatoli: Thanks for replying.

Some additional questions / considerations:

There are lots of entries with [Japanese word] + な, or [Japanese word] + に. By everything I've read (not just on WT), these な and に are essentially particles, which makes these entries just sum-of-parts and thus not meeting WT:Criteria for inclusion. I tried to explain some of how this area of Japanese grammar works, and why this means such entries are SOP and thus not valid, over at WT:RFD#親切に. The way Japanese keiyōdōshi work makes this even more important, in that keiyōdōshi are *both* adjectives and adverbs at the same time; in kanji compounds, there is no distinction between adjectival or adverbial senses, and in spoken or running text, the distinction is made by using either the な (or の for those rarer の-type keiyōdōshi) or the に particles -- i.e., by adding a separate word.

I propose that keiyōdōshi entry content be kept under the main keiyōdōshi headword, without any particles. Adjective/adverb senses can be shown by using Template:ja-na or similar, as is currently the case over at 特別. I further propose that the keiyōdōshi + particle entries be deleted, as these are sum-of-parts and are thus no more worthy of inclusion than English phrases like an apple or to the store.

However -- since keiyōdōshi are equally adjectives and adverbs, the WT:AJA#Quasi-adjectives (形容動詞) recommendation to use level-three or -four Adjective headings seems inadequate, as this ignores adverbial senses. So:

Should we include both Adjective and Adverb headings for keiyōdōshi?
Should we instead use some other heading, such as Keiyōdōshi, Quasi-adjective, or something else?
Should we use just the Adjective heading, and 1) add something to Appendix:Japanese_glossary about this? 2) create Appendix:Japanese_grammar? 3) refer users to w:Japanese grammar?

TIA for your input, -- Eiríkr Útlendi | Tala við mig 18:34, 25 August 2011 (UTC)Reply

I would go with including both Adjective and Adverb headings for keiyōdōshi. While clear to people familiar with Japanese grammar, other headings would be confusing to most readers. If both Adjective and Adverb are included next to each other, it should be clear from the juxtaposition that the word is both parts of speech at the same time. Such a system would also be easier for new contributors to pick up. In addition, it would be the most aesthetically pleasing, in my opinion.

~~In any case I agree that pages with -na or -ni should be deleted and the content moved to the headword without -na or -ni. Haplology 15:36, 27 August 2011 (UTC)~~Reply

I've changed my mind. In line with Eirikr, I conclude that na-adjectives or keiyōdōshi should be listed under the header "Nominal" in their uninflected form, that is, without a "na" or "ni". For example, おぼろ. Haplology 17:48, 29 October 2011 (UTC)Reply

I really don’t understand why English Wiktionary uses forms with な as entries. At least they should end with だ. First of all, there are two conflicting analyses of what they call na-adjectives:

	Keiyōdōshi	Adjectival noun
科学的	Stem (not a word)	Noun
科学的だ	Base form	Noun + copula (two words)
科学的な	Inflected form (rentaikei)	Noun + copula (two words)

In the traditional Japanese grammar with keiyōdōshi, 科学的 is not a word but a stem, but it is against my native speaker’s intuition. It is a noun, even though its behavior is different from common nouns. If a Japanese pupil doesn’t know the meaning of 科学的な, he or she will quite naturally look for 科学的 in a dictionary, not 科学的だ or 科学的な. Why don’t we have a small table to show what they call keiyōdōshi and some inflected forms under the noun entry like the following?

Noun	科学的
Predicative	科学的だ
Attributive	科学的な
Adverbial	科学的に

It is clearer, and relatively free from the academic disagreement. — TAKASUGI Shinji (talk) 12:30, 30 October 2011 (UTC)Reply
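The small table proposed above follows a simple pattern (stem plus だ, な or に), which can be sketched as below. This is only an illustration of that pattern under the labels used in the table; it is not the implementation of {{ja-na}} or of any existing template, and the function name `na_adjective_forms` is hypothetical.

```python
def na_adjective_forms(stem):
    """Build the four forms from the proposed table for a given stem."""
    return {
        "Noun": stem,
        "Predicative": stem + "だ",
        "Attributive": stem + "な",
        "Adverbial": stem + "に",
    }
```

Called with the stem 科学的, this reproduces the four rows of the table above.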

I think I agree and so does Eirikr, as I understand, at least mostly. I'm just not sure I understand your proposal exactly, so just to confirm:

Which parts-of-speech headers to have and what to call them? Do you mean one header, Noun, with Attributive, Adverbial, and Predicative forms in a table inside Noun? Is that better than using Nominal instead of Noun?
Table: The current {{ja-na}} template produces a table which has all of those forms and others, except it has Predicative for [[科学的だ]] instead of Terminal. How about we use that table?
Page title: In your example, is the page 科学的?

In short, how would you change 素直?

Thanks Haplology 16:23, 30 October 2011 (UTC)Reply

I didn’t know the nominal header. However, 素直 is a noun functioning like an adjective, rather than a word of another class functioning as a noun. It is not uncommon to say 素直が一番だ instead of the standard 素直なのが一番だ ("Being honest is the best thing"). If we don’t have an adjectival noun header, I believe it should be a noun, even though that is not traditional. The use of {{ja-na}} will be fine. — TAKASUGI Shinji (talk) 18:05, 30 October 2011 (UTC)Reply

Thank you for chiming in, Takasugi-san. I must admit I shy away from using the ===Noun=== header, precisely because 1) these do not function as regular nouns, as you point out, and 2) a number of these words cannot be used grammatically as nouns. For instance, I don't believe you can use the -的 (-teki) words as the subject of a sentence in a purely noun sense. A quick Google search for google:"科学的は" gives tons of hits, but most with punctuation between the 的 and the は; those few instances without punctuation instead show 科学的は as shorthand for 「科学的」とは or some similarly elided construction. This is not unlike English utterances such as "'decidedly' is an interesting word", where a part of speech that is not a noun is used as the noun subject of a sentence. One such Japanese example is here on an Amazon book listing page:

科学的はどんな推論をし、アプローチするのか?

A quick translation of this might be:

Scientific is what kind of reasoning or approach? (direct translation to match the source language)

科学的 (scientific) is clearly being used as the topic/subject of this sentence, but I would argue that it is still adjectival in nature. In the same vein, verbs and verb phrases can be used as subjects, as in 覚えておくがいい (it would be good to remember), but these are still verbs.

After mulling for some time on what would be the best header for this POS, I find my thoughts keep coming back to the simple label na- adjective. My reasons:

As noted above, these are not straight nouns, and labeling them ===Noun=== would cause no end of confusion.
===Nominal=== works to some extent, except I find that this is seldom used, and further thought leads me to suspect that few WT users would know what this is. Ditto for Nominal adjective, Copular adjective, Descriptive noun, Adjectival noun. And, FWIW, I've only ever seen the label Quasi-adjective here on WT.
The term na- adjective is widely used in Japanese textbooks for the English speaker, whereas I have only seen other labels used by linguists in dense academic writing that most WT users probably won't have read. This label makes it clear that this is an adjective, but also that it's somehow different from being just an adjective.

I poked around the Wiktionary talk:About Japanese page, but I couldn't find anything that looked like a real discussion of the proper labels for this part of speech. Since it looks like everyone (so far, at least) is happy to locate the lemmata for these under the stem form (i.e. without any trailing -na or -da), it looks like this particular Preferred forms for Japanese lemmata issue is closed. I will start a new thread momentarily about what to call 形容動詞 (keiyōdōshi) in English for purposes of the POS label. -- Cheers, Eiríkr Útlendi │ Tala við mig 16:15, 31 October 2011 (UTC)Reply

Yes, we seem to have agreed on the issue of lemmata. Just one thing to comment: がいい is rather a suffix today and you cannot have a pause between が and いい, and you cannot replace いい with another adjective. Its meaning is also slightly different from といい or のがいい.

覚えておくと、いい。
覚えておくのが、いい。
*覚えておくが、いい。 (ungrammatical: がいい cannot be split)
おそらく覚えておくといい。
おそらく覚えておくのがいい。
*おそらく覚えておくがいい。 (ungrammatical: がいい cannot coexist with some modal adverb)
私が行くといいだろう。
私が行くのがいいだろう。
*私が行くがいいだろう。 (ungrammatical: がいい cannot be used for the first person)

— TAKASUGI Shinji (talk) 16:02, 1 November 2011 (UTC)Reply

Excellent note, Takasugi-san, thank you very much for that analysis. 勉強になります。 We should probably have a がいい page then, as well as the alternates がよい and が良い, as this usage is clearly idiomatic. (And congratulations as well on your adminship!) -- Eiríkr Útlendi │ Tala við mig 21:39, 1 November 2011 (UTC)Reply

I updated WT:AJA slightly to reflect the decision not to have -na in headwords, namely by removing -na from the example provided under the section Quasi-adjectives. At the same time I deleted a few of my own creations, 無力な, むりょくな, and muryoku na, citing this discussion. This is just to double check that (a) that was right to do and (b) we should delete the other pages with -な, because they are SOP and therefore fail CFI? I want to make sure because there are a whole lot of them.

By the way, @Takasugi-san, congratulations on becoming an admin! Actually I didn't know about the vote until today. Haplology 16:34, 1 November 2011 (UTC)Reply

Hmm, given the stated goal of "all words in all languages", and the way that other languages happily include inflected forms such as aß or hablo or even 見た, perhaps we should keep the entries ending in -na? If so, these entries should include just a brief description that the -na forms are the prepended adjectival inflections, and should otherwise act as stubs pointing to the main entries without the -na. Similarly for entries ending in -ni, including a brief description that these forms are the adverbial inflections and linking through to the lemmata. -- Eiríkr Útlendi │ Tala við mig 21:39, 1 November 2011 (UTC)Reply

quantitative easing

How to remove "Lithuanian nouns lacking gender"? ---> Tooironic 21:40, 23 August 2011 (UTC) Or is that supposed to be there...? :S ---> Tooironic 21:41, 23 August 2011 (UTC)Reply

Someone has to add the gender info into the entries, then the number will be reduced. --Anatoli 22:20, 23 August 2011 (UTC)Reply

Does {{g}} belong in translations? DCDuring TALK 22:22, 23 August 2011 (UTC)Reply

Yes, the gender was lacking from the translation. I've added a pos parameter to {{g}}, so in theory at least you could do {{g|lt|pos=translations}} and move it to Category:Lithuanian translations lacking gender. Mglovesfun (talk) 22:29, 23 August 2011 (UTC)Reply

I thought gender was supposed to be a named parameter of {{t}}. DCDuring TALK 22:44, 23 August 2011 (UTC)Reply

Unnamed actually, the third parameter (like {{t|fr|foo|f}}). But when there is no gender, users can choose to add {{g|fr}} or {{g|French}}. This doesn't cause any problems per se, but could apply to probably every English entry with a translation table. I tend to add genders when they're missing, but as long as the gender is in the target (the translation) I say there's no need to worry about it. Mglovesfun (talk) 10:28, 24 August 2011 (UTC)Reply

Adidas

Been thinking about this and seeing this specific entry. I've decided to use it as an example. WT:CFI line one says "As an international dictionary, Wiktionary is intended to include “all words in all languages”." Is this a word in a language? WT:CFI makes no attempt to define word or language in this specific context. Normally I'd be happy to consider this a word, but for our purposes it might be better to consider commercial coinages like this to be nonwords. Furthermore, I wouldn't consider this English, but rather Translingual. For example, on a bottle of Palmolive shower gel I had, the translations into Russian and Greek (as well as all the other languages) used the word Palmolive in the Latin script. So, one possible solution to the issue of brand names is to not consider them words in any language for CFI purposes. Mglovesfun (talk) 22:19, 23 August 2011 (UTC)Reply

Bear in mind that a lot of brand names are actually translated, though. I've seen some ingenious cases: if you get an Arabic or Georgian bottle of Coca-Cola (yeah, my paper shop gets the bottles that "fell off a lorry"), it has almost the same logo, but reworked slightly so as to write the name in the appropriate script. I suppose that makes it a different word, even if it's only a transliteration. Equinox ◑ 22:21, 23 August 2011 (UTC)Reply

One of the best things we could do, IMHO, is to remove the section for brand names from CFI, and include brand names whenever they are single-word and attestable. Brand names do not create any problems; they are just disliked as uncustomary for a dictionary, in spite of the fact that useful lexicographical information can be recorded on them, including pronunciation and etymology. You seem to be proposing the very opposite: to exclude all brand names. We can argue whether brand names are words, but the fact is they have many properties typical of words: they get pronounced, they get printed, they take positions in sentences, they serve as a basis of derivation (there is the Czech word "adidasky" derived from "Adidas"), they have an etymology, etc. --Dan Polansky 09:03, 24 August 2011 (UTC)Reply

That's another approach, used by the French WT:CFI. It can get a bit silly doing it that way; names of films and books and TV series and whatnot. Mglovesfun (talk) 10:28, 24 August 2011 (UTC)Reply

You don't need to include all multi-word names of works ("Much Ado About Nothing") in order to include all attestable single-word brand names. By contrast, "Lysistrata", a play by Aristophanes, should IMHO be included, if only for its pronunciation--Wikipedia even has different UK and US pronunciations. --Dan Polansky 12:36, 24 August 2011 (UTC)Reply

FWIW, regarding specifically Talk:Adidas, was the RFD ever closed? Because it looks like a fail with two people wanting to delete it, and none wanting to keep it. Mglovesfun (talk) 18:28, 25 August 2011 (UTC)Reply

From Talk:Adidas and the archived discussion, it follows that a RFD started on 16 September 2007. There, two people wanted to delete the page--Connel MacKenzie (who claimed that it was "spam"[2], having tagged the entry in this revision, which had the non-promotional definitions "The German sports apparel manufacturer adidas AG, formally founded in 1949" and "A clothing product of this brand, especially a pair of shoes") and Williamsayers79, while two people were sympathetic with the entry even if stating no boldfaced "keep": DAVilla, and bd2412. Connel was utterly anti-brand, as follows from the vote linked to below: "There is no reason to include any brand name, product name or trademark in a dictionary [...] --Connel MacKenzie 17:41, 31 August 2007 (UTC)". In case of doubt, you may send "Adidas" to a new RFD (for which I vote keep) or to RFV via WT:BRAND, but this is the sort of entry that is likely to meet the current strict requirements of WT:BRAND. The 2007 RFD on "Adidas" was running in parallel with the second vote on brand names, which was running from 5 September 2007 to 5 October 2007 (Wiktionary:Votes/pl-2007-08/Brand_names_of_products_2), a vote that had bearing on whether "Adidas" met CFI. --Dan Polansky 08:23, 26 August 2011 (UTC)Reply

"Category:en:Planets" with proper nouns only, etc.

Since Wiktionary:Votes/2011-07/Categories of names failed and our categories for names of things are named with language codes, I suggest letting these categories be populated only with proper nouns.

This means:

removing ice giant and extrasolar planet from Category:en:Planets;
removing pulsar and red giant from Category:en:Stars;
and so on.

This is already the common practice concerning most of these categories. Compare:

desert is not a member of Category:en:Deserts;
skerry is not a member of Category:en:Islands;
tributary is not a member of Category:en:Rivers;
planetary is not a member of Category:en:Planets;
French is not a member of Category:en:Countries (or of Category:en:Countries of Europe, for that matter).

Thoughts? --Daniel 00:25, 25 August 2011 (UTC)Reply

I think I intuitively approve of this. Disregarding my dislike for Category:Fictional characters, I would rather it only contained actual characters and not things like protagonist and soubrette. Equinox ◑ 00:36, 25 August 2011 (UTC)Reply

I think it could be useful to group words relating to (for example) rivers, though, like river, tributary, etc. I would not mind using appendices for that, though, or very long ===See also=== sections, or (perhaps the best option) linking to appendices from ===See also=== sections. I have no very strong feelings/interest in the matter. - -sche (discuss) 00:48, 25 August 2011 (UTC)Reply

Category:en:Rivers would be a terrible place to look for tributary, because that word would be effectively hidden among a long list of names of rivers. --Daniel 01:19, 25 August 2011 (UTC)Reply

I agree with Daniel. The terms "river", "tributary" and the like are also found in Wikisaurus:watercourse; while this is done outside of the category system, it is at least a workaround for those who prefer categories. --Dan Polansky 10:30, 25 August 2011 (UTC)Reply

OK. I created this. --Daniel 23:42, 26 August 2011 (UTC)Reply

Fancy button in rhyme pages

When editing rhyme pages such as Rhymes:English:-eɪm, I see a row that says "Add new rhyme:" followed by an input field. I find this pretty annoying and would like to see this disabled at least for me. How can I disable it?

What this button does is add an item to a wikilist. A person who cannot add an item to a wikilist should not edit a wiki, IMHO. --Dan Polansky 10:23, 25 August 2011 (UTC)Reply

I don't think it's for people who can't edit a list; the tool just makes it quicker and easier. No idea how to remove it, although I'm sure someone will be able to help you there. BigDom 10:32, 25 August 2011 (UTC)Reply

In the gadgets section of Special:Preferences, there's an option to "Disable the rhymes editor". --Yair rand 16:55, 25 August 2011 (UTC)Reply

(BTW, it doesn't only add it to the wikilist, it also adds the {{rhymes}} template to the rhyme's entry, and a pronunciation section if it doesn't already have one.) --Yair rand 17:05, 25 August 2011 (UTC)Reply

Thank you. Pretty straightforward; I should have looked in Special:Preferences myself. --Dan Polansky 07:58, 26 August 2011 (UTC)Reply

Unfortunately, it adds the "Rhymes" to the top of the Pronunciation section instead of to the bottom, and this is never correct. It also means that random vandalism or erroneous additions to Rhymes pages require additional cleanup, since users no longer have to open the page code and see the warning about stress on the correct syllable. --EncycloPetey 17:34, 28 August 2011 (UTC)Reply

I've changed the script so that it adds rhymes to the bottom of the pronunciation section. --Yair rand 21:52, 31 August 2011 (UTC)Reply

Removing words from Wiktionary:Wanted entries

I recently removed yhe from Wiktionary:Wanted entries as I believe that all Google book hits are either OCR errors or just typographical variants (for the). Someone else reinstated it because "it might be a word in some language". Have we got a policy for such actions? SemperBlotto 14:02, 26 August 2011 (UTC)Reply

Why do we even have the page? Is it an inheritance from WP? It is really not much help to have a term "wanted" without a language specified. We have the whole family of requested entries by language. DCDuring TALK 15:24, 26 August 2011 (UTC)Reply

Would it be unreasonable to treat it as a cleanup page until emptied and discourage or "forbid" additions to the page? For example, we could restrict changes to admins and have blue links deleted daily or weekly. DCDuring TALK 15:36, 26 August 2011 (UTC)Reply

The only stumbling block I can think of is the way the top of the WT:Wanted entries list shows up at the top of a user's Watchlist. If there could be some way of allowing users to specify a language's "wanted" list (or maybe for multiple languages?) to display on the Watchlist, then I'd be all for DCDuring's proposal here. -- Eiríkr Útlendi | Tala við mig 15:45, 26 August 2011 (UTC)Reply

It actually says to check Special:WhatLinksHere before removing terms from the list. In some cases, all the incoming links are from outside the main namespace, often user pages and user talk pages. Such ones can be removed; any link in the main namespace should be checked for validity, such as typos. --Mglovesfun (talk) 15:59, 26 August 2011 (UTC)Reply

Couldn't a bot take care of the Special:WhatLinksHere check, then? The list is large enough, that would seem to make more sense than going through manually. -- Eiríkr Útlendi | Tala við mig 17:23, 26 August 2011 (UTC)Reply

Not really a Wiktionary bot, no, since no edits would be involved. A JavaScript script might be able to do it. But I'm not the person that can tell you about that. Mglovesfun (talk) 23:21, 27 August 2011 (UTC)Reply
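
The triage rule described above (links only from outside the main namespace can be dropped; any main-namespace link needs a human check) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Backlink` record and the `triage` function are hypothetical names, the list of incoming links is assumed to have been gathered already (for example by reading Special:WhatLinksHere), and treating a term with no incoming links at all as removable is this sketch's own choice, not something the thread settles. The only MediaWiki fact relied on is that the main namespace has number 0.

```python
from dataclasses import dataclass

MAIN_NAMESPACE = 0  # MediaWiki's number for the main (entry) namespace


@dataclass(frozen=True)
class Backlink:
    """Hypothetical record for one page that links to a wanted term."""
    title: str      # title of the linking page
    namespace: int  # MediaWiki namespace number of the linking page


def triage(backlinks):
    """Classify a wanted term from its incoming links.

    Returns ("remove", []) when no main-namespace page links to the term
    (this includes the assumed case of no incoming links at all), otherwise
    ("check", titles) listing the main-namespace pages a human should
    inspect for typos or other invalid links.
    """
    main_links = [b.title for b in backlinks if b.namespace == MAIN_NAMESPACE]
    if not main_links:
        return ("remove", [])
    return ("check", main_links)
```

A script along these lines would only produce a worklist; deciding whether a main-namespace link is a typo would still be left to an editor.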

I agree with DCDuring; the language-specific pages should be used (including Wiktionary:Requested entries:Unknown language). - -sche (discuss) 19:05, 26 August 2011 (UTC)Reply

I agree with removal since your work should be reflected somehow. Leaving all the terms there in perpetuity is not going to progress us toward our goal. If the language is unknown then it's even more useless. DAVilla 22:16, 27 August 2011 (UTC)Reply

WOTD

I was keeping this updated fairly well for the last few months, but RL work situation has made it impossible for the moment. In a couple of weeks when I'm back from Libya I am happy to crack on, otherwise if anyone else wants to update them then feel free. Ƿidsiþ 08:16, 27 August 2011 (UTC)Reply

Stay safe! Mglovesfun (talk) 10:04, 27 August 2011 (UTC)Reply

Yes, stay safe! I have set words and changed the templates for the 28th of August through the 3rd of September; that should give other editors (or me) time to set the rest of September. I have also set the 1st, 2nd, 4th, 5th, and 7th of October. (I think we should pick a word derived from German, or having to do with unity, for the 3rd of October, Germany's Day of Unity; it would be topical.) - -sche (discuss) 08:56, 28 August 2011 (UTC)Reply

Thanks! I'm not familiar with all the details of word-of-the-day, but I do have a few tips from what I've observed:

The WOTDers try not to re-use words of the day. Therefore:
- Before setting something as word of the day, they check the upper-right-hand-corner of the entry to make sure it wasn't used before. According to [[pareidolia]], that word was already word of the day (17 February 2011).
- Conversely, when setting something as word of the day, they add {{was wotd}} to the entry (à la [[pareidolia]]) so that future editors can see that it has already been used (or is about to be used).
When setting words of the day, they list them at (e.g.) [[Wiktionary:Word of the day/Archive/2011/August]], so other editors can see what they are. (There are a few editors who keep watch for upcoming WOTDs and do last-minute cleanup.)

—Ruakh_TALK 13:57, 28 August 2011 (UTC)Reply

That pareidolia appeared twice is evidence of the conspiracy! (Wait, that's not an example of pareidolia, that's an example of paranoia.)

Ok, I've checked all of the other August, September, and October words I set, they all look new (no WOTD links at the tops of the pages or in Whatlinkshere); thanks for pointing that out! I've also added {{was wotd}} to the words. Thanks for adding August to the archive; I've started a September archive. - -sche (discuss) 18:24, 28 August 2011 (UTC)Reply

Will someone be selecting words for September? I notice that "-sche" has selected some for October, but most of September has not been set as far as I know. I would be willing to do November, to give Widsith a break (if needed; I needed the same from time to time when I was selecting them), but I don't think I'd have time right now. I even went most of this last week without logging on much because of duties in the physical world. --EncycloPetey 17:24, 28 August 2011 (UTC)Reply

Template:cmn-pinyin

Am hoping this will solve a problem or two; see my comment at Wiktionary talk:Votes/2011-07/Pinyin entries#Romanization. Mglovesfun (talk) 10:01, 27 August 2011 (UTC)Reply

Languages written in more than one script, attestation

Is it a good policy or not to require that for a language using more than one script, each script form of a word (term, idiom, etc.) should be attested for it to be included? For example, do we expect верс to be attested separately, with three citations of its own apart from vers, or will three citations across both forms, Cyrillic and Latin, do? Mglovesfun (talk) 10:24, 27 August 2011 (UTC)Reply

I think it varies based on language. For something like Serbian, where the scripts have a one-to-one correspondence, I think demanding that each script be separately attested is pointless and bureaucratic. For other languages, that don't have a simple transliteration between scripts, it's probably necessary to attest each separately.--Prosfilaes 10:41, 27 August 2011 (UTC)Reply

Or languages that have a dominant script and a rare one. I seem to think Tatar can be attested in the Latin and Arabic scripts, but is predominantly written in Cyrillic. Mglovesfun (talk) 11:06, 27 August 2011 (UTC)Reply

And for some like Japanese, where a word might be usually spelled in kanji, for instance, but hiragana and romaji (i.e. Latin alphabet) words are added as well to aid learners.

Attestation concerns aside, given the kerfuffle about pinyin entries for Chinese and the discovery from that discussion that searching for pinyin finds hanzi entries just fine, I find myself asking -- do we really need Japanese headwords in kana and romaji? If we do, then wouldn't we also need pinyin headwords? What's the distinction? -- Eiríkr Útlendi | Tala við mig 18:33, 27 August 2011 (UTC)Reply

We allow pinyin as headword and have done for some time, though don't ask me when the first one was created (and not deleted). Mglovesfun (talk) 21:32, 27 August 2011 (UTC)Reply

I'm a bit invested in the romaji and kana pages so I'm biased, but at least they do serve to replace a Homophones section. Having one canonical kana page eliminated duplication or missed entries in Homophone sections scattered across many pages, and a single, comprehensive list of homophones might serve some benefit to learners of Japanese. Putting romaji pages in topical categories increases the duplication of entries but makes it easy for learners to read a group of related terms at a glance, which is helpful since learning groups of related terms at once is the best way to learn a foreign language. Having romaji terms in topical categories would allow people with no knowledge of Japanese to learn a handful of terms in a few seconds. There are arguments for and against, but I favor having them. Haplology 04:16, 1 September 2011 (UTC)Reply

The disambiguation role of romaji and pinyin is just fine. The trouble is when all the contents that should be in kanji/hanzi entries goes into pages only serving learners to cope with the complex script and find a proper word. It also seems that some people have an agenda of promoting non-standard writing. The proper native script should not be replaced with the romanisation.

On Serbo-Croatian (or Serbian alone). One-to-one conversion is only 99% ok. Care should be taken with borrowed words, as Roman script often uses the original spelling and some letter combinations have variants. (I'm leaving aside the differences between the Ekavian and Ijekavian dialects.)

On Tatar, Belarusian. Some nationalistically minded users created a bunch of entries in Roman, especially in Tatar. Tatar (but not Crimean Tatar, it's a different language) is officially written in Cyrillic, and so is the majority of the online and printed material in Tatar. You may think I am biased, but Azeri, Turkmen and Uzbek are now officially in Roman, even though they were written in Cyrillic, so their entries should be primarily created and have translations in Roman. Failure to do so confuses the readers.

We are on the way to create pinyin policy. Perhaps we should address some other languages, written in multiple scripts. There is no definiteness for some (e.g. Konkani in India) but we could have a guide for patrollers to use. --Anatoli 04:35, 1 September 2011 (UTC)Reply

The original question on верс. Yes, a correct Serbo-Croatian word and a correct conversion from Roman. --Anatoli 04:41, 1 September 2011 (UTC)Reply

In response partly to Haplology and partly to Anatoli, my view on Latin alphabet (and kana) entries for languages that generally use another script is to view them a bit like disambig pages. For Japanese in particular, romaji and kana pages may include multiple possible kanji, making romaji or kana pages very much indeed like disambig pages. As such, entries in non-standard scripts should probably do no more than provide a bulleted list of the main terms, with brief glosses to help users make the correct selection. -- Eiríkr Útlendi | Tala við mig 02:23, 2 September 2011 (UTC)Reply

This is in line with the current pinyin vote. --Anatoli 02:33, 2 September 2011 (UTC)Reply

So to sum up, trying to stay global rather than comment on specific cases, it depends on the language. Different languages should get different treatments. How am I doing? Mglovesfun (talk) 07:01, 2 September 2011 (UTC)Reply

Template:ante and Template:post

Why do {{ante}} and {{post}} abbreviate their output to a. and p.? It saves two whole characters on each of them, and makes them quite a bit more opaque in meaning.--Prosfilaes 22:07, 27 August 2011 (UTC) IFYPFY.—msh210℠ (talk) 04:59, 28 August 2011 (UTC)Reply

I would support changing them so that they do not abbreviate. - -sche (discuss) 23:01, 29 August 2011 (UTC)Reply

I think the usual meanings are (and our glossary says the meanings are) "not after" and "not before" rather than "before" and "after". Thus, a quotation dated a. 1924 might be from 1924. So whatever this conversation decides, it should not be to change the displays to "before" and "after" (unless someone goes through every single time the templates are used and edits the dates!). "Ante" and "post" have a similar problem (as people know what they mean); so do "a." and "p.", but not as badly. But maybe "ante" and "post" don't have it badly enough to worry about: I don't know.—msh210℠ (talk) 01:49, 30 August 2011 (UTC)Reply

Surely we shouldn't justify having an abbreviation because we have a weird definition and that makes it more opaque.--Prosfilaes 02:09, 30 August 2011 (UTC)Reply

Msh210: I understood this as a request to change "a." to "ante" and "p." to "post". But are you saying "a." means something different than "ante"? (Are you saying "a." means "no later than" but "ante" means "before"?) If so, I echo Prosfilaes' comment. If not, the expansion will not cause a semantic problem. PS, note that while Wiktionary:Glossary does not define "a." or "ante", it defines "p." as "post or after, often used in quotations", which disagrees with Appendix:Glossary... one or the other should be corrected. - -sche (discuss) 21:12, 31 August 2011 (UTC)Reply

To address your last point first, Wiktionary:Glossary shouldn't define either, as they're not used in discussions here (and, anyway, are in Appendix:Glossary). To your first point: Ante and a. both mean (translate as) "before", and post and p. "after". But we don't use them that way, so in our citations they don't mean that. That's a bad thing (unless it's the same as what other dictionaries do, in which case it's okay, I suppose). But if that's the way it is, then yes, essentially, a. means "not after", since (as it's an abbreviation) we can make it mean whatever we want: people might not know it comes from ante; otoh, ante, a Latin word, clearly means "before". So a. is better. Best of all, though, would be to change our system (again, provided it doesn't match other dictionaries'), which should be doable by a complicated bot. (It would need to look for post and ante, written in by hand or by template, and change the year by one unless the "year" is a century or the like (in which case leave it) and unless the citation had been added or edited since the decision was made to switch over (in which case tag it for human attention). Or something like that.)—msh210℠ (talk) 15:37, 1 September 2011 (UTC)Reply
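
The year-shifting bot described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `convert` function, its `edited_since_switch` flag, and the restriction to plain four-digit years are all hypothetical choices, and how a real bot would locate the dates on a page is left out entirely. It rewrites an inclusive "a./ante YEAR" ("not after") as a strict "ante YEAR+1", and an inclusive "p./post YEAR" ("not before") as a strict "post YEAR-1".

```python
import re

# Matches "a. 1924", "ante 1924", "p. 1600", "post 1600" (four-digit years only).
_DATE = re.compile(r"^(a\.|ante|p\.|post)\s+(\d{4})$")


def convert(date_text, edited_since_switch=False):
    """Rewrite an inclusive 'a./p. YEAR' date as a strict 'ante/post YEAR'.

    Returns (new_text, needs_human). Dates that are not a plain four-digit
    year (a century or the like) are returned unchanged. Citations touched
    after the switch-over are returned unchanged but flagged, because it is
    unclear which convention their author had in mind.
    """
    if edited_since_switch:
        return date_text, True
    match = _DATE.match(date_text.strip())
    if match is None:
        return date_text, False
    marker, year = match.group(1), int(match.group(2))
    if marker in ("a.", "ante"):
        return f"ante {year + 1}", False  # "not after 1924" is "before 1925"
    return f"post {year - 1}", False      # "not before 1924" is "after 1923"
```

Under these assumptions, `convert("a. 1924")` gives `("ante 1925", False)`, while a date such as "a. 16th century" passes through untouched.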

I dunno about dictionaries specifically, but isn't the use of "ante [year]" to mean "in or before [year]" pretty normal? I mean, do you take “They were reviewed in this journal when they originally appeared (ante 1973), III, 103–4 and (1976) IV, 125–6” and “The projected growth rates of labour supply under ‘normal’ (that is, ante 1973) demand conditions in both countries are about the same as those prevailing since the mid-1960s” (both c/o b.g.c.) to mean strictly before 1973? —Ruakh_TALK 17:58, 1 September 2011 (UTC)Reply

I take it to mean "strictly before", yes. If I'm wrong as to the general intention of writers, or in the minority among readers, ignore my 2c, above.—msh210℠ (talk) 18:42, 1 September 2011 (UTC)Reply

I don't know. I understand it differently from how you do, but it really just might be me. (I'm pretty sure I'm the one who gave the current glosses at Appendix:Glossary.) Very relatedly — I'm reading a book called Semantic Antics, about various English words that have changed meanings in bizarre ways, and it frequently says that a certain word has a certain meaning (say) "before 1483". I've been taking that to mean "by 1483", since it seems strange to emphasize a year after which the word is already known to have a certain sense, but again, maybe that's just me? —Ruakh_TALK 20:17, 1 September 2011 (UTC)Reply

Position of Template:was wotd

The current vertical position of {{was wotd}} is about level with the L1 page title, above all the language (L2) entries (no matter where it is included). This implies that the whole page was featured when in fact it was just the English entry. This has two problems. (1) About 500 pages where this template is used have entries below the English section for which this implication is false, and (2) eventually we might want to feature an English entry where there is a preceding Translingual entry, which would break the even looser implication that the template applies to the following/top entry. I therefore propose that this template float on the right-hand side (I think this is the only RHS one that doesn't) like others do. We can then move the template uses into the English entries, at least where there's possibility for confusion. Sound OK?--Bequw → τ 16:08, 28 August 2011 (UTC)Reply

Well, we've only ever chosen English words, so the initial decision was to place the template as far up the page and out of the way as possible, so that it would not overlap any page content. In practice, this varies a bit by browser.

Placing it in the English section will not be any less misleading. We only ever feature one part of speech, and English words often have more than one part of speech, so I don't see that moving the template would solve any actual problem. --EncycloPetey 17:24, 28 August 2011 (UTC)Reply

I know historically there were some "overlapping" layout problems, but our current floating RHS content doesn't suffer from any that I know of. And actually, the current position overlaps with the section-0 (page header) [edit] link that can be added with JS (originally from an en.wiki gadget). I find it quite useful and at least one other person uses this. As for the change being less misleading, the template could be placed in the actual part of speech that was featured. It'd be hard to see how narrowing the indication from the whole page down to a language's part of speech isn't more accurate. The move also helps with the logic/consistency that language content should be in its language section. I can't think of any other language content template that isn't in its own section. Does the current position help the WOTD maintainers? If so we can have the default layout be "float"ing and then have some WT:PREFS code to move it to its current, absolute position. --Bequw → τ 14:58, 29 August 2011 (UTC)Reply

Well, it helped me during the years I ran it. It was easiest to spot at the top of the page, rather than having to look around in (possibly) several places, where it might be hidden by images or wikipedia-link boxes. --EncycloPetey 05:11, 4 September 2011 (UTC)Reply

The move you have made to the new position has resulted in a serious problem. The text that the template is supposed to display is no longer visible in many of the entries. Please correct the problem so that the text is visible, or please revert the change in position. --EncycloPetey 18:11, 5 September 2011 (UTC)Reply

That's odd. In none of my browsers (Chrome, FF, IE 9 on Vista) has the display changed on entries where I've moved the template position into the English entry (eg putrescible). The CSS positioning is "absolute" so it shouldn't change (and there shouldn't be any difficult cache issues since I didn't change anything else). What's your setup? Does it happen when you're logged out? What if you view it using the Monobook skin (I assume you use Vector)? Does anyone else have this problem? --Bequw → τ 01:22, 8 September 2011 (UTC)Reply

It's fine on my Mac at home using Safari, but not in IE (Windows) at work. I'm not sure which old version we're using, but it's school software and can't be changed. If the template is going to display at the top of the page, then I don't understand how it will help anyone to position the coding for the template inside a language section. That will just confuse future editors. --EncycloPetey 02:57, 10 September 2011 (UTC)Reply

Checking pages against IE 5.5, 6, 7, and 8 I couldn't find any oddities with WOTD. Maybe your school has a mixed-up configuration that isn't popular. I think it is best to make this template float by default and create WT:PREFS to return it to the original, top position for those that prefer it. This is partly why I moved into the English entries those WOTD invocations on pages with multiple entries. See the start of a broader cleanup at Wiktionary:Todo/Anomalous section0 content. --Bequw → τ 03:09, 11 September 2011 (UTC)Reply

I've made the template simply floatright like other RHS templates. It might look weird as caches catch up. If you prefer the old style, you can get the raw CSS from the documentation shown at {{was wotd}}, or if you have the default (vector) skin you can use WT:PREFS (look for "was-WOTD" in the bottom of the display section). --Bequw → τ 13:21, 18 September 2011 (UTC)Reply

Klategory?

There are loads of these Ku Klux Klan terms. The following might be eligible for a category (some used only within the organisation and others more widely): Kladd, Klankraft, Klaliff, Klokan, Klarogo, Klexter, Klokard, kludd, klavalier, Kloran, klonvocation, klecktoken, kligrapp, Kleagle, Klabee, Klansman, Klanswoman, kloncilium, klonklave, klansman, klan, Klan, Ku Klux Klan, antiklan, klavern, KKK, Klannish. Equinox ◑ 10:57, 31 August 2011 (UTC)Reply

September 2011

Idiomatic translations

I've been wondering for a while how to add translations that are not strictly idiomatic in English or in the target language, but for which the translation itself is idiomatic and not obvious. An example I came across was 'I have a nosebleed', which is translated more or less word for word into Dutch as 'Ik heb een bloedneus', but in Catalan it is translated as 'Em sagna el nas' - literally 'The (my) nose bleeds to me'. Any translations given for nosebleed are only useful for Dutch, but they would not cover the Catalan case at all. The literal translation of 'nosebleed' in Catalan is 'hemorragia nasal', which is not helpful in this case, and is not even idiomatic itself so it can't be included. Cases like this are quite common between languages, and it seems like a rather big gap in Wiktionary to leave it out... —CodeCat 13:04, 1 September 2011 (UTC)Reply

People keep talking about a phrasebook, maybe this would be a good use for it. Fugyoo 14:02, 1 September 2011 (UTC)Reply

But there should be something directly under [[nosebleed]] too. In a paper English-Catalan dictionary you would expect to see something like "nosebleed - hemorragia nasal. I have a ~ : Em sagna el nas". So why not do something like that here: when an English term is best translated with a phrase in the target language, we give the phrase in addition to the straightforward noun=noun translation. —Angr 15:40, 1 September 2011 (UTC)Reply

I would agree with this way but there is some overlap with entries that are idiomatic, which have their own entries. We might end up with a situation where the entry give contains translations for 'give up', while give up has its own translations as well. We would need to be careful that translations are not duplicated like this. —CodeCat 15:52, 1 September 2011 (UTC)Reply

Not addressing your question, which is general, but, rather, only the specific example: Do other symptoms not translate into Catalan similarly? Is "I have a headache" in Catalan not literally "The head hurts to me"? If so (and, not knowing any Catalan, I have no idea whether it's so), then I don't think we should include such translations in any entry at all: they belong in a grammar, perhaps, but are not relevant to any one word of the language.—msh210℠ (talk) 15:47, 1 September 2011 (UTC)Reply

We already have some grammar in Wiktionary's entries, and I don't think it's much of a problem if we include things like this. They are very useful to someone who wants to say 'I have a nosebleed' in Catalan and looks at the translation table, and then notices immediately that what he wants to say is said differently. It's very user friendly that way. —CodeCat 15:52, 1 September 2011 (UTC)Reply

On a tangential note, google:"I have a nosebleed" seems much less common than google:"my nose bleeds" and google:"my nose is bleeding". "my nose bleeds" seems like a fairly good candidate for a phrasebook entry, one that can be linked to from "nosebleed". --Dan Polansky 16:05, 1 September 2011 (UTC)Reply

'My nose bleeds' seems very awkward to me. It sounds like you are saying it bleeds habitually rather than that it is bleeding right now. —CodeCat 16:06, 1 September 2011 (UTC)Reply

Where I live in Northern England, people would say "my nose is bleeding" or possibly "I have a nosebleed" but never "my nose bleeds". I imagine most of the ghits for "my nose bleeds" would be part of phrases such as "my nose bleeds when..." or such like. BigDom 16:10, 1 September 2011 (UTC)Reply

google:"My head hurts" and google:"my stomach hurts" also seem very common in spite of not using the present continuous tense. Also check the two phrases in Google Books to see how common they are there as well. --Dan Polansky 16:12, 1 September 2011 (UTC)Reply

This American agrees with Codecat and BigDom: my nose bleeds sounds like it does so habitually, not now. My X hurts OTOH means now. Go figure.—msh210℠ (talk) 16:28, 1 September 2011 (UTC)Reply

I agree. But to me it seems similar to the Americanism "Do you have" instead of "Have you got" (I was once asked "Do you have children?" I replied "Not very often." and she was very confused!) SemperBlotto 06:59, 2 September 2011 (UTC)Reply

@ CodeCat, we often just split the link, [[hemorragia]] [[nasal]]. Mglovesfun (talk) 07:08, 2 September 2011 (UTC)Reply

More generally, the translation table should include help when needed. Lmaltier 17:03, 2 September 2011 (UTC)Reply

Question

I was directed here from a discussion section. So what is the "acceptability" of signatures? An editor since 8.28.2011. 06:36, 3 September 2011 (UTC)Reply

Any signature is probably going to be acceptable unless someone takes exception to it. If somebody has a problem with it, they will explain and then you will know how to improve its acceptability. Or you can ignore the complaint and advice and choose instead to burn your bridges with that editor. If you burn too many bridges, you may find it difficult or impossible to function effectively here. —Stephen ^(Talk) 06:44, 3 September 2011 (UTC)Reply

~~I have no clue how that is related to my comment? (Never mind.) Okay... what exactly are you trying to say?~~ Oh, I see! I'm stupid when I'm tired. Thanks! An editor since 8.28.2011. 06:48, 3 September 2011 (UTC)Reply

FWIW I find colorful signatures annoying... but I'd rather be annoyed than limit others' freedom to edit their own signature, unless the signature is really really silly. Mglovesfun (talk) 09:55, 3 September 2011 (UTC)Reply

Thank you! An editor since 8.28.2011. 17:08, 3 September 2011 (UTC)Reply

However: my signature is uni-colored. An editor since 8.28.2011. 17:11, 3 September 2011 (UTC)Reply

Colorful doesn't necessarily imply more than one color. --Mglovesfun (talk) 14:18, 7 September 2011 (UTC)Reply

Differently-coloured signatures might cause problems for people using different skins (colour schemes), perhaps because of poor eyesight. Fugyoo 14:26, 7 September 2011 (UTC)Reply

User:Yair rand/uncategorized language sections/English

Just want to ask for a few volunteers to fix these entries, using templates such as {{en-noun}}, {{en-verb}} or just {{infl}}. No obligation of course, but even fixing one entry at this late stage is a help. Thank you, Mglovesfun (talk) 09:54, 3 September 2011 (UTC)Reply

community's opinion on bot format

I received this message on my talk page: Hi there. It is a bit late now, but I have been meaning to ask you for some time if the form P.officer was a mistake. Also, in your subpages, we like to use {{conjugation of}} these days rather than {{form of}} e.g. {{conjugation of|pellettizzare||2|s|past historic|lang=it}} (Italian example).

The thing that I'm asking about is when it says "we like to use {{conjugation of}}...rather than {{form of}}". Is it important which template to use, if both produce identical results? To me, it looks unnecessary to change to {{conjugation of}}, if not a waste of time, but I'm eager to hear the voices of other users. --Pofficer 17:53, 4 September 2011 (UTC)Reply

If they produce the same thing, I don't see the point in switching.—msh210℠ (talk) 20:22, 4 September 2011 (UTC)Reply

Conjugation of is more uniform: there are many minor variations on how to write "first-person singular present indicative" using form of, while conjugation of allows only one of these. Mglovesfun (talk) 21:34, 4 September 2011 (UTC)Reply
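To illustrate the uniformity point, here is a minimal Python sketch of how a bot might build the template call from structured fields, so that every entry gets the same wording. The function name and its parameters are hypothetical, and the parameter order is taken only from the single Italian example quoted above; other tenses or moods may need different arguments.

```python
def conjugation_of(lemma, person, number, tense, lang):
    """Build a {{conjugation of}} call in the shape of the quoted Italian example.

    Hypothetical helper: the positional layout (lemma, empty slot, person,
    number, tense, lang=) is copied from that one example, not from the
    template's documentation.
    """
    fields = [lemma, "", person, number, tense, "lang=" + lang]
    return "{{conjugation of|" + "|".join(fields) + "}}"


if __name__ == "__main__":
    print(conjugation_of("pellettizzare", "2", "s", "past historic", "it"))
```

Because the wording of the displayed text comes from the template rather than from free text typed into {{form of}}, two bots using such a helper cannot drift apart in phrasing.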

I agree. Hence my "If..." clause.—msh210℠ (talk) 21:37, 4 September 2011 (UTC)Reply

OK, I shall continue the bot. If there are any problems, don't hesitate to leave me a message and I'll put a clamp on the bot. --Pofficer 09:56, 5 September 2011 (UTC)Reply

I got a message just now about a bot flag, the "small formality of requesting permission to run as a bot, and then getting a sysop to set the bot-flag on your user id". Can I request permission to run P.officer (talk • contribs) as a bot? Instead, perhaps, I could change the name of the bot to Officebot (talk • contribs) as it could avoid confusion. --Pofficer 10:20, 5 September 2011 (UTC)Reply

We do have a couple of bots without -bot or -Bot in the name but, if you don't really mind, I'll change it to PofficerBot before setting the bot flag (It seems to be functioning OK). SemperBlotto 10:31, 5 September 2011 (UTC) p.s. You would need to edit your user-config.py file to reflect the name change.Reply
- I certainly will change user-config.py. "Pofficerbot" is fine as a name. Thanks --Pofficer 10:40, 5 September 2011 (UTC)Reply

OK. Changed to "Pofficerbot" and bot-flag now set. SemperBlotto 10:44, 5 September 2011 (UTC)Reply
- Still needs a vote technically, no? Not that I object. Does anyone actually object to this bot? Seems a bit of a waste of time to have a vote if nobody would actually oppose it anyway. Mglovesfun (talk) 10:46, 5 September 2011 (UTC)Reply
  - It takes a second to remove the flag (I'm going to keep an eye on it for a while). SemperBlotto 10:48, 5 September 2011 (UTC)Reply
    - Thanks again SemperBlotto. As if by magic, the edits have been removed from RecentChanges. --Pofficer 10:59, 5 September 2011 (UTC)Reply

WT:About_Japanese

Calling all 日本語能力のある方...

Following comments in various other threads, it appears that the WT:AJA page needs some work. The issues I'm immediately aware of:

Quasi-adjectives (な adjectives): WT:AJA insists on including the な in the headword, which does not appear to be the current consensus.
の adjectives: WT:AJA does not include any clear guidelines for these. (Relatedly, {{ja-adj}} doesn't include any way of handling these either.)
Suru compound verbs: WT:AJA calls for using the {{ja-suru}} template. However, する is a standalone verb, so including the する conjugation on each and every compound verb page seems excessive.
{{ja-kanjitab}}: WT:AJA describes including this under an === Etymology === section if there is one, but including under the main == Japanese == section produces largely identical results, unless there are multiple etymology sections, in which case repeating the kanjitab seems excessive.
The Transliteration subpage could also use some work, particularly with regard to spacing and what constitutes a single word in Japanese (i.e., particles should be separate, suru should be separate, etc. etc.).
連体詞: WT:AJA states that this should be given a POS of "prefix", but that is really not what these words are -- a prefix is part of a word, whereas 連体詞 are clearly standalone words. They are less prefixes and more like true adjectives, in that they must precede a noun.
Single-kanji entries: WT:AJA has no clear instructions on how to specify okurigana in kun'yomi listings, nor any clear instructions on how to format these to link to verb forms. For instance, 食 shows one way of clarifying okurigana and linking to kanji+okurigana entries, but is a bit visually messy; ja:食#日本語 looks a bit cleaner with the use of hyphens to show the break between the kanji and the okurigana, and this roughly matches the format I've most often seen in dead-tree dictionaries, but the entry doesn't link to any kanji+okurigana entries, just to the hiragana entries; and 飲 doesn't show okurigana or link to any kanji+okurigana entries.

This post is really just meant to get the ball rolling. Many of these changes listed above are a departure from what WT:AJA currently says, so I'm hoping to spark a bit of discussion before making any edits. -- TIA, Eiríkr Útlendi | Tala við mig 17:41, 6 September 2011 (UTC)Reply

Please keep discussion in the fora here in English where possible. For the record, 日本語能力のある方 seems to mean "those skilled in Japanese" or similar (based only on Google Translate, not that I know any Japanese, myself).—msh210℠ (talk) 18:25, 6 September 2011 (UTC)Reply

Also, you might want to continue this discussion at Wiktionary talk:About Japanese, since it may wind up taking up a lot of screen space and is specific to Japanese (and indeed the AJA page!).—msh210℠ (talk) 18:28, 6 September 2011 (UTC)Reply

Fair enough. I've tried posting there a few times and got the overwhelming impression of crickets chirping, which led me to try posting on a more-trafficked page. I'll copy this thread over to there shortly. -- Eiríkr Útlendi | Tala við mig 19:16, 6 September 2011 (UTC)Reply

I think it is a good idea to have this post here, directing everyone to the Wiktionary talk:About Japanese page (where the discussion can take place). If no-one adds to the discussion, contact other active editors of Japanese directly on their talk pages. If there are none, or you have done that and they have not replied, then you (as the only active editor of the language) should make whatever changes you deem necessary. - -sche (discuss) 20:06, 6 September 2011 (UTC)Reply

I agree: keep this here, but continue discussion there. (That's what I meant in the first place: sorry I wasn't clear.)—msh210℠ (talk) 23:41, 6 September 2011 (UTC)Reply

Sure, no worries. :) I copied my initial post over to Wiktionary_talk:About_Japanese#Work_Needed. I hope to get into the nitty gritty over there. -- Cheers, Eiríkr Útlendi | Tala við mig 23:45, 6 September 2011 (UTC)Reply

I've created a list of the 1000 most common species epithets

Hi Latin lovers and barflies,

User:Pengo/Latin/Top_1000

Based on the Encyclopedia of Life database, I've compiled a list of the most common species epithets. I'm hoping this will help those who want to create new Latin/Translingual entries.

There's more details on the page. --Pengo 14:14, 7 September 2011 (UTC)Reply

Here's the top 5 words that are missing Latin/Translingual entries:

fasciata (banded)
apicalis (apex)
africana
nana (dwarf)
variegata

--Pengo 03:22, 8 September 2011 (UTC)Reply

In many cases it is just the inflected form that is missing, e.g., nana, nanus, but in some cases lemmata are missing, even classical ones, e.g., variegatus, variego. DCDuring TALK 14:21, 8 September 2011 (UTC)Reply

Looks like it would help if I grouped words with the same stem. I'm going to attempt to make another list that does that (at least crudely).

I, myself, don't know an inflection from a declension, so until I learn some Latin grammar and work out all the templates and formatting here, this list is really for you and other editors. So let me know if there's anything else that would be useful. --Pengo 02:53, 9 September 2011 (UTC)Reply

Grouping by stems is less helpful IMO for speeding entry creation than grouping by inflectional ending and suffix. Ie, the forms ending in "ata" have a very similar Latin section structure. That structure will have links to the participle lemma ending in "atus", which will have links to the lemma verb. Some of those links may be red. The entries for the red link lemmas should probably be added by an editor familiar with Latin with access to multiple Latin references, including some for Medieval Latin. Purely New Latin terms are much less interesting to most Latinists, however important they may be to taxonomists and to Wiktionary. DCDuring TALK 12:25, 9 September 2011 (UTC)Reply
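As a rough illustration of grouping by ending rather than by stem, the following Python sketch buckets epithets by the longest matching ending. The list of endings and the function name are hypothetical choices for the example, not anything taken from the actual Top 1000 page.

```python
from collections import defaultdict

# Hypothetical list of endings to group by; a real list would be much longer.
ENDINGS = ("ata", "alis", "ana")


def group_by_ending(epithets, endings=ENDINGS):
    """Bucket each epithet under the longest ending it matches, else 'other'."""
    ordered = sorted(endings, key=len, reverse=True)  # try longer endings first
    groups = defaultdict(list)
    for word in epithets:
        ending = next((e for e in ordered if word.endswith(e)), "other")
        groups[ending].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = ["fasciata", "apicalis", "africana", "nana", "variegata"]
    for ending, members in group_by_ending(words).items():
        print(ending, members)
```

A plain string match like this is crude: it puts nana under "ana" even though the word is the feminine of nanus and not a formation with the suffix -ana, so the output would still need a pass by someone who knows Latin.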

Thanks for the feedback. Working on it. Will add some extra features too. --Pengo 04:56, 10 September 2011 (UTC)Reply

abuse filter

As of recently, we have an abuse filter. It allows us to create rules against which edits (and moves and other things) are filtered; if an edit matches such a rule, it can — at our option for each rule — tag the edit with a little note in special:recentchanges, not allow the edit to go through until the editor first sees a warning that the edit might not be wise (which warning can be customized for each filter rule), block the edit altogether, or remove the editor's "autoconfirmed" flag. (Or combinations of those.) It can also do these things only after the editor in question makes too many rule-matching edits in a short period of time (which rate, too, is customizable per filter rule). For more on the abuse filter, see the MediaWiki extension page and/or the Wikipedia abuse filter page (except that they call it the "edit filter").

I've set up some rules that I thought would be helpful.

One of them actually blocks an edit from going through: this filter checks that the user is not an autopatroller, admin, or bot; that the edit is in the main (entry) namespace; that the entry had a level-three header before the edit; that the entry has no level-three header after the edit; and that the entry (after the edit) doesn't have a speedy-deletion template or {{only in}}. It blocks that edit from going through. That filter has (in its current incarnation) caught scores of edits, with no false positives (i.e., it has not blocked any edit that we wouldn't have manually rolled back had it gone through).
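The conditions of that blocking rule can be sketched in Python as follows. This is only an illustration of the logic described above, not the filter itself, which is written in the abuse filter's own rule language; the function name, the group names, the regular expressions, and the choice of {{delete}} as the speedy-deletion template are all assumptions.

```python
import re

# Assumed identifiers for the exempt groups; the real filter may use others.
EXEMPT_GROUPS = {"autopatroller", "sysop", "bot"}

# A level-three header: a line such as ===Noun=== with exactly three equals signs.
LEVEL3 = re.compile(r"^===[^=].*?===\s*$", re.MULTILINE)

# Assumed markers for pages that legitimately lack a level-three header.
ALLOWED_TEMPLATES = re.compile(r"\{\{\s*(only in|delete)\s*[|}]", re.IGNORECASE)


def blocks_header_removal(user_groups, namespace, old_text, new_text):
    """Return True if the edit strips every level-three header from an entry."""
    if EXEMPT_GROUPS & set(user_groups):
        return False
    if namespace != 0:  # 0 is the main (entry) namespace in MediaWiki
        return False
    if not LEVEL3.search(old_text):  # nothing to lose
        return False
    if LEVEL3.search(new_text):  # a header survives the edit
        return False
    if ALLOWED_TEMPLATES.search(new_text):  # deletion request or {{only in}}
        return False
    return True
```

For example, replacing a well-formed entry with the text "lol" would be blocked, while replacing it with a speedy-deletion request would not.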

No other rule currently does more than tag an entry on special:recentchanges. I propose, though, that three do.

One of them is a copy of a filter at enWP. These filters look for an edit that adds a single bad word and nothing else. (Approximately. The actual workings of the filter are hidden on enWP, so I've hidden our copy also. Admins and "edit filter managers" over there can see their copy, and our admins can see ours.) On enWP, it prevents the edit from going through, and has done so for months. (I don't, however, know how fastidious they are in looking for false positives.) Here, it does nothing; so far we've had only a handful of matches, with no false positives. I propose it prevent edits from going through here also. I also ask admins to edit it to enwikt purposes (testing it well of course, especially if it disallows the edit from going through).

Update: Now we've had a false positive.—msh210℠ (talk) 15:27, 8 September 2011 (UTC)Reply

Another rule I think should do more than tag is one that checks whether a new (main namespace) entry is created by a non-autopatroller (non-admin, non-bot), lacks a level-three header, and either {has both a capital letter and a space in its title} or {has a right-parenthesis ) at the end of its title}. It's only had a handful of hits, with no false positives. Again, please improve it; and I think perhaps it should also block edits from going through.

The third rule I think should prevent edits from going through currently also just tags. It checks whether an entry is not new, is being edited by a non-autopatroller (non-bot, non-admin), and has its after-this-edit text the same (but for capitalization and other normalizations) as its pagetitle.
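The second and third rules described above can be sketched the same way. Again this is Python rather than the abuse filter's rule language, and the group names, the helper names, and the particular normalizations (case, underscores, runs of whitespace) are assumptions made for the example; the caller is assumed to have already checked the namespace and whether the page is new.

```python
import re

EXEMPT_GROUPS = {"autopatroller", "sysop", "bot"}  # assumed group names
LEVEL3 = re.compile(r"^===[^=].*?===\s*$", re.MULTILINE)


def is_exempt(user_groups):
    return bool(EXEMPT_GROUPS & set(user_groups))


def flags_new_page_title(title, text, user_groups):
    """Second rule: new main-namespace page, no level-three header, odd title."""
    if is_exempt(user_groups) or LEVEL3.search(text):
        return False
    capital_and_space = any(c.isupper() for c in title) and " " in title
    return capital_and_space or title.endswith(")")


def normalize(s):
    # Assumed normalizations: case, underscores and runs of whitespace.
    return " ".join(s.replace("_", " ").split()).casefold()


def flags_text_equals_title(title, new_text, user_groups):
    """Third rule: existing page whose text is reduced to its own title."""
    if is_exempt(user_groups):
        return False
    return normalize(new_text) == normalize(title)
```

Under these assumptions a new page titled "John Smith" with no headers is flagged by the second rule, and an existing entry "give up" whose text becomes "Give  up" is flagged by the third.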

Thoughts?

(Of course, edits to improve the other filters are sought, too. And new ones.)—msh210℠ (talk) 20:05, 7 September 2011 (UTC)Reply

This is a really neat tool and I applaud your initiative in creating a few filters to start out. - [The]DaveRoss 20:08, 7 September 2011 (UTC)Reply

Yeah, it's an excellent thing to have. I think I saw a rule to block edits that create a page whose content is identical to its title, which (for some reason) is a very common useless edit. Equinox ◑ 22:58, 7 September 2011 (UTC)Reply

We have it for existing pages: it checks whether the page content was reduced to its pagetitle. We could easily have it for new pages also (even by editing the existing filter rule).—msh210℠ (talk) 15:19, 8 September 2011 (UTC)Reply

I've updated that rule. We can watch and see if it picks up false positives.—msh210℠ (talk) 15:30, 8 September 2011 (UTC)Reply

I think we could disallow creating pages in the main namespace if the first character is a letter. All existing pages begin with either a header or with a template like {{also}} or {{wikipedia}}. —CodeCat 23:26, 7 September 2011 (UTC)Reply

(You mean if the first char is alphanumeric?) Most people don't come here knowing the formatting rules, so if we did do that, we would need extra-prominent links to those and to places they might want, like WT:REE. Equinox ◑ 23:30, 7 September 2011 (UTC)Reply

I think we should tag those but not disallow 'em. There might be some usable content. Mglovesfun (talk) 11:45, 8 September 2011 (UTC)Reply

Yeah most people don't know ELE, so we shouldn't disallow them, but I think it might be wise to give the editors a notice before allowing them to save, which notice can outline the format, or something. (And tag the edit.)—msh210℠ (talk) 15:19, 8 September 2011 (UTC)Reply

I've created a filter along these lines. It checks whether the first character is anything but { or =. It does nothing for now (so we can check for false positives), but can warn the user.—msh210℠ (talk) 18:07, 18 September 2011 (UTC)Reply
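In Python terms the check amounts to something like the sketch below. The function name is hypothetical, and treating an empty page as not triggering the warning is an assumption; the real filter is written in the abuse filter's rule language.

```python
def first_char_triggers_warning(page_text):
    """Return True if a new page starts with anything but '{' or '='."""
    if not page_text:
        return False  # assumption: an empty save is handled elsewhere
    return page_text[0] not in "{="
```

A page starting with "==English==" or "{{also|Foo}}" passes silently, while one starting with plain prose would get the warning.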

Awesome! —Ruakh_TALK 00:29, 8 September 2011 (UTC)Reply

Could we write a filter that shows editors a warning before allowing them to put their edit through, if their edit introduces <ref> (and does not introduce <references/>) to an entry that does not contain <references/>? I sometimes forget, on both en. and de.Wikt, to add <references/> when adding <ref>s. The warning would remind the editors to add the <references/> tag. - -sche (discuss) 18:50, 8 September 2011 (UTC)Reply
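The requested check could look roughly like this in Python. It is a sketch of the condition only, with hypothetical names and regular expressions, and it says nothing about how the actual filter rule is written.

```python
import re

# Matches <ref> and <ref name="x">, but not <references/>.
REF_OPEN = re.compile(r"<ref[\s>]", re.IGNORECASE)
REFERENCES = re.compile(r"<references\s*/?>", re.IGNORECASE)


def needs_references_warning(old_text, new_text):
    """True if the edit adds a <ref> and the result has no <references/>."""
    adds_ref = len(REF_OPEN.findall(new_text)) > len(REF_OPEN.findall(old_text))
    return adds_ref and not REFERENCES.search(new_text)
```

Checking the text after the edit for <references/> covers both cases in the request: the tag was already on the page, or the same edit added it.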

I've created it but have not yet tested it (or checked how expensive it is).—msh210℠ (talk) 21:34, 8 September 2011 (UTC)Reply

It works well, as far as I can tell, and has caught a couple of users. I think we need to update the location of the message, though (either move MediaWiki:Abusefilter-warning/ref-no-references back to MediaWiki:Abusefilter-warning/ref-no-reference or change the link, whichever is easier; at the moment it displays a default message rather than the nicer and more informative custom one). - -sche (discuss) 05:53, 10 September 2011 (UTC)Reply

I've fixed it, I think (not just now).—msh210℠ (talk) 17:30, 11 September 2011 (UTC)Reply

So (to repeat myself) we have a filter rule that catches edits that result in a page whose content matches its title (in the main namespace, and except for whitelisted folks, admins, and bots). Any objection to having that rule block the edit from going through? As of now we've had only about ten hits, but no false positives, and I can't think how there would be any.—msh210℠ (talk) 17:30, 11 September 2011 (UTC)Reply

Done.—msh210℠ (talk) 15:18, 13 September 2011 (UTC)Reply

A lot of anon users seem to create pages that just contain one or more instance of the text "[[File:Example.jpg]]" (perhaps they are accidentally clicking the delayed-loading JavaScript toolbar?). A filter for this might be worthwhile. Equinox ◑ 13:02, 17 September 2011 (UTC)Reply
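A filter for this could test something like the following, sketched here in Python with a hypothetical function name. It flags only pages that consist of nothing but the toolbar's placeholder image link, repeated any number of times.

```python
import re

PLACEHOLDER = re.compile(r"\[\[File:Example\.jpg\]\]")


def is_only_placeholder_images(text):
    """True if the page is one or more [[File:Example.jpg]] and nothing else."""
    if not PLACEHOLDER.search(text):
        return False
    return PLACEHOLDER.sub("", text).strip() == ""
```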

Alternatively, we could push to have the toolbar fixed. ;-) Personally I have it turned off, because it's just too annoying to try to click in the textarea and suddenly have inserted something random. —Ruakh_TALK 14:49, 17 September 2011 (UTC)Reply

I would like there to be a fixed-sized empty space on the page until the toolbar loads and replaces it. Whom do we nag? Equinox ◑ 14:54, 17 September 2011 (UTC)Reply

Lemma entries for Japanese na type adjectives (形容動詞)

I've noticed that the policy for な-type adjectives or keiyodoshi is to include the な as a part of the entry. This is not, as far as I know, standard practice in any Japanese dictionary or even the Japanese Wiktionary.

For example, both 元気 and 元気な are treated as lemma entries. I believe users would be better served to have the 元気な entry read: "Attributive (連体形) form of 元気", and have both noun and adjective lemma entries listed on 元気.

The -な suffix is merely a conjugated form and should be treated as such. The most egregious example, and the one that brought this issue to my attention, is たくさんな. There is a page for the kanji version of this word, 沢山, but there isn't even a link to it from たくさんな; instead there is a broken link to 沢山な. But all of this is beside the point: the real issue is that たくさんな is a much less often used form than either たくさんの or even just たくさん. All of these forms would be better served by the lemma entry たくさん, which I would be happy to write tonight after work, but that doesn't solve the system-wide problem of な-type adjectives being written with the な as part of the lemma.

The only policy on this I can find, Wiktionary:About Japanese#Quasi-adjectives_.28.E5.BD.A2.E5.AE.B9.E5.8B.95.E8.A9.9E.29, is not very clear on the issue. I propose that it be changed to include the ideas I've put forth, but I'm not sure exactly how to do so. Entries would still need to acknowledge that these are な-type adjectives, but this could easily be done in a header or something, right?

Also, perhaps a bot of some sort to change all of the entries made in the way I clearly find so offensive. *^_^*

MichaelLau 19:04, 9 September 2011 (UTC)Reply

Hello Michael, thanks for chiming in --

Those of us dealing with Japanese here on the English Wiktionary have been chewing on some of these issues recently, cf. WT:BP#Preferred forms for Japanese lemmata, WT:BP#WT:About_Japanese, and a number of posts starting at Wiktionary_talk:About_Japanese#Lemma forms for keiyōdōshi and continuing further down that page. The emerging consensus is largely in line with what you describe. I'd really appreciate it if you could have a look at the other posts I've linked to here to get up to speed with what has already been discussed of late, and then it'd be great if you'd add to the discussion over at Wiktionary_talk:About_Japanese#Work_Needed. -- Cheers, Eiríkr Útlendi | Tala við mig 20:00, 9 September 2011 (UTC)Reply

Template:ja-kanji

I'd like to update this template to handle shinjitai / kyūjitai, much as the Japanese POS templates already do (see {{ja-noun}}, {{ja-adj}}, {{ja-verb}}, etc.).

Some kanji don't get used as words on their own, and thus the individual kanji entry won't have anywhere graceful to put shinjitai / kyūjitai information. It would seem most appropriate for that information to go in the {{ja-kanji}} template itself, rather than (or possibly as well as - removing would take work) in the POS templates.

Are there any admins who could either implement this change, or change the protection level of {{ja-kanji}} to allow me to do so? -- Eiríkr Útlendi | Tala við mig 20:40, 9 September 2011 (UTC)Reply

Looked at this again and realized I can indeed edit the template, so I did. I'll update the template documentation later to account for the new args. -- Eiríkr Útlendi | Tala við mig 21:51, 20 September 2011 (UTC)Reply

Classical/Literary Chinese entries

Is there a correct way to add a definition of a Classical or Literary Chinese word? I've seen information about noting an etymology, but I'm not talking about an etymology for a modern word, I'm talking about defining a word as used in Classical Chinese texts. Such an entry might have the same meaning in modern Chinese, or might have a different meaning, or might not be used at all any more. I've looked for a list of "official" wiktionary languages, and found the "random entry" list. It has Old Chinese, Middle Chinese, and Late Middle Chinese, which are names of reconstructed languages (mostly phonology) from different periods. Those were spoken languages, and Classical Chinese was the most common written language used during all of those periods. How about Early Vernacular Chinese, for example words used in the novels 红楼梦 or 金瓶梅? There are entire dictionaries devoted to this language, but is the distinction appropriate on wiktionary? If so, can I just enter Early Vernacular Chinese as the language? Craig Baker 06:26, 11 September 2011 (UTC)Reply

Such distinctions can be a bit arbitrary, I edit Old French, Middle French and French so I'm familiar with the issue. Important note one, please don't remove Mandarin headers. Mandarin is standard here, it's also a widely accepted language name. Like you say, depending on date it could be Old Chinese, Middle Chinese, and Late Middle Chinese. There's no reason not to create an ad hoc code for Classical Chinese if editors want it. But only if editors want it. For example we have 'ad hoc' codes {{roa-jer}} for Jèrriais and {{roa-leo}} for Leonese. Mglovesfun (talk) 13:38, 11 September 2011 (UTC)Reply

There is already a code {{lzh}} for Literary Chinese. —CodeCat 13:53, 11 September 2011 (UTC)Reply

Right then, in which case definitely don't replace Mandarin with Literary Chinese, as Mandarin is a language. We don't replace English with Middle English, we include both when the word/term is used in both languages. Mglovesfun (talk) 14:00, 11 September 2011 (UTC)Reply

Please see here for your references. Engirst 15:16, 11 September 2011 (UTC)Reply

Regarding "replacing" Mandarin, what about in entries I've added where the words are not found in Mandarin? Or, where I have no evidence that the word is found in Mandarin? Is there an expectation that when I add a Literary Chinese definition, I will also research whether the word is found in Mandarin? Craig Baker 15:52, 11 September 2011 (UTC)Reply

Could we enter Classical Chinese like this? Engirst 16:25, 11 September 2011 (UTC)Reply

The transliteration in this entry is based on Mandarin, the way Classical Chinese is taught in China. There really can't be another way, as they teach the words, grammar, sentence structure but not the pronunciation. So, in short it's not 100% accurate. --Anatoli 00:09, 12 September 2011 (UTC)Reply

Speedy deletion is only for patently wrong entries. Unlike Wikipedia, deleting one language section of an entry with more than one language section would be considered a speedy deletion, about equivalent to blanking a whole Wikipedia entry. You should likely be going to WT:RFV with these, though unless there's a pretty robust answer to the question 'what's the difference between Classical Chinese and Mandarin?' then a lot of these debates will be a waste of time. Mglovesfun (talk) 19:41, 11 September 2011 (UTC)Reply

From what I understand from Wikipedia, Literary Chinese is an obsolete writing standard based on the Middle Chinese spoken language that was used up till the early 20th century. It would be comparable to Ottoman Turkish, but was in use a lot longer. —CodeCat 21:45, 11 September 2011 (UTC)Reply

Mglovesfun, just to be clear, my change of two entries to "Classical Chinese" which you reverted were new entries which I added earlier that day, and initially categorized them as Mandarin because I didn't know that Literary Chinese was an option. I otherwise wouldn't have considered changing the language of an existing entry, which is what I assume you mean by "speedy deletion". What I'm more curious about is new entries for which I can provide Classical Chinese definitions, but don't have any information about Mandarin or other modern varieties. Craig Baker 03:26, 12 September 2011 (UTC)Reply

I don't think we have contributors in Classical Chinese, and in my opinion we don't need to split Mandarin and Classical Chinese if a specific pronunciation for a specific period is not chosen. Also, given the way Classical Chinese is used in Modern Mandarin, Cantonese, etc., the words can be classified as simply Mandarin, Cantonese, etc. with some {{qualifier}}. The reason is that they are borrowed into modern Chinese varieties, adjusted to the appropriate pronunciation, and used in quotes quite often. The few words that are NEVER or SELDOM used in modern languages, like classical pronouns and prepositions, have a modern usage and a modern pronunciation anyway, e.g. 伊 (yī), 其 (qí), 之 (zhī), etc. Numerous Mandarin chengyu are an example of how Classical Chinese is used in modern Mandarin. To understand their meaning, some knowledge of Classical Chinese grammar and vocabulary is required, but I don't think their components should have a separate entry as Classical Chinese. In any case, hanzi are such a complicated component, hard to classify as a part of speech: they often convey a meaning and only in combination become nouns, verbs, etc. --Anatoli 00:09, 12 September 2011 (UTC)Reply

The {{qualifier}} idea sounds ok to me. As long as there is a way to note that they are Classical Chinese words, the information will not be lost, and it will be possible to use the dictionary when reading Classical Chinese texts, for example. I'm curious why choosing a pronunciation is related to splitting the languages; pronunciations are not necessary to write a dictionary, though maybe some technical limitation of Wiktionary requires it? To me, the written form seems most important in a language like Classical Chinese where the pronunciation was not really even recorded, although I do think reconstructions can be interesting and useful in some ways. I agree that a good number of Classical Chinese words are Mandarin words too, but in general I don't agree with your example of chengyu; in most cases I think a chengyu should be considered a single word in Mandarin (etc.), but just an ordinary phrase or sentence in Classical Chinese. In such chengyu, what used to be Classical Chinese words are no longer free to act like words in Mandarin sentences, and the meaning of the chengyu has fossilized and often shifted. In the terms used on the "Criteria for inclusion" page, the chengyu is idiomatic in Mandarin, but not in Classical Chinese; and the words it is composed of are not attested in Mandarin outside of that chengyu. In the end, I suppose the "language status" is not very important to me, as long as the two can be separated in some way by the reader or perhaps by an automatic script for the reader's use, so that the dictionary is useful for reading both Classical Chinese and Mandarin texts. Craig Baker 03:26, 12 September 2011 (UTC)

I only said that chengyu in Modern Mandarin demonstrate the grammar and syntax of Classical Chinese; I didn't say that one can use their components as they were used then.

As a dictionary, Wiktionary deals less with stylistics and syntax, and it would be really hard to define each hanzi for both modern Mandarin and Classical Chinese. 文言文 (Wényánwén) (Classical Chinese), unlike 白話／白话 (Báihuà) (Vernacular Chinese), was almost 100% monosyllabic, each word consisting of only one hanzi, and defining the classical sense and usage of hanzi would require major work on these entries. At the moment, most definitions for hanzi are under the Han character heading. The specific CJKV language sections mainly deal with the READINGS of those characters. --Anatoli 03:47, 12 September 2011 (UTC)

I see your point about definitions for single characters currently being under the "Translingual" section. Of course it would require major work, but it's hard for the work to even begin without a language category, or to attract anyone capable of doing the work. I notice that many (most?) single-character entries already have definitions in the Japanese section, as well as etymologies (while the Translingual section has just a character etymology, not a word etymology). I would assume that the eventual goal is for definitions to be provided in the other languages/dialects too, so that we have information about how the word is used in those languages (or how it is not used—one of the most difficult things about reading Classical Chinese with a dictionary that includes both modern and ancient definitions is filtering out the modern definitions). I will continue reading around the Community Portal to try to understand the plan for this. Perhaps it would also help to note that there are already many large, good dictionaries devoted to just Classical Chinese, so they are definitely useful. Craig Baker 03:08, 14 September 2011 (UTC)Reply

Sorry to have not seen this sooner. I have created thousands of Classical Chinese words on Wiktionary over the last several years. I have created a number of translations at Wikisource that link words back to Wiktionary definitions. My long-term project is s:Romance of the Three Kingdoms. So far, the format has been largely decided by me, since I haven't come across anyone knowledgeable in the subject who wanted to contribute entries. My approach has been to view the problem through the lens of Mandarin. I'm not suggesting that this is the ideal approach, merely the most practical. Since Classical Chinese can be read in modern Mandarin, it made sense to create Mandarin entries that used either the {{literary}} or {{archaic}} labels. The {{obsolete}} label might be another potential option, although I haven't used it all that much. These context labels recently underwent a minor change. They now put the words into categories called: Category:Mandarin archaic terms in traditional script, Category:Mandarin archaic terms in simplified script, Category:Mandarin literary terms in traditional script and Category:Mandarin literary terms in simplified script. These categories should gradually replace Category:zh-tw:Archaic, Category:zh-cn:Archaic, Category:zh-tw:Literary and Category:zh-cn:Literary as well as Category:Traditional Chinese archaic terms, Category:Simplified Chinese archaic terms, Category:Traditional Chinese literary terms and Category:Simplified Chinese literary terms. See 飲酒 and 征東將軍 for some typical examples of how I format entries. Also, I used 字 as a model of how we could do it if time and people were not limitations. Thanks. -- A-cai 01:37, 29 September 2011 (UTC)

P.S. Other pieces that I've done in this way on wikisource: s:Departing from Baidi in the Morning, s:Preface to the Poems Composed at the Orchid Pavilion, s:Song of Everlasting Regret, s:The Peach Blossom Spring and s:Touring Shanxi Village -- A-cai 01:43, 29 September 2011 (UTC)Reply

adding-translation script

Discussion on de.Wikt: de:Wiktionary:Teestube#.C3.9Cbersetzung-Hinzuf.C3.BCgen-Skript.

Hello English Wiktionary-Users,

in the German Wiktionary we would like to add the function that allows people to add translations without manually editing the code section. Could anyone explain how to do it? That would be great. Thanks in advance! Kampy 08:11, 11 September 2011 (UTC)Reply

The German Wiktionary seems to structure its translations sections completely differently from the English Wiktionary: translations show which senses they correspond to by having numbers next to them, rather than the translations for each sense going in a separate box, so simply copying the code wouldn't really work. --Yair rand 20:17, 11 September 2011 (UTC)

The code would have to be modified, yes, but that shouldn't be too complex a task. One option: the de.Wikt programmers could code another box between the ISO box and the translation box, which would take the sense number(s) as input. Is all of the code that operates the function contained in User:Conrad.Irwin/editor.js? - -sche (discuss) 05:48, 12 September 2011 (UTC)Reply

No, it also uses the newNode function in MediaWiki:Common.js#Dom_creation. Another issue is that the script seems to make use of the translation table glosses, which dewikt doesn't have, for locating tables. (Not completely sure about that.) --Yair rand 06:08, 12 September 2011 (UTC)Reply

I think if we (on de.Wikt) changed

(values.qual? '{'+'{qualifier|' + values.qual + '}} ' : '') +

to

(values.qual? '[' + values.qual + '] ' : '') +

then we could use the code as-is (apart from the problem of glosses, which we could add), with users adding the sense-numbers (1, 1–2) in the "qualifier" field. - -sche (discuss) 01:06, 13 September 2011 (UTC)
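The proposed change can be pictured with a small sketch. Only the two expressions quoted above come from the script; the helper name `formatQualifier` and its `style` parameter are hypothetical, introduced here purely for illustration.

```javascript
// Hypothetical helper wrapping the two expressions quoted above.
// `values.qual` holds whatever the user typed into the "qualifier" field.
function formatQualifier(values, style) {
  if (!values.qual) {
    return ''; // nothing entered: emit nothing
  }
  if (style === 'dewikt') {
    // Proposed de.Wikt variant: plain brackets, e.g. for sense numbers.
    return '[' + values.qual + '] ';
  }
  // en.Wikt variant; the split opening braces are kept as in the original.
  return '{' + '{qualifier|' + values.qual + '}} ';
}
```

With the bracket variant, typing `1–2` into the field would yield `[1–2] ` in front of the translation, which is roughly how de.Wikt marks sense numbers.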

About the numbers, I think it shouldn't be too much of a problem. We don't use headlines that repeat the definition; instead we use those numbers. So there will only be one box at all times. Anything added to this box just needs an additional input box for the number it relates to. Can anyone code this? Kampy 00:05, 14 September 2011 (UTC)

There will be more than one box (and there will be no numbers) once the translations are (per the vote) separated by sense, though...

I have copied the code to my de:Benutzer:-sche/common.js, and have copied the en.Wikt and de.Wikt translation tables into subpages of my userspace for testing (de:Benutzer:-sche/sw4), but even with classes and gloss-support added to the German translation tables (de:Benutzer:-sche/sw1c), I haven't got it to work yet. - -sche (discuss) 01:01, 14 September 2011 (UTC)Reply

Oh, I didn't know a vote was going on. I agree that the English version is more practical. I will support a change. Kampy 10:57, 14 September 2011 (UTC)Reply

Is it possible that the code in my de.Wikt .js isn't considering itself enabled, Yair rand? - -sche (discuss) 01:08, 14 September 2011 (UTC)Reply

[[de:Benutzer:-sche/common.js]] has some syntax errors that will cause browsers to stop processing it. [[de:Benutzer:Ruakh/common.js]] fixes the most severe errors — you can take it as a starting point for further debugging — but it still doesn't create the form for adding translations, so there's still something wrong. :-/ —Ruakh_TALK 02:30, 14 September 2011 (UTC)Reply

Thank you for catching that! I'm guessing that (among other things) I should add \ to all of the other instances of sche/, like you did to sche/sw1c (ie sche/sw2b ⇒ sche\/sw2b etc), yes? Or not to all of them? - -sche (discuss) 03:05, 14 September 2011 (UTC)Reply

Doesn't make a difference, it's only necessary inside the regexps, not simple strings (/.../, not "..."). --Yair rand 03:22, 14 September 2011 (UTC)Reply
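A minimal sketch of the distinction being drawn here, using the subpage names from the discussion as sample data (the variable names are illustrative only):

```javascript
// Inside a regex literal, "/" would end the pattern, so it must be escaped.
const asRegex = /sche\/sw1c/;

// Inside a plain string, "/" is an ordinary character, and "\/" is
// simply read as "/", so the backslash changes nothing.
const asString = "sche/sw2b";
const escapedString = "sche\/sw2b";

const regexMatches = asRegex.test("Benutzer:-sche/sw1c");
const stringsEqual = asString === escapedString;
```

Both `regexMatches` and `stringsEqual` come out `true`, which is why adding the backslash to the string instances is harmless but unnecessary.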

It seems that makes the script work! The "±" sign displays atop the gloss, but that's a relatively minor problem. - -sche (discuss) 03:09, 14 September 2011 (UTC)Reply

That's because the dewikt tables don't have the show/hide button as the first node in the NavHead, and the script places the "±" after the first node. Can be fixed by replacing insertDiv.insertBefore(edit_button, insertDiv.firstChild.nextSibling); with insertDiv.insertBefore(edit_button, insertDiv.firstChild);, so that it's placed before the first node.--Yair rand 03:22, 14 September 2011 (UTC)Reply

Other issues: The dewikt language templates leave parserfunction residue when substed. This could be fixed by modifying the language templates to have {{{|safesubst:}}} before the #if: ({{ {{{|safesubst:}}}#if:{{{nolink|}}}|Französisch|[[Französisch]]}}). Also, the use of {{t}} needs to be replaced with whatever template dewikt uses. --Yair rand 03:32, 14 September 2011 (UTC)Reply

Thanks; that change puts the "±" in the right place! :) I'm working on replacing {{t}} with the de.Wikt counterpart {{Ü}}. I am also considering that certain functions, like "Page name:", may not be applicable to de.Wikt. (In fact, I replaced the code to input qualifiers with code to input sense numbers; it may be that I should undo that and instead use the AFAICT-unneeded-on-de.Wikt pagename-with-diacritics code as the vessel for adding sense numbers.) (No, that wouldn't work at all.) - -sche (discuss) 04:01, 14 September 2011 (UTC)Reply

Re safesubst: actually, the necessary change isn't to the templates (although having templates that subst safely is probably a good idea); the necessary change is to the code: we don't use "Französisch" in translation tables on de.Wikt, we use {{fr}}. - -sche (discuss) 04:35, 14 September 2011 (UTC)

I've changed the code so that it does not subst language codes. However, changing {{t}} to {{Ü}} caused the function to display "Could not find translation entry for 'pt:worde'. Please reformat" when I tried to add worde (with ISO code pt given) to a section containing other translations. However, it added correctly to an otherwise empty section. I thought residual "{{t"s or a "{{Üxx" I added might be confusing the script's sorting mechanism, but it was also confused by this version of the page. (That version also shows that I/we/de.Wikt-programmers need to change how/where gender information is added.) - -sche (discuss) 05:03, 14 September 2011 (UTC)Reply

The function getEditFunction might be the problem. It's built to look through the translation table wikitext for the translation to insert the new translation before (I think), but it's searching by first looking for * [[langname]]:, then for * langname:, and then for {{subst:langcode}}:, in case it's a newly added translation, but dewikt doesn't format translations like any of these. --Yair rand 21:21, 14 September 2011 (UTC)Reply
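The lookup order described above might be sketched roughly as follows. This is not the real getEditFunction; `findLanguageLine` is a hypothetical stand-in, and the de.Wikt line format used in the usage note is an assumption based on the `{{fr}}` and `{{Ü}}` templates mentioned in this thread.

```javascript
// Hypothetical sketch: try the three line formats described above,
// in order, and report the index where the first match starts.
function findLanguageLine(wikitext, langName, langCode) {
  const candidates = [
    '* [[' + langName + ']]:',
    '* ' + langName + ':',
    '{' + '{subst:' + langCode + '}}:',
  ];
  for (const candidate of candidates) {
    const index = wikitext.indexOf(candidate);
    if (index !== -1) {
      return index;
    }
  }
  return -1; // none of the known formats is present
}
```

Under these assumptions, `findLanguageLine('* [[French]]: {{t|fr|chat}}', 'French', 'fr')` finds the line, whereas a de.Wikt-style line such as `*{{fr}}: {{Ü|fr|chat}}` matches none of the three patterns, which would be consistent with the "Could not find translation entry" message.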

You're right; removing subst: (so that it only looks for the language code) makes it work. Now to remove cruft... - -sche (discuss) 21:55, 14 September 2011 (UTC)Reply

I have adapted the code to work with de.Wikt's Ü-templates. It even nests nb and nn correctly, when neither no nor nb nor nn is already in the table. However, de:Benutzer:Yoursmile gave me feedback that the adder appears but doesn't work (Could not find translation table for 'fr:reg'. Glosses should be unique) when other scripts are around, e.g. de:Benutzer:Yair rand/TabbedLanguages.js. I thought that might be because DOM-node code is redundantly in both codes, but when I separated the DOM-node and translations codes, and imported both, I found that the trans-adder no longer appeared. (If it had appeared, I would have imported only the trans-code and TabbedLanguages, to see if the absence of redundant DOM-code allowed the two to work together.) Any idea why splitting the two seemingly discrete scripts causes them to cease functioning (i.e. causes the trans-adder to cease appearing)? Any idea why the trans-adder appears but does not work when TabbedLanguages are around?

Separate issue: any idea what I did wrong when I tried to remove the "script" bit? That edit caused the adder to cease appearing. That edit also removed a bit of "gender"-code, but that wasn't problematic; I successfully removed it later. I rendered the "script" bit harmless (we don't use script templates on de.Wikt) by causing it to input nothing and removing the interface, but that leaves a lot of cruft. - -sche (discuss) 23:56, 17 September 2011 (UTC)Reply

The top four lines of that edit are actually removing part of something completely unrelated to the "script" bit, but that part isn't what actually broke it. The edit contained an extra comma (ota:{,wsc:"ota-Arab"}) which caused a syntax error. --Yair rand 07:32, 19 September 2011 (UTC)Reply
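A stray leading comma inside an object literal is a parse error, so the whole script fails to load rather than just the affected entry. The sketch below checks this; the helper `parses` is hypothetical, and the "fixed" literal is only an assumed correction, since the thread does not show the repaired line.

```javascript
// Returns true if `source` parses as a JavaScript expression.
function parses(source) {
  try {
    new Function('return (' + source + ');'); // compiled, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err; // anything else is unexpected
  }
}

// The fragment quoted above, with its extra comma.
const broken = parses('{ota:{,wsc:"ota-Arab"}}');
// Assumed correction: the same fragment without the comma.
const fixed = parses('{ota:{wsc:"ota-Arab"}}');
```

Here `broken` is `false` and `fixed` is `true`.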

Thank you! I have got the script to work with Tabbed Languages; with your syntax-fix, now the unneeded "script" part has been successfully removed. I tested Tabbed Languages and the Trans-Adder together in the main namespace on de:Katze. I moved the code to de:Benutzer:-sche/uebersetzung.js, if anyone wants to see for themselves (remember that at the moment it is still oriented to {{Benutzer:-sche/sw1c}} and therefore only works on test pages or modified pages). The only issue I note now is that it wouldn't add more than one translation without me refreshing or navigating away from and back to the page; I wondered if I just didn't wait long enough (de:Katze had a lot of translations for it to sort through), but it displayed the same behaviour on my simple test page. (A minor problem I remind myself to fix is the unneeded space between * {{langcode}}.) de.Wikt will have to adopt glosses for this to work. - -sche (discuss) 09:27, 19 September 2011 (UTC)Reply

Of note: the code works differently in different browsers and in the main vs the user namespace. (Those interested can temporarily restore this version of Katze and try using the code on it.) - -sche (discuss) 22:54, 19 September 2011 (UTC)Reply

Wiktionary talk:Etymology#Where to put etymologies

See Wiktionary talk:Etymology#Where to put etymologies --MaEr 10:56, 11 September 2011 (UTC)Reply

Correlative conjunctions

How should the entries for correlative conjunctions look? Both...and, neither...nor, both-and, neither-nor or something else? I'm not asking just about English, but about other languages too. Arath 14:51, 12 September 2011 (UTC)Reply

Making an arse of it ... ?

Interloper from Wikipedia here. While testing some new back-end scripts, I ran across template:a bum - a nonexistent template that is linked from many Wiktionary entries. Not sure if it's vandalism, preparation for some wide-scale future vandalism, or just a typo (album?) somewhere deep that invites same. I had a crack at finding the source, butt alas got nowhere. - Topbanana 17:24, 12 September 2011 (UTC)

Thanks! It's from {{t|ar|[anything]}} (also t+ and t-), but I can't seem to track it down farther. Incidentally, you spelled alass wrong.—msh210℠ (talk) 17:47, 12 September 2011 (UTC)

[e/c] Got it. The culprit was template:ar/script, which had been vandalized. Thanks again.—msh210℠ (talk) 17:55, 12 September 2011 (UTC)Reply

(After an edit conflict) I dug through the histories of two of the entries, multiculturalism and dictionary. It seems to have been added to multiculturalism in this edit, and to dictionary in this edit. In other words, its transclusion seems to be caused by the removal of "sc=" from uses of {{t}}. - -sche (discuss) 17:54, 12 September 2011 (UTC)Reply

I've protected template:ar/script; can some admin who knows how to write such a bot please flood-protect the [langcode]/script templates as highly visible?—msh210℠ (talk) 17:57, 12 September 2011 (UTC)Reply

highly visible templates that may need protection

User:Topbanana has generated a list of highly transcluded templates that lack full protection. Some can even be edited by non-autoconfirmed users. We may want to protect some of these. The list was at [[User:Topbanana/Template_protection]], which I've deleted so as not to advertise the list to would-be vandals (cf. w:wp:BEANS), and admins can find it now at [[Special:Undelete/User:Topbanana/Template_protection]].—msh210℠ (talk) 15:16, 13 September 2011 (UTC)Reply

Template:given name - sorting missing

Would an admin be kind enough to fix {{given name}} to allow sorting? This template currently fails to properly categorize Japanese given names, for instance. The code to tweak (not all the code, just a snippet):

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}]]

Change to:

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}|{{{sort|{{{skey|{{PAGENAME}}}}}}}}]]
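The nested default in the added sort key, `{{{sort|{{{skey|{{PAGENAME}}}}}}}}`, reads as a fallback chain. A rough JavaScript analogue, with a hypothetical `resolveSortKey` helper, might look like this:

```javascript
// Mirrors {{{sort|{{{skey|{{PAGENAME}}}}}}}}: use `sort` if it was passed,
// else `skey`, else the page name. A wikitext default applies only when
// the parameter is absent, which `??` approximates (an empty string is kept).
function resolveSortKey(params, pageName) {
  return params.sort ?? params.skey ?? pageName;
}
```

For example, `resolveSortKey({ sort: 'えみ' }, '恵美')` gives `'えみ'`, while an entry that passes neither parameter falls back to its own page name.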

As a minor side note, {{given name}} and {{surname}} are a bit inconsistent, in that {{surname}} includes a period at the end, and {{given name}} does not. Immaterial really, but it'd look a bit more put together if these two were in agreement. -- TIA, Eiríkr Útlendi | Tala við mig 16:08, 13 September 2011 (UTC)Reply

I'll change protection so you can do it yourself. Then I'll protect it again. SemperBlotto 16:12, 13 September 2011 (UTC)Reply
Brilliant, thank you SemperBlotto, done and sorted! (Pardon the pun.) -- Eiríkr Útlendi | Tala við mig 16:20, 13 September 2011 (UTC)Reply
Could somebody fix {{surname}} to allow sorting too? Thanks Haplogy 16:14, 21 September 2011 (UTC)Reply
Thank you Haplogy for pointing that out, and thank you Mglovesfun for changing the protection settings on the template. Done and sorted. -- Eiríkr Útlendi | Tala við mig 16:51, 21 September 2011 (UTC)Reply

Of course, now there's the bother of going through Category:Japanese surnames and making sure that all the romaji entries are categorized alphabetically. In most cases, this just means deleting the Category:Japanese surnames line containing the mistaken hiragana sort key; the cat line is redundant anyway since the {{surname}} template already supplies the category.

For that matter, I don't suppose anyone has a bot handy for this? It would just need to go through these entries and delete the Category:Japanese surnames portion. The entry set is defined as:
Included in Category:Japanese surnames

Lemma is in romaji (ASCII + vowels with macrons)

Entry includes {{surname}}

-- Cheers, Eiríkr Útlendi | Tala við mig 17:07, 21 September 2011 (UTC)Reply

Question about cats

Cleaning up some Japanese given name entries, I've stumbled on a bit of a puzzle. The name 恵美 (Emi) can also be the name 恵美 (Megumi). Using the {{given name}} template and the {{{sort}}} argument for each reading on the page, I'd expect the entry to show up in Category:Japanese female given names under both え for Emi and め for Megumi -- but it only shows up under め. I'm guessing that the cat applied by the second {{given name}} call is overriding the first.

So the $60,000 question is, is there any way for a single entry that belongs in a single category to show up under two different indices in that same category? If not, am I right in guessing that the index to use is the first one alphabetically (or in this case, hiraganically)? -- Eiríkr Útlendi | Tala við mig 16:52, 13 September 2011 (UTC)Reply

You may have to create a redirect using a zero-width non-joiner (‌), as discussed here, and sort the main entry into one category and the redirect into another ... if it is possible to use sort in redirects. - -sche (discuss) 20:57, 13 September 2011 (UTC)Reply

Interesting. What a wonderfully ugly kludge. Well, so long as it works. That might just be what I do if no one has a better idea. :) -- Thank you, Eiríkr Útlendi | Tala við mig 21:12, 13 September 2011 (UTC)

Well, I just went ahead and created the page 恵‌美 and put the second cat index there, and it works -- the entry 恵美 is now properly indexed under both えみ (Emi) and めぐみ (Megumi). Thank you, -sche! -- Eiríkr Útlendi | Tala við mig 15:52, 15 September 2011 (UTC)Reply
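Why the kludge works can be seen by comparing the two titles as strings: they look identical on screen but are different sequences of characters, so they count as different pages. A small sketch (variable names are illustrative only):

```javascript
const ZWNJ = '\u200C'; // zero-width non-joiner, invisible when rendered

const plainTitle = '恵美';
const kludgeTitle = '恵' + ZWNJ + '美';

// Visually the same, but not the same string.
const sameTitle = plainTitle === kludgeTitle;
const lengths = [plainTitle.length, kludgeTitle.length];
```

Here `sameTitle` is `false` and `lengths` is `[2, 3]`, so each page can carry its own sort key in the same category.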

Pokemon get their own ja-noun template?

I'm going through Category:Japanese_nouns to clean up after realizing I'd left the indexing argument hidx out of a lot of entries I'd been working on, and I discovered that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック still get categorized under katakana even after I added the hidx arg to the POS template. Then I realized that these entries get their own POS template: not the usual {{ja-noun}} template, but {{ja-noun/pokemon}} instead. Is this kosher? It seems awfully dodgy to me. If we're keeping this template, shouldn't we at least align its formatting with {{ja-noun}}? -- Bemused, Eiríkr Útlendi | Tala við mig 23:30, 14 September 2011 (UTC)Reply

Those entries aren't actually supposed to exist anymore. They're only still there because nobody bothered to merge them into the list yet. --Yair rand 00:02, 15 September 2011 (UTC)Reply

Okay. Merge them into what list, though? RFD? -- Eiríkr Útlendi | Tala við mig 06:56, 15 September 2011 (UTC)Reply

There was a vote on the issue, so you don't need an RFD. --Mglovesfun (talk) 08:56, 15 September 2011 (UTC)Reply

Is there anything I can / should do to help deal with these entries? My editor instincts for having things sorted out are getting itchy. :) -- Eiríkr Útlendi | Tala við mig 15:38, 15 September 2011 (UTC)Reply

The community decided these entries for fictional things should be in lists. Some appropriate lists are Appendix:DC Comics (Bat-Signal, Kryptonian...) and Appendix:The Legend of Zelda (Hylian, Ocarina of Time...).

The coverage of Pokémon is still a bloody mess; the work of making these pages with lists is half-done. You can help by creating the lists, if you are interested. Everything you need is here: Special:PrefixIndex/Appendix:Pokémon. --Daniel 17:53, 15 September 2011 (UTC)Reply

Thanks, Daniel. Looking at that list, I notice that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are the only two katakana entries on that list, and both already have their romanized versions of Appendix:Pokémon/Arbo and Appendix:Pokémon/Arbok. Am I right in guessing that all other katakana Pokémon entries have been removed and/or converted to the official romanizations? If so, can we delete the Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック entries? -- Eiríkr Útlendi | Tala við mig 21:24, 15 September 2011 (UTC)Reply

No, nobody had the idea of deleting the katakana entries and leaving only the official romanizations.

Ideally, Wiktionary can have a complete glossary of all 649 names of Pokémon in English, the 649 names in katakana, the 649 official romanized names and even the 649 romanizations from katakana. I think that would simply be two big appendices: one of English and one of Japanese; but I may be wrong. If these lists are created, then someone searching for, say, Beedrill, will at least be able to know it is the name of a Pokémon species.

We have few Pokémon listed simply because nobody bothered to make the full lists. --Daniel 01:05, 16 September 2011 (UTC)Reply

This is straying more into Grease Pit territory, but if we're going to keep the Pokémon appendix entries in all valid scripts, how do we make sure they get indexed properly? Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are currently indexed as Japanese nouns under ア, when they should be indexed under あ. I tried adding the hidx arg used in the normal {{ja-noun}} template, but {{ja-noun/pokemon}} doesn't implement any explicit sorting.
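For illustration only, a hiragana sort key can be derived from a katakana title by shifting code points. This is a sketch of the general technique, not of how hidx or {{ja-noun}} is actually implemented; the function name is hypothetical.

```javascript
// Convert katakana to hiragana so that ア sorts under あ.
function katakanaToHiragana(text) {
  let result = '';
  for (const ch of text) {
    const code = ch.codePointAt(0);
    // U+30A1..U+30F6 (ァ..ヶ) sit exactly 0x60 above U+3041..U+3096 (ぁ..ゖ).
    if (code >= 0x30a1 && code <= 0x30f6) {
      result += String.fromCodePoint(code - 0x60);
    } else {
      result += ch; // e.g. the long-vowel mark ー (U+30FC) is left alone
    }
  }
  return result;
}
```

Applied to the two titles in question, `katakanaToHiragana('アーボ')` gives `'あーぼ'` and `katakanaToHiragana('アーボック')` gives `'あーぼっく'`.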

Plus, the formatting of {{ja-noun/pokemon}} is a bit jarring in its differences from {{ja-noun}}. My gut instinct is to unify these, but I'm not sure what the designer of {{ja-noun/pokemon}} intended, or what anyone else thinks. Any advice? -- Eiríkr Útlendi | Tala við mig 01:19, 16 September 2011 (UTC)Reply

Appendix:DC Comics and Appendix:The Legend of Zelda don't use anything like {{ja-noun}} or {{ja-noun/pokemon}}. These lists don't have headword-lines like entries. If the appendices of Pokémon follow suit, then {{ja-noun/pokemon}} should just get deleted.

Indexing should be simple. Appendix:DC Comics is indexed under "D", because "DC Comics" starts with that letter. Appendix:Pokémon/Species would be sorted under "P".

However, the appendices should not stay in Category:Japanese nouns, because there they would be regarded as clutter. That category is meant for the Japanese nouns of entries rather than those of appendices. --Daniel 02:33, 16 September 2011 (UTC)

Thanks, Daniel. It's probably easy enough to remove the Pokémon entries from Category:Japanese nouns just by editing {{ja-noun/pokemon}} for now. Some sort of header template is probably a good idea, in order to include info like official Japanese name and romanization thereof (which sometimes differs markedly from the official name in the Latin alphabet), so I'm not sure we should just delete it -- maybe tweak it to categorize things properly, and maybe move it? I'll at least make a few minor changes tonight. -- Eiríkr Útlendi | Tala við mig 05:21, 16 September 2011 (UTC)Reply

About that - the list of Pokémon over at Special:PrefixIndex/Appendix:Pokémon are all actual term entries, and are mostly the official Latin alphabet names, which are included in Category:English nouns. By the same token, all (only two now) katakana Pokémon entries are listed under Category:Japanese nouns. ???
Did you mean that this appendix of Pokémon species be turned into a straight-text list like Appendix:DC Comics and Appendix:The Legend of Zelda, and that we should delete the individual Pokémon term entries? -- Eiríkr Útlendi | Tala við mig 05:37, 16 September 2011 (UTC)Reply

Yes, the consensus is clearly to delete individual pages like "Appendix:Pokémon/Super Potion" in favor of having big lists like, possibly, "Appendix:Pokémon/Items" or "Appendix:Pokémon/Objects".

I have now edited {{ja-noun/pokemon}} so that it no longer categorizes appendices into Category:Japanese nouns.

Actually, when the appendices of English terms of Pokémon were created, they were using a version of {{en-noun}} that did not categorize them into Category:English nouns; but now it categorizes. Naturally, if all the appendices that use {{en-noun}} are replaced by lists, the miscategorization will stop.

I created Appendix:Pokémon/Species with a basic format for organizing headwords, definitions and translations together. One small problem of this system is that the translations are all redlinks to entries. This should be fixed eventually. --Daniel 06:12, 16 September 2011 (UTC)Reply

Adjectives in the translation sections of nouns

User:DCDuring and I have been discussing how to handle adjective translations of English attributive nouns. Many languages use adjectives where English uses nouns attributively. For example, a "cork" (n.) = a "пробка" (n.), but "cork (attr. n.) insulation (attr. n.) material (n.)" = "пробковый (adj., cork) изоляционный (adj., insulation) материал (n., material)". As I see it, there are several ways we can handle this:

1. List the adjectives in the nouns' translations tables, roughly like in 'cork'. This has long been en.Wikt's general practice — to include in translation sections, where appropriate, words not the same part of speech as the word they translate. The translations section of the adjective abroad contains the German preposition + article + noun im Ausland; German routinely uses nouns in compounds to express things English expresses with adjectives, so adjectives like racial routinely contain nouns like Rassen-.

2. List the adjectives in separate tables in the translation sections of the nouns, like in 'brass'. This was Matthias Buchmeier's suggestion. It has the advantage of distinguishing those many translations, which are systematically different; it has the disadvantage of inviting confusion or duplication of languages which do not use other parts of speech but rather use nouns like English does.

3. Have separate sense lines for attributive nouns, and list the adjectives in the separate tables of those sense lines, like in 'spruce'. This has the advantage of establishing great clarity. It has the disadvantage of inviting duplication of translations from languages which use nouns attributively like English does, and it is duplication in the English section, or at least overspecificity. It would represent a significant change to our current policy and practice. Multiple sense lines would be required, or the attributive sense lines would hold multiple definitions: "cat paw" is a paw "of a cat", "cat booties" are booties "for a cat", a "cat addiction" is an addiction "to (having) cats"...

4. Include adjective POS sections for attributive nouns. This would represent a significant change to our current policy and (in most cases, eg insulation) our current practice. It would create some clarity (words which were used in some of the same ways as adjectives would have adjective POS sections), but would also create some unclarity or would mislead (words would be called adjectives, though they could not be graded or used alone after 'become' or in other key ways adjectives can be used). It would also have the disadvantage that users who understood that "cork insulation material" was [attributive noun + attributive noun + noun] would only find the translations of the attributive noun "cork" in the adjective section.

5. Allow foreign-language entries to host translation sections, as we do on de.Wikt. This would represent a significant change to our current policy and practice. It would have advantages in other areas (few languages have verbs with exactly the same meaning, for example), but that is for another discussion (please!): it would be a poor solution to this issue: the adjective пробковый and all other languages' same-meaning adjectives could host translation sections, but to find such a translation section, a user would have to know one of the translations already (and it would have to be a bluelink).

6. Not present the information. This would represent a significant change to our current policy and practice. It would have the disadvantage of preventing us from having complete translation sections; if brass can be used in two ways, "that metal is brass" and "the brass knob", but we only provide the translation that works in uses like "that metal is brass", we'd be missing an accurate translation.

User:DCDuring (who may have other ideas on how to handle adjective translations of attributive nouns) and I felt we should bring this up for discussion here, in the Beer Parlour, for several reasons. Wiktionary has long included translations which are different PsOS than the words they translate, but the inclusion of this giant category (words, in all languages, which translate attributive nouns) has not — AFAICT — been discussed. Should we continue to include it? If we do, should we simply list the adjectives in the noun tables without distinction, or should we introduce them with some clause, or list them in separate tables (like brass)? If we set them apart with some clause, what should it say?

How should we handle nouns (I give oak as an example, although at the moment it still has an adjective section) which are often used attributively, but which have corresponding adjectives (oaken)? Should adjectives like дубовый be in oak (for consistency, accuracy, and completeness, because the nouns are used in places in English that other languages use their adjectives for, since English, despite having oaken, still says "oak table") and in oaken, or only in oaken (since it, an adjective, exists)? - -sche (discuss) 02:23, 16 September 2011 (UTC)

Oak really shades into adjectiveness. It's used both attributively and predicatively in ways that resemble an adjective (google books:"oak furniture", "the furniture was oak"). In such cases my own opinion is that it's best to include an adjective section, even if Occam's razor would prefer we consider it solely a noun; but even if we don't, I think we could have a sense “(shading into adjective use) Made of this wood; oaken” and a corresponding translations box. Adjective translations in that box could be tagged with {{pos a}} or {{qualifier|adjective}} or whatnot.

However, oak is an extreme case. In the typical case, the ordinary translation of the noun is as a noun, and if a given language uses an adjective to translate certain cases, I think it's probably best addressed through usage notes in that language's noun entry. For example, [[étoile#French]] could explain that in many cases where English tends to use the noun star, in French the adjective stellaire is likely to be used: système stellaire, amas stellaire, etc. (I realize that explaining it in the French entry does not preclude explaining it in the English entry's translations-box as well, but I think the latter is just too hard to do intelligibly.)

At [[oaken]] and [[stellar]] and such, we'd of course give the usual adjective translations. (Well, maybe [[oaken]] would just have {{trans-see|oak}}, or vice versa.)

—Ruakh_TALK 03:15, 16 September 2011 (UTC)

Ruakh is right - put them in "usage notes in that language's noun entry". All such long explanations will - if carried to conclusion - result in unmanageably large (and probably unreadable) translations tables.

I would also say that all synonyms, gender forms, &c are also best put in the foreign language word's entry. —Saltmarsh^{talk-συζήτηση} 10:54, 16 September 2011 (UTC)

@Ruakh: I've continued the discussion of whether or not the specific word oak is an adjective at WT:RFV.

@Saltmarsh: I may be misunderstanding what you mean when you suggest omitting "synonyms". Our general practice has AFAICT always been to include (like other translations dictionaries) all words in the foreign language which mean what the English word means. If you would not include all words, how would you decide which word was the main word and which words were just "synonyms" of it? (For example, how would you decide whether to include священный and omit святой from holy, or include святой and omit священный?)

@Ruakh and Saltmarsh: it is possible to have the adjectives in the translations sections of nouns without long notes, if your concern is that the notes are clutter: the translations could be given without introduction, like this, or in separate tables, like this (which seems similar to what you consider doing at [[oak]]).

If you are more broadly concerned about having adjectives in the translations sections of nouns, do you think we should remove the nouns that are in the translations sections of adjectives, where languages routinely use nouns in compounds where English uses adjectives, eg German Modell-^noun in model^adj., Finnish yhteiskunta-^noun in social^adj., Swedish stjärn-^noun in stellar^adj., etc? Do you think we should remove all Navajo translations from our adjective entries? Navajo expresses "it^pronoun is^verb adjective^adj.", eg "it^pronoun is^verb white^adj.", as "łigai^{verb (it is white)}".

What do you think should be done in cases where the use of a specific different POS is not routine (eg German im Ausland^{article+prep. + noun} in abroad^adj.)? - -sche (discuss) 05:38, 17 September 2011 (UTC)

@-sche/synonyms - don't synonyms usually have subtle differences? And in some cases you may have 4 or 5 of them - how does the user judge which is best for their needs? Look at all five? Easier to give the most accurate translation or the most frequent. The user will follow this link, where more information for each form can be given - as under the See also head (which could have been Synonyms) for επειδή. —Saltmarsh^{talk-συζήτηση} 06:29, 17 September 2011 (UTC)

I think that the way used in cork is good. It's understandable. Lmaltier 16:31, 16 September 2011 (UTC)

To clarify, do you mean with the introduction "corresponding to English attributive use, meaning ‘made of cork’:", or without it? - -sche (discuss) 05:38, 17 September 2011 (UTC)

With the introduction "corresponding to English attributive use, meaning ‘made of cork’:"; without it, it is very misleading. Lmaltier 08:35, 17 September 2011 (UTC)

An adjective in a non-English language should IMHO be found on the page of the noun whose attributive use it captures, whether in a usage note, in derived terms or on the headword line. Thus, I dislike the Dutch translation line in the English section of this revision of "cork": I prefer "Dutch: kurk (nl), kurklaag" to "Dutch: kurk (nl), kurklaag; corresponding to English attributive use, meaning ‘made of cork’: kurken"; I disagree with "Dutch: kurk (nl), kurklaag; kurken (nl)", as "kurken" does not belong there. In this I seem to agree with Ruakh and Saltmarsh. --Dan Polansky 07:02, 22 September 2011 (UTC)

Japanese POS templates and how entries are indexed

Latest comment: 13 years ago · 6 comments · 3 people in discussion

Chewing on the issue of how entries are categorized and indexed, I re-read WT:About Japanese, in particular the section Wiktionary:About_Japanese#Sorting. As I'd suspected, romaji entries should be indexed alphabetically, not under the corresponding hiragana. However, a quick look at Category:Japanese nouns, for instance, shows many romaji entries indexed hiraganically.

Looking deeper, {{ja-noun}} has the following line towards the end:

[[Category:Japanese nouns|{{{hidx|{{{hira|{{PAGENAME}}}}}}}} {{PAGENAME}}]]

This needs to be changed to:

[[Category:Japanese nouns|{{#ifeq:{{{1}}}|r|{{lc:{{PAGENAME}}}}|{{{hidx|{{{hira|{{PAGENAME}}}}}}}}}} {{PAGENAME}}]]

This tweak will index JA noun entries lower-case-alphabetically if the first template arg value is the letter "r", which it should be for romaji entries. I just made a similar change to {{ja-verb}}, but {{ja-noun}} is locked down. I'm about to crash for the night, so if anyone unlocks the template, don't expect much for the next ten hours or so. :) -- Cheers, Eiríkr Útlendi | Tala við mig 06:41, 16 September 2011 (UTC)
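The selection logic of the proposed category line can also be read outside wikitext. What follows is only a rough Python sketch of that logic, not the template itself: the function name `category_sort_key` and its parameter names are hypothetical and simply stand in for {{PAGENAME}}, {{{1}}}, {{{hidx}}} and {{{hira}}}, with `None` meaning the template argument was not supplied.

```python
def category_sort_key(pagename, first_arg=None, hidx=None, hira=None):
    """Mimic the sort key chosen by the proposed [[Category:Japanese nouns|...]] line.

    Hypothetical helper: the arguments model {{PAGENAME}}, {{{1}}}, {{{hidx}}}
    and {{{hira}}}; None means the template argument was not given.
    """
    if first_arg == "r":
        # Romaji entry: lower-cased page name, like {{lc:{{PAGENAME}}}}.
        prefix = pagename.lower()
    elif hidx is not None:
        # An explicit hidx value takes priority over hira.
        prefix = hidx
    elif hira is not None:
        # Otherwise fall back to the hiragana reading.
        prefix = hira
    else:
        # Last resort: the page name itself.
        prefix = pagename
    # The category line appends a space and the page name after the prefix.
    return f"{prefix} {pagename}"
```

Under these assumptions, a romaji page passed `first_arg="r"` sorts under its lower-cased title, while a kanji page with only `hira` set sorts under its hiragana reading, which matches the behaviour described for the tweak.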

Perhaps you also want to request a dump from the database, similar to Index:Russian, so that a Japanese index is updated. The current Index:Japanese is a joke. Then you'll see both red (from translations) and blue Japanese words. I don't know how this is done though. --Anatoli 07:14, 16 September 2011 (UTC)

Don't ask me why, but it does seem to be very common to sort using the hiragana. Perhaps it's WT:AJA that needs to be changed. Mglovesfun (talk) 08:50, 17 September 2011 (UTC)

@Anatoli:

Who would I make such a request of? And what does a dump do? My (admittedly limited) understanding of DB management is that a database dump is just an output of its contents. And what are the language indices used for? Just looking at Index:Japanese, I'm not sure why it would be a joke, but then again I've never had occasion before this to look at indices for a whole language. -- Ta, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)

@Mglovesfun:

Some of us (so far User:Haplology, User:MichaelLau, and myself) are discussing and working on editing WT:AJA, having created a working draft at Wiktionary:About Japanese/Draft. We haven't done too much yet, but the idea was that a separate draft page would make it easier to implement drastic changes on the fly and see how it all looks, without confusing folks by changing the main About page until we have something we're happy with. Please chime in at Wiktionary talk:About Japanese/Draft if you have any ideas or opinions to discuss on how to change things.

With regard to sorting, the current policy stated at Wiktionary:About_Japanese#Sorting seems to state that kanji and kana headwords should be sorted by hiragana, while romaji entries should be sorted alphabetically -- this makes the most sense from a learner's standpoint, in that someone just starting with Japanese can still find a word in the index even if they don't know kana. However, it seems that 1) the Sorting section might stand some clarification, and 2) the description of how to use the hidx sorting parameter provided in the documentation for each of the Japanese POS templates is a bit less clear than WT:AJA. Moreover, many contributors of Japanese terms seem to be ignorant of WT:AJA, or at least not fully versed in it (which isn't too hard to understand given how long the document is, and all the complexities of dealing with Japanese when writing in English).

Anyway, I'll give Wiktionary:About_Japanese#Sorting another look this week, maybe tweak our Draft version of it a bit, and also go through the Japanese POS templates and update their documentation to make things a bit clearer when it comes to indexing. -- Cheers, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)

I don't know who created the database dump; I think it must be User:Conrad.Irwin. I remember reading about it, and people have requested indexes for other languages. I suggested finding out who created and refreshed Index:Russian, which I find a really good index: it looks like it has indexed tens of thousands of entries and translations (many are in red). The possible challenge I see in creating the Japanese index is in, well, sorting and indexing the Japanese words. Why would it (the Japanese index) be a joke? Because it only has about a hundred KANJI, not tens of thousands of WORDS. --Anatoli 04:59, 19 September 2011 (UTC)

do not and does not

Latest comment: 13 years ago · 12 comments · 6 people in discussion

Any reason why the entries do not and does not don't exist? --The Evil IP address 13:56, 17 September 2011 (UTC)

...and will not, could not, must not, might not, may not, etc.? Or is there something special about do? Equinox ◑ 13:58, 17 September 2011 (UTC)

Nothing special about it, but I just noted that we had the contractions don't or doesn't without the spelled out versions (which are not uncommon). --The Evil IP address 14:09, 17 September 2011 (UTC)

Are they not just sum of parts? Do + not? —CodeCat 14:10, 17 September 2011 (UTC)

Yeah, I think so. We don't have the contractions in order to define them per se, but in order to explain what sequence of words they are short for. Equinox ◑ 14:14, 17 September 2011 (UTC)

I think these are an exception. The entry could be quite useful: for example, both negate a sentence. This cannot be said of most other verbs, and most verbs can't take a "not" after them. The etymology section could also be very interesting, explaining why do can be negated this way but most other verbs cannot. --The Evil IP address 14:22, 17 September 2011 (UTC)

Maybe, although it feels more like something for a grammar book or grammatical appendix. I was thinking the other day about how isn't and don't mostly behave alike, but not always ("didn't you do..." is fine, but "didn't you be..." is impossible). Equinox ◑ 14:25, 17 September 2011 (UTC)

Why didn't you be more certain of that and look for it in Google book search? SemperBlotto 14:33, 17 September 2011 (UTC)

Ha! Interesting. Those sentences sound very odd to me, especially "why didn't you be including the bombing of civil populations amongst their crimes?". Equinox ◑ 14:38, 17 September 2011 (UTC)

Using 'not' is rare with most verbs now, but it wasn't so rare in Shakespeare's time, I think? This is also considered 'modern' English so we would need to include it. —CodeCat 15:18, 17 September 2011 (UTC)

I think it's sufficient to relegate this to a usage note at [[not#Adverb]] (where it is already, as the first usage note) or an English-verbs appendix. —msh210℠ (talk) 18:15, 18 September 2011 (UTC)

The negative forms that most English auxiliary verbs have are just that: negative forms. They are not contractions like apostrophe d for would or apostrophe s for is or has, etc., though that's how they started. Rather, the n't is an inflectional ending. The rationale is here. --Brett 01:13, 25 September 2011 (UTC)

A new list of Latin Epithets (same suffixes together)

Latest comment: 13 years ago · 1 comment · 1 person in discussion

new list: Epithets by Suffix (contents)

I've created this new list which I'm calling "Epithets by Suffix". It's pretty huge (300,000+ entries) and is "retro-sorted" from nausicaa to tausaghyz. This allows words with the same suffixes to be worked on together. (as requested by DCDuring)
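For anyone unfamiliar with retro-sorting: the idea is to compare words from the last letter backwards, so that shared endings end up next to each other. A minimal Python sketch, assuming a plain reversed-string comparison (the actual list may well have been produced differently; the function name is made up for illustration, and the sample words are ones mentioned in this post):

```python
def retro_sort(words):
    """Sort words by their reversed spelling so that common suffixes cluster."""
    return sorted(words, key=lambda word: word[::-1])


epithets = ["fasciatus", "tausaghyz", "fasciata", "nausicaa"]
print(retro_sort(epithets))
# ['nausicaa', 'fasciata', 'fasciatus', 'tausaghyz']
```

Words ending in -a come first and words ending in -z last, which is consistent with the list running from nausicaa to tausaghyz.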

This is a follow-up to the Top 1000 list I posted recently. I created the Top 1000 list because I think it's really important that all these words have entries. For example, there are 882 species on this planet which have been given the specific epithet fasciata (including five animals and one plant which are all threatened with extinction, such as the Fiji Banded Iguana, Brachylophus fasciatus). There's no Latin entry for fasciata, nor for another 82 epithets which are each used by over 300 species.

I made these lists for Wiktionary's editors, so we can create or improve entries for any of the most common specific names as well as those of threatened species. I am interested in the diversity of life and its conservation, and would like to see the subject become less difficult for others studying or interested in biology. Scientific names are often seemingly opaque in meaning, and can be intimidating and difficult to work with when there's no easy way to understand them.

Wiktionary is becoming the go-to source for definitions. I want to encourage those who are improving it, and thus (deliberately or not) making the biological sciences less frustrating, less intimidating, and less mysterious for all. Latin is very well represented on English Wiktionary, currently having more entries than any other language (including English, counted by number of definitions). So I hope adding scientific Latin overlaps with the interests of Wiktionary's Latinist contributors, and I hope it's not too much of a stretch to sometimes delve into the less "pure" world of New Latin and scientific interlingual words. I'm trying to learn enough Latin and Wiktionary syntax to help more, but even if I were a grandmaster, I can't do it all on my own, so I'm hoping the top 1000 list as well as this new suffix-sorted list will encourage the Latinists here to consider looking at modern scientific Latin usage. And thanks also to those who have made some kind of start.

I've since improved the Top 1000 list so that filled entries have a strike-through, making it easier to identify blue links which don't yet have Latin or Translingual entries. Also I've highlighted epithets which belong to threatened species in order to increase their visibility (my own personal interest). These markups are on both lists.

This has all been a lot more work than I expected, and has all been done in my spare time, so I'm hoping it pays off with people actually using the lists to improve Wiktionary. If the lists get used, I'll take that as a show of their usefulness and spend the time to keep updating and improving and expanding or creating tools to help flesh out entries. Otherwise I'll leave it at this. It would be nice to move on to genera too.

TL;DR: New list of "Latin" specific epithets sorted by suffix (mostly Latin adjectives). I really do hope these lists are helpful and lead to the creation of new Wiktionary entries. Pengo 06:28, 18 September 2011 (UTC)

American vs European music terms

Latest comment: 13 years ago · 8 comments · 5 people in discussion

I'm not sure whether this has been discussed before but I'm not sure what to do with regard to some musical entries. You may or may not be aware that in America music theory is taught with quite an array of different terms to the words used in European (chiefly British) music theory which have the same meanings.

e.g. whole note (American) --> semibreve (British)

quarter note (American) --> crotchet (British)

staff (American) --> stave (British)

From my experience a lot of American musicians have a hard time understanding the British equivalents as they often don't get taught, and vice versa. I find this hard to incorporate in definitions such as 2/2 where it may be difficult for a British reader to understand (half note would be a minim, and a measure would be a bar). Equally it would be hard for an American reader if the definition was in British English. There are quite a lot of entries where this problem arises; I often have to look up the meanings of the words because I was never taught the American terms.

I'm not sure if there's a way around it except to assume that someone will look up a word if they don't understand it. It's quite a niggling issue in the academic music world. —Jakeybean^TALK 05:55, 20 September 2011 (UTC)

Perhaps a table expanded from one similar to that shown below could be added to each relevant page - or assigned to an appendix?

Category	British	American
Notes	semibreve	whole note
Notes	minim	half note
Notes	crotchet	quarter note
Miscellaneous	stave	staff

—Saltmarsh^{talk-συζήτηση} 10:07, 20 September 2011 (UTC)

It appears to be a simple AE vs. BE difference. At least in Germany, the American terms are used. -- Liliana • 11:52, 20 September 2011 (UTC)

Perhaps the definition for 2/2 can read {{music}} A [[meter]] of [[two]] [[half note]]s {{gloss|[[minim]]s}} per [[measure]] {{gloss|[[bar]]}} or the like, providing the BrE and AmE terms each time. (The above looks like: (music) A meter of two half notes (minims) per measure (bar).) —msh210℠ (talk) 16:34, 20 September 2011 (UTC)
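The glossing idea suggested above can be sketched as a small lookup. This is an illustrative Python sketch only, not a Wiktionary template or module: the mapping `AME_TO_BRE` holds just the pairs mentioned in this thread, `gloss_terms` is a hypothetical helper, and the plural handling (adding "s" to both terms) is deliberately naive.

```python
import re

# American -> British equivalents, limited to the pairs listed in this thread.
AME_TO_BRE = {
    "whole note": "semibreve",
    "half note": "minim",
    "quarter note": "crotchet",
    "staff": "stave",
    "measure": "bar",
}


def gloss_terms(definition, mapping=AME_TO_BRE):
    """Append the British equivalent in parentheses after each American term."""
    # Try longer terms first so multi-word terms are matched whole.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")(s?)\b")

    def add_gloss(match):
        term, plural = match.group(1), match.group(2)
        # Naive pluralisation: reuse the trailing "s" on the gloss.
        return f"{term}{plural} ({mapping[term]}{plural})"

    return pattern.sub(add_gloss, definition)


print(gloss_terms("A meter of two half notes per measure."))
# A meter of two half notes (minims) per measure (bar).
```

Whether every pair should be glossed this way is exactly what is under discussion here; the sketch only shows that the rendering proposed for 2/2 is mechanical once the list of pairs is agreed.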

We don't need both "measure" and "bar" since both terms are used in American English, so in the definition we could just change "measure" to "bar" without loss of understanding to American readers. —Angr 17:13, 20 September 2011 (UTC)

But non-native speakers, like me, don't understand bar, only measure, because the former is not associated with any musical term in Germany. -- Liliana • 18:34, 20 September 2011 (UTC)

We can't start trying to second-guess what words non-native speakers might know and what words they might not. The German word is Takt, which isn't obviously connected to either "measure" or "bar", and which English word Germans learn depends on which variety of English they're exposed to. I don't think we can get around the fact that non-native speakers may have to look up words in a gloss whose meaning they don't know, but it would be good if we could minimize that for native speakers. —Angr 20:28, 20 September 2011 (UTC)

Oh, I wouldn't know. I was trying to split it on BrE/AmE lines, and may have erred. My general idea though is that {{gloss}} be used when the Brits have one word and the Yanks another and never the twain shall meet. —msh210℠ (talk) 18:53, 20 September 2011 (UTC)

Japanese kanji entries and classical vs. modern readings

Latest comment: 13 years ago · 4 comments · 2 people in discussion

Going through Category:Japanese_terms_needing_attention to do some mostly-mindless clean-up work, I've run across a number of kanji entries where the list of readings includes things that semantically sorta make sense, but that I've never seen. 不#Readings, for instance, lists the kun'yomi "せず (sezu), にあらず (niarazu), いなや (inaya)", which make sense since 不 essentially means "not" and all these kun'yomi are related to negativity, but I've never heard of 不 having any kun'yomi at all. Moreover, neither the Jisho.org entry nor Jim Breen's site (you'll have to enter the kanji yourself, I can't link directly) list any kun'yomi, nor do my dead-tree dictionaries. The Weblio entry does list these kun'yomi, but various things about Weblio make me think that they include classical Japanese readings, not just modern. That said, classical Japanese was much more varied in terms of how things can be spelled -- imagine Chaucerian English spelling, only far looser -- and thus classical readings aren't always terribly pertinent to the modern language.

This leads me to wonder if we should mark classical readings somehow? Or should we leave them out altogether? -- TIA, Eiríkr Útlendi | Tala við mig 20:25, 20 September 2011 (UTC)

Japanese multiple readings are a pain in the butt, especially names. I'm amused at how 夜神月 is actually read Yagami Raito. Just use {{qualifier}}, I guess. I'm sure many kanji don't have a comprehensive list of all possible readings. --Anatoli 01:34, 21 September 2011 (UTC)

Hm, yes, marking non-standard readings using qualifiers or something similar seems to be the emerging consensus. However, I don't think we even could go for "all possible readings", given the flexibility of how kanji are used.

FWIW, 夜神月 seems to be a manga or anime character, in which case all bets are off as to reading - the author(s) could just as well decide that a given kanji string should be read Furī Uirī, or Ai Raiku Dōnattsu, and that would be that. Manga and anime readings are sometimes the very picture of arbitrariness.

With that in mind, I'd be more inclined to have kanji entries here limit the list of readings to attested historical readings, and leave out anything that's clearly a creative neologism of limited currency -- basically apply something like CFI to the readings themselves. :) -- Eiríkr Útlendi | Tala við mig 15:48, 22 September 2011 (UTC)

夜神月 is an extreme example (月 (tsuki), meaning "moon", is read in this name as Raito, from English "light"; watch Death Note, highly recommended, the best-quality anime I've seen, though the movie is not as good!), but this arbitrariness is not restricted to names, and not only to manga names. I see your point but I find that listing too many readings for a kanji can also be counterproductive. Readings can be borrowed from other kanji with similar meanings, like with your example of いなや (inaya), which is normally written as 否や in kanji. --Anatoli 22:56, 22 September 2011 (UTC)

Hindi and Urdu vs Hindi-Urdu or Hindustani

Latest comment: 13 years ago · 9 comments · 5 people in discussion

I don't want to be mean and just change the headings from Hindi-Urdu to separate Hindi and Urdu as in the translations for Hindustani. I don't think there was a policy of merging the two languages together, even if the Hindi and Urdu templates allow words to be displayed in both scripts. Any thoughts? --Anatoli 00:22, 21 September 2011 (UTC)

They should definitely be separate. There was no discussion on merging them (and even then, there is no code for Hindi-Urdu we could use). -- Liliana • 11:37, 21 September 2011 (UTC)

I think we should at least discuss it. We could create a code, perhaps {{inc-hin}}. Currently our Hindi and Urdu templates include things like 'Hindi spelling' and 'Urdu spelling', implying that they are the same language. I have absolutely no input on whether we should treat them as the same language, but we should discuss it. --Mglovesfun (talk) 17:15, 21 September 2011 (UTC)

They are the same, yes. We only treat them separately due to two different scripts being used, so we can have all Hindi words in Devanagari and all Urdu words in Arabic script. -- Liliana • 17:19, 21 September 2011 (UTC)

A small correction. There are layers of heavily Sanskritised words in Hindi which are not used in Urdu, and the reverse is true as well: there are many words of Persian and Arabic origin in Urdu which are not used in Hindi. Having said this, Urdu can be written entirely in Devanagari (this type of writing is, in fact, more precise about consonants which are missing in Sanskrit, like z, f, x, q, ġ, etc.; Hindi writers often replace them with j, ph, k, g, etc.) and Hindi can be written entirely in Perso-Arabic script as well. The high-level words are falling out of use, as Hindustani, a spoken variety of both Hindi and Urdu, is getting popular due to Bollywood, songs and media. Hindi and Urdu now borrow a lot from each other and from English, making them even closer. --Anatoli 23:39, 21 September 2011 (UTC)

Structurally they are different standardized registers of the same language, comparable to Croatian, Serbian, etc. being standardized versions of the same language (which is called Serbo-Croatian). Because they are the same language, I would be in favor of a unified header, like we have for (Roman and Cyrillic) Serbo-Croatian. Though, the damage would not be like we used to have in the case of Serbo-Croatian (three or even four identical entries on the same page), because, AFAIK, there should never be both a Hindi and an Urdu entry at the same page anyway because they use different scripts (well, at least the standardized registers, of course). --JorisvS 19:26, 22 September 2011 (UTC)

I love arguing with Indian people about Hindi and Urdu being different languages. I quote religious Urdu stuff that they understand perfectly and I'm like "really, because half those words are from Persian, so if Urdu wasn't the same language as Hindi you wouldn't understand this." Anyway. As JorisvS points out, the mess isn't as serious as Serbo-Croatian once was because the headers aren't used on the same page. If what's desired is one header, I think Hindi-Urdu is a bit odd, and Hindustani would probably be the most neutral. In translation tables (I already do this for descendants tables and in Etymology) we could have

* Hindustani 
*: Hindi: 
*: Urdu:

I'm sure (=positive) some people (mostly racists) would bitch and whine, as with Serbo-Croatian. But they're lesser people. If there are words that aren't used frequently in India, they can be marked as predominantly Pakistani in Usage notes, and vice versa. Wouldn't be a big deal. The main concern would be categorization. We have problems with Chinese (simplified and traditional) and some people worrying about Serbo-Croatian, but I wouldn't really be opposed to something like Cat:Urdu spellings of Hindustani nouns/verbs/whatever. </ideas> — [Ric Laurent] — 20:04, 22 September 2011 (UTC)

Hindustani sounds more interesting than Hindi-Urdu. I also noticed that Hindi speakers like to say that their language is closer to Sanskrit and Urdu speakers say Urdu is closer to Persian. In reality, they both have enough from both. It may be harder to find Urdu equivalents for "clever" Hindi words like प्रदूषण (pradūṣaṇ) (pollution) but otherwise most Hindi words have Urdu equivalents and vice versa. Didn't you say you were avoiding Beer Parlour? :) Thanks for your input, Ric. --Anatoli 23:05, 22 September 2011 (UTC)

Sometimes things of actual importance are discussed here, so when I see notifications of good conversations, I try to throw a few cents at it lol. (In fact, I'm considering our below-discussed Arabic problems, get some ideas out there) — [Ric Laurent] — 22:44, 25 September 2011 (UTC)

Filipino and Tagalog

Latest comment: 13 years ago · 12 comments · 7 people in discussion

Update: On 1 Nov 2011, unification of Cat:Tagalog and Cat:Filipino was approved by a vote (opened 8 Oct)

I found a page (paalam) with an entry for the Filipino language, and another for Tagalog.

These are the same language.

The government of the Philippines wanted to make a national language and they decided in 1937 that it would be "based on Tagalog", the language of the capital. In 2007, the chair of the government's Commission for the Filipino Language (Komisyon sa Wikang Filipino) reported on these efforts:[3]

Are “Tagalog,” “Pilipino” and “Filipino” different languages? No, they are mutually intelligible varieties, and therefore belong to one language. [...]

The other yardstick for distinguishing a language from a dialect is: different grammar, different language. “Filipino”, “Pilipino” and “Tagalog” share identical grammar. They have the same determiners (ang, ng and sa); the same personal pronouns (siya, ako, niya, kanila, etc); the same demonstrative pronouns (ito, iyan, doon, etc); the same linkers (na, at and ay); the same particles (na and pa); and the same verbal affixes -in, -an, i- and -um-. In short, same grammar, same language.

This explains why there are no Tagalog-Filipino dictionaries, no Tagalog-Filipino translators/interpreters, and no documents or cultural goods ever produced in separate versions for each.

I can also personally confirm this, as a speaker of the language.

To fix the above-mentioned article, I removed the "Filipino" section from the page (and pasted it on the Talk: page, for reference). Gronky 11:32, 21 September 2011 (UTC)

I am not opposed to it. I always wondered why we cover Filipino and Tagalog separately. -- Liliana • 16:35, 21 September 2011 (UTC)

I have no input, other than that it's an important issue and should be discussed rather than left to individual editors acting on their own opinions. --Mglovesfun (talk) 17:18, 21 September 2011 (UTC)

Is there a right place to discuss it, with an eye to setting policy?

This isn't controversial. Gronky 23:25, 21 September 2011 (UTC)

I support this. We should probably just use Tagalog. It's the most common word now in use for the official language of the Philippines and we already use Tagalog much more often than Filipino/Pilipino. The difference is subtle and there's nothing that can't be resolved with occasional {{qualifier}} tags. --Anatoli 23:29, 21 September 2011 (UTC)

Here. And it'd be nice to mention to the frequent contributors in both languages (or the language, whatever) that the discussion exists. —msh210℠ (talk) 00:09, 22 September 2011 (UTC)

I support this too. Different registers of the same language should use the same header; in this case Tagalog is the name of the language and so should be used. When differences exist these can indeed be properly tagged anyway. --JorisvS 19:36, 22 September 2011 (UTC)

Do we have any? As for me, I used {{tl}} (Tagalog that is) for some translations. --Anatoli 02:49, 22 September 2011 (UTC)Reply

I'll set up a vote, but leave enough time for the discussion to continue. Mglovesfun (talk) 20:26, 25 September 2011 (UTC)Reply

I understand that, for the moment, it's the same language, but that Filipino should become a mix of different languages used in the country, and that a commission is working toward this objective. It seems logical to use Tagalog only for the moment but, if somebody creates a Filipino entry nonetheless, there is no reason to delete it. It might become useful in the future. Lmaltier 10:06, 2 October 2011 (UTC)Reply

The "mix of different languages" proposal was the plan that was announced in 1937, but no effort was put into it, so it never even got off the ground. In the intervening 74 years Tagalog has been used in every situation where "Filipino" was meant to be used, and it has been taught in every school in the Philippines for the past two generations.

There are no efforts currently under way to create a "new" Filipino. The speed at which the Spanish language mostly disappeared from the Philippines is an example of how quickly things can change. (Sidenote: Spanish was ubiquitous there a century ago and most Philippine authors wrote in Spanish, but now the lack of Spanish knowledge among first-language Tagalog speakers is such that when 19th-century Philippine literature is translated into Tagalog, it usually has to go through a two-step translation, Spanish->English->Tagalog.) But changing a language does take at least a generation, and no such effort has begun yet or is being proposed. Gronky 20:48, 3 October 2011 (UTC)Reply

"Should become" makes me think of Wikipedia's Crystal Ball. We cannot do something just because we think something might happen/be in the future. If and when such a situation arises (and this is, as Gronky points out, not all too likely) we can deal with it then. --JorisvS 21:40, 3 October 2011 (UTC)Reply

Deprecating zh, zh-cn and zh-tw in category names

That's it really. AFAICT this always refers to Mandarin, though it's possible that in some cases zh could be used erroneously for another Chinese language such as Cantonese (NB, {{zh}} displays Mandarin). Would anyone like to expressly support or oppose this proposal? The proposal is to replace zh, zh-cn and zh-tw in topical category names, so that, for example, Category:zh-cn:Computing becomes Category:cmn:Computing. Mglovesfun (talk) 20:03, 21 September 2011 (UTC)Reply
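
The renaming proposed above can be sketched in a few lines of Python. This is only an illustration: the function name `rename_topical_category` and the assumption that topical categories always have the form `Category:<code>:<Topic>` are hypothetical and not established in this discussion, and the sketch does not address the separate question of traditional versus simplified script.

```python
# Legacy codes that the proposal would replace with "cmn" (Mandarin).
LEGACY_CODES = {"zh", "zh-cn", "zh-tw"}


def rename_topical_category(name: str) -> str:
    """Return the proposed new name for a topical category.

    Assumes names look like "Category:<code>:<Topic>" (an assumption made
    for this sketch); anything else is returned unchanged.
    """
    parts = name.split(":", 2)
    if len(parts) == 3 and parts[0] == "Category" and parts[1] in LEGACY_CODES:
        return f"Category:cmn:{parts[2]}"
    return name


print(rename_topical_category("Category:zh-cn:Computing"))
```

Categories under any other code, or names without a language code, pass through untouched, so a run over a full category list would only affect the three legacy prefixes.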

I agree with you. Engirst 20:12, 21 September 2011 (UTC)Reply

Err... but how are you going to separate traditional and simplified script? ---> Tooironic 21:26, 21 September 2011 (UTC)Reply

How are we going to? Well, if we want to, with something like Category:cmn:Computing in traditional script, or something similar. Mglovesfun (talk) 21:37, 21 September 2011 (UTC)Reply

FORTRAN?—msh210℠ (talk) 00:05, 22 September 2011 (UTC)Reply

I posted a comment related to this issue at the end of Wiktionary:Beer_parlour#Classical.2FLiterary_Chinese_entries. -- A-cai 01:09, 12 October 2011 (UTC)Reply

Language merges

Looking at this page and WT:RFDO, there are four merges being proposed:

Category:Koongo language into Category:Kongo language (very small)
Category:Colloquial Malay language into Category:Malay language (very small)
Category:Filipino language into Category:Tagalog language
Category:Hindi language and Category:Urdu language into a Category:Hindustani language

Of course, Bosnian, Croatian and Serbian were merged a few months ago.

On top of that, I would personally like to see Category:Anglo-Norman language merged into Category:Old French language (which I might add, would render a few hundred of my own edits useless or worse). Interesting issue, isn't it? Mglovesfun (talk) 20:08, 21 September 2011 (UTC)Reply

At the risk of turning this into a very, very broad topic, I've occasionally wondered if having all the 'Arabics' separately is appropriate. Mglovesfun (talk) 20:10, 21 September 2011 (UTC)Reply

The Arabic languages are almost as distinct as the Slavic languages. They share a formal standard literary/media language but the languages of daily speech are so different as to be incomprehensible to people at the other end of the Arabic language area. —CodeCat 21:37, 21 September 2011 (UTC)Reply

That's correct. However, it doesn't make sense to create Egyptian, Levantine, Moroccan, etc. entries for words which are identical. Most formal vocabulary and many other words are shared between dialects and MSA or have a very slight difference in the pronunciation. We don't use the pedantic case endings here, anyway (e.g. غرفة ghurfa vs ghurfatun). The difference in pronunciation between j/g, q/' (Standard/Egyptian) could be ignored, since the spelling is the same and the conversion is rather consistent, but Egyptians pronounce Arabic words differently. So MSA قلم qalam is Egyptian 'alam and MSA حج Hajj is Egyptian Hagg. The words which ARE different in dialects should have separate entries, IMHO, e.g. tomorrow غدًا "ghádan" (MSA) vs بكره "bukra" (Egyptian). --Anatoli 23:24, 21 September 2011 (UTC)Reply

I know no Arabic so can't opine, but, assuming what Atitarev says is true, pronunciation differences can be relegated to the Pronunciation section.—msh210℠ (talk) 00:07, 22 September 2011 (UTC)Reply

The contributors in Arabic dialects have almost died out. I looked at some Egyptian Arabic nouns; many (not all) are just Arabic. The quality Egyptian Arabic entries with different plural forms shouldn't be merged, like يد. We should also check with Stephen G. Brown and Dick Laurent on this. --Anatoli 02:47, 22 September 2011 (UTC)Reply

Look at water. The translations in Arabic dialects are all quite different from each other. -- Liliana • 03:25, 22 September 2011 (UTC)Reply

Using this same example, if one looks at the translations in Chinese, they are written in the same way (Dungan excluded of course), and yet we divide Chinese into God knows how many languages. 60.240.101.246 12:49, 22 September 2011 (UTC)Reply

Many of these are difficult to discuss, because we don't have very many people who specialize in foreign languages. Therefore, it's hard to gain any sort of consensus. -- Liliana • 03:38, 22 September 2011 (UTC)Reply

(Edit conflict). That's the trouble with the most common words: they are very few, but they make the speech very distinct and hard to understand with no previous exposure. The same goes for the question word what. Still, the written dialects (if they are written; only a few are ever written down) tend to be much closer than the spoken forms, much closer than Slavic languages. The fact that dialects are for speaking, not for writing, makes making entries for them less important.

Agree to your last message. --Anatoli 03:40, 22 September 2011 (UTC)Reply

Yes, it may make sense to create distinct entries for words in different Arabic languages. I would allow both the macro-language (for those not willing to create distinct entries) and individual languages (for those seeing an interest in creating them). The fact that specific words are often mostly oral and not found in usual dictionaries makes it all the more important to include them here.

In addition, systematically accepting sections for languages with an ISO code would be a simple rule, would make things much simpler and would avoid many discussions. Lmaltier 16:45, 22 September 2011 (UTC)Reply

I dislike putting simplicity ahead of accuracy. ISO 639 isn't really designed for our purposes; they don't care if a language is actually not a language but a dialect of another language, they just attribute a code when a code can be useful. There's a code for no linguistic content but I don't think we want Category:No linguistic content language. Mglovesfun (talk) 20:59, 25 September 2011 (UTC)Reply

No, they try to define codes for languages, not for dialects. They created a code for no linguistic content but this is an exception; they don't state that this is a language. It's not always obvious. They created Occitan as a macrolanguage, and codes for individual varieties, then they changed their mind. And the word language may be interpreted in different ways, so you may disagree with their decisions. But this is what they try to do. Lmaltier 09:49, 2 October 2011 (UTC)Reply

Jesus. Arabic. So, on one hand, there are things that would be pretty smart about having separate L2s for the major dialects, but as has been pointed out there would be lots of overlap. However when you get to the details, you have variations in pronunciation, verb conjugation... these would have to be compensated for if we wanted to be complete. Without separate L2s the only logical way to represent variations in regional conjugation in an L2 Conjugation section is with several drop-down tables. We'd have a few {{a}} tags for pronunciation variance, stuff like that. Really it would be possible to treat all Arabic dialects under one header. In all likelihood, it wouldn't be pretty, and it would require a lot of tags - for example, for words specific to certain dialects, it would be very easy to just do like we do with English with regional tags before definitions. (I apologize for the scattered nature of these statements lol... there's a lot to consider.) — [Ric Laurent] — 22:57, 25 September 2011 (UTC)Reply

Hey! If you're going to merge Hindi and Urdu (see above), then arguably Romanian and Moldovan are also candidates for merging! -- Liliana • 13:54, 28 September 2011 (UTC)Reply

Yes, let's add Category:Moldavian language→Romanian to the list of merges to consider. - -sche (discuss) 07:28, 30 September 2011 (UTC)Reply

Definitely! --JorisvS 12:43, 30 September 2011 (UTC)Reply

Naturally — [Ric Laurent] — 23:45, 5 October 2011 (UTC)Reply

M→RL! There is a single (Daco-Romanian) language in Romania and the Republic of Moldova: the Romanian language. It is good to hold a discussion here, but the result must be clear: one single language (officially "Daco-Romanian", Romanian) for the Carpathian-Danubian area of Romania and the Republic of Moldova. A clear and decisive argument for the Romanian language: the Romanians of the province of Moldavia (in Romania, western Moldavia), the Moldavian Romanians, speak the same language as the Romanians of eastern Moldavia (the Republic of Moldova), namely Romanian, and they do not claim that their language is "officially" called "Moldavian"!

Aside from merging "Colloquial Malay" into "Malay", for consistency and accuracy "Indonesian" and "Malaysian" should also be merged into "Malay", because these are standardized varieties of the Malay language (like Croatian etc. are of Serbo-Croatian and Hindi and Urdu of Hindustani/Hindi-Urdu). On the other hand I'd like to point out that there are several "Malay languages" that should not be merged (do we have any entries?). --JorisvS 12:43, 30 September 2011 (UTC)Reply

Category:Banjarese language comes to my mind. But yeah, Standard Malay and Standard Indonesian are virtually the same language, so there isn't really a need to have them separately. -- Liliana • 11:54, 1 October 2011 (UTC)Reply

I think the way we handle Spanish would be appropriate in a lot of these cases, one unified language header and then context tags for meanings which are distinct to a region. Spanish has regions which conjugate differently, pronounce differently, use significantly different vocabulary, but we manage to represent all of these things without too much confusion. Obviously I don't know enough about the particular languages brought up here, but we do have a model for how we can make this work. - [The]DaveRoss 12:31, 1 October 2011 (UTC)Reply

Non-idiomatic translations

As a side-thought to the Idiomatic translations section above, what is the preferred method of handling translations out of English that are non-idiomatic phrases in the target language? I'm thinking now of disarm, where the Japanese translation of the intransitive sense "to lay down arms" could be 武器を捨てる, which is redlinked here as it should be, and which is not included in any other dictionary due to the same SOP restriction we have here. That said, is it kosher in translation tables to only use the {{t}} template for parts of a translated phrase?

Instead of:

{{t|ja|武器を捨てる|tr=ぶきをすてる, buki o suteru|sc=Jpan}}

should we have the following?

{{t|ja|武器|tr=ぶき, buki|sc=Jpan}}{{t|ja|を|tr=o|sc=Jpan}}{{t|ja|捨てる|tr=すてる, suteru|sc=Jpan}}

This is so incredibly ugly and unwieldy that I'm pretty sure it's not the way to go, but that brings me right back to the question -- does anyone have advice on how to input non-idiomatic phrases as translations of English terms? -- Eiríkr Útlendi | Tala við mig 20:26, 22 September 2011 (UTC)Reply

Yeah, I've wondered about that, too. Probably it's better to just use {{l}}/{{onym}} for those:

* Japanese: {{onym|ja||[[武器]][[を]][[捨てる]]|tr=ぶきをすてる, buki o suteru}}

(It's not ideal — {{onym}}, unlike the {{t}} family, italicizes its transliterations — but I think we've more than used up the sensibly mnemonic t template-names. What are we gonna create, a {{t:}}?)

—Ruakh_TALK 20:41, 22 September 2011 (UTC)Reply

Cool, thanks for the feedback. I think I'll use more manual formatting, since {{onym}} is apparently deprecated and since it italicizes the kana. I thought about adding a lang or sc param, but it looks like these aren't implemented for {{onym}}, so there you go. So would the below be acceptable?

* Japanese: {{Jpan|武器を捨てる}} ({{Jpan|ぶきをすてる}}, buki o suteru)

It looks like the font used in translation tables is smaller, so maybe I shouldn't use {{Jpan}} either? -- Eiríkr Útlendi | Tala við mig 21:26, 22 September 2011 (UTC)Reply

I don't think {{onym}} is deprecated. I don't know what you mean by "it looks like [lang and sc params] aren't implemented for {{onym}}"; but manual formatting seems just fine to me. (As does using {{Jpan}}.) —Ruakh_TALK 22:13, 22 September 2011 (UTC)Reply

Cheers, thanks. {{onym}} is marked RFDO, which I discovered when I looked at the template page itself to figure out about params. Some templates like {{term}} have a lang param for specifying a language, and the template handles formatting differently in some cases for specific languages, like using a slightly bigger font and no italics for Japanese. The sc param shows up in some other templates as a way to specify a certain script, again so the template can select an appropriate font size and style. {{onym}} doesn't change its Japanese output when I add either lang=ja or sc=Jpan. Japanese being the odd duck that it is, typographically speaking, I may have run across some of these wrinkles more than folks working with European languages. :) -- Ta, Eiríkr Útlendi | Tala við mig 22:33, 22 September 2011 (UTC)Reply

{{onym}} doesn't have lang=ja because it just has ja, as in the example wiki-text above; that is, {{onym|ja|foo|tr=bar}} is like {{term|foo|lang=ja|tr=bar}}. It does support sc=Jpan, theoretically, but Jpan is already the default script for Japanese, so {{onym|ja|foo|tr=bar}} implies {{onym|ja|foo|sc=Jpan|tr=bar}} anyway. Neither version applies the script template to the transliteration, though. (And {{t}} doesn't, either.) —Ruakh_TALK 03:34, 23 September 2011 (UTC)Reply

Why not create a {{t-SOP}}? Matthias Buchmeier 09:09, 23 September 2011 (UTC)Reply

Serbo/Croatian

There is NO Serbo/Croatian language. These are two languages: Serbian and Croatian. So please treat them in that way.

Why? —CodeCat 09:33, 25 September 2011 (UTC)Reply

re-e... or ree...

I'm currently adding some Italian words that start rie... - they mostly translate as English words starting ree... - and I never know whether to use ree... or re-e... for the English. (See riesposizione as an example.) We seem to use both forms (sometimes as alternative forms). Is there any sort of rule? SemperBlotto 10:00, 25 September 2011 (UTC)Reply

In my experience, both forms usually exist. Some writers don't like the fact that it looks like a single ee vowel. Same thing as with cooperate. Equinox ◑ 10:10, 25 September 2011 (UTC)Reply

The New Yorker would doubtless use reë.... Older works, too, for older words. So older words' reë... versions are (most of them) probably attested.—msh210℠ (talk) 15:15, 25 September 2011 (UTC)Reply

Target audience

This is in reaction to comments such as the one by Gtroy (talk • contribs) on WT:RFD#Cyrillic script saying "keep both are really common, a new speaker or child would find it useful." This calls into question our target audience. WT:CFI#Idiomaticity alludes to the same thing in saying "An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components." How easy it is to derive the full meaning depends on how good you are at deriving meanings. What level should we aim at? Educated adults, children, non-native speakers? The problem I have with aiming for non-native speakers (and above, naturally) is how far we want to go to accommodate very weak English speakers. Someone without much experience in English may find I laughed like a child difficult to understand, but I doubt we would include that. Reading WT:FEED, many of the users seem to be unable to spell even very basic English words right, so maybe dumbing down is a good option. Mglovesfun (talk) 20:51, 25 September 2011 (UTC)Reply

I don't think it harms us any to aim at the lower threshold. First, who is likely to be looking up something like target audience in a dictionary unless they don't know what it means and need help figuring it out? Second, what possible harm could come from having the entry? So long as we guard against becoming shills for people who are trying to market a product or make Urban Dictionary style humorous coinages, I think we should be very liberal about allowing combinations like that. bd2412 T 23:27, 25 September 2011 (UTC)Reply

I actually doubt that [[Cyrillic script]] would be useful to a child or a new speaker, because neither group would know that anyone might consider "Cyrillic script" to be a single unit to look up. I think it's more likely to be useful to an adolescent or adult native speaker who already recognizes that "Cyrillic script" is a common phrase, and wants more information about it, but doesn't quite understand how to use a dictionary, or doesn't quite understand the difference between a dictionary and an encyclopedia. (It could be useful to translators, who would likely prefer to find the usual translation for "Cyrillic script" rather than have to assemble the translations for the relevant senses of "Cyrillic" and "script", since — for example — their target language might use different words for alphabetic scripts as for other kinds of scripts.) —Ruakh_TALK 23:43, 25 September 2011 (UTC)Reply

I think it's not bad, but it can be dangerous. Wiktionary is already very big, and that means a lot of pages will be created and only visited once or twice after that, if that at all. There is very little opportunity for review, which could mean a very obscure term might have a bad entry for a very long time. Wiktionary is not paper, but its users aren't omnipresent either... —CodeCat 23:46, 25 September 2011 (UTC)Reply

The amount of attention we give to such phrases in RFV/RFD discussions suggests that they can be adequately spotted and brought up to our standards. bd2412 T 23:51, 25 September 2011 (UTC)Reply

That assumes that someone first RFDs or RFVs them, which might take a long time for an entry hardly anyone ever needs. —CodeCat 23:55, 25 September 2011 (UTC)Reply

That could be said of an enormous number of our entries. Consider all the conjugations of verbs, some into forms that are almost never used or called for. bd2412 T 01:32, 26 September 2011 (UTC)Reply

That's true, but I'm wondering if we should make it even worse. Verb form entries are usually bot generated, so they are correct as long as the bot was written correctly. —CodeCat 10:32, 26 September 2011 (UTC)Reply

The Simple English Wiktionary is small, but it is specifically aimed at people with lower levels of English proficiency.--Brett 11:39, 26 September 2011 (UTC)Reply

"Wiktionary is very big" ... really? I would say Wiktionary is very small, compared to the size required to adequately cover its subject matter. The scope of Wiktionary is not "all words in all languages as long as the word will get more than one or two hits from direct search. It is important to remember that only a part (and a small part I would guess) of the usage of Wiktionary's information is in direct search. Indirect search (onelook, Google, Yahoo etc.) and the many places Wiktionary has been parsed and culled for particular relationships probably get a lot more attention, and we have no idea what those users really want or need. - [The]DaveRoss 13:39, 27 September 2011 (UTC)Reply

Completing the projects of User:Robert Ullmann

I can't think of a better way to honor Robert Ullmann's work on this project than to complete the many unfinished projects that he initiated. Many of us have begun projects for the introduction of substantial bodies of material into this dictionary, and any one of us could die with our work unfinished. I'd like to think that if that happens to one of us, the rest will pick up that work and carry it to its completion. Robert's work is epitomized in his many subpages - Robert Ullmann subpages, more Robert Ullmann subpages. Of these, I think the projects where we as a community can really pull through are User:Robert Ullmann/Missing, User:Robert Ullmann/Oldest redlinks, and the various pages showing use of foreign language words in news articles. Let's do this. Cheers! bd2412 T 23:47, 25 September 2011 (UTC)Reply

I would revert his user page to this revision, from which some of the projects including "Oldest redlinks" and "Missing" are accessible. The text "Robert Ullmann passed away on March 19, 2011 in Massachusetts General Hospital in Boston at an age of 50 years. [1]", which is currently the only contents of his user page, could be placed into an infobox at the top of his original page. --Dan Polansky 06:25, 26 September 2011 (UTC)Reply

Some user-led projects (including my own) are pretty massive, and may not be finished for years, maybe 10 years or more. It would be nice to continue to make progress, however. Mglovesfun (talk) 06:49, 26 September 2011 (UTC)Reply

I think this is a good idea. An interesting thing about your two primary examples is that neither one will ever be finished, unless we somehow finish the entire project. Are you proposing that we finish the lists that Robert generated or ought we update those lists from the most current dump and progress from there? Also I think reverting his userpage and putting a note at the top is appropriate. - [The]DaveRoss 11:26, 26 September 2011 (UTC)Reply

OK. I shall have a go at the Italian ones. Also, since I'm probably the most aged regular contributor, I shall try to document my future projects in case I also fall off my perch in the next few years. SemperBlotto 11:37, 26 September 2011 (UTC)Reply

While there will always be "oldest" redlinks and missing pages, my thought was to prioritize the last lists that Robert generated. Of course, the dynamic nature of language ensures that we'll never "finish" the project, but we will get it much closer to the leading edge of being a complete lexicon. I also agree with restoring a working version of his userpage, with the current note retained on it. Cheers! bd2412 T 12:51, 26 September 2011 (UTC)Reply

We could also add Category:Tbot entries to the list of Robert Ullmann projects. --Mglovesfun (talk) 16:47, 26 September 2011 (UTC)Reply

It is funny you should say that, because I got an e-mail from WF yesterday urging me to propose a project to continue Blotto's work after his death. This seemed like an obvious troll to get me to upset everyone by mentioning the death of somebody present, and anyway I hate community huggy-kissy stuff, so I let it go. And then within 12 hours I read SB talking about falling off his perch! My main concern is whether falling off one's perch is an idiom and whether we should have an entry. And secondarily who is going to learn Italian and fix all the vandalism. (And seriously: yes, RU had some good stuff and it would be a pity for nobody to continue with it.) Equinox ◑ 20:43, 26 September 2011 (UTC)Reply

I restored the userpage content as some have suggested, and I also thought it was a good idea. Also, the link to the obituary is broken. —Internoob 18:18, 26 September 2011 (UTC)Reply

I have added a working obit link. bd2412 T 17:44, 27 September 2011 (UTC)Reply

Thanks. One more, more informative, obituary: http://www.obitsforlife.com/obituary/344565/Ullmann-Robert.php. See also W:Wikipedia:Deceased_Wikipedians#Robert_Ullmann_.28Robert_Ullmann.29. An image such as File:Nuvola grave.png or File:Nuvola grave with cross2.png could be placed to the box on his user page, as {{notice|image=Nuvola grave with cross2.png|...}}. (I cannot edit his user page.) --Dan Polansky 07:18, 30 September 2011 (UTC)Reply

Thanks, I've switched out the obit link and made it a running text link instead of a note. I think it looks nicer and is easier to find. I've also chipped away a tiny bit at his missing entries pages. Cheers! bd2412 T 03:29, 14 October 2011 (UTC)Reply

Adjective+noun entries.

There are many adjectives that only have a particular meaning when modifying a noun with a particular sense (or a hyponym thereof). For example, there is a sense of prime that applies only to natural numbers, so you get phrases like "prime integer", "prime and composite numbers", "numbers that are prime", and so on, but in particular you get the phrase "prime number", which passed RFD. Other similar cases (adjective-noun pairs that are SOP, but only because the adjective has a noun-specific sense) include "vintage car", "active volcano", "acute/obtuse/right/reflex/round angle", "exploitative competition", "oblique leaf", "Cyrillic alphabet/script", and others. (Actually, some of the "angle" ones are debatable; no one produced any evidence that "round", for example, is used to mean "360°" outside the one phrase "round angle".) All of these entries were created, but all came to RFD, and few were kept. (A few still appear at RFD.) All of the discussions were fairly ad hoc; although various arguments were presented for keeping or deleting specific entries, no really general principles were proposed, and the result of a given discussion generally seems to have depended largely on who participated in the discussion.

I'd like to raise this question more generally. I would posit that no single editor agrees with the result of every single one of the above discussions. Which ones do you agree or disagree with, and why? Should we have kept all of them? Deleted all of them? What criteria should we have applied? Should we strive for some sort of consistency on this, or are ad hoc discussions the way to go?

—Ruakh_TALK 00:39, 26 September 2011 (UTC)Reply

I think we should generally keep these, as they may be useful to someone looking up the term who does not know, for example, which meaning of "acute" and which meaning of "angle" are likely to be implied by reference to an "acute angle". This is a voluntary project; if some Wiktionarians want to make such entries, and the meanings provided can be backed up with sources (most of the above are clearly in widespread use), then others who don't care for them should not spend time making them, and should instead focus on adding the many words still missing from the lexicon. bd2412 T 01:37, 26 September 2011 (UTC)Reply

If "number that is prime", "a number is prime", "prime, large number", "prime large number", "very prime number", "more prime a number", or "prime integer" exists, that's enough IMO to say the adjective is separable enough from the noun to delete the "prime number" entry. People shouldn't see the phrase's entry and think the adjective's tied to the noun. However, if the above phrases, and others like them, don't exist, and the only variant on "prime number" that exists is "prime or composite number" or "large, prime number" (with a comma), then I don't know (but am tending at the moment to want to keep in the former case, as the "or" doesn't really break up the phrasiness, and delete in the latter, as the "large" shows that "prime" is just an adjective). If, on the third hand, the only attested variant on "prime number" is "prime effin' number" or "large prime number" (no comma), then I'd say keep "prime number".—msh210℠ (talk) 03:50, 26 September 2011 (UTC)Reply

I am in favour of these (or the very great majority) being allowed (as some of our users will expect them to be here). But, as bd2412 says, we shouldn't go out of our way to create them. — This unsigned comment was added by SemperBlotto (talk • contribs) at 03:41, 26 September 2011 (UTC).Reply

I agree with msh210 on this. DCDuring TALK 11:00, 26 September 2011 (UTC)Reply

msh210 has more or less nailed it. I am thoroughly in favour of specialist adjective definitions like "(of a [noun class]) [its meaning in that context]". A pet hate of mine is palindromic prime. Equinox ◑ 20:33, 26 September 2011 (UTC)Reply

IMHO "palindromic prime" could be deleted as sum of parts, and is unlike "prime number", in that the meaning of "palindromic" used in the phrase is not specific to primes. In fact, the meaning of "palindromic" relates to strings over an alphabet. Thus, a number (prime or not) can be palindromic only with respect to a particular system of encoding, such as decadic, binary, or using Roman numerals. --Dan Polansky 08:22, 28 September 2011 (UTC)Reply

If the adjective has a noun-specific sense, this is a very strong clue that the adjective + noun phrase is a set phrase belonging to the vocabulary of English. But this is only a clue, this should not be the criterion. It seems obvious to me that prime number belongs to the vocabulary of the English language, that this is a mathematical term (while blue bicycle doesn't belong to the vocabulary of English), and this is the reason why prime number must be included. I also agree with Equinox, but this is not a reason to exclude prime number. Lmaltier 20:45, 26 September 2011 (UTC)Reply

How do you feel about entries for prime integer, prime member (of a set), etc.? These are equally natural for mathematicians. I don't know whether they are "set phrases" in English but it's hard to see how they are different from "prime number" since integer and member both refer to numbers. Equinox ◑ 20:50, 26 September 2011 (UTC)Reply

Are they really equally natural for mathematicians? This was not my feeling, because it's not the case in French (nombre premier must be considered as a word, not entier premier). I feel that prime member is built by the brain when needed (prime + member) while prime number is already available as a whole in the brain, and this is the reason why this is a word. Lmaltier 05:25, 27 September 2011 (UTC)Reply

I suspect that you're right, but I think it would be nice to have somewhat more objective criteria, no? —Ruakh_TALK 11:51, 27 September 2011 (UTC)Reply

In many cases, it's obvious from one's intimate knowledge of the language (reasoning doesn't help). For specialized terms, it may be less obvious if you are not a specialist. In such cases, I think that we must trust specialists and that, whenever the phrase is defined in a specialized lexicon, it must also be accepted here: if specialists find that a definition is needed, a definition is needed here too. Note that my Pocket Oxford Dictionary (printed in 1972) defines prime number... Lmaltier 16:44, 27 September 2011 (UTC)Reply

My tentative principle is this: If a phrase "<adjective> <noun>" is such that (a) the meaning of <adjective> used in that phrase is specific to things referred to by <noun>, and (b) "<adjective> <noun>" is much more often used written together as a phrase rather than separately as in "<noun> is <adjective>", then (c) we should have an entry for "<adjective> <noun>", regardless of (d) there being a suitable definition in the <adjective> entry that makes "<adjective> <noun>" a sum of parts. Examples include algebraic number, algebraic integer, bound variable, cardinal number, complex number, free variable, imaginary number, rational number, real number, transcendental number, free software, open set, closed set, complete graph, normal distribution; see also talk:free variable. I am not sure I require (b) to hold; (a) is the crucial part of the condition of the principle. As to the rationale, I tend to store such terms under "<adjective> <noun>" in my mind, and I estimate this is also the headword under which people tend to look these things up. In German, I store "vorstellen" under "vorstellen" in spite of its also occurring in the separate position as in "stell dich noch mal vor". Thus, I deem this approach convenient for the users of the dictionary. --Dan Polansky 08:45, 28 September 2011 (UTC)
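
As an editorial aside, the tentative principle above can be sketched as a small predicate. This is only an illustration under stated assumptions: the class `PhraseEvidence`, its field names and the `require_b` flag are hypothetical names invented here, and the boolean inputs stand for editors' judgments or corpus counts that the discussion itself does not supply.

```python
from dataclasses import dataclass


@dataclass
class PhraseEvidence:
    """Hypothetical record of judgments about an "<adjective> <noun>" phrase."""
    sense_specific_to_noun: bool     # condition (a)
    mostly_used_as_phrase: bool      # condition (b)
    adjective_entry_covers_it: bool  # condition (d): a sum-of-parts reading exists


def should_have_entry(evidence: PhraseEvidence, require_b: bool = False) -> bool:
    """Return True if the tentative principle would call for an entry, i.e. (c).

    Condition (d) is deliberately ignored: the principle applies "regardless of"
    whether the adjective entry already makes the phrase a sum of parts.
    """
    if not evidence.sense_specific_to_noun:
        return False  # (a) is described as the crucial condition
    if require_b and not evidence.mostly_used_as_phrase:
        return False  # (b) is only enforced when the caller asks for it
    return True
```

The `require_b` switch mirrors the stated uncertainty about whether (b) must hold: with the default `require_b=False`, only (a) decides the outcome.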

I think this is a very necessary discussion, but the problem is that I'm not sure there are really objective criteria that can be brought to bear. A lot of this is to do with subjective feelings from native speakers about the extent to which a given phrase ‘feels’ like a set unit. The tests that Msh210 mentions are definitely suggestive but not, I think, definitive. That is why the RFD discussions are a good way of settling it and why they will probably always be needed. Personally I find some terms are semantically transparent but still feel like individual lexical units (like the late lamented downloadable content), whereas other terms apparently meet our CFI but to me do not appear idiomatic or natural at all (like Egyptian pyramid). Another point I want to make is about the usefulness of these entries, which is often called into question. The point of them is not to answer the question ‘what does XY mean?’ but rather ‘do native speakers of this language actually use the term XY?’. A good dictionary should be able to say: yes, and here are citations proving it, and preferably some indication of when it was first used. Ƿidsiþ 06:44, 30 September 2011 (UTC)

A small idea for formatting discussions

I noticed some people use bullets with * instead of indenting the text with : . I think this is a lot clearer because you can easily see when the next message begins, even if they both have the same indenting level. Do you think we could make this general practice on Wiktionary, maybe? —CodeCat 11:59, 26 September 2011 (UTC)

This would get confusing if somebody actually posted a bulleted list. Equinox ◑ 12:28, 26 September 2011 (UTC)Reply

True, but sometimes people post blockquotes also:

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

I'm not sure bullets are worse in that regard.—msh210℠ (talk) 16:41, 26 September 2011 (UTC)Reply

Okay, to rephrase: I don't think the current discussion formatting is perfect, but I think we need to distinguish, codewise, "semantic" bullets (intended to serve as bullets) from "pragmatic" bullets (just happen to be useful). Cf. the difference between BOLD and STRONG elements in HTML. Equinox ◑ 19:26, 26 September 2011 (UTC)Reply

Maybe a special template could be used instead of a bullet or indenting, when used in discussions? Something like {{*}}? —CodeCat 20:28, 26 September 2011 (UTC)

Actually, I have a better idea. Look at my monobook css and js files and you'll find a bunch of code which colors and indents talk pages to make them much more readable. It should be a cinch to make this work with pages such as the BP. -- Liliana • 20:34, 26 September 2011 (UTC)Reply

JA translations suddenly all borked

I just noticed that Japanese translations are very badly borked, suggesting that someone this morning (Pacific Time in the US) has either inadvertently broken or vandalized something somewhere. For reference, have a look at move#Translations, fill#Translations, and anything else that has Japanese included in the translation table.

Does anyone know where to look for sudden changes? Neither {{trans-top}} nor {{t}} have been changed today, and I'm not sure where else to search for such mistakes or foul play. -- Eiríkr Útlendi | Tala við mig 17:57, 27 September 2011 (UTC)Reply

I just noticed that the translation table headers are normal when the page is first loading, and only go funny when the translations JavaScript gets applied. Alternately, loading the page with JS disabled shows things as they should appear. I have no idea who maintains this script -- who should we talk to about this? -- Eiríkr Útlendi | Tala við mig 18:01, 27 September 2011 (UTC)Reply

I can't reproduce that. What sort of b0rkage do you see? Does a "hard refresh" (Ctrl+F5) help? —Ruakh_TALK 18:21, 27 September 2011 (UTC)Reply

Sorry, I should have been more specific. The header lines of translation tables now include the Japanese entry as well. So for "move", the header for the first translation table should be:

to change place or posture; to go

Instead, the Japanese gets munged into the header, producing:

to change place or posture; to go - Japanese: 動く ^(ja) (うごく, ugoku)

Note that the Japanese still appears within the table where expected. Hard-refreshing doesn't help, and this issue is present on pages I've never looked up before (and thus wouldn't be in my cache), like at procrastinate#Translations.

In addition, attempting to use the assistance JS dialogs to add a translation directly would fail out earlier this morning when attempting to add Japanese, giving an error message, but this seems to be working now. Is someone editing these scripts? That's certainly what it looks like... -- Eiríkr Útlendi | Tala við mig 18:43, 27 September 2011 (UTC)Reply

I think that's not a bug, but a feature: do you see the little "Select targeted languages" link inside each translations-table? —Ruakh_TALK 18:56, 27 September 2011 (UTC)Reply

Now I feel stupid. :) In my defense, I only clicked that because the translation edit assist was acting strangely on the "full" page, where hitting the Preview translations button would throw up errors about "Japanese translation [term] not found" (of course it's not found, I'm trying to add it...). I figured the two behaviors were linked, but it seems the only link was myself. :-/

FWIW, the translation edit assist seems to be working now, so that's all good. -- Eiríkr Útlendi | Tala við mig 21:23, 27 September 2011 (UTC)Reply

{nonstandard, rare} form of

Could somebody please make these templates:

{{nonstandard form of}}

{{rare form of}} --Pilcrow 01:16, 28 September 2011 (UTC)

Moved from WT:ID Jamesjiao → ^{T ◊ C} 01:20, 28 September 2011 (UTC)Reply

A good tip is to find a template that does a similar job, copy its contents and adapt it. So {{nonstandard spelling of}} might be a good choice here. Mglovesfun (talk) 19:57, 29 September 2011 (UTC)Reply

Thames河

Why is it deleted? Cite: Google Books 2.25.211.161 13:08, 28 September 2011 (UTC)Reply

It seems attestable to me, albeit rare. But since I don't know Mandarin, I'd like to have a second opinion. -- Liliana • 13:17, 28 September 2011 (UTC)Reply

For your reference: Nouns 2.5 - Personal and place names not in the Chinese Han language 2.25.211.161 16:05, 28 September 2011 (UTC)Reply

See Talk:Ampere定律 and Special:WhatLinksHere/Talk:Ampere定律. Mixed-script entries are deleted if they aren't cited. On the other hand, they are allowed, if cited. - -sche (discuss) 18:33, 28 September 2011 (UTC)Reply

If you want to learn Chinese, learn the proper script. Otherwise, don't bother. Don't go around and tell others that their script is shit and advocate that they should use blah blah blah instead, like these people. 60.240.101.246 23:20, 28 September 2011 (UTC)Reply

I deleted it. Use 泰晤士河 or 泰晤士. Proper names are transliterated into Mandarin using Chinese characters; most foreign names in Roman letters can be attested in Mandarin text because not everyone knows how to write them in Hanzi, or some can't be bothered. --Anatoli 00:07, 29 September 2011 (UTC)

To anon 2.25.211.161. Don't go creating English words with the Mandarin heading, this is wrong. --Anatoli 00:12, 29 September 2011 (UTC)Reply

Discussion continued at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. - -sche (discuss) 02:51, 29 September 2011 (UTC)Reply

We have a more serious matter at hand here than just this entry. We should not allow Chinglish to be spread, no matter if it's attestable or not. Chinese often use foreign words in a Chinese context (not to be confused with mixed script words) but I repeat, these words don't become Chinese if they are used in a Chinese context. --Anatoli 04:23, 29 September 2011 (UTC)Reply

Could someone help me to set up a vote on this (banning foreign proper nouns as Mandarin)? --Anatoli 05:21, 29 September 2011 (UTC)

I didn't realize Engirst had already created a thread here. Anyway, I have commented on the talk page on Wiktionary_talk:About_Sinitic_languages. — This unsigned comment was added by Jamesjiao (talk • contribs).

A general comment: we should create Chinese sections not only for pure Chinese words, but for all words used in Chinese (used, and not only mentioned; this is very important). This rule applies to all languages. Take a word such as autoroute: can this word really be considered as an English word? Yet, it deserves an English section. Creating a section for a language does not mean that the word fully and naturally belongs to the language, only that it is used in the language. Lmaltier 05:34, 29 September 2011 (UTC)

What you're offering for Chinese is quite dangerous. In fact, any foreign proper name can be used in Mandarin in the Roman script. Your example is a borrowing, which can happen in any language. Thames河 is not a borrowing, it's not common and is just an example of a person not willing to write this word in Chinese. As much as many users want to see Chinese switch to Roman, this is not happening and we shouldn't promote it. --Anatoli 05:43, 29 September 2011 (UTC)Reply

Actually, it's also a borrowing (most such proper nouns are borrowings). What you contest is only the way it's written. We should not promote anything, only describe the language as we find it is written, with appropriate comments and explanations (uncommon, etc.) Lmaltier 06:03, 29 September 2011 (UTC)Reply

If a word is normally written in one script and transcription, and someone uses that word from its original script, I don't think that counts as borrowing into the main language of the text. If I decide to use 这个单词 instead of "this word", that doesn't turn 这个单词 into English. This is how most (all?) alphabetically spelled words are used in Chinese and Japanese -- specifically as foreign words. Sure, they're being used in a Chinese or Japanese context, but that doesn't make these words Chinese or Japanese. -- Eiríkr Útlendi | Tala við mig 17:39, 29 September 2011 (UTC)Reply

The difference is that English speakers are generally not expected to be able to read Chinese characters. I imagine that Chinese speakers on the other hand have at least some understanding of the Latin script used to write English. The same kind of mixing of scripts is done elsewhere too, like in Russian, Greek or Arabic. Just as a language such as English is often expected to be known to non-native speakers, in the same way the Latin script name 'Thames' may be expected to be known in China, while the reverse is not true. —CodeCat 18:18, 29 September 2011 (UTC)

Yet is there anything intrinsically Chinese about using Thames in a Chinese text? If an alphabetically written word used in Chinese contexts takes on a specifically Chinese meaning, then I would be open to the idea of categorizing it as Chinese. If it never has anything but its original meaning from the source language, such as when it is only ever used as a disambig, then no, I would say that it is still decidedly not Chinese, in part as the main reason it's being used is precisely *because* it's not Chinese.

And as a side note, the times that I've seen alphabetic text used in Japanese (the non-English language that I read the most), it is again used precisely because it is not Japanese. In cases such as placenames, the non-Japanese rendering is given generally in parentheses, and is provided not necessarily because the expected audience should know it, but more to clarify the original spelling should a reader want to look into things more, such as here or here. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)Reply

"that doesn't make these words Chinese or Japanese": nobody thinks that this makes them Chinese words. Nonetheless, if they are used in Chinese, a Chinese section is useful. highway is not a French word, but a French section would be helpful nonetheless (for sense, gender, pronunciation, etc.), because it's used in French (as a foreign word, but used nonetheless). The case of a foreign word such as psychanalyste, mentioned in a sentence such as The French word for psychoanalyst is psychanalyste., is very different: the word is not used in the sentence. Lmaltier 19:25, 29 September 2011 (UTC)

If there is no argument about whether these words are Chinese or Japanese, then what is the argument? Foreign words belong under their respective headings. vis-à-vis is listed as English because it's been accepted into the English language, and is used enough in purely English contexts that its meaning is diverging from the French meaning over time. Likewise with terms like al fresco or honcho -- they came into English as foreign terms, but have since taken on specifically English senses. I would strongly argue that Thames has no such specifically Chinese sense -- and, thus, does not belong under a Chinese header. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)

Acceptance of a word in a language is something subjective. Use of a word in a language is something objective. The meaning is irrelevant. Lmaltier 05:29, 30 September 2011 (UTC)Reply

If we don't stop this madness, Mandarin Wiktionary space will be full of - (river name)河, (city name)市, (disease name)病, (mountain name)山, (island name)岛, etc! They are attestable all right but they are not Mandarin. Then any serious person will doubt the quality of this dictionary. --Anatoli 23:07, 29 September 2011 (UTC)Reply

?? People read pages of interest to them, not other ones. It's better to keep simple principles and to apply them consistently. See KISS principles. Lmaltier 05:29, 30 September 2011 (UTC)Reply

Mandarin has become the biggest language on the internet. A person with a pinyinisation agenda will dig a couple of quotes out of thousands (see 泰晤士河 in Google Books [4]), a passage where a place name is written in Roman letters, just to prove his point and push his agenda. It doesn't prove anything. Not to me. Only that people who can't read Mandarin will be able to read that word. --Anatoli 05:51, 30 September 2011 (UTC)

I want to ask you, Lmaltier. You seem to be very thorough about the quality of English entries, which is only commendable, but why do you disregard the opinion of editors who are active in Chinese, who may know a lot about the language, and who voiced their strong opposition to this kind of entry, created by a person known for ignoring Wiktionary rules? Don't you think that by encouraging this you may jeopardise your own reputation, and that your opinion against violators of rules set by you will not be supported in the future? For obvious reasons, I think it's only fair to have language-specific policies, and allowing entries like Thames河 will open the door for low-quality entries. --Anatoli 06:07, 30 September 2011 (UTC)

I only want to support simple, sound, easy-to-understand and consistent rules. I don't want to exclude words only because contributors think that the use of these words or of these writings should be discouraged, because it's a question of opinion (just like political opinions should not lead to the exclusion of some pages on Wikipedia). Lmaltier 17:42, 30 September 2011 (UTC)Reply

No one is arguing that Thames or 河 are not words, and no one is trying to exclude these words -- both are already here, as clearly indicated by the blue links. The argument instead revolves around the use of two terms in two languages in a single attempted lemma entry. So far, only one IP user seems to be a strong proponent of the view that Thames河 constitutes an integral Chinese term. Most others have been arguing along varied lines that generally converge on the points that 1) this is a generic sum-of-parts phrase, and thus has no place in Wiktionary, and that 2) Thames is a word in English and 河 is a word in Chinese, and using the two together does not constitute a new Chinese term, but is instead a prime example of code-switching.

Any discussion of human endeavors revolves around opinion to some degree or other. The opinion at the core of this particular issue is, are SOP terms that involve code-switching valid terms for inclusion in Wiktionary? The emerging consensus is that no, such terms do not belong here.

The main holdouts from this consensus are the aforementioned IP user, and apparently yourself. The behavior of the IP user has been quite trollish and stubbornly POV from my perspective, but I confess I have less of a handle on why you (Lmaltier) seem to be contrarian about Thames河 and similar terms. Are you of the opinion that mixed-language code-switching SOP phrases do indeed merit inclusion as lemmata here? Are you just unfamiliar with the phenomenon of code-switching? Do you have some other strong opinion pertinent to this issue that might help elucidate your position? I'm honestly curious and I do not understand your opposition to deleting terms like Thames河. -- Cheers, Eiríkr Útlendi | Tala við mig 18:48, 30 September 2011 (UTC)

I think that code-switching would not explain the number of attestations. My only reason is that no term in actual use should be deleted. Lmaltier 20:08, 30 September 2011 (UTC)Reply

There are only 4,580 Google hits for google:"Thames河". Roughly 1,000 of these also include the 泰晤士 official Mandarin spelling of Thames (and, incidentally, also the spelling of times, as in newspaper names), reducing our pool to only 3,500 at google:"Thames河"+-"泰晤士", and this is before weeding through to exclude sources that don't meet WT:CFI. Compared to the 892,000 Google hits at google:"泰晤士河", it certainly looks to me like the number of attestations is actually quite small. I find only 8 hits at Google Books here, of which the first two seem to be the same book, one uses Thames in parentheses after first using the alternate Mandarin spelling 太晤士, one is clearly Japanese, and one offers no context or snippet at all, leaving us with only four or five books using this particular combination in a way that might meet WT:CFI. It seems to me that Thames河 is quite rare, actually.
You haven't actually addressed my question about mixed-language code-switching SOP phrases. Do you view Thames河 as somehow not SOP? If so, why, and by what reasoning? -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)
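
As an editorial aside, the arithmetic in the comment above can be checked with a few lines. This is a minimal sketch using only the hit counts quoted there; the variable and function names are invented for illustration, and the "roughly 1,000" overlap is taken at face value, so the result is approximate.

```python
# Figures quoted in the comment above (web-search hit counts reported in 2011).
total_mixed_hits = 4_580          # hits for "Thames河"
hits_also_with_standard = 1_000   # "roughly 1,000" that also contain 泰晤士
standard_hits = 892_000           # hits for "泰晤士河"


def remaining_hits(total: int, overlapping: int) -> int:
    """Hits left after removing those that also contain the standard spelling."""
    return total - overlapping


def share(part: int, whole: int) -> float:
    """Fraction that `part` represents of `whole`."""
    return part / whole


mixed_only = remaining_hits(total_mixed_hits, hits_also_with_standard)
mixed_share = share(mixed_only, standard_hits)
print(f"{mixed_only} hits, {mixed_share:.2%} of the standard spelling's count")
```

The exact subtraction gives 3,580, which the comment rounds to 3,500 because the overlap is itself approximate; either way the mixed-script spelling comes to roughly 0.4% of the count for 泰晤士河.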

There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 12:59, 30 September 2011 (UTC)

You already made this point at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. As I already stated there, you are being disingenuous. As others already pointed out there, Bush is not "standard" Chinese. Likewise, "Thames河" is not "standard" Chinese -- and as such, as well as for other reasons, "Thames河" does not belong here as an entry in Wiktionary. -- Eiríkr Útlendi | Tala við mig 18:53, 30 September 2011 (UTC)Reply

What is the difference between a standard term and a non-standard term? Inclusion in existing dictionaries? And why do you think that only "standard" terms should be included? Lmaltier 20:08, 30 September 2011 (UTC)Reply

I interpreted Engirst's argument as being that Bush is standard Chinese, under a particular standard for Chinese. That standard seems to be that any word in any script that is used in a Chinese sentence merits inclusion as a Chinese word -- a stance that I categorically refute. Using Bush in a Chinese context does not make it a Chinese word any more than using Москва or natsukashii in an English context makes these English words. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)

Also note that mixed script terms do exist, even in English, e.g. α-particle. Lmaltier 20:36, 30 September 2011 (UTC)Reply

And these are clear examples of non-SOP terms -- there is no way of deriving the meaning of α-particle etc. purely from its constituent parts. Meanwhile, Thames河 is practically the definition of an SOP term -- the meaning is baldly plain to anyone who knows what Thames and 河 both mean. The script alone is not part of my argument.

To clarify, there are two issues here that I am arguing, both of which are against inclusion of Thames河:

1. Thames河 is an SOP phrase, and as an SOP phrase, specifically as an SOP phrase with no special idiomatic meanings, it has no place here in Wiktionary. This applies equally to other non-idiomatic SOP phrases like blue necktie, fresh apple, or 赤い花, where the meaning is plain from the meanings of the constituent parts.
2. Thames as it is used in Thames河 has no specifically Chinese meaning -- it is being used as an English term, and therefore it is not a Chinese term, and thus should not be treated as a Chinese term. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)

Lmaltier, this IP user (2.25.212.4) is abc123, also known as Engirst, using multiple IP addresses that he generates automatically to avoid all the blocks, including range blocks. He doesn't deserve to be here and was blocked multiple times by multiple administrators. The only reason he is still here is that we can't do much about him and we don't want to stop anonymous users from contributing here because of one. You are doing a disservice by supporting his crazy ideas of pinyinisation of Chinese. He has dug up a few examples of code-switching or Chinglish, which are not typical and represent nothing. Chinese people write 泰晤士河, not Thames河, which can be easily proven by checking the internet. Non-standard terms could be included if they are typical; Thames河 is not typical at all. Mandarin also has mixed-script terms and they are already included here. Place names are always written in native scripts in any language. You can find all sorts of weird things on the internet if you try hard or have an agenda; that's what Engirst is doing. His last block expired today, so he has reappeared as Engirst. --Anatoli 21:00, 30 September 2011 (UTC)

I don't know anything about Chinese, it's difficult to argue, I just try to understand. Let me take other examples.

1. I know the author of a very good, prize-winning, thesis in mineralogy. Yet, she consistently writes Abkhazia instead of the French term Abkhazie because she was unaware of this French word. Does this make Abkhazia a word worth inclusion in a French section? Certainly not, because this use was a mistake due to ignorance.
2. Now assume that she was referring instead to a Chinese province, using the Chinese characters. It could be called code-switching. The case is closer, but this assumption is absurd, because nobody would do that (because almost no reader of the thesis would understand the Chinese characters in a French text).
3. alpha particle is used, and α-particle too, because English readers are expected to understand the α character. Is this the same kind of case?

My feeling is that the case under discussion is somewhere between the 2nd and the 3rd example. Am I right?

I also feel that this writing is used by some people because 1. Chinese people might read the name of English rivers more often in English texts than in Chinese texts (??). 2. Most Chinese are expected to recognize English letters 3. This writing of foreign proper nouns is felt by some people as less ambiguous than a transcription, because closer to the original word. This is probably truer with tiny unknown rivers or unknown people when you want to refer to them in your language. When you use the same alphabet, it's not shocking to use the original word (it's even systematic), it's more shocking when you don't use the same writing system. If there is no word in the language but there is a clear transliteration system, then this system is used (e.g. in Russian), but it's not the case in Chinese. How would you translate the ru de Marivel (a very tiny French river not visible any more) to Chinese without using the Roman script?

(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well).

If my feelings are not wrong, then these writings should not be promoted, but there is no reason to delete these pages, provided that the required (sound, helpful and correct) information is provided (e.g. explaining why the Roman script is used, explaining that this is not standard, explaining how you pronounce it in Chinese). People not liking them may simply ignore them. Providing information to people looking for these terms is better than a "no page found" message.

((I'm not interested at all in who writes something here, only in what is written.)) Lmaltier 05:59, 1 October 2011 (UTC)

"(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well)." - It is definitely true because of globalization nowadays. Engirst 13:19, 1 October 2011 (UTC)Reply

This is not code-switching or code-mixing, it's just (un)intentional reluctance to transcribe into Chinese characters. There is no way, for example, for "Thames河" or any other proper nouns written in the Latin alphabet to appear on news from Xinhua (official press agency in PRC). This is how Xinhua handles this in news: [5] It writes the name of the Assistant Secretary of Bureau of Near Eastern Affairs of the US as "杰弗里·费尔特曼" (Jiéfúlĭ Fèiĕrtèmàn), and that of the Syrian ambassador to the US as "伊马德·穆斯塔法" (Yīmădé Mùsītăfă), without providing the original scripts (English and Arabic) or transliteration of the latter. Although I have no way of knowing the original names just from looking at these transcriptions (the first one is probably Jeffrey Felt(e)(r)man(n)), these names are in Chinese, unlike "Jeffrey Feltman" which may appear (without a transcription given) in non-official Chinese news.

As for code-mixing or code-switching, it happens all the time in Singaporean Mandarin and Hong Kong Cantonese (and Chinese spoken overseas in general; also Singlish, to a lesser extent). Code-mixing doesn't make "office" Chinese or "tahi" ("poo" in Malay) English. 60.240.101.246 10:17, 1 October 2011 (UTC)Reply

If Singdarin were written, I'd happily classify it as a pidgin or creole and record it as such. That whole section on Hong Kong Cantonese you point to is full of linguistic information that should be recorded somewhere, and is hardly English, like "yeah" meaning trendy and mouse being pronounced mau1-si2.--Prosfilaes 23:37, 1 October 2011 (UTC)Reply

I agree with you. Engirst 00:10, 2 October 2011 (UTC)Reply

The vote to ban this kind of entries is set up here. Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:03, 3 October 2011 (UTC)Reply

Our "friend" who avoided all blocks so far is now busy editing Chinese characters entries, wow, adding examples he dug out where proper names are written in English. He'll teach us good Mandarin spelling - so "London University" in Mandarin is "London 大學". Pinyin is not enough, now Chinese will be written half in English, half in Chinese! --Anatoli 11:10, 3 October 2011 (UTC)Reply

I have protected 英國／英国 (Yīngguó) for his "英國 London 大學地質學博士" but there are other bad edits. Can we stop this somehow? --Anatoli 11:15, 3 October 2011 (UTC)Reply

In the UK, there are a lot of spoken and written usages of mixed scripts, can you ban them? Engirst 11:22, 3 October 2011 (UTC)Reply

It's called code-switching, very common with communities living outside their homeland. People have already spent too much time repeating the same thing to you but you keep trolling. We already have London and Thames in English, they don't need a Mandarin umbrella. Everybody here knows that you are here only because nobody is able to block you completely. It's YOU who needs to be banned indefinitely, troll. --Anatoli 11:31, 3 October 2011 (UTC)

The above is about "英國 London 大學" (UCL), not Thames河. Engirst 12:08, 3 October 2011 (UTC)

The vote Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries has started. --Anatoli 11:05, 18 October 2011 (UTC)Reply

Categories and single entries with multiple indices

This issue has already been touched on above at Wiktionary:Beer_parlour#Question about cats, but I'm realizing this is a sizable issue for Japanese.

The underlying problem is that kanji (Chinese characters used in a Japanese context) only carry the meaning of a word, and can often be read using multiple pronunciations. Kanji entries in Japanese are generally indexed by their readings, so a single kanji compound may appear at several locations in a Japanese dictionary. The example at #Question about cats regarded the given name 恵美, which can be read either Emi or Megumi. Using WT's categories as-is only lists the entry under the final category call on the page, so 恵美 was only categorized under めぐみ (Megumi), when it should have been categorized under both the めぐみ and えみ (Emi) indices.

The solution -sche brought up was to create a redirect under a visually identical header, and categorize the header under the additional index. This does work, happily.

However, there is no dearth of kanji terms in Japanese that have multiple readings. 砂岩 can be read either sagan or shagan; 月食 can be read either gesshoku or gasshoku; 一度 can be read either ichido or hitotabi; 正直 can be read shōjiki, jōjiki, or seichoku; etc., etc. All of these should ideally be categorized under all readings. Manually going through and creating redirects for all entries that currently have uncategorized additional readings is not a tenable proposition.

Is anyone aware of any way of getting the categorization mechanism to allow multiple categorizations? I.e., is there any way of getting something like:

[[Category:Japanese nouns|しょうじき]]
[[Category:Japanese nouns|じょうじき]]
[[Category:Japanese nouns|せいちょく]]

to allow a single entry to show up under all the provided indices? Listing multiple cats as above only categorizes under the reading supplied in the last cat listed. Simply adding additional sorting indices as additional arguments, like [[Category:Japanese nouns|しょうじき|じょうじき|せいちょく]], only indexes under the first argument given. Is there a Wikimedia / MediaWiki dev we should contact about this? -- Curious, Eiríkr Útlendi | Tala við mig 21:51, 28 September 2011 (UTC)Reply

Translation FROM non-English language

How can I add the Norwegian version of Swedish koka soppa på en spik on that page? (koke suppe på en spiker) __meco 18:14, 29 September 2011 (UTC)Reply

In my opinion, just like any other translation. Sometimes, translations in foreign word pages are very helpful. This is why this should be allowed. Lmaltier 19:30, 29 September 2011 (UTC)Reply

If you're asking for an explanation of how to create it, I'd say copy the entire contents of koka soppa på en spik, paste them into koke suppe på en spiker, change every mention of sv to no and Swedish to Norwegian. If there's a policy question in there then I'm afraid I've missed it. Mglovesfun (talk) 19:31, 29 September 2011 (UTC)Reply

I probably misunderstood. I interpreted Norwegian version as a translation of the phrase. Lmaltier 19:35, 29 September 2011 (UTC)Reply

Wiktionary does not allow translations of non-English language terms. Instead, you can add the Norwegian entry at koke suppe på en spiker and add a See also link at the Swedish page koka soppa på en spik. ---> Tooironic 23:00, 29 September 2011 (UTC)Reply

Good idea. Also, what if we use Category:English non-idiomatic translation targets? It may be another workaround for non-idiomatic translations, especially for English terms, which may not pass CFI. There are many foreign terms that are translated as single words where English uses two or more (e.g. fur coat) or idioms as above where there is no English equivalent. --Anatoli 00:22, 30 September 2011 (UTC)Reply

OK, the See also solution is probably the best for this situation where we have two phrases in non-English languages but no English equivalent. Later, when we see more instances of this happening, a more integrated solution will probably have to be devised. __meco 08:26, 30 September 2011 (UTC)Reply

Or instead, just create sv:koka soppa på en spik and add the Norwegian translation there. The target audience is only going to be Swedish or Norwegian speakers. --Rockpilot 10:03, 30 September 2011 (UTC)Reply

I don't see this connection as irrelevant for the English Wiktionary; do you really? __meco 20:21, 30 September 2011 (UTC)Reply

I do. Having said that, I'm sure we have an idiom in English that means the same as the Swedish koka soppa på en spik. --Rockpilot 12:45, 1 October 2011 (UTC)Reply

TheDaveBot wants to tidy up a bit.

I would like to use my bot account to run a new version of AutoFormat. It is entirely new, so new that it isn't finished yet. There are a couple of "modules" which are ready for testing. Since the task is already pretty well designed and approved of (this new script doesn't do anything AutoFormat wasn't doing) and the account is already a bot, I am just posting this to give people an opportunity to make suggestions or complaints or requests for more information or whatever. I have done a few on-wiki tests, none in NS:0 yet, but that will proceed within the next few days, I would imagine. Once I am reasonably comfortable with the test results I will go through the normal bot approval routine. The bot won't be going unsupervised before that time.

Now is also a good time for any other format related tasks to be brought up so they can be added, but unless they are non-controversial and conform to the ELE I probably won't add them. - [The]DaveRoss 21:21, 29 September 2011 (UTC)Reply

Thank you for offering to autoformat!! As for other things to do, well, [[user talk:AutoFormat]] is full of suggestions.—msh210℠ (talk) 17:41, 2 October 2011 (UTC)Reply

Support, I've been meaning to suggest that we need at least two auto-format bots running at a time as the workload is too much even for a bot! Mglovesfun (talk) 17:58, 2 October 2011 (UTC)Reply

Can the old account be transferred to Dave? It would be nice because everyone still calls it AutoFormat... —CodeCat 14:13, 3 October 2011 (UTC)Reply

I think it would be better just to continue calling the task AutoFormat but let the account names be whatever they are. Ideally this is something which would eventually migrate to the toolserver and not be run by an individual, but I don't have the skills required to make that a reality. - [The]DaveRoss 21:44, 3 October 2011 (UTC)Reply

Any valid autoformatting should be supported by all. I've missed it lately and have a bad feeling about the likely backlog. What is the entry-processing capacity of a fully functional AF-type bot, in entries per week? Does having it on the tool-server improve capacity or effectiveness? DCDuring TALK 23:09, 3 October 2011 (UTC)Reply

Putting it on the toolserver means that it will have a much higher uptime, and be maintainable by multiple people. As far as throughput goes, it is limited by the server more than the program. I think one entry per second would be easily achievable by a single instance of the bot running, and an arbitrary number of instances of the bot can be run simultaneously. The downside is that if there are 20 edits per second happening constantly the server load would be dramatically increased. - [The]DaveRoss 01:36, 4 October 2011 (UTC)Reply

If you post the code for at least the structure of the bot then some of us could contribute actual code rather than just things for you to do. --Bequw → τ 15:59, 7 November 2011 (UTC)Reply

I can post the code eventually, at the moment I am only able to work on it sporadically as I am pretty busy. If you are interested in helping out that is cool, you really don't need to know anything about the rest of the framework, other than it is written in Java. I have made it to be very modular, so that pieces can be turned on and off or added and removed without impacting anything else. Each "task" is passed a unicode string which represents the whole contents of the page and is expected to return a unicode string which represents the modified version of the page contents. The upshot of this is that anyone can write any task and it should play nicely with the rest. Some of the remaining tasks which I haven't touched yet are etymology, context tags, categorization, synonyms (and other relational sections) and anything related to foreign language terms (conj tables etc.) If you want any more clarification than that let me know. - [The]DaveRoss 01:02, 10 November 2011 (UTC)Reply
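
The "string in, string out" contract described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the bot's code has not been posted, so the names PageTask, TrimTrailingWhitespace, CollapseBlankLines and TaskPipeline are hypothetical, and the two sample tasks are invented stand-ins rather than actual AutoFormat rules. The only thing taken from the comment above is the design: each task receives the whole page contents as a string and returns the modified contents, so tasks can be added or removed without affecting each other.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract: one formatting task, whole page text in, whole page text out. */
interface PageTask {
    String apply(String pageText);
}

/** Hypothetical sample task: strips trailing spaces and tabs from every line. */
class TrimTrailingWhitespace implements PageTask {
    @Override
    public String apply(String pageText) {
        // (?m) makes $ match at the end of each line, not only at the end of the input.
        return pageText.replaceAll("(?m)[ \\t]+$", "");
    }
}

/** Hypothetical sample task: collapses runs of blank lines into a single blank line. */
class CollapseBlankLines implements PageTask {
    @Override
    public String apply(String pageText) {
        // Three or more consecutive newlines mean two or more blank lines in a row.
        return pageText.replaceAll("\\n{3,}", "\n\n");
    }
}

/** Runs the enabled tasks in order; each task sees the previous task's output. */
class TaskPipeline {
    private final List<PageTask> tasks = new ArrayList<>();

    /** Enables a task; leaving a task out is how it is "turned off". */
    TaskPipeline add(PageTask task) {
        tasks.add(task);
        return this;
    }

    String run(String pageText) {
        String current = pageText;
        for (PageTask task : tasks) {
            current = task.apply(current);
        }
        return current;
    }
}

/** Small demonstration of chaining two tasks over a made-up page. */
class AutoFormatSketch {
    public static void main(String[] args) {
        TaskPipeline pipeline = new TaskPipeline()
                .add(new TrimTrailingWhitespace())
                .add(new CollapseBlankLines());
        String page = "==English==  \n\n\n\n===Noun===\t";
        System.out.println(pipeline.run(page));
    }
}
```

Because every task only sees and returns a plain string, a contributor could write, say, an etymology or context-tag task without knowing anything about the rest of the framework. The one thing that still matters is ordering: in this sketch the whitespace trimming runs first so that lines containing only spaces count as blank when the second task collapses them.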

Template:etyl and Template:proto

{{etyl|ang}} (Old English) links to Wikipedia; {{proto|Germanic|qwerty}} links to Wiktionary. Why the discrepancy? This, that and the other (talk) 12:20, 30 September 2011 (UTC)Reply

Dunno really. It's not a problem per se. Mglovesfun (talk) 11:40, 1 October 2011 (UTC)Reply

halfpace

"A platform of a staircase where the stair turns back in exactly the reverse direction of the lower flight." This is an archaic word; what's the modern term for this? I've always, rather awkwardly, had to describe them as "U-turn staircases". Equinox ◑ 20:58, 30 September 2011 (UTC)Reply

I think they are called half landings, also this is the Beer Parlour not the Tea Room you silly person. - [The]DaveRoss 21:04, 30 September 2011 (UTC)Reply

Thanks. Also, I just found dogleg, which is a word I have wanted to know for years. (I always dream about that kind of staircase.) Equinox ◑ 21:11, 30 September 2011 (UTC)Reply

Upon further research, it seems like a "half landing" can be any landing which is not at the top or bottom of the stairs. To be precise many people are calling them "180 degree half landings" but that is not as succinct as it should be. - [The]DaveRoss 00:13, 1 October 2011 (UTC)Reply

I always just called it a landing. Ƿidsiþ 06:03, 1 October 2011 (UTC)Reply

In my house there is a square landing at the top of the stairs, then a 90 degree turn and a single step to the landing proper. We call it "the little landing at the top of the stairs" - probably not the technical terminology. SemperBlotto 07:00, 1 October 2011 (UTC)Reply

October 2011

Wiktionary:Votes/2011-09/Unified Tagalog

The discussion about Tagalog/Filipino quickly died out above. So I'm posting this link to avoid this vote being forgotten and made obsolete. Mglovesfun (talk) 11:38, 1 October 2011 (UTC)Reply

Romanizations of words in languages including Gothic

In light of the comments on Wiktionary:Votes/pl-2011-08/Romanization of languages in ancient scripts, I have created two new votes:

a vote on Gothic only: Wiktionary:Votes/pl-2011-10/Romanization of Gothic
a vote on a list of languages, including Gothic: Wiktionary:Votes/pl-2011-09/Romanization of languages in ancient scripts 2

Please give feedback before the votes start. Please vote after the votes start. - -sche (discuss) 01:36, 3 October 2011 (UTC)Reply

Where do I give feedback? Well, I'll say it here for now:

I think this is a horrible idea. I don't see why we should use another alphabet to write Gothic when there is a perfectly good one which was created specifically for Gothic. One of the reasons the Unicode Consortium adds characters from "unused" alphabets is so people like us can write words in ancient languages in their own script, instead of using a transliteration. Maybe you could add a heading for transliterated forms in ELE, and make the transliterated form redirect to the actual entry.

In the rationale it's written "Modern readers will most likely want to look up words in their Romanized form; these readers will not necessarily know or be able to input the words' original-script forms." To be honest, I think this applies more to modern languages such as Arabic or Russian. What kind of person would look for a word in Gothic? Someone interested in the Gothic language; such a person is quite likely to know the Gothic alphabet. But what kind of person would look for a word in Arabic or Russian? Could be anyone; could be some guy who heard the word on TV, saw it in a newspaper, or whatever. None of these are expected to know the Cyrillic or Arabic alphabets.

I'm not saying that you should add entries transliterated from Cyrillic or Arabic; I'm just trying to show that adding transliterated Gothic entries is a worse idea. Ungoliant MMDCCLXIV 15:50, 6 October 2011 (UTC)Reply

To be clear, we're not proposing to move the content to the romanisations: entries in the Gothic alphabet (like 𐌵𐌹𐌽𐍉) will still exist and define that Gothic word, but romanisations (like qino) will exist as soft-redirects, similar to pinyin and romaji entries. This seems to be almost or exactly what you're proposing regarding redirects. The two major reasons which our Gothic contributors have given for allowing romanisations are: that users might know the Gothic alphabet but be unable to type it, and that Gothic texts and secondary sources (dictionaries, etc) are often published in romanised form (and we should have entries for the forms as they are in fact published, which means: both in the Gothic alphabet and in romanised form). - -sche (discuss) 21:51, 6 October 2011 (UTC)Reply

Ok. I misunderstood that. Ungoliant MMDCCLXIV 23:09, 6 October 2011 (UTC)Reply

Actually, that's not why the Unicode Consortium added Gothic. For one, I think most Unicode members with an opinion would encourage the continued use of transliteration; see Don't Proliferate; Transliterate!. If you look at the historical record, approaching Unicode 3.1, Unicode had a problem. The concept of being 16-bit didn't last long; it was obvious that Unicode would need to expand, and there was a theoretical expansion area added in Unicode 2.0. But most programs only supported 16-bit Unicode, so characters that were added outside the base 16-bits wouldn't be accessible to many users; so nobody wanted their characters to be added outside that base 16-bits. But nobody had incentive to fix their programs until there were characters in that section. So they found some scripts that were completely useless, that were wanted by non-scholars because scripts are cool, and encoded them outside the 16-bit limits. Stuff like Old Italic, the Deseret alphabet and Gothic.--Prosfilaes 20:30, 7 October 2011 (UTC)Reply

"What is the scope of Unicode?

A: Unicode covers all the characters for all the writing systems of the world, modern and ancient."

http://www.unicode.org/faq/basic_q.html

These scripts aren't completely useless. Epigraphers, medievalists, classicists, and bible scholars find them important. Consider the Medieval Unicode Font Initiative (www.mufi.info): why would they recommend such characters if they didn't think they were useful?

Ungoliant MMDCCLXIV 23:15, 7 October 2011 (UTC)Reply

Merging Moldavian and Romanian

A couple of editors suggested above that Wiktionary discuss merging Moldavian and Romanian. Let's discuss! I favour merging the two. The issue seems quite like that of Serbo-Croatian: that is, the distinction is politically motivated. It is also similar in that Moldavian can be written in Cyrillic, whereas Romanian is not written in Cyrillic anymore: but we could handle that just like we handle Cyrillic/Latin Serbo-Croatian. A possible vote (not started!) is here. - -sche (discuss) 06:31, 3 October 2011 (UTC)Reply

Moldavian or Moldovan is essentially dead. I don't think anyone is trying to really revive it and we don't have active editors using it. Moldavian Wikipedia is locked. Romanian is written entirely in Roman script. Maybe Moldavian is worth keeping for historical reasons? There is still material written in Moldavian out there. It doesn't create any maintenance problems, like Serbo-Croatian, as far as I can tell. I don't have a strong opinion on this, though. --Anatoli 06:53, 3 October 2011 (UTC)Reply

Cyrillic could be treated as an alternative but obsolete script, like Arabic for Turkish. —CodeCat 10:26, 3 October 2011 (UTC)Reply

Obsolete? It is still used in the de-facto regime of Transnistria, I'd hardly call that "obsolete". -- Liliana • 18:17, 3 October 2011 (UTC)Reply

Right, I would keep (and add) Latin spelling entries and Cyrillic spelling entries, and just explain the use of Cyrillic (that it is no longer standard to write Romanian in Cyrillic in Romania, but that the language continues to be written and published in Cyrillic in the region of Transnistria) on WT:About Romanian. - -sche (discuss) 21:55, 3 October 2011 (UTC)Reply

Sounds OK to me. --Anatoli 10:47, 3 October 2011 (UTC)Reply

If Romanian templates are modified like Hindi/Urdu or Serbo-Croatian (Cyrillic/Roman) then we could always add the optional Cyrillic spelling flagged as "Moldavian spelling". Russian/Ukrainians from Transnistria have to use Russian and Romanian to communicate with Moldova. It's just my opinion, prove me wrong, if you disagree. --Anatoli 06:40, 4 October 2011 (UTC)Reply

I've asked Robbie SWE for input. :) - -sche (discuss) 23:44, 6 October 2011 (UTC)Reply

I'm kind of torn; the Cyrillic alphabet is a thing of the past for Romanian, most certainly not something most Romanians would want to promote today. The fact that Bogdan Stăncescu - the founder of the Romanian Wikipedia project - cancelled the Moldavian ISO 639 code (mo and mol) back in November of 2008 indicates that there is no substantial difference between Romanian and Moldavian. This initiative was welcomed by Marius Sala, vice-president of the Romanian Academy.

Personally, I don't think that the solution should be in the form of "Cyrillic/Latin Serbo-Croatian". I can't however provide a solution, but will follow this discussion and see how it evolves. --Robbie SWE 10:35, 7 October 2011 (UTC)Reply

Wiktionary including Cyrillic Romanian does not mean that we 'promote' it, as we only document. If indeed still used in Transnistria (as Liliana has pointed out), then we can't just tag them "obsolete", because we would then be misrepresenting things. However, we could tag them with both "archaic" and "Transnistrian", and problem solved. --JorisvS 12:18, 7 October 2011 (UTC)Reply

I'm not saying that including Cyrillic Romanian promotes a regression. It just makes things problematic; I mean where do we draw the line? Will we start including runes for old Swedish? From what I've heard (might be wrong; the socio-political distance between Sweden and Moldova is quite far), most inhabitants of Transnistria speak Russian and therefore use the Cyrillic alphabet. --Robbie SWE 12:33, 8 October 2011 (UTC)Reply

I also checked and it's being taught at schools in Cyrillic. I still don't see a problem in merging (despite being a native Russian). "Moldavian" as a name of the language is still used colloquially but "Romanian" is used increasingly in both Moldova and Pridnestrovye (Transnistria). There are no serious efforts to separate them again (unlike, say, Serbo-Croatian). Perhaps "Moldavian spelling" is better than "Cyrillic", e.g. "România f (genitive/dative României), Moldavian spelling: Ромыния" --Anatoli 12:49, 8 October 2011 (UTC)Reply

Of course we'd allow runic Old Swedish entries. Perhaps that's a bad analogy as it's a dead language. We have a couple of runic Old English entries. Mglovesfun (talk) 12:53, 8 October 2011 (UTC)Reply

Ok, I'm sorry for the bad analogy. I think that the use of "alternative/variant" might work, maybe worth giving it a try. I'm not sure though that we'll be doing the same thing in the Romanian Wiktionary project; the task seems too big for two active users. --Robbie SWE 18:19, 8 October 2011 (UTC)Reply

When reading w:Moldovan language, it's clear that this is a controversial issue, which would be a good reason to allow Moldovan as a separate language (even if the category is almost empty). This would be a good reason because we must be neutral about controversial issues, and because the definition of language may involve political issues as well as linguistic issues. Also remember that dead languages are allowed here, even when nobody is able to contribute to them and their categories are empty. However, as the use of the ISO code for Moldovan is now discouraged, I don't know. Lmaltier 19:03, 8 October 2011 (UTC)Reply

How is Serbo-Croatian any more neutral than this? -- Liliana • 19:24, 8 October 2011 (UTC)Reply

@Lmaltier: Separating Moldav(i)an and Romanian is as neutral or non-neutral an approach as unifying the two. If those who consider there to be two languages would find it controversial if we unified them into one language, those who consider there to be one language will find it controversial if we separate it into two languages.

@Anyone who knows: do words have the same Cyrillic spellings in Transnistria today that they had in Romania in the past, when Cyrillic was used there? - -sche (discuss) 01:48, 9 October 2011 (UTC)Reply

That's where the problem arises: the Wikipedia article states "Its structure is based on the Russian Cyrillic alphabet (excluding three Russian letters and adding another), and does not have a direct resemblance to the historical Romanian Cyrillic alphabet used from the Middle Ages until the second half of the 19th century in the Principalities of Vallachia and Moldavia[1] and until 1932 in the Soviet Union." We're basically talking about two different interpretations of the Cyrillic alphabet. --Robbie SWE 13:09, 9 October 2011 (UTC)Reply

So we should have three entries for the same word: one using a Roman script, and two different ones using Cyrillic scripts, one of which is tagged as "obsolete" and the other as "Transnistrian", right? Of course all in accordance with WT:CFI. I wouldn't have any problem with that. --JorisvS 14:18, 12 October 2011 (UTC)Reply

Right, ro.Wikt may wish to wait and not add Cyrillic entries at this time because they do not have enough users to manage such an addition, but we already have entries (in Category:Moldavian language) and presumably enough users to manage them. Robbie and JorisvS, please take a look at Wiktionary:Votes/2011-10/Unified Romanian and see if anything needs to be changed. - -sche (discuss) 00:37, 13 October 2011 (UTC)Reply

Maybe mention that there can be two different Cyrillic spellings that will be allowed, one archaic, one Transnistrian? Or is that superfluous? --JorisvS 09:00, 13 October 2011 (UTC)Reply

Loss of usage-context categories

At one time, before the "reform" of our category system, we had categories that indicated the usage context of many specialist terms. We now have topical categories instead. I propose that we need to reinstate the usage context categories.

Topical categories for a specialist field are intended to include senses of all terms that relate to the topic in question. Context tags (of the occupational type) are intended to indicate that a given term is likely to be understood only by those with a specialized knowledge in the area.

I think that all terms in a specialist context connected with a given topic should be member of the topic category, but that not all terms in a given topical area should bear the context tag and be in the usage context. DCDuring TALK 11:45, 3 October 2011 (UTC)Reply

What reform of the category system are you referring to, and at what timepoint is it supposed to have occurred? From what I recall from 2006, the category for, say, physics was always a topical one. We have many usage context categories; what we do not have are usage context categories for the likes of physics, chemistry, mathematics, etc. A usage context category for, say, mathematics cannot be reinstated, as it never was there in the first place; rather, it can be newly introduced, such as "Category:English terms only used in the context of mathematics", or "Category:English terms restricted to mathematics" or the like. --Dan Polansky 11:57, 3 October 2011 (UTC)Reply

I had always interpreted Category:Physics as a usage context, defined by a usage context label which was not applied to terms that were not so limited. I had no interest in topical categories and have little interest now at Wiktionary, as I find such categorization information at Wikipedia when I need it.

What I perceive as a reform was probably the unintended result of the actions of those who did/do not perceive there to be a worthwhile distinction between topical categories and usage contexts. The use of context tags to populate the topical categories without also creating appropriate usage-context categories is my evidence of the lack of sensitivity to the distinction. DCDuring TALK 12:37, 3 October 2011 (UTC)Reply

I think DCDuring has either slightly misunderstood or is being sarcastic (the latter, I think). Labels like {{physics}} are allowed, just they should serve as true contexts and not just a shortcut for convenience. So boson legitimately uses physics, but it would be silly to use it for entries like solid, light, liquid, gas and so on where they are clearly not only or chiefly used in the field of physics. Ditto England shouldn't have a {{geography}} tag. Mglovesfun (talk) 12:41, 3 October 2011 (UTC)Reply

However, solid could be in a physics category (a category of physics-related terms), without having any sense-line tag. - -sche (discuss) 13:05, 3 October 2011 (UTC)Reply

Just as we have regional and register context tags that populate usage categories, IMO, we should also have usage categories that reflect occupational usage contexts. Maintaining a consistent distinction between topical and usage categories, while, of course, recognizing the distinction, would be quite worthwhile. For example, terms that are in a given topical category can have some senses (Type I) that are not in any topical category, some senses (Type II) that are in the topical category but properly understood outside any specialist context, and some topical senses (Type III) that are properly understood only by cognoscenti and belong in a usage context category. (The last sometimes verge on being prescriptive, but, to be included here, must show evidence of use by multiple authors.) As categories themselves are limited in usefulness for a dictionary because they do not specify a specific Etymology or PoS, let alone sense, it will always be tempting for well-intentioned contributors to apply usage-context labels to senses of Type II.

To be clear: Entries with senses of Types II and/or III should be in topical categories if folks want to maintain such things. Entries of Type III should certainly be in a usage-context category if we aspire to be useful as a dictionary.

I doubt that it makes sense at this time to have some sense labeling to indicate which sense it is that qualifies a term to be in a given topical category, though such labeling would discourage misuse of context tags that have associated topical categories. DCDuring TALK 13:42, 3 October 2011 (UTC)Reply

Like how {{slang}} denotes a sense that is likely to be understood only by those with specialized knowledge of slang? —Ruakh_TALK 13:20, 3 October 2011 (UTC)Reply

Sorry, I should have also said what -sche said, that in cases where a term is used in a context but not specialized (that is, the term retains its general-use meaning) a written category could/should be added at the bottom. So rather than tagging foul with every sport that has the concept of fouls, add the categories at the bottom by hand. I'm not sure why some users are so reluctant to add categories at the bottom, it's not particularly difficult to do. Mglovesfun (talk) 13:49, 3 October 2011 (UTC)Reply

I thought it obvious that we have different types of usage contexts (which actually reflect reality). We have register (informal, formal) and regional. We have some contexts which indicate offensiveness and we have some that indicate media-related restrictions (colloquial, IM/internet). There may be some types missing and there are other useful ways of classifying usage contexts. Occupational contexts are another type of usage context. They are a superior approach IMHO to marking some terms as "jargon" and hoping that a user could figure out from topical categorizations the specific places in which a given term could be expected to be understood when used. DCDuring TALK 13:57, 3 October 2011 (UTC)Reply

It is indeed obvious that we have different types of usage contexts, but I don't think any of them indicate — nor should indicate — who is likely to understand a given term. Rather, they should indicate the context in which a term is used. Hence the term "usage contexts". ;-) If "solid" is only used in physics contexts, or has a specific sense when used in a physics context, then it doesn't really matter that it's a term everyone knows. —Ruakh_TALK 14:17, 3 October 2011 (UTC)Reply

I've been thinking along these lines myself. I think it may be worthwhile. I'm really not sure, though: after all, the benefit of a jargon dictionary is that it provides all the terminology for a field, and (e.g.) solid is terminology in physics, even if it's also used by others. But if this is something we want to do, then a good way to proceed might be as follows: Keep [[Category:en:Physics]] as a topical category and restore [[Category:en:Jargon:Physics]] (or perhaps [[Category:English jargon:Physics]] or even [[Category:Jargon:en:Physics]]) as a term-of-art category.—msh210℠ (talk) 16:06, 3 October 2011 (UTC)Reply

I like this idea. - -sche (discuss) 20:28, 3 October 2011 (UTC)Reply

I strongly object to the use of "jargon" in any category name or usage-context label. Whatever it may mean in a linguistics context (!), a few of the common senses are definitely pejorative. We have enough difficulty trying to prevent contributors (not just newbies, either) from being prescriptive without providing such encouragement. Even our inadequate entry has one of the pejorative senses, though it is not so labeled. AFAICR that is why {{jargon}} was deleted. DCDuring TALK 23:02, 3 October 2011 (UTC)Reply

Here's how to use written categories instead of contexts: diff. Mglovesfun (talk) 12:18, 9 October 2011 (UTC)Reply

The only trouble is that {{sports}} puts the entry in a topic category, not a context category.

Also, there is no reason for the context to be "sports" in general rather than the specific sports in which this is understood.

The usage contexts are one set of categories, which are linguistic. I think they are relatively well defined. The topical categorization is not well-defined, except as it is derived from the usage categorization. For example, bending brake could be in topical categories "Tools", "Metalworking", even "Roofing". I'm not sure about the range of usage contexts, but I doubt that it is in the vocabulary of most of the general population. "Metalworking" and "Roofing" would seem to be possible usage contexts, but not "Tools". DCDuring TALK 02:11, 11 October 2011 (UTC)Reply

英國, Nei Mongol

Why are they locked? Engirst 13:00, 3 October 2011 (UTC)Reply

Dubious; it is normal to lock a page if there's an edit war, but it becomes ethically dubious when one of the contributors in the edit war protects a page with their version instated. It's always better for someone outside of the conflict to lock such a page. FWIW the edit to Template:Hani looks valid to me; the fact the citation contains some Latin script characters doesn't make it invalid. I'll accept that Engirst is doing it to prove a point that Latin terms are used in Mandarin as 'borrowings' but that doesn't invalidate the citation. FWIW the Middle French version of 'Le Tiers Livre' I read contains some Greek citations in Greek characters, but I'd like to think that doesn't make it invalid as a source for Middle French. Mglovesfun (talk) 13:54, 3 October 2011 (UTC)Reply

Engirst was edit-warring on Thames河 to prove his point. He never writes anything in Mandarin except when he needs to troll his ideas. I removed his edit, which is 1) not synchronised with the simplified version, 2) doesn't provide pinyin and translation into English - it's a long-time convention Engirst has been violating. And most importantly 3) pushes his English words in Mandarin before the decision is made about the usage of English words in Mandarin. All his edits are generally considered bad by Mandarin contributors. This is not a personal attack. He has been banned for his behaviour (i.e. the attempt was made to ban him multiple times). I will remove the block on 英國／英国 (Yīngguó) when the decision is made about the usage of foreign proper nouns in Mandarin. It's a concern that a person who worsens the quality of our Mandarin entries continues editing. --Anatoli 21:30, 3 October 2011 (UTC)Reply

英國/英国. I've synchronised the entries, added formatting, pinyin and translation, removed example with a very uncommon English name in the Mandarin sentence. Will have to lock the entries if edit warring starts. --Anatoli 22:22, 3 October 2011 (UTC)Reply

Company names

I feel there should be a vote on confirming the Company names section of WT:CFI. As it is, too many people disagree with it, and it clearly doesn't constitute consensus anymore. -- Liliana • 18:19, 3 October 2011 (UTC)Reply

There should better be a vote on removing the section "Company names" from CFI. See also Wiktionary:Beer_parlour_archive/2011/April#Poll:_Including_company_names. Rather than not being supported by consensus any more, the section never was supported by consensus in the first place, I figure. --Dan Polansky 18:36, 3 October 2011 (UTC)Reply

The straw poll you (Dan) link to is interesting. Five of its twelve respondents opined that "No company should have a dedicated sense line in any entry", which is at least as restrictive as our CFI and possibly (depending on how you read our current CFI) much more restrictive. The other seven opined that "Some companies should have dedicated sense lines in some entries", which is possibly (depending on how you read our current CFI and depending also on what those respondents would include in their "some") the same as our current CFI, though possibly more or less restrictive than our current CFI. So while a vote (fairly composed) might lead to some change, I suspect it will not: I suspect that the current CFI are a good compromise in this regard, where there is no consensus.—msh210℠ (talk) 18:45, 3 October 2011 (UTC)Reply

The current CFI on company names, above all, is unsupported by consensus. CFI should not contain an unsupported compromise between two positions; if no consensus has been reached, CFI should state only so much. And if there is no consensus on specific rules for inclusion of company names, then the regulation of company names can be left to the section for names of specific entities, which is achieved by removing the section WT:CFI#Company names, and by removing the second bullet item from the section WT:CFI#Names of specific entities. While the statement "Some companies should have dedicated sense lines in some entries" does not contradict current WT:CFI#Company names, I find it likely that those who support the statement would like to see WT:CFI#Company names removed as unclear and too restrictive. The critical part of WT:CFI#Company names to be removed is this: "To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested." --Dan Polansky 19:05, 3 October 2011 (UTC)Reply

Requesting short-term block for Special:Contributions/90.205.76.53

As can be seen at http://en.wiktionary.org/w/index.php?title=%E9%AC%BC&action=history, among other places, this user is becoming a persistent nuisance. Re-reverting registered editor fixes multiple times should be grounds for a short-term block, no? -- Annoyed, Eiríkr Útlendi | Tala við mig 22:05, 3 October 2011 (UTC)

Let's try a bit harder to reach out to this person before blocking, they do seem to be editing in good faith. - [The]DaveRoss 22:23, 3 October 2011 (UTC)Reply

Is anyone but a checkuser likely to succeed at communicating to an anon? DCDuring TALK 23:17, 3 October 2011 (UTC)Reply

I am not sure what being a checkuser might do to increase success, everyone can see what the IP address of an anonymous editor is. - [The]DaveRoss 23:20, 3 October 2011 (UTC)Reply

D'oh. DCDuring TALK 00:57, 4 October 2011 (UTC)Reply

For future reference, the place for this is [[WT:VIP]].—msh210℠ (talk) 23:24, 3 October 2011 (UTC)Reply

Thanks msh210, I'd posted there in August about a different IP address (that seems to be the same user) and got no response, so I thought I'd try posting here instead. -- Eiríkr Útlendi | Tala við mig 23:31, 3 October 2011 (UTC)Reply

:-) Good point.—msh210℠ (talk) 00:20, 4 October 2011 (UTC)Reply

What do we do with people not acting in good faith but capable of avoiding all administrator blocks like Engirst and his many-many aka's? Maintaining and fixing his entries is time-consuming and unproductive. His whole activity is about proving his points, which is otherwise called trolling. A rhetorical question, perhaps, his activity and entries have been discussed many times. --Anatoli 01:24, 4 October 2011 (UTC)Reply

There are fancy blocks for people who change IPs frequently. Other than overt vandalism I can only think of two times we have bothered making that effort and both times there was strong community support for banning a particular person outright. - [The]DaveRoss 01:28, 4 October 2011 (UTC)Reply

Well, to me Engirst (+ many aka's and anons) is such a case where a sophisticated block might be in order or long overdue. It was discussed too but I think the attempt to do it failed. He is just wasting a lot of time "promoting Mandarin written in Roman letters". First pinyin: full of errors, inconsistent and out of sync with both traditional and simplified entries; now English proper names used in Mandarin in Roman letters. I agree with people saying he's clearly got some agenda (pinyinisation, converting Mandarin to the Latin alphabet?). --Anatoli 01:44, 4 October 2011 (UTC)

As for Japanese at least he doesn't have the adequate knowledge to make useful contributions in good faith in the first place even if he wanted to, which he does not seem to. Haplogy 02:10, 4 October 2011 (UTC)Reply

@Haplogy: Just to be clear, do you mean User:Engirst, or User talk:90.205.76.53? -- Eiríkr Útlendi | Tala við mig 05:24, 4 October 2011 (UTC)Reply

I mean the IP user. Sorry I should have been more specific. --Haplogy

Based on the conversations people have had with him on wiki I think it is clear that this is a young person who has recently become interested in Japanese. While I agree that language learners are not the most useful editors to have on the project there is certainly merit to having them. If there is any way to channel this person's energy into more useful edits we should try that rather than putting more effort into blocking someone who is probably just trying to figure things out. - [The]DaveRoss 10:48, 4 October 2011 (UTC)Reply

I generally agree with Dave here. What got up my nose about this particular IP user was their insistence on reverting my fixes, multiple times, in the same entries. Figuring things out is one thing; being a persistent nuisance is another. -- Cheers, Eiríkr Útlendi | Tala við mig 16:16, 4 October 2011 (UTC)Reply

There is always that question, whether something is willful disregard or simply confusion or ignorance. Depending on how much experience someone has with a wiki community they may not know that reverting multiple times is taboo, or even really understand why their edits are getting undone. - [The]DaveRoss 19:24, 4 October 2011 (UTC)Reply

True enough. However, when the IP user's own edit summary is "Undo revision ...", it starts to look a lot like they're aware of the editing and history features, and are choosing to ignore other edit summaries. This is just my own perspective, which calls into doubt the user's good faith - I'd be happy to be proven otherwise. :-/ -- Eiríkr Útlendi | Tala við mig 20:54, 4 October 2011 (UTC)Reply

Gheg Albanian

We have a category for Gheg Albanian, and we have ten entries with Gheg Albanian as an L2 header. We also have numerous entries which handle Gheg Albanian like this/this (with context tags). Should we convert the ten Gheg entries to use an ==Albanian== header and a (Gheg) context tag, or should we move the Gheg information out of the (standard) Albanian sections like this?

The former is preferable for me. BTW, the second example seems to be missing some important categories - parts of speech. There should not be many under Gheg Albanian header. --Anatoli 03:31, 4 October 2011 (UTC)Reply

How different are the two variants, anyway? -- Liliana • 05:12, 4 October 2011 (UTC)Reply

We don't have skilled people here but Tosk Albanian is the standard and most common, most entries/translations are in Tosk. Albanian Wikipedia and Wiktionary are not separated into Tosk and Gheg, perhaps we shouldn't separate either, like we don't separate Belarusian. --Anatoli 06:36, 4 October 2011 (UTC)Reply

The two are sufficiently different to be mutually unintelligible, and so can be considered distinct languages. The old-Tosk derived Arbëreshë and Arvanitika are also unintelligible with Standard Albanian (Tosk), even their "dialects" are perceived as unintelligible by their speakers. --JorisvS 11:44, 4 October 2011 (UTC)Reply

So, should we split the Gheg and Tosk entries? - -sche (discuss) 07:43, 7 October 2011 (UTC)Reply

I'd say, therefore, yes. --JorisvS 10:37, 7 October 2011 (UTC)Reply

Alright, that sounds reasonable to me, as they do have separate ISO and Wikt codes, and it was User:Dick Laurent (who speaks at least some Albanian, sq-1) who created some of the ==Gheg Albanian== entries. I'll start splitting. Less than 100 words will be affected (when split, less than 200). - -sche (discuss) 18:45, 7 October 2011 (UTC)Reply

I think we should nest Albanian in translations tables (like this). - -sche (discuss) 18:52, 7 October 2011 (UTC)Reply

Sure, why not? Though Tosk should possibly be the default, ~~ala water~~. -- Liliana • 19:26, 8 October 2011 (UTC) Addendum: oh, I see someone changed that too, hmm.

We also have a good number of Albanian entries with a Gheg "pronunciation". I suspect, however, that the orthography is actually just Tosk and that it should be different when properly written in Gheg (as opposed to Tosk written by Gheg speakers). While we could add entries by using the key at Wikipedia's Gheg Albanian article, I don't know how reliable the IPA used in these entries is, nor whether the result would be how the words are actually written. --JorisvS 14:06, 12 October 2011 (UTC)Reply

(Hm, this discussion hasn't had all that much participation...) Any objections to changing the translation adder to nest aln (Gheg) as Albanian/Gheg and sq as Albanian/Tosk? --Yair rand 15:32, 25 October 2011 (UTC)Reply

No objections; I think it should nest them. - -sche (discuss) 20:33, 25 October 2011 (UTC)Reply

Done. --Yair rand 17:54, 27 October 2011 (UTC)Reply

(Pointing out that as of two months ago there are 2187 "Albanian" translations. That's a lot of edits if this is going to be standardized.) --Yair rand 17:59, 27 October 2011 (UTC)Reply

Maybe something a bot could do? --JorisvS 20:21, 27 October 2011 (UTC)Reply

Could you also change the translation adder to nest Arbëresh (aae) and Arvanitika (aat) the same way it now nests Tosk and Gheg? --JorisvS 16:23, 2 November 2011 (UTC)Reply

Done. --Yair rand 16:30, 2 November 2011 (UTC)Reply

Now that Gheg is separated from Tosk, I've begun wondering about the clarity of our ==Albanian== specifically for Tosk (Albanian) to our users. Thoughts? --JorisvS 16:23, 2 November 2011 (UTC)Reply

You could compare this to {{de}} "German", {{gsw}} "Alemannic German", {{nds}} "Low German". I don't think clarification is needed, since, similar to the German case, Tosk is the dominating variety. -- Liliana • 16:45, 2 November 2011 (UTC)Reply

So is this completely incorrect? --Yair rand 16:51, 2 November 2011 (UTC)Reply

Splitting Gheg off from Standard/Tosk is foolish. — [Ric Laurent] — 19:38, 2 November 2011 (UTC)Reply

Do you also have a reason for your opinion? --JorisvS 20:27, 2 November 2011 (UTC)Reply

Separating Gheg and Tosk and the few other minority dialects makes as much sense as splitting American/Scottish/British/Australian English. The differences that cause problems in mutual intelligibility are like flashlight and torch in English. There might be bumps, but they are NOT different languages. This is a fact and that's all I have to say on it. If you all want to do something completely retarded, go for it. — [Ric Laurent] — 22:35, 2 November 2011 (UTC)Reply

No, all sources I've seen addressing mutual intelligibility say that this is limited between the varieties of Albanian, which means that they are different languages by this criterion. So, the comparison with the Englishes is wrong. It is more appropriate to compare the situation to German, where we have (at least) Alemannic, Austro-Bavarian, and Low alongside the Standard, all with limited intelligibility with each other. --JorisvS 23:01, 2 November 2011 (UTC)Reply

When one adds an Albanian translation, it starts nesting (it didn't nest before):

* Albanian:
*: Tosk: {{t|sq|WORD}}

I don't know if it's a good idea. Tosk Albanian is standard, if anything Gheg could be marked using {{qualifier}} or nested. Can we nest only non-standard versions or just mark them? Tosk Albanian shouldn't be nested, just like Norwegian Bokmål, IMHO.

Can we change it back to

* Albanian: {{t|sq|WORD}}

If anyone wants nesting, here are the codes for Albanian varieties:

aln – Gheg
aae – Arbëreshë
aat – Arvanitika

--Anatoli 22:46, 2 November 2011 (UTC)Reply
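To make the two layouts being compared here concrete, the following is a minimal Python sketch. It is not the translation adder's actual code (that gadget is not shown in this thread); the function name `translation_lines`, the `nest_standard` flag and the `NESTED_VARIETIES` table are hypothetical names made up for illustration. Only the language codes and the wikitext line shapes come from the comments above.

```python
# Hypothetical sketch: these names are illustrative and are NOT taken from
# the real translation adder. Codes and line shapes come from the thread.
NESTED_VARIETIES = {
    "aln": "Gheg",
    "aae": "Arbëreshë",
    "aat": "Arvanitika",
}


def translation_lines(translations, nest_standard=False):
    """Build wikitext lines for the Albanian part of a translation table.

    translations maps a language code ("sq", "aln", ...) to a word.
    nest_standard=True gives the nested "Tosk" layout; False keeps the
    plain "* Albanian: {{t|sq|WORD}}" layout and nests only the varieties.
    """
    lines = []
    standard = translations.get("sq")
    if standard is not None and not nest_standard:
        lines.append("* Albanian: {{t|sq|%s}}" % standard)
    else:
        lines.append("* Albanian:")
        if standard is not None:
            lines.append("*: Tosk: {{t|sq|%s}}" % standard)
    for code, name in NESTED_VARIETIES.items():
        if code in translations:
            lines.append("*: %s: {{t|%s|%s}}" % (name, code, translations[code]))
    return lines
```

Calling `translation_lines({"sq": "WORD"})` yields the un-nested line requested above, while `translation_lines({"sq": "WORD"}, nest_standard=True)` yields the two-line nested form that the adder produced at the time of this comment.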

As per Liliana, Tosk is the dominating version, like standard German, so please change back. --Anatoli 22:48, 2 November 2011 (UTC)Reply

What does it mean to be the "dominating" version? --Yair rand 22:50, 2 November 2011 (UTC)Reply

Standard, most common, most likely to have texts to be written in, more useful for users learning the language or wanting to find translations. Albanian dictionaries are also in Tosk. Tosk Albanian or standard Albanian is used in Kosovo as well, although the native dialect is Gheg. --Anatoli 22:55, 2 November 2011 (UTC)Reply

Hm. The link I posted above in response to Liliana's comment is to a WolframAlpha query of Tosk and Gheg, and it says that Gheg has about a million more native speakers than Tosk, and about a million more total speakers than Tosk. I don't know how reliable those statistics are, or whether they're contradictory to Tosk being "dominating", since the meaning isn't very clear. (Incidentally, it also lists translations of the numbers one to ten for both languages, and not a single one of them is the same in both Tosk and Gheg.) --Yair rand 23:03, 2 November 2011 (UTC)

I want to point out that Tosk is not synonymous with standard Albanian, but standard Albanian is based on the Tosk dialect. I'm not really familiar at all with Arvanitika or Arberashja, so I won't comment on whether they should nest in translation tables, but if Tosk and Gheg alternatives exist, those varieties can follow whatever is used in standard Albanian with {{qualifier}}. L2 for Gheg and Tosk should both be ==Albanian==.

Like Anatoli suggested, people from Tirana and Pristina will understand each other fine. Maybe not perfectly, but certainly as well as someone from Valley Forge and Edinburgh. — [Ric Laurent] — 23:01, 2 November 2011 (UTC)Reply

I'm not aware of Anatoli having said that. Because Gheg speakers also come into contact with the Standard (Tosk), they learn to understand it. This is passive bilingualism, not mutual intelligibility. (Cf. in former Czechoslovakia speakers of Czech and Slovak could communicate with each other in their own native language, not because Czech and Slovak are mutually intelligible, but because each was a passive speaker of the other. The young generation of today has not (passively) learned the other's language and they have trouble talking to each other). --JorisvS 23:15, 2 November 2011 (UTC)Reply

You aren't aware of Anatoli saying that because he didn't - he was talking about standard Albanian and the Gheg they speak in Kosovo. I used the city names.

I study standard Albanian and Tosk. I used to talk to a young guy from Kosovo and we understood each other perfectly fine. You can resist this simple fact all you want, but the fact remains that Gheg and Tosk are no more different languages than what they speak in Texas vs what they speak in Dublin. There are certainly dialectal differences, and understanding may come with a bit of strain at places, but the existence of a Gheg incubator does not make Gheg its own language. Ask people who speak Gheg what they speak, and they say Albanian. That's not some wild coincidence. Most words are the same or identical, inflection is identical. Anyone with even the most basic knowledge of Albanian like myself can read that Gheg article about Albania you linked to and understand it with little trouble at all, even if they aren't particularly familiar with the quirks of Gheg beforehand. — [Ric Laurent] — 23:40, 2 November 2011 (UTC)Reply

Asking people what they speak doesn't mean anything. Ask a Croat and he'll say "Croatian", even though it is Serbo-Croatian he speaks. Q: How deep was this conversation? --JorisvS 23:54, 2 November 2011 (UTC)Reply

We sent messages back and forth for at least two weeks. We got pretty involved and talked about a bit of a range of subjects. — [Ric Laurent] — 00:13, 3 November 2011 (UTC)Reply

As a matter of fact, it was so easy to understand him, I didn't even realize that it was Gheg he was speaking until long after we met. He used words like "asht" and used n's instead of r's in a lot of places, which I thought was odd or maybe just typos. But no, they were just regional spellings - because that's what Gheg is: a regional dialect. Just like they have in Texas or Georgia or Ireland or New Zealand. — [Ric Laurent] — 00:19, 3 November 2011 (UTC)

I understand that it was the written language you're referring to, not the spoken one. Written languages are always easier to understand than spoken ones, and closely related languages may be completely intelligible to each other's native speakers when written. E.g. as a native Dutch speaker, one can easily read the Afrikaans WP, yet understanding the spoken language is much harder. --JorisvS 14:29, 4 November 2011 (UTC)

This is a digression, but mutual intelligibility is a tricky thing. Languages can be extremely close but still hard to understand without some exposure. What makes languages mutually intelligible is understanding the pronunciation, knowing how sounds change. Even very short exposure to a similar language can open these secrets. Czech and Slovak, like Russian/Ukrainian/Belarusian, are extremely mutually intelligible; Slavic languages have up to 60% of common vocabulary, and up to 80% or more in closer languages, but pronunciation and other factors confuse people who have never heard the language. Of course, nationalists will disagree and will highlight differences, rather than making some effort to understand. --Anatoli 23:28, 2 November 2011 (UTC)

It's not a digression, it's the core of the issue. Yes, mutual intelligibility is tricky. The percentages you mention are actually easily enough to prevent varieties from being intelligible (though usually sufficient for partial intelligibility, which may still remain for far lower percentages), and thus being distinct languages. And no, I don't need to be reminded of those Croatian nationalists who desperately tried highlighting differences, however superficial, to "prove" that Croatian is a different language from Serbian. --JorisvS 23:54, 2 November 2011 (UTC)

I'm not even trying to prove that Slavic languages are distinct languages (not talking about the Serbo-Croatian controversy). Partial but very high intelligibility and learnability is what characterises Slavic languages, which are far more mutually intelligible than Semitic, Romance or Germanic languages. A long time ago I went to Poland with a Russian friend (not a linguistic type) who had 0% Polish. Within 4 weeks he could communicate on a passable level and understood a lot once he learned to transform the Polish phonology in his mind into something closer to his own and learned a few of the most common linking words. I have proof that Russians can learn Polish in a month (without actually studying, just talking and listening), and I don't think Polish is more complicated for a Russian speaker than Slovenian (the most distant). Back to the topic: Albanian shouldn't be split. Albanians don't do it, so why should we? --Anatoli 00:18, 3 November 2011 (UTC)

I've studied standard Albanian and understand Gheg just fine. Not only are the vast majority of words the same or very similar (like sodomize vs sodomise) but inflection is identical. They're not two languages, Tosk and Gheg. This is a simple fact to which one person around here seems to be highly resistant. — [Ric Laurent] — 23:40, 2 November 2011 (UTC)Reply

Tell me, then why do the sources I've seen speak of limited intelligibility? --JorisvS 23:54, 2 November 2011 (UTC)Reply

I don't know, maybe because they're retarded dicks. Like I said earlier and have said numerous times: Even for someone with a basic knowledge of standard Albanian, Gheg is not difficult to understand. Most words are the same, as are inflections. Even skimming the Shqipnia article, I understood it fine. If you don't want to trust the only person in this conversation who has actually studied Albanian fine like I said, if you want to do something utterly retarded, it's on you. If you want to take a month to study basic Albanian grammar based on the templates I made for wiktionary, you'll be able to understand that incubator entry with little difficulty.

I'd apologize for my hostility, but this is as stupid as trying to keep Serbo-Croatian separate. — [Ric Laurent] — 00:13, 3 November 2011 (UTC)Reply

(Clueless person speaking: ) Despite the fact that the only person here who speaks the language is saying that they're not separate languages, I'm having a hard time understanding how this could be the case. w:Gheg_Albanian#Differences_between_dialects lists a whole bunch of really common words that are different between dialects/languages: "to be", "I do", "I can", "is" (If it's only those specific tenses then it makes more sense, but...), and it seems that none of the lowish numbers are the same either? Ric, do you think you could give a rough estimate of what percent of words are the same spelling in Gheg and Tosk? --Yair rand 00:43, 3 November 2011 (UTC)Reply

That's really difficult to say for a couple of reasons. For one thing, Albanian has a really rich system of derivation. Related adjectives, gerunds and verbs are very easy to make, and they easily compound the differences. Albanian makes sort of a dialect continuum so basic words will vary a lot from place to place. Also, the spellings are likely to vary quite a bit because Albanian is a million times more phonetic than English. As for the list on that page, I've seen it and I'm hesitant to trust it completely. I have a feeling it may have been made from words that aren't exactly "standard" in their dialects, but from subdialects - especially since they aren't using the citation forms. Half the verbs listed are participles, like qenë/qënë/kjenë, punuar/punue (never seen that participle in Gheg, I'm almost sure that's a subregionalism).

One of the most common differences between Tosk and Gheg (which you can see in the name of Albania itself) is the r/n variation. Shqipëria vs Shqipnia. There's other stuff like that in the list, like pjekuri/pjekuni, dhelpër/dhelpën Oh also the ë, Tosk tends to keep it even when it's silent in a word. Some Gheg-speakers keep it, some don't.

All that aside, when you're only focusing on the differences it's easy to think they might be different languages. There are some really distinct differences in regional Serbo-Croatian, but they're still one language. For god's sake, they use different names of the months in different places.

I maintain that the differences in the various dialects are no more serious than the dialects of English - they're just spoken in a much more narrow area. — [Ric Laurent] — 01:13, 3 November 2011 (UTC)Reply

A whole bunch is easy to find when one is looking. Even in this list there are obvious similarities. If we base our understanding on how American and British are different (looking at a vocabulary list), rather than how similar they are, then we may think that they are different languages. The list of similarities would not be practical, as it would be most of the vocab. --Anatoli 00:54, 3 November 2011 (UTC)

I wouldn't consider it significant evidence if it were just random words like "desk" or "drive" (well those might be bad examples, but you know what I mean), but if, say, American English had different words for "in", "be", "it", "the", "do", "can", "because", "like", "what", "have", "good", etc., then it would be a different situation. --Yair rand 01:08, 3 November 2011 (UTC)Reply

The examples in the Gheg Albanian link don't show the most common words; besides, there's synonymy and word choice. The opinion of a native speaker or someone speaking Albanian would be more appropriate for making a good judgement. When I look at the Swadesh list of Slavic languages, in many cases it's not that I don't understand e.g. some Macedonian or Czech words; in Russian we may use a different root, or a word can be obsolete or less common. The common practice could be slightly different, but the same thing could be expressed in various ways. I'll give an example to demonstrate what I mean. If you look at translations for I have a question without some knowledge of Slavic languages, you may wonder why the Polish/Czech/Slovak phrases are so radically different from the Russian/Ukrainian/Belarusian ones, but Slavic people won't have trouble understanding them, because we can rephrase the Russian/Ukrainian/Belarusian in a way that is grammatically close to Polish/Czech/Slovak, though less common. The spelling and variations in pronunciation are not unique to Albanian; for someone not knowing English, thru/through, color/colour, "do you have"/"have you got" could look like drastic differences. --Anatoli 02:44, 3 November 2011 (UTC)

In my ~5 years here I've only known two Albanian speakers. Zeke and Nemzag, though I'd say Nemzag is obsessed with obscure fringe nonsense and his own research and assumptions, which I'd categorize as completely unreliable. Haven't seen Zeke in ages, though she's still in my G-mail contacts. But to Anatoli's point about grammar and vocabulary choice, I would say the differences between Gheg and Tosk are closer to English variants than Slavic languages. Inflection is, if not completely identical, nearly identical in my experience. The main differences come, as in English and Serbo-Croatian, in the form of the phrasal formations. One of the examples in Yair's link was apparently the infinitive construction, which in Tosk and standard Albanian I know for a fact is "për të" plus the participle, whereas the table in the link (I can't personally verify this, having never seen it) suggests that it's "me" plus the participle, which appears to take a slightly different form in Gheg - again based only on the table, not my personal experience. Substantival and adjectival inflection are identical in Gheg and Tosk. If there are differences in verbal inflection, they can be easily noted in my totally beautiful conjugation tables without making whole new ones for Gheg and Tosk. — [Ric Laurent] — 11:23, 3 November 2011 (UTC)Reply

We must separate American from English! List of British words not widely used in the United States. --Anatoli 00:59, 3 November 2011 (UTC)Reply

I love you. lol — [Ric Laurent] — 01:13, 3 November 2011 (UTC)Reply

He-he :) My dictionary doesn't have "diaper", "trash" or "checkers". I only know "nappy", "rubbish" and "draughts". How did you manage to deviate so much? --Anatoli 02:44, 3 November 2011 (UTC)Reply

Apropos, it wouldn't surprise me if there were pockets of people in the Tosk regions who speak some local dialect that's completely unintelligible to all other Albanian speakers, just like if I were to drive 2 hours to the East I have no idea what the fuck any of those damn Hyde-county rednecks are talking about. — [Ric Laurent] — 23:04, 2 November 2011 (UTC)

In any case, "sq" is reserved for Albanian, not Tosk Albanian. The language code for Tosk Albanian is "als". So if someone adds a translation using "sq" into Albanian, which is not Tosk, then it gets misleading. Cf to Norwegian - "no" where Norwegian Bokmål's code is "nb" and Nynorsk is "nn". Tosk Albanian is used in education, media, books, etc. in both Albania and Kosovo, although there are writers who still write in Gheg. --Anatoli 23:12, 2 November 2011 (UTC)Reply

I agree with Ric Laurent, unsurprisingly. And I would like to reiterate what's already been said: Standard Albanian is a Tosk variety, or, at least, a variety heavily influenced by Tosk. Standard Albanian does not equal Tosk, any more than the British version of Standard English equals Cockney. And according to a good friend who is a native speaker of a Gheg variety, since the fall of Communism in Albania, the standard language has been more and more influenced by Gheg varieties. I'm not sure why, exactly. Maybe economic reasons or something? embryomystic 01:39, 4 November 2011 (UTC)Reply

User persistently inserting examples of proper names written in Roman, using his aliases or his own user account

A user is persistently inserting examples of proper names written in Roman letters into Mandarin entries, or creating "Mandarin" entries using English proper nouns - Leeds, Hyde Park, London, Thames, etc. - with or without Chinese suffixes, using aliases he has no problem generating (this time it was Special:Contributions/2.27.73.75) or his own user account, Special:Contributions/Engirst. I had to fix quite a few entries (convert them to proper Mandarin spelling) and protect them from him. --Anatoli 04:51, 4 October 2011 (UTC)

He has no problem generating new IP addresses: Special:Contributions/2.27.72.78. --Anatoli 04:58, 4 October 2011 (UTC) (Note: range blocks were tried before). --Anatoli 05:01, 4 October 2011 (UTC)Reply

User:Liliana-60 has unprotected Nei Mongol with a summary "this is the wrong way to deal with single user issues".

Copying my question to Liliana-60, which may be of interest to others:

It may be wrong to protect pages because of one user, but can you suggest anything else? I've been trying not to just revert everything he does but to fix and use some of the positive information he adds. It's hard though. He is very productive and inventive as far as avoiding blocks goes, and his whole activity is about showing that Mandarin can be written in Roman letters, proving his point and edit-warring. --Anatoli 05:35, 4 October 2011 (UTC)

Use abusefilter to record all changes to Mandarin entries, block all edits which create Mandarin sections in entries with names containing two consecutive Latin letters, block all edits which create Mandarin entries with name containing one word which doesn't contain any Pinyin diacritics (āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜêê̄ếê̌ềĀÁǍÀŌÓǑÒĒÉĚÈ), and block edits which add #: or #* examples to Mandarin Pinyin entries (characterised by presence of ===Romanization=== and/or {{pinyin reading of|...}} and/or {{cmn-pinyin}}). 60.240.101.246 07:23, 4 October 2011 (UTC)Reply
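The two title-based conditions proposed above can be sketched as plain predicates. This is only an illustration in Python, not AbuseFilter rule syntax, and the function names are hypothetical. It assumes "Latin letters" means unaccented A-Z/a-z, and it uses a subset of the diacritic list quoted in the comment (precomposed characters only).

```python
import re
import unicodedata

# Subset of the diacritic list from the comment above, precomposed forms
# only (assumption: combining-mark variants are covered by the bare "ê").
PINYIN_DIACRITICS = set(unicodedata.normalize(
    "NFC", "āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜêếềĀÁǍÀŌÓǑÒĒÉĚÈ"))

# Assumption: "Latin letters" means unaccented ASCII letters.
TWO_LATIN = re.compile(r"[A-Za-z]{2}")


def has_two_consecutive_latin_letters(title):
    """True if the page title contains two ASCII letters in a row."""
    return TWO_LATIN.search(title) is not None


def is_single_word_without_diacritics(title):
    """True if the title is one word and has no pinyin tone diacritics."""
    title = unicodedata.normalize("NFC", title)
    if len(title.split()) != 1:
        return False
    return not any(ch in PINYIN_DIACRITICS for ch in title)
```

Note that the first predicate on its own also matches ordinary pinyin titles, since those contain runs of unaccented letters too, so a real rule would have to combine it with a check on the section being created.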

Pinyin entries (===Romanization===) are allowed but I see what you mean. ;) --Anatoli 10:27, 4 October 2011 (UTC)Reply

I think that using abuse filters to enforce policy might be a bad idea, I think expanding their scope beyond pure vandalism can have potentially harmful side effects. Does anyone have links to the range blocks or discussions about blocking this user? I am always concerned about getting rid of people who have so much passion about this project, even if that passion seems (or is) completely misdirected. - [The]DaveRoss 10:41, 4 October 2011 (UTC)Reply

From memory he generated IP addresses way outside his normal ranges. Also, there is a chance that if he knows that he may be blocked somehow (that he is not so "invincible") - temporarily or permanently, he may change his attitude and won't work against the rules and consider opinions of others. No, not asking for a definite block just yet. In any case, if a complex block would be used, that would be a collective decision, not individual. I heard something about the possibility of contacting ISP, if there is a serious attack or vandalism but it's not that bad. Yes, passion should be controlled, otherwise they cause problems or work for others.--Anatoli 12:00, 4 October 2011 (UTC)Reply

(merged with above) --Anatoli 21:52, 4 October 2011 (UTC) Planck常数 (Planck Chángshù) It is a real word, please see Google Books. 2.25.214.61 21:38, 4 October 2011 (UTC)Reply

Yes, mixed language examples are citable alright but I hope after the vote, only 普朗克常數／普朗克常数 (Pǔlǎngkè chángshù) will be allowed, the reasons explained many times, I won't repeat. Your pinyin entries, romanising mixed language will also become invalid. You are the only person pushing to write foreign words in Mandarin using Roman letters. Otherwise, there wouldn't be a need for the vote. Also, use your real account, Engirst, no need to pretend you are many.

A freshly generated IP-address - 2.25.214.61

I have started keeping records of the number of IP addresses you are using and will try to find your old user names and IP addresses, as this is a rather rare case of abuse. Nothing personal. --Anatoli 21:52, 4 October 2011 (UTC)Reply

A recently generated and blocked (not by me) IP Address: Special:Contributions/2.25.212.83. --Anatoli 23:12, 4 October 2011 (UTC)Reply

For what it is worth, even though there seem to be a lot of IPs it is really only one ISP. That ISP has 3 /16s, but we can narrow it down to maybe 4 or 5 /21s or higher I think. I would want to check and see how much collateral damage that would mean but blocking all of the IPs this person uses would not be hard if that was the consensus. - [The]DaveRoss 01:50, 5 October 2011 (UTC)Reply

I previously range blocked him before the vote on pinyin entries had passed using a subnet filter of 16 bit. I have been trying to help him in good faith, but he's really testing my patience. His sole purpose is to make romanization of Mandarin the standard in this dictionary. Personally, I think this is simply not viable in the long run. It's the sheer amount of information in the language that will be lost and the amount of confusion that this will cause as a result, should everything be written in pinyin. Mandarin, unlike English, has a huge number of homophones and heteronyms. This is THE reason why it should not be romanized. The syllable yī, has over 50 known homophonous characters associated with it, each with its own set of meanings and in some cases, its own set of heteronyms. This is also one of the reasons the Japanese adopted Kanji characters to distinguish between the meanings of homophones. If things keep worsening, I will consider blocking him again. Jamesjiao → ^{T ◊ C} 03:23, 5 October 2011 (UTC)Reply

The "user with many names" seems to be following pinyin entry rules, more or less. It's a new issue. abc123 (his original name) or Engirst is ready to fight (edit war) over Thames河, Hyde公园 and many others, forces examples like "London是英国的首都。" instead of "伦敦是英国的首都。". I had to protect some pages (temporarily) from his edits. As a Chinese speaker, what's your opinion on this type of entries? --Anatoli 03:50, 5 October 2011 (UTC)Reply

A freshly generated IP-address - 2.25.212.57 --Anatoli 09:41, 6 October 2011 (UTC)Reply

My proposal: I will block anonymous contributions from the ranges which have recently been abused. We leave Engirst unblocked (unless new reason for blocking arises) for the time being, at least until the end of discussions on how to handle the particular Mandarin issues currently in debate. Once those issues have been resolved, Engirst can choose whether or not to abide by the results; if Engirst chooses not to follow the community resolution we formally ask Engirst to leave the project, modify the blocks to include logged in users, and actively block future socks. I think this can be done with minimal collateral damage. - [The]DaveRoss 21:25, 6 October 2011 (UTC)Reply

At the moment, we are only blocking individual IPs. I propose to range block. I've done some research and it seems that his ISP (in London, UK) gives out dynamic IPs in the range of 00000010 00011000 00000000 00000000 and 00000010 00011011 00000000 00000000 with a subnet mask of 11111111 11111100 00000000 00000000 (in IPv4: 2.24.0.0 to 2.27.255.255 with a 14-bit subnet mask). Blocking the whole range would mean possible collateral damage, but it wouldn't be too bad if we still allow account creation. Jamesjiao → ^{T ◊ C} 22:29, 6 October 2011 (UTC)Reply
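The figures above can be checked with Python's standard `ipaddress` module. This is only a minimal sketch; the sample addresses are IPs quoted elsewhere in this thread.

```python
import ipaddress

# The proposed range block: 2.24.0.0 with a 14-bit subnet mask.
block = ipaddress.ip_network("2.24.0.0/14")

print(block.netmask)            # 255.252.0.0
print(block.network_address)    # 2.24.0.0
print(block.broadcast_address)  # 2.27.255.255
print(block.num_addresses)      # 262144

# A few of the addresses reported in this discussion.
reported = ["2.25.214.61", "2.25.212.83", "2.27.73.75", "2.27.72.78"]
all_inside = all(ipaddress.ip_address(ip) in block for ip in reported)
print(all_inside)               # True
```

A /14 covers four /16 blocks, which is why the amount of collateral damage matters and why narrower ranges were also discussed.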

FWIW, here is a list of IPs: 2.25.191.81 (?), 2.25.191.225 (?), 2.25.193.30, 2.25.212.57, 2.25.213.147, 2.25.214.61, 2.27.72.254, 2.27.73.75, 2.27.72.78 (I am not convinced of this one). AFAICT, few other editors have edited recently from IPs in that range. - -sche (discuss) 22:28, 6 October 2011 (UTC)Reply

All of these are incarnations of the same entity. Jamesjiao → ^{T ◊ C} 22:32, 6 October 2011 (UTC)Reply

This is why I proposed to do the range block, I checked the potential ranges and can avoid pretty much all collateral damage as well as target the IP address space which is allocated to whatever region this user is in. - [The]DaveRoss 22:46, 6 October 2011 (UTC)Reply

I actually forgot about my previous reply to your proposal. Silly me. Well at least you have my support. Jamesjiao → ^{T ◊ C} 22:50, 6 October 2011 (UTC)Reply

Another IP-address for the record - 2.27.72.125. Does anyone still think it's different people? --Anatoli 01:20, 7 October 2011 (UTC)Reply

Immediately after me trying to talk to him, he "moved" to a new IP address: Special:Contributions/2.25.212.90. It must be a game of chasey for him. --Anatoli 02:06, 7 October 2011 (UTC)Reply

Does anyone else think that the range is too wide between 2.27.... and 2.25...? --Anatoli 02:08, 7 October 2011 (UTC)Reply

The IP hops might be intentional, they might be the way the ISP operates. Seeing as we are not blocking most of these IPs I can't imagine why they would change IPs between edits. There are three much smaller ranges (/23s) which are more realistic. - [The]DaveRoss 02:13, 7 October 2011 (UTC)Reply

Some ISPs allow a fresh IP address to be assigned when you cold restart your modem. I think Engirst might have found this trick. Jamesjiao → ^{T ◊ C} 02:32, 7 October 2011 (UTC)Reply

Some ISPs also give their users subnet IPs and force all traffic through proxies, which means that every few minutes or hours they may have a different IP presented to the outside network based on which proxy they end up on. AOL was like this for many years. What you say makes sense if we were blocking each IP, since we hardly have any blocked it makes little sense. - [The]DaveRoss 04:03, 7 October 2011 (UTC)Reply

I don't understand why, though. I have addressed him several times with no answers but every time he changes his IP address. His user account (Engirst) is not locked. He prefers the backdoor, as if nobody can see what is happening. BTW, his first account was "123abc", not abc123 as I said before. Then, there was "Ddpy". Most of his edits are now gone but there are still many to be fixed or deleted. --Anatoli 02:43, 7 October 2011 (UTC)Reply

I'd be happy for us to delete any pinyin where we don't have the Hanzi equivalent. Perhaps we could make it a formal rule. I'm not sure such a rule is needed, as I deleted about 100 such entries last night and nobody objected. Mglovesfun (talk) 07:54, 7 October 2011 (UTC)Reply

Would that be bot-able? Basically check all pinyin entries to see if there are any hanzi entries that list the same pinyin, and delete the pinyin entry if no such hanzi entries are found? -- Eiríkr Útlendi | Tala við mig 16:49, 7 October 2011 (UTC)Reply

Well, a pretty good but imperfect solution is this edit to {{pinyin reading of}} which checks if the first parameter (aka tra, trad) exists. If it doesn't exist it categorizes the entry in Category:Mandarin pinyin entries without Hanzi. This of course won't work for entries that don't use pinyin reading of, and it will miss entries that exist but lack the correct language (that is, they have only Japanese/Cantonese/Korean or whatever). Mglovesfun (talk) 16:53, 7 October 2011 (UTC)Reply

Hanzi live in a particular range of Unicode, so I think it would be possible to find all of the pinyin readings that way, regardless of template usage. - [The]DaveRoss 19:56, 7 October 2011 (UTC)Reply
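A rough sketch of that idea in Python, assuming that "Hanzi" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF). Rarer extension blocks are ignored here, and the function names are made up for illustration.

```python
def contains_hanzi(text: str) -> bool:
    """True if any character falls in the assumed Hanzi code point ranges."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
        for ch in text
    )


def contains_latin_letter(text: str) -> bool:
    """True if the text contains an unaccented ASCII letter."""
    return any(ch.isascii() and ch.isalpha() for ch in text)


def is_mixed_script(title: str) -> bool:
    """True for titles that combine Hanzi with ASCII Latin letters."""
    return contains_hanzi(title) and contains_latin_letter(title)


if __name__ == "__main__":
    for title in ["普朗克常数", "Planck常数", "Pǔlǎngkè chángshù"]:
        print(title, contains_hanzi(title), is_mixed_script(title))
```

Titles for which `contains_hanzi` is false would be the candidates for pinyin entries, independent of which templates the entry uses.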

New "additions" of Special:Contributions/2.25.214.239, all mixed language items, the foreign names are all deliberately untranslated: Ohm定律, the correct Mandarin is "欧姆定律" (Ōumǔ dìnglǜ), Banach空间 - "巴拿赫空间" (Bānáhè kōngjiān), Hilbert空间 - "希伯特空间" (Xībótè kōngjiān), also by another 123abc's sockpuppet: Special:Contributions/2.27.73.100 Hausdorff空间 - "豪斯多夫空间" (Háosīduōfū kōngjiān). Happy to block the user and delete all these entries, they are not Mandarin. Soft redirect might be considered if we have the Mandarin, not mixed entries.--Anatoli 21:56, 9 October 2011 (UTC)Reply

Planck常数 (Planck Chángshù)


Concurrent discussion: Wiktionary:Requests_for_deletion#Planck.E5.B8.B8.E6.95.B0, Talk:Planck常数

It is a real word, please see Google Books. Anyhow, it shouldn't be deleted and blocked. 2.25.214.61 21:38, 4 October 2011 (UTC)Reply

I answered above. --Anatoli 21:55, 4 October 2011 (UTC)Reply

All languages in the world, big or small should be equal. They are real words, only dictators want to ban real words. 2.25.214.61 22:03, 4 October 2011 (UTC)Reply

Am I banning a language? I don't want you to insert mixed language entries and translations - simply called Chinglish. Wiktionary is not to show how people with poor language skills use it. Quoted "普朗克常数" has 1,500 hits in Google Books, why would anyone want to promote "Planck常数" instead (29 hits)? You are just spreading illiteracy. No-one is trying to force "Planck 상수" (Korean) or "постоянная Planck" (Russian). I have to protect pages because of you, just stop it, will you? --Anatoli 22:32, 4 October 2011 (UTC)Reply

"poor language" is just your personal idea, but Planck常数 is used in professional books. 2.25.214.61 22:39, 4 October 2011 (UTC)Reply

Who defines poor language? CFI does not have a quality stipulation on acceptable words, nor is it acceptable for any Wikimedia project to promote anything. If there's 29 hits of Planck常数, then it has the same justification as Planck's constant, and quite probably should have a usage note pointing to 普朗克常数.--Prosfilaes 05:12, 5 October 2011 (UTC)Reply

I have no choice but temporarily protect pages from you. To anyone, please contact me if you think I'm abusing my administrator rights. I really see no choice at the moment. The reasoning is explained many times, I won't repeat. Seems like déjà vu. "Poor language" is my abbreviation of all said before. --Anatoli 22:42, 4 October 2011 (UTC)Reply

Yes, you are. You are a language dictator. 2.25.214.61 22:47, 4 October 2011 (UTC)Reply

Whatever, when a person like you says it, it means I'm doing the right thing, thank you. Knowing your records, I'm 100% sure that if you had administrative rights you would dictate Mandarin without Chinese characters onto Wiktionary or something. I don't think you are a passionate linguist, you're obsessed with your "transition to Roman letters" ideas. If you are Chinese, you must be ashamed, I support the Chinese person who told you off. --Anatoli 22:55, 4 October 2011 (UTC)Reply

(@2.25.214.61 etc) Please refrain from name calling, it will certainly not further your cause. It is very important to recognize that this website is a collaborative project. Even though we all have our own opinions about what Wiktionary should be, we agree to sacrifice some of what we want so that Wiktionary can be what the whole community wants. Please take some time and consider what your goals are, and then present them to the community for discussion, persuade us, and allow us as a community decide which course to take. A sure fire way to lose any support you may have had is to simply try to impose your ideas on the community against its will and then get defensive about it. If you continue to be as combative as you have been then we will most likely ask you to leave for the good of the project. That may not be a bad thing, Wiktionary is not a good fit for everyone, but I would rather you decide that the community effort is worth some sacrifice and join us under the agreed upon terms. Thanks, - [The]DaveRoss 01:40, 5 October 2011 (UTC)Reply

Thank you, TheDaveRoss. I just want to comment that similar ideas are shared by people on pinyininfo.com, some of them do make sense to me - standardisation of pinyin and transliteration of Chinese names for all Mandarin speaking countries and areas. I'm sure he will be welcome there. Another thing - he was already told to leave, blocked numerous times, only a few by me. Talk to User:Tooironic. Then he reappeared with no difficulty, making the administrator right to block a contributor who creates more problems than adds value, a joke. --Anatoli 01:52, 5 October 2011 (UTC)Reply

Problems are caused by somebody abusing power. Wiktionary has no rule to ban mixed scripts till now. 2.25.214.61 02:12, 5 October 2011 (UTC)Reply

I agree, problems can be caused by those who abuse power. Problems can also be caused by those who refuse to listen to others in the group. The way rules are developed on Wiktionary is very organic. We don't have a rule for something until a disagreement about it arises. Once a disagreement does arise, those who are in disagreement stop what they are doing and open the issue up for discussion, either between those who are close to the problem or the community as a whole. If it makes sense, there is a rule created based on the result of the discussion. Just because something doesn't have a "rule" doesn't mean that it is allowed. We don't have a rule about deleting the main page, yet it is something for which a person would be punished. Thank you for your willingness to discuss the issue. - [The]DaveRoss 02:27, 5 October 2011 (UTC)Reply

(@2.25.214.61 - Engirst) Really? It must be me? What about your toneless pinyin story? Remember this Wiktionary:Beer_parlour_archive/2010/May#block_list?

BTW, I added many Mandarin translations using mixed scripts, eg. DVD player, edited 卡拉OK, created T恤衫. I have no problems with many others. Good try but you need more correct answers. --Anatoli 02:34, 5 October 2011 (UTC)Reply

On such topics, decisions should be based on consensus, not on votes based on personal opinions. E.g. even if a majority wants to exclude a language (for political or whatever reasons) while a minority wants to keep it, it should be kept if it can be called a language. It's the same for words. There should be a discussion between open-minded people until a consensus is reached. Lmaltier 18:05, 5 October 2011 (UTC)Reply

Lmaltier, although I broadly agree with you here, I'm not sure you know what consensus means, and as a result, you seem to contradict yourself.

Regarding individual terms, the crux of the current issue revolves around what constitutes "Mandarin", and the majority opinion (i.e. rough consensus) appears to be that Mandarin does not include "Alzheimer's" or "Einstein" or "Thames" or "Planck". I think all the Chinese editors here would agree that 常数 (chángshù) is Mandarin, which makes Planck常数 a curious mixed-language hybrid term.

I see only two clear paths forward for keeping terms like Planck常数:

Create a ==Chinglish== (or similar) language heading, and categorize such terms under this.
Include such terms, but keep the entries extremely simple, just listing the terms as alternate spellings or misspellings and linking through to the hanzi-spelled entries that contain the definitions, usage examples, etc.

Without any general consensus (there's that word again) as to which course to take, I'm inclined to view these as truly mixed-language terms, that would thus not belong under any single-language header. -- Cheers, Eiríkr Útlendi | Tala við mig 18:42, 5 October 2011 (UTC)Reply

In any case, there should be an agreement before we allow creation of so many hybrid entries, it's not a common practice here. Only one user (even if under different names or anonymously under different IP addresses) pushes it so ardently. That's why I created the vote - to get a collective decision. Lmaltier, you'll have a chance to vote and express your opinion. If the vote fails (hope not), then we need to discuss the details. I agree with Eirikr that we shouldn't keep them as just Mandarin entries because they are not. --Anatoli 23:12, 5 October 2011 (UTC)Reply

What I mean is that there should be clear and simple principles, the main ones being all words in all languages and a header for a language means that the word is used in this language. A consensus is not the result of a vote, only the fact that all open-minded people agree, after discussion based on arguments, that principles are met, even if some (or the majority) would prefer not to include the word for personal reasons (because they don't like it, etc.), or agree that principles are not met. Lmaltier 05:21, 6 October 2011 (UTC)Reply

We don't allow some SoP's, even if some people think they are words, do we? We don't need "blue sky" or "tram no. 20" or "Chinese for London is 伦敦". To me and a few others, as you can see, "Planck常数" is not a word but two: Planck + 常数, and one of them is not Chinese; even if it's used in a Chinese text, it's still English inside Chinese. There will be more and more English words used by Chinese but why do we need to include them here if they haven't become part of the language? --Anatoli 05:41, 6 October 2011 (UTC)Reply

Anatoli, "Planck常数" is not a semantic sum of parts: its meaning cannot be obtained from the knowledge of the meaning of "Planck" and "常数". The same is true of the English "Planck constant", for which we have Planck's constant. This discussion should not be in BP anyway, but rather in RFV (if you think the term is not attestable) or in RFD (if you think the term is a semantic sum of parts or have other reasons to believe the term does not meet CFI). Furthermore, "Planck常数" is not a proper noun, so the vote you have proposed (Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters) will have no effect on the inclusion of "Planck常数". See also WT:RFD#Planck常数, which I have created based on your having tagged the term for RFD on 4 October. --Dan Polansky 06:41, 6 October 2011 (UTC)Reply

The Chinese term for "Planck constant" is "普朗克常数", not "Planck常数". There's nothing Chinese about the name "Planck". If a person called Mark says "我叫Mark" - my name is Mark, rather than "我叫马克 (wǒ jiào Mǎkè)". "Mark" doesn't become Chinese translation of English "Mark" but "马克" is. (One of the books with "Planck常数" has "Heisenberg 和 Schr6dinger"(?)). Engirst hardly contributes in the main Mandarin area, only pinyin. Why when he does, it's "Planck常数", not "普朗克常数". The ratio is 1,500 to 29. Don't you see he has an agenda? Quite amusing were his examples he was forcing - "London是英国的首都". Is this Mandarin?! You can find "Obama总统", "Cameron首相". So what? Do we start adding them as Mandarin? --Anatoli 08:36, 6 October 2011 (UTC)Reply

Do you acknowledge that "Planck常数" is not a semantic sum of parts? --Dan Polansky 11:43, 6 October 2011 (UTC)Reply

That would be recognising it as a term, no I don't recognise it as a word, there are two languages in one sentence. It's artificial and not assimilated at all. "A rare misspelling of" is the best I can give it, it could be a single unit to someone using a hybrid of languages, like the person who could say "我是America人" - "I'm American". Are you actually reading what I said before? --Anatoli 12:02, 6 October 2011 (UTC)Reply

I am asking about whether you acknowledge that the term is not a sum of parts. I am not asking whether you acknowledge the term to be a word, whether you deem the term worthy of inclusion, or whether the term is "artificial" or "assimilated". This question of whether it is a sum of parts can IMHO be fairly objectively answered in the negative, so I am asking whether you can confirm the observation that the term is not a semantic sum of parts, disregarding for a while your goal of getting the term excluded. If you claim that the term is a semantic sum of parts, can you explain whether you deem "Planck constant" a semantic sum of parts and why? --Dan Polansky 12:10, 6 October 2011 (UTC)Reply

普朗克常数 (Pǔlǎngkè chángshù) is a semantic term. Planck常数 is European Planck + Chinese constant. If you look up 常数, then you will have the translation of Planck常数 without any need for Planck常数. —Stephen ^(Talk) 12:17, 6 October 2011 (UTC)Reply

As another example of a term, which is not assimilated but was uttered and there is one citation, is "сраный ковбой" (shitty cowboy). I requested its deletion because it never caught on, not assimilated in the meaning American (abuse). Mixing English names and words into Chinese is not a new trend and we already have the English names. Most famous English proper nouns can now be found in a Chinese text, rivers, cities, mountain ranges, formulas, theorems will follow the original term with a Chinese suffix. Will "Mont Blanc山" or "California州" become Chinese only because they are followed by 山 and 州? --Anatoli 12:19, 6 October 2011 (UTC)Reply

@Dan Polansky. "Planck常数" is a sum of parts. --Anatoli 12:25, 6 October 2011 (UTC)Reply

Are you saying that "Planck常数" is a semantic sum of parts, while the English "Planck constant" is not a semantic sum of parts and "普朗克常数" is not a semantic sum of parts? Is this conjunction of three assertions what you are saying? --Dan Polansky 15:48, 6 October 2011 (UTC)Reply

@Anatoli, @Dan --

I think you two might be talking past each other. @Anatoli, by saying that "Planck常数" is not SOP, I think Dan is stating that the meaning of this phrase is not clear just from the parts -- if I only know (or only look up) "Planck" and "常数" as individual pieces, I have no idea that "Planck常数" is intended to mean h in physics.

Meanwhile, @Dan, I think what Anatoli is getting at is that "Planck" is a term in English (and other European languages), and "常数" is a term in Mandarin, and while the average Mandarin reader would understand the latter, the former would only be understood by that subset of Mandarin readers who are also at least somewhat familiar with European languages. By saying that "Planck常数" is SOP, I think Anatoli is stating that this term is comprised of two distinct parts, and only one of these parts is recognizable as Mandarin.

@Anatoli, @Dan, have I understood each of you correctly? -- Hoping this helps clarify, Eiríkr Útlendi | Tala við mig 20:05, 6 October 2011 (UTC)Reply

Eiríkr Útlendi (or just "Eiríkr"?), you understand me perfectly well. I believe my use of the phrase "sum of parts" and "semantic sum of parts" is in perfect alignment with the customary use of the phrase in English Wiktionary, and also fits the natural reading of the phrase "semantic sum of parts". My use refers to WT:CFI#Idiomaticity and its use of the term "idiomatic", defined in CFI in this way: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.' Where CFI says "idiomatic", I say "not a sum of parts" and "not a semantic sum of parts".

Re: 'By saying that "Planck常数" is SOP, I think Anatoli is stating [...]': This would mean that Anatoli has invented a new meaning of "sum of parts" as applied to terms, a meaning that has nothing to do with WT:CFI#Idiomaticity and hence is irrelevant. I reject this new meaning as part of a meaningful discussion about inclusion-worthiness of terms; the term "sum of parts" has, in Wiktionary discussions, a specific bound meaning such that editors are not free to redefine the term as they see fit. --Dan Polansky 20:48, 6 October 2011 (UTC)Reply

@Dan, one thing I haven't heard articulated yet by you (or at least haven't understood) is your views on the status of the term Planck常数 with regard to language. SOP or not, the main reservation from Anatoli (and myself if I'm perfectly clear about that) is that Planck常数 is not a single-language term. In arguing for this term's inclusion, do you view Planck常数 as common use in Mandarin contexts, and therefore meeting CFI as a single-language term?

Anatoli and IP user 60.240.101.246 are self-identified as Mandarin speakers, and neither are happy including this term, with both stating that Planck常数 as a whole is not Mandarin. James Jiao identifies as a native speaker and weighed in here regarding the term Thames河, and at least some of his points in that thread would seem to apply to Planck常数 as well. The main user(s) adding such terms and arguing for their inclusion, specifically Engirst and multiple IP users who may or may not also be Engirst, have never to my knowledge indicated whether they are Mandarin speakers, even when asked point-blank. (I'm not fluent in Mandarin nor familiar enough with writing styles to say much about specific Mandarin terms on my own authority, but I am concerned about the possible precedent and how that might affect entries classified as Japanese; hence my participation here.)

I would appreciate it if you could explain a bit about your specific reasons for wanting to include Planck常数. Your views on this term's non-SOP-ness are clarified by your post above, so what other reasons do you have? I'm honestly curious, and I do not feel like I understand your position well enough to really agree or disagree in any clear and reasoned fashion. -- Eiríkr Útlendi | Tala við mig 21:12, 6 October 2011 (UTC)Reply

I have not said anything about whether I want "Planck常数" included. Rather, I wanted Anatoli to stop erroneously claiming that "Planck常数" was a sum of parts. This is a hard subject; the thought about it is not made clearer by fallacious argumentation that involves erroneous claims of sum-of-part-ness, and terms such as "madness", "spread illiteracy", "have an agenda", and "Chinglish". A reason for wanting the term included would be that it meets CFI. A term that meets CFI can still be tagged as "rare"--which it definitely is--or even as "nonstandard"--which it seems to be as well. The dictionary's containing a term does not yet mean that the dictionary somehow endorses the term or recommends its use. The dictionary merely registers observations about the actual use of language. I have no strong feelings about "Planck常数"; it is so rare that it can be considered a rare malformation or something, not much unlike a rare misspelling; I do not really know. I do admit that no one will probably want to look up the term, an indication that it could be deleted. What I am passionate about is elimination of wrong argumentation, though, wrong as far as I am able to tell anyway. Furthermore, CFI does not say anything about "common use", other than in "Any word may be rendered in pig Latin, but only a few (e.g., amscray) have found their way into common use", which is a sentence in a rather poorly phrased section of CFI that has been kept for no consensus for deletion (5:4:0 for deletion) in the vote Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI, but should better be deleted anyway so as not to mislead, as WT:CFI#Attestation does not say anything about "common use". --Dan Polansky 21:39, 6 October 2011 (UTC)Reply

Ah, thank you, now I have a better understanding of where you're coming from. FWIW, I am slowly warming to the idea of inclusion with a soft redirect to the main entry at 普朗克常数 and a note about rarity, iff acceptable citations can be provided.

As a minor point, WT:CFI#Attestation does state “Attested” means verified through 1. Clearly widespread use -- notably, not as the sole limiting criterion, but "common use" would appear to be one of the criteria. -- Eiríkr Útlendi | Tala við mig 22:08, 6 October 2011 (UTC)Reply

Above all, point 1. of WT:Attestation is an item of a disjunctive list (A or B or C or D), so it is not a necessary requirement for attestation. Point "1. Clearly widespread use" should IMHO be deleted from CFI; it just misleads. Fact is, point 3. of WT:Attestation ("Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year, [...]") provides more lenient criterion than the point 1., so the point 1. is redundant. What point 1. currently does is make it possible for people to claim in WT:RFV that a questioned term "is clearly in widespread use", but that is IMHO a matter of procedure rather than an extended definition of "attested": CFI does not state how the attestation should be documented, in particular, whether the quotations need to be actually entered into the Wiktionary database. So again, removing point 1. would simplify things without changing the substance of CFI, IMHO. --Dan Polansky 07:16, 7 October 2011 (UTC)Reply

That's a clear explanation, thank you. Wiktionary:CFI#Attestation_vs._the_slippery_slope suggests that attestation alone should not be the sole justification for inclusion, however; what's your view on that? (And perhaps this particular discussion should be moved into a separate thread? This is getting unwieldy.) Never mind on that second part, just saw your reply over at Wiktionary:Requests_for_deletion#Planck常数, which answers my question. -- (Updated) Eiríkr Útlendi | Tala við mig 16:54, 7 October 2011 (UTC)Reply

My view on that is that Wiktionary:CFI#Attestation_vs._the_slippery_slope is purely informative; it provides information on interpreting the rest of the document, not actual rules for what passes CFI.--Prosfilaes 17:00, 7 October 2011 (UTC)Reply

I think that Planck's constant itself is SOP. It means, after all, any constant invented by some guy named Planck. I had a friend called Martha Planck who made a constant once.--Rockpilot 22:18, 6 October 2011 (UTC)Reply

(After an edit conflict)I do not claim to have a new definition of "sum of parts" but "Planck常数" is a sum of parts because it's not a Chinese term at all. I do have strong feelings about NOT keeping this type of entries because they are simply wrong. A physical Chinese dictionary simply uses 普朗克常数 (Pǔlǎngkè chángshù), explaining that 普朗克 (Pǔlǎngkè) is the transliteration of the name "Planck". I'm not skilled at presenting my arguments in English well but allowing "Planck常数" would present a bad precedent, like "Archimedes螺线" instead of "阿基米德螺线" (Archimedean spiral) or similar (don't quote me on the exactness of the possible way of someone writing in a Chinese text). I'm less worried about "sum of parts" rules than the quality of foreign language entries. Sum-of-parts problem is noticed quickly when an entry is English but if they are in FL, many get through unnoticed. Are you angry with me because I used "madness", "spread illiteracy", "have an agenda", and "Chinglish"? It's madness to convince everyone that "Thames河" is Mandarin for "Thames", it's also illiterate, although it's often forgiven to overseas Chinese not knowing how to write a foreign name in Chinese. "Madness" is a strong word but I do have strong feelings about it. I'm not calling Engirst (he doesn’t want to use this account any more?) mad but I DO think he has an agenda. His agenda (it's only one male user, not many) was confirmed many times by Chinese speaking contributors, let me call it "Mandarin in Latin script". Next term - "Chinglish", among other things, means "mixed Chinese and English" or a hybrid language, not offensive. People do use Chinglish, Japlish, Runglish, Konglish, etc. but we don't have CFI for them. I don't think I was offensive to anyone but if I was I apologize. I had an argument with a Russian Wikipedia editor that "Bluetooth" is not a Russian term, well he quoted sources like our case with "Planck常数", still "Bluetooth" hasn't become a Russian word in this spelling. 
Languages not using Roman letters all have different perceptions of what IS part of their language, especially if it is written in a different script, generally, in 99.9% cases - if a word is not in a native script, it's not part of this particular language, with a very few known exceptions. Shall we agree to disagree at this point? You are welcome to take part in the vote and present a summary of your reasons.

After reading Eirikr's new comment: yes, rather than deleting, a soft redirect could be a compromise I would accept. The Chinese themselves struggle to know how to transliterate a foreign name, and there can be variants, not just between China/Taiwan/HK but even within one country. --Anatoli 22:33, 6 October 2011 (UTC)Reply

Re: '"Planck常数" is a sum of parts because it's not a Chinese term at all': If this is not a redefinition of "sum of parts", then I do not know what a redefinition would look like. It does not seem to have anything to do with WT:Idiomaticity: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components'. --Dan Polansky 07:16, 7 October 2011 (UTC)Reply

Should we decide if we are willing to accept soft redirects before starting the vote? - -sche (discuss) 07:19, 7 October 2011 (UTC)Reply

I already expressed my acceptance. --Anatoli 21:59, 7 October 2011 (UTC)Reply

Right, but if the vote passes, it will ban soft redirects for entries that contain or are proper nouns. If we're OK with making the entries "soft redirect" (point in an explanatory way to the main entries, like 'ave points to have), we shouldn't necessarily hold that vote; we should just make mixed-language/mixed-script entries into soft redirects. - -sche (discuss) 22:35, 7 October 2011 (UTC)Reply

We have some time before the vote. We need to see the reaction of opponents of the proposal first. --Anatoli 22:42, 7 October 2011 (UTC)Reply

Besides, I don't see a contradiction between the vote and redirects. A ban will not disallow redirects like Mockba. I may add a clause. --Anatoli 22:45, 7 October 2011 (UTC)Reply

I am going to oppose the vote Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. It is essentially prescriptivist ('"Thames河", "Planck常数", "Alzheimer病", etc. could be made soft redirects to the correct Mandarin entries.', emphasis on "correct" mine). Furthermore, it seems fairly incoherent at this point. The vote seems to modify CFI for Mandarin, yet the sixth reason stated in the vote claims the discussed terms such as "Alzheimer病" already fail CFI as being sum of parts. The vote seems to be predicated on the assumption that it is the business of a dictionary to "prevent spreading illiteracy", whereas the business of a descriptivist dictionary is to document what can actually be observed, and mark it as "rare" and "nonstandard" if it fits observation. By including a term, a dictionary does not promote the term, especially when the term is marked as nonstandard. In particular, by including vulgar terms, a dictionary does not promote their use; by including terms marked as obsolete, a dictionary does not promote their use. By analogy, a library of all books ever published on Earth contains all books, regardless of how objectionable the books may seem to the librarians or majordomos of the library. There are some further issues with the vote. --Dan Polansky 07:54, 8 October 2011 (UTC)Reply

It's normal to have language-specific policies, especially if they are more restrictive than CFI for English (not breaking the existing rules). People who know the language they are editing in will know better; other people may offer decisions that may be wrong or not followed. Dan, the languages you work in are mostly Roman-based, so I'm not sure you understand that words like iPhone, for example, can be used in a Russian, Mandarin or Japanese text, and you'll find millions of citations of it, but they are not part of that language even if they are the official forms - a sign will say "iPhone", not "айфон", does it make sense? Similar with "Planck常数", only Chinese living overseas and mixing languages will know what it means. I am starting to suspect that you too have some agenda. Why so much enthusiasm towards Mandarin all of a sudden when we are dealing with a mixed script? You are even being aggressive towards me, calling my arguments "fallacious". Also, why does it worry you personally what is included as a Mandarin term in Wiktionary? Do you work with Mandarin? No Mandarin dictionary in whatever country, no matter how large, would include such terms. Anyway, the discussion in Talk:Planck常数 seems to lead to a possible compromise. If we don't reach it, we'll decide by the vote. BTW, I don't think extending it to one month is needed, two weeks will suffice. --Anatoli 11:13, 8 October 2011 (UTC)Reply

"Have an agenda" and concern with personal motivation are fallacies of irrelevance; my enthusiasm is of no one's concern. I find many of your arguments fallacious ("characterized by fallacy; false or mistaken"), and feel entitled to say so without considering it a personal attack. I am worried with a proliferation of prescriptivist inclusion criteria, and with spread of prescriptivist thought in Wiktionary, as well as with incorrect use of the term "sum of parts" AKA "nonidiomatic", incorrect with respect to WT:Idiomaticity.--Dan Polansky 13:08, 8 October 2011 (UTC)Reply

I don't understand what this discussion is about. This is a non-Mandarin word (Planck) combined with a Mandarin word (常数). Mandarin speakers don't perceive "Planck常数" to be a Mandarin word (ask any native speaker if in doubt), so even if this is not a SOP, it shouldn't be kept. "Москва" is used in English but English speakers don't regard it as an English word, hence it was deleted unanimously. 60.240.101.246 13:21, 8 October 2011 (UTC)Reply

Re: 'Mandarin speakers don't perceive "Planck常数" to be a Mandarin word': What evidence for this assertion do you plan to provide? Are you saying that not a single Mandarin speaker considers "Planck常数" to be a Mandarin word or that most Mandarin speakers do not consider "Planck常数" to be a Mandarin word? --Dan Polansky 13:33, 8 October 2011 (UTC)Reply

With your capability you will not be able to find one that does (exclud. possibly Engirst, who doesn't seem to be Hanzi-literate). 60.240.101.246 13:36, 8 October 2011 (UTC)Reply

Care to answer my questions? What evidence? Not a single does or most don't? --Dan Polansky 13:39, 8 October 2011 (UTC)Reply

OK. Here is what you wanted. I can't speak for everyone of course, but I do understand the mindset and perceptions of languages by native Chinese speakers better than you do. There are too many references on this issue, eg. 《直用原文──现代汉语外来语运用中的一个新趋势》,《試論漢語文字和中國人的傳統思維方式》,《原形借词——现代汉语吸收外来语的新发展》,《论外来语对现代汉语的冲击》,《关于外来语及其周边概念的考察》,《关于汉语文字的几点认识》,《2010年中国语言生活状况报告》,《现代汉语中字母词研究综述》,《外来语在汉语中的使用及对汉语的影响》, they basically all comment that the increase in loanwords needs to be noted and become alerted to; they don't fit into Chinese phonology and sound very foreign; a recent trend is words in other languages used directly without being transcribed or translated; these words are used to avoid confusion or for convenience; they do not appear in formal situations where transcription and translation always occur and the general public doesn't regard these words as being assimilated into the Chinese lexicon; phonologically adapted loanwords tend to be replaced by native calques eventually; this tendency contrasts starkly with the Japanese and Korean cases, where massive and indiscriminate importation is currently occurring; and in conclusion the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged. 60.240.101.246 14:32, 8 October 2011 (UTC)Reply

Essentially the Wiktionary community is a miniaturised version of the general public. Out of those who actively participated in the deletion discussion of these mixed script entries, people who know some Mandarin (Anatoli, Tooironic, Jamesjiao, me) all voted against the inclusion, and people who don't know the language (Lmaltier, Dan Polansky, Prosfilaes, -sche (initially)) tended to keep these. The chance of this occurring assuming equal probabilities for the two cases is 1/256, or 0.4%, low enough to be considered statistically significant. 60.240.101.246 14:40, 8 October 2011 (UTC)Reply
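
The 1/256 figure above can be checked with a short calculation. This is a minimal sketch, assuming each of the eight named participants votes independently and is equally likely to vote either way; the function name is hypothetical and used only for illustration.

```python
from fractions import Fraction


def exact_pattern_probability(n_voters: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability that n independent voters each cast one particular,
    pre-specified vote, when every vote has probability p (assumed 1/2)."""
    return p ** n_voters


# Eight participants: four who know Mandarin and four who do not.
prob = exact_pattern_probability(8)
print(prob, f"{float(prob):.2%}")
```

Note that this is the probability of one specific split. If the mirrored outcome (Mandarin speakers voting keep, the others voting delete) is counted as equally striking, the combined probability doubles to 1/128.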

Thus, you do not plan to provide any evidence; instead, you offer yourself as a witness.

Let me highlight this quotation: "[...] the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged". This quotation is outright prescriptivist. A prescriptivist lexicographer sees it as a goal of a dictionary to protect "the structural integrity and purity" of a language. Such prescriptivism is typical of language academies around the world. By contrast, the English language has no such central regulatory body; an Anglo-American descriptivist dictionary does not see it as its aim to protect the purity of language but rather aims at documenting the use of language as it actually occurs, regardless of whether language authorities approve or disapprove of its use. Moreover, it is still possible in a descriptivist dictionary to note that some authorities consider a term incorrect, whether by means of the template {{nonstandard}} or by means of a usage note. The current entry "Planck常数" leads the user of the dictionary to synonyms: 普朗克常数, 浦朗克常数, 卜朗克常数. If this entry is deleted, the fact that '"Planck常数" is a nonstandard term whose standard and widely accepted synonyms include 普朗克常数, 浦朗克常数, and 卜朗克常数' remains undocumented in the dictionary, an unfortunate circumstance. --Dan Polansky 14:53, 8 October 2011 (UTC)Reply

Re: "[...] they do not appear in formal situations": Neither do ain't and gonna; do you propose to delete these as improper English? Should Category:English informal terms be deleted? And what about such foreign importations as English háček, which threatens the purity of the English language? --Dan Polansky 14:59, 8 October 2011 (UTC)Reply

Dan, I find your doggedness in this issue to be a bit odd. I understand your concerns about prescriptivism versus descriptivism; that part makes sense to me. That said, IP user 60 here, Anatoli, and James Jiao, among others, are basically making the point that terms like "Planck常数" are about as intelligible to Chinese readers as "Москва" is to English readers. If that is the case, and if "Москва" has been deleted as "not English", why are you apparently so opposed to deleting "Planck常数" as "not Mandarin"? I confess I'm confused by your stance, and I must assume it's because I don't fully understand your perspective. -- Eiríkr Útlendi | Tala við mig 03:14, 9 October 2011 (UTC)Reply

"Москва" is perfectly intelligible to English readers; or at least as intelligible as pemoline, ironbark or votator. Furthermore, we didn't delete Москва because it wasn't an English word; we decided it was Russian in English and deleted the English section and not the Russian section. You want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints.--Prosfilaes 04:33, 9 October 2011 (UTC)Reply

@Prosfilaes: Are you speaking for yourself, or on Dan's behalf?

That aside, your comment here comes off as disingenuous. English readers unversed in Cyrillic will not find "Москва" at all intelligible, certainly not as "moskva". Moreover, we decided it was Russian in English and deleted the English section sounds an awful lot like "Москва" has been deleted as "not English", leaving me uncertain what distinction you are making. Is your intended point that, since some headword "Москва" still exists, removing the English is acceptable?

Regarding you want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints -- my only constraint is that a term be filed under the appropriate language. "Москва" is not English, so I support that term not being listed under an English heading. Planck常数 doesn't appear to fit under any of our existing language headings, so I support that term not being listed under any of our existing language headings.

Regarding attestation that a particular string exists in use somewhere, ittyshay seems like it might be attestable given the number of hits at google:ittyshay, but WT:CFI, as it's currently written, counsels against including pig Latin. In a similar vein, google:"my+natsukashii" suggests attestability for natsukashii in English contexts, but it is not included here under an English heading, ostensibly as it is not recognized as English. Attestation alone appears to be insufficient for inclusion -- which strikes me as reasonable, for it is unreasonable to argue that attestation in a given language context alone makes a term that language -- which is kind of the whole point of this thread, that Planck常数 is not Mandarin. -- Eiríkr Útlendi | Tala við mig 05:27, 9 October 2011 (UTC)Reply

@Eiríkr Útlendi: CFI does not advise against pig Latin but rather says that pig Latin can be included as long as it is attested, giving amscray as an example; you should read the relevant "slippery-slope" section again, and read again my response that the "common use" and "general use" used in that section are misleading and match neither the current practice nor WT:Attestation. Again, if in doubt, you can create a new thread here in Beer parlour in which we clarify whether people agree that "common use" should be required for pig Latin. google:"my+natsukashii" searches the World Wide Web, which does not count for attestation; google books:"my natsukashii" finds nothing and google books:"ittyshay" finds nothing; the relevant point of CFI is "Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year". From what I can tell, you still have a poor grasp of how CFI usually gets applied, especially the attestation section. Instead of focusing on idiomaticity and attestation as specified in CFI and as usually applied, the supporters of deletion variously claimed that "madness" must be stopped, that importation needs to be regulated, that we must not "spread illiteracy", or, now, that the phrase is non-intelligible to many native speakers. But non-intelligibility to many native speakers is not a concern per WT:CFI; "Planck常数" or "Planck 常数" seems attested in Google Books. There are many specialist terms entered into Wiktionary as English that are not readily understood by the majority of the English-speaking population. Wiktionary registers attested terms rather than terms that are readily understood. The assertion that 'Planck常数 is not Mandarin' seems implausible, as the term seems attestable in running Mandarin text; the presence of Latin characters alone does not exclude the phrase from Mandarin, as then "AA制" and "T恤" would not be Mandarin either.
As regards "Planck常数" vs "Москва", "Москва" can be claimed to be Russian embedded in English and is borderline-attestable in Usenet, none of which holds for "Planck常数", which seems attestable in Google books. As regards my motivation described above as "doggedness in this issue", that again is of no one's concern and has no bearing on the correctness of my arguments, and thus, again, is a fallacy of irrelevance. I do not see why my stubborn attempt to defend CFI and lexicographical descriptivism is "doggedness", while the stubborn attempt to import lexicographical prescriptivism into English Wiktionary (most conspicuously documented in one of the responses of the anon 60.240.101.246 above in this thread) should be considered non-dogged or reasonable. I also don't see why your repeated responses, the last of which mostly ignores points made in my post, should be considered non-dogged; you had the option of not butting in in the conversation between me and 60.240.101.246, now disclosed as a marked prescriptivist who wants to protect the purity of language. In any case, "have an agenda", "madness", "doggedness", and similar non-concerns are best avoided in the discussion. --Dan Polansky 07:04, 9 October 2011 (UTC)Reply
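
The attestation criterion quoted above ("at least three independent instances spanning at least a year") can be read as a simple check over citation dates. The following is a rough sketch only: it assumes the citations have already been judged independent, durably archived and meaning-conveying, and it treats "a year" as 365 days. The function and parameter names are illustrative and are not part of WT:CFI.

```python
from datetime import date


def meets_attestation(citation_dates, min_instances=3, min_span_days=365):
    """Return True if there are enough citations and the earliest and
    latest are at least min_span_days apart (an assumed reading of
    'spanning at least a year')."""
    if len(citation_dates) < min_instances:
        return False
    span = max(citation_dates) - min(citation_dates)
    return span.days >= min_span_days


# Hypothetical citation dates, for illustration only.
example = [date(2005, 1, 1), date(2005, 6, 1), date(2006, 3, 1)]
print(meets_attestation(example))
```

A check like this covers only the counting part of the rule; whether a given quotation is a genuine use in the language in question is exactly the judgment being argued over in this thread.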

I speak for myself, of course. Москва was not deleted; someone looking it up will still find the word. As a practical matter, you haven't changed the entry at all for users. If Planck常数 does not fit under any of our existing language headers, then we need to create one that it does fit under.

I disagree hugely on your reading of WT:CFI. Even ignoring my argument that the whole "Attestation vs. the slippery slope" section is informative, not prescriptive, it starts "This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form." ittyshay, looking at Google Groups, is fair game. We don't use the Web for attestable materials, and the first dozen pages on Google Books and Groups for "natsukashii" don't show many examples that are clearly uses and clearly English.

"it is unreasonable to argue that attestation in a given language context alone makes a term that language"? What? Can you offer a general rule to figure out what language a word is from then? I was going on the general rule of thumb that "Platonic" was English because it's always used by English speakers in English sentences, but apparently that's not good enough.--Prosfilaes 07:30, 9 October 2011 (UTC)Reply

(after edit conflict) @Prosfilaes: I would argue Москва is less intelligible than pemoline (or is not intelligible) because it isn't in the same script as English readers know. One of my English friends, after living in Kyiv for a year, told me how carefully he had guarded slips of paper with the addresses of his destinations on them, because he could read no Cyrillic and had no way other than those slips of paper to tell taxi drivers where he wanted to go. In contrast, if his destination had been "Pemoline-Ironbark Station on Votator Street", he could have lost the paper and still pronounced for the taxi driver where he wished to go, even if he had no idea what the words meant. However, I appreciate your underlying argument about the difference between Москва and Planck常数, even if I don't entirely agree with it. I trust you wouldn't object to usage notes in a [[Planck常数]] entry explaining how it was nonstandard, proscribed? If you wouldn't, I'm trying to advance soft redirects (after others suggested them) as a compromise to banning mixed-script entries, precisely so as to address that concern, which you and Dan and Lmaltier express well.

@Eirikr: what would you think of soft redirects, like this? You're warm to them? I want to know if this compromise is acceptable to more people than a ban is.

If it is... we should all stop arguing XP hehe. - -sche (discuss) 05:39, 9 October 2011 (UTC)Reply

I'm okay with soft redirects as a general idea, provided the sinophone editors are on board (since they're more the ones to say about Chinese entries anyway :). As an interesting wrinkle, google:allintext:+"planck常数"+は shows some use in Japanese contexts, albeit only seven hits, which I haven't gone through to evaluate. -- Eiríkr Útlendi | Tala við mig 06:35, 9 October 2011 (UTC)Reply

Historically, there may have been a few English speakers who only knew the w:Deseret alphabet. (Okay, historically they're probably outnumbered by the number of bilingual English/Russian children who only knew the Cyrillic alphabet.) I would find usage notes on Planck常数 almost essential. As a compromise, I'd have no complaint with soft redirects.--Prosfilaes 07:30, 9 October 2011 (UTC)Reply

(after edit conflict) I removed the controversial point about SoP. It wasn't the main reason, anyway. I agree that mixing words from English and Mandarin when one is speaking or writing in Mandarin is always brushed off as Chinglish by native speakers, no matter how educated the speaker or writer is, who uses it. I stand by what I said before. Leaving people's or place names untranslated is only used when either the writer or the reader may not be able to read or write that name, no exceptions made for small or rare names. @60.240.101.246 Well, if we had an agreement, it would be deleted immediately but obviously we don't. So we have to go through the vote or decide in favour of a soft redirect option (the latest version suggested by -sche). To me it's obvious that "Planck常数" or "Alzheimer病" are not Mandarin just because Mandarin common words are attached to them but I don't want to argue about this forever, let the vote decide, hopefully the common sense will prevail, not the desire to include everything for which there is some attestation. Honestly, it's tiring. --Anatoli 13:52, 8 October 2011 (UTC)Reply

Then what are they? I don't care if they're Mandarin or not; they're words. Label them how you will, but don't just delete them because you don't like them.--Prosfilaes 04:33, 9 October 2011 (UTC)Reply

Italian Wikipedia

Latest comment: 13 years ago, 12 comments, 8 people in discussion

The Italian Wikipedia is closed in protest over far-reaching plans by the Italian government that threaten its independence; see Jcwf 02:51, 5 October 2011 (UTC)Reply

How does an Italian law apply to a website edited by users from around the globe, even if it is in Italian? Jamesjiao → ^{T ◊ C} 02:57, 5 October 2011 (UTC)Reply

One of the many open questions I suppose. I do not know. But if all governments start doing this I do think we are in trouble. Jcwf 03:02, 5 October 2011 (UTC)Reply

"This proposal, which the Italian Parliament is currently debating". Wait, so it's not actually a law yet? Wikipedia has jumped the gun. The European court of human rights might throw it out, no? As it would seem to contradict the laws protecting freedom of expression. Mglovesfun (talk) 09:15, 5 October 2011 (UTC)Reply

As Jamesjiao says, who does the law apply to? I thought the Wikipedia server was in the US. Would it only apply to Italian citizens? If so, why only in Italian? And if it doesn't only apply to Italian citizens, and I wrote something on the Italian Wikipedia, could I hypothetically break Italian law and get extradited to Italy? Also, nobody is 'in charge' of the Italian Wikipedia, so if Wikipedia fails to conform with a ruling against it, who gets charged? It doesn't say defamatory statements should not be made, just that any such statements should be removed if requested. So the person making the original statement isn't guilty of anything. I'll look into it. Mglovesfun (talk) 09:19, 5 October 2011 (UTC)Reply

This seems to be a voluntary action of Italian Wikipedia in protest over a planned law. If this is so, I would like to see the vote on Italian Wikipedia that has led to this decision. it:W:Pagina_principale now redirects to W:it:Wikipedia:Comunicato_4_ottobre_2011, as does W:it:Portale:Comunità, so we cannot even read a discussion in the community portal that could have led to that decision. --Dan Polansky 14:24, 5 October 2011 (UTC)Reply

This seems to be the vote: it:W:Wikipedia:Bar/Discussioni/Comma_29_e_Wikipedia. --Dan Polansky 14:36, 5 October 2011 (UTC)Reply

...which says the notice is up for but a day (according to Google Translate, anyway).—msh210℠ (talk) 15:39, 5 October 2011 (UTC)Reply

Read on, and the community decided to make it indefinite, with post-notice discussion w:it:Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione (here). - -sche (discuss) 18:46, 5 October 2011 (UTC)Reply

The protest action does not seem to affect the mobile version of Italian Wikipedia, so the page is available for reading here: http://it.m.wikipedia.org/wiki/Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione. --Dan Polansky 06:11, 6 October 2011 (UTC)Reply

m:Wikimedia Forum#Italian Wikipedia -- Liliana • 14:40, 5 October 2011 (UTC)Reply

Now m:Wikimedia Forum/Italian Wikipedia. —An gr 06:46, 6 October 2011 (UTC)Reply

biblical quotes as example sentences

Latest comment: 13 years ago, 26 comments, 8 people in discussion

Should long-winded biblical quotes be used as example sentences? I'm asking because an anon IP (probably User:123abc) has been adding them to many different Mandarin entries (e.g. 肚子, 一切, 什么, etc), but none of them are really practical for learners nor really relevant to the words themselves. Do we have a policy on example sentences? It's also possible that the translations are copyright. ---> Tooironic 03:34, 5 October 2011 (UTC)Reply

Yes, it's abc123/Engirst. I have wikified and fixed his examples in 肚子. He just copies them from one entry to another. The other issue is that only simplified is given (if the entry is for both) and the traditional version is out of synch 甚麼 or 什麼. It's not answering your question but I wanted to mention this as well. --Anatoli 04:01, 5 October 2011 (UTC)Reply

His contributions (Special:Contributions/2.25.214.61) are also discussed here. --Anatoli 04:06, 5 October 2011 (UTC)Reply

(after an edit conflict) There is nothing wrong with quoting the Bible to illustrate the use of a word, or to attest to its existence; I've added lines from the Bible to entries. We prefer sentences which illustrate the usage of a word well, and therefore we shorten overly long sentences by using ellipses and move sentences that do not illustrate words' usage well to the Citations namespace, but we generally do not remove accurate quotations of literature, because these attest to the existence of the word. The sentences in the entries you link to fail to acknowledge their source, however, which is indeed a copyright/credit issue. The sentences also fail to bold the portion of the English translation that corresponds to the Chinese headword. I would remove the sentence in 一切 because it fails to acknowledge its source and is badly formatted and opaque; if the source were added, I would just format it correctly and move it to the Citations namespace (because it is still not good as an illustration of the use of the word). I will try to format and source the sentence in 肚子, and shorten it to "你必用肚子行走，終身吃土", because that is a good example sentence. - -sche (discuss) 04:11, 5 October 2011 (UTC)Reply

I think it is helpful and good to wikify/linkify the individual words in Chinese example sentences, because it is otherwise unclear where the word-separations are, but I think it is our policy not to linkify any words in example sentences. (Do we make an exception for Chinese? That would be fine by me.) - -sche (discuss) 04:13, 5 October 2011 (UTC)Reply

I think it is our policy not to linkify any words in example sentences. Oh, I didn't know that! If that's true I wasted my time but I'll seek confirmation. I think it's very useful too (like Wikibooks) and you can also see what's missing. The word forms could link to lemmas. I'll just wait for others to comment on the quotes. --Anatoli 04:18, 5 October 2011 (UTC)Reply

All of these overly promotional or propaganda-like quotes should go. Adding a few quotes from Bible is fine, but adding tons of content from the Bible to entries which barely have any citations is unacceptable. 60.240.101.246 07:33, 5 October 2011 (UTC)Reply

I disagree; if an entry barely has any citations, what it needs is citations! Removing the only current citation seems perfectly counterproductive. We use Bible quotes for other languages, notably English and Hebrew. Can't we just treat the Bible like any other book? I'd be happy for Qur'an, Torah etc. quotes to be used as well. Anyway, these are citations not example sentences; an example sentence is 'made up' for convenience, as it's quicker than finding an actual quote. Mglovesfun (talk) 09:02, 5 October 2011 (UTC)Reply

How about using the Qur'an for common English or Japanese entries, like "all", "what", "belly" (using translations of course)? It'd be weird, wouldn't it? I'm happy with Buddhist quotations - that's something acceptable and deeply ingrained in Chinese culture. But Christian quotations? No. And those sentences (e.g. 一切, 肚子) - they are not how Chinese sentences are normally constructed. They just sound so - "preachy". 60.240.101.246 09:47, 5 October 2011 (UTC)Reply

I'm not saying not to replace the citations with other, better citations. Mglovesfun (talk) 09:57, 5 October 2011 (UTC)Reply

As Mglovesfun said: entries without many citations are the ones that need citations! Citations which do not show normal, fluent sentence construction should be moved to the citations page, though. - -sche (discuss) 18:54, 5 October 2011 (UTC)Reply

I have only seen quotes from Genesis, which would make them equally Hebrew and Christian, but that is not the point. I am not sure what the problem is here; it almost seems like we are looking for reasons to get mad at Engirst. I have used quotes from the Bible (Old and New Testaments) and have not heard a thing about it; it is a seminal text in Hebrew, Greek, Italian and English. I do understand that it doesn't have the same cultural weight in Chinese, but that doesn't have any bearing. If you said "these are bad usexes because they don't accurately or readily convey the usage of the word" I would be on board. As it is, it seems more like you are either against Engirst or against the Bible, and neither of those stances makes a compelling argument. --This unsigned comment was added by TheDaveRoss (talk • contribs).

It's quite obvious that Engirst is here to preach and to promote Pinyinisation of Chinese. Do we really want to see all basic Mandarin entries accompanied by nothing but one or more quotes from the Bible, and the Chinese category dominated by Pinyin not character entries? It's madness really. I'm sure if I were a user who adds uninterruptedly advertising quotations, or a user who constantly writes Chinese Communist Party propaganda by adding English-language quotes from the official PRC press, I would have been banned instantly. There really is no difference. What's more - e.g. in the two quotes added to 一切, both have errors in their Pinyin somewhere. 60.240.101.246 10:53, 5 October 2011 (UTC)Reply

Since pinyin entries are valid, there's no need to 'promote' then, no more than Russian in Cyrillic script needs 'promoting'. Mglovesfun (talk) 11:02, 5 October 2011 (UTC)Reply

Pinyin IS promoted and preached by him and by some other people. Please read this site. I agree with the standardisation movement but not with the replacement of Chinese characters with pinyin. Mao planned this too. A few Westerners took it literally, including the owner of pinyin.info and Engirst. Some of the material on the site caused outrage among Chinese people. Anyway, this transition is not happening and writing purely in pinyin is only used for educational purposes, but we may get into a situation where we have more pinyin than Chinese characters. --Anatoli 05:15, 6 October 2011 (UTC)Reply

Pinyin entries are at present allowed iff the corresponding character entry exists and no quotations should be included in Pinyin entries (Wiktionary:Votes/2011-07/Pinyin entries). Both rules were made to control the Pinyin enthusiasm of Engirst, but neither rule is obeyed by him [6][7][8]. 60.240.101.246 11:13, 5 October 2011 (UTC)Reply

I personally delete pinyin when there's no corresponding traditional or simplified. I mentioned this on Wiktionary talk:About Sinitic languages but nobody's supported me as of yet. Mglovesfun (talk) 13:05, 5 October 2011 (UTC)Reply

I did express my weak or tentative support (Sounds like a reasonable suggestion...), read my reply @03:11, 5 October 2011. We only check randomly the pinyin entries, many wouldn't SoP by any standards and we wouldn't create Mandarin entries to match. There are so many of them, he could have spent more time creating the matching hanzi. On the other hand, toned pinyin entries can be good if they are correct, follow the rules, we may not catch up fast enough on creating Mandarin entries, besides, I do a lot of translations, many of them are red-linked, anyway. E.g. qíshǒu is missing at the moment, we don't have 騎手／骑手 (qíshǒu) and 骑手 (qíshǒu) (rider, horseman) yet but there is nothing wrong with the term. Not to sound like we are "bullying" him, perhaps the pinyin editor should be invited to the discussion. --Anatoli 05:05, 6 October 2011 (UTC)Reply

Does 123abc speak much Mandarin? I think if he were a native speaker he'd be able to write in Chinese characters and also I hope would make fewer mistakes in pinyin. The thing is he's immune to blocks: you can block him as much as you like and he just comes back with a new IP address. He's put himself above the rules. Mglovesfun (talk) 19:29, 6 October 2011 (UTC)

If folks have identified his (assuming this user is male) ISP, it's just a matter of blocking everything from that ISP. Or possibly contacting that ISP and getting the user banned at that level. This single user's disruptiveness is wasting a considerable amount of time and energy, so much that I'm beginning to think that losing the potential contributions of other anons by blocking the whole ISP's block would be more than offset by the actual savings made by getting rid of this one user.

Unless they can somehow be persuaded to change... except they seem immune to any attempt at two-way communication. < sigh. > -- Eiríkr Útlendi | Tala við mig 19:55, 6 October 2011 (UTC)Reply

I just checked a couple entries on my watchlist that IP user Special:Contributions/2.25.212.57 edited today, specifically 在 and 来. Both edits added biblical usexes that didn't actually show very clearly how the word in question is used, so I reverted both edits. Looking at this user's contributions shows what can only be described as a crapflood. Would someone please block this IP? The time to assume good faith is long since past. -- Eiríkr Útlendi | Tala við mig 20:16, 6 October 2011 (UTC)Reply

My instinct is that having extensive Biblical quotations in many Mandarin entries is a poor idea. What remains unclear is whether to have a Biblical quotation in a Mandarin entry is better than to have no quotation at all. If someone starts removing these Biblical quotations, I do not think I will object. --Dan Polansky 21:04, 6 October 2011 (UTC)Reply

I don't object to the idea of including biblical quotes. To me they are just example sentences. However, what I do have a problem with is the fact Engirst is not attaching such a quote to an existing definition (in the cases I've seen - 来); as a result, it renders the effort meaningless as users will likely be more confused than enlightened. Jamesjiao → ^{T ◊ C} 21:51, 6 October 2011 (UTC)Reply

Right. The quotations are sometimes OK, and sometimes great examples of the figurative usage of terms (presuming other Mandarin texts use them figuratively, not just the Bible), but other times they are not good illustrations, and should be moved to the Citations: page for that reason (if sourced). Quotations should be removed if unsourced, as incompatible with the GFDL (because they appear to be quotations created by the user and released under the GFDL, but are in fact quotations created by another person and possibly not released under such a licence). Other times there is no definition and adding incorrectly-formatted quotations is unhelpful. - -sche (discuss) 21:57, 6 October 2011 (UTC)

I just went through a slew of them out of curiosity -- most didn't clearly show the word in question, most appeared to be copy-pasta of the same few quotes, and many were for words where the entry doesn't even have a def and the usex doesn't really provide one either. What a waste of time and effort. -- Eiríkr Útlendi | Tala við mig 06:44, 7 October 2011 (UTC)Reply

123abc/Engirst strikes again as Special:Contributions/2.27.72.125 with his biblical examples. --Anatoli 01:17, 7 October 2011 (UTC)

More biblical examples by a fresh IP address Special:Contributions/2.27.73.100. --Anatoli 00:35, 10 October 2011 (UTC)Reply

Japanese and Korean affixes

Japanese Wiktionary hasn’t been using a hyphen for Japanese affixes, and they decided officially not to use it (→ ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞). Korean Wiktionary has already decided not to use a hyphen for Korean affixes either (→ ko:위키낱말사전:자유게시판#접두사 및 접미사 and ko:위키낱말사전:자유게시판/2010-12#접미사에 하이픈을).

The affixes with a hyphen in the following categories must be renamed, except the ones written with Latin letters.

Although page names must follow the rule strictly for the sake of interwiki links, entry names in a page can have a hyphen. — TAKASUGI Shinji (talk) 04:29, 5 October 2011 (UTC)Reply

Note: we also have the option (if our Japanese and Korean editors prefer to include the hyphens in the page titles) of creating unhyphenated pages as redirects, and asking the Japanese and Korean Wiktionaries to create hyphenated versions as redirects. This is how en.Wikt and de.Wikt (which use l') link to and from fr.Wikt (which uses l’). - -sche (discuss) 05:03, 5 October 2011 (UTC)Reply

FWIW, I prefer hyphenless headwords, but that might just be me. :) -- Eiríkr Útlendi | Tala við mig 05:10, 5 October 2011 (UTC)Reply

I have no preference. - -sche (discuss) 05:23, 5 October 2011 (UTC)Reply

I have no preference either, and I made most of them and hyphenated a lot of them. I can go back and make the necessary changes if we decide to go without hyphens. That's cool. The other languages linked to Category:Japanese suffixes, such as fr:Catégorie:Suffixes_en_japonais, have no hyphens. Only English. There's only one complication I can think of--Template:suffix and Template:prefix automatically add hyphens. Redirect from [[-affix]] to [[-affix]]? Stop using them? Anyway let's vote at Wiktionary:About Japanese. Another newbie like me might make the same mistake. Haplogy 05:32, 5 October 2011 (UTC)Reply

As you know, I’m talking only about Japanese and Korean affixes. Japanese and Korean Wiktionarians use a hyphen for affixes in languages written in the Latin alphabet, just like English Wiktionarians. — TAKASUGI Shinji (talk) 05:50, 5 October 2011 (UTC)

We could add a nohyphen= parameter to {{suffix}} et al, or create {{ja-suffix}} etc. - -sche (discuss) 05:53, 5 October 2011 (UTC)Reply

{{suffix}} already has a language switch, we could easily add ja and ko to it. Mglovesfun (talk) 11:04, 5 October 2011 (UTC)Reply

{{suffixcat}} would need to be changed as well. —CodeCat 11:14, 5 October 2011 (UTC)

So it looks like the consensus is Japanese and Korean affixes should have no hyphens. Wiktionary:About Japanese does not address the issue, so are there any objections to making a vote to add a section called Affixes with this information? The page says any changes must be put to a vote so I guess I can't just change it myself without a vote. I assume this means that counters should not be hyphenated as well. AJA is unclear--change that too? Haplogy 13:36, 5 October 2011 (UTC)Reply

I'd suggest that no vote is needed, since everyone seems to agree. Mglovesfun (talk) 13:41, 5 October 2011 (UTC)Reply

In that case, I'd like to make that change if there are no objections. Please take a look and change the wording or whatnot if necessary. The sole alterations in extant text are that I removed the hyphen from " e.g., -本" under Counter word (助数詞) and added "Do not use a hyphen" to Counter word, and changed Counter word to Counter words since every other POS header is plural there. @Dan: The argument on the Japanese beer parlour is mainly that hyphens are not customarily used in Japanese, and that other languages should follow suit for consistency. If there was anything else I didn't get it, but consistency is good enough for me. Haplogy 15:18, 5 October 2011 (UTC)Reply

Things were more complicated: counter words are traditionally classified as suffixes in Japanese but as nouns in Korean, even though they function quite similarly. Now we don’t have to show the disagreement. — TAKASUGI Shinji (talk) 00:43, 7 October 2011 (UTC)Reply

What is the reason to use no hyphens for Japanese and Korean affixes, while we customarily use hyphens for English affixes? I speak no Japanese, so I cannot read any rationale provided in the Japanese Wiktionary. --Dan Polansky 14:09, 5 October 2011 (UTC)Reply

I urge that no action be taken yet: The 'consensus' referred to above is over the span of but half a day! My initial instinct is that ja and ko be treated the same as en, but I await an answer to Dan's question.—msh210℠ (talk) 15:53, 5 October 2011 (UTC)Reply

Haplogy added an answer above to Dan's question, but I'll chime in too and note that Japanese does not use hyphens at all -- those *exceedingly* rare situations where I've seen a hyphen used in Japanese text, it was used precisely because it looks unusual and out of place. No monolingual Japanese dictionary that I've ever seen uses hyphens. Bilingual dictionaries that I've seen appear to be a bit more varied, suggesting no hard-and-fast convention but rather editor preferences. My gut instinct is to follow the JA WT decision, partly for consistency and partly from my perspective that hyphens in Japanese just seem wrong somehow. -- HTH, Eiríkr Útlendi | Tala við mig 16:32, 5 October 2011 (UTC)Reply

I don't really support either option over the other one at this point, but if I had to defend the support of hyphens of Japanese affixes in English Wiktionary, it would be thus: The use of a hyphen before or after a term is immediately understood by English speakers as indicating an affix. Thus, it makes sense to use hyphens with Japanese affixes in English Wiktionary, even if Japanese Wiktionary decides not to use them. A notable feature of the decision of Japanese Wiktionary (ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞) is that only two people voted in support (Mtodo, and Goat), with presumably TAKASUGI Shinji (talk • contribs) having proposed the whole thing and thus implicitly having voted in support, making up only three people in total. --Dan Polansky 17:01, 5 October 2011 (UTC)Reply

Dan makes a good point here -- EN WT is targeted at readers of English, something some of us (myself included) occasionally lose sight of when getting our heads deep into our other languages. Consider me back on the fence for now regarding this issue. -- Eiríkr Útlendi | Tala við mig 17:09, 5 October 2011 (UTC)Reply

Before I started working on affixes, most of them did not have hyphens, and that leads me to think that I have been the only person to use them with Japanese. Eirikr and I are the only particularly active editors right now in Japanese that I know of, and Eirikr is more knowledgeable than me, so I thought that was consensus enough. I've already changed AJA, too early it seems. I noticed that Goat cited EN WT's decision to delete トランス-[9], but it was deleted for a completely unrelated reason. By the way Category:Mandarin_suffixes uses hyphens most of the time. Hyphens or none are both okay by me, but not half and half as they are right now. Haplogy 18:04, 5 October 2011 (UTC)Reply

Just for clarification, I didn’t propose to stop using a hyphen; it was already a de facto rule not to use it. I just proposed to make it official on Japanese Wiktionary. I don’t think the number of voters matters a lot. — TAKASUGI Shinji (talk) 00:14, 7 October 2011 (UTC)Reply

In light of Eiríkr Útlendi's comments, I lean towards deleting the hyphens (from entries and headwords), so that users of Wiktionary have the correct impression that hyphens are not used in Japanese. We could use etymology sections or usage notes to note that the prefixes etc are prefixes etc, like this. - -sche (discuss) 19:07, 5 October 2011 (UTC)Reply

Is there some alternative to a hyphen that would make sense? One comment above is that the hyphen "looks wrong" in Japanese. Would U+FF0D FULLWIDTH HYPHEN-MINUS look better; for example, Template:Jpan instead of Template:Jpan? (This is similar to what we do with Hebrew, using e.g. Template:Hebr instead of Template:Hebr, though we keep the latter as a redirect.) —Ruakh_TALK 19:45, 5 October 2011 (UTC)

Hmm, my comment about hyphens looking wrong in Japanese is simply because no one in Japanese uses them. They look about as out of place as using Japanese punctuation in English would look、 a bit like this 「sample」 here。 :) I don't think using these different types of hyphen fixes the "wrongness", simply because they're still hyphens, and still look out of place in a Japanese context. -- Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)Reply

I agree that Japanese and Korean affixes should not bear hyphens. The same should be applied to Chinese affixes as well. (btw, there are many more practices in the Japanese Wiktionary which are potentially beneficial here. They disallow romaji, pinyin, or any other romanisation entry; combine Chinese into one header; write wago with kana and kango with kanji, etc. Their entries do look a lot clearer than ours: ja:字, 字) 60.240.101.246 20:23, 5 October 2011 (UTC)

It bears noting that JA WT doesn't need to use romaji because they can safely assume that everyone using JA WT already knows at least kana. We can't make that same assumption here on EN WT with regard to kana, kanji, hanzi, Devanagari, Hebrew, Khmer, what-have-you.

Whether we should allow or encourage the creation of Latin-alphabet entries for languages that traditionally use other writing systems is a different question, but the persistence of many editors suggests that there is a demand for such entries, perhaps in part because of the limitations of the MediaWiki software. For instance, I may know that Hindi and Urdu for formal second-person plural is āp, but if I don't know how to write this using the Nastaliq or Devanagari scripts and can only search for the Latin-alphabet rendering, I am instead directed automatically to a page about Tocharian A and B, with no hint that the pages for آپ or आप even exist. Similarly, if I know that the Mandarin for stone is pronounced shí but I don't know how to input 石, a search for shí would show me just Irish and Navajo, leaving me confused and frustrated, were it not for the editor(s) who added the romanized Mandarin entry to that page.

Until such serious usability shortcomings are addressed, Latin-alphabet renderings are an easy workaround. -- Cheers, Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)Reply

Another complication- so we are going with no hyphens. But is that no hyphens in romaji too? For example, for the suffix 会 do we have {{ja-pos|k|suffix|hira=かい|rom=kai}} or {{ja-pos|k|suffix|hira=かい|rom=-kai}}? User TAKASUGI seems to think that Japanese character pages (kanji and kana) should not have a hyphen but Roman character pages should. TIA Haplogy 04:54, 7 October 2011 (UTC)Reply

I just think they are separate. My understanding is that the use of hyphens is not language-dependent but character-dependent, like spaces, which Japanese don’t use when they write with kanji and kana but they use when they write with Latin letters. Anyway the community should decide it. — TAKASUGI Shinji (talk) 09:00, 7 October 2011 (UTC)Reply

I understand now. That makes sense, using hyphens with romaji but not using hyphens with kanji or kana. AJA has been updated to reflect this and most affixes have been updated per policy as well. Haplogy 17:07, 10 October 2011 (UTC)Reply
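
As an illustration of the convention settled on above (hyphens for affixes written in Latin letters, none for kanji, kana or hangul), here is a minimal Python sketch. It is not taken from any existing template or bot; the function names are hypothetical, and "Latin" is approximated by checking the Unicode character name, which is an assumption rather than a proper script test.

```python
import unicodedata


def is_latin_letter(ch: str) -> bool:
    """Rough check: a letter whose Unicode name mentions LATIN."""
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")


def affix_page_title(affix: str) -> str:
    """Return the page title for an affix under the no-hyphen convention.

    Affixes written with Latin letters (e.g. romaji) keep their hyphen;
    affixes written in other scripts lose leading and trailing hyphens.
    """
    core = affix.strip("-")
    if any(is_latin_letter(ch) for ch in core):
        return affix  # Latin-script affix: leave the hyphen in place
    return core       # kanji/kana/hangul affix: no hyphen


if __name__ == "__main__":
    for title in ["-会", "-kai", "お-", "-님"]:
        print(title, "->", affix_page_title(title))
```

Only the ASCII hyphen-minus is stripped here; other hyphen-like characters would need to be added to the `strip` call if they occur in existing page titles.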

Edit tools for search

Would it be possible to have edittools for the search bar? Right now, we can use them to type special characters in entries, but not when they appear in the title of an entry. If I want to create new Gothic or Proto-Germanic entries I first have to edit an existing page, use the edittools to type the name there and then copy it into the search bar. It's not very convenient that way. —CodeCat 11:18, 5 October 2011 (UTC)

You don't have to use the search bar to create a new entry, though. You can just create a new redlink in your sandbox and then click on it. —Angr 11:36, 5 October 2011 (UTC)

And that's what I said is inconvenient... —CodeCat 12:35, 5 October 2011 (UTC)

Well, you said you had to then copy the name into the search bar. Clicking the redlink is slightly less inconvenient than that, but admittedly still more inconvenient than having the edittools right there at the search bar. I just wonder how much clutter that would create, considering the search bar is present on every page, regardless of whether it's being edited or not. —Angr 12:46, 5 October 2011 (UTC)

Maybe the edit tools could appear in a small menu to the left of the bar, and only appear in a small window below when you click on it? —CodeCat 13:01, 5 October 2011 (UTC)

We used to have a preferences option for this, but IIRC it broke a while ago with nobody having fixed it to date. -- Liliana • 14:39, 5 October 2011 (UTC)Reply

My preferred edittools character set does not appear under the search box by default for me, but can be made to appear and persist if I select a different character set and then select the one I prefer. It would be nice not to have to bother, but this is just two clicks. I am unsure how long the edittools characters persist. DCDuring TALK 15:57, 5 October 2011 (UTC)Reply

Apparently it disappears after each save. DCDuring TALK 15:58, 5 October 2011 (UTC)Reply

Yeah, we really need a way to type special characters in the search bar. I'm thinking a little popout keyboardy thing next to the search bar. Started working on a script at User:Yair rand/keyboards.js. --Yair rand 00:44, 7 October 2011 (UTC)Reply

Gtroy sockpuppets

Does the community want me (or any other sysop) to continue to block sockpuppets of the permanently blocked User:Gtroy? His latest ID was User:Totallynotfairbro, most of whose contributions seemed reasonable (but he still forgets basic formatting issues from time to time). SemperBlotto 07:22, 7 October 2011 (UTC)Reply

Not a very useful comment, but I don't know why he needed sockpuppets. He seemed to me to be slowly gaining respect after a bad start, then decided whilst not blocked (though has been blocked since) to create a load of supplementary accounts, even working off two accounts simultaneously. Mglovesfun (talk) 07:27, 7 October 2011 (UTC)Reply

I suggest we allow him to edit, for now. His pronunciation of beefcake is interesting; many entries are borderline SOP, but... we have RFD for those, and his other pronunciations are OK. - -sche (discuss) 07:38, 7 October 2011 (UTC)Reply

I suggest we unblock Gtroy (talk • contribs), his primary account, give him a stern warning and indef block if said warning is not sufficiently adhered to! Mglovesfun (talk) 07:49, 7 October 2011 (UTC)

I agree with Gloves. He's not a perfect editor, but does more help than harm. Also, chasing sockpuppets can last for ever. --Rockpilot 08:03, 7 October 2011 (UTC)Reply

I agree, I think his entries are mostly quite good, and most of the problems seem to be typos or relating to complex formatting issues. He’s new here and does not know how seriously Wikimedia views legal threats. He should be warned about that. He seems to be allergic to Ric (like Ric was allergic to Razorflame). I don’t really understand this personality friction very well, but I suspect if they knew each other just a little better, they would be friendly. Gtroy takes Ric entirely too seriously. The pronunciation at beefcake, while interesting, is not, I think, very useful and probably should be replaced with a plain vanilla model. —Stephen ^(Talk) 09:58, 7 October 2011 (UTC)Reply

Haha just heard this, sounds like the death metal interpretation to me. Yeah, should be deleted. Mglovesfun (talk) 10:08, 7 October 2011 (UTC)Reply

I agree, unblock and mentor. bd2412 T 20:26, 13 October 2011 (UTC)Reply

Just to be clear on this, Gtroy = Wonderfool, right? Or at least, Totallynotfairbro = Acdcrocks = Rockpilot = Wonderfool (whether or not Gtroy = Totallynotfairbro). - -sche (discuss) 08:20, 12 October 2011 (UTC)Reply

Nope, Gtroy appears to be another user entirely. -- Liliana • 10:04, 12 October 2011 (UTC)Reply

I am 99% sure WF doesn't have an American accent, and GT was recording new audio with one, so no, he isn't. Equinox ◑ 20:32, 13 October 2011 (UTC)Reply

My suspicion was aroused because (Rockpilot=Wonderfool) and (Acdcrocks=Totallynotfairbro=?) both nominated words on the WOTDN talk page (Wiktionary_talk:Word_of_the_day/Nominations). I suppose it's simply that Gtroy could have seen WF do it and decided it was a good idea (since neither could edit the semi-protected WT:WOTDN page itself). - -sche (discuss) 21:11, 13 October 2011 (UTC)

User sent yet another email to info-en@wiktionary.org about this block; ticket 2011101210018795. I explained the problem of legal threats to him before but I'm not getting involved in this. I'm going to be blunt and state that en.wiktionary is poorly set up for me to suggest avenues with which to request consideration of an unblock by anyone but the original administrator. There is no {{unblock}} template at all and if you compare MediaWiki:Blockedtext to w:MediaWiki:Blockedtext, it's pathetic. The stated advice to email the OTRS team is not the correct course of action. Your email address is not necessarily monitored by admins at Wiktionary to handle this sort of thing and you leave me no options to provide to the user. Adrignola 03:52, 13 October 2011 (UTC)

How about a vote of confidence, by all the admins, on whether I should be blocked or not, one that takes everything that both I and Dick have said and done into account, with a public comment period? Catch22 09:11, 15 October 2011 (UTC)

I've updated our MediaWiki:Blockedtext a bit, and recreated the (previously deleted!) {{unblock}} template. - -sche (discuss) 05:57, 13 October 2011 (UTC)Reply

Thanks. Adrignola 16:09, 13 October 2011 (UTC)Reply

This is Acdcrocks/Gtroy. I got blocked by Dick again but with no cause; he seems to have willfully ignored this entire discussion and only blocked me for "sockpuppetry" even though I have not created any new accounts. I would like to maintain the ACDCrocks account and be able to maintain a contributions history and watchlist in one place. I can't place unblock on my talk page as me because I am blocked from editing my own talk page, and I also cannot e-mail any users as I am blocked from doing that too. 71.142.74.66 21:07, 13 October 2011 (UTC)

For the record, I blocked Troy (Gtroy/ACDCrocks) indefinitely because he started making weird legal threats at me. No matter how furious I get with other editors (and it does happen, believe it or not) I never lose my mind enough to threaten them. Troy doesn't handle criticism well, constructive or otherwise, doesn't seem to take direction well, and I don't know if he'll ever quite understand the criteria for inclusion - the sum of parts issue in particular. In my opinion, the pros of letting him stay are outweighed by the cons. The quality of his editing weighed against the content of his apparent character... doesn't inspire me. Say what you will about my personality, but I do kickass work, and I listen when you say "hey you did this wrong asshole". (PS Troy, I wasn't ignoring this topic - I just didn't know it existed. I don't frequent the BP. I tend to have more constructive things to do.) — [Ric Laurent] — 22:58, 13 October 2011 (UTC)Reply

I made one such claim after Dick made some very offensive and vulgar insults at me and he continues to use the most uncouth and incendiary rhetoric about me whenever possible. Not much class there. He is using his admin powers despotically and is insincere in his claims of being the victim in this situation. I handle criticism very well; what I didn't do at first was understand how Wiktionary differed from Wikipedia, but I did figure that out over time, and I learned a lot from the suggestions of others, particularly SemperBlotto and Equinox. Dick's comments here really just show that he does not like the items that I have added. Instead of taking them to verification and deletion, he is justifying blocking me for not sharing his opinion of sum of parts and his exclusionary worldview by pointing to the restraining order comment. But from his own narcissistic comment preceding this one, it's clear to me that his true reason for blocking me was the ulterior motive of disliking my lexicographic style and my person. I think when there is clearly just a personality conflict it should be left to the community to decide what to do, not to either of the parties involved. Catch22 09:11, 15 October 2011 (UTC)

I only created this account because acdc rocks has been blocked. I am not trying to be a sockpuppet and I in no way deny I am Troy McCormick / gtroy / acdc rocks. Catch22 09:13, 15 October 2011 (UTC)
- I'm as good as my word, I warned you, you reoffended and I blocked you with an expiry time of infinite. Mglovesfun (talk) 11:44, 15 October 2011 (UTC)Reply

- - I think that is what you don't understand. I did not reoffend. Dick did not consider this community discussion a valid reason for me to be unblocked so he blocked my account again. This discussion seemed to me to decide I should be allowed to stay and just that I should be warned about sockpuppetry and legal threats. I didn't make any more legal threats nor was the original one in any way serious. But since my account was blocked, I could not bring it up here. I could not e-mail anyone but the info@wiktionary e-mail address and they said basically they changed the rules and now they don't take e-mails. They said to place an unblock template on my talk page, which I did. But no response. My IP address got blocked too, so I created a new account so that I could participate in the discussion. Taking into account that your word said I could stay (ACDCrocks) if I did not create any accounts, but that I was blocked anyway because Dick considers this discussion non-binding, doesn't that make the "I can stay" part null as well, making it pretty fair and logical that I would create another account? I didn't deny it, I did not try to be a sockpuppet, I outed myself right here in fact. And I chose catch22 as the handle to show the position I was in. You guys did not meet your end of the bargain that I could stay. And you say I was disruptive and harassing, but I wasn't; look through the edits, I did not conflict with anyone or ever even contact Dick in any way with the ACDC account. I did contact everyone involved in this discussion to get to the bottom of it and participate in the discussion. I also take offense at the fact that you think me quoting other users that Dick "sucked" when we first started here was any sort of insult. It wasn't. I was trying to relate to him. In any case how could you consider that an insult when he says things like "People who suck dick are troublemakers" when talking about me. That is outright vile in comparison.
So in closing, are you as good as your word? And did I reoffend? I don't think so. I was as transparent as could be.

?? Anybody at all? 71.142.74.66 17:48, 18 October 2011 (UTC)

- This is really quite ridiculous. So far as I can tell, there has been only one offense mentioned and proven, and it was a small offense of a nonserious nature that only required a warning and explanation, and a retraction by Troy. The warning and explanation were given, the retraction was made and an apology. The other accusations (like "started making weird legal threats" = multiple threats...no evidence of multiple threats has been shown) and "you reoffended" (again, no evidence or explanation of what the "offense" was). Those who are blocking Troy are bullying him and abusing your administrative powers; you are not giving him or us a reasonable explanation of your actions; and you are not allowing him any measure of due process. —Stephen ^(Talk) 18:03, 18 October 2011 (UTC)Reply
  - I've unblocked User:Acdcrocks. - -sche (discuss) 18:12, 18 October 2011 (UTC)Reply

Twice-borrowed terms

We have categories for twice-borrowed terms, which are words that were borrowed into another language and then later borrowed back from that language into the language they originated from. I've been adding Dutch words to this category but there is a question I have. At what point can you consider something 'the same language'? I would consider Frankish (the source of many French words) a form of Dutch, so any of the French words of Frankish origin that were borrowed into Dutch later would be twice-borrowed terms. But is a word that was borrowed from Old Norse into Norman French and then from French into modern Norwegian a twice-borrowed term? What about words that were borrowed from Proto-Germanic into Latin and then from Old French into Middle English? —CodeCat 12:52, 7 October 2011 (UTC)

Just a simple comment: The Wikipedia article w:Reborrowing explicitly has an example of a term that is twice-borrowed because its derivations are "Old Norse → English → Swedish". --Daniel 19:19, 18 October 2011 (UTC)Reply

Usability of translation tables

Translation tables are currently actual tables in HTML, but they don't actually contain tabular data. The two-column layout is nice for people with wide screens, but for those who have less width available it's not really convenient. I also noticed that the 'mobile view' feature still shows the translations in two columns, like here. This is obviously less than ideal for people using mobile phones. I'm not quite sure how this could be improved, but I would like there to be at least some kind of option to show the translations in one column (and in <div> if possible). —CodeCat 13:13, 7 October 2011 (UTC)

Hmm, yes, it's been a while since I've messed around with CSS and such, but isn't there some way of specifying the minimum and maximum widths of a display element? Would it be possible to rework things like {{translations}} and {{der-top}} to allow for dynamically resizing these lists into however many columns fit best on the user's screen? -- Suddenly feeling the urge to break open my HTML references, Eiríkr Útlendi | Tala við mig 16:47, 7 October 2011 (UTC)Reply

...-based pidgins or creole languages

Look at the beginning of Category:Pidgins and creole languages and you'll see what I mean. I never put a high value in these "pidgin/creole by source language" categories, and this proves excellently why they're pointless - some languages have so many conceivable sources that you can put them in five or six of these categories. (heck, Category:Gullah language has 15 source categories!) Therefore, I propose to delete them, and no longer categorize creoles by source languages. -- Liliana • 16:33, 8 October 2011 (UTC)Reply

I think that we should only categorize by superstratum languages. Creoles have so many substratum languages that it's very hard to identify them all. —Internoob 21:25, 10 October 2011 (UTC)Reply

Vote on banning Latin-containing Mandarin

Some thoughts on Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters, in a separate thread.

The vote seems to be a response to the reckless activity of 123abc (talk • contribs) aka Engirst (talk • contribs). The vote seems unneeded to me, going overboard. The reckless activity of the user can be checked by changing the RFV procedure for Mandarin terms containing Latin as follows:

A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

This would be a change of procedure rather than definition of what is included in Wiktionary, a change concerning only a well-defined subset of would-be Mandarin terms, many of which are unlikely to be attestable. The process simplification would be major: instead of sending terms created by Engirst to RFV one by one, admins could speedy delete such terms. The only place in which the citations would be collected for these entries would be citations namespace, so the mainspace entry could remain deleted until the citations are provided. --Dan Polansky 07:53, 9 October 2011 (UTC)Reply
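
The proposed rule can be read as a simple predicate. The following Python sketch only illustrates that reading; the function names and the `has_attesting_citations` flag are hypothetical, and "Latin letters" is approximated by looking for LATIN in each letter's Unicode name.

```python
import unicodedata


def contains_latin_letters(title: str) -> bool:
    """True if any letter in the title has LATIN in its Unicode name."""
    return any(
        ch.isalpha() and "LATIN" in unicodedata.name(ch, "")
        for ch in title
    )


def may_speedy_delete(title: str, language: str,
                      has_attesting_citations: bool) -> bool:
    """Apply the proposed procedure to a single entry.

    An entry qualifies for speedy deletion only if it is marked as
    Mandarin, its title contains Latin letters, and its citations
    page does not already hold attesting citations.
    """
    return (
        language == "Mandarin"
        and contains_latin_letters(title)
        and not has_attesting_citations
    )


if __name__ == "__main__":
    print(may_speedy_delete("卡拉OK", "Mandarin", False))
    print(may_speedy_delete("卡拉OK", "Mandarin", True))
    print(may_speedy_delete("石", "Mandarin", False))
```

Under this reading an entry such as 卡拉OK is protected as soon as its citations page is populated, while entries in other languages are never affected.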

This appears to be a good idea. Introducing exceptions to the CFI rules is unwise, because it makes things more complex (see KISS principle) and less neutral. Banning a whole class of entries because of a user is very unwise (it's like closing the project because of vandalism). Also don't forget that users specializing in some categories of entries help very much, especially when they specialize in uncommon terms, less likely to be addressed by other editors.

I would propose the same procedure change for all infinite series such as numbers or the like. Lmaltier 08:10, 9 October 2011 (UTC)Reply

I don't think you should delete 卡拉OK. Fugyoo 08:21, 9 October 2011 (UTC)

卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think? --Dan Polansky 08:28, 9 October 2011 (UTC)Reply

(after 3–4 edit conflicts, haha) We should possibly say "non-Hanzi" in place of "Latin letters" (to exclude Cyrillic, Greek etc), but amending procedure in this way is a good, practical idea. Should we generalise it to all languages? (Ie, speedily delete any mixing of scripts? I can think of arguments in both directions, though the arguments in favour of generalisation are more hypothetical: someone could create a flood of inيظ#English entries.) - -sche (discuss) 08:15, 9 October 2011 (UTC)Reply

Good point, Fugyoo. Maybe we should just be direct (without a vote to make it any formal part of policy or procedure, just using our common sense) that it is a single editor whose contributions we have reason to doubt, while we would allow a month at RFV for doubtful terms from other editors? We'd use the same common sense to speedily delete any flood of inيظ#English entries. - -sche (discuss) 08:38, 9 October 2011 (UTC)Reply

I would keep the procedure as narrow and non-generalized as possible, tailored to check Engirst. Thus, I would go for "Latin letters" and for "Mandarin". I would not oppose a generalized procedure, though. A more general procedure needs more testing and is more likely to have unexpected side effects. --Dan Polansky 08:53, 9 October 2011 (UTC)Reply

You want to ban riemannsche ζ-Funktion? -- Liliana • 14:16, 9 October 2011 (UTC)Reply

No, of course not. English and Mandarin use foreign letters, if they have to. Both α粒子 and α-particle are perfectly OK but when they are transliterated, they are transliterated using native scripts - Template:Hans (ā'ěrfā lìzǐ) and alpha particle. --Anatoli 23:42, 9 October 2011 (UTC)Reply

I think Liliana was directing that comment at me, anyway, for asking if we should make the rule apply to all languages. I was only asking, though, and I see the arguments against making it apply to all languages are convincing. - -sche (discuss) 23:53, 9 October 2011 (UTC)Reply

卡拉OK and OK and a few others will be kept; they are legitimate exceptions and they are common nouns. The vote is about proper nouns, not common nouns. Common nouns containing Latin proper nouns in full, in particular Planck常數, will not be allowed either. Proper nouns containing Latin or other letters that were invented by Chinese speakers will be allowed as well. It's all on the page. If we all agree to soft-redirect, there won't be a need for the vote. --Anatoli 09:12, 9 October 2011 (UTC)

Re: "The vote is about proper nouns, not common nouns": Wrong. From the vote: "This vote only affects proper nouns and common nouns using non-Chinese proper nouns as part of a common noun [...]". "Planck常数" and "Alzheimer病" are common nouns.

Are there any people who oppose having the soft-redirects?

What do you think about the speedy-delete procedure for Latin-containing mixed-script Mandarin terms? --Dan Polansky 09:49, 9 October 2011 (UTC)Reply

If you reread my comment I mention common nouns containing proper nouns in full - Planck and Alzheimer are proper nouns. We have one person, native Chinese speaker opposing soft-redirects. Speedy-delete procedure? Good idea. Forms like Thames河, London市 should be deleted on sight. If we agree on soft redirects, we have the common and standard Chinese term and somebody insists on having them, they could be converted to soft redirects. This practice should not be encouraged, native Chinese people don't consider them Mandarin. Borrowings are transliterated or translated into Chinese characters, exceptions are abbreviations. --Anatoli 10:10, 9 October 2011 (UTC)Reply

You would do well to ensure that every sentence you say is true. It is a poor practice to expect me to correct one of your sentences from a later sentence. The sentence "The vote is about proper nouns, not common nouns.", ending in a full stop, is false, and you should acknowledge as much.

If the only person who opposes soft-redirects is 60.240.101.246, there is nothing to worry about: he is a self-proclaimed prescriptivist, who wants to protect the purity of language. --Dan Polansky 10:41, 9 October 2011 (UTC)Reply

I disagree. 60.240.101.246 is a native speaker. It's not prescriptivism, it's common sense. There is no real equivalent of "Alzheimer病" in English I can quote but think of errare humanum est. Is it attested? Yes. Is it used by English speakers and writers? Yes, a lot. Is it English, though? No. You and Engirst are using citations as a weapon to introduce words into Mandarin, which don't belong there. --Anatoli 23:42, 9 October 2011 (UTC)Reply

As I've said many times in the course of the debate, I don't care what language they're listed under, so long as someone can look them up. You want to rip them out of the dictionary as a whole, which is against the spirit of a multilingual descriptive dictionary. I'll see your errare humanum est and raise you noli illegitimi carborundum. Is it Latin? Certainly not. It doesn't look like English. So should we delete it from our dictionary and screw over all the users who might want to look it up?--Prosfilaes --70.180.206.122 09:29, 10 October 2011 (UTC)Reply

Of course, errare humanum est is English. Of course, if it's used in English, it should also get an English section. It's useful, because it's an indication that it's used in English, and for pronunciation (I suspect it's not pronounced the same in English and in French, but I don't know how it's pronounced in English). The most popular French dictionary (Petit Larousse) has a famous section (pink pages) about these foreign phrases used in French. The principle of having a section for a language whenever the term is used in that language is a very sound one (and the only possible principle if we don't want to be subjective). Lmaltier 17:13, 10 October 2011 (UTC)

I would be careful about generalizing and banning all mixed scripts in all languages. Some modern languages mix scripts as standard practice. Examples include some of the Caucasian languages that prefer to use Latin I instead of Cyrillic Ӏ; Ossetic prefers Latin æ to Cyrillic ӕ; and Chuvash prefers ă/ĕ/ç to Cyrillic ӑ/ӗ/ҫ. I know that some of us think we should force everyone in the world who uses a non-Roman script to adopt the recently devised Unicode Consortium ranges to write their languages, excluding all exceptions, but really, the native speakers and writers of each language do have a right to come to an agreement with each other to use the letters and code points that they decided upon. And in technical usage, it is not uncommon to find terms such as u-bend translated into some non-Roman script languages with the Roman letter u. There are many, many valid exceptions to a rule to ban all mixing of scripts. —Stephen ^(Talk) 09:48, 9 October 2011 (UTC)Reply

The vote doesn't concern languages other than Mandarin, especially if it's the norm for these languages to mix scripts. There are valid exceptions in Mandarin (and other languages) as well. 三K黨／三K党 (Sān-kèi-dǎng) and 三K党 (Sān-kèi-dǎng) (Ku Klux Klan) are perfect examples of Mandarin proper nouns containing Latin letters. They are Chinese inventions. --Anatoli 10:10, 9 October 2011 (UTC)Reply

Yes, I know, but -sche suggested making this a blanket ban against all mixing of scripts, asking, "Should we generalise it to all languages?" —Stephen ^(Talk) 10:21, 9 October 2011 (UTC)Reply

The vote is complicated as is, no need to generalise. I won't agree to generalisation. I think -sche meant Japanese. There is no current controversy there. The few exceptions are known and no-one is pushing unwanted mixed-script terms. --Anatoli 10:29, 9 October 2011 (UTC)Reply

Good points. Keep it specific to Chinese (perhaps even use our common sense not to speedily delete but to RFV existing Chinese entries which we know are good but which are not cited, as the vote does say they "can" be deleted, not that they "must" be). - -sche (discuss) 20:21, 9 October 2011 (UTC)Reply

Why are Banach空间, Banach空間 and Hilbert空间 deleted? They are cited. Please see here, here and here. 2.27.73.100 22:41, 9 October 2011 (UTC)Reply

Why should we reply to you when you never reply to anyone? Anyway, for others, the only compromise the majority of Chinese speaking editors except for one native speaker - Special:Contributions/60.240.101.246 (he is outright against such entries), could reach is a soft redirect, like this one Planck常数, provided the correct Mandarin entry exists. That, of course excludes, city, park, state, people, whatever names entirely in Roman letters with or without qualifiers, "London#Mandarin" or "London市#Mandarin" will be deleted on sight. As your entries are all bad - no value in them, close to 100%, we may use bulk delete of all your entries, under any IP-address you use for the sanity of Mandarin entries. I don't think there will be strong opposition to expelling you completely and deleting all your "work" in one go. --Anatoli 22:55, 9 October 2011 (UTC)Reply

@2.27.73.100/Engirst: FYI, citations have to be formatted correctly and placed in the entry or the Citations: page, it isn't enough to link to a Google search. Raw Google results are not acceptable citations, anyway; citations (for any word in any language) must be durably archived, which in practice means you should look on Google Books and Usenet (which you can access via Google Groups, but notice that not all Google Groups are Usenet groups). WT:" tells you how to format a citation of a Book, and you can look at entries like rainburn to see a common format for citing Usenet posts. - -sche (discuss) 23:12, 9 October 2011 (UTC)Reply

Understanding of what is a good Chinese entry will now unfortunately differ, as we now have advocates of 123abc's entries with no knowledge of Mandarin. "Ohm定律" and "Planck常数" are bad enough, but next will be place and personal names in Roman letters, entirely or with place name qualifiers. In 123abc's point of view "London" or "London市" is also a Mandarin word. --Anatoli 22:26, 9 October 2011 (UTC)

Nah, we'll delete "London" and "London市" (unless someone proves it means something non-SOP); there's been overwhelming consensus on both of those points, because we have London#English already, and because "London市" is sum-of-parts, just like "London city" or "the city of London" would be, except in the narrow, uncommon sense of that term. The "existing Chinese entries which we know are good" I referred to above are entries like "卡拉OK", which Fugyoo brought up. - -sche (discuss) 22:51, 9 October 2011 (UTC)Reply

Alright, to codify the two ideas we've reached agreement on (soft redirects, and speedy deletion), I will start a Wiktionary:Votes/ page for the soft-redirect policy vote, and then invite everyone to tweak and improve my wording. Dan, would you set up the vote on changing RFV procedure? :) - -sche (discuss) 00:02, 10 October 2011 (UTC)Reply

Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. Discuss the vote's wording on the talk page, please, where I ask several questions. - -sche (discuss) 02:49, 10 October 2011 (UTC)Reply

I answered some questions, renamed to "Mandarin", added some comments and made some changes. We need to describe the criteria for the established and standard Mandarin terms containing Latin, Greek, etc. letters. --Anatoli 03:21, 10 October 2011 (UTC)Reply

@-sche and vote: I would create a vote for my proposal, but I want to let it sit in Beer parlour a bit longer, so people can comment on it, oppose it, and propose changes in wording. I think the discussion should better sit from 3 to 5 days in BP before I create a vote. An updated proposed wording is this:

A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations. Such a term can but does not have to be speedy deleted: each admin can decide to avoid deleting "卡拉OK" in spite of there being no citations in "Citations:卡拉OK".

A deletion summary, which is not part of the vote-to-be, could be this: "Mixed-script Mandarin entry that is not yet attested by quotations in citations namespace; see also WT:Attestation" Anyone please feel free to create a vote if I forget to do so in a couple of days. --Dan Polansky 09:59, 10 October 2011 (UTC)Reply

I don't like "Such a term can but does not have to be speedy deleted". An admin can choose not to delete any file they want, and this sentence gives no protection for when an admin walks by and does delete 卡拉OK. It provides no guidance and doesn't change the rules at all.--Prosfilaes 13:20, 10 October 2011 (UTC)Reply

The second sentence merely highlights the use of "can" rather than "should" in the first sentence, as such distinctions get easily overlooked. It emphasizes that a deletion is not a necessary consequence of missing quotations. The second sentence could be dropped, but it seems to me that it makes the first sentence clearer. --Dan Polansky 13:31, 10 October 2011 (UTC)Reply

I'm not a huge fan of that. I think it better if admins are mop wielders, not deciding whether or not a page is "good enough" to stick around. I also think it provides at best illusionary protection to 卡拉OK; whether that says can or should, an admin can walk by anytime and be fully justified in deleting it. If you want 卡拉OK to stick around, cite it; otherwise accept the fact that your new rule will make it speedyable.--Prosfilaes 13:42, 10 October 2011 (UTC)Reply

Here's an alternative for you:

A term that contains Latin letters and is marked as "Mandarin" should be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

If there's going to be a vote, both alternatives can be offered for consideration. --Dan Polansky 13:51, 10 October 2011 (UTC)Reply

An example of a cited mixed-script Mandarin entry could look like this, Banach空间 (this cited example has been deleted by Anatoli):

Mandarin

Noun

Banach空间 (simplified, Pinyin Banach kōngjiān)

Banach space
- Title Banach 空间结构理論; Author 赵俊峰; Publisher 武汉大学出版社, 1991; ISBN 7307011395, 9787307011397
  Banach 空间结构理論
  
  Banach Kōngjiān Jiégòu Lǐlùn
  Theory of Structure of Banach Space

2.27.73.173 12:30, 10 October 2011 (UTC)Reply

Oh you can talk? No, they will be deleted on sight in this format. That's a general consensus. --Anatoli 12:36, 10 October 2011 (UTC)Reply

Engirst AKA 2.27.73.173, feel free to collect three properly formatted quotations at Citations:Banach空间. However, chances are the entry will be restored only days later: you have been evading blocks and have shown very little cooperation with other editors, so restoring entries that you have created is no priority for Wiktionary editors. --Dan Polansky 13:43, 10 October 2011 (UTC)

The entry should be formatted exactly as Planck常数 ("mixed language") - a soft redirect to 普朗克常数 (Pǔlǎngkè chángshù) ("correct term"), Dan Polansky, you and all editors agreed to this. Banach空间 will not be created before 巴拿赫空间 (Bānáhè kōngjiān) exists. Some didn't agree to this condition. We may need to go ahead with the vote - Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 22:00, 10 October 2011 (UTC)Reply

Alright, let's go ahead and make "卡拉OK" speedily-deletable. (Admins should use common sense in deciding what to delete, and defer to Mandarin-speaking editors when uncertain, but let's accept for the moment the presumption that they will not.) As Dan said, "卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think?" I'll start citing some of them. - -sche (discuss) 22:42, 10 October 2011 (UTC)Reply

Yes, we should save genuine "mixed script" Mandarin entries and make a clear distinction between "mixed script" terms and "mixed language" (code-switching). --Anatoli 23:39, 10 October 2011 (UTC)Reply

An administrator shouldn't use a double standard. Please see here. 2.27.72.128

A revised version of the vote has started: Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 22:14, 17 October 2011 (UTC)Reply

Attestation vs. the slippery slope

I would like again to get the section WT:CFI#Attestation vs. the slippery slope removed from CFI. A previous attempt at Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI ended 5:4:0 for deletion.

I argue that the section is needless and misleading.

The section is needless, as, if it gets removed, the following dialogue covers the case:

Alice: Adding the entry for the particular term "ttt" will lead to entries for a large number of similar terms. Thus, we should delete "ttt".
Bob: That is not a CFI consideration. CFI mandates that a term should be included if it is attested and idiomatic.

Done; no need to list every wrong argument for deletion in CFI.

The section is misleading, as two of its bullet points refer to "common use" and "general use" in contradiction with the "Attestation" section, implying that a term in pig Latin should be included only if it "has found its way into common use". My understanding of how CFI should work is that a term in pig Latin should be included only if it is idiomatic and attestable, regardless of whether it "has found its way into common use".

Do any opposers of the vote find any of this convincing? Are there any new supporters of the removal of the section? --Dan Polansky 08:25, 9 October 2011 (UTC)Reply

For anyone who would want to respond in a poll-like fashion, in which discussion is of course also welcome, here are some templates: {{subst:support}}, {{subst:oppose}}, {{subst:agree}}. --Dan Polansky 08:34, 9 October 2011 (UTC)

I supported (and still support) its removal, as it's not a criterion for inclusion but rather a discussion about what to include and what not to. If anything it's more suited to Wiktionary talk:Criteria for inclusion! Mglovesfun (talk) 11:46, 9 October 2011 (UTC)

Support: I'd like to see WT:CFI clarified, and removing the WT:CFI#Attestation vs. the slippery slope section would help in that regard. -- Eiríkr Útlendi │ Tala við mig 17:58, 18 October 2011 (UTC)Reply

fr:Template:en-nom-rég2

Hello, I propose to merge this template with {{en-noun}}. It automatically displays the plurals and their pronunciations. JackPotte 13:49, 9 October 2011 (UTC)Reply

We don't indicate the pronunciations of words in the headword line of our entries on en.Wikt, though, we indicate pronunciations in the ===Pronunciations=== section. A very large number of English words have at least two different pronunciations (UK and US); some words have eight or more possible pronunciations (of the singular alone!), like pecan. That would require the headword line to be more a headword paragraph! - -sche (discuss) 20:33, 9 October 2011 (UTC)Reply

What -sche said. Jamesjiao → ^{T ◊ C} 03:24, 10 October 2011 (UTC)Reply

Clever thing mind you, it attempts to work out the pronunciation of the plural based on the IPA inputted and it attempts to work out the plural using only the PAGENAME. --Mglovesfun (talk) 12:03, 10 October 2011 (UTC)Reply

I've just finished fr:Template:fr-accord-rég2. JackPotte 18:51, 15 October 2011 (UTC)Reply

And fr:Template:es-rég and fr:Template:pt-rég2. JackPotte 21:27, 16 October 2011 (UTC)Reply

Making 'see also' clearer to users

We use the template {{also}} to show links to other pages that are written with the same letters but with diacritics or capitals. The recent discussion at Wiktionary:Feedback#Prestige shows that this can be very confusing to new users. It has to be added to every page and it's easy to miss a few possibilities or even just to forget to add it. And compared to fr:Prestige, it's just too small and doesn't stand out. It's very easy to miss. The 'see also' text itself isn't really always confusing to users, only if the difference is just capitalisation. When you edit a new page beginning with a capital, like Nonsenseword, the wiki software warns you that the title might not be correct. But there is no warning if the page already exists. So for that reason I think it would be nice if warnings about capitalisation could be automatically added to every page, perhaps even outside the wikitext. —CodeCat 11:59, 10 October 2011 (UTC)

maybe 'the title of this page is {{PAGENAME}}, see also […] '. --Mglovesfun (talk) 12:01, 10 October 2011 (UTC)Reply

That isn't really any clearer at all, it just repeats the name of the page. The problem isn't that users don't see the name of the page, it's that they don't understand the significance of the capitalisation. The current system with {{also}} helps somewhat to clarify this, but it's not very obvious to users and it's not used consistently enough either. —CodeCat 12:05, 10 October 2011 (UTC)

It might help to provide a more visible contrast between say Fish and fish. Mglovesfun (talk) 16:27, 10 October 2011 (UTC)Reply

I'd like to put something Wikipedia does for ambiguous titles, eg "This entry is a name, for other senses see fish" or "This entry is about a German noun, for other languages see prestige". I think a bot could do it. Fugyoo 22:23, 11 October 2011 (UTC)Reply

Maybe have {{also}} display "Entries for similar words:" or similar.—msh210℠ (talk) 00:44, 12 October 2011 (UTC)Reply
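As a rough illustration of how such "similar title" notices could be generated automatically rather than added by hand, here is a minimal Python sketch. It assumes that two titles count as similar when they are identical after removing combining diacritics and ignoring case; the function names and the sample title list are hypothetical and do not describe how {{also}} or the wiki software actually works.

```python
import unicodedata
from collections import defaultdict


def fold_title(title: str) -> str:
    """Reduce a page title to a comparison key.

    Decomposes accented letters (NFD), drops the combining marks,
    and case-folds the result, so "Prestige" and "prestige" share a key.
    """
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def similar_titles(titles):
    """Map each title to the other titles that share its folded key."""
    groups = defaultdict(list)
    for title in titles:
        groups[fold_title(title)].append(title)
    return {
        title: [other for other in groups[fold_title(title)] if other != title]
        for title in titles
    }


# Hypothetical sample of existing page titles.
pages = ["Prestige", "prestige", "Fish", "fish", "play"]
notices = similar_titles(pages)
```

With this sample, `notices["Prestige"]` would list `prestige`, and `notices["play"]` would be empty, so a notice would only appear where a near-identical title really exists.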

Why is Banach空间 deleted?

Discussion moved to Talk:Banach空间.

Bot generation of Portuguese verb forms

I have noticed that there are quite few entries for Portuguese verb forms (mainly some forms generated by WF's bot) and currently no bot dealing with their creation. So I have modified my User:BuchmeierBot code so that it can handle Portuguese verb conjugation tables. I would like to generate the forms of verbs that already have a conjugation table (of course after checking the conjugation for correctness). Should I start a vote? Matthias Buchmeier 16:10, 10 October 2011 (UTC)

No. We trust you. Just go for it. SemperBlotto 16:12, 10 October 2011 (UTC)Reply
- Seconded. Just start slowly and build up speed (I speak from experience). Mglovesfun (talk) 16:24, 10 October 2011 (UTC)Reply

We shouldn't use double standard

- -sche said: "deleted, per the precedent and discussion of WT:RFD#Москва". pizza#Mandarin is deleted, so OK#Mandarin should be deleted as well. Actually their meaning can be found in the English entries, so they are not necessary. 2.27.73.173 18:56, 10 October 2011 (UTC)

You're right, we shouldn't. We don't want to have a double standard for you as opposed to anyone else who won't listen to what people say. —CodeCat 19:01, 10 October 2011 (UTC)

I propose that this post by a banned user who does not cooperate with Wiktionary editors, rarely answers questions but feels himself entitled to start a new BP discussion whenever he sees fit, and possibly cannot even read Chinese characters, be left without any further response. --Dan Polansky 19:02, 10 October 2011 (UTC)

You have NO STANDARDS. That's the problem. You keep inventing stuff AND not being consistent with what you do. You have been blocked again for 3 days for doing this shit: [10]. Jamesjiao → ^{T ◊ C} 21:28, 11 October 2011 (UTC)Reply

Japanese Romaji used "-" for suffixes as well, please see here. So, you use double standard indeed. Alexando 07:16, 12 October 2011 (UTC)Reply

Sorry.. you didn't really respond to the 'inconsistency' part. Besides, why are you comparing Mandarin with Japanese again? Jamesjiao → ^{T ◊ C} 21:22, 16 October 2011 (UTC)

Phrasebook, again

I think Equinox pretty much got it at Talk:I'm transsexual - our phrasebook, as is, is a sick joke. Pretty much half the phrases are about sex (some of them as silly as I'm horny - I mean c'mon, who actually says that?), while actual phrases that you would find in a printed phrasebook (what day is it, can you give me directions, etc.) are curiously absent. This shows it needs some kind of reform, and most importantly a radical pruning. -- Liliana • 22:07, 11 October 2011 (UTC)Reply

Agreed. Having travelled to countries with languages that I speak very little of, I'd say most of the sex-related phrases should be removed, unless you travel for the sole purpose of fornication. Jamesjiao → ^{T ◊ C} 22:14, 11 October 2011 (UTC)Reply

Being transsexual has nothing at all to do with sex, though. —CodeCat 22:31, 11 October 2011 (UTC)

Is it necessary though? I mean, if you were a real transperson, the last thing you would do is disclose it to strangers... no? -- Liliana • 22:33, 11 October 2011 (UTC)

Yeah, it's hardly something you'd just drop into a conversation, is it? One that made me laugh earlier was "I'm mute", as though a mute person could actually say it. BigDom (t • c) 22:39, 11 October 2011 (UTC)Reply

I'm illiterate is a good one as well. -- Liliana • 22:42, 11 October 2011 (UTC)Reply

Well, a mute person could write the phrase. - -sche (discuss) 22:44, 11 October 2011 (UTC)Reply

The pronunciation section is rather pointless though. -- Liliana • 22:46, 11 October 2011 (UTC)Reply

True.. but how hard is it for a mute person to express this notion via body language? I'd bet body language (point at mouth, wave hands) can convey this more swiftly. Jamesjiao → ^{T ◊ C} 22:48, 11 October 2011 (UTC)Reply

It could be said over the internet? —CodeCat 23:11, 11 October 2011 (UTC)

Someone who is recognizably foreign might be misunderstood as trying to convey their inability to speak the local language, rather than their inability to speak at all. (But I agree with DCDuring, below. To the extent possible, we shouldn't be asking "Could this be useful?", only "Is this useful?") —Ruakh_TALK 20:13, 12 October 2011 (UTC)Reply

Does a phrasebook require more constancy of purpose and contributor discipline than we can sustain? Subtle aspects of policy don't seem sustainable for very long here. We seem to be susceptible to anarchism.

If we had users constantly asking us how to say or write phrasebook-type expressions, we could at least focus on meeting the needs of real users. But we have only the vaguest notion of what we are trying to do. Though pruning might be necessary, I doubt that it is the key breakthrough that a phrasebook needs to achieve success at Wiktionary. DCDuring TALK 23:25, 11 October 2011 (UTC)Reply

Agree to the clean up. Also agree that "I'm mute" is useful. We are a written dictionary. You can write or print the translation in another language. --Anatoli 06:50, 12 October 2011 (UTC)

I'll agree as well then; sometimes I think our phrasebook is about who can create the silliest entry without it being deleted. I might go with I'm fucked meaning I'm drunk, I'm tired, I'm disabled/crippled, I'm in trouble (etc.). Mglovesfun (talk) 10:47, 12 October 2011 (UTC)

For the record, Liliana, I say "I'm horny" all the motherfucking time. Frequently there are some qualifiers between the subject/verb and adjective. But yeah. All the time. I'm horny right now, even. — [Ric Laurent] — 11:25, 14 October 2011 (UTC)Reply

Do you frequently feel the need to say it in languages that you don't even speak well enough to construct a simple sentence in? —Ruakh_TALK 13:17, 14 October 2011 (UTC)Reply

Uh... What kind of question is that? Was it even serious, or just meant as some sort of affront? — [Ric Laurent] — 00:44, 4 November 2011 (UTC)

Oh I am so glad you're unable to hit on me. On a more serious note, what about other phrasebooks? I know it's not a CFI rule, but still a good guideline. -- Liliana • 19:53, 14 October 2011 (UTC)Reply

I could hit on you if I wanted. "Unable" is perhaps the wrong word. — [Ric Laurent] — 00:44, 4 November 2011 (UTC)Reply

123abc, again???

See http://en.wiktionary.org/wiki/Special:Contributions/Christofo -- it appears that the many-times-banned user is back, now creating hyphenated pinyin entries as if that is a normal thing to do. Please direct attention to this development. 71.66.97.228 01:13, 12 October 2011 (UTC)Reply

Yes, your diagnosis was correct. Nuked all his entries. --Anatoli 06:46, 12 October 2011 (UTC)Reply

Japanese Romaji used "-" for suffixes as well, please see here. So, you use double standard indeed. Alexando 07:18, 12 October 2011 (UTC)Reply

This has been mentioned multiple times before. Japanese entries have nothing to do with Mandarin entries. If there are issues, they need to be treated separately. Obviously you don't listen. So... I am gonna block you again.. this time, I will not allow you to create new accounts. Jamesjiao → ^{T ◊ C} 03:44, 13 October 2011 (UTC)

He is very good at avoiding all blocks and generating new IP-addresses whenever he wishes. He was blocked multiple times including range blocks. He doesn't have a lot of linguistic or communication skills but he's got that skill.

As for the issue with Japanese, first of all, it's a language policy. If most editors agree to do it one way, it goes, if not, then there's a vote. Japanese editors may be happy to discuss the issues of triplication related to Romaji entries. The Romaji entries usually contain the minimum information, so no one complained and Romaji entries were created ONLY when Kana/Kanji entries were also there. --Anatoli 06:42, 13 October 2011 (UTC)Reply

User 123abc again, again

Blocked, and still edits?

http://en.wiktionary.org/wiki/Special:Contributions/Afex

He's now assiduously adding bible verses (and links) to Mandarin entries. What is wrong with this project that this has happened several dozen times now, over a period of nearly a year? 71.66.97.228 19:54, 12 October 2011 (UTC)Reply

I guess the problem is that {users with enough knowledge of Chinese to deal with his edits} and {users with enough technical knowledge to deal with his edits} seem to be two mutually exclusive sets. (The overlap of these sets with {users with enough time and patience to deal with his edits} may also be relevant.) Previously, I've tried to address this by starting a vote that would reduce the amount of knowledge of Chinese that was necessary — in fact, my goal was to make the formatting for Mandarin pinyin entries so restrictive that it could be enforced by a bot — but Chinese-speaking editors' responses to the vote, while positive in tone, just left me more confused than ever. So maybe I should work on it from the other angle: trying to reduce the amount of technical knowledge that is necessary, in the hopes that that will enable the Chinese-speaking administrators to cope with his edits better. —Ruakh_TALK 20:09, 12 October 2011 (UTC)Reply

The pinyin entries are now as simple as can be (see yánlì); all entries in Category:Mandarin pinyin should be formatted as per Wiktionary:Votes/2011-07/Pinyin entries. If they are automatically created by a bot and the job is good, we should revisit it. The Chinese entries are indeed a bit complicated, notably the "rs" value (radical sort for the initial character), but this info is available in Wiktionary. Anatoli 21:47, 12 October 2011 (UTC)

The main problem is that some people don't understand the function of Pinyin entries, especially for learners, but oppose Pinyin just because they don't like it. For your reference, for a good example of how learners can make use of a Pinyin entry, please see here. Afex 20:41, 12 October 2011 (UTC)

The problem is your unwillingness to engage in dialogue unless things are going your way. You're happy to engage in dialogue when people are agreeing with you, and when people stop agreeing with you, you just clam up. Mglovesfun (talk) 20:46, 12 October 2011 (UTC)

Record of fact cannot be deleted. Please see here. Afex 21:27, 12 October 2011 (UTC)Reply

Do you intend to engage in dialogue? Mglovesfun (talk) 06:26, 13 October 2011 (UTC)

Why not if you like, but don't block me and try to close my mouth first. Sundy 12:26, 13 October 2011 (UTC)Reply

If you want to engage in dialog, use Engirst. As it is, you are abusing multiple accounts, which is against the rules. All other accounts will be indefinitely banned on sight. - [The]DaveRoss 20:13, 13 October 2011 (UTC)

123abc talked on Mglovesfun's talk page, for the first time I see more than one sentence at a time. --Anatoli 22:37, 13 October 2011 (UTC)Reply

Colloquialisms and nonstandard terms

Are colloquialisms considered nonstandard terms? My take is that they are not, hence my edit to Template:lexiconcatboiler/colloquialism. --Dan Polansky 10:04, 12 October 2011 (UTC)Reply

Sort of; I suppose all slang terms, informal terms and colloquial terms are nonstandard. Mglovesfun (talk) 10:53, 13 October 2011 (UTC)

New administrator nomination - User:Haplology and User:Eirikr

Please don't ignore the new nomination - Wiktionary:Votes/sy-2011-10/User:Haplology for admin. He has been very active in Japanese and works quite professionally - Special:Contributions/Haplology.

I also nominated User:Eirikr, another Japanese editor, but he is not available at the moment; the vote will start as soon as he accepts it. --Anatoli 06:58, 13 October 2011 (UTC)

Linking to a particular sense within an entry

Is there any way to link to a particular sense within an entry, rather than to the entire entry? I know how to use the pound sign to link to a section, but for most words (I'm working with Chinese entries) this will only go as far as the section for a particular language, not to the individual senses. I have read about MediaWiki's "subpage" feature, but I don't know if that would work. In particular, I would like to be able to link words in a Wikisource document to the particular sense used in that context. If this is not currently possible, where would I start in proposing this feature, or perhaps in helping to implement it? Craig Baker 20:52, 13 October 2011 (UTC)Reply

You can use {{senseid}} to link to a particular sense. - [The]DaveRoss 21:10, 13 October 2011 (UTC)Reply

There is no documentation for {{senseid}}. How does one link to a properly formatted sense? DCDuring TALK 23:45, 13 October 2011 (UTC)

Essentially it just sets up a span id which you can then refer to like any other anchor. The formatting is as follows: (taken from peach) # {{senseid|en|fruit}}, the first parameter is the language section and the second parameter is a unique (for the page) gloss which is also the name of the anchor when referring back to the sense. To refer back you include the language and gloss [[peach#English-fruit|peach]] resulting in peach. This certainly should be documented at the template too. - [The]DaveRoss 01:36, 14 October 2011 (UTC)Reply
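
As a rough illustration of the anchor scheme described above, the following JavaScript sketch builds the link target from a page name, a language section name and a gloss. The function names `senseLinkTarget` and `senseWikilink` are hypothetical and not part of the template or any gadget. The sketch assumes the anchor is simply the language section name, a hyphen, and the gloss, as in the peach example, and it deliberately refuses glosses containing whitespace because nothing above says how those would be encoded.

```javascript
// Hypothetical helper, for illustration only: it mirrors the
// "page#Language-gloss" pattern from the peach example above.
function senseLinkTarget(page, languageName, gloss) {
  // Glosses containing whitespace are rejected here because this sketch
  // makes no assumption about how such anchors are encoded.
  if (/\s/.test(gloss)) {
    throw new Error('gloss with whitespace is not handled by this sketch');
  }
  return page + '#' + languageName + '-' + gloss;
}

// Build the wikilink text that refers back to the sense.
function senseWikilink(page, languageName, gloss) {
  return '[[' + senseLinkTarget(page, languageName, gloss) + '|' + page + ']]';
}

console.log(senseWikilink('peach', 'English', 'fruit'));
// prints "[[peach#English-fruit|peach]]"
```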

I can't get it to work when the gloss contains spaces; for example, neither among: mingling or intermixing nor among: mingling_or_intermixing works. Does anyone know how to do it? If I find out, should I add the senseid doc page, or should I wait for those involved in its development to write docs? Craig Baker 03:51, 19 October 2011 (UTC)Reply

As no one has stepped up to add the documentation, you might take a run at it. DCDuring TALK 11:31, 19 October 2011 (UTC)Reply

An idea... wanted languages

We have pages for wanted entries, but so far we're lacking a list that shows which languages are in the most need of improvement. For example, our Old Norse coverage is quite bad given its popularity, and there aren't many Estonian entries either. It would be nice to see at a glance which languages need the most work, so that editors (also potential new editors) can see if their skills would be especially needed on Wiktionary. —CodeCat 21:04, 13 October 2011 (UTC)

Some kind of easily available statistics per language would also be good. They won't show the quality of entries or translations, but they would be informative. Also, it may sound harsh for small languages, but what do people think about ratings or a list of "languages in bad need of contributions"? Well, we have few entries in Old Norse, but how important is it? We also have very little Burmese, Lao, Malay, let alone Sinhalese content. These are state languages with millions of speakers, but we have very few contributions in these languages. --Anatoli 23:10, 13 October 2011 (UTC)

Re: "Some kind of easily available statistics per language would also be good": We have Wiktionary:Statistics; and if there's anything that you want that isn't already there, I bet you can convince Conrad to add it. —Ruakh_TALK 03:10, 14 October 2011 (UTC)Reply

Thanks for the advice. After posting, I actually found Wiktionary:Statistics. That's useful. --Anatoli 03:41, 14 October 2011 (UTC)Reply

Why not go ahead and start a draft somewhere? -- Liliana • 03:46, 14 October 2011 (UTC)Reply

It is perhaps worth discussing first what we want to achieve. Will a new policy attract new editors? Having a list of languages in need of improvement is a good start (better than nothing). Statistics may show only the quantity, not the quality.

If the statistics for the last year are accurate, look at the number of entries for some official languages:

Sinhalese - 75
Malagasy - 84
Kazakh - 173
Burmese - 134
Kyrgyz - 152
Malay - 409
Lao - 558

Do we need to advertise? --Anatoli 04:15, 14 October 2011 (UTC)Reply

I adore your Russian bias. Belarusian is very much in need of improvement. -- Liliana • 04:17, 14 October 2011 (UTC)Reply

Not sure whether you were sarcastic, I actually didn't mention Russian or any Slavic languages. Yes, that's right. Belarusian needs improvement but Belarusians themselves do not seem to be worried about their language loss. --Anatoli 04:39, 14 October 2011 (UTC)Reply

The last sentence is so true. 60.240.101.246 09:42, 16 October 2011 (UTC)Reply

No comments? I'm throwing in Interlingue then - for a constructed language, its coverage here is poor at best. -- Liliana • 09:39, 16 October 2011 (UTC)Reply

I've created Wiktionary:Languages needing improvement. —CodeCat 13:08, 16 October 2011 (UTC)

I have added some obvious ones, focusing on state languages. --Anatoli 22:55, 16 October 2011 (UTC)Reply

It's hard to tell which languages are actually needed. Needed in what way, for what purpose? Russian has many speakers, but only a few English words are loans from Russian. The etymology sections of English entries have more use for French and Latin words (and Old Norse) than for Russian words. But another approach is to ask: for which languages could a recruiting campaign yield good results? In that case, we can compare the ranking of each language in WT:Statistics with its ranking in the list of Wikipedias. The Russian Wikipedia is large and growing fast, and there are many Russian wikipedians who could be recruited to Wiktionary. There are far fewer active Arabic-speaking wikipedians, so trying to recruit among them would be less successful. Of the languages closest to me, Norwegian has the healthiest Wikipedia, far larger than the Danish Wikipedia. And Norwegian is still a very small language in WT:Statistics (smaller than Danish), so recruiting among Norwegian wikipedians could be useful. --LA2 18:01, 17 October 2011 (UTC)

Quote: "Russian has many speakers, but only a few English words are loans from Russian. The etymology sections of English entries have more use for French and Latin words (and Old Norse) than for Russian words." What does this have to do with wanted languages? --Anatoli 02:53, 19 October 2011 (UTC)

It has to do with the definition of "wanted" as a measurement of which languages are more wanted than others. If more Russian entries are "wanted", by which definition of "wanted" is that? --LA2 08:39, 19 October 2011 (UTC)Reply

I never said Russian was wanted more than others. We do have Russian speaking editors but very few in other languages. You're answering a question with a question. So, by your logic, contributions into a language close to English or which gave English more borrowings (German, Dutch, Latin, French?) are more important than those further away from English - Chinese, Hindi, Burmese, Sinhalese? --Anatoli 08:51, 19 October 2011 (UTC)Reply

I'm just mentioning etymology sections of English words as one possible definition of a want. Q: Why do we need Latin entries? A: Among other things, to explain the etymology of English words. (This means we would focus on those Latin words that have found use in English.) But by that measurement, the want for Russian entries is not so large. To just list the most "wanted" languages is pointless unless we define the want. If the purpose of the list is to recruit new contributors, then we can skip the definition of a want, and instead list languages for which we might successfully recruit new contributors. Russian would be a good candidate, because there are many active Russian wikipedians, who only need a little extra training to become productive in Wiktionary. --LA2 16:47, 19 October 2011 (UTC)Reply

I'm still puzzled about your idea of a "want for a language" being linked to English somehow - borrowings, etymology of English words, as if other languages only serve the purpose of understanding English words better. Anyway, as was mentioned above, Wiktionary:Statistics shows languages with few contributions, and individual categories like Category:Sinhalese_nouns show how few noun entries (or entries for other parts of speech) Wiktionary has. To be fair to all languages, the number of entries alone could be a criterion for "wanted". As discussed on Wiktionary_talk:Languages_needing_improvement, other criteria could be chosen as well: the number of speakers (Hindi wins over Cebuano); the status of a language, whether official (Sinhalese has fewer native speakers than Kannada in India, but it's official (along with Tamil) and it may be more important to reach out to a whole country) or co-official (do you really need to know Maori, Irish or Hindi to communicate in New Zealand, Ireland or India?); and whether a language is only spoken and seldom written (Hakka and Min Nan have many speakers, but is there enough written material or information broadcast in these languages?). As User:CodeCat said, even official languages with a very small population, like Marshallese, may have lower priority than an unofficial language with a lot of speakers. Is a language or a dialect important for survival in a country? You can't do without Mandarin in China but you can get away without any Indic language in India. I don't mean to lower the status of any language or dialect; these are just some ideas for consideration. --Anatoli 21:59, 19 October 2011 (UTC)

Not the kind of jumper that makes you itch

[11] "He said, I know a little Latin, man a cus man a kai / I said I don't know what it means; he said neither do I". Do any of us know? Sounds more like Greek. Equinox ◑ 21:05, 13 October 2011 (UTC)Reply

Maybe it is w:Manacus, Manacī. —Stephen ^(Talk) 22:45, 13 October 2011 (UTC)Reply

I think it's a garbled version of amicus amici. Fugyoo 00:23, 14 October 2011 (UTC)Reply

Block page spoiled with JavaScript

When you have to block someone, the page now has an extra dropdown box that disappears or reappears depending on your selection. It disappears with a stupid JavaScript "delayed fade" effect. This means you cannot efficiently use the Tab key to move from one UI control to the next. Who makes these retarded decisions? Equinox ◑ 21:39, 13 October 2011 (UTC)Reply

I dunno, the tabbing works pretty O.K. for me. Even if I tab to the control right before it disappears, Firefox remembers my position in the tab order, so if I hit tab again, it moves me to the next field. How does it behave in your browser? —Ruakh_TALK 01:46, 14 October 2011 (UTC)Reply

If I tab while the "tabbee" is in mid-fade, the focus apparently vanishes. It could be a problem with Opera, since the focus should certainly never be on an invisible thing, but I can't be sure exactly where the focus is, and anyhow given the general awfulness and incompatibility of browsers you'd hope that stuff like this would be tested thoroughly. My main objection is that the "fading" is purely a cosmetic gimmick, offering nothing useful (modally hiding controls is nasty anyway — why not disable them?), and yet manages to get in the way. Equinox ◑ 01:52, 14 October 2011 (UTC)Reply

I see. What happens if you hit tab after the fade-out? By the way, I think that if — if — you're going to hide controls this way, then the fading is actually a good idea, since it gives the user time to register what's happening. Otherwise they'll just catch that something changed, but they won't understand what. But yeah, I agree with you that it would be better to just disable the control. We can probably override this somehow with site-wide JavaScript, though I don't know if it's a good idea to do so, since I doubt it's intended to be messed with. —Ruakh_TALK 03:00, 14 October 2011 (UTC)Reply

Duh! I knew something annoying had happened, but couldn't quite figure out what it was. If it ain't broke, don't fix it! SemperBlotto 07:18, 14 October 2011 (UTC)Reply

Brand names and physical products

WT:CFI (WT:BRAND in particular) says this: "A brand name for a physical product should be included if it has entered the lexicon". Some people in RFV (DCDuring, Equinox, and others) have been acting as if the part "for a physical product" were not there, arguing that WT:BRAND is intended to cover banking services, among other things. I have repeatedly argued that, whatever the part of CFI is intended to do, what it actually does is speak only of physical products, which are tangible, space-extended objects with non-zero mass, such as food, clothing, footwear, consumer electronics, and cars, but not software, databases (data collections), books, movies, and the like.

Please, let those who want WT:BRAND to apply to all brand names including "Citibank" and "Lufthansa" create a vote that removes "for a physical product" from CFI's section for brand names. Then the repetitive discussions in RFV are over.

By contrast, I would like to see WT:BRAND removed from CFI. There is IMHO no serious risk of commercial spam relating to inclusion of brand names. Above all, single-word brand names can host interesting lexicographical material, including pronunciation and etymology. --Dan Polansky 07:59, 14 October 2011 (UTC)

Our entry physical doesn't cover it, but I think there's a difference in two senses of physical here. For example is a table physical in the same way that wind or heat is physical? So a website isn't a 'physical product' like a table is, but it can be considered physical in terms of bits on a server, which correspond to electricity (um, I think, I'll let the experts explain it).

Specifically in response to Dan Polansky, I agree that some products are non-physical. Cartoon characters like Mickey Mouse are non-physical. They may have physical representations (toys, etc.) but are by nature non-physical. It would be nice to clean up WT:BRAND and WT:COMPANY. Mglovesfun (talk) 09:57, 14 October 2011 (UTC)Reply

Thanks for raising this issue. I agree that editors have been wrongly trying to enforce WT:BRAND's rules for things that are not physical products — just because something has some physical reality, that doesn't make it a "physical product" — but I support resolving the issue by removing the "physical product" bit. —Ruakh_TALK 11:17, 14 October 2011 (UTC)

Patrolling enhancements now on by default, and now include deletion.

Admins —

I've been bold and made two big changes to the patrolling enhancements. If anyone disagrees with either of them, please either revert, or let me know and I'll revert.

The changes are:

A "delete" button is now added for each newly-created page that has not yet been marked as patrolled. A text field also appears at the bottom of the page; whenever you click an edit's "delete" button, the current contents of the text field will be used as the deletion reason (the edit-summary-like message that appears in the deletion log). For example, if you are an administrator who knows Chinese, you can just visit http://en.wiktionary.org/wiki/Special:NewPages?hidepatrolled=1 every day or two, type something like "Engirst cruft" in the text-field, and go to town.
- The text field looks kind of crappy, and is probably confusing. I welcome any improvements.
- There's no drop-down to choose one of the predefined deletion reasons at MediaWiki:Deletereason-dropdown. Anyone who's better at UI design than I am, please feel free to add this. :-)
The patrolling-enhancement Gadget is now turned on by default for anyone with the "patrol" right.
- If you dislike it, you can turn it off via Special:Preferences: in the "Gadgets" tab, uncheck "Patrolling enhancements – makes it faster and easier to mark edits as patrolled.".
- Also, if you dislike it, please comment here. If it turns out that multiple admins dislike it, then we should probably de-defaultize it.
- Edited to add: Of course, it would be even better if we could improve it so that all admins do like it, if that's possible.

I welcome any questions, comments, suggestions, concerns, threats, . . .

—Ruakh_TALK 14:52, 14 October 2011 (UTC)Reply

Addendum: By the way, I should have mentioned: the code for the Gadget itself is at MediaWiki:Gadget-PatrollingEnhancements.js. The bit of wikitext that turns it on by default is at MediaWiki:Gadgets-definition. —Ruakh_TALK 15:29, 18 October 2011 (UTC)Reply

Thanks!—msh210℠ (talk) 18:05, 17 October 2011 (UTC)Reply

I suspect it just isn't working properly, but shouldn't individual new pages have a 'delete' button next to them, not just a single delete button? Otherwise, how do I know what I'm deleting? A small 'delete' button next to every new page that's also an unpatrolled edit sounds fine to me. But currently, that isn't what this is. Mglovesfun (talk) 15:12, 14 October 2011 (UTC)

For me, in Firefox 7, in IE 8, and in Chrome, I do have a small "delete" button next to individual new pages. What browser are you using? I can try to debug . . . —Ruakh_TALK 15:26, 14 October 2011 (UTC)Reply

I've just tried to delete "super-calli-frage-listic-epi-ali-doctus" with delete reason of "tosh" and I get a message saying that a token must be set. SemperBlotto 15:31, 14 October 2011 (UTC)Reply

Yup, bug. (Introduced during migration from my personal JS to the Gadget's JS.) I noticed and fixed it a moment ago. Sorry about that. :-/ —Ruakh_TALK 15:35, 14 October 2011 (UTC)Reply

I use Firefox. Will clear my cache now to see what the current version is like. Mglovesfun (talk) 15:50, 14 October 2011 (UTC)

I have two patrol buttons and two delete buttons. Using the deletion summary didn't work, it just displayed the default. Mglovesfun (talk) 15:54, 14 October 2011 (UTC)Reply

Re: two patrol buttons and two delete buttons: keeping up my string of excessive boldnesses for the morning: http://en.wiktionary.org/w/index.php?title=User:Mglovesfun/vector.js&diff=14073538&oldid=14039687. Re: deletion summary not working: Oops, thanks, you're right, it doesn't work for me anymore, either. It worked yesterday, though, so hopefully it's a quick fix. —Ruakh_TALK 16:00, 14 October 2011 (UTC)Reply

O.K., that's working now. Thanks again. :-) —Ruakh_TALK 16:08, 14 October 2011 (UTC)Reply

I just tried the delete button in Firefox and in Opera; it worked in FF and in Opera. :) Is the "mark" button intended to be used in conjunction with another feature? If not, it just seems to allow marking as patrolled with checking, which seems odd. (Nonetheless, it works in both browsers.) - -sche (discuss) 17:03, 14 October 2011 (UTC)Reply

I'm sorry, I don't understand the question. What do you mean by "marking as patrolled with checking"? :-/ —Ruakh_TALK 17:43, 14 October 2011 (UTC)Reply

Oops, I mean "without checking". In the past, I had to click on "diff" and look at the diff to find the "mark as patrolled" button. Now, I could just click "mark" in Recentchanges, without checking the diff to see if it was vandalism or not. Why would I do that...? - -sche (discuss) 18:19, 14 October 2011 (UTC)Reply

Ah, I see. You're right, of course, but there are a number of cases where it's useful:

Whitelisting (whereby the button gets "clicked" automatically when you load the page).
- A number of pages in the Wiktionary: namespace are whitelisted. These are pages that are so high-traffic that we don't really have to worry about vandalism going unnoticed and unreverted. Similarly, all pages in the User talk: namespace are whitelisted, as are users' edits to their own user-pages and sandboxes (e.g., in my case, User:Ruakh and User:Ruakh/Sandbox).
- An IP address can be whitelisted, which has roughly the same effect as granting a user the "autopatrolled" privilege (except that it's mediated by this Gadget, rather than being built-in).
- When granting the "autopatrolled" privilege to a user, we can also whitelist him/her temporarily, so that their existing unpatrolled edits can be quickly marked as patrolled.
If there are a bunch of edits to a single page, I can just go to its history, view the overall diff of edits, and if the overall result is O.K., then I don't need to view each individual diff to mark all the edits as patrolled.
If there are a bunch of similar-looking edits by a single editor (e.g., creating thirty Khmer nouns in an hour, with the automated edit-summaries that show you the initial page contents), then I can just look at a representative sample of edits to confirm that there's no funny business going on, then mark a bunch of edits as patrolled in short order.
- I also have code in my own common.js that applies the patrolling enhancements to user-contributions pages, which makes this a bit easier for me. I haven't added it to the Gadget, though, because I'm not sure if it's ready for prime-time.

In addition, you were right that there's another feature that someone (maybe Connel MacKenzie?) intended for it to be used in conjunction with:

If I have Lupin's Popups turned on, then I don't have to actually click on the diff to see what changed. (That's the Gadget whose description reads, "Navigation popups, page previews and editing functions popup when hovering over links".)

but I find that feature very annoying, so I almost always have it turned off. (Still, you might as well try it out and see what it does. Even if you find it as annoying as I do, you still might find uses for it.)

—Ruakh_TALK 18:53, 14 October 2011 (UTC)Reply
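
The whitelisting cases listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `whitelist` object, its contents, and the `isWhitelisted` function are hypothetical and are not taken from the Gadget's code, and treating every subpage of a user's own page as whitelisted is a simplification of "user-pages and sandboxes".

```javascript
// Hypothetical whitelist data; the page title and IP address are placeholders.
var whitelist = {
  pages: ['Wiktionary:Example high-traffic page'],
  users: ['192.0.2.1']
};

// Decide whether an edit would be marked as patrolled automatically.
// edit = { page: string, user: string }
function isWhitelisted(edit, list) {
  // Any page in the User talk: namespace.
  if (edit.page.indexOf('User talk:') === 0) {
    return true;
  }
  // A user's edits to their own user page or its subpages (e.g. a sandbox).
  var ownPage = 'User:' + edit.user;
  if (edit.page === ownPage || edit.page.indexOf(ownPage + '/') === 0) {
    return true;
  }
  // Individually whitelisted pages, and whitelisted users or IP addresses.
  return list.pages.indexOf(edit.page) !== -1 ||
         list.users.indexOf(edit.user) !== -1;
}

console.log(isWhitelisted({ page: 'User:Ruakh/Sandbox', user: 'Ruakh' }, whitelist));
```

An edit that matches none of these cases would still need a human to look at the diff before marking it as patrolled.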

Good points. Thank you for the comprehensive reply. Oh, and I had tried Lupin pop-ups before, but they only showed a bunch of links to the page's talk page, whatlinkshere, deletion logs, etc, without any page content. I tried them again now, though, and found that if I hover over the link long enough (and wiggle the mouse a bit, but that's probably just voodoo), the changed content also shows up. (I'll probably come to be as annoyed by it as you are, but for now it seems useful, when patrolling.) - -sche (discuss) 04:37, 18 October 2011 (UTC)Reply

The red and blue stuff is too large and garish for me. It leads to more scrolling and visual annoyance. Small icons instead of the large words, or just a lesser font size, would be good. Equinox ◑ 11:51, 15 October 2011 (UTC)Reply

How about now? —Ruakh_TALK 13:14, 15 October 2011 (UTC)Reply

That's definitely better for me. Equinox ◑ 13:15, 15 October 2011 (UTC)Reply

Is it possible to allow default deletion summaries? Perhaps by specifying something in one's JavaScript? Mglovesfun (talk) 16:54, 15 October 2011 (UTC)

Done. Actually, doubly done. You can set either a default value named GPE.initialDeleteReason that gets put into the input-box initially, but which you can override by clearing out that box, or a default value named GPE.deleteReasonIfBlank that gets used when you click the delete-button if the input-box is blank. Or you can even set both, in which case the latter is used if you explicitly clear out the former. To set them, you would put something in your common.js (or vector.js or whatnot) that looks like
GPE.initialDeleteReason = "I forgot I could specify a deletion reason!";
GPE.deleteReasonIfBlank = "I couldn't think of a deletion reason to enter!";
—Ruakh_TALK 21:42, 15 October 2011 (UTC)Reply
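
A minimal sketch of how the two defaults interact, assuming the behaviour described above. The `GPE` object here is only a local stand-in, and `initialBoxValue` and `resolveDeleteReason` are hypothetical names used for illustration; the Gadget's real code lives at MediaWiki:Gadget-PatrollingEnhancements.js and may be organised differently.

```javascript
// Local stand-in for the Gadget's settings object (illustration only).
var GPE = {
  initialDeleteReason: 'I forgot I could specify a deletion reason!',
  deleteReasonIfBlank: "I couldn't think of a deletion reason to enter!"
};

// Hypothetical: the value the input box starts out with.
function initialBoxValue(settings) {
  return settings.initialDeleteReason || '';
}

// Hypothetical: the reason used when the delete button is clicked.
function resolveDeleteReason(boxValue, settings) {
  if (boxValue === '') {
    // The box is blank, so fall back to the second default (if any).
    return settings.deleteReasonIfBlank || '';
  }
  return boxValue;
}

console.log(resolveDeleteReason(initialBoxValue(GPE), GPE)); // box left untouched
console.log(resolveDeleteReason('', GPE));                   // box cleared by the admin
console.log(resolveDeleteReason('tosh', GPE));               // reason typed by the admin
```

With both defaults set, leaving the box alone uses the initial reason, clearing it uses the blank fallback, and anything typed in takes precedence over both.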

Could we set the site-wide JS so that if any admin leaves the field blank, it uses as the deletion summary a link to --explanation of deletion--? - -sche (discuss) 01:28, 18 October 2011 (UTC)Reply

Yes, that could easily be done, by changing the line
GPE.deleteReasonIfBlank = '';
to
GPE.deleteReasonIfBlank = '[[Wiktionary:Sysop deleted|--explanation of deletion--]]';
(Individual admins could still override it by setting their own defaults.) I'm not sure how I feel about that, though . . . I don't know. What do other people think?
—Ruakh_TALK 02:31, 18 October 2011 (UTC)Reply

Can we "force" a more substantive choice, ie, not allow a default, but require a positive choice by the admin? I'm not in love with that default. DCDuring TALK 02:40, 18 October 2011 (UTC)Reply

You mean, such that the delete-button won't even work unless the admin has entered some sort of deletion summary? Yes; we could replace
if(reason == '')
    reason = GPE.deleteReasonIfBlank;
with something like
if(reason == '')
{
    alert('Error: a deletion reason must be specified.');
    return;
}
But I'm a bit loath to do that, since my goal is to lower the barrier to patrolling . . .
—Ruakh_TALK 03:40, 18 October 2011 (UTC)Reply

Mostly @DCDuring: The software allows admins to give no reason when deleting terms the traditional way... which isn't a convincing argument for or against forcing a choice here. Many bother to choose sysop-deleted, broad as it is, as their reason... which struck me as odd when I realised it (I had thought it was the default if the traditional deletion-reason box was left empty, but no, that default is the page content). It's a fairly comprehensive page, so it does seem a decent default, even if a specific reason would be better. After all, an admin isn't likely to find it convenient or possible to use this tool to delete something that's failed RFD or otherwise needs a specific deletion summary; things deleted while patrolling are most likely to be vandalism, misplaced Wikipedia pages, etc (like sysop-deleted covers).

@Ruakh: this does make patrolling easier for me. Thank you! - -sche (discuss) 04:37, 18 October 2011 (UTC)Reply

Edit summaries communicate in a timely fashion mostly to readers of pages such as their watchlists or Recent changes. That would not include new and casual users. (Of course, the edit summaries are important for reviewing history, but that is not timely, nor likely to be attended to by inexperienced users.)

If we could have another channel for contributor-targeted communication, then edit summaries could be explicitly targeted toward experienced users' watchlists and entry history.

Just spitballing, but would it be possible to have a deletion of a recently created page trigger a message, presumably canned but also including some entry-related specifics, on the creating user's (yes, even an anon's) talk page? Could it be limited to users/IPs with no previous contributions? Such a message might be a good way to try to convert would-be contributors to actually useful contributors. What would be the risks of doing so? Are there other ways of classifying users or contributions to generate appropriate messages? DCDuring TALK 14:57, 18 October 2011 (UTC)

Re: "would it be possible to have a deletion of a recently created page trigger a message […] ?": In the general case, that would require a bot that watches the deletion log, which we're unlikely to have in the near future. (It's quite possible to create such a thing, and several editors here have the technical know-how, but our available programmer-hours of technical expertise are quite limited.) But in the specific case of entries deleted via the patrolling-enhancements Gadget, it would not be very difficult to add that functionality into the Gadget. (It also would not be very difficult to have the functionality controlled by a checkbox, with the message only being sent if the admin so chooses. We could then decide whether or not the checkbox should be checked by default.) —Ruakh_TALK 15:25, 18 October 2011 (UTC)Reply

I was thinking of it in the context of patrolling, which for me means Recent changes and my watchlist with the Gadget. Is it hard? Is it worth the effort? If some of those who patrol regularly (I am only occasional) would be willing to use it, that would be evidence of value. DCDuring TALK 18:43, 18 October 2011 (UTC)

I don't know. The thing is, even when an admin is patrolling via Special:NewPages?hidepatrolled=1, it seems unlikely to me that all of his/her deletions will be via the Gadget: in most cases, (s)he won't know that the entry should be deleted until (s)he has actually clicked through to it, and at that point I think (s)he's more likely to use the regular deletion interface — which it would be more difficult/awkward to add this feature to, IMHO — than the Gadget's delete-button. —Ruakh_TALK 00:23, 19 October 2011 (UTC)Reply

My own habit would be to look at an entry, then use back-button functionality to return to the watchlist or Recent changes page, from which I delete or mark as patrolled, so the Delete functionality would be useful. DCDuring TALK 02:01, 19 October 2011 (UTC)

Straw Poll: each section of our CFI

I apologise if I have chosen a poor format, but (following comments on WT:RFV#Finnair) I propose a straw poll to gauge the community's opinion of each section of CFI. (This is broader than just the necessary changes to BRAND CFI that Dan has a section for, above.)

If you think the section is good pretty much as-is, vote "keep as-is" (or "support").
If you think we should change a section, but still have a section (for example, change our criteria for including brand names, but still have criteria for brand names that are different from our general criteria), vote "change". If you can explain what you would change briefly (not three paragraphs), please do so.
If you think we should remove a section (for example, remove our specific criteria for including brand names, so that only general criteria apply to them), vote "remove" (or "oppose").
If you want to add a section (for example, to handle taxonomic names), add it under its own header in the "Sections to add" section. If your proposed section's text is very long, consider posting it in your userspace and simply putting a link under the header. Sign your section so we know who added it.

This way, we develop a clear idea, all in one place, of which sections are liked as-is, which (if any) a majority of editors would put on the chopping block, and which (if any) a majority would change. I used fake==== for some subsections so this page's TOC wouldn't explode, but I left some real headers so everyone could edit one section at a time and perhaps avoid edit conflicts. There's a section for general discussion at the bottom, if you'd rather comment there that you would "remove sections A and B, and change C, but keep the rest". - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply

PS: Where CFI has a section, followed by text, followed by subsections, I have commented here on the subsections in their own (well) subsections, and my comments on the general section only apply to the text that is not part of any of the subsections. For example, my comments on the section "Attestation" are about the bit "“Attested” means verified [...] include the ISBN." My comments on the subsection "Conveying meaning" are in a subsection for that subsection. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply

Sentences 1, 2, 3

"As an international dictionary, Wiktionary is intended to include “all words in all languages”. A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

My vote: keep as-is, a good statement of purpose, clarified by subsequent sections. I'm not opposed to changing it, though, if that's what a majority wants. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
- I'd like to change it in some way, not sure how. Mglovesfun (talk) 20:17, 14 October 2011 (UTC)Reply
Keep as is, except to suggest that it should guide the drafting of other sections of specific application, not be a substitute for them. DCDuring TALK 20:43, 14 October 2011 (UTC)Reply
Keep as is. I think that you should be able to take any text (in any language), wikify it, and get no red links (I'm still not sure about red links due to capitalization at the beginning of sentences). SemperBlotto 07:15, 15 October 2011 (UTC)Reply
- What happens if one of the words is the name of a small, local shop? Mglovesfun (talk) 09:18, 15 October 2011 (UTC)Reply
Change to add "Broadly speaking," to the beginning of the second sentence. Also, I'm not sure what purpose the first four words serve, or even what they mean: perhaps remove them. Finally, since the next section explains term, much as the following two explain attested and idiomatic, term should be boldfaced here just as the other two are.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Sentence 1: Keep as is (per Semperblotto). Sentences 2 and 3: change to something such as "A term of a language should be included if it's used in the language and can be considered as belonging to the vocabulary of the language (e.g. it can be useful to a learner of the language to learn it). Use in the language is normally shown through attestations." A dictionary is not used only when you run across a word and want to know what it means. There are many other uses. Lmaltier 19:47, 18 October 2011 (UTC)
Change in part per msh210 and Lmaltier. Specifically, remove the superfluous "As an international dictionary", which is captured by "all words in all languages", prepend "Broadly speaking" to the second sentence, and indicate that a word would be part of the vocabulary of a language. However, also keep the notion that we should provide definitions for things that people might run across and wish to learn the meaning of. bd2412 T 00:08, 22 October 2011 (UTC)

"Terms" to be broadly interpreted

Keep as-is. I would support changing it if someone showed that to be needed. Comment: "A term need not be limited to a single word in the usual sense. Any of these are also acceptable: [...] multiple-word terms". Hurra, self-referential definition! - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Oh, add a dot after proverbs, as after the other sections. - -sche (discuss) 16:11, 17 October 2011 (UTC)Reply
Keep as-is. Wording improvements OK. DCDuring TALK 20:40, 14 October 2011 (UTC)Reply
Keep as-is. Not perfectly written, but not a problem IMHO. —Ruakh_TALK 02:10, 15 October 2011 (UTC)Reply
Minor change to emphasise that a term of multiple words needs attestation whereas a single word just needs to exist in the real world. SemperBlotto 07:17, 15 October 2011 (UTC)Reply
Change to have header "Terms". See my rationale in the preceding section.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change to have section heading "Termhood"; if that is disliked, the section heading "Terms" is okay with me. A minor rephrasing of the text of the section is in order, too. --Dan Polansky 08:59, 17 October 2011 (UTC)Reply
Keep as-is. More or less. It could stand to have infixes along with prefixes and suffixes. bd2412 T 00:25, 22 October 2011 (UTC)Reply

Attestation

Keep as-is for now, continue to change as necessary (there have been several successful and unsuccessful votes to change this section). Perhaps refine the paragraph which follows the list, and which could be argued to be more explanation and discussion than criteria. - -sche (discuss) 19:08, 14 October 2011 (UTC). I support Ungoliant and CodeCat's proposal to add a definition/clarification of the term "extinct". - -sche (discuss) 04:04, 17 October 2011 (UTC)Reply
- Clarity is good. Mglovesfun (talk) 09:02, 17 October 2011 (UTC)Reply
Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)Reply
Change 4th. It would be nice to have an exact definition of "extinct" in this sense. Also, does a transliterated form count as a contemporary source? To be honest I'd like this criterion removed, but since there was a vote for it there is nothing I can do :-( Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
Keep concept. I'm not sure about all of the details, though. And we're really misusing the term "attested", a fault which we compound by structuring the section as a definition of the term! —Ruakh_TALK 02:24, 15 October 2011 (UTC)Reply
- Also, due in part to the way MediaWiki presents header levels, it's not immediately obvious that there are three subsections clarifying various aspects of list-item #3. So a bit of re-formatting might be in order. —Ruakh_TALK 15:46, 17 October 2011 (UTC)
Keep as-is - I'm reasonably happy with this (and related) section(s). SemperBlotto 09:54, 15 October 2011 (UTC)
Change to explain what extinct means and whether transliterations for the purposes of study count as attestations. —CodeCat 10:12, 15 October 2011 (UTC)
Change. It currently emphasizes Usenet over books, which is terrible. It says we don't quote WMF sites, which is false: we don't count on them for attestation, but we have no problem quoting them. It says we allow recorded audio and video without mentioning spelling issues. It mentions ISBN but not ISSN or DOI, and should mention all or none.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)
At least we commented out "". I spotted that in the raw text — we should actually remove it. - -sche (discuss) 16:19, 17 October 2011 (UTC)Reply
Change, very much so. Remove "1. Clearly widespread use"; remove "2. Usage in a well-known work"; rewrite the paragraph after the bullet list. --Dan Polansky 08:57, 17 October 2011 (UTC)
Change. Remove "1. Clearly widespread use": rare words are welcome, provided that we can make sure that they clearly exist. Add the case of words formed in a systematic way, such as -able or -like adjectives (considered by my Pocket Oxford Dictionary as always existing): in this case, one attestation should be sufficient, to prove that their existence is not only virtual. Remove "spanning at least one year": defining words as soon as they appear, when readers are most likely to want to look up their sense, would be the main added value of an Internet dictionary! This kind of restriction is understandable only for paper dictionaries. Lmaltier 20:02, 18 October 2011 (UTC) Lmaltier 19:54, 18 October 2011 (UTC)
Just so we're on the same page: right now, words only need to meet one of the criteria, not all of them. Rare words are welcome, and aren't excluded by the fact that we accept words which are "clearly in widespread use": rare words just have to meet one of the other criteria, since they don't meet that criterion of clearly widespread use. awkwardnessful, for example, has been used in ≥3 books and Usenet posts, so we keep it. - -sche (discuss) 20:40, 18 October 2011 (UTC)Reply
Yes, but this seems to be useless and misleading, as we sometimes read "this should be deleted, because it's not in widespread use". Note that the -ful suffix is not one with systematic applicability, unlike -like. Lmaltier 21:04, 18 October 2011 (UTC)

I see no need to give special importance to nonce words systematically formed, especially as having a definition for them at all is less important (as can be seen by traditional dictionaries that just have a list of words formed with un-, as no definition was necessary.) Some time limit seems useful for keeping words without serious use out of Wiktionary; it also makes making up a word and adding it (via three Usenet accounts, for instance) much harder. If you're using a word that young seriously, you should provide a definition somewhere near.--Prosfilaes 20:43, 18 October 2011 (UTC)Reply
I limit my proposal to a very small number of cases, the -like adjectives being typical. But we should not require 3 attestations when the POD does not require any attestation to consider that the word exists in English...

It may happen that a word is very widely used worldwide just after being introduced. It should not be excluded. However, when only a few uses can be found, and there is a doubt about the authenticity of these uses, it should be excluded. Lmaltier 21:04, 18 October 2011 (UTC)Reply
Also change the permanently recorded media bit: note that all Internet pages can be recorded permanently (archived) by the software when we want to. But it's difficult to accept purely oral (and not recorded) attestations... This should be clarified. Lmaltier 21:09, 18 October 2011 (UTC)Reply
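
To make -sche's point above concrete (a term only has to meet one of the attestation criteria, not all of them), here is a rough sketch of that logic as it is described in this thread. It is only an illustration under stated assumptions: the `Citation` type, the field names, and the `meets_attestation` function are hypothetical. The "three independent uses" and "365 days" thresholds are a reading of the "≥3 books and Usenet posts" and "spanning at least a year" wording, and the separate item for extinct languages is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Citation:
    source: str                    # hypothetical label for one independent source
    used_on: date                  # date of the use
    conveys_meaning: bool = True   # a use, not a mere mention


def spans_a_year(citations):
    """True if the earliest and latest citations are at least 365 days apart."""
    dates = [c.used_on for c in citations]
    return bool(dates) and (max(dates) - min(dates)).days >= 365


def meets_attestation(widespread_use=False, well_known_work=False, citations=()):
    """Any ONE criterion is enough; a rare word simply relies on the citation route."""
    if widespread_use or well_known_work:
        return True
    usable = [c for c in citations if c.conveys_meaning]
    independent_sources = {c.source for c in usable}
    return len(independent_sources) >= 3 and spans_a_year(usable)
```

Under this reading, a rare word like awkwardnessful passes through the last branch alone, which is why the "clearly widespread use" criterion does not exclude it.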

Conveying meaning

Keep as-is, or refine (like Attestation). - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)Reply
Keep as-is, I think. I'm not sure that the use-mention distinction is exactly the relevant criterion here, because there are some cases that seem to me to be technically "uses", but that we exclude as though they were mentions. For example, the section says that we exclude "made-up examples of how a word might be used", but don't those made-up examples actually use the word, rather than merely mentioning it? But I think we all have a general sense of what this is supposed to mean, and it doesn't seem worth getting hung up about what we call it. (I doubt we could get consensus for any specific clarification of it, anyway.) —Ruakh_TALK 02:24, 15 October 2011 (UTC)
Weak "keep as is", per Ruakh.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change, a bit. Remove the link to Wikipedia. Rewrite the body text. --Dan Polansky 08:57, 17 October 2011 (UTC)Reply
Keep as-is Lmaltier 19:58, 18 October 2011 (UTC)Reply
I'd like to see the rule on three durably archived citations state that the citations must be on the entry or the entry's citation page. Mglovesfun (talk) 11:22, 21 October 2011 (UTC)

Independence

Change. As has been said in RFV, we only partially, unclearly define "independent". - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Completely rewrite. I started a beer-parlour discussion about it back in February (see Wiktionary:Beer parlour archive/2011/February#Independence), and feedback was mostly positive (in that people mostly agreed with me about what it should say), but I let it drop without ever proposing a specific wording. —Ruakh_TALK 02:03, 15 October 2011 (UTC)
Change, but I have no idea how. See my Gabai comment in the old discussion Ruakh links to.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change, along the lines of Ruakh's proposal. --Dan Polansky 08:57, 17 October 2011 (UTC)Reply
Change for clarity. Lmaltier 20:02, 18 October 2011 (UTC)Reply
We must define independent. Mglovesfun (talk) 11:22, 21 October 2011 (UTC)Reply
Change to also require authorship by separate authors, to avoid a single author's effort to spam their neologism into the vocabulary. bd2412 T 00:32, 22 October 2011 (UTC)Reply

Spanning at least a year

Keep as-is, or refine by removing the last sentence, which is a comment, not a criterion. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)Reply
Keep as-is. —Ruakh_TALK 02:24, 15 October 2011 (UTC)Reply
Keep as is. Arbitrary, but I haven't seen any problem with it AFAIR. And the CFI note that it's arbitrary, which is nice.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Remove. This does not define or clarify anything; it merely comments on a phrase without providing any detail to it. The phrase "spanning at least a year" is clear enough on its own, and whatever ambiguity remains in it, that ambiguity is not removed by the current phrasing of the section. --Dan Polansky 08:57, 17 October 2011 (UTC)Reply
Remove. This would be the main added value for an Internet dictionary to define words as soon as they appear, when readers are most likely to want to look for their sense! This kind of restriction is understandable only for paper dictionaries. Lmaltier 19:58, 18 October 2011 (UTC)Reply
Not too bothered; I tend to agree with Lmaltier that we don't want to exclude recent coinages when they are abundantly attested, but all in less than 12 months. Mglovesfun (talk) 11:22, 21 October 2011 (UTC)Reply
Change to accommodate newer coinages with widespread use among a substantial number of independent users. bd2412 T 01:24, 23 October 2011 (UTC)Reply

Idiomaticity

Change. This section is messy. Its passing mention of the Phrasebook should be a separate section, establishing the Phrasebook with clear purpose and different CFI (especially with regard to idiomaticity). - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change and also add some information about how to add single-word terms that are not idiomatic. This may seem strange for English, but in Finnish there are suffixes like -kin that can be added to almost any word, and this would be considered idiomatic in Finnish. Similar cases would also apply for unusually long compounds in German like the name of that law, or the name of that very long protein. —CodeCat 22:28, 14 October 2011 (UTC)
Change. Mglovesfun (talk) 08:56, 15 October 2011 (UTC)Reply
- Basically start from scratch and build up. Mglovesfun (talk) 09:19, 17 October 2011 (UTC)Reply
Change. Current practice is to allow megastar in English no matter what it means, and the section should reflect that if that's what the community thinks appropriate; but the community has agreed certain Finnish and Hebrew single-word terms are not idiomatic, and the sections should reflect that, too. Or it should be broader. Also, the whole section is too wordy without being precise enough. And of course we need phrasebook criteria, but that's an issue under debate.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change. Remove the "megastar" paragraph. Probably remove the paragraph that starts with "This rule must be applied carefully and is somewhat subjective, ". Further rewrite seems to be in order. --Dan Polansky 09:11, 17 October 2011 (UTC)Reply
Change. —Ruakh_TALK 14:57, 17 October 2011 (UTC)Reply
Change. In particular I'd like to see a considerably more inclusive first sentence: ‘A term is considered idiomatic when it is particularly characteristic of a given language, especially when it shows unusual grammar or when its meaning is not obvious from its component parts.’ Ƿidsiþ 15:19, 17 October 2011 (UTC)Reply
Change (and remove under this title). Idiomaticity should not be a requirement. But belonging to the vocabulary of the language is a requirement. It's often the same, but not always (e.g. Atlantic salmon is not idiomatic as defined here, but belongs to the vocabulary of English nonetheless). Lmaltier 20:07, 18 October 2011 (UTC)
Change per Lmaltier. Idiomaticity should not be a requirement. If terms are consistently used as a phrase, such as garbage bag, then the ability to deduce the meaning from the component parts should be irrelevant. bd2412 T 00:42, 22 October 2011 (UTC)

Spellings

Keep as-is or clean up. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as is. Accurately depicts the difficulty of figuring out what the heck is a misspelling.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change. A rewrite seems to be in order. To begin with, remove "It is important to remember that " from the third paragraph. --Dan Polansky 09:11, 17 October 2011 (UTC)Reply
Change. As msh210 says, it expresses well the difficulty of identifying misspellings; but it doesn't express well the relevance of doing so! If I were a newbie reading this section, wanting to know whether Wiktionary includes misspellings, I would not find a straight answer. (Overall, I think I would infer that a misspelling should be included if it's "important", and that importance correlates with, but is distinct from, frequency.) —Ruakh_TALK 14:57, 17 October 2011 (UTC)Reply
- Hm, good point.—msh210℠ (talk) 16:53, 17 October 2011 (UTC)Reply
Change. The existence of an academy should not change anything (NPOV), except that the position of this academy may be included for information. Lmaltier 20:11, 18 October 2011 (UTC)Reply

Formatting

Change: "page the" to "page, the", and add a link to ELE(?); possibly remove the subsection header or remove the subsection entirely. I do not feel strongly about this; I would not mind keeping it as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
If kept, then, yes, it needs a comma, per -sche. Also, it should probably be merged into the next section, which see my comments on.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Remove; does not belong to criteria for inclusion. If kept, at least remove the sentence "Once it is decided that a misspelling is of sufficient importance to merit its own page the formatting of such a page should not be particularly problematical." --Dan Polansky 09:11, 17 October 2011 (UTC)Reply
Change per -sche. I don't think we should just remove the section outright, because even when we include a misspelled word, we don't really include it as a word: we include it as a spelling, and the entry is geared toward pointing readers at the correct spelling. —Ruakh_TALK 14:57, 17 October 2011 (UTC)Reply

Inflections

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Should possibly be moved to ELE. Otherwise, explain what to do about multiword idioms: i.e., say "see the section 'Idiomatic phrase', below" or the like.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)
Keep as-is, I guess, but I am open to proposals for rephrasing and improvements. --Dan Polansky 09:11, 17 October 2011 (UTC)Reply
Keep as-is. —Ruakh_TALK 14:57, 17 October 2011 (UTC)Reply

Idiomatic phrases: Pronouns, Articles, Verbs

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as-is. SemperBlotto 10:12, 15 October 2011 (UTC)Reply
Change. We generally use one's in the page title of reflexive verb phrases and someone's in transitive ones. This section should reflect BCP.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Proverbs

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as-is. SemperBlotto 10:12, 15 October 2011 (UTC)Reply
Keep as is. If there's consensus, add CFI: what proverbs are inclusible?—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Languages to include: Natural languages

Change. I would keep the first sentence; the second sentence is more explanation than criteria, and is unclear: "a proposed language is considered a living language, or a dialect of or alternate name for another language" — I would at least remove "living" (surely there are debates over whether dead tongues were languages or dialects). - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change. Should give some parameter as to what is a language and what is a dialect (ISO codes?), and make it clear that dialectal forms/pronunciations are also allowed (because some people might think "dialect" means "non-standard"). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
Change to reflect (or link to) agreed-upon rules of what dialects to count as languages and what not.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change. All languages with an ISO code, or a Wikimedia code, or meeting some criteria (to be discussed) should be accepted automatically. Other languages should be accepted only after discussion. This is what we have decided for fr.wikt, and this is a good thing (no need for too many polemics). Lmaltier 20:16, 18 October 2011 (UTC)Reply
- Clarity is always good. Mglovesfun (talk) 11:15, 21 October 2011 (UTC)Reply

Sign languages

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as is.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Constructed languages

Keep as-is, or clean up. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change, from the 2nd criterion (unapproved languages) add Brithenig and Láadan to the 4th (restricted to some literary works) criterion, and approve the rest (because they are languages intended for general use. The other criteria should deal with whether or not they are actually used). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
- That would need a vote. I am all for approving LFN, but I have serious doubts about the other languages. -- Liliana • 16:48, 15 October 2011 (UTC)Reply
Change: delete the last sentence, because language names, like all other words, can be entered if they meet the CFI, making the page contradict itself. -- Liliana • 10:34, 15 October 2011 (UTC)
The "Some" paragraph really belongs in a more general discussion of what's inclusible rather than here. And the "Even" paragraph should probably be omitted as irrelevant here. Otherwise, keep as is, except perhaps for some specific languages' CFI, as mentioned by Ungoliant, which I don't know about.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Reconstructed languages

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change. Just a little rewording. If I was reading that for the first time I'd interpret it as hostility against reconstructed words. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
Keep as is.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Exclusions: Vandalism

Remove. This isn't a criterion for inclusion. Vandalism is vandalism because it does not meet the other criteria for inclusion: the new entry is not an attestable word, or the addition of "rxjgfrr" as the Russian word for "hair" is clearly wrong, etc. - -sche (discuss) 19:08, 14 October 2011 (UTC)
- Sure, remove. Mglovesfun (talk) 20:16, 14 October 2011 (UTC)Reply
Keep as is. It doesn't hurt to have another reminder against vandalism. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
I'm happy for this to stay or go - whatever the consensus is. SemperBlotto 10:18, 15 October 2011 (UTC)Reply
- Perhaps if we keep this, we should also say that we remove information which is wrong. I mean... Mglovesfun (talk) 10:22, 15 October 2011 (UTC)Reply
Remove. This does not regulate the inclusion of a term or a sense of a term. Vandalism, which includes replacing the content of a page with "eerwerjhewkrkew" and other sorts of edits, gets removed or reverted without reference to CFI. --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
Remove per -sche.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Remove per -sche. But keep somewhere. Lmaltier 20:26, 18 October 2011 (UTC)Reply

Protologisms

Remove, as this isn't a criterion for inclusion, or move the link to WT:LOP to the ===See also=== section with a short note like "for words that do not meet CFI". - -sche (discuss) 19:08, 14 October 2011 (UTC)
- Sure, remove. Mglovesfun (talk) 20:16, 14 October 2011 (UTC)Reply
As previous. SemperBlotto 10:18, 15 October 2011 (UTC)Reply
Remove. Protologisms get excluded as unattested, so the attestation section already handles this. The other way around, if something is attested, then it is not a protologism. --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
This should be mentioned somewhere, but is not an exclusion, so should not be under "Exclusions". Perhaps a mention under "Attestation".—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
- Dan's hit the nail on the head. Mglovesfun (talk) 09:20, 17 October 2011 (UTC)Reply
Remove Lmaltier 20:26, 18 October 2011 (UTC)Reply

Fictional universes

Keep as-is. I would support changing it if someone showed that to be needed. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as is.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Remove (or this would lead to the removal of e.g. Minotaur or nixie). There's no reason to exclude these words, if normal criteria are met. Lmaltier 20:26, 18 October 2011 (UTC)
No, this section doesn't lead to the exclusion of "nixie" and "Minotaur", because these creatures are used in the context of religion and/or mythology, which is different from fiction. --Daniel 21:14, 18 October 2011 (UTC)
I think that the Odyssey, etc. were considered by their authors as fiction. And it would be absurd to reject words such as anglerfishlike because they are used only in a novel and by its fans. I really feel that this rule is specially designed against Harry Potter words. But these words are part of the language, even if some people don't like them. Lmaltier 20:04, 20 October 2011 (UTC)Reply
Improve if possible, because the policy does not reflect the opinions of people. It is often (very often) ignored in favor of deletionist gut feelings. --Daniel 21:14, 18 October 2011 (UTC)Reply

Wiktionary is not an encyclopedia

I have no strong opinion on this section; lean keep as-is. (Why is "the successor of Saul" allowed a sense-line at David under this section, as it stands?) - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change - I would be happy with short encyclopaedic content if it helps to explain the meaning of a term. SemperBlotto 10:18, 15 October 2011 (UTC)Reply
Delete as is. This is better handled by specific criteria for specific types of terms, and has long contradicted common practice at Wiktionary (note how Houdini is listed as an example of what not to include, yet this very sense passed RFD!) -- Liliana • 10:41, 15 October 2011 (UTC)
Remove; if not that, remove the Houdini paragraph. --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
Merge whatever of this is not already in the "Names of specific entities" section thereinto and remove this.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Keep, somewhere, but probably not here. Note that this explains that Houdini may be included, but not as a page about the escapologist, only as a page about his name, about the word. I add that even a very long mathematical definition such as the mathematical definition of a vector space is encyclopedic, but should be kept nonetheless, as this is the definition. I suggest adding that the definition may be considered as the intersection between an encyclopedia page and a language dictionary page with the same title. Lmaltier 20:51, 18 October 2011 (UTC)

Language-specific issues

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Keep as-is. SemperBlotto 10:18, 15 October 2011 (UTC)Reply
Keep as is, I guess, though I'm uneasy about the CFI's including by reference unvoted-on pages that regulars really don't patrol.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
- Perhaps it should mention that they don't necessarily have community support.—msh210℠ (talk) 01:22, 17 October 2011 (UTC)Reply

Names

I have no strong opinion on this section; lean keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change (this and related sections) to emphasize that single-word entries are acceptable, but multi-word ones (e.g. "Greater Manchester") need attestation. SemperBlotto 10:22, 15 October 2011 (UTC)Reply

Company names

I have no strong opinion on this section; lean keep as-is. Mglovesfun and Ruakh have pointed out that it basically says "company names shall not be included", which I tend to support. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
- But it contradicts all words in all languages when the company name is a word. Mglovesfun (talk) 08:55, 15 October 2011 (UTC)Reply
  - Well, to be clear, Ruakh's full comment was "it's saying that if a company name is also a family name, then that's included; and if a company name is also a common word, then that's included" (but the company name is not included as such). - -sche (discuss) 18:43, 15 October 2011 (UTC)Reply
Remove. Keep most attestable single-word company names, at least for their pronunciation and etymology. Let company names be regulated by the section on the names of specific entities. --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
See my comment on the next section.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change. They should be kept when they are words. But there could be more stringent criteria (at least x attestations not originating from the company) to prevent abuse (these words can be created at will, with a legal status). Lmaltier 20:46, 18 October 2011 (UTC)

Brand names

In-depth discussion of this section: Wiktionary:Beer_parlour#Brand_names_and_physical_products.

Change. Per many RFV discussions, "a physical product" should be changed to either "a product" (if it is meant to include all products), or something like "a tangible/three-dimensional product" (if it is meant to exclude non-tangible products). - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change to include all commercial names, advertising, and political slogans. DCDuring TALK 20:52, 14 October 2011 (UTC)Reply
Change to include commercial names and advertising, e.g. Internet service providers and banks as well as tangible items, and commercial creations like toy brands and cartoon characters. I'm not so sure about political slogans; they are not brands per se and I imagine most of them would fail CFI for other reasons. Equinox ◑ 20:59, 14 October 2011 (UTC)Reply
Remove; keep all single-word attested brand names of pharmaceuticals, at least. --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
I personally think it should be changed (and tweaked) to include company and organization names, names of (non-brand) computer programs, titles of books of the Bible, and other things, and to clearly include brand names even not of "physical" products. But I'm not sure that has consensus. In any event, it should be changed to reflect consensus, if possible.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change. They should be kept when they are words. But there could be more stringent criteria (at least x independent attestations not originating from the company) to prevent abuse (these words can be created at will, with a legal status). Lmaltier 20:46, 18 October 2011 (UTC)

Given and family names

Keep as-is, or change by settling the status of patronymics (and, I presume, matronymics). - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
What's the issue with patronymics? Has it truly not been settled? Aside from that, keep as is.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Patronymics are words, right? And they're not spammy or copyrighted, so just remove the bit about them not being agreed on. Allow 'em. Mglovesfun (talk) 09:02, 17 October 2011 (UTC)Reply

Genealogic content

Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Seems to duplicate the "not an encyclopedia" section.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Names of specific entities

Change. The section admits that it is incomplete. ("Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which.") We should complete it. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Change with largely exclusionary intent, possibly allowing for phased inclusion of types, (eg, in "populated places": countries, then provinces/states, then cities with greater than 100K population). DCDuring TALK 20:52, 14 October 2011 (UTC)Reply
Not my personal cup of tea, but seems, in its ambiguity, to reflect whatever consensus exists. Keep as is for now (viz, until consensus develops further one way or another).—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply
Change Should be accepted when they are words making possible the creation of a page with useful linguistic contents. e.g. Confucius is OK, not Winston Churchill. More generally, belonging to the vocabulary of the language and allowing a linguistic description should be the main criteria. Lmaltier 20:46, 18 October 2011 (UTC)Reply

Issues to consider: Attestation vs. the slippery slope

In-depth discussion of this section: Wiktionary:Beer_parlour#Attestation_vs._the_slippery_slope.

Remove. This section is more discussion than criteria. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Remove, but it's useful on a separate page that discusses the rules more in-depth, and in a way that it can be edited without a vote. Other points could be discussed there too. —CodeCat 22:23, 14 October 2011 (UTC)
Remove; off-topic material. Mglovesfun (talk) 08:54, 15 October 2011 (UTC)Reply
Remove; needless and misleading by its reference to "common use" and "general use". --Dan Polansky 11:00, 15 October 2011 (UTC)Reply
Change "there's no need for an entry" to "an entry would not be appropriate" or the like.—msh210℠ (talk) 01:14, 17 October 2011 (UTC)Reply

Sections to add

Translingual entries

Note: there is Wiktionary:About Translingual, but it is not formal policy.

I propose that we develop criteria for including translingual or might-be-translingual entries such as taxonomic names and Latin phrases such as caveat emptor: specifically, we should have criteria for determining which language(s) to consider them: Latin? Translingual? English? German? - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
- That is kinda covered by WT:AMUL, isn't it? -- Liliana • 19:51, 14 October 2011 (UTC)Reply
  - Kinda. That page clarifies that taxonomic names are translingual, but it isn't a formal policy. Also, it isn't easy to find. - -sche (discuss) 20:24, 14 October 2011 (UTC)Reply
Change to clarify that a phrase from a language does not become translingual unless it assumes a meaning inconsistent with its meaning in that source language, eg, two-part species names are Latin. DCDuring TALK 21:02, 14 October 2011 (UTC)Reply
- I do remember a discussion about considering pizza a Translingual entry. This suggests we need a definition of "Translingual" to exclude that. -- Liliana • 16:46, 15 October 2011 (UTC)Reply
There was a discussion here too: "What language is smithii?" (not involving pizza). I don't have a very strong opinion on which headings should be used when, but I would like a decision to be made official. It's worth noting that there is almost always a specific translingual scientific meaning which is different to the original Latin. E.g. carex is Latin for reed, but in translingual scientific vocabulary it refers only to a particular genus of reeds. Should that meaning be under translingual or Latin? I think it should probably be under "Translingual". (For Carex it's currently under "English"). Regardless, it would be great to have some decision for official policy. Pengo 11:59, 20 October 2011 (UTC)Reply
- carex is Latin, Carex is translingual (but using other language headers as well should be allowed, to provide the English, French, etc. pronunciations, the gender in each language, and examples of use in the language). About smithii (alone), it does not mean anything in international conventions (they only require that it should follow the rules of Latin grammar), and I think that it can be considered as Latin. Lmaltier 19:52, 20 October 2011 (UTC)
  - I agree with your first sentence (carex/Carex). But not sure what you mean by smithii not meaning anything. In all instances where it's used in taxonomy it means "Smith's" (named for or by). Pengo 23:18, 21 October 2011 (UTC)Reply
    - I mean that, in international conventions, binominal species names have a very precise meaning, but conventions don't define any meaning for smithii used alone. Lmaltier 20:07, 24 October 2011 (UTC)Reply
  - carex is English too, hence the Anglicised plural carexes. A genus name is often adopted into English in lower case to refer to specific instances of the genus. "Nice wellingtonias in your garden." Equinox ◑ 23:25, 21 October 2011 (UTC)Reply

- - Classifying Chinese characters as translingual (they are, well, Chinese, even if shared by four languages and Chinese dialects/topolects) is wrong but I don't see an easy fix here. The issue also exists in assigning all individual characters to a part of speech. The original meaning from Classical Chinese may be lost, it may be a pure phonetic with no separate meaning or never used separately. We also have a large amount of radicals, not words. The meaning, usage and readings will differ across CJKV languages. --Anatoli 00:29, 31 October 2011 (UTC)Reply

Phrasebook

Note: there is Wiktionary:Phrasebook, but it is not formal policy.

Per my comments above about the section "Idiomaticity", I think we should have formal Phrasebook criteria. We might have them on a separate page and only link to that page from a section on the main CFI page. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
Delete There needs to be a project with active participants and serious intent. There is no evidence of such interest. DCDuring TALK 21:02, 14 October 2011 (UTC)Reply
Delete In its current state it's a shame for Wiktionary, and I doubt it has any chance of improving fast enough. Maybe reopen in half a decade or so. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
Keep as long as we can find proper criteria. —CodeCat 22:35, 14 October 2011 (UTC)
Keep in some form, deleting this section entirely will simply mean that the phrasebook will have no rules. Mglovesfun (talk) 08:52, 15 October 2011 (UTC)Reply
delete Similar to a company which is running losses monthly, we need to concentrate on our core topic, which is building a dictionary. The phrasebook can come back later once there is interest. -- Liliana • 10:47, 15 October 2011 (UTC)Reply
You assume that Wiktionary is a single company that can focus on only one project at a time. But there are many Wiktionary users who can do many different things at a time. If they want to help, let them help in whatever way they feel is best, as long as it is an improvement. I would agree with you if I felt that a proper phrasebook would not improve Wiktionary, but I think it would. I don't think it's really our job to tell users what to focus on by banning everything else. —CodeCat 11:03, 15 October 2011 (UTC)
We should probably consider not just the phrasebook, but also Category:English non-idiomatic translation targets, since those entries are also not idiomatic but kept for the sake of translations. Perhaps a sensible rule would be to just disallow definitions altogether for English non-idiomatic terms, because definitions like 'Indicates that the speaker is hungry' are silly. The whole idea of non-idiomaticity is that no definition should be needed. —CodeCat 12:13, 17 October 2011 (UTC)
Change This should be an actual phrasebook (same principles as the thesaurus), with pages such as At the restaurant (French). This is what would be useful to readers. Lmaltier 20:37, 18 October 2011 (UTC)Reply

Change Phrasebook should have its own CFI. Phrasebook can make Wiktionary better but it must be a good phrasebook. Also support considering Category:English non-idiomatic translation targets for collocations that may or may not meet CFI for words. --Anatoli 11:29, 21 October 2011 (UTC)Reply

Placenames

I will also propose that we consider (restoring) some CFI of placenames. - -sche (discuss) 19:08, 14 October 2011 (UTC)Reply
- Ha ha. Good luck getting any sort of consensus on that! -- Liliana • 19:55, 14 October 2011 (UTC)Reply
Which ones? I disagree anyway. Placenames are words; they have pronunciations, they have translations (some quite unexpected: Aachen/Aquisgrão, Florence/Firence), and they have etymologies. The etymology of placenames is very important because they often come from a rare language substrate. (Etruscan placenames in Italy, Gothic placenames in Portugal, etc). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)Reply
I would like to allow place names as they are words in the usual sense. The etymologies of place names are often hard to find and this would definitely be useful information. But because it's hard to find proper criteria, I propose allowing them only in a separate namespace. —CodeCat 22:38, 14 October 2011 (UTC)
Allow placenames when they are considered as words (e.g. New York or Red Sea), for the above reasons, and to meet the basic principle (all words). But I think that Excelsior Hotel or rue Victor-Hugo are placenames never considered as words... Lmaltier 20:32, 18 October 2011 (UTC)
Allow names of cities and towns, regions, and geographic forms. I would draw the line at municipal subdivisions and city streets unless they have some additional significance (i.e. Pennsylvania Avenue but not nearby Delaware Avenue). bd2412 T 15:01, 22 October 2011 (UTC)Reply
Place names are already allowed to some extent. I won't look for the policy right now but we had a vote. --Anatoli 09:29, 24 October 2011 (UTC)Reply

Discussion

I have a question. Can anyone vote? Ungoliant MMDCCLXIV 20:09, 14 October 2011 (UTC)Reply

This isn't a formal vote, so go ahead. -- Liliana • 20:10, 14 October 2011 (UTC)Reply

Right! Everyone should give input. :) - -sche (discuss) 20:12, 14 October 2011 (UTC)Reply

Usually any registered user can vote. DCDuring TALK 21:04, 14 October 2011 (UTC)Reply

Thanks to User:-sche for this. DCDuring TALK 21:04, 14 October 2011 (UTC)Reply

I'd like a more holistic approach, though I know that's not easy at all. The document shouldn't contradict itself and should be clear. It should define any potentially ambiguous terms: for example, what is a 'word', what is a 'language'? As an example of a contradiction, "all words in all languages" can contradict the rules on fictional universes, the rules on brand names and the rules on company names. I find such contradictions are a natural product of wikis, where one editor edits one part of the page and another editor edits another part independently. Mglovesfun (talk) 16:35, 15 October 2011 (UTC)

I like what User:DCDuring wrote on his user page about that. -- Liliana • 16:44, 15 October 2011 (UTC)Reply

I'm flattered. It is just cautionary, though. DCDuring TALK 18:05, 15 October 2011 (UTC)Reply

In legal drafting, there are usually clauses beginning with "notwithstanding" that indicate that a given clause is to be read as superseding the ones mentioned in the "notwithstanding" clause. There are also standard rules of construction for interpreting apparent contradictions in the absence of their explicit resolution. Obviously, it is best to be as explicit as possible about conflicts that are noted at the time of drafting and to attempt to identify as many of them as possible at that time. For example, in our case, attestation seems to override other considerations in that an absence of attestation (at least for lemmas) is deemed to be fatal to includability. DCDuring TALK 18:05, 15 October 2011 (UTC)

I concur.—msh210℠ (talk) 01:30, 17 October 2011 (UTC)Reply

This poll remains open, but I'll comment on the results so far:

Some of us want to tweak Sentences 1, 2, 3 in an exclusive direction, some want to change them in an inclusive direction, but at least half of us can live with them as-is, which is what's likely to happen, in the absence of a majority for any particular change. I plan to roll msh210's format tweaks (bold terms, change the header) and some missing commas into one "cleanup" vote, which may also include clarifying "extinct" in the section on "Attestation". About half of us would keep "Attestation" as-is, only tweaking it to clarify "extinct", and it seems likely we will keep it as-is, in the absence (again) of a majority for particular change: Ungoliant would remove criterion 4, Dan would remove 1 and 2, Lmaltier 1 and part of 3, but I'd oppose most of those changes. Almost all of us are OK with "Conveying meaning".

All of us agree "Independence" should be rewritten. I plan to revive February's discussion so we can decide how to rewrite it, and make that a second vote. (Maybe we can get Ben to help us; he has experience cleaning up declarations of Independence.) A majority of us say keep "Spanning at least a year" as-is; Dan thinks the section is unnecessary; Lmaltier thinks the criterion itself should go. All of us agree "Idiomaticity" needs to be rewritten, but we need to discuss how. A majority want to change "Spellings", but in unrelated ways, so I'm marking it as something we should discuss further. We agree that "Formatting" should be reformatted. (I'll roll that into the "cleanup" vote.)

Most of us (who expressed an opinion) are OK with "Inflections" as-is. Msh210 suggests an update to the "Idiomatic phrases" section to bring it into line with actual practice; I plan to include that in the "cleanup" vote unless someone is opposed to it. All bzw. most of us want to change "Natural languages" and "Constructed languages", but we need to come to a consensus about specific changes. "Sign languages" and "Reconstructed languages" are good as-is or with a little tweaking. Most of us agree on removing "Vandalism" and "Protologisms", or moving them to a different page / to the "Attestation" section, respectively; I plan to make that a third vote (structured so people can vote to delete both, keep both, or delete one and keep the other). We're OK with "Language-specific issues".

I'll comment on "Fictional universes", "Wiktionary is not an encyclopedia", "Names", "Company names", "Genealogic content", "Names of specific entities" and "See also" later. We unanimously dislike BRAND, and though we're divided on how to change it, the dedicated section further up the page shows that a vote on the words "physical product" is in order. I suggest a vote with three options: change "physical product" to "product" (or similar, to make clear that it includes intangibles), change it to "tangible product" (or similar, to make clear that it includes only tangibles), or keep the status quo (unclarity); if neither of the first two options passes, the unclear status quo continues.

We agree: change "Given and family names" only to remove or answer the question of patronymics. Most of us agree: remove "Issues to consider: Attestation vs. the slippery slope"; I'll include that in the Vandalism-Protologisms vote. - -sche (discuss) 08:11, 21 October 2011 (UTC)Reply

The BRAND vote should include an option that goes beyond "tangible or intangible product" to something like "commercial offering". For example, British Telecom is not a product but it ought to fall under brand rules IMO. Equinox ◑ 14:58, 21 October 2011 (UTC)Reply

Even that is too narrow, because it would exclude noncommercial brand names like Debian, Anthrocon and so on. —CodeCat 00:28, 22 October 2011 (UTC)

Chinese radical changes

What the hell is going on? See http://en.wiktionary.org/wiki/Special:Contributions/213.79.124.126

Also, please archive this page so it doesn't take forever to load. It's just simple common sense.

71.66.97.228 07:38, 15 October 2011 (UTC)Reply

I oppose radical changes (lol). Mglovesfun (talk) 08:51, 15 October 2011 (UTC)Reply

Did you look at it? 71.66.97.228 23:20, 15 October 2011 (UTC)Reply

Reverted... changes like this need to be discussed first. Jamesjiao → T ◊ C 21:16, 16 October 2011 (UTC)

I don't know any Chinese, but just looking at that page it's hard to discern the specific details of a character. I imagine that page will be used relatively often by people who know little Chinese (students or just curious people), and it would be hard for them to read the characters. So could they be made bigger on that page? Maybe twice the size? —CodeCat 12:45, 19 October 2011 (UTC)

Sanskrit dictionaries - parts of speech + language portals

Background

I am planning to write a bot to import definitions from publicly available, digitized Sanskrit-English and Sanskrit-Sanskrit dictionaries to some Wiktionary so that they may be collaboratively edited.
Ideally, for a given word or word-root like 'अङ्ग', I would want (English, Sanskrit, Hindi) definitions from various dictionaries to be collected in a single place.

Observations

en.wiktionary.org records English definitions of many Sanskrit word-roots. A part of sa.wiktionary.org interestingly seems to mark a beginning of duplication of some of these words.
Parts of speech used in these definitions are inadequate.
1. It is important to distinguish word roots from words in Sanskrit. E.g. गम् is a verb-root, but it is never used in a sentence uninflected. With inflection, according to time, mood, number and case, forms like 'गच्छति', 'अगच्छम्' are used. So it will be important from the perspective of dictionary users to ideally record all these forms (or at least the roots) and distinguish the verb-root from the many inflected forms.
2. Further, word-roots are classified into different groups (including grammatical gender in the case of noun-roots, and इट्, गण in the case of verb-roots), which determine how a root may be inflected and used in a sentence. The same string can appear in multiple classes and have different meanings. For the dictionary to be useful to the users of (and translators to and from) the Sanskrit language, these should be indicated while classifying word roots.
en.wiktionary.org lists definitions in different languages for a given string. Instead, for the purpose of definitions and translations to or from Sanskrit, we would like definitions in different languages

Questions

Given the requirements mentioned (collating multiple language definitions in a single place, need for richer part-of-speech tags), should I plan to upload definitions to en.wiktionary.org or to sa.wiktionary.org? I am fairly certain that, in the latter case, no one will object, the only downside will be partial duplication in the two dictionaries.

Vishvas vasuki 15:15, 16 October 2011 (UTC)Reply
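
The requirements listed in the post above (keeping roots distinct from their inflected forms, recording the class information that governs inflection, and collecting definitions from several dictionaries and languages in one place) could be modelled roughly as follows. This is only a minimal sketch under stated assumptions: it is not an existing Wiktionary or bot API, and every name in it (`RootEntry`, `index_by_form`, the field names, the `"example-dictionary"` source label) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RootEntry:
    """One record for a Sanskrit root (hypothetical schema, not a Wiktionary format)."""
    root: str                           # the uninflected root, e.g. "गम्"
    kind: str                           # "verb-root" or "noun-root"
    gana: Optional[str] = None          # verb class (गण), left empty if unknown
    it_class: Optional[str] = None      # इट् behaviour, left empty if unknown
    gender: Optional[str] = None        # grammatical gender, for noun-roots
    # language code -> list of (source dictionary, definition text)
    definitions: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    inflected_forms: List[str] = field(default_factory=list)

    def add_definition(self, lang: str, source: str, text: str) -> None:
        """Collect definitions from several dictionaries under one language key."""
        self.definitions.setdefault(lang, []).append((source, text))


def index_by_form(entries: List[RootEntry]) -> Dict[str, List[RootEntry]]:
    """Map each root and inflected form to every entry it may belong to.

    A list is used as the value because the same string can belong to
    more than one class, with different meanings.
    """
    index: Dict[str, List[RootEntry]] = {}
    for entry in entries:
        for form in [entry.root, *entry.inflected_forms]:
            index.setdefault(form, []).append(entry)
    return index


# Usage example with the forms mentioned in the post.
gam = RootEntry(root="गम्", kind="verb-root",
                inflected_forms=["गच्छति", "अगच्छम्"])
gam.add_definition("en", "example-dictionary", "to go")
index = index_by_form([gam])
```

Looking up an inflected form such as `index["गच्छति"]` then leads back to the root record, which is where the class information and the per-language definitions live.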

Welcome! Such a bot would be very welcome here, overall. A few comments:

We accept English definitions/translations of Sanskrit words, and Sanskrit translations of English words, but we do not accept Hindi definitions/translations of Sanskrit words.
Some duplication between projects is expected — even desirable. For example, en.wiktionary.org entries for English words include translations into French, and fr.wiktionary.org entries for English words also include translations into French. But the target audiences are different, so the presentation is different.
Just because something is publicly available, that doesn't mean that you can copy it here. It must be either "in the public domain" (meaning that no one owns any copyright on it, for example because it's very old), or else publicly available under an appropriate free license.
You're very, very new here. Before starting to run a bot, you need to become familiar with our norms and practices; that will take time.
We do use different part-of-speech headers for different languages, where necessary. If other Sanskrit-speaking editors agree with you about what part-of-speech headers are needed, you can start working on documenting the system, at Wiktionary:About Sanskrit.

—Ruakh_TALK 16:31, 16 October 2011 (UTC)Reply

It's very interesting. Sanskrit definitely needs some boost. Yes, but before importing, you should try to create Sanskrit entries manually and look at the existing ones, like Category:Sanskrit_nouns, Category:Sanskrit_verbs, etc. Seek advice and see if the entries come out in an acceptable format. --Anatoli 07:56, 21 October 2011 (UTC)

en-verb: person derivatives, e.g. "baker", "maker", "actor"

Verbs in English very frequently have a derivative based on a person performing the verb (anyone know the correct term for this construct?). For example, bake has baker, make has maker, act has actor. Would it be worthwhile to extend template:en-verb so that we could show one (or possibly more, e.g. actor, actress) of these derivatives? This would allow an organized and easy way to find the appropriate term rather than looking through the "Derived terms". Facts707 03:03, 18 October 2011 (UTC)

But these words are derived terms; they're not considered part of the inflection of a verb. To add one or possibly two derived terms to the headword line doesn't really make much sense to me. We could also add -ness to adjectives otherwise, or -like to nouns. —CodeCat 09:46, 18 October 2011 (UTC)

I don't think we should include agent nouns in verbs' inflection lines, not because they wouldn't be useful there — maybe they would be — but just because the inflection line is already pretty long. —Ruakh_TALK 11:51, 18 October 2011 (UTC)

I oppose this idea. Mglovesfun (talk) 14:24, 18 October 2011 (UTC)Reply

Great opportunity to work on one's Javascript skills, designing a custom inflection line for English verb lemmas. Perhaps it could even be made a gadget. I am a little skeptical that there is much call for this, however. DCDuring TALK 15:21, 18 October 2011 (UTC)Reply

I'm opposed too. The inflection line is for inflectional morphology; this is derivational morphology. Also, what would we do with forms like cooker? —Angr 10:05, 19 October 2011 (UTC)

There's also the problem that English actor is derived from Latin actor, and not from English act. Some of these agent nouns developed in other languages and then were borrowed into English, rather than developing in English from the English verb. So, they're not always Derived terms, but sometimes are Related terms. --EncycloPetey 15:20, 6 November 2011 (UTC)

Portuguese bots?

Hi, anyone have a Portuguese conjugation bot? Or do I have to run one myself? --Rockpilot 10:06, 19 October 2011 (UTC)Reply

I think BuchmeierBot (talk • contribs) now does Portuguese. Mglovesfun (talk) 12:53, 19 October 2011 (UTC)Reply

Using a serif-like font for headwords in Chinese characters

In Chinese characters, 'serif' fonts (I'm not sure if it's the proper term) have the advantage that they show the individual strokes of the character more clearly than 'sans-serif' fonts. This is important for a dictionary which may possibly be used by students of Chinese or Japanese. Could the default font for headwords in Chinese characters (and maybe Japanese Kana as well) be changed? —CodeCat 12:49, 19 October 2011 (UTC)

It looks a bit ugly having serif for Chinese when the entire rest of the dictionary is in sans-serif. Additionally, I don't know how it would look in the font size we use. -- Liliana • 12:28, 20 October 2011 (UTC)Reply

Pinyin and Romanization headers

Why do some Mandarin entries use the header Pinyin, some use Romanization, and some use both? What's the difference? WT:RFC#shí mentions this. Mglovesfun (talk) 12:55, 19 October 2011 (UTC)

I have cleaned up some. --Anatoli 22:18, 19 October 2011 (UTC)Reply

But how? Since both headers are valid. Mglovesfun (talk) 11:02, 21 October 2011 (UTC)Reply

I haven't seen Pinyin headers but we have been using and agreed on Romanization. Pinyin is the romanisation method for Mandarin, anyway. The entry is OK in terms of formatting but a native speaker may need to sort some characters I wasn't able to check properly - some are rare and not very productive. I'm also not sure if we need to list radicals like 飠/饣, they are never used to make words, they are character components. My suspicion is the entry was created by running a tool like Wenlin, which can generate lists of characters for a pinyin reading, not very useful, IMHO. --Anatoli 11:10, 21 October 2011 (UTC)Reply

Chinese character "etymologies"

This is concerning edits at 串 and other characters (eg. 字), and the dual use of the word "etymology", referring to 1) word etymology; 2) graphical origin of a glyph. The latter sense is not "etymology" per se - which, according to Wikipedia, is "the study of the history of words, their origins, and how their form and meaning have changed over time", not the development of the written form of a word, which Wikipedia treats as "origin" or "history".

At present, Wiktionary treats the origin of glyphs as "etymology", which is fine for non-word glyphs, for example "b"; but not so clear with glyphs which also carry meanings on their own, for example "あ". This dual use of the word "etymology" is much more notable when dealing with Chinese characters, almost all of which represent morphemes. The current format of Chinese character entries involves a "translingual" section at the top, which is where graphical "etymology" of a character is supposed to be discussed. Because most Chinese characters are partially phonetic and convey some information regarding similar-sounding characters at the time of coinage, this practice produces an inconsistency in the format, i.e. information which is non-translingual (applicable to certain stage of Chinese only) is placed under a heading "Translingual". To use an example, this old revision of "字" says the graphical "etymology" of the character is phono-semantic - which is true iff Old Chinese is the language being discussed, where the similarity in pronunciation of 子 ("child") and 字 ("to nurture; word") formed the basis for the invention of the character 字. This is some language-specific information, not "translingual". Readers would be misled to think that the concepts "child" and "word" are pronounced alike in all languages below on that page, which is not true.

A more obvious inconsistency is in 老 ("to be old"), which is said to be "cognate" with 考 ("to examine"). Graphically, the common origin of these two glyphs is obvious, but to say these two glyphs are "cognate" is again a language-specific issue. In Chinese, these two characters are doublets and cognate, but this is not true translingually. Similarly, the "cognacy" between 參 ("to participate") and 三 ("three"), if it were mentioned in the "translingual etymology" of 參, is neither "translingual" nor true "etymology". 129.78.32.23 03:20, 21 October 2011 (UTC)Reply

I agree. 福 is the worst offender, it kinda mixes everything in the Translingual etymology section, it hardly makes sense. -- Liliana • 03:24, 21 October 2011 (UTC)Reply

Proposal: use "Etymology" for the origin/coinage of the phonetic version of the character (if a single-character entry) or the origin/coinage of the word (if a multi-character term), and use "Graphical significance" (which could be a subheader of "Etymology") for the way the character looks and has developed over time (for single-character entries). 71.66.97.228 07:42, 22 October 2011 (UTC)Reply

Trademarks

Hi folks. At the Foundation, we've come across an interesting problem, and we need some guidance from you. We've had two companies contact us with concerns over terms in the English Wiktionary. Both are terms that are trademarked, but are defined on Wiktionary as generic terms. One of them (pycnogenol) has a citation list showing generic usage going back to the early 80's and a mention of the trademarked term as a brand name in the Usage Notes, the other (threatscape) has only one citation showing generic usage and no mention of a trademark or brand name usage.

What we need some help with is understanding what kind of policy Wiktionary has for trademarked terms and how to handle them when they arise. I've read WT:BRAND, and while it mentions genericized trademarks it does not mention strong criteria for determining when a trademark has been genericized, or what to do when a term is assumed to be generic but is also trademarked.

Has there been a policy discussion over this that I haven't found you can point me to? Or is this something that hasn't had a consensus decision yet? I would venture the opinion that this is something that needs a consensus decision, as these instances are going to continue (we've had these two come in quick succession, and I know it's a matter of time before we get more). If the community can come to a policy decision, it will give the community direction in creating definitions, and can give the Foundation something to point to when future instances arise.

There also needs to be a determination on what to do with these two entries; the makers of Pycnogenol ask for all generic definitions to be removed and only describe it as their proprietary product. The holders of the Threatscape trademark would at least like to add something to the entry mentioning their trademarked property.

Thanks for your help! Philippe (WMF) 04:29, 21 October 2011 (UTC)Reply

I've added three more citations for the generic use of threatscape and added a link to the security company. SemperBlotto 07:40, 21 October 2011 (UTC)
An example of how we treat trademark erosion can be seen at our entry for hoover. However, in the two examples here the commercial terms have been created from existing ones. The Pycnogenol people seem to want us to go back in time and change reality. I am very afraid of tampering with the spacetime continuum - it would certainly end in tears. SemperBlotto 08:35, 21 October 2011 (UTC)
- A bit of clarity in WT:BRAND wouldn't go amiss. Mglovesfun (talk) 10:41, 21 October 2011 (UTC)
  - Since Wiktionary is an international dictionary, should trademarks be mentioned if they apply only in one country? Would we mention trademarks in Bhutan on English words? —CodeCat 11:14, 21 October 2011 (UTC)

Interestingly, @Philippe, we're currently revising the sections of WT:CFI that pertain to brand and company names (BP#1, BP#2), though for a different reason: a few users have proposed that we include many more trademarks (by which I mean: strings of letters which exist only because they were coined to be trademarks, and which have not become genericised, as distinct from pre-existing words which have been trademarked and as distinct from genericised trademarks) than we currently do — but a greater number of us seem poised to enact changes that will reduce (possibly eliminate) the number of such trademarks we include. We will continue to include genericised trademarks like hoover, and we will especially continue to include strings of letters which have been words for longer than they have been trademarks, because it is our mission and most fundamental policy to include attested words. AFAICT, ~~"Pycnogenol" was trademarked [in the US] in 1993, so~~ as SemperBlotto notes, both of these terms were words before being trademarked. They have also continued to be generic words after being trademarked. (Edit: Actually, I'm finding conflicting reports of when "Pycnogenol" was trademarked.) - -sche (discuss) 19:58, 21 October 2011 (UTC)Reply

These companies don't object to the inclusion of their trademarks here, only to the mention that their mark might be used in a generic way. The same happened on fr.wikt about fr:qualimétrie (see the lengthy discussion page). In some countries, companies have to protect their trademarks, and this might be the only reason of these contacts: getting a proof that they protect their trademarks. In any case, mentioning in the page that it's a trademark is useful, and necessary if the word was created by the company. Lmaltier 20:42, 21 October 2011 (UTC)Reply

(Intellectual property attorney hat on) We have had some discussions on this point previously, including a vote (which I cannot readily find) not to put ™ symbols in entries, and I have pointed out in every instance that Wiktionary has no legal obligation whatsoever to indicate the trademark status of a word. We face no legal liability of any kind for including and defining a word that happens to be a trademark, because we are not making trademark use of the word (i.e., we are not using the word to "sell" Wiktionary). I think our policy should state as much. If we assume the burden of noting the trademark status of words, we will be stuck with the fact that millions of common nouns in the English language (and others) are used as trademarks with respect to some products (dove, ace, coach, apple, cricket, fiesta, eagle, west, Mars, planters, and so on ad infinitum). Of course, noting the identity of a company that coined a word is important for etymological, not legal, purposes. (Intellectual property attorney hat off). bd2412 T 20:51, 21 October 2011 (UTC)

Putting a trademark gloss or symbol on words that are only trademarks is very different from adding notes at common nouns that they are trademarks for specific things (e.g. Dove soap). Most cases of the latter would fail WT:BRAND. Equinox ◑ 21:05, 21 October 2011 (UTC)Reply

Trademarks are transitory. They are abandoned from time to time (as when a car company discontinues a certain model); they can lose their federal registration status if the owner fails to file five year, ten year, and twenty year renewals. Furthermore, trademark registrations are country-specific, meaning that a mark can have different owners under different conditions in different countries. It's a morass we need not get into. Indicating the origin of a word or a sense as a trademark fulfills our mission. However, I think that giving an indication of the current legal status of a mark goes beyond our reasonable scope. bd2412 T 23:25, 21 October 2011 (UTC)Reply

What bad things could happen if we included "Claimed to be a trademark in some jurisdictions by IPCo. See their website/Contact them for details." as a usage note in cases where a company provides the information? DCDuring TALK 23:57, 21 October 2011 (UTC)Reply

First, by doing so we make ourselves advocates for claims of trademark ownership. My primary practice has been as a trademark attorney, either seeking to obtain trademark protection for a client, or advocating that one or the other of competing parties was the legitimate owner of the mark. Do we want to be in the position of stating in our definition that two different parties each (inconsistently) claim the exclusive rights to the mark? Do we want to be in the position of having parties seek to influence courts based on their ability to convince us that their claim is legitimate? Second, where do we draw the line if trademark owners in fact ask us to include such language with respect to common nouns, to "tide" and "whirlpool" and "crest" and "scope"? I am concerned that if we hold ourselves out as willing to include language indicating trademark status for some entities, then we open ourselves up to liability to those companies for whom we refuse to provide the same service. Third, once such a notation is added to a word, it will have to be checked from time to time to be sure that it is still valid. Companies will have an interest in having trademark claims added while they use a mark, but have no interest in having them removed once they abandon the mark. bd2412 T 01:23, 22 October 2011 (UTC)

I tend to agree with / be persuaded by bd2412: including information on a word's trademark status is outside our scope, especially to the extent that we do not include trademarks as such. - -sche (discuss) 01:46, 22 October 2011 (UTC)Reply

I absolutely agree with BD. There is no need for it and no advantage in it. Ƿidsiþ 20:49, 22 October 2011 (UTC)Reply

Thanks, everyone. This is really very helpful. If anyone else wants to provide some input on this, I'll continue to watch this page, or you can send it to me directly by email at philippe at wikimedia.org. Thanks again!! — This unsigned comment was added by Philippe (WMF) (talk • contribs) at 22:50, 22 October 2011 (UTC).

Just as a follow-up, I just talked to our legal team, and we're very comfortable with both this discussion and the way that things are headed. We'll be happy to support this, and it gives us a good understanding of the community's views on it. Thanks! Philippe (WMF) 22:31, 24 October 2011 (UTC)

I would propose a brief statement to be inserted into the appropriate policy page indicating that we will note the origin of a word as a trademark where this is an etymologically significant fact, but that because trademark status varies by time and place, we do not include information on current trademark status in our entries. Our entries are not intended to provide a legal opinion as to the trademark status of a word, or the legitimacy of any claim to rights in a word. bd2412 T 23:56, 24 October 2011 (UTC)

A proposition: All words in all dictionaries.

I would like to propose that we should pursue a goal of incorporating in our lexicon all words in all dictionaries. If a reasonably reliable published dictionary of real-world terms (as opposed to fictional-universe terms or manufactured-language terms) contains a real-world definition for a word or a phrase, even a phrase that is encyclopedic or a sum-of-parts phrase, we should include that phrase somewhere in Wiktionary, even if it is only in a glossary. For example, I have a recent edition of Black's Law Dictionary. It has entries (to pick a random page) for Pennoyer rule, Pension Benefit Guaranty Corporation, pension plan, perfection of security interest, and perils of the lakes. I believe that we should not be in the position of lacking words that can be found in other dictionaries, and even if we do not include these terms in our corpus, we should have a glossary to which these terms redirect providing a quick sense of their meaning and, perhaps, referring the reader to the appropriate Wikipedia article on the subject. This is not a proposal to amend the CFI itself, but to maintain broad openness to inclusion among our entries or in a glossary of terms appearing in serious dictionaries of legal, medical, scientific, professional, and cultural terms. bd2412 T 03:08, 22 October 2011 (UTC)Reply

Oppose very strongly, we should not try to be other dictionaries. We cannot be better at being Websters than Webster, or better at being Oxford than Oxford. We should also not reproduce the errors in other dictionaries. We should use our own criteria to avoid duplicating other people's work. Mglovesfun (talk) 09:44, 22 October 2011 (UTC)

I specifically stated in my proposal that This is not a proposal to amend the CFI. We would still require independent attestation, and would still write our own definitions. Encyclopedic or sum-of-parts terms appearing in other dictionaries (and my proposal is aimed primarily at technical and professional dictionaries) would be relegated to a glossary or an appendix. Websters and Oxford are merely trying to be "the best dictionary", so why should we not want to be better than Websters and Oxford, and better than all the technical and professional dictionaries out there? bd2412 T 14:27, 22 October 2011 (UTC)Reply

Other dictionaries are not perfect and we should not ape them. We especially don't want to copy ghost words. We should be doing our own research on attestation: uses, not mentions. Equinox ◑ 09:48, 22 October 2011 (UTC)Reply

Equinox, that is not what I have proposed to do at all. bd2412 T 14:36, 22 October 2011 (UTC)Reply

It would certainly reduce the number of RfD debates if inclusion in even a single approved reference work qualified an entry for inclusion. Instead of dealing with so many entries at retail we could deal wholesale with the adequacy of a given reference work as a source of entries. I am not sure what criteria would work for assessing the adequacy of a reference work for this purpose, though. In English the "unabridged" dictionaries, obviously, would qualify. If we have a phrasebook, then there need be no question about learner's dictionaries. Wordnet includes some SoP terms (IMO), but ones that would be highly attractive as translation targets. I don't think we want Urban Dictionary without major qualification. I could imagine that it would not be hard to come to agreement on legal dictionaries, though some of the entries have seemed encyclopedic to me. OTOH, I am not at all convinced that business, finance, management, and investment glossaries have lexicographic authority.

And I would expect that we would want to apply our current practices to allow inclusion of items not in any approved reference. DCDuring TALK 13:34, 22 October 2011 (UTC)

It's one case where I'd prefer 'usefulness' and 'accuracy' to 'simplicity'. Mglovesfun (talk) 13:39, 22 October 2011 (UTC)Reply

@Mglovesfun, Equinox: I think you're both arguing, at least in part, against a straw man. Even if we include a given word just because other dictionaries do, that doesn't mean we have to trust in the accuracy of their definitions. We could include zzxjoanw, for example, while making clear that it's a joke or a hoax. —Ruakh_TALK 14:04, 22 October 2011 (UTC)

If zzxjoanw has ever been used and not merely mentioned, fine. Equinox ◑ 14:10, 22 October 2011 (UTC)Reply

It has not. That's why we can't trust other dictionaries' definitions. ;-) (Though BD2412 is obviously talking about accusations of SOP-ness and {{rfd-redundant}}cy, rather than unattestability, so my example was a poor one. Also, he explicitly said he's O.K. with using appendices for these, and we already do include zzxjoanw in an appendix.) —Ruakh_TALK 14:32, 22 October 2011 (UTC)

That is correct, my proposal is primarily directed towards 1) making sure that we are not missing words that are attestable and meet the CFI, and are included in other dictionaries (including technical and professional dictionaries), and 2) allowing attestable SOP and encyclopedic terms to be included in a glossary or appendix here if they are listed in a technical or professional dictionary elsewhere. I am not proposing at all to include terms that are not found in the real world. As an example, I initially voted to delete angstrom unit in the VfD for that term. After researching and discovering that the phrase appears in several technical dictionaries, I changed my vote to keep. bd2412 T 14:42, 22 October 2011 (UTC)

I have no objection to the use of dictionaries or similar reference works as a source of words (or terms) - as long as you are prepared to write your own definition, and be prepared to verify the word from the real world. SemperBlotto 14:09, 22 October 2011 (UTC)Reply

@Ruakh and BD2412 I just oppose the idea as a whole of trying to be more like other dictionaries. The details beyond that don't matter too much. Mglovesfun (talk) 14:50, 22 October 2011 (UTC)Reply

I don't want Wiktionary to be in the position of someone needing to look up a technical term and having to say to themselves, "Wiktionary won't have this, I'll have to look elsewhere". I want that someone to think of Wiktionary as the most likely place to have whatever definition they seek. bd2412 T 14:58, 22 October 2011 (UTC)Reply

My rebuttal to that specific point is that a lot of dictionaries for technical terms include terms which would fail WT:CFI#Idiomaticity, because that's our rule, not theirs. A bit like, in baseball, games played, which refers to the number of [[games]] [[played]]. Mglovesfun (talk) 15:38, 22 October 2011 (UTC)

(An interesting example, since in the case of "games played" there is a different definition for a defensive player, an offensive player and a team, none of which would probably match any pair of definitions on either game or played. - [The]DaveRoss 00:35, 25 October 2011 (UTC))Reply

I would propose in that case that games played, failing idiomaticity, should not be within our main body of definitions, but should be in an appendix or glossary of baseball terms (probably along with such things as left-handed pitcher and batting order). bd2412 T 20:26, 22 October 2011 (UTC)

If this is not a policy proposal but a general ideal that all words found in specialist dictionaries which meet WT:CFI should be included, then it seems a bit pointless to me, as all words that meet WT:CFI should be included, both ones which are in other dictionaries, and ones which aren't. Mglovesfun (talk) 10:33, 24 October 2011 (UTC)Reply

My thinking on this is primarily driven by my review of technical and professional dictionaries. As it happens, I currently work for a court that deals with technical definitions with unusual frequency, so the court's library has mind-boggling shelves upon shelves of technical dictionaries in every conceivable field. For example, there is the Dictionary of Mining, Mineral, and Related Terms, which I was able to get a CD version of, and which TheDaveRoss is presently crunching into a format uploadable to our project space. Of course, many definitions included in these dictionaries are SOP or possibly encyclopedic, but the fact that they are defined there indicates a market for people looking for those definitions in a dictionary format. Hence my concern that Wiktionary aim to become the place where people go to look up the definitions of words and phrases, no matter how arcane or specialized, and no matter whether they are in a gray area of being possibly SOP or encyclopedic (even if terms facing these questions are only included in appendices). bd2412 T 20:29, 24 October 2011 (UTC)Reply

Finding a phrase in another dictionary or technical glossary is a very strong clue that it's worth a definition (specialized technical writers are much more likely than we are to know what should be included, because they know their subject better). I assume that this is what BD2412 means. Of course, I exclude encyclopedic dictionary entries such as Charles Darwin; I am thinking only of set phrases. And when it's worth a definition, it's also worth a normal page. But this is not a reason to copy errors from other dictionaries, of course; I agree with SemperBlotto. Lmaltier 20:00, 24 October 2011 (UTC)

@Lmaltier yes that's one possibility, but also as you discuss and as BD2412 and I discuss above, 'specialized' dictionaries are also likely to include things that wouldn't meet our CFI like games played in baseball. They often behave more like Appendix:Glossary than our main namespace. Mglovesfun (talk) 12:40, 25 October 2011 (UTC)

Oppose strongly, partly for some of the reasons above, but also because: (1) The American Heritage Dictionary has entries for Cleveland, (Stephen) Grover, Cosby, William Henry, Jr., Guevara, Ernesto, etc. and (2) B.D. Jackson's Glossary of Botanic Terms likewise has ridiculously useless definitions for terms such as necklace-shaped and rope-shaped, as well as a host of terms that appear to be peculiar to individual authors, such as flag-apparatus, paronychietum, and meridisk. These are all real-world terms, often cited from published authors, but of no particular value in a dictionary. We have developed our own criteria for inclusion for a reason, and if you look at a host of other dictionaries, that reason becomes quickly apparent. --EncycloPetey 15:10, 6 November 2011 (UTC)Reply

of no particular value in a dictionary: why? Anyway, we don't select words on their value, we include all words, even rare ones. Lmaltier 06:09, 8 November 2011 (UTC)Reply

Etymologies of place names

In the discussion about CFI above I mentioned that it would be useful to include the etymologies of place names even if they are small and don't meet CFI otherwise. For example there is little to be defined about a small place like w:Eersel, but its etymology would nonetheless be useful and not encyclopedic at all. So for that reason I'd like to propose that we allow the etymologies of all place names, but only their etymologies, to be entered in an appendix of some sort if they don't qualify for CFI. If there is enough support we can try a vote. —CodeCat 15:14, 22 October 2011 (UTC)

I support Ungoliant MMDCCLXIV 17:27, 22 October 2011 (UTC)Reply

From what I see, current CFI does not exclude "Eersel". CFI contains no restriction on the size of a geographic entity whose name is considered for inclusion. The criteria for geographic names that were in CFI and were removed from it recently only referred to the ability of the geographic name to carry lexicographical information such as etymology and pronunciation. From what I can see, no vote is needed. For a rather small village currently in Wiktionary, see Rückingen, which has 5,800 inhabitants. For clarification: in current CFI, geographic names are governed by the section WT:CFI#Names of specific entities, which says this: "A name of a specific entity must not be included if it does not meet the attestation requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which." --Dan Polansky 18:41, 22 October 2011 (UTC)Reply
- In this case the question then becomes how small places meet those requirements. They ought to be in widespread use at least because anyone in the village uses the name, but usage in permanently recorded media would usually be limited to maps, surveys and legal documents (many early Dutch and German place names are found only in Latin texts). The village may be known by only a small amount of people outside it, and most place names are derived from the local dialect which is often otherwise undocumented. So in a way, place names are highly dialectal terms that are mentioned only in maps, and used only in the community for which the place is significant but for which no written language may exist. And since it was coined in the local dialect, you can also wonder what the language of a place name really is? Sometimes the official name of the place is actually an exonym and is not the name used in the place itself. As an example, Girona was until recently known only by its Spanish name Gerona, despite being predominantly Catalan-speaking, and similarly Hiiumaa in Estonia was known officially by its Russian exonym Хийумаа (Khiyumaa). All of this makes it hard to treat them the way we treat regular words, it would make more sense to group them by geographic area than by language. —CodeCat 19:07, 22 October 2011 (UTC)Reply
  - Yes, it's sometimes difficult, but this is not a reason to deal with them in an appendix (other words may be difficult too). Village names are words, they should be addressed the same way as other words. For small places, it may happen that no other dictionary addresses them, their presence here is a real added value. And not only for their etymology: their pronunciation is very useful too, and is different in different languages. And demonyms, etc. Please keep it simple. Lmaltier 19:48, 24 October 2011 (UTC)
    - I am not saying we shouldn't allow place names in the main namespace. Many place names have useful information that could be added. But what if we include not just all words in all languages, but all place names in all countries as well? Even a small country like Luxembourg could contain thousands and thousands of place names, do we want to allow all of them? And if not, then what kind of criteria can we apply to place names? Eersel may be readily attested because it is fairly large still. But what about tiny places with only a few houses? Essentially, maps and surveys are the only things that document every place name, but they are really like dictionaries, and we have a policy against including material from dictionaries blindly. Can maps be used to attest place names, even though they are mentions and not uses? —CodeCat 20:11, 24 October 2011 (UTC)Reply
      - Yes, there is a huge number of placenames; but it's no more difficult than including all words: we all know that it's impossible, but it's our objective. And I think that placenames on maps should be considered as uses in the language used by the map. When you write Y is a small town..., you use Y, and it's the same for maps. When you write Y is the word used in French to refer to a small town..., this is a mention. Lmaltier 21:01, 24 October 2011 (UTC)
        - What should the definitions of these places be? Should every place in Luxembourg be defined as A place in Luxembourg? —CodeCat 21:44, 24 October 2011 (UTC)
        - In my opinion, there should be an indication about where the place is + ideally a map showing its position (such a map makes the definition much clearer and more precise). But, of course, no demographic or economic data. Lmaltier 19:19, 25 October 2011 (UTC)
I would like some more opinions from others on this if that's possible. I would like to start adding place names but I don't want to get in people's way if I do that. —CodeCat 19:50, 25 October 2011 (UTC)

	Input needed
	This discussion needs further input in order to be successfully closed. Please take a look!

AFAICT place names are all acceptable under the CFI. However, IMO places aren't. Just their names. In other words, not every place named Eersel gets its own sense line. Rather, if there is more than one Eersel and all of them are municipalities (towns or cities or the like), then the only definition should be "{{non-gloss definition|A municipality name.}}" or "{{non-gloss definition|A name of several municipalities in the Dutch-speaking countries.}}" or the like. Or if some Eersels are neighborhoods, some cities, and some counties, then "{{non-gloss definition|A place name.}}" or "{{non-gloss definition|A name of several counties, municipalities, and neighborhoods in the Netherlands.}}" or the like.—msh210℠ (talk) 15:45, 2 November 2011 (UTC)

But it would be possible for each Eersel to have a distinct etymology, in which case it would be unavoidable to have separate senses for each of them. For example, America, the continent, has a different etymology from America, a small village in the Dutch province of Limburg. And similarly for Californië in Gelderland, Nederland in Texas, Colorado and even the Dutch province of Overijssel (so there is a Nederland in Nederland). —CodeCat 16:02, 2 November 2011 (UTC)

Re: "If there is more than one Eersel and all of them are municipalities (towns or cities or the like), then the only definition should be [...]": No such agreement has been reached, AFAIK. Thus, more definition lines of municipalities are accepted in the entry for "Paris", while, at the same time, not every place needs to have a dedicated definition line. The last thing on which we have agreed at least for a limited period of time until it was removed from CFI again was this: 'If the name is shared by several places, some of the places bearing the name can have a dedicated sense line, while other ones can be covered under a summary sense line such as "Any of a number of cities in Anglophone countries"', per this revision. --Dan Polansky 16:03, 2 November 2011 (UTC)Reply

We already have multiple etymologies for multiple places with the same name. Christchurch is a good one. SemperBlotto 16:08, 2 November 2011 (UTC)
- The way the etymologies are written in Christchurch is not how it's normally done on Wiktionary. Is it how we want to do it? —CodeCat 16:30, 2 November 2011 (UTC)Reply
  - I think the etymology of Christchurch should above all mention the words Christ and church - it's not necessarily transparent to a student in China. Why each place received this name might be called encyclopedic. In any case explaining after which Matti every one of the 50+ Finnish places named Mattila were named would be out of question. I've defined common placenames as "Any of a number of places in Finland", with a separate definition if some place is particularly important. I don't think entries for even small villages are likely to be deleted if they contain a good etymology and pronunciation, but I'd be wary of creating entries for compound words (X River, South X) and for names of minor places without anything else than a definition.--Makaokalani 17:09, 3 November 2011 (UTC)Reply
    - Why a name was given to a thing or to a place is not encyclopedic, it's etymological. "would be out of question": why? Inhabitants would be happy to find the etymology of the name of the place where they live. And don't forget associated demonyms (e.g. in French, there are many places named fr:Beaulieu sharing the same etymology, but not the same demonym). Lmaltier 21:04, 8 November 2011 (UTC)
    
    - To Msh210: each sense of a word deserves its own definition. Some words have an etymology but no actual sense, such as surnames or 1st names. But placenames have senses. Nobody is obliged to add all these senses, but they should not be removed when present: they are useful, and sometimes required for etymologies (sometimes), pronunciation (sometimes), translations, demonyms... Lmaltier 21:13, 8 November 2011 (UTC)

Yes, we should have entries for the words which make up place names, especially to include their etymologies.

But no, specific signified places are not “senses” to be defined. The dictionary entry Paris oughtn't be a gazetteer of three dozen specific cities, towns, counties, and neighbourhoods (which is rightfully at w:Paris (disambiguation)#Geography), any more than the surname Smith should be a phone directory of a few million specific people (the prominent ones go in w:List of people with surname Smith).

The origins and etymologies of these words and names—toponymy and onomastics—belong in the dictionary. But the identities and locations of each of these places, who named them, and when, and why—geography and history—not so much. —Michael Z. 2011-11-09 02:49 z

Don't you feel the difference with surnames? The sense of a surname might be anybody with this surname, but there is no real sense. Placenames have senses (most often one or two). And everything related to the names (including their etymologies and their senses) belongs here. Lmaltier 06:07, 9 November 2011 (UTC)

Definitions versus Descriptions

Dictionaries are commonly supposed to contain definitions; but most dictionaries seem to contain merely descriptions. For example, the Wiktionary definition of chaconne is currently "A slow, stately Baroque dance"; but not all slow, stately Baroque dances are chaconnes.

So my question is: should such descriptions be expanded to be as specific as possible, or would that be considered undue clutter? Paul Magnussen 20:59, 22 October 2011 (UTC)Reply

Unhelpfully: ‘it depends’. That definition could probably use a little more detail, but in general if you need to write more than one sentence, then it's probably too much. Where Wiktionary editors will get jittery is where a definition becomes ‘encyclopaedic’, but that is of course a subjective line. Ƿidsiþ 21:05, 22 October 2011 (UTC)Reply
- At some point it becomes too hard to define a term, because the specific defining characteristics aren't easy to write out in a single sentence. At that point referring to Wikipedia becomes a good alternative. —CodeCat 21:12, 22 October 2011 (UTC)
(See also [[user:msh210/specificity]] and its talkpage.—msh210℠ (talk) 06:53, 23 October 2011 (UTC))Reply
In a way, I'd say don't worry about it. In a sense, a definition is a description which has only the information necessary and sufficient to describe something completely, but we have dictionary entries, not true definitions. I think a dictionary is where the grammatical features belong, and the real-world description should be an abbreviation of a complete encyclopedic one, with the knowledge that the encyclopedia has the responsibility for that. Real-world descriptions in a dictionary and an encyclopedia tend to overlap, but grammatically, words have more discrete states, such as person (first person, second person), tense (past, present), mood, aspect, being countable or uncountable, and whatever else the grammar provides for. This information, grammatical information, especially about verbs, is hard to find in an encyclopedia. For example, if you look up "marries" in WP, you get redirected to the noun "Marriage". The "Third-person singular simple present indicative form of marry" definition is the type of information that should have a complete treatment here, and a full understanding of marriage is not our problem. Haplology 08:12, 23 October 2011 (UTC)Reply

I concur. Generally speaking, we cannot achieve perfect definitions. But we can (and should) in some cases, when it's easy, e.g. by providing the full mathematical definition of topological space, or the scientific name as a complement of the definition of Atlantic salmon. It makes the definition unambiguous, even if it does not help everybody... Lmaltier 20:52, 24 October 2011 (UTC)Reply

“Chaconne — a slow, stately Baroque dance.” Is it

- Any slow stately dance of Baroque Europe?
- A particular style of Baroque dance that is slow and stately?
- A specific Baroque dance which happens to be slow and stately?

If we can't answer this question, then we can't even know whether the existing definition is defining or descriptive. This is as much a problem of syntax and grammar as it is of the facts provided. —Michael Z. 2011-11-09 00:57 z

Template:figurative

On my talk page, a user suggested that {{figuratively}} should redirect to figurative instead of the other way around. What do we think? {{literal}} does the same thing, in that it redirects to {{literally}}. Mglovesfun (talk) 10:39, 23 October 2011 (UTC)Reply

Neither. Each displays its pagename in the context label, so they have different uses. They should categorize identically and display differently. If, however, the community disagrees with me and decides, as Martin says someone suggests, to swap the redirect, then we'd better not do so before checking uses of the templates: we don't want (e.g.) "(literally or figuratively)" to become "(literally or figurative)".—msh210℠ (talk) 16:57, 23 October 2011 (UTC)Reply

Keep them the way they are. The user seems to have very strong feelings about it but didn't provide any reasoning. I can't imagine what it would be, and I for one don't share their feelings about it. Both labels make sense to me and both figurative and figuratively sound fine in my opinion. One could imagine an entry which is figurative in its main sense and also has another figurative sense, being a second level of figurative-ness if you will. In that case figuratively is better--the main sense is figurative and another is meant figuratively. Haplology 17:36, 23 October 2011 (UTC)Reply

But the way they are doesn't allow that: both display as "figuratively".—msh210℠ (talk) 18:07, 23 October 2011 (UTC)Reply

I'd be happy to allow {{figurative}} as a separate context label to figuratively. Mglovesfun (talk) 10:38, 24 October 2011 (UTC)

On reflection, I'm not sure there is value in having them separate. A word could be figurative and hence used figuratively, so really they're the same thing. Since neither categorizes, it barely matters. Mglovesfun (talk) 12:37, 25 October 2011 (UTC)

Wiktionary:Halloween Competition 2011

Hello all, this is an announcement about the latest installment of our user competition, to tie in with Halloween. It is all about writing a short Halloween story, and hopefully we can give some of the most common words here some much-needed improvement. Let me know if you think some things should be altered or added. Sign up at Wiktionary:Halloween Competition 2011. --Rockpilot 09:17, 24 October 2011 (UTC)Reply

Not sure I have the time or the inclination for another competition. Can't we wait until Xmas? Mglovesfun (talk) 10:41, 24 October 2011 (UTC)

Not with that theme. Halloween stories are better than Christmas stories. --Daniel 23:19, 24 October 2011 (UTC)

Wonderfool (Rockpilot)

Just for everyone's info, I've blocked him again. This was the immediate trigger [12] but I think people may agree that he was gradually making more and more trouble, and not really contributing much of use. Equinox ◑ 21:51, 24 October 2011 (UTC)Reply

(spectation) How is "Harvey" an example phrase of spectation/regard? It looks like a mistake to me. If it is a mistake, then Rockpilot was right to remove it. If it turns out to be correct, I think it needs an explanation, because it is meaningless to me. —Stephen ^(Talk) 22:50, 24 October 2011 (UTC)Reply

The entries I have been adding are from Webster 1913 (though I have been checking to ensure that the words are reasonably attestable). With that dictionary, the habit was to name a major author who had used the word, and not necessarily to include the quotation (due to limitations of space and printing techniques, I suppose). This obviously isn't perfect for us, but I think they are worth including as an easy way to find a citation for a word (search Google Books for the word and the given author). I am adding the Webster entries in a semi-automated way (involving a significant initial effort on my part in writing Webster-to-Wiktionary conversion code) and felt these were worth keeping. Ideally people would be improving these entries by finding the named citations and including them in full. Rockpilot is not interested in doing this kind of work and would rather create deliberately divisive competitions and rude comments. I am certain that he only removed the citation names from my entries in order to cause trouble. If he wanted to do useful work he could have found the citations. Equinox ◑ 22:57, 24 October 2011 (UTC)Reply

Just to make the point, I have found the Harvey citation in question and added it (see spectation) though I can't identify the original publication. Can you? Would you like to add it? Or perhaps everyone would rather add ridiculous phrasebook entries for do you love transsexuals?. Dorks. Equinox ◑ 23:02, 24 October 2011 (UTC)Reply

I have the same reaction as Stephen. "Harvey" is intelligible to those who know it is a placeholder for a quotation by Harvey, but to others, it is indistinguishable from vandalism — it looks like someone named Harvey inserted his name into an entry. If an IP had removed it, I would assume the IP was acting in good faith, removing apparent vandalism! Of course, RP/WF knew it wasn't vandalism, and probably knew it was a useful placeholder. - -sche (discuss) 23:11, 24 October 2011 (UTC)Reply

Then what do you suggest? Perhaps we need a new template. Please make it. Equinox ◑ 23:14, 24 October 2011 (UTC)

I see no reason to think that Rockpilot meant to make any sort of trouble by removing it. I think he was acting in a perfectly reasonable manner. It was incomprehensible to me, and if I had seen it, I would have removed it myself.

I don’t know anything about what is involved in the semi-automated addition of entries, but maybe these citation names could be hidden from view and a template added asking for help finding the citation.

At the very least, I would unblock Rockpilot, because I don’t think he had anything but the best of intentions in this particular matter. —Stephen ^(Talk) 23:20, 24 October 2011 (UTC)Reply

I agree with Stephen. I have unblocked him. He has helped me a lot with German conjugations! -- Liliana • 23:25, 24 October 2011 (UTC)

Rockpilot cannot claim ignorance of the underlying phenomenon (material taken from Webster's), as I had explained it, along with Webster's practice of noting the author of a usage citation or a dictionary that had included the term. I think it was in the context of a previous deletion.

The existence of bare surnames and surnames associated with usage examples is quite widespread among the many English entries that have not been much revised since being copied from Webster 1913. DCDuring TALK 00:24, 25 October 2011 (UTC)Reply

I think a template is a good idea; we already have a category. I've created {{rfquotek}} ("k" stands for "known", because we "know" a bit of information, either the person or the work we want a quotation from). It takes the name of the person or work as its first and only parameter, and is used like this. Feel free to improve and/or rename. - -sche (discuss) 23:48, 24 October 2011 (UTC)Reply

The template is handy, but even handier would be a reliable means of identifying the entries and sections therein likely to need such a template. As the process of bringing such citations to our standard doesn't seem to be something many contributors find worth undertaking, it would seem we need a cleanup list for these things, on which the few contributors who are willing to do this can focus their efforts. DCDuring TALK 00:24, 25 October 2011 (UTC)Reply

WT:Abbreviated Authorities in Webster lists all the author abbreviations used (and their fuller forms). The list could be used to search for and templatize all the occurrences within entries. --Bequw → τ 13:12, 25 October 2011 (UTC)

I would omit the surnames, and add {{R:Webster 1913}} to every imported entry instead, thereby making the surnames available one click away. Otherwise, the request templates are likely to sit in Webster 1913 entries for ages, providing close to no useful information to readers of Wiktionary. But even if the surnames stay, kudos to Equinox for doing the import! --Dan Polansky 07:35, 25 October 2011 (UTC)Reply

No hard feelings, Eq. And thanks for the unblock, Liliana. They were all good-faith edits, and I'm glad something positive is appearing as a result of my controversial edits. It's not the first time, and I'm sure it won't be the last time I do something controversial here either! --Rockpilot 07:53, 25 October 2011 (UTC)Reply

Outcome: I am using the new template on surnames (thanks, -sche) and I already put the Webster template on all these entries (unless the word seems remarkably common and I can't believe we don't have it). Equinox ◑ 22:56, 26 October 2011 (UTC)Reply

Template:ja-forms - what is this for?

I just ran across 啞 and was confused to see that the Japanese entry includes kanji boxes for "simplified" and "traditional" forms of the kanji -- both of which are only relevant in a Chinese context. I was surprised to find that this box is put in by the {{ja-forms}} template. Since simplified and traditional are in fact not Japanese forms, I'm strongly tempted to remove this template from those Japanese entries that use it and place it in the ==Translingual== section of those pages, where it appears to belong. By way of reference, 观 and 览 (neither of them character forms used in Japanese) place this template in the ==Translingual== section. I'm also tempted to edit the template slightly to clarify that "shinjitai" is for Japanese use, and that "simplified" and "traditional" are for Chinese use.

Does anyone else have an opinion on the matter? And perhaps the template should be moved to a more appropriate name, since this cross-referencing of Chinese character forms is applicable to all Chinese character entries that have alternate forms, not just Japanese? -- Eiríkr Útlendi │ Tala við mig 15:26, 26 October 2011 (UTC)Reply

I support moving it to the Translingual section and renaming the template. - -sche (discuss) 19:24, 26 October 2011 (UTC)

Me too. Haplology 16:09, 27 October 2011 (UTC)

Certainly don't keep it as it is. However, do Translingual CJKV characters have traditional and simplified forms? I think not. Mandarin does, for other Chinese languages I don't know. Unless I'm very very wrong, Japanese, Korean and Vietnamese do not have traditional and simplified forms. Mglovesfun (talk) 16:11, 27 October 2011 (UTC)Reply

No, Mglovesfun, you're right, Chinese has simplified and traditional, Japanese has shinjitai and kyūjitai, with kyūjitai in Japanese writing basically the same as Traditional Chinese. I'm not aware of any simplified characters specific to Korean; South Korea at least seems instead to just be phasing out Hanja characters in their entirety, which simplifies education considerably but does regrettably reduce shared written vocabulary with the rest of the Chinese script world. Vietnamese is wholly outside of my realm of expertise.

Extrapolating a bit, my suspicion is that the Japanese-speaking (-reading?) editors who created {{ja-forms}} might have been responding to the use of {{Hani-forms}}, which is generally added to the ==Translingual== section, as over at 見. This makes sense since these forms are used (at least historically) across the breadth of the Chinese writing for all dialects. Perhaps {{ja-forms}} was intended to expand upon this to include all (or at least more) attested forms of a Chinese character? The Simplified Chinese page 观 has no Japanese entry (nor should it), and it uses {{ja-forms}} to point to both the Traditional Chinese form of 觀 and the Japanese shinjitai form of 観; this begins to make sense to me from the perspective of Chinese character forms being akin to alternate spellings, and thus a Chinese character entry (in any language) should list these alternates, just as we have both colour and color.

That said, using the ja- prefix on the template name seems misguided at best. I'd be happy to just make sure that {{Hani-forms}} can handle shinjitai and kyūjitai, and replace all calls to {{ja-forms}} and then delete it. And possibly tweak {{Hani-forms}} and its documentation to specify use only for single-character entries. What say you all? -- Eiríkr Útlendi │ Tala við mig 21:13, 27 October 2011 (UTC)

Terms of Use update

I apologize that you are receiving this message in English. Please help translate it.

Hello,

The Wikimedia Foundation is discussing changes to its Terms of Use. The discussion can be found at Talk:Terms of use. Everyone is invited to join in. Because the new version of Terms of use is not in final form, we are not able to present official translations of it. Volunteers are welcome to translate it, as German volunteers have done at m:Terms of use/de, but we ask that you note at the top that the translation is unofficial and may become outdated as the English version is changed. The translation request can be found at m:Translation requests/WMF/Terms of Use 2 -- Maggie Dennis, Community Liaison 00:42, 27 October 2011 (UTC)Reply

I don't think you need to apologize for English on this project! — lexicógrafa | háblame — 02:31, 27 October 2011 (UTC)

That's alright. We forgive you. --Daniel 08:24, 27 October 2011 (UTC)

I'm afraid it's completely incomprehensible. Do we have any English-to-Wiktionarian translators around? --Yair rand 16:13, 27 October 2011 (UTC)

Luckily for me, I have an automatic translator. I know they are unreliable and whatnot, but the English-to-English option is perfect. It's flawless, I swear. I understood everything. --Daniel 17:44, 27 October 2011 (UTC)Reply

Categories of names 3

Wiktionary:Votes/2011-10/Categories of names 3 started. --Daniel 08:25, 27 October 2011 (UTC)

Moving large discussions to subpages

A problem I often have with discussions in the Beer Parlour and related rooms is that the pages often get very long. Once more new discussions are added, it becomes hard to keep track of all of them because of all the scrolling involved. We do archive discussions, but that doesn't always help because there is just too much in between. So I wonder if it would be a good idea to move larger discussions to subpages, and link to them from the main page? That way, the BP is kept clean, and another advantage is that you can watch that discussion page, which is a much more effective way of following a discussion than watching all of the BP at once. Perhaps a system similar to how WT:VOTE works? —CodeCat 15:53, 27 October 2011 (UTC)

For information, fr.wikt uses a different subpage for each month. Lmaltier 20:03, 27 October 2011 (UTC)

I have no strong objection to either of those methods. I do worry that sorting pages by month will lead to some confusion as to where a current discussion is ongoing, so I think making subpages for discussions that become very long is the better solution. bd2412 T 20:27, 27 October 2011 (UTC)Reply

November 2011

Proper label for Japanese "quasi-adjectives"

(NB: If this should be moved to Wiktionary talk:About Japanese, please let me know and I'm happy to move it there.)

What to call 形容動詞 (keiyōdōshi) in English has been a bit of a bugaboo for many a linguist. As noted above at Wiktionary:Beer_parlour#Preferred_forms_for_Japanese_lemmata, the current label of ===Adjective=== does not fit. The alternates of ===Nominal===, ===Adnominal===, ===Copular adjective===, ===Quasi-adjective===, ===Noun===, ===Descriptive noun===, and ===Nominal adjective===, among other possibilities, all seem to entail different kinds of confusion.

Considering that the keiyōdōshi part of speech has no real analog in English, it seems appropriate to eschew the plain English POS labels in favor of something more fitting. Does anyone have any strong argument against na- adjective? This is the most common label I've seen used in Japanese learning materials for English speakers. The only arguments against this that I've run across (so far not here on WT) arise from objections that this term does not seem sufficiently technical or linguistic. Given WT's apparent target audience, I think being clear is more important than using technical jargon. What say you all? -- Eiríkr Útlendi │ Tala við mig 16:46, 31 October 2011 (UTC)

It makes sense. To be honest it looks a little strange though, but I could live with it, especially since as you say it is common in English-Japanese dictionaries. Basically, something like 素直? I added that header. Haplology 18:14, 31 October 2011 (UTC)Reply

Na-adjective, copular adjective, quasi-adjective, and nominal adjective are not very good because they are supposed to include だ. I prefer descriptive noun in the list above by Eirikr if possible. I know na-adjective is a well-known term among Japanese learners as Haplology said, but it is too specific to Japanese. There is a very similar word class in Korean, namely 형용명사 (hyeongyongmyeongsa, 形容名詞, "adjectival noun"), like 과학적 (gwahakjeok, 科學的, "scientific"), and it is a type of noun in traditional Korean grammar. We should treat the Japanese 科学的 (kagakuteki) and the Korean 과학적 in the same way. — TAKASUGI Shinji (talk) 04:06, 2 November 2011 (UTC)Reply

@Takasugi-san, just to be sure, are you proposing that we use Adjectival noun? (not descriptive noun) I agree that na adjective is good for bi-lingual dictionaries but multilingual dictionaries should have parts of speech as broad as possible. Haplology 04:18, 2 November 2011 (UTC)Reply

Although I have used adjectival noun, we should avoid it if we want to have a term as broad as possible. In the traditional Latin grammar, adjectives were called nomen adiectivum, which translates literally to adjectival noun. That might be confusing for Latin learners. Descriptive noun will be better. — TAKASUGI Shinji (talk) 04:51, 2 November 2011 (UTC)Reply

I strongly prefer Adjectival noun. Wikipedia uses it on their English page linked to 形容動詞 (w:Adjectival noun (Japanese)). Wikipedia's page doesn't use the word "descriptive" at all. Adjectival noun gets 3 times as many hits on Google. On Google Books, "Adjectival noun" Japanese gets 1,960 results and "Descriptive noun" Japanese gets 145, and the first hits at least are not about Japanese. Descriptive noun also has the common, non-technical meaning of being a noun that describes people, so it might confuse ordinary people. I don't think we need to worry about confusing Latin learners. 形容動詞 are confusing to everybody anyway no matter what we call them, but even if Descriptive noun is better, we should follow precedent. Haplology 08:16, 2 November 2011 (UTC)

For interest, some searches that show what people outside of Wiktionary write: google books:"adjectival noun" Japanese, google books:"descriptive noun" Japanese, google books:"na- adjective" Japanese, google scholar:"adjectival noun" Japanese, google scholar:"descriptive noun" Japanese, google scholar:"na- adjective" Japanese. --Dan Polansky 09:02, 2 November 2011 (UTC)Reply

Okay, I understand. It will be fine to use adjectival noun for Japanese keiyōdōshi stems and Korean nouns with -적 (-jeok). — TAKASUGI Shinji (talk) 09:42, 2 November 2011 (UTC)Reply

Mulling it over some more, I'm happy with adjectival noun, especially since (as Haplology and Takasugi noted above) this broader label works for more than just Japanese. I balked at nominal adjective, not least since this term is more commonly used in English grammar to refer to an adjective used as a noun (as described at #4 at w:Adjective#Form), but turning this term around backwards works much better, and Dan's list of links makes it clear that adjectival noun is used widely enough in the literature that WT users are likely to be at least passingly familiar with the term. A quick look at google books:"adjectival noun" Korean and google scholar:"adjectival noun" Korean shows some use of this term for Korean too, for what it's worth. -- Eiríkr Útlendi │ Tala við mig 16:45, 2 November 2011 (UTC)

Anagrams

I urge the community to rethink the introduction of anagrams in wiktionary entries.

The task of finding anagrams can be easily automated
More information is not always better than less information. There are all sorts of useless statistics that we could add to each entry, but we shouldn't because the signal-to-noise ratio would be too small, and it would increase the overall noise in the page, making it more confusing and difficult to find what one is normally looking for: meaning, etymology, etc.
Anagrams simply do not belong to a respectable dictionary. Are there any precedents?

Thanks. 220.210.184.66 23:32, 1 November 2011 (UTC)

I like the anagrams. They are useful in word games, and when you consider (from the feedback page) that half our users are semi-literate and can barely read a normal entry without getting lost, they certainly aren't going to be able to "automate" anything. We can be a respectable dictionary and still introduce new practices, or else nothing would ever have changed or progressed from the first dictionary ever made. Equinox ◑ 23:37, 1 November 2011 (UTC)Reply

Of course, not everything useful or that you and I like belongs at wikt. Also, if users can't navigate our entries, that's fodder for 220's argument against noise; and they probably can nonetheless find anagrams, as they need merely Google anagram ayes (or whatever) to find already-automated anagram lists.—msh210℠ (talk) 23:57, 1 November 2011 (UTC)Reply

If users can't navigate, that is a problem with our navigation, not with the content. We can use collapsible sections, etc. Equinox ◑ 00:01, 2 November 2011 (UTC)

Personally, I disagree with "Anagrams simply do not belong to a respectable dictionary". The set of anagrams of a word is a property of the word and, like rhymes and synonyms and other things, belongs in a page about the word, which is what we have here. (And like Equinox says, that other dictionaries don't have them doesn't mean we shouldn't.) But that's just my opinion: I think opinions are what this will have to come down to, unless someone does a usability study.—msh210℠ (talk) 23:57, 1 November 2011 (UTC)

I also think that including anagrams is a good thing (for those interested, not the majority of readers). Their inclusion emphasizes the difference between us and Wikipedia, which is not obvious to everybody. Their absence from most dictionaries is an additional reason for inclusion, another "plus" we can bring. Also note that anagrams can be found on other sites, but 1. certainly not for all languages. 2. certainly not for all words, including surnames (and readers might be interested by anagrams of their name), etc. Lmaltier 06:31, 2 November 2011 (UTC)Reply

Innovation must be allowed. Agreed, but it's not a sufficient reason to keep a feature that is marginal or useless to most users in such a prominent place (level 3, on par with pronunciation and the actual lemmas). I think at the very least it should be put somewhere less noticeable. There are plenty of other statistics (e.g. character count, another questionable "plus" that we can bring, so as to increase the ~~entropy~~, ahem, information) that are even more useful in word games and that could go together with anagrams, in a collapsed Statistics/Properties subsubsubsubsection.

I also think that finding anagrams can be automated (not by users!) and therefore should be automated. Agreed, it's not a trivial development, but manually adding anagrams to all words sounds like a massive waste of time to me. There are plenty of programs out there that already specialise in finding anagrams, so it is a computable function. If I really were interested in finding all anagrams of a word, why would I risk missing out some of them by relying on a manually-compiled (and therefore fallible) list?

Rhymes are similarly upsetting, but somewhat more acceptable in my view. That said, I find it incredible that some entries would have the "rhymes" but not the full IPA pronunciation - or did I miss it among all that noise? A usability study sounds good to me...

Thanks for listening. 124.147.76.165 12:33, 2 November 2011 (UTC)

It's already automated: see Wiktionary:Anagrams. Lmaltier 19:01, 2 November 2011 (UTC)
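
For readers wondering what such automation involves, here is a minimal sketch of the usual approach: two words are anagrams exactly when their letters, once sorted, are identical. This is only an illustration, not the code behind Wiktionary:Anagrams; the function names `anagram_key` and `find_anagrams` and the sample word list are hypothetical.

```python
from collections import defaultdict


def anagram_key(word):
    """Return the word's letters, lowercased and sorted, as one string."""
    return "".join(sorted(word.lower()))


def find_anagrams(words):
    """Map each word to the sorted list of its anagrams in `words`.

    Words with no anagram in the list are left out of the result.
    """
    groups = defaultdict(set)
    for word in words:
        groups[anagram_key(word)].add(word)

    result = {}
    for group in groups.values():
        if len(group) > 1:
            for word in group:
                result[word] = sorted(group - {word})
    return result


if __name__ == "__main__":
    sample = ["ayes", "easy", "yeas", "cat", "act", "dog"]
    for word, others in sorted(find_anagrams(sample).items()):
        print(word, "->", ", ".join(others))
```

Because the grouping runs over the whole word list in a single pass, a bot following this idea would only need the list of existing page titles for a language; it would not handle diacritics or multi-word entries without further rules.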

Thanks. Please let's switch to that talk page, where I added a comment. Also you may want to archive this page because it takes a long time to load. Thank you. 124.147.76.165 23:52, 2 November 2011 (UTC)

As a former Scrabble player, I'm not a big fan of anagrams on Wiktionary. Anagrams ignore the meaning of a word, and don't provide useful information in the same way pronunciation, rhymes and homophones do (which also ignore the meaning of a word). Mglovesfun (talk) 07:42, 9 November 2011 (UTC)Reply

For the lack of a "Wiktionary:Requests for deletion review" page

See also Wiktionary:Requests for deletion/Others#Template:py-to-ipa, Template talk:str index.

Please note Template talk:str index#Deletion, Wiktionary:Requests for deletion/Others#Template:py-to-ipa and User talk:Ruakh#Template:str index1, where a review of the deletion of Template:str index and Template:str index/logic has been requested. Hbrug 02:51, 2 November 2011 (UTC)Reply

Bring template undeletion requests to [[WT:RFDO]] if anywhere.—msh210℠ (talk) 05:57, 2 November 2011 (UTC)

Let's be honest, nobody cares about RFDO. So many pages are sitting there just waiting to be reviewed, but there is nobody who reads the requests! -- Liliana • 12:44, 2 November 2011 (UTC)

Maybe that would change if we agreed that silence is confirmation. :) —CodeCat 15:10, 2 November 2011 (UTC)

Actually, I take it back. The issue here is expensive templates generally, not any single one. So this really is the right place.—msh210℠ (talk) 15:49, 2 November 2011 (UTC)

What we're doing could be compared to a country shutting down all coal and oil power plants because they consume resources like coal and oil. Sure, you do save up on them, but that isn't gonna help when 1. all the other countries continue wasting them, and more importantly 2. all the inhabitants of the country now suffer from a lack of power thanks to your politics. Automatic transliteration of Brahmic scripts (Brahmic specifically because the Transliterator extension cannot do these) is an excellent reason to restore these, because it adds a useful functionality. -- Liliana • 15:58, 2 November 2011 (UTC)Reply

I agree. Until a developer comes along and says "you are overutilizing resources", I think we should just use the tools we have in any manner we imagine. [The]DaveRoss 16:26, 2 November 2011 (UTC)

Agreed. Hbrug 10:41, 5 November 2011 (UTC)

Incidentally, I think we should also get rid of the xs= parameter in {{t}}, which saves the language name for every page. This is a huge problem when a language name is changed, as has happened recently with {{nds}} (from Low Saxon to Low German), and isn't really worth the (usually marginal) performance increase. -- Liliana • 16:47, 2 November 2011 (UTC)Reply

How is it that Brahmic scripts can be transliterated by a pile of templates but not by the transliterator extension? --Yair rand 18:52, 2 November 2011 (UTC)

The transliterator is a simple table, while templates can make use of conditionals, like "if consonant is not followed by vowel or halant add inherent vowel else suppress it". -- Liliana • 19:22, 2 November 2011 (UTC)
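
To make the quoted rule concrete, here is a rough sketch of it for a handful of Devanagari letters. It is not the logic of the deleted templates or of the Transliterator extension; the tiny character tables and the function name `transliterate` are assumptions chosen for illustration, and the output follows Sanskrit-style romanization (no Hindi schwa deletion).

```python
# Illustrative subsets only; a real table would cover the whole script.
CONSONANTS = {"क": "k", "म": "m", "ल": "l", "न": "n", "र": "r"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u"}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "ā", "इ": "i"}
VIRAMA = "्"  # the halant: marks a consonant as having no vowel


def transliterate(text):
    """Romanize `text`, adding the inherent vowel only where needed."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # The conditional a plain lookup table cannot express:
            # add "a" unless a vowel sign or halant follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch == VIRAMA:
            continue  # already accounted for when handling the consonant
        else:
            out.append(ch)  # pass through anything not in the tables
    return "".join(out)


if __name__ == "__main__":
    for word in ["कमल", "काम", "क्र"]:
        print(word, "->", transliterate(word))
```

The one-character lookahead is the whole point: a pure character-to-character table would map क to either "k" or "ka" everywhere, and both choices are wrong for some words.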

AFAICT, the only reason these templates were deleted is that they are "expensive". However, other Wiktionaries and Wikipedias use them (with no apparent adverse effect to how those sites load), and AFAIK no developer has asked us not to use them. Therefore, like Liliana and TDR, I think we should restore the templates — those which do things nothing else (such as the transliterator extension) can. I'm not opposed to making the templates work only when subst:ed whenever that's possible without loss of functionality. - -sche (discuss) 18:52, 5 November 2011 (UTC)Reply

I have restored Template:str index per this discussion, but only on the premise that it be used for things which cannot be handled by other, more efficient mechanisms. -- Liliana • 22:21, 6 November 2011 (UTC)Reply

SoPs in Webster 1913

Apropos of nothing in particular, I've spotted a few howling sums of parts in this Webster 1913. My favourites are globe-shaped, "shaped like a globe", and beech tree, "the beech". Equinox ◑ 18:16, 2 November 2011 (UTC)Reply

globe-shaped would pass per WT:COALMINE. But yeah, quite a few of Webster's entries would be classified as SoP by us. -- Liliana • 18:19, 2 November 2011 (UTC)

Also "beech tree" is likely to pass per WT:COALMINE, although the Google books search seems dominated by the proper name "Beechtree". --Dan Polansky 18:27, 2 November 2011 (UTC)

Translingual translations

Just noticed that queen features "translations" into Translingual. Where was this practice approved? Translingual isn't a language like the others and isn't really intended to be listed as a "translation". -- Liliana • 20:16, 2 November 2011 (UTC)

Also, this doesn't satisfy the purposes the translation table does, which is to know how to render stuff in another language. Anyway, remove. Put the chess-piece symbol's link under "See also". (True, "See also" is meant only for same-language links, but IMO mul links should be fine in a "See also" section for any language.)—msh210℠ (talk) 20:23, 2 November 2011 (UTC)Reply

`{{pedia}}` under sense lines.

I think it should be possible to do something like this (at [[cat]]):

# Any similar animal of the family ''[[Felidae]]'', which includes [[lion]]s, [[tiger]]s, etc.
#* {{pedia|Felidae}}

which would ordinarily produce this:

Any similar animal of the family Felidae, which includes lions, tigers, etc.
- Felidae on Wikipedia.

but currently produces this:

Any similar animal of the family Felidae, which includes lions, tigers, etc. [quotations ▼]

I could list a bunch of reasons, but . . . I think the reasons "for" and "against" are all pretty obvious. Does anyone have any strong feelings about it, one way or the other?
—Ruakh_TALK 21:49, 2 November 2011 (UTC)

Good idea. I'm in favor if the subsequent quotations under that sense would still be hidden in the quotations down-arrow-thingy.—msh210℠ (talk) 20:00, 3 November 2011 (UTC)Reply

I would rather not do this. There are already two kinds of information under the senses, and it is kind of messy. The proposal places an icon and boldfaced text next to a definition, making the definition stand out even less than before. I prefer that all links to Wikipedia are placed under one L3-section "External links" at the end of the entry. --Dan Polansky 08:07, 4 November 2011 (UTC)Reply

I also dislike the idea of peppering external links into the senses, especially with a nasty icon and template. The example given makes no sense to me, as it links the WP article on Felidae not from our Felidae entry but from cat. We should be providing enough information at Felidae (via a simple text link in the definition), and the WP link should be in the target article at Felidae. I see no reason to circumvent our own definitions with additional visual and structural clutter. --EncycloPetey 14:53, 8 November 2011 (UTC)Reply

Re: "The example given makes no sense to me, as it links the WP article on Felidae not from our Felidae entry but from cat": Sorry, but that makes no sense. The link is to the Wikipedia article on cats. How is it relevant that Wikipedia calls that article "Felidae"? —Ruakh_TALK 15:46, 8 November 2011 (UTC)Reply

Does anyone else want to weigh in? Right now it's two in favor, one opposed: not exactly a shining example of consensus. (I could start a vote, if people think it's something that should be voted on before being done?) —Ruakh_TALK 14:00, 8 November 2011 (UTC)Reply

Against. Let's avoid graphical icons, more boldfaced words, and external links all over entries. (And the example is not very good: why link w:Felidae instead of w:Cat in the entry cat?)

I see one rationale in preferring links to WP articles under senses: articles about things are more likely to correspond to senses rather than to our entries about words. But the correspondence will never be a good match, because the encyclopedia and the dictionary are each a catalogue of a fundamentally different kind of thing. I'd rather group all of the external links at the bottom. Also better to have one link to a disambiguation page, which will list uses of a word whether they belong in the dictionary or not (e.g., persons, corporations, creative works, etc.), than to try keeping up a set of matching equivalents in these two references. —Michael Z. 2011-11-08 19:44 z

Re: "And the example is not very good: why link w:Felidae instead of w:Cat in the entry cat?": "Instead of"? Your second paragraph makes clear that you already know why your question makes no sense . . . also, re "graphical icons" and "boldfaced words", obviously those aren't inherent in the proposal. (You, EP, and Dan Polansky all have other reasons as well, but you also all mention the icon, so I feel the need to make that explicit!) —Ruakh_TALK 20:11, 8 November 2011 (UTC)Reply

Good points. I glanced at the example a little hastily. Perhaps it isn't a terrible idea to sort Wikipedia links by sense.

Still, if the definition links to the entry Felidae, then the link to the Wikipedia article is already in a prominent box on that page. No need for more page clutter. —Michael Z. 2011-11-09 02:30 z

In my example, the definition links to Felidae, but in many cases it doesn't. For example, a definition for a non-English word will almost never link to a same-language synonym, because the definition is supposed to be in English. —Ruakh_TALK 18:09, 9 November 2011 (UTC)Reply

Short quotes from copyrighted material. Is it allowed?

What's the deal with quoting short sentences from copyrighted material? In particular, I'm using "Harry Potter and the Philosopher's Stone" translated into Mandarin and Japanese for learning these languages (哈利·波特与魔法石; ハリー・ポッターと賢者の石). I have planned to have a go at these translations for a long time, and now I'm doing it and progressing well. (Note: I'm not a great fan of Harry Potter, but I have audio (Mandarin and Japanese), books and the English translation, which makes it great learning material. Bilingual readers are very useful!)

So, if I add an entry in Chinese or Japanese, can I add a short sample sentence? Like here: 阴沉 (yīnchén) or 瘦削 (shòuxiāo)? (I had to make different sample sentences, not from the book.) I know that some language forums allow about four sentences without fear of copyright violation when discussing some language points. If there are multiple questions, they are obviously not consecutive: a sentence here, a sentence there. --Anatoli 06:00, 3 November 2011 (UTC)

Single sentences are barely copyrightable, if at all, and fair use would tend to weigh heavily towards us. As long as they were scattered sentences, I think we could reproduce a lot of the work before they had a copyright claim.--Prosfilaes 07:54, 3 November 2011 (UTC)Reply

(ec) I'd say quotes from copyrighted material are okay if the words are rare and it's not easy to find quotes from public-domain material, but public-domain material should be preferred wherever possible. For example, I added two Stephen King quotes at Citations:megrim because megrim is a very rare word that it isn't easy to find quotes for. But I would never use those same quotes for imagination, fly (strip of material hiding the zipper at the front of a pair of trousers), or vulture. —Angr 08:01, 3 November 2011 (UTC)

If that's the case, it's very good. Making or searching for good sample sentences in a foreign language, especially hard ones like Chinese or Japanese can be difficult or time-consuming. I hope people won't hate me for quoting Harry Potter. :) --Anatoli 09:02, 3 November 2011 (UTC)Reply

I add sentences from copyrighted material all the time. Concerning Harry Potter, Chinese and Japanese quotations would be more natural if they were not translations, but then creating a good English translation on your own is difficult. I have a good collection of Finnish quotations about given names - I've collected them all my life - but I don't want to put them here since I'm no good at translating poetry or high quality prose. And the Finnish wiktionary is so uninspiring.--Makaokalani 17:17, 3 November 2011 (UTC)Reply

I think it's a very bad thing to prefer public domain material. I would almost go so far as to say that every word that's been used in the 21st century should have a 21st century quote. Given the number of citation-less words, it's pretty far down my list, but we should be showing words in modern use as being in modern use.--Prosfilaes 07:27, 4 November 2011 (UTC)Reply

I, like Makaokalani, add sentences from copyrighted material all the time. I asked BD2412 (talk • contribs) about it once — actually, I was asking about an extreme case, where a significant proportion of a news article would end up in various quotations here — and he wrote that "With respect to news articles, if they are divided into individual relatively short sentences and scattered about the dictionary with no quick and easy way to connect them, this would be a very clear fair use. After all, disjointed sentences reproduced individually can not harm the value of the work or substitute for it, and of course they use is for a non-commercial and educational purpose" (link). —Ruakh_TALK 18:41, 3 November 2011 (UTC)Reply

Thanks, all. Here's one of the quotes: 低垂 (dīchuí), which I had to retranslate (perhaps a bit awkwardly) into English. I didn't shorten the sentence, otherwise I would have to rewrite it. People familiar with the book may still guess where it's from. I didn't provide the source as it is a usage demo, not a proof of existence of the word. --Anatoli 19:53, 3 November 2011 (UTC)

Re: "I didn't provide the source as it is a usage demo, not a proof of existence of the word": Sorry, but I think that's really wrong: not only a bad idea, but also immoral. —Ruakh_TALK 20:38, 3 November 2011 (UTC)Reply

Fixed immorality. --Anatoli 21:30, 3 November 2011 (UTC)Reply

I agree. A quick Google Search reveals that the example sentence has been taken from another source without attribution. It should be sourced. ---> Tooironic 21:11, 3 November 2011 (UTC)Reply

Like Makaokalani and Ruakh, I use copyrighted sources as citations without qualms. I don't know about the specific laws of various countries, but in general I doubt that short extracts for purposes of illustrating the use of a word are going to get up anyone's nose. I do try to keep them short. Equinox ◑ 21:13, 3 November 2011 (UTC)Reply

I was going to comment here, but only to say what Ruakh has quoted me as saying. Our use of a handful of individual sentences from copyrighted material scattered across our entries is de minimis, and would not even rise to the level of copyrightability for a fair use analysis to be made. If one were made, the actual use that we make of such sentences makes about the strongest case for fair use that I can think of. On the other hand, if we were to take the entire Japanese translation of Harry Potter (for which both the underlying work and the translation would be individually copyrighted) and use that for all possible examples, and point out that we'd done so, that might be bad and we shouldn't do it just in case. bd2412 T 21:39, 3 November 2011 (UTC)Reply

I don't have the energy or time to provide sample sentences for all words I learn or known words I see missing in Wiktionary, so they'll go sampleless. I won't be adding sample sentences to existing Mandarin or Japanese entries: we already have quite a lot, and basic vocabulary is already covered. In 瘦削 (shòuxiāo, shòuxuē) I used an example sentence from a chat. I know it's hard to say what would count as too many citations from one book, but suppose I say 500 sentences: some short, some a bit long like in 低垂 (dīchuí), non-consecutive, no repetitions (that sample sentence generated two more entries; however, 丝毫 (sīháo) doesn't repeat the same citation and there is no quote in 迹象 (jìxiàng)). Would 500 citations from one source sound like too many? --Anatoli 22:36, 3 November 2011 (UTC)

I just want to add that short quotations may be prohibited in some cases, such as definitions from copyrighted dictionaries (each definition may be considered as a copyrighted work). And that (when the quotes are allowed, which is the normal case) authors or publishers are likely to be happy, because these quotes are much more likely to improve sales than to reduce them. Lmaltier 21:18, 4 November 2011 (UTC)Reply

Well, fiction books don't provide definitions, just reading. Agreed with the second part. That was my idea. Having some vocabulary and examples could attract language learners. --Anatoli 21:40, 4 November 2011 (UTC)

The Maori flag

I think this flag would be better to use for Maori entries instead of the New Zealand flag. —CodeCat 12:59, 4 November 2011 (UTC)

I agree. Ƿidsiþ 14:23, 6 November 2011 (UTC)Reply
I disagree, it's not an official flag (from what I read in WP; if I'm wrong then I agree). Ungoliant MMDCCLXIV 14:41, 6 November 2011 (UTC)Reply

Flags? What flags? I've never seen a flag in an entry. --EncycloPetey 14:51, 6 November 2011 (UTC)Reply

My preferences -> misc. Ungoliant MMDCCLXIV 15:24, 6 November 2011 (UTC)Reply

Huh? The only two options I see are

( ) Do not show page content below diffs

( ) Omit diff after performing a rollback

Neither of these appears to pertain to flags. --EncycloPetey 15:37, 6 November 2011 (UTC)Reply

My preferences -> Gadgets. --Yair rand 15:39, 6 November 2011 (UTC)Reply

Oh yeah, my bad. Ungoliant MMDCCLXIV 15:50, 6 November 2011 (UTC)Reply

counting down

To let you know that there's only a few days left before the end of the Wiktionary:Halloween Competition 2011. Get your entries in before it's too late. --Rockpilot 17:42, 4 November 2011 (UTC)Reply

ACDCrocks

Hi, it is I, ACDCrocks. I have a problem: Dick Laurent keeps bothering me, and I believe he should be blocked for "Causing our editors distress by directly insulting them or by being continually impolite towards them." He is also abusing his powers by blocking me for a reason not at all stated in block, namely "stirring up trouble", only minutes after I was unblocked from another silly block. I got blocked for wishing 7 users Happy Halloween; I was unblocked for that, but then Dick just blocked me again. It would be nice if Block could be updated if labeling my own talk page is no longer allowed, or if Dick could be blocked for constant harassment, because he thinks I am just a dumb cunt. Also, I believe his ability to block me should be taken away, and if he feels I should be blocked, he should have to contact another admin and ask them to do it, or be blocked himself. 71.142.73.25 19:54, 4 November 2011 (UTC)

Descendants format

Should Descendants sections of regular entries use nesting when one descendant is descended from another (example)? This seems to be the normal format for proto-language entries, but I don't think I've seen it in regular entries. (We don't seem to have a very clear policy for descendants.) --Yair rand 20:28, 7 November 2011 (UTC)Reply

I treat proto-entries and regular entries the same when it comes to layout, so I would use the same nesting for regular entries. —CodeCat 21:41, 7 November 2011 (UTC)

I format them the same way I would a Translations section, minus the collapsing template. That is, languages are alphabetical and bulleted as they would be for Translations. The proto-language entries do it differently because they are concerned principally with descent across many languages and many levels. Regular entry descendants sections are less sweeping in their scope. --EncycloPetey 04:27, 8 November 2011 (UTC)Reply

I think this is mostly because translations rarely list languages that are the ancestors of other languages. Old French, Old Norse and so on are not often listed in translation sections. fritho has an example of descendants. —CodeCat 10:43, 8 November 2011 (UTC)

I dislike that approach because it gets messy very quickly. See oculus and unus for Latin examples. Languages like Latin will have many descendant words, and grouping these by linguistic relationship would be a mess. Note that the Old French placement on unus is a compromise by WordDeed to keep Mglovesfun from deleting it out of the entry. Mg believes that if the modern French word is there, then the Old French shouldn't be listed at all. --EncycloPetey 14:47, 8 November 2011 (UTC)

If the list becomes too long, couldn't it be made to collapse? —CodeCat 14:51, 8 November 2011 (UTC)

We could collapse the list if there were a template specifically made for Descendants. However, its formatting would be determined by the structure of the Descendants section, and we have no clear guiding policy on that structure. Frankly, I would want the section expanded always (like the side-bar "Show Translations" toggle), so I can see the Descendants myself, but I can understand that other people might not be interested in seeing them. Enclosing and collapsing an indented hierarchical list would be messier, which is another reason I prefer modelling after the Translations format that is far more linear. --EncycloPetey 14:58, 8 November 2011 (UTC)Reply

{{des-top}}. --Yair rand 15:00, 8 November 2011 (UTC)Reply

More of a 'neatness' issue than believing the entry should be there at all. I can see three main approaches: 1) alphabetical 2) as in translation sections 3) as in proto-entries. The difference between 2) and 3) is in translations Middle French and Old French are bulleted under French, in proto entries it's the reverse. Frankly I don't mind as long as we're consistent; Help:Descendants or WT:Descendants seems like a good idea to me. Mglovesfun (talk) 15:04, 8 November 2011 (UTC)Reply

I should have time in a week or so, if WT:Descendants seems like a good idea (and with a second person suggesting it, it could be). I would start by pulling the model I prepared for WT:ALA#Descendants and generalizing it to a broader set of languages. However, we would need to choose a sequencing format. As I said, Mg's (2) option is my preferred sequence.

Note also that (3) would place Afrikaans under Dutch, whereas neither (1) nor (2) would group those languages. --EncycloPetey 18:41, 13 November 2011 (UTC)Reply
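
As a rough illustration of the three sequencing options listed above, here is how a Descendants section for Latin unus might look in wikitext. This is only a sketch: the selection of languages, the plain [[...]] links, and the exact bullet markers are assumptions made for the example, not an agreed format.

```wikitext
<!-- (1) alphabetical, no nesting -->
* French: [[un]]
* Middle French: [[un]]
* Old French: [[un]]
* Spanish: [[uno]]

<!-- (2) as in Translations: older stages bulleted under the modern language -->
* French: [[un]]
*: Middle French: [[un]]
*: Old French: [[un]]
* Spanish: [[uno]]

<!-- (3) as in proto-entries: each language nested under its ancestor -->
* Old French: [[un]]
** Middle French: [[un]]
*** French: [[un]]
* Spanish: [[uno]]
```

Under (3) the modern form ends up deepest in the list, which is the nesting that would also place Afrikaans under Dutch.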

Jocular or humorous

The usage label {{jocular}} redirects to {{humorous}}, which puts a word into [[Category:English jocular terms]]. We should use one term to represent the one concept in both labels and categories: jocular or humorous.

Either word is acceptable. Although humorous (“funny”) is a more familiar term, jocular (“joking, intended to evoke humour”) seems more precise to me. —Michael Z. 2011-11-08 17:21 z

I think jocular is the correct term for words intended to evoke humor. Calling a word humorous could be taken to imply it's an inherently funny word. Bridezilla is jocular, but not funny; kumquat is funny, but not jocular. Calling either one humorous would be ambiguous. —Angr 14:52, 9 November 2011 (UTC)

I agree w/Angr. By the way — does anyone have any thoughts on how we can confirm that a given term or sense merits this label? I worry that many entries in that category might not really belong there. —Ruakh_TALK 19:20, 9 November 2011 (UTC)Reply

I don't really know, but we do need to exclude as valid attestation instances where the jocularity is sarcasm or irony or is marked by putting the term in quotes. I think almost any term could, in some context, be rendered "jocular".

I think almost any term that is jocular also has a non-jocular sense and usage. I would guess that almost all jocularity depends on some kind of double meaning.

Why is it that this register merits a category whereas most of the vocabulary that has some other kind of context restriction is buried in topical categories that include terms that have no context restriction? DCDuring TALK 22:56, 9 November 2011 (UTC)Reply

This does seem to be a troublesome area. We are likely to continue to get numerous adolescent efforts to add senses that purport to be "jocular". One approach would be to require a vote on each membership that was not supported by a reference to a dictionary that claimed the term had a sense that was jocular. It would be possible to monitor new additions to the category. DCDuring TALK 23:04, 9 November 2011 (UTC)Reply

Not only would be: it is possible.—msh210℠ (talk) 23:12, 9 November 2011 (UTC)Reply

Me, too. (Ruakh's first sentence.) Or, more precisely, not that I agree with Angr (I have no knowledge of the issue, and agree sounds to me like I do), but I think that what he said makes sense and support the consequence.—msh210℠ (talk) 23:12, 9 November 2011 (UTC)Reply

Er, yes, that's actually what I meant, too . . . and now that it occurs to me to check, the OED Online has 125 entries using the label "jocular", and 940 using the label "humorous". I don't know what distinction it's drawing, however, and whatever it is, it doesn't seem to be a very firm one: there's a "humorous" category, containing 1,952 senses (in 1,655 distinct entries), but no "jocular" category, and many of the senses labeled "jocular" are in the "humorous" category. (As are some senses labeled "playfully", "facetious", etc.) Even so, I'm on board with Angr. —Ruakh_TALK 00:34, 10 November 2011 (UTC)Reply

Process Action nouns

Hello all, I started working with the word civilization, and found it to be an "action noun" like "transition." Surprisingly the only material I could find was here -> Princeton::Wordnet::Lexicographer Files. I put some writing on Civilization::Talk saying that my interest is in the parallel evolutions of thought and lexical meaning, and this approach seems especially helpful for giving etymological meaning.--John Bessa 19:32, 8 November 2011 (UTC)Reply

Action nouns often come to refer to the results of the action as well. For example, opening refers to the process of being opened, but an opening is something that has been opened. —CodeCat 19:39, 8 November 2011 (UTC)

Like an artist's wine and cheese party. Let me dig a little deeper into where I am going with this (stuff). I am finishing a master's in psych, and I hope to move into the field of "community counselling" (if it exists). Key to current personality understanding are the development models such as the five factor model (FFM) that are built directly from the dictionary using factor analysis; in other words, the outward "face" of the mind is best described lexically. (The problem with psych is that it describes both mental function and mental illness, often at the same time; a professor of mine actually successfully correlated lexical personality models with personality disorders (which tend to be serious), when, in fact, they have nothing to do with each other (which CG Jung stressed with the first personality model that is now Myers-Briggs).)

I have been attempting to add the evolutionary model to this. The time scales for brain evolution and language development should differ by factors, but the development of thought itself (which includes the "symbolization" of emotion such as in poetry and "modern" analytic disciplines) parallels language development, and hence psychology (from the FFM). So I actually wonder if the mind, specifically the brain, has been evolving significantly over the past 100K to 10K years. To add to this, I think a collaborative wiki is the only way to pull these ideas together because even the kindest of university minds seem to be set in concrete (which, in and of itself, is a personality disorder). This is why the above Princeton lexical link seems useful to me: to actually pin down the initial meanings of words in the context of their original use to link them to related psych development. Going a little further with the FFM, the development, or evolution, should be universal across language groups, and, hence, peoples. This is important to Science in that it is not biased and correctly describes phenomena with a single answer.

To reduce this to the topic of civilization, it seems to boil down to the difference between civic and social, where social describes artistic achievements, and I see civic nearly purely in terms of law and rules (and perhaps paranoia). I observe these differences often, especially in conflict. So civilization and society are parallel, though civilization appears bigger, but may not be--it may only be a virtual structure that we buy into, or are forced to at risk of marginalization--or worse. The proof is provided by it being an action noun. Hope this helps (HTH). --John Bessa 13:57, 9 November 2011 (UTC)Reply

I stand corrected: this is a (related) process noun according to the Wordnet categorization, which seems like a useful idea. I suspect that this Wordnet system might be the input to the personality NEO-IPIP, which is a new FFM. (Both programs are written in Perl.) From doing "evolutional" analysis (Carl Rogers's way of saying evolutionary), the best and widest root meaning seemed to be city, as in civic center, which is invariably a sports arena. Cities are invariably financial capitals, which brings in the idea of civilization as the process of developing capital, which means centralizing remote resources where the first step is driving off the natives. The "antecedent" (as it were) is barbarianism, where the process is civilizing, presumably to make things better. This brings to mind my reading about Jung, who was liberal yet highly racist: he referred to aboriginal bushmen as having "monkey love" for their children, which seems to me to parallel the processing of humanity to fit some prescribed social structure. Obviously this attitude of Jung's would be unacceptable today, and he would probably be a different, more enlightened person today, which tells me that the entire lexicon needs some processing, especially if the material is going to be plugged into "whole systems models" such as the NEO-IPIP. I will have to ask NEO-IPIP's Dr. Johnson. --John Bessa 13:45, 10 November 2011 (UTC)

Wikimedia Foundation "Answers"

Hi. :) Please forgive me if this is not the best place. I just wanted to let you all know that the Wikimedia Foundation is testing a potential new communication system intended to provide a central address to which community members who need assistance from the Wikimedia Foundation or who have questions about the Foundation or its activities can reach out and find answers. This system is being unrolled on a trial basis to test its efficiency and usefulness to communities.

What happens to your question will depend on what type of question it is. Many questions are general interest, and answers to these are being posted to wmf:Answers. Generally, at least to begin with, I will be writing these answers myself, although staff members have assisted with some questions already and I don't doubt will assist with more. Some issues will not be general interest, but may require attention from specific staff members or contractors. These will be forwarded to the appropriate parties. Questions that should be answered by community may be forwarded to the volunteer response team, unless we can point you to a more appropriate point of contact.

I imagine most of you are familiar with how the Wikimedia Foundation works, but it's probably a good idea for me to note for those who are not familiar that the Wikimedia Foundation does not control content on any of its projects. They can't help with content disputes or unblock requests, and they are not the place to report general bugs or to request features (that would be Wikimedia's Bugzilla). The letters I've answered already have included primarily questions about finances and the Foundation's work. I've been asked to get feedback from staff on diverse subjects ranging from the amount of latitude permitted to a project in drafting their "Exemption Doctrine Policy" to whether or not groups seeking grants need tax exempt status first.

If you have questions for or about the Wikimedia Foundation, you can address them to answers@wikimedia.org. Please review wmf:Answers/Process for specific terms and more information. --Mdennis (WMF) 19:32, 9 November 2011 (UTC)

Geonotice?

If you edit the English Wikipedia and live in an area with an active Wikimedia community, you've probably seen the notices at the top of your watchlist advertising for local meetups and other events. This is one of the best tools we have for getting word out about events to editors who don't watch all the project pages, but we should be reaching all editors, not just those on Wikipedia. At one point, I created MediaWiki:Geonotice.js and ran a message here for a meetup, but the page was never maintained. However, I noticed recently that Wikisource is importing geonotices directly from the Wikipedia geonotice (see s:MediaWiki:Common.js/watchlist.js), and this seems to be relatively successful since that's where events are all advertised. Can we do this here? Dominic·t 01:18, 10 November 2011 (UTC)Reply

If we do so, most (it seems) of the notices that appear will be WP-specific (viz, for WP meetups). Perhaps we can convince WP to adjust its JS so WP-specific notices get a certain variable set that other sites can use to decide whether to display the notice?—msh210℠ (talk) 03:00, 10 November 2011 (UTC)Reply

It's not mostly Wikipedia stuff, as meetups are for the whole community. I don't think we should suggest others would be unwelcome; these are mostly social gatherings. In fact, meetups, and most Wikipedians, would benefit from the perspective of more Wiktionarians. Last week's meetup here in DC attracted a Wikisourceror for her first meetup due to the geonotice, and that's a Good Thing. Dominic·t 04:19, 10 November 2011 (UTC)

Pardon me. I saw the title "Wikipedia meetup" (on most of them: some are "Wikimedia"), and assumed they were, well, Wikipedia meetups. (Why are they called that? And why is their info on enWP, not meta? Anyway...) If you're correct (as I trust you are) that the seemingly WP-specific announcements aren't in fact WP-specific, then IMO, yes, import the notices.—msh210℠ (talk) 04:26, 10 November 2011 (UTC)

diminutive, hypocorism and nicks or shortened forms

Using either diminutive or hypocorism, at least in the Greek language, in place of a caressive or everyday form is not correct.

Δημητράκης is a hypocorism, which means "a small Δημήτριος", and as such it can also be used as a caressive, endearing name, but the reverse does not hold: you can't use Μήτσος (another familiar or everyday form of "Δημήτριος") to denote "a small Δημήτριος".

I think this is the same in many languages that have, at least, -k- (phoneme: κορίτσι-κοριτσάκι, dziewczyna-dziewczynka...) as a way to distinguish diminutive forms (Turkish and all Slavic languages also have this).

So having, e.g., "Category:Greek noun diminutive forms" and including alternative or everyday forms in that category misleads the reader. In my opinion, creation of a template (and derivative categories) is indispensable. --Xoristzatziki 08:19, 10 November 2011 (UTC)

Desysopping Dick Laurent aka Opiaterein

If you think Dick Laurent (talk • contribs) should be desysopped, as I do, please say so. His long-term verbal behavior on-wiki is unworthy of an admin, IMHO. Furthermore, he has repeatedly blocked people in a bullyish manner, for offenses unworthy of blocking. If more people want to see him desysopped, I would collect a list of his actions unworthy of adminhood. Most recently, Dick Laurent blocked Pilcrow for addressing him with "[...] it would be a good idea to not write as a knuckle‐dragging, dysfunctional child like you always write", which, while not the most polite way of talking to people, cannot be grounds for a block by a person who constantly swears and insults other editors. --Dan Polansky 09:26, 11 November 2011 (UTC)

He’s a bit of a handful and a little eccentric, and he doesn’t always play well with others, but he is an invaluable editor. We all know his talents here and I’m pretty certain that he has broad support, so desysopping is not likely to succeed, but will only cause bad feelings. It would be nice if he would learn to tell those he doesn’t like to just buggah off rather than blocking them, and I think an occasional reminder from the rest of us not to use his blocking power as an arguing tool would be a much better idea than getting up a petition to desysop.

It is probably a good idea at this time for someone to write a policy page that deals with blocking, and that should take care of it. Blocking ought to be reserved for vandalism or aggravated harassment...other blocking actions should be brought before the community for a vote. —Stephen ^(Talk) 09:47, 11 November 2011 (UTC)Reply

Thank you for your input. We already have a blocking policy, one that has been voted on: WT:BLOCK. It is very short and says this:

"The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary.
"It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed."

By my estimate, Dick Laurent "does not give a fuck about this policy, lol", or the like.

I do acknowledge that Dick Laurent is a prolific and valuable contributor. I just doubt that he is a valuable admin. --Dan Polansky 10:10, 11 November 2011 (UTC)Reply

I do tend to agree a little. He has been using his administrative powers for personal disputes on occasion, and his language especially on his talk page is less than... civil. —CodeCat 11:12, 11 November 2011 (UTC)

I think a good place to start is with a solid definition of what we think a sysop should be. Historically this community has defined a sysop as "any good editor who has been around for a while", and Opi/Dick certainly meets those criteria. There is a prevailing opinion held by most Wikimedians who are not Wiktionarians that this project is very hostile to newcomers and non-regulars, and I think that may begin with our staff of sysops. We have lost a lot of our polite admins and now have almost exclusively admins who are great at getting the work done but less tactful when dealing with new users, problem users and outsiders. Maybe there should be a new role, or maybe we should leverage the 'crat role to be more fitting of its title, I don't know, but as it is people look at the sysops as the face of this project and the face is not a particularly gracious one. - [The]DaveRoss 11:19, 11 November 2011 (UTC)Reply

I'd prefer not to lose the good work he does reverting/deleting vandalism, but I'd definitely like him to be a bit more civil... perhaps 'professional' is the right word. --Mglovesfun (talk) 11:21, 11 November 2011 (UTC)Reply

'Zis really amusing, given the person suggesting this. Sometimes, you really aren't much better. -- Liliana • 12:31, 11 November 2011 (UTC)Reply

I find it quite offensive, actually, to liken my behavior to that of Dick Laurent. I have no blocking tools, and I do not use profanity. I would like to see which edits of mine "aren't much better". And, of course, I am not an admin, so even if my behavior were as poor as that of Dick Laurent, it would be perfectly consistent for me to propose his desysopping. --Dan Polansky 12:43, 11 November 2011 (UTC)

Now I remember. Weren't you the guy who proposed me for desysopping before? That shows a lot about you, I think. -- Liliana • 16:45, 11 November 2011 (UTC)Reply

Liliana, evidence please, in the form of a diff or a link of a wiki page: I don't recall having proposed your desysopping. Furthermore, I do not recall any behavior of yours that would call for desysopping. So again, please refer me to evidence, to help my memory, or yours. --Dan Polansky 19:08, 11 November 2011 (UTC)Reply

Ah, it was Ivan. Sorry for the confusion. -- Liliana • 19:23, 11 November 2011 (UTC)Reply

So again, which diffs show that I act no better than Dick Laurent? (Again, I have no block tools.) Or are you confusing me with other editors, overall? --Dan Polansky 19:36, 11 November 2011 (UTC)Reply

So, this exchange doesn't seem productive in the least, as the subject of this discussion is neither Dan nor Liliana's behavior. - [The]DaveRoss 20:31, 11 November 2011 (UTC)Reply

No, I don't think he should be desysopped. But he should be advised against blocking those with whom he has an argument, as all admins should. Even if the block is legit, if it's personal, ask another admin to make the call objectively. DAVilla 04:12, 13 November 2011 (UTC)Reply

The block on Pilcrow was really bad, because that argument was purely personal. A bit more "professionalism" (I know, I know, visions of boring suited executives) from Dick wouldn't hurt. On balance I don't think deopping him is a very good idea. Equinox ◑ 12:51, 11 November 2011 (UTC)Reply

Can we remonstrate with him? (And by "we" I don't mean "me": I don't think he cares two figs what I think about anything. But surely there must be someone whose opinion he respects?) If not, then — yes, I think we should seriously consider replacing his "administrator" flag with "autopatroller", "patroller", and "rollbacker", just so he can't block people. The biggest downside I see is that people might be overly tempted to block him; we'll have to restrain ourselves. (And by "we" I mean "me".) —Ruakh_TALK 14:25, 11 November 2011 (UTC)Reply

I, of all people, might actually be the best user to step up and remonstrate with him. We get on very well and as a former administrator myself I can see things from a different side (being an admin definitely makes a user less productive than being a non-admin, for a start). Anyway, I think everyone here is in agreement, including himself of course, that he can "be a total dick". We'll see what he thinks, at least. And if he is desysopped, at least let him delete the main page first. --Rockpilot 17:12, 11 November 2011 (UTC)

You can't desysop Opio, he's the only gay admin and is therefore protected by affirmative action. Likewise, you can't desysop me for making racist jokes: I'm the only admin from this region (Caucasus, former USSR, Middle East). That's right, bitches, I just played the race card. --Vahag 20:52, 11 November 2011 (UTC)Reply

I realize you're joking, but — as it happens, he is not the only gay admin. —Ruakh_TALK 21:02, 11 November 2011 (UTC)Reply

*Cough*. Not the only gay admin. ---> Tooironic 11:00, 12 November 2011 (UTC)Reply

Yeah, it says something interesting about Vahag that he just automatically assumes that a group of dozens of people must be 100% straight unless he's been notified otherwise. ;-) @Vahag: Is there a form I need to submit somewhere, or is this comment-thread sufficient notification for you? —Ruakh_TALK 13:58, 12 November 2011 (UTC)Reply

I give people the benefit of the doubt. Unless proven otherwise I assume every Wiktionarian is a rich white straight male. But seriously, if you two are gay, we're exceeding the gay admin quota by one. I like Opio, I'm afraid of Ruakh, which means we should desysop Tooironic. But will we meet the Asian quota then? --Vahag 19:53, 12 November 2011 (UTC)Reply

Um, how is that giving the benefit of the doubt? Are you joking, or do you really believe it's better to be white? DAVilla 03:50, 13 November 2011 (UTC)Reply

I'm joking. Because I'm black, I can make racist jokes with impunity. --Vahag 12:12, 13 November 2011 (UTC)Reply

You're Caucasian, so are we. -- Liliana • 14:17, 12 November 2011 (UTC)Reply

Ooh, are we building a sexual census? I'm straight. That elevates the number of expressly known heterosexuals in Wiktionary to 1, I guess. --Daniel 14:26, 12 November 2011 (UTC)Reply

Guys, relax. We've got short-fused and long-fused types, straights and gays, whites and blacks, young and old. Dick's harmless, keep him sweet. And don't nobody be setting up no vote to get them Opi de-sysopp'd. If you do, minus 6000 Bastard Points. --Rockpilot 05:01, 13 November 2011 (UTC)Reply

Blocking Pilcrow and admitting it was hypocritical doesn't seem very harmless. I've nothing against Dick but I have in favour of Pilcrow (I like those medieval looking entries). Ungoliant MMDCCLXIV 12:49, 13 November 2011 (UTC)Reply

How about we just make Dan Polansky an admin? Then, he can just unblock himself next time and keep working. --EncycloPetey 18:35, 13 November 2011 (UTC)Reply

I don't want to see Ric (Dick Laurent) desysopped. I haven't always followed who he has blocked lately but knowing he is a great editor with good language skills makes me think I can trust his judgement. An admin is an editor with experience. If this was a personal clash, then we should invite him. If there is an issue about a particular block we can discuss here case by case but I see that not many people stood up to a few trolls he blocked. I don't support his rudeness, though. Yes, some revision of the blocking procedure can be in order. --Anatoli 21:27, 13 November 2011 (UTC)Reply

"An admin is an editor with experience." That is only part of the story. We use that criterion here, and amongst the regular community we understand that everyone who has been around for a while will end up becoming an admin. Outside of en.wikt that is not always, or even often, true. Outside folks expect the admins to be the go-to people for help, they expect them to be the ambassadors for the project, they expect them to be somewhat professional. This is where our "anyone who sticks around for a bit" criterion falls down: while it is nice to have lots of people with the ability to block and delete, it also hurts us when someone who seems to be acknowledged by the community as a representative goes about harassing people and generally behaving in a manner unbecoming to someone in an ambassador role. - [The]DaveRoss 21:47, 13 November 2011 (UTC)

No one's perfect. I'm against harassing people. The actual blocks may be justified. Let's see what Ric has to say. I have invited him to join the discussion. --Anatoli 21:53, 13 November 2011 (UTC)

This user constantly harasses me, although he has toned it down from calling me a "dumb cunt" and "wild bitch" directly to calling me a "sloppy ho" in edit summaries. I have repeatedly told him I don't appreciate his aggressive series of insults, but he usually just blocks me. Wiktionary:BLOCK clearly states "Causing our editors distress by directly insulting them or by being continually impolite towards them." Based on that, and on his arbitrary and inconsistent use of his admin powers, he should simply be a regular editor, and should be blocked as well, because he regularly insults everyone around him, it seems: sometimes in jest with friends, but very often in a willfully offensive manner. He may be a good editor when you look only at his work, but that is worthless if he is a rude hate monger here, and a new and better editor will eventually come along. Lucifer 22:45, 13 November 2011 (UTC)

Sandhi phenomena

I'm wondering how we can cover sandhi on Wiktionary. I've been working with Gothic and I realised that there are quite a few cases where word-final -h assimilates to the first letter of the next word. This means that a word such as jah theoretically has many different forms depending on the word following it. In Gothic, word-final -a is also sometimes dropped if the following word begins with a vowel. Some particles can also appear between a prefix and the main stem, as in ga-u-laubeis (from the verb galaubjan and the particle -u). In modern publications of Gothic, these are indicated by separating the words with a hyphen, to show that they are to be read as several morphemes but pronounced as a single unit. But in the original manuscripts, word boundaries were not marked as far as I know, so it's not easy to decide whether a form like jaþ-þan is actually one word or two, and whether ga-u-laubeis is one word or two, or even three. Should jaþ have its own entry, and all other forms such as jan, jas and so on as well? Or should there be an entry for jaþ-þan instead (and then why not jaþþan too)? And to make it all even more difficult, Gothic has a suffix -uh which functions similarly to Latin -que and can be attached to any word at all. And of course, this suffix can assimilate as well, creating forms such as munaidedunuþ-þan (from the verb form munaidedun + -uh + þan). I'm not sure if this assimilation occurs with every combination of words, but there is certainly a potential for many combinations. What would be the best way to represent this on Wiktionary? —CodeCat 22:05, 12 November 2011 (UTC)
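To make the scale of the problem concrete, here is a minimal Python sketch of the assimilation described above. It is only an illustration: the function names and the set of triggering initials are hypothetical, and the set is limited to the letters implied by the forms cited in this comment (jaþ, jan, jas). It does not claim to describe Gothic phonology in full.

```python
# Hypothetical sketch of word-final -h assimilation as described in the comment.
# ASSUMPTION: only the initials implied by the cited forms (jaþ, jan, jas)
# trigger assimilation; the real set of triggers is not stated in the thread.
ASSIMILATING_INITIALS = {"þ", "n", "s"}


def assimilate_final_h(word: str, following: str) -> str:
    """Return `word` with a final -h replaced by the first letter of `following`."""
    if word.endswith("h") and following[:1] in ASSIMILATING_INITIALS:
        return word[:-1] + following[0]
    return word


def join_words(word: str, following: str) -> str:
    """Join with a hyphen when assimilation applied, otherwise with a space."""
    assimilated = assimilate_final_h(word, following)
    if assimilated != word:
        return f"{assimilated}-{following}"
    return f"{word} {following}"


print(join_words("jah", "þan"))                  # jaþ-þan
print(join_words("munaidedun" + "uh", "þan"))    # munaidedunuþ-þan
print(join_words("jah", "ist"))                  # jah ist (no assimilation)
```

Even with only three triggering initials, every word ending in -h or carrying -uh gains three extra spellings, which is the multiplication of potential entries the question is about.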

Good question. In many languages, sandhi phenomena appear only in spoken language rather than in orthography, but when they do appear in written language and are a systematic process (English a/an variation is also a sandhi phenomenon but that's not much of a problem for us), that's obviously a problem for the "all words in all languages" account of Wiktionary (as are polysynthetic languages where one single word can represent a whole sentence). Longtrend 11:48, 13 November 2011 (UTC)

For the sake of cross-language consistency, just ignore sandhi variants and assume that the readers are already familiar with such elementary phonetic transformations and corresponding orthographic conventions. OTOH Gothic has a limited attested corpus and listing all of the variants shouldn't entail too much effort. This is primarily a dictionary, not a lemmatization/sandhi removal engine. OTOH we already have millions of contentless soft-redirects for inflected forms, bad spellings and other stuff that should be resolved at the level of database search and not static pages, so adding sandhi variants would be a mere drop in the ocean of our current "technological limitations mitigation" practices. OTOH Gothic is a sufficiently obscure language that its two and a half Wiktionary users shouldn't have many problems utilizing whatever approach. --Ivan Štambuk 13:30, 13 November 2011 (UTC)

I agree, surprisingly, with all of Ivan's hands. (I wonder if Vahag's affirmative action policies have a quota for four-handed folk?) —Ruakh_TALK 16:14, 13 November 2011 (UTC)Reply

fitneß, kindneß, etc.

Should we in fact have these? We agreed not to include the "long-s" (ſ) forms of words, because they are purely typographical variants and not distinct words, right? Equinox ◑ 17:52, 13 November 2011 (UTC)Reply

it doesn't even look like an ß to me, more like a ſs. -- Liliana • 17:57, 13 November 2011 (UTC)Reply

It's an artifact of the italics. --EncycloPetey 19:20, 13 November 2011 (UTC)Reply

We have entries for German forms with ß and ss (Swiss orthography). Why not also for English? I think it does look like a ß. Ungoliant MMDCCLXIV 18:53, 13 November 2011 (UTC)Reply

It may look like <ß>, but it definitely is <ſs>. The answer to "why not also for English?" is simply that forms with <ß> do not seem to occur in English. —Ruakh_TALK 19:19, 13 November 2011 (UTC)Reply

Unicode mentions that ß is "in origin a ligature of 017F ſ and 0073 s" (http://www.unicode.org/charts/PDF/U0080.pdf). Ungoliant MMDCCLXIV 19:41, 13 November 2011 (UTC)

I'm well aware of that — but <j> is in origin a variant of <i>, but that doesn't mean the spellings <fjtness> and <kjndness> occur in English. A ligature of <ſs> became a single character in German, but it failed to do so in English. —Ruakh_TALK 20:08, 13 November 2011 (UTC)Reply

But if <fjtness> was attested, wouldn't it be accepted? And just because it stopped being used it doesn't necessarily mean it wasn't ever a letter. It also stopped being used in Swiss German, but it's found in older Swiss texts. Ungoliant MMDCCLXIV 20:58, 13 November 2011 (UTC)Reply

I'm sorry, I feel like we're talking in circles. I stated that <ß> had not been used in English, and you replied that <ß> originated as <ſs>. I took that to mean that you felt that forms with <ß> should be included even if the only attested forms had <ſs>, so I pointed out that the same logic would mean including forms with <j> even when the only attested forms have <i>. You then replied that we would include forms with <j> if they were attested . . . which brings us back where we started: these forms are not attested. I assume that I must have misunderstood one or more of your comments? —Ruakh_TALK 22:06, 13 November 2011 (UTC)Reply

German-speakers consider ß to be a separate letter (originating as a ligature of tz), and it has a name and conventions of use in that language. This has never been the case in English. --EncycloPetey 19:20, 13 November 2011 (UTC)Reply

What about the capital form? Ungoliant MMDCCLXIV 20:58, 13 November 2011 (UTC)Reply

This is definitely just a ligature in italics because you can see "credibleneſs" and "kindneß" both in here. Therefore, it should be treated the same way that we treat other ligatures, which is to hard-redirect them or else not include them, IIRC. —Internoob 01:16, 14 November 2011 (UTC)Reply

Or, no, what am I saying, it's not even a ligature, it just looks like one. 24.207.85.160 01:18, 14 November 2011 (UTC)Reply
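For reference, the difference between <ß> and <ſs> discussed in this thread can be checked at the code-point level. The following Python sketch uses only the standard `unicodedata` module; the example word "kindne" plus suffix is taken from the thread title, and the variable names are just illustrative.

```python
import unicodedata

eszett = "\u00df"            # ß, a single character
long_s_s = "\u017f\u0073"    # ſ followed by s, two characters
capital_eszett = "\u1e9e"    # ẞ, the capital form asked about above

# The two spellings are different code-point sequences.
print(eszett == long_s_s)                          # False
print([unicodedata.name(c) for c in long_s_s])

# Compatibility normalization (NFKC) folds long s to s but leaves ß alone.
print(unicodedata.normalize("NFKC", "kindne" + long_s_s))   # kindness
print(unicodedata.normalize("NFKC", "kindne" + eszett))     # kindneß

# Case mapping treats ß as a letter in its own right.
print(eszett.upper())            # SS
print(capital_eszett.lower())    # ß
```

This matches the point made above: Unicode treats ſ as a compatibility variant of s, whereas ß is encoded as a distinct letter, so a search or redirect built on normalization would catch "kindneſs" but not "kindneß".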

water

Hereby I announce that the page water is now over 100,000 bytes, or more than 100 KB large, containing over 1,500 translations. (Of course, that means we aren't at 100 KiB yet, but we'll get there eventually.) That makes it by far the largest page we have.

Of course, with that come a great many problems, as most of us have probably noticed the insane slowness of the page, showing that the system we use isn't quite fit for larger pages (and we already use hacks for this particular page, namely {{t-simple}}). Therefore, I guess it serves as a perfect test case for any eventual optimizations. -- Liliana • 19:54, 13 November 2011 (UTC)
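The KB versus KiB remark in the announcement comes down to simple arithmetic. A small Python sketch, assuming a page size of exactly 100,000 bytes (the thread only says "over 100,000"):

```python
# ASSUMPTION: the page is taken to be exactly 100,000 bytes for illustration.
page_bytes = 100_000

KB = 1000    # kilobyte, decimal (SI) definition
KIB = 1024   # kibibyte, binary (IEC) definition

size_kb = page_bytes / KB                  # size in decimal kilobytes
size_kib = page_bytes / KIB                # size in binary kibibytes
bytes_to_100_kib = 100 * KIB - page_bytes  # growth still needed to reach 100 KiB

print(size_kb)            # 100.0
print(size_kib)           # 97.65625
print(bytes_to_100_kib)   # 2400
```

So at the 100 KB mark the page would still need roughly another 2,400 bytes to reach 100 KiB.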

How much of that page is taken up by the English section? —CodeCat 23:29, 13 November 2011 (UTC)

93%. --Yair rand 23:33, 13 November 2011 (UTC)Reply

How about we just move the translations to an appendix? ---> Tooironic 23:38, 13 November 2011 (UTC)Reply

We have so many translations into languages whose names are red linked. It is such a common word, perhaps we could leave about 100 languages and move the rest into an appendix? --Anatoli 23:52, 13 November 2011 (UTC)Reply

Maybe it would be better to re-think the whole approach to translations. Wiktionary is in a sense three dictionaries in one. It is an English dictionary, an English-to-anything dictionary, and an anything-to-English dictionary. The English sections fulfill the first purpose, the translation sections the second, and the non-English sections the third. Consider the possible use cases of someone who wants to look up the foreign translation of an English word. Would they really be interested in all the other languages, or even the English definitions? Most of the time, if I am looking for a word in a foreign language, I'd like to see a simple list of words and their translations, without too much information (which makes it harder to find and slows the page down). Today I came across an English-to-Swedish index. This seems to fulfill the purpose of translation perfectly, far more than the translation sections in entries do. Would it be a good idea to treat translations in this way instead? —CodeCat 00:01, 14 November 2011 (UTC)

That index is quite nice. Personally I hate the amount of redundancy in all these various wikis doing various languages, with no synchronisation. It will be wonderful if/when we can get a standard representation for translations of senses. Equinox ◑ 00:03, 14 November 2011 (UTC)Reply

Perhaps

{{trans-top|clear liquid H₂O}}[[Appendix:Overflow/water/translations/clear liquid H₂O|Due to the large number of translations of this sense, they have been moved to a separate page. Please see there.]]

{{trans-bottom}}.—msh210℠ (talk) 00:34, 14 November 2011 (UTC)Reply

I don't think we're really at the point where it's so large we need to split content off to a separate page (plus that would break targeted translations). The page is what, 50% larger than the average ENWP FA? --Yair rand 00:50, 14 November 2011 (UTC)Reply

Yes, but Wikipedia articles are mostly plain text, whereas this one... well. 1,500 language templates, plus a bunch of etymology and formatting templates, and... you can guess the rest for yourself. -- Liliana • 01:01, 14 November 2011 (UTC)Reply
