Wiktionary:Beer parlour/2008/September
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
September 2008
Pronunciation series - Appendix
For language learners, comparing pronunciation of similar words is always important. For Hungarian, I started Appendix:Hungarian pronunciation pairs. There is a somewhat different draft attempt for English on User:Panda10/English pronunciation series. Would it deserve its own Appendix page? How about the format? I am still searching for a better audio icon that will not overpower the text. --Panda10 17:26, 1 September 2008 (UTC)
- I noticed these and think they're a great idea. --EncycloPetey 02:05, 4 September 2008 (UTC)
Hackers needed for Wiktionary application
I've begun work on a Wiktionary application and I'm at the point where I'd like to ask some other Wiktionary hackers to help.
It's not ready for a beta release yet but anybody who has hacked Wiktionary JavaScript or tried to parse Wiktionary articles might be interested and able to participate.
It's a Firefox extension which means it's programmed in JavaScript.
It has offline support, being able to read local dump files.
It has its own wikt: protocol which provides wikitext and emulates some features of api.php so it is easy to get it working with existing code. Anything that can do an AJAX request can query wikt:
I've also made an interface for one proprietary CD dictionary as a proof of concept that the same interfaces can be made to work with other dictionary datasources. This means when you come across unusual words in your reading you can query Wiktionary and other dictionaries to see if they cover the word from one central place.
I don't have much web page design experience and need some help developing a good interface.
Some of the current code is in proof-of-concept form in Perl as external tools. These need to be converted to JavaScript and added to the Firefox extension.
It will be able to discover, download, and index new dump files for your language Wiktionaries.
It will be able to manage word lists in various languages, add article requests, etc.
Anyway it's early days but leave me a message if you'd like a copy to play with.
Andrew Dunbar (hippietrail)
(This message has also been sent to mailto:wiktionary-l@lists.wikimedia.org)
- Hello, Andrew. As a user, I am very interested in such a Firefox extension. I think that the WikiLook Firefox extension (which parses the English, French and Russian Wiktionaries), developed by w:User:TestPilot, will be interesting to you.
- I am also working on parsing, but in Java. The result will be a publicly available database of the parsed Wiktionary. I will parse the Russian Wiktionary first, then the English Wiktionary.
- My previous work was a parser for Wikipedia (Russian and Simple English), also written in Java. You can read about it in the paper Index wiki database: design and experiments. I am eager to see the results of your work in the future. Good luck! -- AKA MBG 10:33, 3 September 2008 (UTC)
- Hi AKA MBG. Yes I am familiar with WikiLook and TestPilot though I have not chatted to him for a while. WikiLook currently depends on online access to Wiktionary rendered data. One of my goals is a standard interface which can parse either online or offline raw wikitext data into a standard JSON format that applications such as WikiLook can then consume and display in various appropriate formats to end users.
- I'll have a look at your work now thanks. — hippietrail 16:06, 3 September 2008 (UTC)
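The idea of turning raw wikitext into a standard JSON structure can be illustrated with a minimal sketch. This is not hippietrail's code (the extension itself is described as JavaScript); the function name `wikitext_to_sections` and the output layout are assumptions made purely for illustration. It only groups lines under language headings (`==English==`) and their subheadings, which is a small part of what a real parser would need to handle.

```python
import json
import re

# Matches wiki headings such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def wikitext_to_sections(wikitext):
    """Group non-empty lines under language (level-2) headings and,
    within each language, under the nearest deeper heading.

    Hypothetical helper: the output layout is an assumption, not a
    format defined anywhere in the discussion above.
    """
    entry = {}
    language = None
    section = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            if level == 2:
                # A new language section starts here.
                language = title
                section = None
                entry[language] = {}
            elif language is not None:
                section = title
                entry[language].setdefault(section, [])
            continue
        if language is not None and section is not None and line.strip():
            entry[language][section].append(line)
    return entry


sample = "==English==\n===Noun===\n# A domestic animal.\n\n==French==\n===Noun===\n# Un animal.\n"
print(json.dumps(wikitext_to_sections(sample), indent=2))
```

The same function could be fed text from an offline dump or from an online request, which is roughly the separation between data source and consumer that the post describes.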
Using other dictionaries as sources?
Background: I was interested in the etymology of a word, and discovered that neither Wikipedia nor Wiktionary had anything on it. I then referred to the OED and Merriam-Webster, and found the answer I was looking for. I wanted to contribute my new-found knowledge back to the wiki-community, but I couldn't find anything on this website about the best practices on how to use/cite other dictionaries. Can I contribute etymologies derived from other dictionaries to Wiktionary? It's not really something I can derive independently. If I can use them, how do I cite my sources? (Or does Wiktionary not do that?) I found the "copyright" pages, but they don't clearly and unambiguously answer the question of how much I can use other dictionaries in formulating (an improvement to) a Wiktionary entry. So to avoid stepping on anyone's legal/moral toes, I left the Wiktionary entry as-is, instead of improving it. - I'll be honest and say that I'm not going to check back on this page for answers. If you want to answer the questions I raise (for others - at this point I'm mildly disinterested about the whole thing), I suggest putting them in some sort of official centralized location, easily accessible from the main page. -- 21:54, 2 September 2008 (UTC)
- Essentially, if you find information in other dictionaries (when I expand words with WOTD in mind, I always do a quick sanity check in OED online myself), you should try not to copy-paste. Complete reformulation with new examples are ideal. Alternatively, collating information from several dictionaries (general and usage ones, such as MWDEU) to judge the "currentness" or "perceived standardness" of a word or expression is perfectly legitimate. Circeus 20:07, 3 September 2008 (UTC)
- But the question still remains: since we're not *supposed* to be having any un-cited original research here, how should a user cite a source on, for instance, the etymology of a word? Definitions/Usage Notes are crafted from quotations, which should be cited, but for some of the more historical stuff we often go to secondary texts. I don't know of any guidelines off the top of my head, or for that matter even many entries that use cites in their etymologies. --Bequw → ¢ • τ 04:52, 6 September 2008 (UTC)
Other verbs with this conjugation
Would anyone object if I added links like the one I added to Template:es-conj-ar (e-ie) to show other Spanish verbs with the same conjugations? (they would go on the templates here). Nadando 00:03, 4 September 2008 (UTC)
- Looks like a good idea to me. -Atelaes λάλει ἐμοί 00:12, 4 September 2008 (UTC)
- I think that's a great idea, but WhatLinksHere is not really a reader-friendly page, since it sorts entries randomly (technically not random — I think it's by order of page creation — but might as well be), doesn't allow for any sort of editable text content, and contains various links and notes that are meaningful to us (because we understand the page's true purpose) but probably not to a reader who clicks on "Other verbs with this conjugation". If you have the time and are willing to create category pages for these conjugations, and edit the templates to add entries to these categories, I think that would be much better. (Maybe we can have a Category:Spanish verbs by conjugation to hold them all?) —RuakhTALK 02:01, 4 September 2008 (UTC)
- (yes, whatlinkshere is ordered by DB ID, order of page creation) Creating categories would seem to me to be just about the same amount of work? It's just adding the cat right there in the template. And much better. "Spanish verbs with xxx conjugation" or such, and just put that cat in Category:Spanish verbs? Robert Ullmann 13:21, 4 September 2008 (UTC)
- Ok, I was just trying to avoid having too many categories, but I can see that they would be better than what links here pages. Thanks. Nadando 20:34, 4 September 2008 (UTC)
Pronunciation questions
I've been wondering about a few things relating to pronunciation:
- How do we know the pronunciation of words that don't seem to be in current use?
- How many challenges to our pronunciations are registered from ordinary users (non-admins)? (I don't know of any, but I hadn't been paying much attention.)
- What is the appropriate means for challenging pronunciations? (I have tried sticking in rfps with comments. Would rfc be better?)
- Given the apparent relative insulation from the wiki process of pronunciation (compared to definitions, existence of inflected forms, and translations), is there some way to facilitate feedback from the broad population of users?
- Is there any (open-source) software that would take IPA (or any other phonetic alphabet) spelling and synthesize an audio pronunciation? (This would both give us more audio and facilitate verification of the phonetic spellings by a wider population, partially addressing the concern about feedback from users.) DCDuring TALK 02:33, 4 September 2008 (UTC)
- The answer varies. Sometimes the pronunciation can be guessed from the components and usual rules of English pronunciation. Sometimes we find it in poetry, which can tell us the stress and number of syllables, and something of the pronunciation from the rhyme scheme. Some language enthusiasts of the past discussed the pronunciation of words. For example, Erasmus wrote a lengthy discussion on the proper pronunciation of Latin, and compared the state of Latin education in England and on the continent. So, we know quite a bit about Medieval Latin pronunciation. Several Classical authors also wrote about Latin pronunciation, so we know something of Classical Latin pronunciation as well.
- I've seen only one or two every few months.
- RfC will probably get a faster response, and make the problem clearer. RfP might be ignored or removed without being fixed if someone sees that a tagged page already has IPA.
- No ideas on this one.
- Not that I know of, at least not widely commercially available. Part of the problem is that there are fussy little optional marks in addition to the major ones. So, the software needs to know what to read and what to ignore. However, Macs used to come with a software package that could read aloud English written text, and the step to simple IPA from that is a small one. I would recommend against machine-generated audio. After all, what's the difference between Canadian machine-audio and UK machine-audio? It's sad that we don't have more audio contributors, but resorting to machine-generation seems a bad idea for this one. Using it to check IPA seems reasonable, but we'd have to find such software first. --EncycloPetey 02:50, 4 September 2008 (UTC)
Red links by language
Would it be possible to generate a list of all red links (including the ones in trans tables) by language? The FL editor would have an easier access to them. Thanks. --Panda10 13:03, 4 September 2008 (UTC)
- Sure, see Wiktionary:Project - Spanish/missing for an example. But it requires reasonably current XML dumps, and they are now 3 months old. And no sign of anything getting fixed. With the dumps, it is a matter of adding a little information for the language (what its own language heading looks like in its own wikt ;-) and then running a bit of s/w on both the en.wikt dump and the FL.wikt dump; it lists all the words that are redlinked here (including those that exist, but don't have the language section), and words in the FL wikt we don't have. Robert Ullmann 13:15, 4 September 2008 (UTC)
- Would it be too time-consuming to create a list for Hungarian even if the XML dump is not the latest? E.g. Appendix:Hungarian red links to make it available to the Hungarian editors. I am not familiar with the background works of Wiktionary, but there is a comment on User:Robert Ullmann/Mismatched wikisyntax: "from XML dump as of 13 June 2008, checked against live wiki 2 September 2008" - does this mean you can query the current databases? And it is just more complicated than working from an XML dump? --Panda10 16:53, 4 September 2008 (UTC)
- I can easily recheck the live wikt for a reasonable number of entries. The math works like this: between 2 and 5 minutes to read and do non-trivial things with the XML dump(s), plus 2 seconds for each live reference. So mismatch takes about 15 minutes total, rechecking 300 entries. Trying to read all the data for the missing entries analysis would take many days. (think about 20K entries/day in practice) Robert Ullmann 17:23, 4 September 2008 (UTC)
- Thanks for the information. --Panda10 19:02, 4 September 2008 (UTC)
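The timing estimate in this thread can be worked through explicitly. The sketch below uses only the figures quoted above (2 to 5 minutes for the dump pass, 2 seconds per live lookup); the function name `recheck_minutes` is made up for illustration.

```python
def recheck_minutes(live_lookups, dump_minutes, seconds_per_lookup=2):
    """Total minutes for one run: a pass over the XML dump plus one
    live request per entry that has to be rechecked."""
    return dump_minutes + live_lookups * seconds_per_lookup / 60


# Rechecking 300 entries, with the dump pass at either end of its range.
fastest = recheck_minutes(300, dump_minutes=2)   # 2 + 600/60 = 12.0
slowest = recheck_minutes(300, dump_minutes=5)   # 5 + 600/60 = 15.0

# Upper bound on lookups per day at exactly 2 seconds each, with no pauses.
ceiling_per_day = 24 * 60 * 60 // 2              # 43200

print(fastest, slowest, ceiling_per_day)
```

So the "about 15 minutes" figure corresponds to the slow end of the dump pass, and the practical rate of 20K entries per day quoted above is a little under half of the theoretical ceiling of 43,200.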
Traditional shop names
We have previously discussed the includability of possessives, and they got the thumbs down. However, traditionally in the UK (and probably in the Commonwealth) many types of shop are known by such possessives e.g. baker's, butcher's, fishmonger's and so on. I would like to include these, for no other reason than to provide meaningful targets for foreign translations e.g. macelleria => butcher's. They would be restricted to shops of the traditional, old-fashioned trades. Any objections? SemperBlotto 16:33, 4 September 2008 (UTC)
- Please clarify the difference, in the UK, between "I'm running down to the butcher's for some chops" and "I'm running down to my mother's for some chops" or "I'm running down to Bob's for some chops" or "I'm running down to the deacon's for some chops". Is it that the former is thought of by its speakers as less of a possessive (and more of a plain old noun) than the latter two are? Or merely that the former provides a good translation of various foreign words? Or something else? If the former, then I, for one (and FWIW), would have no objection.—msh210℠ 18:08, 4 September 2008 (UTC)
- We certainly use the words as nouns (invariant) (we can put "a" or "the" etc in front of them) - "There are now no traditional butcher's in the High Street." "I had to queue for ten minutes at the fishmonger's." SemperBlotto 19:05, 4 September 2008 (UTC)
D(h)ivehi
Hi,
until now, there was no unified spelling for the language on Wiktionary, and the two spellings (with and without h) were co-existing. Today, User:EncycloPetey changed all occurrences of "Dhivehi" to the spelling with no h, deprecated the existing category system at Category:Dhivehi language and created a new category Category:Divehi language. I think this is something that needs discussion, as it is a major step and decides the future spelling for all D(h)ivehi words which might be added to Wiktionary in the future. -- Prince Kassad 21:17, 6 September 2008 (UTC)
- We had only one such word. The existing request categories all used the D- spelling, while the language category used the DH- spelling. Although both spellings are valid in English, both appear on Wikipedia, and both appear in Ethnologue, we need consistency for the sake of our bots. The D- spelling was selected because (1) {{dv}} uses D-, not DH- (and the 3-letter code is div), and (2) the previous DH- language category was created by me, so I felt justified in deleting it in favor of the other spelling. Again, this affected only one entry on the entire project. --EncycloPetey 21:30, 6 September 2008 (UTC)
- On my computer, I can see only funny little boxes that all look the same. So, whatever you were saying in the second sentence is lost on me. --EncycloPetey 22:50, 6 September 2008 (UTC)
- He's saying that the (native?) name of the language begins with an angled-rho-looking letter that's usually transcribed as "dh", rather than an epsilon-like letter that's usually transcribed as "d". But, I don't know anything about the language, so can't say how relevant that is. With most languages we use the established English name — French, Spanish, Hebrew, etc. — rather than a transcription of the native name, but if there's basically a tie between multiple English names, it might make sense to give preference to the more native-like one. —RuakhTALK 23:57, 6 September 2008 (UTC)
- The "English" name is "Maldivian" ... Robert Ullmann 13:03, 9 September 2008 (UTC)
- Another thing is that for the past 30 years, there has been a strong tradition of writing the language in Roman letters, originally for the purpose of telex communication among the country’s 1200 islands, so that the Maldives have a semi-official Roman transcription system, and in that system, the language is spelt Dhivehi. The spelling without the h is an older spelling developed by a German linguist. —Stephen 00:29, 7 September 2008 (UTC)
- Meanwhile, the {{div}} template says "Dhivehi". There are (13 June) 17 translations to "Dhivehi" and 6 to "Divehi". Seems to me that if we are to use one or the other it should be Dhivehi as "Divehi" is not the correct initial consonant. (What, you don't have a Thaana font? ;-) Robert Ullmann 13:22, 9 September 2008 (UTC)
Proposing new category
Here's my idea: [[Category:English isograms]]. An isogram is a word with no letters repeated. It'd be interesting to find a few. As a method of regulation (to prevent the category from overflowing), words with fewer than seven letters would be excluded. Teh Rote 00:26, 7 September 2008 (UTC)
- This might be better tested as an Appendix, rather than tagging the articles, in case the idea isn't supported. --EncycloPetey 00:32, 7 September 2008 (UTC)
- I'll get started on that, then. Teh Rote 00:33, 7 September 2008 (UTC)
- I think you might be underestimating how many words that is. Of the 226,246 words in SOWPODS that are at least seven letters long, there are 26,424 with no repeated letters. Even among its 76,920 words of at least eleven letters, there are 815 with no repeated letters. —RuakhTALK 18:46, 7 September 2008 (UTC)
- GAH! Are you serious? Uh-oh. I suppose the appendix should be cut down to 11+ letters then (that seems feasible). I mainly just wanted to include 7-letter words because isogram itself is one of them :D. I'll shorten the appendix, then. Teh Rote 20:49, 7 September 2008 (UTC)
- And just so everyone has the link, it's at Appendix:Isograms. Feel free to add isograms from other languages (of eleven letters or more) or some more from English; I already imported all the ones from w:Isogram. Teh Rote 14:24, 9 September 2008 (UTC)
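For anyone wanting to reproduce counts like the ones quoted from SOWPODS, the isogram test and the length cut-off are easy to sketch. The function names and the tiny word list below are illustrative assumptions; this is not the script used to produce the figures above.

```python
def letters_of(word):
    """Lower-cased alphabetic characters of a word (hyphens etc. dropped)."""
    return [ch for ch in word.lower() if ch.isalpha()]


def is_isogram(word):
    """True if no letter occurs more than once in the word."""
    letters = letters_of(word)
    return len(letters) == len(set(letters))


def long_isograms(words, min_length=11):
    """Isograms with at least min_length letters, in input order."""
    return [
        w for w in words
        if len(letters_of(w)) >= min_length and is_isogram(w)
    ]


words = ["isogram", "dictionary", "documentarily", "uncopyrightable"]
print(long_isograms(words))                 # eleven letters or more
print(long_isograms(words, min_length=7))   # the original seven-letter cut-off
```

Lowering `min_length` from 11 to 7 is what lets "isogram" itself back in, at the cost of the much longer list described above.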
It seems from what I am told, a while back someone deleted the Category:Video games and moved everything to Category:Electronic games. So I want to move it back to Category:Video games. It's no real work for me to do it, but I was wondering if there is anything I need to know/do so I don't step on any toes? CyberSkull 09:33, 7 September 2008 (UTC)
- I forgot, part of the reason I bring this up is for the category to match what was decided on Wikipedia. CyberSkull 09:37, 7 September 2008 (UTC)
If no one objects, I will proceed with the changeover later today. CyberSkull 13:50, 8 September 2008 (UTC)
new meanings and translations
Why is it that, when someone adds a new meaning, all the translations get moved to the "to be checked" place? If a translation fits three definitions and you add a fourth one, perhaps it doesn't correspond to the new one, but please, leave it on the three previous ones! I'm fed up with checking translations, coming back, and seeing that someone has just moved them down again just because he or she added a new sense! Let's be practical --217.125.233.255 18:51, 7 September 2008 (UTC)
- If I'm understanding, it sounds like you might not have been doing it right. We have separate translations tables for each sense; so when you check a translation, you need to put it in the table for each sense that it applies to. If those tables don't exist, you need to create them. I thought that {{checktrans}} made that pretty clear. :-/ —RuakhTALK 21:14, 7 September 2008 (UTC)
- He/she is objecting to dumping stuff (back) into checktrans when a new sense is added. The objection is valid, when a new sense is added, a new, empty, translation table should be added for the sense. The existing translations do not need to be checked again, they are already in glossed tables for specific senses. Robert Ullmann 17:11, 8 September 2008 (UTC)
- I see, the translation was "checked" for all three senses, and put in one un-glossed table? No, doesn't work; whether another sense is added or not, it has to be dumped back into checktrans when the proper tables are created. Robert Ullmann 17:28, 8 September 2008 (UTC)
Shortcut for Wiktionary
Has there been any effort to get for Wiktionary a shorter shortcut than the current "Wikt:"? Other Wikimedia projects have one-letter shortcuts. Wiktionary could have "T:" for "WikTionary" or "D:" for the hidden "dictionary" or for the implied "define". One letter would be convenient, as I could type "D:word" in a Wikipedia search box in Firefox and get to Wiktionary; referring to Wiktionary in other projects using Wiki markup would be simplified too. --Dan Polansky 19:21, 7 September 2008 (UTC)
- Agreed, that would be quite useful Scott Ritchie 20:36, 10 September 2008 (UTC)
- Agreed, I've brought this up on meta where such a global change needs to be discussed - please feel free to join in the conversation at meta:Metapub#The_d:_prefix_should_go_to_our_dictionary. Conrad.Irwin 17:53, 11 September 2008 (UTC)
- They've moved the conversation to the more relevant meta:Talk:Interwiki_map#Wiktionary. Conrad.Irwin 15:28, 13 September 2008 (UTC)
replacing "old" ety templates with Template:etyl
(hey, I can edit this page! WT:GP is not fully displayable for me, and not editable ...)
Atelaes is very eager to replace the "old" ety templates, like {{F.}} and {{AGr.}}, with {{etyl}} for languages with ISO codes that we use as L2 headers.
See User talk:AutoFormat#Etymon templates, User:Robert Ullmann/t18, User:AutoFormat/Ety temps.
An example of what it would do is this edit, (done manually). Do people think this is a good idea? Is it wanted? (To me, it makes the wikitext that much less readable, but that is me.) The older templates that correspond to ISO codes for languages we use at L2 would be deprecated and eventually deleted. Robert Ullmann 17:22, 8 September 2008 (UTC)
- I don't care whether we centralize the templates into one big {{etyl}} or not, but the existing names are crazy. "Starts with a capital, ends with a dot" doesn't suggest "is an etymology template" to me, and with the exception of {{Brythonic.}}, {{Mayan.}}, {{Malay.}}, etc., the abbreviations aren't very good mnemonics. (Granted, the regular language codes aren't very good mnemonics, either, but people are used to them, and they're easy to look up.) —RuakhTALK 17:54, 8 September 2008 (UTC)
- The existing templates are easy to look up, too: we have a list. Do we even have a list of all ISO codes that we use? (Bear in mind that some languages have 2- and also 3-letter ISO codes, and we only use one or the other; same for old and new ISO codes.) And it doesn't much matter if you're used to the ISO code for a certain language (say, because you add translations to it, or entries in it), as you won't then typically be adding etymologies from it. (For example, I add Hebrew words, but not Hebrew etymologies of English words; likewise, I add English words, but not English etymologies of Italian words.)—msh210℠ 22:33, 9 September 2008 (UTC)
- O.K., but at google:"language codes", nine of the first ten hits are for the language codes we use, one is for the three-letter language codes, and none are for our etymology templates. Similarly, French and w:French language both give the two- and three-letter language codes, while neither gives the etymology template. So, for "easy to look up", read "easier to look up than anything we could make up". :-) —RuakhTALK 23:10, 9 September 2008 (UTC)
- They're not language codes, they are abbreviated names of languages. And they are found in language texts, so they don't show up near the top in Google, although a quick search finds one.[1]
- However, I am happy using the ISO codes. They have the advantage of being universal throughout all of the multilingual Wiktionaries.
Appendix and "only in" for misconstructions etc.?
sparrow-grass is a false or folk etymology for asparagus. It is often mentioned and appears in w:False etymology with a link to wiktionary's non-existent entry. Should we have Appendices for each of the various classes of such "words" and use Conrad's "only-in" template to provide a link and an appropriately harmless and slot-filling home for these? DCDuring TALK 05:16, 10 September 2008 (UTC)
Would be nice to have a more specific "random word"
It'd be nice if I could search for a random entry with some sort of filter - a random English word, for instance, or a random English adjective. And not just for when I'm playing mad libs. Technical means aside, how would the UI for this sort of thing be implemented? Would we need a new link in the navigation bar? Scott Ritchie 20:51, 10 September 2008 (UTC)
- Try WT:RND; we should really integrate that properly though. Conrad.Irwin 17:06, 11 September 2008 (UTC)
Unaddressed questions in talk pages
All the talk of whitelisted contributors has gotten me thinking of a more general problem, which whitelisting contributors would further exacerbate. Right now, if someone poses a question at a talk page, the patrol is the only way anyone's ever gonna see the question, unless someone active has the word on watch.
I've talked briefly about this at Grease Pit before. There needs to be some sort of way to see a list of all un-answered comments in talk pages for words. A human could check whether it's a question. If it's a question, address it if possible, or mark it as needing answer from someone more experienced (maybe bump it to Tea Room). If it's not a question and doesn't warrant any reply, then remove it from the list. Language Lover 03:01, 11 September 2008 (UTC)
- If we have, as previously suggested, a bot add sections to and remove them from the Tea Room, including from talk pages, this will be taken care of automatically.—msh210℠ 16:56, 11 September 2008 (UTC)
FitBot
I've set up and tested FitBot to conjugate verbs and to inflect Latin adjectives and participles. The code is essentially the same as that used by SemperBlottoBot in that the page content is created in advance as a text file of pages/sections to be added.
I am now requesting bot status for the account. --EncycloPetey 23:44, 11 September 2008 (UTC)
Bots
Hello. I would like to request a bot permission for dealing with this page, using some translations from this page. Also, I got new messages while editing this, after I got it, the banner was still there.--Chris Wattson 09:18, 13 September 2008 (UTC)
- When you say "dealing with" what do you mean? You should never use automated translations on Wiktionary unless you do some checking that the translation is correct (User:Tbot may be able to give you some hints on that).
- I mean adding citations and adding more palindromes.--Chris Wattson 14:00, 13 September 2008 (UTC)
- It's probably easiest to construct a list somewhere else, and then put it into that page. How were you planning to find them (as per WT:BOT you'll need to publish the code if you want to run a bot), and why do you want to add Citations to the page? It seems to me that we should remove the definitions and just have a list of words. Conrad.Irwin 15:32, 13 September 2008 (UTC)
- I agree. I don't see why editing that page needs to be done by bot. Those kinds of edits can be prepared with local automation on your own computer, then the revised file uploaded manually all at once. --EncycloPetey 18:14, 13 September 2008 (UTC)
- Basically, what we are saying is - It's not going to happen. SemperBlotto 18:18, 13 September 2008 (UTC)
Comment-link in quotations
As an experiment, I added comment-link to the quotation at lak. Is it all right to use this template in quotations? --Panda10 12:36, 13 September 2008 (UTC)
- We've frowned on that in the past. The point of an example or citation is to illustrate the particular word of the entry. When additional words (or in this case all the words) are linked, it becomes visually distracting. What we prefer instead is an English translation underneath the quotation. --EncycloPetey 18:16, 13 September 2008 (UTC)
- Agreed, but to clarify, with example sentences the barring of wikilinks is actually a matter of policy (see Wiktionary:Entry layout explained#Example sentences), whereas with quotations it's strictly a matter of convention, and I for one have frequently wikilinked a word or name in a quotation when that has seemed appropriate. Following a quotation with an English translation is almost policy; it's specified at Wiktionary:Quotations, which isn't tagged as policy but IMHO basically is. —RuakhTALK 19:47, 13 September 2008 (UTC)
Images at talisman
I added images to talisman per a request at feedback, but I'd like a second opinion on how they look and whether they add to the entry. I copied all the image tags from WP and converted them to a gallery. RJFJR 13:36, 13 September 2008 (UTC)
- Hmm... I like images that illustrate a word, but talisman is such an abstract idea that I'm not sure the images necessarily help in this case. --EncycloPetey 18:18, 13 September 2008 (UTC)
wikisaurus creation template
We have preformat templates for nouns and verbs, etc. (Which I do not use.) Do we have one for creating a new Wikisaurus entry? (If so, how do I use it; if not, could we?) RJFJR 14:17, 13 September 2008 (UTC)
- There is something of the sort, namely {{ws shell}}. --Dan Polansky 19:34, 13 September 2008 (UTC)
Is MWOnline pulling itself out of OneLook?
I wonder whether MWOnline has intentionally withdrawn the availability of its content through OneLook.com. Its links there have been dead for a couple of days. It might be a maintenance/technical issue, part of some haggling between them, or an indication of a new, cold relationship. It probably increases the relative value to OneLook of Wiktionary's content. DCDuring TALK 22:53, 13 September 2008 (UTC)
- All the more reason to make our format easy to parse! Conrad.Irwin 23:01, 13 September 2008 (UTC)
- MWOnline is back in OneLook. So it was either a maintenance/technical issue or, possibly, part of haggling. More convenient for us as contributors, but less good for Wiktionary competitively. DCDuring TALK 18:36, 15 September 2008 (UTC)
- MWOnline is only minimally back in OneLook. OneLook is unable to go directly to the individual entry in MWOnline, sending users instead to the MWOnline home page, where the user must retype the search term. This situation has persisted since Monday, September 15, 2008. DCDuring TALK 23:10, 18 September 2008 (UTC)
{{law}}
As there is now a process in place to be able to use {{see}} as a language code, I think the last 3-letter template that is in the place of an existing language code is {{law}} (law is the ISO 639-3 code for Lauje). The template is currently a redirect to the context template {{legal}}. We can probably deprecate {{law}} in favor of {{legal}}, convert the existing occurrences, wait a while, and finally make the transition. There is, however, an extra wrinkle, in that {{legal}} shows up in entries with the text (law). Should we change the label to read legal? If we leave it, editors might constantly be inserting the language code instead of the valid context tag. Or does someone have any better ideas? --Bequw → ¢ • τ 23:57, 13 September 2008 (UTC)
- If we do change the text, we have to go through every page into which it's transcluded and make sure it's not used in, e.g., {{context|when discussing the|_|law}} or {{context|law|and|government}}.—msh210℠ 16:31, 15 September 2008 (UTC)
- It should be possible for a bot to identify when it is used alone/with only context ({{law}} / {{context|law}}) and where it is used in combination with anything else. Only the latter case will need looking at by a human - although it could still be a big task! Thryduulf 20:04, 15 September 2008 (UTC)
- I just updated all [remaining] transclusions of {{law}} and replaced them with {{legal}}. Not hard at all. EVula // talk // 15:38, 8 October 2008 (UTC)
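The split suggested in this thread, between uses a bot can convert on its own and uses a human should look at, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `classify_law_usages` is hypothetical, the regular expression ignores nested templates, and it is not the method actually used for the conversion reported above.

```python
import re

# Matches a simple, non-nested template call such as {{context|law}}.
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")


def classify_law_usages(wikitext):
    """Return (safe, needs_review) lists of template calls involving 'law'.

    safe: {{law}} alone, or {{context|law}} with no other arguments.
    needs_review: any other call where 'law' is the template name or
    one of the arguments of {{context}}.
    """
    safe, needs_review = [], []
    for match in TEMPLATE.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        name, args = parts[0], parts[1:]
        if name == "law":
            (safe if not args else needs_review).append(match.group(0))
        elif name == "context" and "law" in args:
            (safe if args == ["law"] else needs_review).append(match.group(0))
    return safe, needs_review


text = (
    "# {{law}} A statute.\n"
    "# {{context|law}} A rule.\n"
    "# {{context|when discussing the|_|law}} An example.\n"
    "# {{context|law|and|government}} Another example.\n"
    "# {{context|nautical}} Unrelated.\n"
)
safe, needs_review = classify_law_usages(text)
print(safe)
print(needs_review)
```

Only the second list would need a human pass, which matches the expectation stated above that the combined uses are where the real work lies.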
CheckUser votes.
We currently have two CheckUser votes (for Versageek and for Rodasmith) that are set to end tonight at midnight UTC (8PM EDT). Neither has had any "oppose" votes, nor even any "abstain"-s (though some editors have voted in one and not the other, which is probably an implicit abstention); however, I've heard (and someone who knows, please confirm or deny this) that WMF requires that there be 25 more "support"-s than "oppose"-s, which is not the case with the current votes.
Unless there's a WMF rule against this, I'd like to suggest that we simply leave the votes open for now, and only close a vote if either (1) it reaches 25 "support" votes or (2) it reaches 1 "oppose" vote. Is that reasonable?
—RuakhTALK 15:02, 15 September 2008 (UTC)
- What would the procedure be to more formally extend the vote, if there is one? DCDuring TALK 15:15, 15 September 2008 (UTC)
- I don't think we have one and I don't think we need one. Per BEANS, I won't speculate on what would cause us to need one. :-P —RuakhTALK 20:11, 15 September 2008 (UTC)
- Just to make it clear, one does not have to be an admin to vote, just a registered user, right? DCDuring TALK 15:15, 15 September 2008 (UTC)
- Sheesh, way to make me do the research I should have done to begin with. :-P So assuming m:CheckUser policy#Access to CheckUser is accurate, WMF doesn't require that there be 25 more "support"-s than "oppose"-s, only that the nominee have “gain[ed] consensus (at least 70%-80%) in his local community, and with at least 25-30 editors' approval”. (Not that Versageek and Rodasmith have that, either.) I'm not sure if "editor" is well-defined in this context, but I can't imagine it's a hyponym of "administrator". Taking a stab in the dark, I'd guess something like "registered account that is its owner's main account (e.g. not a bot) and that was not created for the purpose of voting", but I'm really just making that up. And if someone had been contributing anonymously, I wouldn't begrudge them the right to create an account and vote immediately, provided they demonstrated the fact. —RuakhTALK 20:11, 15 September 2008 (UTC)
- So this seems to mean that WMF CheckUser policy doesn't compel us to close the vote. What would compel us to do so? DCDuring TALK 01:01, 16 September 2008 (UTC)
- Never mind. Our own practice is to have a vote last a month. We set the time limit for this vote in accordance with that policy and probably have to live with it, though it is not immediately clear to me what harm would come from extending the vote. DCDuring TALK 01:08, 16 September 2008 (UTC)
- CU elections are open to all users, regardless of whether they have additional editing privileges (sysop, 'crat) or not. EVula // talk // 20:36, 15 September 2008 (UTC)
- The point of asking the question (which was intended to be rhetorical) was not to get Ruakh to do the research, tho he is good at it, but to encourage those who are relatively new to participate. If you are interested enough to be reading this forum, you are eligible to vote and your input might be valuable. Even if in a specific case, you turn out to be mistaken about something, there is two-way learning that occurs. DCDuring TALK 23:14, 15 September 2008 (UTC)
- Which is why I answered it in a definitive "everyone's welcome!" manner. ;) EVula // talk // 00:16, 16 September 2008 (UTC)
CFP: eLexicography in the 21st century: New challenges, new applications
FIRST CALL FOR PAPERS
eLexicography in the 21st century: New challenges, new applications
Organized by the Centre for English Corpus Linguistics (CECL) under the aegis of the European Association for Lexicography (EURALEX).
Conference website: www.uclouvain.be/cecl-elexicography
Venue: University of Louvain, Louvain-la-Neuve, Belgium
Date: 22-24 October, 2009
Organizers: Prof. Sylviane Granger and Dr Magali Paquot
Conference theme: Innovative developments in the field of electronic lexicography
Papers on the following topics are particularly welcome:
- New technological environments (web-based dictionaries, mobile devices, etc.)
- Exploitation of language resources: monolingual and multilingual corpora, learner corpora, lexical databases (e.g. WordNet)
- Integration of NLP tools (grammatical annotation, speech synthesis, etc.)
- Dictionary writing systems and other software available to the lexicographer
- Changes to the dictionary macro- and microstructure afforded by the electronic medium (multiple access routes, efficient integration of phraseology, etc.)
- Automated customisation of dictionaries in function of users’ needs (proficiency level, receptive vs. productive mode, register)
- Integration of electronic dictionaries into language learning and teaching (CALL, translator training, etc.)
Keynote speakers
We are pleased to announce that the following speakers have accepted our invitation to give a keynote presentation at the conference:
- Ulrich Heid (Universität Stuttgart, Germany)
- Marie-Claude L’Homme (Université de Montréal, Canada)
- Hilary Nesi (Coventry University, Great-Britain)
- Michael Rundell (Lexicography MasterClass Ltd, Great-Britain)
- Piek Vossen (Vrije Universiteit Amsterdam, The Netherlands)
Organising committee
- De Cock Sylvie (Facultés Universitaires Saint-Louis & CECL, Université catholique de Louvain, Belgium)
- Granger Sylviane (CECL, Université catholique de Louvain, Belgium)
- Paquot Magali (CECL, Université catholique de Louvain, Belgium)
- Rayson Paul (UCREL, Lancaster University, Great-Britain)
- Tutin Agnes (LIDILEM, Université Stendhal-Grenoble 3, France)
Scientific committee
- Bogaards Paul (Leiden University, The Netherlands)
- Bouillon Pierrette (ISSCO, University of Geneva, Switzerland)
- Campoy Cubillo, Maria Carmen (Universitat Jaume I, Spain)
- de Schryver Gilles-Maurice (Ghent University, Belgium)
- Drouin Patrick (Observatoire de Linguistique Sens-Texte, Université de Montréal, Canada)
- Fairon Cédrick (CENTAL, Université catholique de Louvain, Belgium)
- Fellbaum Christiane (Princeton University, United States)
- Fontenelle Thierry (Microsoft Natural Language Group, United States)
- Grefenstette Gregory (EXALEAD, France)
- Hanks Patrick (Masaryk University, Czech Republic)
- Kilgarriff Adam (Lexical Computing Ltd, Great-Britain)
- Korhonen Anna (University of Cambridge, Great-Britain)
- Herbst Thomas (Universität Erlangen, Germany)
- Lemnitzer Lothar (Universität Tübingen, Germany)
- Moon Rosamund (University of Birmingham, Great-Britain)
- Ooi Vincent (National University of Singapore, Republic of Singapore)
- Pecman Mojca (Université Paris Diderot - Paris 7, France)
- Piao Scott (The University of Manchester, Great-Britain)
- Rayson Paul (UCREL, Lancaster University, Great-Britain)
- Ronald Jim (Hiroshima Shudo University, Japan)
- Sierra Martinez Gerardo (GIL, Universidad Autónoma de México, México)
- Smrz Pavel (Brno University of Technology, Czech Republic)
- Sobkowiak Wlodzimierz (Adam Mickiewicz University, Poland)
- Tarp Sven (Centre for Lexicography, Aarhus School of Business, Denmark)
- Tutin Agnes (LIDILEM, Université Stendhal-Grenoble 3, France)
- Verlinde Serge (Katholieke Universiteit Leuven, Belgium)
- Zock Michael (CNRS - Laboratoire d’Informatique Fondamentale, France)
The conference aims to be a showcase for the latest developments in the field and will feature both software demos and a book exhibition. A selection of papers will be invited for expansion into chapters for a book arising from the conference.
Language of the conference: English
Key dates
- Deadline for submission of abstracts: 15 December 2008
- Notification of acceptance / rejection: 27 February 2009
- Contact: elexicography@uclouvain.be ; for sponsoring options, please contact sylviane.granger@uclouvain.be
- Professor Sylviane Granger
- Université catholique de Louvain
- Centre for English Corpus Linguistics
- Place Blaise Pascal 1
- B-1348 Louvain-la-Neuve
- Belgium
- http://cecl.fltr.ucl.ac.be
--Brett 11:21, 16 September 2008 (UTC)
Format for heteronyms with the same etymology and part of speech?
Currently we give pronunciations for heteronyms either at the top of the page, distinguishing the different senses or different parts of speech, or within each etymology or part of speech.
However, there is a curious situation in Italian that defies this page format. The words bisbiglio and scompiglio (and possibly others) are each heteronyms with the same etymology and the same part of speech. Currently, I have listed the words as "Noun 1" and "Noun 2" and put the pronunciations at the next level down inside each of these noun headers. But it looks odd. The nearest to our usual formats would be to give the pronunciations at the top of the page inside the etymology header, where the senses would need to be distinguished by their translations. I suppose this would work. Any alternative suggestions? — Paul G 19:36, 16 September 2008 (UTC)
- OK, I've moved the pronunciations to the top of bisbiglio but left scompiglio unchanged for comparison. I think the format of "bisbiglio" looks better. Any objections to this? — Paul G 19:47, 16 September 2008 (UTC)
- Yes, "bisbiglio" is the better. But did you use "*" instead of "#" deliberately? SemperBlotto 21:42, 16 September 2008 (UTC)
- But with that format, someone will come along and merge the two Noun sections sooner or later. The Pronunciation section is still a mess for having to include such lengthy glosses to differentiate senses. --EncycloPetey 21:47, 16 September 2008 (UTC)
This problem is not limited to Italian. It is extremely common in Latin, and Ruakh says it is common in Hebrew as well. With Latin, the additional problem is that the translations will be nearly the same as well, because the two pronunciations are different inflectional forms. This issue has been raised before, with no resolution. You can see how I've chosen to handle it at palma, a Latin lemma with two etymologies, each of which includes two different pronunciations. For a simpler case, see mutaverimus. --EncycloPetey 19:51, 16 September 2008 (UTC)
- For a Hebrew example (with a different, uglier, but ELE-meeting, solution) see ישב.—msh210℠ 21:51, 16 September 2008 (UTC)
- Yikes! Unfortunately, that solution doesn't work when there is a single Etymology with multiple pronunciations. --EncycloPetey 21:56, 16 September 2008 (UTC)
- Well, yes, it does: Some of the listed etymologies really coincide and are artificially rewritten so as to differ; for example, the first two etymologies are "Root י-ש-ב, in paal construction, third person singular masculine past tense" and "Root י-ש-ב, in paal construction, third person singular masculine past tense, verse-final form". Pathetic, but it meets ELE. (One possible solution for Hebrew, which would not work for the other mentioned languages, would be to have vowelized PAGENAMEs. But it'd be better to find a solution that works in more generality than just for Hebrew.)—msh210℠ 22:10, 16 September 2008 (UTC)
- And I'm going to throw in the recently WOTD-ed discus for another example for which I also consulted Petey. Circeus 02:22, 17 September 2008 (UTC)
- Different problem there. The pronunciation is the same throughout, so it's not a heteronym. The problem in that case was that different senses form the plural differently. --EncycloPetey 02:27, 17 September 2008 (UTC)
- True, but the issue is close, and finding a satisfactory solution for one might help with the other. Personally, I like the idea of a pronunciation split, which I think is fairly intuitive given the already allowed etymology split. It "feels" cleanest to me. On the side, nobody appears to have taken the time to deal with the issue in the most notorious English case: there are no heteronym entries for read. In other cases (e.g. rebel) purportedly different etymologies were invoked, but that still leaves invalid or record to deal with. Circeus 02:55, 17 September 2008 (UTC)
There are also Ukrainian words which only differ in the place of stress, for example горілки. I formatted this one with duplicate “Noun” headers. —Michael Z. 2008-09-17 04:54 z
- I also agree that allowing multiple pronunciation sections, as we already allow multiple etymology sections, is the right solution. I propose an amendment to WT:ELE which goes something along the lines of: When two words share the same spelling and the same etymology, but have different pronunciations, serial pronunciation sections are allowed. However, etymology is still the secondary division of words on Wiktionary (the first being language, of course), and when two words of the same spelling differ in pronunciation, but also differ in etymology, they should be put under numbered etymology sections, each of which carries its own, unnumbered pronunciation section. Whadd'ya think? -Atelaes λάλει ἐμοί 17:19, 17 September 2008 (UTC)
- I generally agree with that, certainly for non-English languages, but think that we should not allow multiple pronunciation sections where pronunciation differs between regions but not within regions so we don't have one pronunciation section for the US English pronunciation and one for the UK pronunciation. I'm reluctant to allow them for English at all, given the variability of pronunciation - how do we handle a word with one etymology and multiple senses where the following is the correspondence between meaning and pronunciation?
| Sense | Australia | Canada | UK | US |
|---|---|---|---|---|
| Sense 1 | Pronunciation 1 | Pronunciation 2 | Pronunciation 1 | Pronunciation 2 |
| Sense 2 | Pronunciation 1 | Pronunciation 1 | Pronunciation 3 | Pronunciation 1 |
| Sense 3 | Pronunciation 1 | Pronunciation 1 | Pronunciation 3 | Pronunciation 3 |
| Sense 4 | Pronunciation 1 | Pronunciation 1 | Pronunciation 1 | Pronunciation 1 |
- Then when you've figured that out, what happens when someone notes that in New Zealand senses 1-4 have pronunciations 1, 1, 2 and 4 respectively and that in Ireland they have pronunciations 5, 1, 1 and 2? Thryduulf 18:24, 17 September 2008 (UTC)
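To make the difficulty in the hypothetical table above concrete, here is a small Python sketch that groups the four senses by pronunciation, one region at a time. The data is copied from the table; the names `PRONUNCIATIONS` and `group_senses_by_pronunciation` are made up for illustration and do not correspond to any Wiktionary tool.

```python
from collections import defaultdict

# Hypothetical data from the table above: sense -> region -> pronunciation number.
PRONUNCIATIONS = {
    "Sense 1": {"Australia": 1, "Canada": 2, "UK": 1, "US": 2},
    "Sense 2": {"Australia": 1, "Canada": 1, "UK": 3, "US": 1},
    "Sense 3": {"Australia": 1, "Canada": 1, "UK": 3, "US": 3},
    "Sense 4": {"Australia": 1, "Canada": 1, "UK": 1, "US": 1},
}


def group_senses_by_pronunciation(table, region):
    """Group the senses that share a pronunciation within a single region."""
    groups = defaultdict(list)
    for sense, by_region in table.items():
        groups[by_region[region]].append(sense)
    return dict(groups)


for region in ("Australia", "Canada", "UK", "US"):
    print(region, group_senses_by_pronunciation(PRONUNCIATIONS, region))
```

Each region yields a different partition of the senses (one group for Australia, two for Canada and the UK, three for the US), so in this hypothetical case no single set of numbered Pronunciation sections would fit every region.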
- I agree that splitting senses because they are pronounced differently in different regions is inappropriate. The assumption underlying this discussion is that we have at least two different senses which share an etymology, and which consistently differ in pronunciation in some significant way, such as the placement of stress, vowel length, etc., but not merely in articulation.
- In the hypothetical example you've given, there is not enough information for me to suggest what I would do, because there are no pronunciations given. Is the difference the result of stressing a different syllable? Is it the difference between /ɑ/ and /ɒ/? I can't address a purely abstract hypothetical situation. I can say that I wouldn't split two Pronunciation sections unless the difference in pronunciation changed the meaning of the word in conversation. In the case of /ˈprɛdɪkeɪt/ and /ˈprɛdɪkət/, I would split two pronunciations, because the difference in pronunciation is regular and the difference effects a difference in meaning. If you can provide a real example or two of where a situation resembles the hypothetical one above, then I could offer an opinion. --EncycloPetey 19:02, 17 September 2008 (UTC)
- The examples show that pronunciations are attributes of an inflected form. Some examples show that the map between inflected forms and pronunciations can differ by regional accent (one type of case: some regions distinguishing inflected forms by pronunciation; others not doing so). This would seem to suggest that Pronunciation is a particularly poor choice of attribute to use to structure the layout of entries. Etymology may not be perfect, but it does allow for a clustering of related meanings of different parts of speech. Pronunciation seems to cut across the more stable categories like parts of speech and etymology.
- If users were looking entries up by their pronunciation, the logic of structuring by Pronunciation would be impeccable, but we are still in a retrograde world of text with poor linkage between orthography and pronunciation, especially in English. As long as keyboards, non-phonetic alphabets, and text are the medium of Wiktionary, pronunciation (despite its importance to many of the users we most want to serve) can't be made to complicate the appearance of our entries for native speakers and those who cannot make use of our pronunciation information. If we had a dictionary that used IPA (or some more comprehensive phonetic alphabet) for its headwords, the situation would be nearly reversed.
- In a world without IPA headwords, Atelaes' suggestion seems like the best we can do. We still need to make sure that we have considered all extant combinations of known, missing, and unknown etymologies with different numbers of pronunciations falling into each and also across parts of speech. DCDuring TALK 18:53, 17 September 2008 (UTC)
- It's amazing how much of a definitive conclusion you were able to reach from a purely hypothetical situation. You've neglected to note that child developmental psychology shows that (as every parent knows), children learn their language first by hearing and speaking, then by writing. The Pronunciation section grouping gathers words that are pronounced the same, allowing users to sort only senses with a shared pronunciation. The alternative is to mix various pronunciations together, losing that valuable sorting tool. No one is suggesting that we split every possible change in pronunciation; we're investigating what kinds of differences might warrant such a split, i.e. heteronyms.
- Please note also that your statement that "pronunciations are attributes of an inflected [written] form" is backwards. Written inflection is the result of a change in pronunciation, not the other way round.
- All that aside, I think we're in agreement about the best solution, even if we have arrived at that conclusion from different perspectives. --EncycloPetey 19:02, 17 September 2008 (UTC)
- EncycloPetey, I'm not convinced that you're reading DCDuring's comment correctly. In particular, your assumption that by "inflected" he means "written" strikes me as totally unfounded. Personally, I'd imagine that he meant exactly what he said: pronunciation is a property of a specific inflected form (a wordform), rather than of a lexeme as a whole. And your statement that "It's amazing how much of a definitive conclusion you were able to reach from a purely hypothetical situation" is both baseless and base: his comment contains such fragments as "some examples show that", and while it would have been cool if he'd actually listed those examples so we could follow his reasoning better, the fact that he didn't do so is no reason to assume that he's just pretending they exist. —RuakhTALK 01:37, 18 September 2008 (UTC)
- Ruakh, the difference between your interpretation and mine is that I have been following DC's comments on this particular subject for many months now, and am reading the comments above in the light of previous comments. You seem to be reading the one passage in isolation. DC has consistently and regularly been advocating change to the pronunciation sections in the absence of evidence, and fully on the basis of unsupported speculation. I have never seen any evidence that DC talks in terms of lexemes as opposed to written forms. I am not willing to grant that he has specific examples to support his position, because that has not been the case in the past. --EncycloPetey 19:10, 25 September 2008 (UTC)
- I'm still waiting for a concrete example of this pronunciation distinction changing between regions bit. Can anyone come up with a real example? If not, we should stop talking about this altogether. -Atelaes λάλει ἐμοί 19:18, 17 September 2008 (UTC)
- I don't know of any that are as variable as the hypothetical example above, but there are quite a few examples of words that are heteronyms in either UK/US speech but not in the other. There are also many examples of words that have multiple pronunciations on one side of the Atlantic but not on the other, and at least a few that have three pronunciations between the two (all of these discounting regular predictable changes such as /ɒ/ vs. /ɑ/ and /ə(ɹ)/ vs. /ɚ/). There are (comparatively) extremely few pronunciations added for other regions so I don't know what they do. I don't have time to hunt the examples down at the moment, but I'll try and look them up later. Thryduulf 20:19, 17 September 2008 (UTC)
- In American English, but apparently not British English, the sports-related meanings of offense and defense are pronounced differently from their general meanings. Is that an example of what we're looking for? Angr 20:23, 17 September 2008 (UTC)
- Ah, excellent example Angr. So, under the system I proposed, we would have a Pronunciation 1 section, with, perhaps əˈfɛns, and a pronunciation 2, with ˈɑfɛns (my IPA's crap, so I apologize if I've screwed this up). Now, if what Angr says is true, and the Brits don't pronounce these differently, then the UK pronunciations would be the same on both of them. I don't know about anyone else, but this seems ok to me. Perhaps we need a more complex example to really appreciate what kind of mess this could create, but I'm still standing by what I stated earlier. -Atelaes λάλει ἐμοί 21:47, 17 September 2008 (UTC)
Improving the 'cleanup' template
The 'request for cleanup' template {{rfc}} is not very nice looking, and what is more, the picture does not suggest cleaning up at all. It would look nicer with an unbroken line and a relevant picture, such as the one at http://en.wikipedia.org/wiki/Template:Cleanup. Pistachio 11:12, 17 September 2008 (UTC)
- I must admit, a broom makes a bit more sense in that context than a painting. -Atelaes λάλει ἐμοί 17:10, 17 September 2008 (UTC)
- I have no objections if someone wants to improve the site maintenance templates; and as they are not terribly important to the dictionary I don't see a reason to deter changes to them (though if people begin to war over them that might change). The only thing that really annoys me about it at the moment is the use of the special "curly" quotes that were introduced recently; though that is just a personal pet-hate so I'll not mention it again (for about five minutes :p) - I think we can probably get away with no quote marks at all if we word it nicely. If drastic changes are to be made then {{maintenance box}} should be updated too, along with all the other templates that use it, so that we can maintain consistency (because no matter how ugly one template looks, two templates that are not quite similar enough are even uglier). Conrad.Irwin 21:40, 17 September 2008 (UTC)
- I have switched the image to the broomstick one, as proposed here. If there are objections, we may revert it back. The broomstick image seems kind of obviously semantically more fitting. --Dan Polansky 09:50, 18 September 2008 (UTC)
- Thanks, no-one seems to mind and it does seem to fit in with the other templates. Pistachio 11:09, 22 September 2008 (UTC)
Wikisaurus alteration
In the spirit of cooperation and hoping to reach a consensus, what if we used the wikisaurus logo instead of the bullet at the beginning of each synonym etc. and used that logo to link to the word in wikisaurus? That way both types of structure could be maintained. Amina (sack36) 02:18, 18 September 2008 (UTC)
- From my aesthetic perspective, the current Wikisaurus logo should better stay out of Wiktionary and Wikisaurus entries. Also, regardless of the choice of logo, I find unfortunate the idea of having even a good image repeated twenty times at, say, Template:ws link.
- My reply rests on the assumption that you are actually proposing a modification of {{ws}}, a modification of the way in which it links to both Wiktionary and Wikisaurus. It is not clear to me whether you propose a change of the look of Wiktionary entries or Wikisaurus entries. I find the idea unfortunate in both cases. And I frankly don't understand expressions like "each synonym etc." --Dan Polansky 07:24, 18 September 2008 (UTC)
- From my aesthetic eye, a massive "wikisaurus" cluttering the lists each time there's a link to its own words is also "unfortunate". Having a colorful dot (for at the size I propose it would be nothing more) seems vastly preferable to that.
- I am, indeed, proposing a modification to {{ws}}. The way it links to wiktionary will remain the same; only the wikisaurus links will change. Making the change will allow people to locate wikisaurus words that we haven't thought to link.
- Expressions like "each synonym etc." are shorthand for: each synonym, antonym, homonym, colloquialism, slang, hyponym, see also, and other word types that have a connection to the head word.
- What is it that is getting you so angry, Dan? Amina (sack36) 16:36, 18 September 2008 (UTC)
- Amina, thank you. Now I know that you want to discuss the change of the appearance of Wikisaurus, and specifically of {{ws}}, not of the links to Wikisaurus that are in Wiktionary. And now I know that you had "synonym, antonym, homonym, ..." in mind when you wrote "each synonym etc.". I really did not know what you meant; I am quite poor at inducing a class from a list consisting of one item.
- Now to the topic itself. As I have written, I oppose the use of images in {{ws}}. That does not mean that I insist on the current "[Wikisaurus]" appearance. --Dan Polansky 10:49, 19 September 2008 (UTC)
- How about a small green "GO" next to any word that has a wikisaurus entry? We could keep the font size to "small" so that it is somewhat obtrusive. The green color will let people know it is something different (i.e. a link) but not a standard link such as the one associated with the word itself.
- You did say that you opposed the use of images but I am still somewhat unclear as to why you are against it. Can you elucidate? Amina (sack36) 23:44, 22 September 2008 (UTC)
- I don't see that the text "GO" is the best solution, but it is basically okay with me. I have not seen it yet, but I am skeptical about the idea of using colors. --Dan Polansky 08:43, 23 September 2008 (UTC)
English proverbs
Which is preferred for English proverbs: Category:English proverbs or Category:Proverbs? I would assume Category:English proverbs because we classify proverb as a part of speech here. --Jackofclubs 09:16, 20 September 2008 (UTC)
meanings found only in a certain idiom
If a meaning is only found in a certain idiom (notwithstanding obsolete uses from which the modern idiom stems), should it be listed at the word, or solely at the idiom?
E.g. Should there be an entry for sack, that is only "used" in get the sack, and etymologically appears to be derived from the "bag" meaning? What about the "sack" found in hit the sack and sack out? Circeus 19:19, 20 September 2008 (UTC)
- From a user perspective, they might not know to or how to search for the lemma form of an idiom and may instead search for the word that is the focus of their uncertainty. They would be unlikely to search for "get" or "give" rather than "sack". They cannot be assumed to successfully find what they want in our sometimes interminable "derived terms" or "related terms" either. It is also possible and even likely that there will be low-frequency alternative forms that will not have been entered. I think it is important for Wiktionary to be usable before it has achieved completion, which is still some months away ;-)). DCDuring TALK 20:04, 20 September 2008 (UTC)
"Sack out" is new to me but I've heard of "sack(ing) s.o." & "getting/being sacked", so in this particular instance where "sack" has become a verb synonymous with fire (unemploy), I'd have thought it'd already be on the page for "sack". Please remember the verb and noun "sack" for tackling a quarterback in Grid Iron (North American Football).
- The verb "to sack" is there. But the noun? I have yet to see an instance of "sack" with the specific meaning "firing". People use "sacking". "Sack" in get the "sack" does not even MEAN firing. It still means "sack/bag"! (like how you don't give a special meaning to "board" for across the board). Circeus 02:16, 23 September 2008 (UTC)
Broken audio link
The pronunciation of Dutch leek was given like this: * {{IPA|//[[Media:nl-leek.ogg|leːk]]//}}
and I attempted to move the sound file to its own line, like this:
* {{IPA|/leːk/}}
** {{audio|Media:nl-leek.ogg|Audio}}
but now it doesn't link any more... could someone fix this for me, please? Thanks. — Paul G 13:23, 21 September 2008 (UTC)
- Someone has fixed this. The audio template supplies "Media:", so you were getting "Media:Media:nl-leek.ogg" Robert Ullmann 17:15, 22 September 2008 (UTC)
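The doubled prefix described in the reply above can be illustrated with a short Python sketch. It assumes, as that reply states, that the {{audio}} template adds "Media:" itself; the helper name `audio_template` is hypothetical and is not part of MediaWiki or any bot framework.

```python
def audio_template(filename, label="Audio"):
    """Build an {{audio}} call, dropping a leading 'Media:' if one is present.

    Assumption (taken from the reply above): the template supplies the
    'Media:' prefix itself, so passing it again yields 'Media:Media:...'.
    """
    prefix = "Media:"
    if filename.startswith(prefix):
        filename = filename[len(prefix):]
    return "{{audio|%s|%s}}" % (filename, label)


# The call from the question, with the redundant prefix removed.
print(audio_template("Media:nl-leek.ogg"))  # {{audio|nl-leek.ogg|Audio}}
```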
It was my understanding that {{italbrac}} and {{i}} did the same things; and, indeed, that the latter was an abbreviation of the former. However, I now notice that {{i}} redirects to {{qualifier}}, which still præscribes usages for {{italbrac}}. What’s going on here? I’ve been using {{i}} in place of {{italbrac}} for months without a single instance of admonishment from a more clued-up editor… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 14:36, 21 September 2008 (UTC)
- The {{i}} template was a stop-gap fix. We prefer the use of more specific templates whenever possible. In the Etymology section {{term}} is used. In the Pronunciation section {{a}} is used to denote regional accents. In the Synonyms and other -onyms, {{sense}} is used to mark the sense of a group of terms, and {{l}} may be used to enclose the terms themselves, if the section is not English. Personally, I haven't been using {{i}} or {{italbrac}} for more than two years now. --EncycloPetey 18:58, 25 September 2008 (UTC)
- OK, thanks for the explanation. Is there any problem with my using {{i}} in place of {{italbrac}} and {{qualifier}}, as I have been doing? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 19:43, 25 September 2008 (UTC)
- That depends on whom you ask. AFAIK, there isn't a clearly laid out policy on when to use each template. Someone (I do not recall who it was) was working to put together a guide to the use of these templates, but I haven't seen that completed anywhere. The {{qualifier}} template is used to clarify senses or amend items, and hence the name. There is also {{context}}, which is to be used at the start of definition lines. If you know of situations where you've used {{i}}, and for which one of the other templates isn't quite sufficient, that would be a useful thing to document. As I said, though, there doesn't seem to be a consensus on the status of {{i}}. --EncycloPetey 20:13, 25 September 2008 (UTC)
- No, all the other templates together cover every function, but I’d say that’s because {{i}} functions identically to {{italbrac}}. However, I don’t fancy typing out {{italbrac|…}} for the effects of (''…'') when {{i|…}} has, AFAICT, exactly the same function. However, I only use {{i}} in place of {{qualifier}} and {{italbrac}} (the latter of which I hitherto thought was obsolete). † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 20:31, 25 September 2008 (UTC)
- My understanding is that you should only use {{i}}/{{italbrac}}/{{i-c}}/{{italbrac-colon}} if no other template is suitable. My use of {{qualifier}} is much broader than noted in the documentation - basically I use it to qualify anything that needs qualifying where {{sense}}, {{context}}, {{term}} or {{a}} are not appropriate. I place the qualification either before or after the thing being qualified depending on which is most appropriate to the situation. I did not know {{l}} existed until reading this discussion. Thryduulf 21:44, 25 September 2008 (UTC)
- Neither did I. The unfortunate thing is that its use entails an additional five keystrokes per linked term. Also, does it allow different spellings to be displayed, e.g., for Latin persōnam (written as: {{term/t|la|persōnam}})? The documentation did not mention a third parameter.
- Ideally, mass conversion of bracketed lists into {{l}}-engirdled ones would be something that AutoFormat should be able to handle. How about it? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 21:55, 25 September 2008 (UTC)
- Looking at the documentation for {{l}} it seems that it is less flexible than {{term}} (e.g. for Latin macrons as noted above), the syntax order is backwards relative to {{term}} and {{etyl}} and there is apparently no parameter for non-Latin scripts. I suggest that these are discussed (and, from my POV, corrected) prior to any large scale adoption or conversion. As far as I can tell all that is required is a version of {{term}} that does not italicise the displayed text, which for someone who understands class definitions and template syntax would presumably be a 5 minute job. So I propose that such a template is created at template:list, all existing uses of {{l}} converted to use {{list}} and then {{l}} turned into a redirect to {{list}}. Thryduulf 01:26, 26 September 2008 (UTC)
- Seconded. And I propose that once your proposed changes are made, that AutoFormat be programmed to substitute all [[terms]] with {{l|terms}} and, as an aside, to ensure that there exists a space between the bullet and the first bracket-soon-to-be-brace (as * {{l|term}}), for readability’s sake. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 01:43, 26 September 2008 (UTC)
- Another template I didn't know about. It seems to do everything, but I think the syntax should be modified to match that of term (i.e. {{onym|link|display|gloss|sc=script|tr=transliteration|lang=language}}, rather than the current {{onym|language|link|display|gloss=gloss|sc=script|tr=transliteration}}). Thryduulf 09:52, 26 September 2008 (UTC)
- I knew vaguely about {{onym}}. I think it sees some usage (definitely so in Ruakh’s Hebrew entries), so it wouldn’t be possible to alter the syntax without breaking all the instances in which the template is used currently. It’s also a four-letter, rather than one-letter, template, making its use cumbersome. I still favour fixing {{l}} and having AutoFormat use it. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 01:26, 27 September 2008 (UTC)
Represent FL syllabaries on char variation?
I was looking at a character variation page, Appendix:Variations of "t", and it struck me that we have no foreign language syllabaries represented. I understand the difference between abjads, abugidas, alphabets, and syllabaries and how words that don't follow (C)V(C)V(C)V... patterns do not have exact transliterations into syllabaries but I know that in practice such transliterations regularly occur and systems are made to compensate for them. For instance "drill" in Japanese is "doriru" (ドリル), which follows a regular pattern wherein drill is split into "d-ri-ll", which is changed to its phonetic equivalent "d-ri-l", which has auxiliary vowels added to fit the (C)V(C)V(C)V... pattern traditional to syllabaries as "do-ri-lu", which is simplified to one form based on the Japanese l/r merger as "do-ri-ru" (ド-リ-ル), which makes its final form "doriru". This system is regular and straightforward but can make for some difficult back-extractions as "ドリル" could represent any combination of ("d"/"do")-("li"/"ri")-("l"/"lu"/"r"/"ru"), giving 2*2*4=16 possible back-extractions like for instance "doliru", which most English speakers would not associate with "drill". Nevertheless, I would like to add the appropriate kana symbols to those character variation pages. The following contains every consonant of English: m-mu-ム p-pu-プ b-bu-ブ f-fu-フ v-bu-ブ θ-su-ス ð-zu-ズ n-n-ン t-to-ト d-do-ド s-su-ス z-zu-ズ ɾ-do-ド ɹ-ru-ル r-ru-ル l-ru-ル tʃ-ti-チ dʒ-zi-ジ ʃ-si-シ ʒ-zi-ジ ɻ-ru-ル ç-hu-フ j-i-イ ŋ-ngu-ング k-ku-ク g-gu-グ x-hu-フ ʍ-u-ウ w-u-ウ h-hu-フ & the glottal stop, ̚-small tu-ッ. I'd like to hear some other views first though. :)--Thecurran 16:51, 21 September 2008 (UTC)
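The count of 16 back-extractions for ドリル can be checked with a short sketch. It is only an illustration: the `CANDIDATES` table and the `back_extractions` function are hypothetical names, and the table covers just the three kana and the candidate readings listed in the post above.

```python
from itertools import product

# Candidate Latin-letter readings for each kana of ドリル, taken from the
# post above. This table is an assumption made for illustration and covers
# only this one word.
CANDIDATES = {
    "ド": ["d", "do"],
    "リ": ["li", "ri"],
    "ル": ["l", "lu", "r", "ru"],
}

def back_extractions(kana_word):
    """Return every combination of candidate readings for a kana string."""
    options = [CANDIDATES[kana] for kana in kana_word]
    return ["".join(parts) for parts in product(*options)]

results = back_extractions("ドリル")
print(len(results))          # 16, i.e. 2 * 2 * 4
print("doliru" in results)   # True
print("doriru" in results)   # True
```

Enumerating the Cartesian product makes the ambiguity concrete: every extra candidate reading for one kana multiplies the number of possible source spellings.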
new proposed vote: Whitelisted users autopatrol
There's a new proposed vote, Wiktionary:Votes/pl-2008-09/Whitelisted users autopatrol; comments and rewording are encouraged at its talk page.—msh210℠ 22:28, 9 September 2008 (UTC)
- And now it's live.—msh210℠ 16:48, 22 September 2008 (UTC)
I've begun Wiktionary:News for editors as a resource of necessary news, in brief, for editors who don't read discussion or vote pages. If this duplicates a page I've been unable to find, please inform (or, more simply, delete). Otherwise, please contribute. ;-)
Thanks!—msh210℠ 17:02, 22 September 2008 (UTC)
- Excellent idea. DCDuring TALK 18:28, 22 September 2008 (UTC)
- Many thanks, msh. I am not terribly diligent about keeping up with the discussion pages and this news summary is bound to be helpful. Just added it to my watch list. -- WikiPedant 18:42, 22 September 2008 (UTC)
- You're welcome. I do hope it's kept up. I mean, I'll add things as I see them (and if I remember to), but I also don't follow all discussions, not at all.—msh210℠ 18:54, 22 September 2008 (UTC)
- Not sure where to link to it from, though....—msh210℠ 18:57, 22 September 2008 (UTC)
- Community page? A second welcome-type message after someone has behaved non-vandalously [;-)) for a decent interval after registration? Especially for white-list inhabitants and inactive long-time users. DCDuring TALK 19:09, 22 September 2008 (UTC)
request for wikisaurus entries?
We have a page for requesting dictionary entries. Should we add a page for requesting wikisaurus entries? RJFJR 19:24, 22 September 2008 (UTC)
- We actually have one already. It is somewhat hidden because of a rather strong influx of offensive words being requested. I do think it's time to bring it back out into the light. Amina (sack36) 23:29, 22 September 2008 (UTC)
- You will find the page you are looking for at Wiktionary:Wikisaurus/requested entries Amina (sack36) 23:49, 22 September 2008 (UTC)
Moby Project
I would like to propose we incorporate the moby project's thesaurus file into wikisaurus. The file is open source and on the opening pages it gives you the structure of the flat file so that you can incorporate it. There are over 30,000 words with millions of synonyms already established. This would give us a huge leg up on moving the wikisaurus project along. There should be little problem setting up an application to provide the influx and a bot to add standard formatting. All that would be left would be to go in and verify and tweak.
Anyone with any powerful objections? questions? ice cream? Amina (sack36) 01:04, 23 September 2008 (UTC)
- I have to disagree. Carrying out the proposal would mean to turn Wikisaurus (capitalized with W) into a collection of heaps of weakly semantically related words, without any internal structure in each heap, as Moby Thesaurus II sorts the synonyms alphabetically and does not even distinguish individual senses of words. As it now stands, Wikisaurus still has the chance of becoming a thesaurus organized by sets or clusters of synonyms.
- --Dan Polansky 08:24, 23 September 2008 (UTC)
- A potential differentiator of Wikisaurus is the linkage to our increasingly well developed senses of words and phrases. To say that we will "incorporate" Moby doesn't say very much. What exactly is going to be incorporated? Does it convey any benefits or costs to principal namespace Wiktionary? How will the incorporation be done technically? Is their license compatible with the WMF GFDL license? BTW, do you have any information on usage of Wikisaurus (visits, edits, most visited pages)? What about usage of Moby? DCDuring TALK 11:10, 23 September 2008 (UTC)
Yay!! Icecream!
Moby is totally open. You can read about it at Wikipedia: [[2]] or from the site itself: [[3]]
The thesaurus comes in a flat file that can be read by any computer. The headword is the first word in each record. Each synonym is comma delimited, and headwords are delimited by a hard carriage return. The synonyms are in alphabetical order. A fairly simple program could load the headwords each to its own page, then stack the synonyms in that page. A more complex program (though not by much) could put them into our normal format, though we would have to go in and establish "Noun", "Verb", etc. headers. I absolutely agree with Dan that we would have to put our own sense of order to the resultant load, but we would have had to do that anyway. We'll be saving a huge amount of time and basically cutting our load in half.
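The file layout described above (one record per line, headword first, synonyms comma delimited) could be read along these lines. This is a rough sketch under that description only, not a tested importer for the real Moby file; the function names are hypothetical, and the sample record reuses the first few Moby synonyms for "class" quoted later in this discussion.

```python
def parse_moby_line(line):
    """Split one record into (headword, synonyms), or None for a blank line."""
    fields = [field.strip() for field in line.strip().split(",")]
    fields = [field for field in fields if field]
    if not fields:
        return None
    return fields[0], fields[1:]

def parse_moby(lines):
    """Build a dict mapping each headword to its list of synonyms."""
    entries = {}
    for line in lines:
        parsed = parse_moby_line(line)
        if parsed is not None:
            headword, synonyms = parsed
            entries[headword] = synonyms
    return entries

# One sample record ending in a carriage return, plus a blank line
# to show that empty records are skipped.
sample = ["class,account,adherents,advantageousness\r", "\r"]
entries = parse_moby(sample)
print(entries)
```

Turning each parsed record into a page in our normal format would be a separate step, and splitting by part of speech or sense is not something this layout supports on its own.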
The benefit to Wiktionary is that since the linkage to wiktionary would be already in place, it would handily point out any words still missing from the dictionary. It would get wikisaurus up and running sooner than expected and it would save on an incredible amount of work in wikisaurus making it possible for people to get on with the wiktionary side of things. As for cost to wiktionary, I don't anticipate any. As I said before, Moby is free for all. It is fairly large and may take up more space than you currently have. We would need to get someone who knows more about wiki's language than I do to do the actual transfer.
It's impossible to determine the number of hits Moby gets on its thesaurus. Each download of the flat file could represent thousands of hits. The file is a resource for people to use to set up a thesaurus more than a thesaurus itself. There are at least two thesauri using the Moby file that I know of.
As for WS stats, I'll work on that and let you know. Amina (sack36) 19:51, 23 September 2008 (UTC)
- Thanks for the good explanation. It does seem pretty good except for the resources required to implement. But the result could be worth it. How do you plan to integrate with existing Wikisaurus entries and not override good Wiktionary synonym lists not yet brought into Wikisaurus? Did you have in mind some kind of halfway house for draft entries, like transwiki? I assume that it will take a good bit of handwork to make it come up to standards comparable to Wiktionary (however those standards apply to Wikisaurus). DCDuring TALK 20:43, 23 September 2008 (UTC)
Moby project example
- Let me exemplify what I have written. To get an idea of how varied the lists of words are in Moby Thesaurus II, which is in the public domain, compare the current Wikisaurus:class and the entries in Moby: “class” in Moby Thesaurus II, Grady Ward, 1996. There is a complete mixture of senses and parts of speech, including verbs and nouns. I do not know how to find in it the list that is currently present at Wikisaurus:class. Specifically:
- Wikisaurus:class: grouping based on shared characteristics or attributes: class, type, kind, sort, genus, genre, category, family, rubric
- Moby:class: account, adherents, advantageousness, agreeableness, allot, alphabetize, analyze, animal kingdom, antonomasia, appraise, appreciate, arrange, ashram, assembly, assess, assign, assort, auspiciousness, bearing, beneficialness, benevolence, benignity, binomial nomenclature, biosystematics, biosystematy, biotype, birth, blood, body, bracket, branch, brand, break down, breed, breeding, brethren, brood, caliber, call, caste, catalog, categorize, category, church, churchgoers, clan, classification, classify, codify, cogency, colony, color, commonwealth, commune, communion, community, condition, confession, congregation, consider, deme, denomination, descent, description, desert, digest, discernment, disciples, distinction, divide, division, domain, echelon, economic class, elegance, endogamous group, estate, estimate, evaluate, excellence, expedience, extended family, extraction, factor, fairness, faith, family, favorableness, feather, figure, file, fineness, first-rateness, flock, fold, folk, followers, footing, form, form an estimate, gauge, genotype, genre, gens, genus, give an appreciation, glossology, goodliness, goodness, grace, grade, grain, group, grouping, guess, head, heading, healthiness, helpfulness, hierarchy, hold, house, identify, ilk, importance, index, ism, judge, kidney, kin, kind, kindness, kingdom, kinship group, label, laity, laymen, league, level, line, lineage, list, make an estimation, mark, matriclan, measure, merit, minyan, moiety, nation, nature, niceness, nomenclature, nonclerics, nonordained persons, nuclear family, onomastics, onomatology, order, orismology, parish, parishioners, part, patriclan, pedigree, people, persuasion, phratria, phratry, phyle, phylum, pigeonhole, place, place-names, place-naming, plant kingdom, pleasantness, polyonymy, position, power structure, precedence, predicament, presence, prestige, prize, profitableness, quality, race, range, rank, rate, rating, realm, reckon, refinement, regard, rewardingness, 
rubric, savoir faire, school, score, sect, section, seculars, separate, sept, series, set, settlement, sheep, sift, skillfulness, social class, society, sort, sort out, soundness, species, sphere, stage, stamp, standing, station, status, stem, stirps, stock, strain, stratum, stripe, style, subcaste, subclass, subdivide, subdivision, subfamily, subgenus, subgroup, subkingdom, suborder, subspecies, subtribe, superclass, superfamily, superiority, superorder, superspecies, systematics, tabulate, taste, taxonomy, terminology, thrash out, tier, title, toponymy, totem, track, tribe, trinomialism, type, usefulness, validity, valuate, value, variety, virtue, virtuousness, weigh, wholeness, winnow, worth, year.
- Roget 1911:75. class: N. class, division, category, categorema[obs3], head, order, section; department, subdepartment, province, domain.
kind, sort, genus, species, variety, family, order, kingdom, race, tribe, caste, sept, clan, breed, type, subtype, kit, sect, set, subset; assortment; feather, kidney; suit; range; gender, sex, kin.
manner, description, denomination, designation, rubric, character, stamp predicament; indication, particularization, selection, specification.
similarity &c. 17.
- The links, once more:
- Wikisaurus:class
- “class” in Moby Thesaurus II, Grady Ward, 1996.
- “class” in Roget's Thesaurus, T. Y. Crowell Co., 1911.
- --Dan Polansky 06:42, 24 September 2008 (UTC)
- Thanks for providing an example for discussion. Broadly, the example suggests that Wiktionary is more focused by specific part of speech and sense within part of speech. The Moby list seems like an undifferentiated list of synonyms for every PoS and sense of class. There are some words that seem like mistakes (though I may have forgotten a sense of "class"): "skillfulness", "wholeness". It seems to include groupings that are not really like classes. Perhaps it would best be treated as a kind of list for us to check against for completeness. Is there a way that the list could be imported, split by PoS, and compared with our existing synonyms lists in Wiktionary and in Wikisaurus? It certainly would need a lot of work to be useful. DCDuring TALK 10:18, 24 September 2008 (UTC)
- As for the concept of integration, regardless of what their classes are, we would be in the ballpark. We could load them all into something like a holding area that isn't connected to the main wikisaurus, then run a bot through it checking each headword in the new stuff. If it finds a headword that is the same as one in wikisaurus, it deletes the new record (or flags it so we can check the synonyms for new, valid ones). Once that is done we can run a bot that adds all our bells and whistles to the synonyms. All that is left for us to do is to re-arrange the words in the proper order. A far cry from loading all of this in by hand! Amina (sack36) 22:11, 24 September 2008 (UTC)
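The holding-area check described above might look roughly like this. It is a sketch under assumptions: both the imported data and the existing Wikisaurus data are modelled as plain dicts from headword to synonym list, the `triage` function is a hypothetical name, and the flagging variant (rather than outright deletion) is shown. The existing "class" list is the one quoted from Wikisaurus:class in this thread; the "placeholder" record is invented purely for illustration.

```python
def triage(imported, existing):
    """Separate imported records into brand-new headwords and flagged ones.

    Returns (new_entries, flagged): new_entries holds headwords absent from
    the existing data; flagged maps an existing headword to the imported
    synonyms it does not list yet, for a human to review.
    """
    new_entries = {}
    flagged = {}
    for headword, synonyms in imported.items():
        if headword not in existing:
            new_entries[headword] = synonyms
            continue
        known = set(existing[headword])
        unseen = [word for word in synonyms if word not in known]
        if unseen:
            flagged[headword] = unseen
    return new_entries, flagged

existing = {"class": ["class", "type", "kind", "sort", "genus", "genre",
                      "category", "family", "rubric"]}
imported = {
    "class": ["account", "category", "genus", "tribe"],
    "placeholder": ["stand-in"],  # invented record, for illustration only
}
new_entries, flagged = triage(imported, existing)
print(new_entries)  # {'placeholder': ['stand-in']}
print(flagged)      # {'class': ['account', 'tribe']}
```

Whether a flagged word such as "account" belongs under any sense of the headword would still have to be decided by hand.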
- If you take a look at our wikisaurus entry for class we have totally left out at least half a dozen different senses:
- sense: having style
- sense: a grade room in school
- sense: position in society
- sense: magnitude
- sense: biological classification
- sense: word classification
- In the Moby project they may have gone overboard, but at least they didn't leave so much out!
Amina (sack36) 22:23, 24 September 2008 (UTC)
- Is Roget 1911 available under suitable license? It seems to require less handwork to give useful results. Using it wouldn't preclude also using Moby. DCDuring TALK 23:24, 24 September 2008 (UTC)
- Roget is past copyright age so if it's on Gutenberg Project it's usable by us. What does this have to do with the topic? Are you thinking we could use it in some manner to sort out moby?
- By the way, to put this Moby project in perspective, in order to have a comprehensive wiktionary, we'll either have to use Moby and correct its flaws, or recreate Moby. I'm betting correcting flaws will be easier. Amina (sack36) 15:22, 25 September 2008 (UTC)
- My thought is that Roget has a more differentiated structure and therefore has more information content than Moby, based on the single case shown. It might require less work to integrate with existing WS and Wikt synonym lists. Further, it might require less work where such lists did not yet exist. Because Moby seems to require so much work to get past the grouping by word to more meaningful grouping by sense, I would have thought it would be better as a kind of completeness and modernization tool for a Roget-enhanced ws/wikt structure. DCDuring TALK 17:15, 25 September 2008 (UTC)
- If you check the next section down, you'll see that Roget will require as much work as Moby. Amina (sack36) 22:15, 25 September 2008 (UTC)
Roget
I looked up the Roget 1911 that Dan has been referencing and that is actually the joint French/US Government project I had spoken of earlier. That puppy is proprietary and we can only reference it, not grab it whole cloth. I then looked up the Gutenberg version and while the text is ASCII and easily transferable, the text has no standardized nomenclature as is needed to load something programmatically without problems. I looked for a way we could massage the data to make it load and format easily. While there is a possibility that this could be done, the time consumption would be about equal with Moby. In other words, Moby was designed to be loaded with a program and Roget was designed to be read. Amina (sack36) 15:40, 25 September 2008 (UTC)
- AFAICS Roget 1911 is in public domain. http://machaut.uchicago.edu/rogets says: "Roget's Thesaurus, 1911, version 1.02 (supplemented: July 1991) released to the public domain by MICRA, Inc. and Project Gutenberg." Boldface mine. For reference: Roget 1911 at Project Gutenberg. It comes in two files: the first with the word clusters as seen at chicago.edu, the second with an index linking all the words to their clusters. --Dan Polansky 16:55, 25 September 2008 (UTC)
- AFAICS says Roget 1911 is public domain, not that AFAICS's formatting is public domain. I went to gutenberg via your link and found way more than two files. I did finally catch on to which ones you were referring to and looked at them specifically. Let me show you the key for one of them.
- bivouac
- abode 189
- location 184
- quiescence 265
- warning 668
Does that look valid to you? I know everyone and their favorite flea is using this book, but it just seems weird to me that bivouac, which I'd always associated with military camps, can be defined as quiescence or warning. The other file still has people formatting. It didn't look usable without a great deal of programmer time, filtering out the anomalies or a great many man-hours doing it ourselves.
- Look, even I like the look of Roget's better, but Roget only has one thing and a half going for it: the clumping you're looking for and I'm not, and a possible partial file that needs work. Moby has at least three things going for it.
- It has a file that is completely formatted for the computer. None of the commas mean anything but "look! a new word"; all of the records have exactly the same structure--no extra ones with page names or headers. To load this into wiki would take a good programmer about half an hour.
- The license for Moby is actually more open that public domain. Moby says you can snatch it, make no improvements and sell it if you have the marketing skills. Public Domain says it has to be altered considerably to allow it to be sold. Moby is open source throughout the world and Public Domain only pertains to the US. Each country has their own rulings.
- Roget's is from 1911. A huge amount of words have changed meanings--especially nuance--since then. We'll have to cull the thing to remove words just as much as we'll have to cull Moby. Moby, on the other hand, is from 1996. The words that are given there will be more relevant to now. The more I think about it, the more I'm convinced Moby will be less work.
Amina (sack36) 19:00, 25 September 2008 (UTC) Amina (sack36) 19:38, 25 September 2008 (UTC)
- Re: "The license for Moby is actually more open that public domain. […] Public Domain says it has to be altered considerably to allow it to be sold.": I'm 95–99% sure that you're mistaken. The whole point of the public domain is that it's owned by the public, and we can do whatever we want with it. —RuakhTALK 01:50, 26 September 2008 (UTC)
- You're right. I was mixing it up with Creative Commons. My apologies. Amina (sack36) 02:42, 26 September 2008 (UTC)
Dueling Banjos and wikisaurus
DC, you made a remark that has been haunting me. You mentioned that we could use both. Dan and I each have our favorites, how about we each work on the one we favor? I'll need somebody with more current programming skills than I have to load the file and we'll have to make a separate area for each load in order to keep what we have of wikisaurus clean-ish. So Dan, does that sound reasonable to you? Amina (sack36) 22:28, 25 September 2008 (UTC)
- Well, I'm practicing for Halloween. In any event, it would be useful to test the incorporation process before going whole hog. How complete is the process of bringing over Wiktionary's synonyms? DCDuring TALK 23:07, 25 September 2008 (UTC)
- I haven't been working on those, unless they were part of the list of words that needed cleaning up. Amina (sack36) 02:38, 26 September 2008 (UTC)
- Re: "So Dan, does that sound reasonable to you?" Well, I am only concerned that the Wikisaurus setup as we currently have it can be kept and stay useful. In my previous replies, I was correcting what to me seemed like inaccuracies about Roget 1911; Roget 1911 is in public domain and it is practicable to upload it if that should be the choice; the relevant file is of course the first one, of the two that I referred to and linked to. That said, I am not proposing any upload at this point.
- I surely have no objection if you manage to load Moby to, say, "Moby:" namespace.
- Moby Thesaurus II and Roget 1911 both have exactly the same license: public domain. --Dan Polansky 09:02, 26 September 2008 (UTC)
- A day late and a dollar short, Dan. Read my post in the section where we were talking about this (just one section straight up). Amina (sack36) 01:31, 27 September 2008 (UTC)
The Nuance Problem
Dan and I have been concerned about the lack of definition of nuance in the wikisaurus synonyms. I think I may have come up with a solution. When we make the words columnar, we gain readability but we lose the ability to connote a change in nuance. If we simply place 4 dashes between the nuances, it will show up as a short solid line dividing the two areas. Amina (sack36) 04:00, 26 September 2008 (UTC)
- Amina, I am not quite sure what you mean, and what problem you are trying to solve with your solution. I am quite happy with the current setup of Wikisaurus. --Dan Polansky 09:14, 26 September 2008 (UTC)
- Very well, Dan, I'll put it more bluntly. Dan and I have been arguing and working at cross purposes. He has recently taken to [deleting] my posts and instead of continuing in this manner, I am trying to come to some compromise. Amina (sack36) 00:52, 27 September 2008 (UTC)
- I have tried to merge your newly posted table into the table that I have planned to be expanded as new options come in, for easy overview, in {{ws}}. I am sorry that I have upset you with that. I will refrain from merging any new tables of yours. Your table looked as if it were a continuation of my table, so that is what confused me. --Dan Polansky 09:19, 30 September 2008 (UTC)
Over-categorized Spanish verb forms
Hi, shouldn't the subcategories here be removed and all the Spanish verb forms (except for present and past participles and gerunds) be put here? Long ago, I was criticised for over-categorizing Finnish verb forms, i.e. for making categories for each person, each mood and each tense (third-person singular, indicative, potential, present connegative bla blaa), so should the same policy be put into practice here? -- Frous 17:02, 23 September 2008 (UTC)
- (I'm not quite sure if I should've put this to Tea room since this place is for "general discussion", but whatever...) -- Frous 17:13, 23 September 2008 (UTC)
- That's how it was before McBot changed all of them - I'm not sure what the motivation was or where the discussion took place to get approval for it. Nadando 17:22, 23 September 2008 (UTC)
- Also note that it appears that it would be pretty easy to change back to just 'Spanish verb forms' using the template {{es-verb form of}}. Nadando 17:39, 23 September 2008 (UTC)
- There's a lot of work to be done in Spanish verb cleanup. There isn't even a unified {{es-verb}} in place yet, though I think I've devised a slick solution to that problem (more on that soon at Wiktionary:About Spanish). But yes, the Spanish verb forms are over-categorized. --EncycloPetey 19:42, 5 October 2008 (UTC)
The prefix template is broken
The prefix template has been changed so that it includes the word "From" at the beginning, breaking all current uses. Existing uses display like this:
Etymology
dougher 02:37, 24 September 2008 (UTC)
Suffix template is broken the same way.
dougher 02:42, 24 September 2008 (UTC)
- It's not at all clear to me why "From" belongs in front of such a decomposition of a word. In some cases it is not really a true etymology, but rather a link to two entries that may have useful information about the components. I'd be happy to hear discussion of why it should be one way or another. DCDuring TALK 03:18, 24 September 2008 (UTC)
- Agreed. User:Hamaryns did all this yesterday without, it appears, prior discussion. --Bequw → ¢ • τ 07:37, 24 September 2008 (UTC)
- I've undone both {{suffix}} and {{prefix}}; that was not the only problem, several other things were wrong. (So examples above don't show the problem now.) Any tech discussion should be in WT:GP or on the talk pages there. (I wouldn't use "From" anyway, it isn't quite right: the word isn't from a+b, the word is a+b.) Robert Ullmann 11:15, 24 September 2008 (UTC)
- (oh, and see this edit where an important bit of information was removed to accommodate the forced "From".) Robert Ullmann 11:19, 24 September 2008 (UTC)
- Oh dear, {{compound}} too. Still need to fix the talk pages. Robert Ullmann 11:34, 24 September 2008 (UTC)
- Cleaned up those. --Bequw → ¢ • τ 06:37, 25 September 2008 (UTC)
- Sorry for all this, I admit I hastened there. My point is that right now, the Etymology section is very uninformative. I’d rather see at least some explanatory text about the word formation instead of simply a link to the components. ‘From’ was supposed to be a small step in that direction, but I’d rather see something like ‘From distance prefixed with out-.’, but I’d be glad hearing some better suggestions. I think the wording of this section is a bit harsh, btw, I did check a random sample of usages and they do look fine with the ‘From’, though I can imagine some don’t. H. (talk) 12:37, 30 October 2008 (UTC)
- That's fair enough; more informative etymologies are definitely needed, although the degree to which templates can be used for this purpose may be limited. I think the chief problem was that many entries already used "From {{suffix|foo|bar}}" and so forth. In the changed template this resulted in "From From foo + -bar," which is not desirable in any scenario. -- Visviva 13:10, 30 October 2008 (UTC)
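The doubled "From" described above can be reproduced with a toy model. This is not the actual template code: `render_suffix` and `render_entry` are hypothetical stand-ins that only mimic the "foo + -bar" output mentioned in the thread, with a flag for the changed behaviour.

```python
def render_suffix(base, suffix, prepend_from=False):
    """Toy stand-in for the template output: 'base + -suffix'."""
    text = f"{base} + -{suffix}"
    return f"From {text}" if prepend_from else text

def render_entry(prepend_from):
    """Render an entry whose own wikitext already begins with 'From'."""
    return "From " + render_suffix("foo", "bar", prepend_from) + "."

before_change = render_entry(prepend_from=False)
after_change = render_entry(prepend_from=True)
print(before_change)  # From foo + -bar.
print(after_change)   # From From foo + -bar.
```

The sketch shows why a change to shared template output needs a survey of how existing entries call it: text that was correct around the old output becomes redundant around the new one.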
German strong / weak declension
<-- copied here from User_talk:Mutante -->
Hi Mutante. I've seen that we still disagree on where to place the main entry for German nouns that have more than one nominative singular form (weak and strong declension, e.g. Beamte and Beamter, or Jüngste Gericht and Jüngstes Gericht). As we have previously discussed, some German dictionaries always use weak declension, others always strong declension, so both options would be possible. I would favour weak declension, because it seems that the German Wikipedia also uses this convention and that would make the interwiki links more meaningful. Anyhow, we should decide to use one convention only for consistency. The entry for the other form should still exist, of course, but perhaps be treated like an inflected form entry (see Beamter for how I do it now). One remaining problem is that for weak declension, the female form of the noun is usually (always?) often the same (der Beamte, die Beamte), which does not work well with the de-noun templates, if only one such template is allowed per noun section. Using different etymology sections, as recommended by some of the admins in similar cases, also sounds weird here, because die Beamte is only the female form of der Beamte, not a completely different word. Maybe we should put that question up on a help page to see what long-term admins propose? Cheers --Zeitlupe 22:03, 22 September 2008 (UTC)
Ah, yes, you are probably referring to my creation of radioaktiver Abfall. I usually just avoided creating those words, because I was not sure about how to create them, but that one was already linked to from "radioactive waste" like this. And see, it is also in this form on de.wp. I agree we should probably put the question on a help page. But about Beamte specifically: I would have thought the female version is Beamtin. Mutante 06:06, 23 September 2008 (UTC)
- yes, bad example :-(. it's die Beamtin, of course. Here is a better one: der Blinde, die Blinde ("blind person"). Is there actually a portal or other Wiktionary page where we could discuss policies related to the German language? Or could you set one up as an admin? It would be better to have the discussion there instead of using user talk pages. Thanks. --Zeitlupe 17:07, 23 September 2008 (UTC)
- The closest thing to what you’re looking for is probably Wiktionary:About German. FWIW, it sounds to me that having the entry at the strong-declension form is the best idea. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 11:09, 24 September 2008 (UTC)
- I favor having the main entry at the weak declension, but I think there should be at the very least a redirect at the strong form. Many languages have a similar problem, and I handle them with redirects. One example that I remember from quite a while ago is Dritte Reich, which redirects to Drittes Reich. —Stephen 15:07, 24 September 2008 (UTC)
- Re: "I think there should be at the very least a redirect at the strong form. Many languages have a similar problem, and I handle them with redirects.": Indeed, a significant proportion of languages have the “problem” of non-lemma forms. Our solution for all of them is the same: we indicate what form it is, and link prominently to the lemma. (Some editors also include a translation of the non-lemma form itself, which in this case would be identical to the lemma translation.) A redirect is appropriate in your multiword example, but not for single-word examples. The only complication here is the conflicting traditions for selecting a lemma form; but all we have to do is select a standard and use it. —RuakhTALK 19:43, 24 September 2008 (UTC)
- Not all languages are like the Indo-European ones, where it’s simply a matter of the form. In Arabic, for example, initial hamza may or may not be written; final yaa’ and t-marbuta may be written with or without dots; any of a wide selection of diacritics may or may not be applied; and there are frequently prefixes which are unlike Indo-European prefixes, but which affect the spelling with nothing to do with the form...e.g., ضلمه, وضلمه, فضلمه...where the prefix is unrelated to the word and only present because someone who doesn’t know Arabic does not know that the prefix should be removed. Or the Russian words with the letter ё, which is more often than not spelt with е. These are different from hablo, hablas, habla. Or spellings in Canadian Aboriginal syllabics, which permit different degrees of precision in spelling. These are not single words with different forms, they are different ways of writing the same form of a word. And there are not a handful of them, there are millions. Thai, Lao and Khmer do not employ word spaces, but often we put a zero-width nonjoiner between "words", which allows proper line wrapping in texts. There is no difference in the appearance, meaning or reading of a phrase with or without nonjoiners, but software programs see them as different spellings, and searching for one does not find the other. In the Khmer Wikipedia, some words include the nonjoiners, others do not. If you use the {{wikipedia|lang=km}} link, it won’t work unless our nonjoiners match theirs...but there is no visible difference in the two words, and I handle the problem with redirects. —Stephen 00:19, 25 September 2008 (UTC)
- Fair enough — indeed, I do the same with Hebrew diacriticked forms (when I bother with them at all) — but the German weak/strong noun thing is a bit different from that, in that they're different spellings, with different pronunciations, and used differently, yes? They seem to fit perfectly into our “form of” mold. —RuakhTALK 01:57, 25 September 2008 (UTC)
- Yes, they are spelt and pronounced differently, but they are one and the same form, but vary according to certain preceding words. For example, if preceded by the definite article, it is the article that takes the strong form, and the adjective is weak to avoid unpleasant redundancy in the endings. See w:German adjectives#Weak and strong inflection. So the weak form is probably more common in texts, since the definite article is so common. But some other preceding determiners do not show strong forms, so the adjective must be strong in their place. —Stephen 03:15, 25 September 2008 (UTC)
- yes, it is a form of. It would appear in the inflection tables of the main entry. See for example the word Beamte at canoo.net. canoo.net also uses weak declension for the main entry. --Zeitlupe 03:07, 25 September 2008 (UTC)
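As an illustration of the nonjoiner point made above for Khmer, here is a minimal Python sketch of why two visually identical titles fail to match, and how stripping the character before comparison would make them match. It assumes the character in question is U+200C (ZERO WIDTH NON-JOINER); the function names and the sample word split are hypothetical, and this is not how MediaWiki or any Wiktionary template actually resolves titles.

```python
# Assumption: the invisible separator is U+200C (ZERO WIDTH NON-JOINER).
ZWNJ = "\u200c"


def strip_zwnj(title: str) -> str:
    """Return the title with every zero-width non-joiner removed."""
    return title.replace(ZWNJ, "")


def same_visible_spelling(a: str, b: str) -> bool:
    """Compare two titles while ignoring zero-width non-joiners."""
    return strip_zwnj(a) == strip_zwnj(b)


# Hypothetical example: one phrase, stored with and without a non-joiner
# between its two parts. Both render identically on screen.
parts = ("ភាសា", "ខ្មែរ")
with_zwnj = ZWNJ.join(parts)
without_zwnj = "".join(parts)

print(with_zwnj == without_zwnj)                       # plain comparison
print(same_visible_spelling(with_zwnj, without_zwnj))  # normalized comparison
```

A plain string comparison treats the two titles as different pages, which is the situation the redirects work around; comparing after normalization treats them as the same spelling.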
@Doremítzwr: are there any arguments why you think the entry at the strong-declension form is the best idea? I am wondering if strong/weak declension also exists in other languages, and how they handle it. @Stephen: The German Wiktionary also uses redirects, but I thought that redirects are not encouraged in the English Wiktionary. Here is my latest entry of such a noun: Leidtragende / Leidtragender. I used the infl|g=f|g2=m template to indicate that this is the weak declension base form of both the male and female version. Maybe we need a new de-noun template for such nouns. The strong declension entry mentions that it is the nominative singular of the male form only. I used Category:German noun for the weak declension and Category:German noun form for the strong declension so that the word is not counted twice, should someone be interested in the number of German nouns. Is this a good solution? --Zeitlupe 18:34, 24 September 2008 (UTC)
- There is an irrational fear of redirects in the English Wiktionary because a few years ago one editor objected to them, complaining that clicking straight through a misspelling to the correct form was confusing for him, and he was afraid he wouldn’t notice that he had been redirected. This had the dreaded slippery-slope effect that resulted in our current fear of a great tool. I feel that we need to get over our fear of redirects and put them to the same good use that the other Wiktionaries and Wikipedias do. I know that in my own case, I always seem to notice when I’ve been redirected. —Stephen 00:28, 25 September 2008 (UTC)
- Because of the problem you noted with the weak declension: “One remaining problem is that for weak declension, the female form of the noun is usually (always?) often the same (der Beamte, die Beamte), which does not work well with the de-noun templates, if only one such template is allowed per noun section”. As for strong/weak declension in other languages, English verbs like sing and nouns like louse are said to belong to English’s “strong declension”, but I don’t know whether the use of such terminology is current in academic discussion of English morphology. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 23:43, 24 September 2008 (UTC)
- Nota bene: strong declension. † ﴾(u):Raifʻhār (t):Doremítzwr﴿
- Would one be 'alternate form of' the other? RJFJR 21:25, 24 September 2008 (UTC)
- no, it is not an alternative form, it is a different inflection case. Strong declension is used if the noun is used without an article, weak declension if it is used with a definite article. --Zeitlupe 03:07, 25 September 2008 (UTC)
- This would be another reason why I would favour the lemma being housed at the strong-declension form; i.e., because it is the “stand-alone” form. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 12:21, 25 September 2008 (UTC)
I checked the current state at the German Wikipedia again, and it seems that they also mostly use the strong declension form for new entries now. They had an opinion poll in 2005 that resulted in using the weak declension form, and most articles since then used it, but they no longer seem to use it for new entries. So all arguments are now in favor of using the strong declension: it is the natural stand-alone form, and it will result in two entries for male and female form for certain nouns. Example ("blind person"): Blinder (m) and Blinde (f) will be two entries, and Blinde could also have a second etymology section telling that it is also an inflected form of Blinder (m). We will use redirects for multi-word examples, like Drittes Reich. Should I copy this to the Wiktionary:About German page with a link to this discussion? --Zeitlupe 14:58, 25 September 2008 (UTC)
- Make sure you use a permanent link if you include one at all, otherwise the link will be broken when the section is archived. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 15:06, 25 September 2008 (UTC)
Two brief questions which might help finding a solution for the issue:
- What is the policy for Swedish where, to my knowledge, all nouns possess a definite form (corresponding to der Blinde (m)) and an indefinite form (corresponding to ein Blinder (m)) ? Is there any policy?
- What is the policy of Duden?
Perhaps it would be appropriate to add a note (eventually) to each entry where this is an issue. If I'm not missing anything, the problem concerns mainly nominalised adjectives and participles - and standing expressions containing an adjective, usually proper nouns. --Gauss 15:41, 25 September 2008 (UTC)
- About Swedish: We add only the indefinite form of the nouns as headwords, considering the definite form to be an inflection just like the plural is considered to be an inflection. So ok, there are strictly speaking more forms than that: out of the 8 possible forms (2 cases x 2 numeruses (numeri?) x 2 "definitenesses") only the singular, indefinite nominative is considered a lemma form; the other seven are to be entered as inflectional forms. Though no one has bothered adding inflections systematically to Swedish words yet. \Mike 20:02, 25 September 2008 (UTC)
- If (the Swedish?) numerus comes from the Latin numerus, then the plural would indeed be numeri; however, in English, the grammatical term is number. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 20:36, 25 September 2008 (UTC)
- Ah, thanks. It was too easy (and I was too lazy) \Mike 10:18, 26 September 2008 (UTC)
- sigh... The Duden uses the weak declension :-( Example Duden: Blinde, der u. die --Zeitlupe 17:01, 25 September 2008 (UTC)
- Then this should probably become policy. Duden is generally considered authoritative for German, isn't it? Latin is using the policy that the lemma for verbs is, unlike pretty much all other languages, the form for first person singular present tense active and not the infinitive - and the only rationale for this I can imagine is that historically all influential dictionaries for Latin are doing it that peculiar way. -- Gauss 14:05, 26 September 2008 (UTC)
How do people feel about this change to the wording of the template? --EncycloPetey 18:52, 25 September 2008 (UTC)
- It looks like a definite improvement to me. —RuakhTALK 01:57, 26 September 2008 (UTC)
- Hmm, I'm not sure we want it to be used outside of main-space; but there's no problem with the new wording. (Though it doesn't use the {{pagetype}} template I created for making nice messages like that more easily :) Conrad.Irwin 20:56, 26 September 2008 (UTC)
Nahuatl
Would anyone object to replacing Classical Nahuatl (and other specific Nahuatl varieties) with plain old Nahuatl? Classical Nahuatl is vaguely defined, both in time and geographically, so I'm often not sure what should be included. Further, there seem to be present-day varieties that don't fit into SIL/ISO 639-3's classification; for example, Milpa Alta Nahuatl doesn't seem to fit in any of the varieties listed on Ethnologue, as far as I can tell (some scholars have considered it to be a living remnant of Classical Nahuatl, but Ethnologue lists it as extinct, so that can't be what they intended). On the other hand, though the term "Nahuatl" is kind of far-reaching and includes non-mutually-intelligible varieties, it's quite clear and obvious what's Nahuatl and what's not. More specific information can be given the same way we deal with dialects: (Milpa Alta) etc. --Ptcamn 04:19, 27 September 2008 (UTC)
- Speaking as someone who had never heard the word Nahuatl before I came onto Wiktionary, there is a certain extent where I'm going to defer to Ptcamn's expertise on this one. I've long been of the opinion that it is sometimes useful to lump multiple languages together, especially in the case of poorly cataloged languages. However, if we are going to do this, we would want to set up an infrastructure which accommodated it well. So, we would want a good set of contags for "dialect" notations, and perhaps a set of alternative spelling templates similar to {{grc-alt}}, which readily notes dialects in a standard format. Finally, from what I can tell, Ptcamn is the only one running this Nahuatl thing we've got going here, and so I'm inclined to defer to him for that reason as well. If someone voices a counter argument against his, I wonder if it might be good to weight it based on their expected contributions to the area. -Atelaes λάλει ἐμοί 04:31, 27 September 2008 (UTC)
- I'm also inclined to defer to Ptcamn on this. Re: "If someone voices a counter argument against his, I wonder if it might be good to weight it based on their expected contributions to the area": Are you talking about this specific issue, or language handling in general? If the latter, then I don't like that idea, because I think it is worthwhile to have some sort of standardization across languages; and a weighting system would mean that someone expected to make no Nahuatl contributions should have no say in how it is handled. While I also dislike the system where a single English-only contributor can veto any foreign-language POS header, no matter how necessary, surely there must be a middle ground? —RuakhTALK 13:17, 27 September 2008 (UTC)
- (off main topic) There is no system "where a single English-only contributor can veto ...". The problem is with editors who insist (mostly correctly) that a POS header is needed, but simply try to go ahead and continue to use it while flatly refusing to bring the addition of the POS header to discussion and a vote to make it legal. (In which vote that single contributor can be overruled by general consensus.) Robert Ullmann 14:17, 27 September 2008 (UTC)
- And, to be honest, I wasn't thinking of a strict mathematical weighting, but a rather softer one where a user who is expected to contribute zero gets weighted to less than someone who was expected to contribute, not zero. -Atelaes λάλει ἐμοί 17:50, 27 September 2008 (UTC)