Talk:AWK

Latest comment: 2 months ago by Trackerwannabe in topic Name of the article

old discussion


Is "Version 3 UNIX" supposed to be "3rd edition UNIX"? That would be approximately the right time frame, though I would have thought maybe 4th or 5th edition. Adding a date would help too. — Preceding unsigned comment added by Stephen Gilbert (talk • contribs) 00:56, 21 August 2001 (UTC)


As I understand it, versions corresponded to software releases, while editions corresponded to documentation (manual releases). The two did not always correspond one-to-one. --drj — Preceding unsigned comment added by Drj (talk • contribs) 15:51, 25 February 2002 (UTC)


The naming scheme for UNIX was relatively linear up to Version 7 Unix, after which major source trees, both inside and outside Bell Labs, began to split off.

This UNIX timeline has some excellent linked sources, and names and dates the various Unix versions. http://www.robotwisdom.com/linux/timeline.html

These sources date AWK to January 1979. http://minnie.tuhs.org/UnixTree/V7/

Lent 18:46, 30 March 2006 (UTC)


This article could really do with some simple examples to show the expressive power of the language. IMHO, it's easier than PERL for many simple data manipulation tasks. Any objections? - Steve Donovan — Preceding unsigned comment added by Sdonovan (talkcontribs) 06:34, 24 January 2003‎ (UTC)Reply


Is awk really feature-complete enough to be considered a general purpose programming language? I've always considered it more of a text manipulation language. Suppafly 01:47, 7 Oct 2004 (UTC)

That's what K & R thought at first. But when they saw people using it as a general purpose language, they revised awk (calling it nawk) adding more features and functions. It can be used as a general purpose language. —Pelladon 07:21, 4 August 2006 (UTC)Reply

Yes, it is. It particularly excels at text manipulation, but is also a fine tool for other things, and in any case 'text manipulation' covers a lot of territory (the majority of dynamic web content, for example). It's not unusual to find quite large awk programs. - jhd — Preceding unsigned comment added by 132.147.65.102 (talk) 02:01, 7 October 2004‎ (UTC)Reply

If you are in any doubt, you should look for the first editions of the excellent little books, "Programming Pearls" and "More Programming Pearls" by Jon Bentley which really show what AWK can do in the hands of an expert. -- Derek Ross | Talk 07:18, 17 November 2005 (UTC)Reply


A simple example (about 200 lines) of a fully working tool that also demonstrates that AWK is used for more than text manipulation can be seen in the TLDP web correlator http://www.nyx.net/~sgjoen/webcorr-css which takes in an Apache HTTP Server log and generates an HTML report. I read somewhere that AWK was initially made for database purposes. — Preceding unsigned comment added by 85.164.112.1 (talk) 17:17, 1 July 2005 (UTC)

[13 years later...] . o O (How is "turning a log file into an HTML report" not exactly text manipulation? I have to wonder what this IP user imagined HTML documents are made of?) -- FeRDNYC (talk) 13:07, 29 September 2018 (UTC)Reply

Name of the article


Why is this page called "AWK programming language"? I've never seen awk referred to as "AWK", as if it were an acronym (even though I suppose it is). It's always just "awk". Why not change it to "Awk programming language" and chalk up the capital letter to technical restrictions? Makaristos 05:36, 14 December 2005 (UTC)Reply

Both capitalized and lower case forms are commonly used. Brian Kernighan seems to prefer AWK. Amnonc 16:27, 14 December 2005 (UTC)
Then still, why is it called "AWK programming language" instead of just "AWK"? Also, in the article, both AWK and Awk are used. IMHO, we should stick to one capitalization and only mention the alternative somewhere. Qwertyus 17:06, 14 December 2005 (UTC)Reply
No, you're right, Amnonc. Even awk's own manpage refers to it as the AWK Programming Language. awk, all lowercase letters, is the UNIX program that runs programs in the AWK programming language. I stand humbled and corrected, and shall therefore attempt to make this distinction clear, as well as standardize the differing appellations in the article. Makaristos 02:36, 15 December 2005 (UTC)

Add a quick note to say that awk uses "extended regexps" by default while grep/ed/sed have "basic regexps" by default on most (all?) platforms? 70.82.141.92 13:13, 25 March 2006 (UTC)Reply

I added a line. If you see other things to be improved, be bold. Thanks, Tom Harrison Talk 14:52, 25 March 2006 (UTC)Reply
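A small sketch of the difference raised above, assuming a POSIX shell with awk and grep on the PATH. In an extended regexp `+` means "one or more"; in a basic regexp, which grep uses by default, `+` is an ordinary character. The sample lines and the function names are made up for illustration.

```shell
# Hypothetical sample input: two lines that differ only in a literal "+".
sample() { printf '%s\n' 'aab' 'a+b'; }

# awk patterns are extended regexps: a+b means "one or more a, then b".
ere_match() { sample | awk '/a+b/'; }

# grep defaults to basic regexps: "+" is literal, so only the line
# containing the three characters a, +, b matches.
bre_match() { sample | grep 'a+b'; }

ere_match   # prints: aab
bre_match   # prints: a+b
```

Running grep with -E should give the same result as the awk line, since that switches grep to extended regexps.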

2024

For those who might be interested in the reference to Brian Kernighan's preference, there are archived versions here, here, and here (and elsewhere as well).
Trackerwannabe (talk) 18:58, 20 August 2024 (UTC)Reply

Warts

I added this section as well as the shebang section. Is this too much text and should the article be split? Lent 16:59, 30 March 2006 (UTC)

Criticisms


The whole article seems to be quite a mess, but the "criticisms" section seems to be especially bad. Most items there are either completely POV, factually incorrect, or affect only specific versions of AWK. If no one complains I think I will remove the whole section. I think there are a few bits there that are valid, but they are rather minor, would need sourcing, and can be more easily added afterwards. --Lost Goblin 14:15, 8 June 2006 (UTC)

Agree, I was going to suggest the same. Qwertyus 10:53, 8 June 2006 (UTC)Reply
I wrote most of it and I am also not quite happy with how it fits in. There is one reason why it might still be justified to keep it: Too many misconceptions about what AWK really is and what it isn't are circulating. Look at the list of topics in this "Editing" section. Someone had doubts that AWK was a "real programming" language. Someone else seriously took Kernighan's AWK web page at Bell Labs for a simple "book advert". As long as such nonsense gets written down here, there must be some place where this nonsense is corrected. Jürgen Kahrs 22:00, 17 June 2006 (UTC)
I'm not sure how the current criticisms section helps there; I think it adds more to the confusion. As no one has objected so far, I'm removing it; if someone likes, they can try to come up with something clearer and more consistent. --Lost Goblin 01:23, 10 July 2006 (UTC)

The first link takes me to a book advert. There must be good tutorials to link to. --82.15.46.131 16:07, 10 June 2006 (UTC)Reply

It is not "a book advert", it is the homepage of the AWK bible, equivalent to what K&R is for C. Actually that book probably deserves its own wikipedia article, seems to be the only book by BWK not to have its own wikipedia article and it is as significant as the others. --Lost Goblin 22:52, 10 June 2006 (UTC)Reply

The "official" AWK logo can be found on the cover of the book "The AWK Programming Language". You can find the cover on Kernighan's "book advert" page. Is this good enough? Jürgen Kahrs 22:08, 17 June 2006

Requested move


AWK programming language → AWK (programming language) – Conformance with WP naming conventions atanamir

The following discussion is an archived debate of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the debate was move as outlined. -- tariqabjotu 02:43, 7 September 2006 (UTC)Reply

Note: This poll has been transcluded onto the talk pages of a number of individual programming languages, but is in fact a subpage of Wikipedia talk:WikiProject Programming languages. When you comment, please note that this survey is for multiple programming languages, not just the one you saw it on.

Some editors have proposed a general rename of articles named with the pattern "FOO programming language" to the pattern "FOO (programming language)". Please note that this poll is applicable only to those programming languages whose names alone would introduce ambiguity. For example, programming languages such as Java and C, whose names alone are ambiguous, would be at Java (programming language) and C (programming language), respectively. Unique names such as Fortran and COBOL should remain at their respective simple names.

For instructions on how to add a poll participation request to additional applicable article talk pages, please see: Wikipedia talk:WikiProject Programming languages#Poll procedure

Please add "* Support" or "* Oppose" followed by an optional brief explanation, then sign your opinion with ~~~~

Voting

  • Support (changed from Abstain) - I initially abstained because I just wanted to get a procedure rolling. Looking at the first few comments, I support the rename. As with other editors, I only want this where ambiguity exists in the name: e.g. for "Python" but not for "Perl". Also, something like "Python programming language" would still redirect to "Python (programming language)" under the proposal, so existing links would not break. LotLE×talk 22:32, 1 September 2006 (UTC)
  • Support - However, I would object to specifying "programming language" anywhere in the title, as parenthetic remark or not, if the name of the language itself does not have any ambiguity issues. For example C programming language should change to C (programming language) (since C is already taken), but Fortran should stay at Fortran. --Serge 23:24, 1 September 2006 (UTC)Reply
  • Support - originator of the request; it would also meet the common names policy and also meet the disambiguation guideline. atanamir 23:32, 1 September 2006 (UTC)Reply
  • Oppose. The convention has been "<name of language> programming language" for quite a while and I don't think it helps by changing it now. There are already redirects in place for "<name> (programming language)" and it would only add more work to move them all there. Also, it goes against conventions in other media. In books related to programming on the copyright page where it sometimes has sorting information for the book many books say "Computers & Internet - <name> programming language I. Title" or something similar. - DNewhall 23:32, 1 September 2006 (UTC)Reply
  • Oppose. To quote Wikipedia:Disambiguation, "When there is another word (such as Cheque instead of Check) or more complete name that is equally clear (such as Titan rocket), that should be used.". It is undeniable that the "C programming language" is a widely-understood name, not just a description. There's a reason K&R's book is called The C Programming Language rather than C, a Programming Language. Diverse examples from other areas include French language, Titan rocket, sticking plaster, bread roll, contract bridge. What makes programming languages different from these topics? Deco 23:44, 1 September 2006 (UTC)Reply
    • If those articles were named like the programming languages are currently, they would have been something like sticking plaster dressing, bread roll food, and contract bridge card game. Titan rocket, in fact, is a redirect to Titan (rocket family). The natural languages are a slightly odd exception to the normal convention, but i'm not a linguist, and not about to argue with them. (I do know, however, that many non-English Wikipedias use the normal (parenthesized) disambiguation convention for natural languages.) --Piet Delport 13:40, 2 September 2006 (UTC)Reply
      • Apologies for the bad example - Titan rocket was moved since it turned out to be a rocket family, but others such as Angara rocket were not. The controlling question here is whether "C programming language" is a "more complete name" for C. I argue that it is, and so standing guidelines strongly support the current name. Deco 10:12, 3 September 2006 (UTC)Reply
        • I would argue that it isn't. You can say "I play contract bridge" and "I use C", but not "I use C programming language". You can expand the names into noun phrases, as in "I play the contract bridge card game" and "I use the C programming language", but in both cases "the * card game" and "the * programming language" are no longer part of the name itself. --Piet Delport 06:04, 4 September 2006 (UTC)
          • The presence or absence of a leading article is not a reliable indicator of whether it's a name or not, as indicated by French language, unless you wish to expand this proposal to move X language -> X (language) as well. Deco 06:28, 4 September 2006 (UTC)Reply
            • Definitely not something i'm interested in pursuing; let the linguists and editors involved with natural languages worry about their own naming convention. --Piet Delport 12:09, 4 September 2006 (UTC)Reply
              • (I know I am commenting on a now old post, but...) My take on "French language" is that it's different from "C programming language" since French is the language of the French. However, "C" is not a language named after a culture, country, or people (or anything). "C" only refers to C; "French" refers to a whole lot more than a language. Also, "French" is descriptive, but "C" is not. There's no need to clarify "C" or let it modify a noun. But being that a one letter name for something is inherently ambiguous, as well as names such as "Java" or "Python" (as already mentioned), there needs to be the parenthetical, "(programming language)".
  • Support - due to its name being "Ruby". --Yath 01:31, 2 September 2006 (UTC)Reply
  • Support - this is the standard way that most Wikipedia articles are named. Use the common name and disambiguate appropriately using parentheses when necessary. --Polaron | Talk 01:43, 2 September 2006 (UTC)Reply
  • Oppose - For the same reasons as DNewhall. Chris Burrows 02:11, 2 September 2006 (UTC)Reply
  • Oppose — Per Deco, I don't see how adding parentheses to an article title which is already clear is an improvement. --Craig Stuntz 02:47, 2 September 2006 (UTC)Reply
  • Support -- Cryptography has had much the same problem for some time. It has adopted the "<topic> (cryptography)" approach which has worked well. Not elegant perhaps, but ... ww 05:20, 2 September 2006 (UTC)
  • Oppose — Either way, there should be a second link so that both "C (programming language)" and "C programming language" produce the C article. My main reason for opposing is that it isn't really consistent with the new "C programming language, criticism" page that was spun off the main C article; what would that name turn into? By the way, the official standard name is "programming language C", but to me that sounds too much like "PL/C" which would be wrong. Deco's remark is quite right. — DAGwyn 07:56, 2 September 2006 (UTC)
  • Comment. This proposal is different from the original proposal, found here, which is now understood as having unanimous consensus in favour. Please do not interfere with the original proposition by misrepresenting it and opening a straw poll here, which can only serve to undermine the usefulness of the original proposal. It would have been much better to simply post a link. - Samsara (talkcontribs) 09:40, 2 September 2006 (UTC)Reply
The original proposal seems pretty wacko to me, and I don't see any evidence of a consensus. As I understand it, this current section is not a "straw poll", but a genuine attempt to determine whether or not to move the C article to a new name, independently of whether that wacko proposal is accepted. — DAGwyn 09:53, 2 September 2006 (UTC)Reply
In what way is "C programming language" misleading? I can't think of a more natural title for such an article. — DAGwyn 05:48, 4 September 2006 (UTC)Reply

Discussion


Response to DNewhall's comment


In order to reduce clutter in the voting section, I've decided to respond to DNewhall's vote here. If you're afraid of the amount of work it would take to move the articles, I can move most of them and I'm sure there are other editors willing to take up the task. Also, most books about programming languages simply have the title or common name of the programming language as the title of the book -- the Wrox series uses "Professional PHP" or "professional Java", not "professional PHP programming language" or "professional Java programming language". Many of the books I have also have the sorting information as "Computers -- Programming languages -- X," where X is the programming language. atanamir 23:36, 1 September 2006 (UTC)

The main issue is not that I'm afraid of the work but that it'll be a lot of work with next to no perceived benefit. Both "Euphoria programming language" and "Euphoria (programming language)" go to the same page and I (and others apparently) fail to see how that is an improvement over the current convention. The text is exactly the same, you're just adding parentheses. No one is going to get confused about the lack of parentheses (also remember that the names with parentheses already have redirects in place). Is "<name> (programming language)" a more correct title for the article? Arguably. Is it worth the effort of moving all the pages over from their perfectly understandable title to a title that already has a redirect in place for it? No. - DNewhall 16:10, 2 September 2006 (UTC)Reply
I think you misunderstand the point of stylistic consistency on Wikipedia. Any one article in isolation would be fine under either convention; in fact, if the project was only the one article on, e.g. "C programming language" there would be no contrast with all the other uses of parens for disambiguation. But if WP (or some subset) was prepared for print or other syndication, having relatively consistent stylistic choices helps a lot (article naming is, of course, just one small issue among many others). The work involved in a rename would, obviously, be a tiny fraction of the work involved in discussing the question, so that is "vanishingly insignificant". LotLE×talk 16:42, 2 September 2006 (UTC)
When it comes to C, we need two clear and distinct names: one for the article on the programming language and one for the book. C (programming language) and The C Programming Language (book) are those two names. They are unambiguous and (or is that because?) they conform with the Wikipedia standard. Anything else should be a redirect to one or a disambig page to both. 'C programming language' should redirect to the language and 'C Programming Language' to the book or a disambig page. The existence of a book called 'The C Programming Language' is actually an argument in Support. Aaron McDaid (talk - contribs) 12:49, 4 September 2006 (UTC)
... Appending to own comment ... It's never referred to directly as 'C programming language'. It's always 'C' or 'the C programming language'. Note the 'the'. The latter is of the form 'the X Y' where X is the name and Y is the type of object. 'the X Y' (or even 'X Y') is not a new name for the object, simply a way to refer to X where there may be some ambiguity. Aaron McDaid (talk - contribs) 13:07, 4 September 2006 (UTC)

Response to Deco's comment


Imagine if you have a set of objects which all fall under the same category -- let's say they're all different types of Widgets. The types are Alboo, Kabloo, Hello, Wawoob, Baboon, Choogoo, Chimpanzee, etc. Because some will cause ambiguity -- Hello, Baboon, and Chimpanzee -- they need to be disambiguated. However, since the common name (in this case, the real name) is "Hello," "Baboon," and "Chimpanzee," Wikipedia has an established precedent of using parentheses. Thus, the unique widgets, Alboo, Kabloo, Wawoob, Choogoo, can have articles simply at the name itself; but the ambiguous names should have articles at Hello (widget), Baboon (widget), and Chimpanzee (widget). Thus, the article titles will be uniform in that they are all "at" the name itself, but with a disambiguator on several of them. This is easier than making all of the articles at Alboo widget, Kabloo widget, Hello widget, etc. Also, it allows for the pipe trick, so links can easily be made with [[Hello (widget)|]] --> Hello. atanamir 23:54, 1 September 2006 (UTC)

  • Titan rocket may now be a redirect, since it turned out to be a family of rockets rather than a single rocket, but there are still many rockets named that way (e.g. Angara rocket) and it's still cited on Wikipedia:Disambiguation specifically. The minuscule convenience of the pipe trick is not a reason for anything. My point is that this is a much wider concern than programming languages alone and represents a significant departure from the disambiguation guidelines. It would be radical to make such changes in a single area without raising them to the wider community, when your argument seems to apply to everything. The point of contract bridge and bread roll is that the more common names for these topics are "bridge" and "roll". Deco 07:48, 2 September 2006 (UTC)

Simpler disambiguation


Even if we add the parentheses, the guideline at Wikipedia:Disambiguation#Specific topic makes sense to me:

If there is a choice between disambiguating with a generic class or with a context, choose whichever is simpler. Use the same disambiguating phrase for other topics within the same context.

For example, "(mythology)" rather than "(mythological figure)".

In this case, we could have the simpler and more widely applicable "(computing)" instead of the long "(programming language)". --TuukkaH 10:04, 2 September 2006 (UTC)Reply

I agree with the sentiment, but i think "(computing)" is too wide, with way too much opportunity for clashes:
"(programming language)" might lean towards the long side, but i don't think any alternative class comes close to being as simultaneously large, well-defined and well-populated. --Piet Delport 15:14, 2 September 2006 (UTC)Reply
I agree that if we were to use parentheses, "(computing)" is not specific enough. Your examples are excellent, particularly "Icon", which clashes with an already-existing article! Deco 10:40, 3 September 2006 (UTC)Reply
Perhaps you're right in that it's not specific enough. On the other hand, the disambiguation can never be perfect as there are several programming languages that share a name: NPL has three programming languages, The Language List has four programming languages called G. What about "(language)" then? --TuukkaH 22:02, 3 September 2006 (UTC)Reply
"Language" connotes something rather different from "programming language". "Lisp (language)" for example. "Programming language" is the accepted category in the industry, abbreviated to "PL" quite often in discussions (whereas "L" is never used for this). — DAGwyn 05:59, 4 September 2006 (UTC)Reply
What about just "(programming)"? Or is that too ambiguous as well? atanamir 02:39, 5 September 2006 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

To meet the new standard, the pages should be moved to something like Criticism of C (programming language), right? Examples are Georgia (U.S. state) and Politics of Georgia (U.S. state). atanamir 02:42, 5 September 2006 (UTC)

Depends on the page in question, most likely; some would work like above, some (like C syntax) wouldn't require any changes, and some might want to use a different method to disambiguate. --Piet Delport 05:55, 5 September 2006 (UTC)Reply
Agreed with Piet; only the ones that would incite ambiguity -- simply "Criticism of C" would have ambiguity, but "C syntax" or "Syntax of C" are both rather unambiguous and would not need change. atanamir 06:01, 5 September 2006 (UTC)Reply
Surely, criticism of C is pretty unique and should be the article? Are there any other C's that would be criticized? Aaron McDaid (talk - contribs) 21:41, 5 September 2006 (UTC)Reply
I agree that the most likely "C" to be criticised is the programming language, but some may be looking for a criticism of the letter or magazine. Unlikely, but possible. This decision would be left up to the community, though. atanamir 01:57, 6 September 2006 (UTC)Reply
As of now, there is only one C that is criticized on Wikipedia, and I am not aware of anyone wanting to write an article criticizing any other Cs. Therefore, criticism of C is unique. The Wikipedia standard is to only disambiguate when necessary. That article should be moved to criticism of C at some point, but we should let this debate finish first. Aaron McDaid (talk - contribs) 09:16, 6 September 2006 (UTC)Reply
For the record, "Criticism of C" didn't even exist until I created the redirect yesterday. Was kind of surprised because it was at that weird, longish name and is a pretty good article :). RN 10:19, 6 September 2006 (UTC)
The C criticism article was split off from the main C article, where it had previously been embedded, in response to a requirement in order for the main C article to be designated a "Good Article". I picked the name with the idea that it was a sub-article of the main one. Once the discussion has settled, I don't object to some reasonable renaming, so long as the links between the two articles are fixed up so they still point to each other. — DAGwyn 21:51, 6 September 2006 (UTC)Reply
Aaargh! Whoever just renamed the main C article ignored this linking issue. I have edited the C criticism article so its link to the C article does not have to redirect. — DAGwyn 20:20, 7 September 2006 (UTC)Reply
The term "criticism" should not be used (I've stated reasons for this on Talk:C (programming language)); the more accurate term of "analysis" or something similar should be used. Dysprosia 03:54, 7 September 2006 (UTC)
You also received feedback to the effect that criticism doesn't have to be negative, that the article is fairly balanced, and that a list of limitations has to seem somewhat negative no matter how well-intentioned it may be. The C criticisms article is not at all a complete analysis of the language, just a description of the many characteristics of C that have drawn reasonable criticism. Since C is so popular and wide-spread, it is a target for a lot of sniping and second-guessing, and it is undeniable that that has happened, which is part of what the C criticism article specifically addresses. One of the useful functions of the C criticism page is to bring some balance to that criticism. — DAGwyn 20:20, 7 September 2006 (UTC)Reply
I also responded to that comment by saying (and I'll repeat the comment here for the benefit of readers of this page) that the term "criticism" still has primarily a negative connotation and that because of this it is an undesirable term. The article in question has the potential to contain discussion on design points on the language and opinions on those who comment on these design points. That is an analysis of the design of the language, and has the potential to encompass views from all points on the spectrum on the matter. Dysprosia 07:43, 8 September 2006 (UTC)Reply
I just want to chip in that I agree with DAGwyn that "criticism" does not carry any primarily negative connotations in this context. As the criticism article says:
"In literary and academic contexts, the term most frequently refers to literary criticism, art criticism, or other such fields, and to scholars' attempts to understand the aesthetic object in depth."
There are certain fields ("In politics, for instance [...]") where "criticism" connotes mainly negative criticism, but it should be reasonably clear that encyclopedias won't limit themselves to that. --Piet Delport 23:32, 10 September 2006 (UTC)Reply
Technically, it shouldn't carry any, as you suggest, but most seem to think it is a dumping ground for it. I would recommend "Analysis" as that's what I'm doing for the criticism page I watch. RN 23:43, 10 September 2006 (UTC)
"Analysis" usually implies something more formal, complete and reductionistic, though. Is that what the article is aiming for? --Piet Delport 00:00, 11 September 2006 (UTC)Reply
It doesn't need to imply that. The article in question however should aim to examine as many viewpoints on as many language points as possible. Dysprosia 02:33, 11 September 2006 (UTC)Reply
Unfortunately, the C (programming language) article itself does force the negative connotation on the reader by saying "Despite its popularity, C has been widely criticized. Such criticisms fall into two broad classes: desirable operations that are too hard to achieve using unadorned C, and undesirable operations that are too easy to accidentally achieve while using C. Putting this another way, the safe, effective use of C requires more programmer skill, experience, effort, and attention to detail than is required for some other programming languages." That whole paragraph implies that the article Criticism of the C programming language is negative (why else say "Despite its popularity" and then cite two negative classes?) Mickraus 17:14, 24 January 2007 (UTC)Reply
I'll just wait for someone else to paint the bikeshed — Preceding unsigned comment added by 121.211.204.77 (talk) 12:52, 6 July 2015 (UTC)Reply

domain specific lang or general purpose?


The domain specific language article lists awk as a dsl. This article is also categorized as a DSL. However, the first line states it's a general purpose language. There should be some clarification on both pages. User:Mahanga 04:32, 9 May 2007 (UTC)Reply

bug in hello world example


The hello world program is incorrect: it is missing an exit command at the end of the BEGIN block, without which the program will wait indefinitely for an EOF on standard input.

No, it isn't, at least not in the variants I use. In the BEGIN block AWK doesn't read any input, and the main block contains no code, so it exits immediately. --Marbl3s (talk) 10:23, 6 June 2008 (UTC)

Note added 7/4/16: Depends on which version. In "old AWK", the comment is correct. Programs consisting of only a "BEGIN" block would still try to read input. This was fixed in "new AWK" and all subsequent versions. — Preceding unsigned comment added by 66.190.12.101 (talk) 20:11, 4 July 2016 (UTC)Reply
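The two positions above can be put side by side in a short sketch, assuming a POSIX-conforming awk on the PATH (POSIX specifies that a program made up only of BEGIN actions exits without reading its input). The function names are made up, and standard input is redirected from /dev/null so that nothing blocks even on an awk that behaves like the "old AWK" described above.

```shell
# BEGIN-only program: a POSIX-style awk runs the BEGIN action and stops
# without reading any input.
hello() { awk 'BEGIN { print "Hello, world!" }'; }

# Defensive variant with an explicit exit, for an awk that would
# otherwise go on to read standard input after the BEGIN block.
hello_exit() { awk 'BEGIN { print "Hello, world!"; exit }'; }

# /dev/null as stdin guarantees an immediate EOF either way.
hello </dev/null
hello_exit </dev/null
```

The explicit exit costs nothing on newer implementations, so it is a reasonable choice when a script has to run on unknown awk versions.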

awka is virtually inexistent


The link to sourceforge is valid, but there is no data/source/etc. to download on the page. —Preceding unsigned comment added by 217.88.202.92 (talk) 13:28, 25 December 2009 (UTC)Reply

who wrote this?


awk programs are NOT pattern-action statements, and that's obvious to anyone who is actually knowledgeable about awk

consider this implementation of uniq(1) in awk:

$ printf %s\\n a c d c b d | awk '!o[$0]++'
a
c
d
b

there's no PATTERN because the conditional has to do with the value of an associative array, not the result of a regular expression, which is what the article refers to as "patterns"

there's no ACTION because print is the default action —Preceding unsigned comment added by 190.36.145.91 (talk) 12:06, 26 February 2011 (UTC)

'!o[$0]++' is the pattern. As you said, 'print $0' is the (default) action. The pattern need not necessarily be a regular expression: "Patterns are arbitrary Boolean combinations (with ! || &&) of regular expressions and relational expressions." [1] 70.225.163.47 (talk) 05:14, 1 May 2011 (UTC)Reply
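To make the reply above concrete, here is a sketch that writes the one-liner both ways, assuming any POSIX awk. The function names and the array name `seen` are made up for illustration; the short form is the program quoted in the thread.

```shell
# Short form from the thread: the program is a single pattern with no
# action, so awk applies the default action (print the whole record).
dedupe_short() { awk '!o[$0]++'; }

# The same program with the pattern and the default action spelled out.
# seen[$0]++ evaluates to the count before incrementing, so the pattern
# is true only the first time a given line appears.
dedupe_long() { awk '!seen[$0]++ { print $0 }'; }

printf '%s\n' a c d c b d | dedupe_short
printf '%s\n' a c d c b d | dedupe_long
```

Both calls print a, c, d, b on separate lines, matching the output shown earlier in this section. Unlike uniq(1), this removes duplicates that are not adjacent.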

Opening paragraph should clearly state that it is a programming language


The current opening is:

"The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions."'

That opening paragraph fails to clearly state that AWK is an interpreted programming language. You can further categorise afterwards but I'd like to differentiate it from a photocopier fairly quickly :-)
--Paddy (talk) 21:57, 22 May 2011 (UTC)Reply

grepinawk code examples

I think the code examples should use "$@" instead of $*, at least on my machine it allows whitespace in the input files while the other does not. Also, shouldn't the pattern variable be exported in order for it to work? At least that's the case for me, but maybe it works this way for others? — Preceding unsigned comment added by 92.76.123.142 (talk) 23:40, 30 May 2011 (UTC)Reply
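
Both points can be illustrated with a rough sketch. It does not reproduce the article's grepinawk code; the function names and the variable `GREP_PATTERN` are hypothetical, and the unquoted `$*` case assumes a POSIX-style shell with the default IFS.

```shell
# Count the operands awk actually receives (ARGC includes "awk" itself).
count_args_quoted()   { awk 'BEGIN { print ARGC - 1 }' "$@"; }
count_args_unquoted() { awk 'BEGIN { print ARGC - 1 }' $*; }

# Report whether awk can see GREP_PATTERN through ENVIRON.
env_sees_pattern() {
  awk 'BEGIN { print (("GREP_PATTERN" in ENVIRON) ? "yes" : "no") }'
}

count_args_quoted   "my file.txt"   # 1: the name stays one operand
count_args_unquoted "my file.txt"   # 2: split at the space

unset GREP_PATTERN
GREP_PATTERN='foo'                  # plain shell variable, not exported
env_sees_pattern                    # no
export GREP_PATTERN
env_sees_pattern                    # yes
```

If the script reads the pattern through ENVIRON, the variable does have to be exported. Passing it with `awk -v` or interpolating it into the program text are alternatives that need no export.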

Logos, etc

Some books (O'Reilly for instance) use a drawing of a bird to hint at the contents, the authors of awk and the standards committees do not appear to associate any particular representation, nor is there a suitable trademark to refer to. So AWK's relationship to the bird has the status of a pun. TEDickey (talk) 16:28, 31 August 2012 (UTC)Reply

It's more subtle than that. AWK was created in an age when software projects didn't have logos like they do today. However, the AWK Programming Language book that was published by the authors of the language features the bird on the cover, as does Arnold Robbins' Effective AWK Programming. These are the two standard references. The auk bird has a status much like the Perl Camel, and no one will argue that that animal is a pun or "not official", or something. 128.226.130.73 (talk) 21:49, 3 September 2012 (UTC)Reply

Actually, the publisher is Addison Wesley, not "the authors of the language". The images used for those books have restrictions on their reuse because they're used as part of advertising (and this topic wouldn't meet the guidelines for incorporating that material because it is not dealing directly with the book). Introducing yet another image doesn't help the reader, since it doesn't have any relationship to the books. Not all books on awk use that image, e.g., sed & awk, and this copy of Arnold's book GAWK: Effective AWK Programming. TEDickey (talk) 22:17, 3 September 2012 (UTC)Reply

Bossypants, have you read anything I said at all? And have you recently added to any kind of article? 128.226.130.73 (talk) 22:22, 5 September 2012 (UTC)Reply

I've read your edits, which are uncivil, and in other cases are unconstructive. TEDickey (talk) 00:07, 6 September 2012 (UTC)Reply

Piping and redirection

These aren't awk functions; they're specific to the operating system. ____ Kernel.package — Preceding unsigned comment added by 71.211.235.93 (talk) 22:00, 28 June 2013 (UTC)Reply

They are awk features, as described in the standard (see http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_10) . TEDickey (talk) 22:12, 28 June 2013 (UTC)Reply
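
A minimal sketch of the distinction: in the calls below the `|` and `>` sit inside the single-quoted awk program, so the shell never interprets them and awk itself opens the pipe and the file. The function names are illustrative only.

```shell
# Output piped from inside awk to an external command ("sort").
sort_via_awk() {
  awk '{ print $0 | "sort" } END { close("sort") }'
}

# Output redirected from inside awk to a file named by the caller.
save_via_awk() {
  awk -v out="$1" '{ print $0 > out } END { close(out) }'
}

printf '%s\n' pear apple fig | sort_via_awk     # apple, fig, pear
tmp=$(mktemp)
printf '%s\n' one two | save_via_awk "$tmp"
cat "$tmp"                                      # one, two
rm -f "$tmp"
```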

Damaged sentence in 4th paragraph?

The 4th paragraph contains the following sentence, which does not seem to make sense:

"The power, terseness, and limits of early AWK programs inspired Larry Wall to write Perl just as a new, more powerful POSIX AWK and gawk (GNU AWK) were being defined."

To me it looks as if part of the sentence (after "more powerful": a more powerful what?) got lost by accident. --Rüdiger Kupper (talk) 11:27, 30 July 2013 (UTC)Reply

Think it's meant to say that Posix Awk and Gawk are more powerful than regular garden-variety Nawk... AnonMoos (talk) 23:35, 31 July 2013 (UTC)Reply

"replaced by Perl"

Awk has long been superseded for complex programs with many lines, but I think a significant number of people still find it convenient for very small programs (one-liners and such), frequently invoked directly from shell. Awk is also the only general-purpose language in the Posix standard intermediate in capability and power between "sh" and "C"...AnonMoos (talk) 20:41, 18 September 2013 (UTC)Reply

I see no WP:RS here or in the topic (Perl isn't going to replace Awk in any of my portable scripts simply because Awk is standard, while Perl is not -- and is unlikely to ever be -- and you're most likely to find support for the statement from people who don't focus on portability) TEDickey (talk) 22:09, 18 September 2013 (UTC)Reply
We can recognize that AWK's peak of popularity was probably in the late 1980s or beginning of the 1990s, and that few substantial programming projects are undertaken in AWK, while also recognizing that a significant number of people find AWK convenient for various supplemental purposes, and that it's deeply embedded within widely-adopted standards, and is not going anywhere anytime soon. So I'm not sure that simply saying that it's been "replaced by Perl" is a fair summary... AnonMoos (talk) 17:44, 21 September 2013 (UTC)Reply
I generally agree - but finding reliable sources by knowledgeable people is the hard part TEDickey (talk) 18:34, 21 September 2013 (UTC)Reply

Awk was never "superseded" because it was never used for "complex programs". It is used for programs of this scale, which are complex enough, depending on your opinion. Perl competes with Awk for "market share" but so do many others. They all have pros and cons and incredibly some people program in more than one language. Awk could not have peaked in the 80s, prior to the invention of Linux, when it was deployed on millions of installs during the 1990s and 2000s. That claim is only true for certain people born in certain years (i.e. "I remember when Awk was the new thing"), and it ignores the bulk of users who experienced Awk for the first time in the 90s and 00s (when the O'Reilly books were published). A check of Stack Exchange and Unix.com shows Awk is as alive and well as ever. A check of Awk development (for GNU) shows it has seen more new features added in the past 4 years than in the prior 15. Comparisons of popularity with Perl are individual opinions, like saying Michael Jackson is hot or not. -- GreenC 20:04, 4 October 2014 (UTC)Reply

awk/nawk aliases

It's common for related packages to have aliases for programs which are similar to those on other systems. That doesn't make them "also known as". Otherwise, we would have "bison, also known as yacc". Anywhere except Wikipedia, that sort of thing would be dismissed immediately. Here, we want a reliable source TEDickey (talk) 01:27, 27 February 2015 (UTC)Reply

Comparison of awk implementations

Currently the article claims mawk is a very fast AWK implementation, but I have a counterexample: match execution time grows exponentially on certain regular expressions as the input length increases. For some reason mawk is the default awk interpreter in most Ubuntu Linux variants, but the version of mawk that is inherited from Debian Linux is old; it does not contain fixes made by mawk's new maintainer, Thomas E. Dickey, since 2009. mawk's WWW-site — Preceding unsigned comment added by Selkänahka (talkcontribs) 21:58, 1 September 2015 (UTC)Reply

Maybe/maybe not: the drawback to bug-reports is that they are not a reliable source. A review by a knowledgeable reviewer of several implementations would be a reliable source. The problem with using bug-reports is two-fold: (a) the source is selective (chosen to illustrate a point), and (b) a large percentage of bug-reports simply are invalid, or consist largely of irrelevant information which the developer must study to get useful information. Thus it requires the knowledgeable review to make it suitable for use in sources. TEDickey (talk) 01:04, 2 September 2015 (UTC)Reply

Linux distribs do not always include gawk...

The line:

[quote] Linux distributions are mostly GNU software, and so they include gawk. [/quote]

is actually not true, although it certainly should be.

Debian-based distros (which is many/most of them) tend to ship with mawk as "awk" and require an explicit "apt-get install" to get GAWK.

Personally, I think this is a shame - because GAWK is so much better - but that is the way that it is. — Preceding unsigned comment added by 66.190.12.101 (talk) 20:01, 4 July 2016 (UTC)Reply
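
Which implementation answers to the name "awk" on a given machine can be checked with a rough sketch like the one below. The helper name `which_awk` is hypothetical, and `readlink -f` is assumed to be available (it is in GNU coreutils); no particular output is claimed here, since it depends on the system.

```shell
# Hypothetical helper: print the file that the name "awk" finally resolves to.
which_awk() {
  path=$(command -v awk) || return 1
  # readlink -f follows symlink chains (assumed available); fall back to
  # the unresolved path where that option is not supported.
  readlink -f "$path" 2>/dev/null || printf '%s\n' "$path"
}

which_awk
```

On a system where awk is managed through symlinks, the result would be expected to end in the name of the real binary, such as mawk or gawk.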

Hello fellow Wikipedians,

I have just modified 4 external links on AWK. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 12:19, 1 October 2016 (UTC)Reply

Curious forward reference to nothing

In the section on books on awk, it says: "Free download of this manual is possible through the following book references."

But no book references follow! GeneCallahan (talk) 06:19, 18 February 2017 (UTC)Reply

Yeah, I don't know what that was about. I thought the few sentences there (which were just tacked on after the {{Cite book}} call) might be a quote from the linked source, but they don't appear anywhere there. It appears they were just some notes written by whoever added the book reference. Unnecessary non-sequitur notes, so out they came. -- FeRDNYC (talk) 11:55, 29 September 2018 (UTC)Reply

Unicode

I would like to see something written about Unicode support or the lack of it. AWK predates Unicode of course, but I haven't researched this to see if anyone has been able to do anything about it. It seems to me that using UTF-8 would not be good at all, with very bad results in some cases, yet fine in others. Could anyone help? CecilWard (talk) 11:37, 17 February 2018 (UTC)Reply

Keep in mind the usual guidelines on reliable sources and original research. You may not find much that's useful which meets both of those TEDickey (talk) 13:36, 17 February 2018 (UTC)Reply
CecilWard -- I think UTF-8 will pretty much work in gawk internal processing, unless you use as an array index a sequence of characters which contains the defined SUBSEP character. AnonMoos (talk) 15:04, 3 October 2018 (UTC)Reply
P.S. Gawk apparently can pay attention to "locale" settings, but this seems to mainly affect the meaning of regexp specifications... AnonMoos (talk) 09:28, 4 October 2018 (UTC)Reply
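
The SUBSEP caveat mentioned above can be sketched as follows. The example only shows how multi-part array subscripts are built from SUBSEP, which is why a key that already contains that character can clash with them; it does not test UTF-8 handling itself. The names `subsep_demo`, `table`, `key` and `parts` are illustrative.

```shell
subsep_demo() {
  awk 'BEGIN {
    table["x", "y"] = 1                 # stored under "x" SUBSEP "y"
    key = "x" SUBSEP "y"                # hand-built key containing SUBSEP
    print ((key in table) ? "same element" : "different element")
    n = split(key, parts, SUBSEP)       # recover the two components
    print n, parts[1], parts[2]
  }'
}

subsep_demo
```

This prints "same element" followed by "2 x y", since the comma in a subscript is only shorthand for concatenation with SUBSEP.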

Match pattern from command line

Do we really need three examples of using Bash + AWK for implementing a single feature in an article devoted to AWK? This is an encyclopedia, not StackOverflow or a Linux programming tutorial. A single example without Bash would be sufficient, I think. --Amakuha (talk) 13:33, 24 February 2018 (UTC)Reply

@Amakuha: That's a plague that has been raging across Wikipedia's computing articles for some time. This isn't even one of the more egregious examples, really. (For my money, this is, in the sense that it shouldn't be an article unto itself at all, but rather should occupy a tiny section of the XRI article.) My personal take is, this article should be shorter by about 2/3, and the "Commands", "Sample Applications", and "Self-contained AWK scripts" sections would be gone entirely, because none of that has anything to do with an encyclopedia.
Even though I say that XRDS is one of the more egregious examples (because it shouldn't exist), it does represent a good model for what the AWK article should be. It contains exactly one "Example XRDS document", which is just dumped right in there, all syntax-highlighted, in its entirety. It then proceeds to discuss exactly nothing of the syntax, structure, or purpose. Because, again, encyclopedia.
In a similar vein, I feel like there's a need here for exactly one "A simple AWK program" listing, just so the reader can get an idea of what they look like. Perhaps with example input and output to illustrate the purpose, and maybe a broad discussion of what makes up the program listing (pointing out the pattern-action structure). That would also remove the need for the confusing pseudocode in § Structure of AWK programs:

An AWK program is a series of pattern action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action is a series of commands.

(Is it a pattern or a condition or an expression? Why would the article be directly contradictory on that point?)
However, I know that trimming 2/3 of the length out of an article like this will upset far too many people who feel invested in its existing content. Either because they take a blanket "more is better" view of Wikipedia as a whole, or because they feel that an article's length is somehow a reflection of the importance of its topic, and they don't want to see AWK "demoted" by the removal of unnecessary cruft. So, ¯\_(ツ)_/¯. -- FeRDNYC (talk) 12:46, 29 September 2018 (UTC)Reply
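
As a rough idea of what a single "simple AWK program" listing with input and output could look like, here is a sketch; the sample data and the names `adults_report` and `adults` are made up for illustration. It shows the three common rule shapes: pattern with action, pattern alone, and the special END pattern.

```shell
adults_report() {
  awk '
    $2 >= 18 { adults++; print $1 }      # relational expression as pattern, explicit action
    /^bob/                               # regular expression as pattern, default action (print)
    END      { print adults " adults" }  # special pattern END, runs after the last record
  '
}

printf '%s\n' 'alice 30' 'bob 17' 'carol 45' | adults_report
```

For this input the output is alice, bob 17, carol and 2 adults, one per line. The "condition" in the article's pseudocode corresponds to what the sketch calls a pattern, which may be a regular expression, any other expression, or BEGIN/END.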


FeRDNYC -- XRDS is a "batch" query specification language, and so is a rather different beast from a programming language like AWK. In the case of an active programming language, there's naturally a tendency to show how it actively does things (thus the traditional "Hello, World!" program). AWK programs are characteristically often very short ("one-liners"), and AWK has a basic programming model that's rather different from what you see in "C" or Pascal or BASIC, so the use of example programs in the article does not seem excessive to me... AnonMoos (talk) 15:15, 3 October 2018 (UTC)Reply

Trimmed entry for gawk in "Versions and implementations"

The entry for gawk in § Versions and implementations previously contained the following:

Linux distributions are mostly GNU software, and so they include gawk. FreeBSD before version 5.0 also included gawk version 3.0, but subsequent versions of FreeBSD use BWK awk to avoid the more restrictive GNU General Public License (GPL), as well as for its technical characteristics.[1][2]

References

  1. ^ FreeBSD's view of GPL Advantages and Disadvantages
  2. ^ FreeBSD 5.0 release notes with notice of BWK awk in the base distribution

It's rare to see such a textbook example of improper WP:SYNTH, but oh man this is some kind of poster child. Let me count the ways:

  1. The first sentence: Linux distributions are mostly GNU software, and so they include gawk. It's (a) uncited, (b) not accurate (see Talk:AWK#Linux_distribs_do_not_always_include_gawk... above), and (c) creating causation out of thin air. There's nothing presented to justify either half of the sentence even as simple factual statements, but even if there was it wouldn't support the implied "and so" relationship.
  2. The second sentence has two citations.
    1. At the second cite, the entirety of anything AWK-related is as follows: The system awk(1) now refers to BWK awk. That's it. Absolutely no reason for the change is given, so it certainly doesn't support the claims made in the article about what those reasons are.
    2. The first cite makes no mention of AWK or BWK awk anywhere, because it's a link to the entire document on BSD vs. GPL, from the FreeBSD team no less (an obviously biased source, on that topic).
  3. The use of those two citations here is thus an attempt to relitigate the licensing debate on the pages of the AWK article, because nothing presented indicates that licensing had anything to do with the switch from gawk to BWK awk. Again: no reason is ever given!

Is it possible that the reasons given for the FreeBSD switch are accurate? Absolutely. But there's nothing in any of the cited materials that even remotely supports the claims made, so it's pure WP:OR without some relevant citations to back it up.

I've therefore replaced the entire text above with:

Some Linux distributions include gawk as their default AWK implementation.[citation needed]

because I don't even have a source for that claim handy. -- FeRDNYC (talk) 13:52, 29 September 2018 (UTC)Reply

fwiw, the commit-comments in FreeBSD's subversion only hint that there was some problem porting gawk to sparc64. There's no bug-report cited in any of that, however. There might be some mailing-list archive mentioning the issue. In any case, license didn't appear to be a factor TEDickey (talk) 14:29, 29 September 2018 (UTC)Reply

website property

A link to source-code (no documentation) for a particular implementation is off-topic, since this topic deals with the programming language. For instance, the POSIX description of Awk goes into far more depth than the sketchy manual page on the Github site. TEDickey (talk) 19:42, 24 August 2022 (UTC)Reply

reference to Paul Rubin

Is the Wikipedia reference to Paul Rubin correct? It is incredible that an economist contributed to AWK. Vveckaln (talk) 12:00, 22 September 2022 (UTC)Reply

The Paul Rubin who had a large early role in GAWK (not official AWK) was a person involved in the GNU project in the 1980s. You can read a short bio of him here... AnonMoos (talk) 14:49, 22 September 2022 (UTC)Reply
then is the hyperlink correct? Vveckaln (talk) 14:54, 22 September 2022 (UTC)Reply
You should have been able to easily figure that out on your own. Since the Paul Rubin economist article makes no reference to studying at Berkeley, and his education was over long before 1987, I would say that it is not valid. AnonMoos (talk) 14:57, 22 September 2022 (UTC)Reply
So... I removed the Paul Rubin link. MichielN (talk) 11:45, 2 October 2022 (UTC)Reply

persistent memory gawk

gawk 5.2 (released September 2022) includes a persistent memory feature that I believe is worthy of mention. For the basics see "man gawk" in 5.2 or later, or for more detail the pm-gawk user manual, which is included in TeXinfo form in the gawk distribution and is also available in PDF format here:

http://web.eecs.umich.edu/~tpkelly/pma/pm-gawk_rev1.52_2022.08aug.16.pdf

A brief description of the feature along with the example in the "Quick Start" section of the user manual above, along with a link to the user manual, might be sufficient to enable interested readers to find additional details on their own.

The persistent memory allocator upon which pm-gawk is based is described here:

https://dl.acm.org/doi/pdf/10.1145/3534855

-- Terence Kelly 38.99.114.119 (talk) 23:19, 30 November 2022 (UTC)Reply

I added a one-sentence mention of persistent-memory gawk and included a URL to the User Manual at gnu.org.

-- Terence Kelly — Preceding unsigned comment added by 50.250.213.78 (talk) 06:49, 20 December 2022 (UTC)Reply

As presented there, that's abusing WP:EL as well as WP:UNDUE. Toning down the self-promotion would be an improvement TEDickey (talk) 09:00, 20 December 2022 (UTC)Reply