[IRC-DEV] Fw: [Coder-Com] Tokenising IRC

Victor Roman victor.roman at sionhq.com
Wed Jan 14 23:34:29 CET 2004


Forwarded by Victor Roman <victor.roman at sionhq.com>
----------------------- Original Message -----------------------
 From:    Perry Lorier <perry at coders.net>
 To:      coder-com at undernet.org
 Date:    Thu, 15 Jan 2004 10:34:55 +1300
 Subject: [Coder-Com] Tokenising IRC
----


This email goes out to client authors and would-be client authors.  You
know who you are. :)  Today I want to talk about tokenising IRC,
specifically how to parse what the IRC server sends you.

First, according to RFC 1459, a line from an IRC server may begin with a
token starting with a ":".  This token is the source of the message.  If
the server does not send this token, you can assume the message is from
the server you are connected to.  For instance:

:foo.undernet.org NOTICE nick :*** Welcome to foo
and
NOTICE nick :*** Welcome to foo

are equivalent (assuming the local server is "foo.undernet.org").
Fortunately, most IRC clients support this.

However, things like:

351 nick u2.10.11.06. Foo.Undernet.org :B30AeEFfIKlMopSU

are also valid, but most clients barf on them (which is a pity, since
omitting the source would save significant server->client bandwidth).
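
To make the optional source concrete, here is a minimal sketch of
splitting it off the front of a line.  The function name split_source
and its parameters are made up for this illustration and are not taken
from ircu; the caller is assumed to pass in the name of the server it is
connected to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only).  Splits an optional ":source"
 * token off the front of `line`, modifying `line` in place, and returns a
 * pointer to the rest of the line, starting at the command or numeric.
 * If the line has no source token, *source is set to `local_server`. */
char *split_source(char *line, const char *local_server, const char **source)
{
    assert(line != NULL && local_server != NULL && source != NULL);

    if (*line != ':') {
        *source = local_server;   /* no source sent: it is our server */
        return line;
    }

    *source = ++line;             /* skip the ':' */
    line += strcspn(line, " ");   /* advance to the end of the source token */
    if (*line != '\0')
        *line++ = '\0';           /* terminate the source token */
    line += strspn(line, " ");    /* skip any run of spaces */
    return line;
}
```

With this, both NOTICE lines above give the same source and the same
remainder, and the bare 351 line is handled like any other.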

The really big issue, though, is the ":" on the last token.

If you have the string

999 nick foo blah :blargh narf

it should be tokenised as

source=foo.undernet.org
command=999
destination=nick
arguments = ["foo","blah","blargh narf"]

NOT:
arguments = ["foo","blah","blargh","narf"]
or even:
arguments = ["foo","blah"]
meat = "blargh narf"

or any other bizarre variations on this that client authors love to
think up.

In particular:

:foo!foo@foo.undernet.org PRIVMSG #narf Hi!

is valid: since you are only sending one word, the ":" is not necessary.
If you are sending multiple words, then the ":" is necessary.
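
As a rough sketch of that rule (not ircu's actual code), the function
below splits everything after the command into parameters.  The name
split_params and the MAX_PARAMS constant are invented for this example;
parv[0] ends up holding the destination.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: the destination plus up to 15 arguments. */
#define MAX_PARAMS 16

/* Hypothetical helper (illustration only).  Splits `s`, the text after
 * the command, into parameters, modifying `s` in place.  A parameter
 * starting with ':' takes the rest of the line, spaces included.
 * Returns the number of parameters stored in parv[]. */
int split_params(char *s, char *parv[], int max)
{
    int parc = 0;

    assert(s != NULL && parv != NULL);

    while (parc < max) {
        s += strspn(s, " ");      /* skip separating spaces */
        if (*s == '\0')
            break;                /* nothing left */
        if (*s == ':') {
            parv[parc++] = s + 1; /* rest of the line is one parameter */
            break;
        }
        parv[parc++] = s;         /* ordinary single-word parameter */
        s += strcspn(s, " ");     /* advance to the end of the word */
        if (*s != '\0')
            *s++ = '\0';          /* terminate it */
    }
    return parc;
}
```

Fed "#narf Hi!" and "#narf :Hi!", this yields the same two parameters
either way, and "nick foo blah :blargh narf" yields four, the last being
"blargh narf".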

With the 005 numeric, it's important to ignore the "last" parameter (the
text after the ":"), as it is not a valid token.  If you don't keep the
words after a ":" together, then you can't tell which words belong to it.
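
A small sketch of what that means for a 005 handler, assuming the line
has already been tokenised so that the text after the ":" is a single
final argument.  The function name handle_005 is hypothetical, and the
NAME or NAME=VALUE token form is the usual 005 layout rather than
anything stated in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler (illustration only).  parv[] holds the arguments
 * after the destination; the final entry is the human-readable text and
 * is skipped.  Returns the number of feature tokens processed. */
int handle_005(int parc, char *parv[])
{
    int i, seen = 0;

    assert(parc >= 0 && parv != NULL);

    for (i = 0; i < parc - 1; i++) {  /* parc - 1: ignore the last one */
        const char *eq = strchr(parv[i], '=');
        if (eq != NULL) {
            /* NAME=VALUE: print the name up to '=' and the value after */
            printf("%.*s = %s\n", (int)(eq - parv[i]), parv[i], eq + 1);
        } else {
            printf("%s (no value)\n", parv[i]);
        }
        seen++;
    }
    return seen;
}
```

For example, with the made-up argument list {"WHOX", "MAXCHANNELS=10",
"are supported by this server"}, only the first two entries are treated
as tokens.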

With other numerics/commands, ircu may decide (on a whim) to send a ":"
or not based on some arbitrary criteria, so please please please don't
rely on it!  Do your parsing correctly, then we can change ircu to be far
more sane with its placement of ":"s.  Currently, every time we've changed
":"s it's caused important clients to core when receiving those commands.

There can be up to 15 arguments after the destination in a command.

I've tried to attach some (ugly) C code that parses lines (more or less)
the same way as ircu does, and I highly recommend you check it out.  It's
not reliable enough to use in an actual program, but it should be a good
example of how to parse IRC lines.  However, attachments don't make it
through the lists, so it is included inline below.

Use like so:
perry@storm:~$ ./irc_parser "330 Target foo :narf bar"
Source: foo.undernet.org
Command/Numeric: 330
Target: Target
Arg #0: foo
Arg #1: narf bar

perry@storm:~$ ./irc_parser ":bar.undernet.org SPIKE Target foo gack naffle fish"
Source: bar.undernet.org
Command/Numeric: SPIKE
Target: Target
Arg #0: foo
Arg #1: gack
Arg #2: naffle
Arg #3: fish

---- irc_parser.c

#include <stdio.h>

char *server_name = "foo.undernet.org";

int do_command(char *source, char *command, char *target, int parc,
               char **parv)
{
          int i;
          printf("Source: %s\n",source);
          printf("Command/Numeric: %s\n",command);
          printf("Target: %s\n",target);
          for(i=0;i<parc;i++) {
                  printf("Arg #%i: %s\n",i,parv[i]);
          }
          return 0;
}

int parse(char *line)
{
          char *source;
          char *command;
          char *target;
          char *arg[15];
          int args=0;
          /* Parse the source */
          if (*line==':') {
                  line++;
                  source=line;
                  while (*line!=' ' && *line)
                          line++;
                  if (!*line) {
                          printf("Error: Expected command\n");
                          return 1;
                  }
                  *line='\0';
                  line++;
          }
          else {
                  source = server_name;
          }
          /* Skip any spaces */
          while(*line==' ') line++;
          /* Parse the command */
          command=line;
          while(*line!=' ' && *line) line++;
          if (!*line) {
                  printf("Error: Expected Target\n");
                  return 1;
          }
          *line='\0';
          line++;
          /* Skip any spaces */
          while(*line==' ') line++;
          /* Parse the target */
          target=line;
          while(*line!=' ' && *line) line++;
          /* Parse the arguments.  Each pass terminates the previous
           * token, skips spaces, then takes the next token. */
          while(*line && args<15) {
                  *line='\0';
                  line++;
                  while(*line==' ') line++;
                  /* Trailing spaces only: no more arguments */
                  if (!*line) break;
                  if (*line == ':') {
                          /* ":" means the rest of the line is one
                           * argument */
                          line++;
                          arg[args++]=line;
                          break;
                  }
                  arg[args++]=line;
                  while(*line && *line!=' ') line++;
          }
          return do_command(source,command,target,args,arg);
}

int main(int argc,char **argv)
{
          if (argc<2) {
                  fprintf(stderr,"usage: %s ircline\n",argv[0]);
                  return 1;
          }
          return parse(argv[1]);
}


--------------------- Original Message Ends --------------------

----------------------------------
Victor Roman <victor.roman at sionhq.com>
Sion LTD - http://www.sionhq.com



