Softpanorama
May the source be with you, but remember the KISS principle ;-)


Introduction to Perl 5.10 for Unix System Administrators

(Perl 5.10 without excessive complexity)

by Dr Nikolai Bezroukov


5.5. Split() Function and option g in matching

The split function is one of the few Perl functions that take a regular expression as an argument. Its purpose is to take a string and convert it to a list, breaking the string at every point where the first argument (the delimiter, specified as a regular expression) matches.

The usual syntax for the split function is

list = split (pattern, string_value);

Here, string_value is the string to be split and pattern is the regular expression to search for. It is important to understand that a new element starts every time pattern matches; the matched text serves only as a separator between elements and is not included in any of them. The resulting list of elements is returned in list.

For example, the following statement breaks the character string stored in $line into elements delimited by ":", and stores them in the array @tokens:

@tokens = split (/:/, $line);

You can specify the maximum number of elements of the list produced by split by specifying the maximum as the third argument. For example:

$line = "This:is:a:string";

@tokens = split (/:/, $line, 3);

As before, this breaks the string stored in $line into elements. After the first two elements have been created, no new elements are started: the rest of the string is assigned to the third element. In this case, the list assigned to @tokens is ("This", "is", "a:string").

You can also assign to several scalar variables at once:

$line = "11 12 13 14 15";
($var1, $var2, $line) = split (/\s+/, $line, 3);

This splits $line into the list ("11", "12", "13 14 15"). $var1 is assigned 11, $var2 is assigned 12, and $line is assigned "13 14 15". This enables you to assign the "leftovers" to a single variable, which can then be split again at a later time.
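The "leftovers" idiom described above can be sketched as follows. This is only an illustration: the variable names $first, $second, $rest and @remaining are made up for the example and do not come from the text.

```perl
use strict;
use warnings;

my $line = "11 12 13 14 15";

# LIMIT of 3: two fields are split off, the third receives the rest unsplit.
my ($first, $second, $rest) = split /\s+/, $line, 3;

# The leftover string can be split again later, this time without a limit.
my @remaining = split /\s+/, $rest;

print "first=$first second=$second rest=[$rest]\n";
print "remaining fields: ", scalar(@remaining), "\n";
```

Because the third field is never scanned for delimiters in the first call, the work of splitting it is postponed until it is actually needed.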

One or more targets can be undef if you do not need a particular value:

($var1, undef, $line) = split (/\s+/, $line, 3);

The split function has four major forms:

split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split

Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.)

In scalar context, returns the number of fields found. In scalar and void context it splits into the @_  array. Use of split in scalar and void context is deprecated, however, because it clobbers your subroutine arguments.

If EXPR is omitted, splits the $_  string.

If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields.

If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop  would do well to remember).

If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.
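The rules for empty fields and LIMIT are easy to misremember, so here is a small sketch. The sample strings and the array names (@default, @keep_all, @leading, @empty) are chosen only for this illustration.

```perl
use strict;
use warnings;

my $record = "a:b:c:::";

# No LIMIT: empty trailing fields are deleted.
my @default = split /:/, $record;

# Negative LIMIT: behaves like an arbitrarily large limit, so the
# empty trailing fields are kept.
my @keep_all = split /:/, $record, -1;

# Empty leading fields are preserved even without a LIMIT.
my @leading = split /:/, ":a";

# Splitting the empty string returns the empty list, whatever the LIMIT.
my @empty = split /:/, "", -1;

printf "default=%d keep_all=%d leading=%d empty=%d\n",
    scalar(@default), scalar(@keep_all), scalar(@leading), scalar(@empty);
```

If code depends on a fixed number of columns (for example, taking the last field with pop), passing -1 as LIMIT is usually the safer choice.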

A pattern matching the null string (not to be confused with a null pattern //  , which is just one member of the set of patterns matching a null string) will split the value of EXPR into separate characters at each point it matches that way. For example:

print join(':', split(/ */, 'hi there')), "\n";

produces the output 'h:i:t:h:e:r:e'.

As a special case for split, the empty pattern // matches only the null string, and is not to be confused with the regular use of // to mean "the last successful pattern match". So, for split, the following:

print join(':', split(//, 'hi there')), "\n";

produces the output 'h:i: :t:h:e:r:e'.

Empty leading fields are produced when there are positive-width matches at the beginning of the string; a zero-width match at the beginning of the string does not produce an empty field. For example:

print join(':', split(/(?=\w)/, 'hi there!')); 

produces the output 'h:i :t:h:e:r:e!'. Empty trailing fields, on the other hand, are produced when there is a match at the end of the string (and when LIMIT is given and is not 0), regardless of the length of the match. For example:

print join(':', split(//, 'hi there!', -1)), "\n"; 
print join(':', split(/\W/, 'hi there!', -1)), "\n"; 

produce the output 'h:i: :t:h:e:r:e:!:' and 'hi:there:', respectively, both with an empty trailing field.

The LIMIT parameter can be used to split a line partially:

($login, $passwd, $remainder) = split(/:/, $_, 3); 

When assigning to a list, if LIMIT is omitted, or zero, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need.
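The effect of the implicit LIMIT can be seen by comparing it with an explicit one. This is a sketch with made-up data; the names $record, $login, $passwd, $name and $rest are assumptions for the example.

```perl
use strict;
use warnings;

my $record = "alice:x:1000:1000";

# Two variables, no LIMIT: Perl supplies a LIMIT of 3, so the string is
# split into "alice", "x" and an unsplit remainder that is discarded.
my ($login, $passwd) = split /:/, $record;

# Explicit LIMIT of 2: the second variable receives everything that is left.
my ($name, $rest) = split /:/, $record, 2;

print "implicit: $login / $passwd\n";
print "explicit: $name / $rest\n";
```

In both calls Perl stops scanning early; the difference is only in whether the remainder ends up in a variable.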

If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter.

split(/([,-])/, "1-10,20", 3);

produces the list value

(1, '-', 10, ',', 20) 
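A short sketch of how capturing parentheses interact with LIMIT: captured delimiters are added to the result but are not counted as fields. The array names below are illustrative only.

```perl
use strict;
use warnings;

my $spec = "1-10,20";

# No parentheses: delimiters are dropped.
my @plain = split /[,-]/, $spec;

# Capturing parentheses: each delimiter becomes an extra list element.
my @with_delims = split /([,-])/, $spec;

# LIMIT of 2 restricts the number of fields to two; the captured
# delimiter between them is still returned.
my @limited = split /([,-])/, $spec, 2;

print join('|', @plain), "\n";
print join('|', @with_delims), "\n";
print join('|', @limited), "\n";
```

Keeping the delimiters is handy when the separator itself carries meaning, as with the range and list separators here.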

If you had the entire header of a normal Unix email message in $header, you could split it up into fields and their values this way:

$header =~ s/\n(?=\s)//g;  # fix continuation lines
%hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);
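To see what the two statements above produce, here is a self-contained sketch run on an invented header. The addresses and header contents are assumptions for the example, not real data.

```perl
use strict;
use warnings;

# A made-up mbox-style header with one continuation line in Subject.
my $header = "From alice Sat Jan  1 00:00:00 2000\n"
           . "To: bob\n"
           . "Subject: hello\n"
           . " world\n";

$header =~ s/\n(?=\s)//g;   # join continuation lines to the previous line

# The text before the first "Name:" line is paired with the key UNIX_FROM;
# after that, each captured field name is followed by its value.
my %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);

for my $name (sort keys %hdrs) {
    (my $value = $hdrs{$name}) =~ s/\n\z//;   # strip newline for display only
    print "$name => $value\n";
}
```

Note that each value in %hdrs keeps its trailing newline, because the newline is part of the field and not of the delimiter.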

The pattern /PATTERN/  may be replaced with an expression to specify patterns that vary at runtime. (To do runtime compilation only once, use /$variable/o  .)

As a special case, specifying a PATTERN of space (' '  ) will split on white space just as split  with no arguments does. Thus, split(' ')  can be used to emulate awk's default behavior, whereas split(/ /)  will give you as many null initial fields as there are leading spaces. A split  on /\s+/  is like a split(' ')  except that any leading whitespace produces a null first field. A split  with no arguments really does a split(' ', $_)  internally.
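The three whitespace variants differ only in how they treat leading and repeated spaces. A minimal comparison, using an invented input string and illustrative array names:

```perl
use strict;
use warnings;

my $text = "  root  wheel";   # two leading spaces, two spaces in the middle

my @awk_like = split ' ', $text;     # leading whitespace is skipped
my @single   = split / /, $text;     # every single space is a delimiter
my @runs     = split /\s+/, $text;   # leading run gives one empty first field

# Bracket each field so that empty fields are visible in the output.
print join('', map { "[$_]" } @awk_like), "\n";
print join('', map { "[$_]" } @single), "\n";
print join('', map { "[$_]" } @runs), "\n";
```

For parsing the output of typical Unix commands, the ' ' form is usually what you want.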

A PATTERN of /^/  is treated as if it were /^/m  , since it isn't much use otherwise.
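A sketch of the /^/ special case, which breaks a multi-line string into lines while keeping the newline at the end of each one. The sample text is invented for the example.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# /^/ is treated as /^/m here, so it matches at the start of every line.
# The match is zero-width, so nothing is removed from the string.
my @lines = split /^/, $text;

print scalar(@lines), " lines\n";
print "line: $_" for @lines;
```

Unlike split /\n/, this form preserves the line terminators, which is convenient when the lines will be printed back unchanged.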

Example:

open(PASSWD, '/etc/passwd') or die "Cannot open /etc/passwd: $!";
while (<PASSWD>) {
    chomp;
    # split(/:/) with no EXPR splits $_
    ($login, $passwd, $uid, $gid,
     $gcos, $home, $shell) = split(/:/);
    #...
}
close(PASSWD);

As with regular pattern matching, any capturing parentheses that are not matched in a split() will be set to undef when returned:

@fields = split /(A)|B/, "1A2B3";
# @fields is (1, 'A', 2, undef, 3)

Additional examples

$_ = 'AB AB AC';
print m/c$/i;    # prints 1: with /i the pattern c$ matches the final "C"

If you split on an undefined value, the string will be split on every character:

#!/usr/bin/perl

my $data = 'abcdefgh';
my @values = split(undef, $data);

foreach my $val (@values) {
    print "$val\n";
}

    Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


    Last modified: January 02, 2015