string slicing

Upload: prabhjot-singh

Post on 03-Apr-2018


  • 7/28/2019 String Slicing

    1/22

    C# Substring Programs

    You want to extract several characters from your C# string as

    another string, which is called taking a substring. There are two

    overloaded Substring methods on string, which are ideal for getting

    parts of strings. This document contains several examples and a

    useful Substring benchmark, using the C# programming language.

    === Substring benchmark that tests creation time (C#) ===

    Based on .NET Framework 3.5 SP1.

    New char[] array: 2382 ms

    Substring: 2053 ms [faster]

    Getting first part

    Initially here you have a string and you want to extract the first
    several characters into a new string. We can use the Substring
    instance method with two parameters here, the first being 0 and the
    second being the desired length.

    === Program that uses Substring (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string input = "OneTwoThree";

    // Get first three characters

    string sub = input.Substring(0, 3);

    Console.WriteLine("Substring: {0}", sub);

    }

    }

    === Output of the program ===

    Substring: One

    Description. The Substring method is an instance method on the
    string class, which means you must have a non-null string to use it
    without triggering a NullReferenceException. This program extracts
    the first three characters into a new string, which is separately
    allocated on the managed heap.
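
    The null and length requirements described above can be wrapped in a
    small guard. The following is a minimal sketch, not part of the .NET
    API: the FirstChars name and its "return an empty string" policy are
    assumptions made for illustration.

```csharp
using System;

static class FirstPartDemo
{
    // Hypothetical helper (not a .NET method): returns up to 'count'
    // leading characters, tolerating null and short inputs.
    public static string FirstChars(string input, int count)
    {
        if (string.IsNullOrEmpty(input) || count <= 0)
        {
            return string.Empty;
        }
        // Clamp the length so Substring never runs past the end.
        return input.Substring(0, Math.Min(count, input.Length));
    }

    static void Main()
    {
        Console.WriteLine(FirstChars("OneTwoThree", 3)); // One
        Console.WriteLine(FirstChars("On", 3));          // On
        Console.WriteLine(FirstChars(null, 3).Length);   // 0
    }
}
```

    Whether a short or null input should yield an empty string or an
    exception depends on the program; the clamp is only one option.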

    Use one parameter

    Here we see the Substring overloaded method that takes one
    parameter, the start index int. No length is passed, so the
    substring always runs from the start index through the last char.

    === Example program (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string input = "OneTwoThree";

    // Indexes:

    // 0:'O'

    // 1:'n'

    // 2:'e'

    // 3:'T'

    // 4:'w' ...

    string sub = input.Substring(3);

    Console.WriteLine("Substring: {0}", sub);

    }

    }

    === Output of the program ===

    Substring: TwoThree

    Description. The program takes all the characters in the input
    string except the first three, so the result is the trailing part
    of the string. Internally, Substring causes the runtime to allocate
    a new string on the managed heap.

    Middle string sections

    Here we take several characters in the middle of a C# string and
    place them into a new string. To take a middle substring, pass two
    integer parameters to Substring: the start index and the length.
    Use a start index greater than zero and a length that stops short
    of the end, so the characters at both edges are left out.

    === Example program that uses Substring (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string input = "OneTwoThree";

    string sub = input.Substring(3, 3);

    Console.WriteLine("Substring: {0}", sub);

    }

    }

    === Output of the program ===

    Substring: Two

    Description of parameters. The two parameters in the example
    say, "I want the substring at index 3 with a length of three."
    Because indexes start at zero, that is the fourth through sixth
    characters (indexes 3, 4 and 5). The program then displays the
    resulting string, which the variable 'sub' refers to.

    Slicing strings

    Here we note that you can add an extension method to "slice"
    strings, as is possible in languages such as JavaScript and Python.
    Substring in C# takes a start index and a length, whereas slicing
    in those languages takes a start index and an end index. However,
    you can develop an extension method that fills this need
    efficiently.

    See String Slice: http://dotnetperls.com/string-slice
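
    As a rough illustration of such an extension method, here is a
    minimal sketch. It is not the implementation from the linked
    article; the Slice name, the exclusive end index, and the treatment
    of negative indexes (counted from the end, then clamped) are
    assumptions modelled loosely on Python-style slicing.

```csharp
using System;

static class StringSliceExtensions
{
    // Hypothetical extension method: 'end' is exclusive, and negative
    // indexes count back from the end of the string.
    public static string Slice(this string source, int start, int end)
    {
        if (source == null) throw new ArgumentNullException("source");
        int length = source.Length;
        if (start < 0) start += length;
        if (end < 0) end += length;
        // Clamp both indexes into the valid range.
        start = Math.Max(0, Math.Min(start, length));
        end = Math.Max(start, Math.Min(end, length));
        // Translate (start, end) into Substring's (start, length).
        return source.Substring(start, end - start);
    }
}

class Program
{
    static void Main()
    {
        string input = "OneTwoThree";
        Console.WriteLine(input.Slice(3, 6));   // Two
        Console.WriteLine(input.Slice(0, -5));  // OneTwo
        Console.WriteLine(input.Slice(-5, 11)); // Three
    }
}
```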

    Exclude several characters

    Here you want to not copy the last several characters of your
    string. This example shows how you can omit the last five
    characters of the input string and get a new string instance
    containing everything before them.

    === Program that uses Substring for ending characters (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string input = "OneTwoThree";

    string sub = input.Substring(0, input.Length - 5);

    Console.WriteLine("Substring: {0}", sub);

    }

    }

    === Output of the program ===

    Substring: OneTwo

    MSDN research

    Here we note some reference material on the MSDN website provided
    by Microsoft. The Substring articles I found on MSDN are really
    awful and not nearly as nice as this document. They do not say
    anything that you cannot find from Visual Studio's IntelliSense.

    Visit msdn.microsoft.com:
    http://msdn.microsoft.com/en-us/library/system.string.substring(VS.71).aspx

    Exceptions raised

    Here we look at exceptions that can be raised when the Substring
    instance method on the string type is called with incorrect
    parameters. The example below triggers the
    ArgumentOutOfRangeException. When you try to go beyond the string
    length, or use a parameter < 0, you get the
    ArgumentOutOfRangeException from the internal method
    InternalSubStringWithChecks.

    === Program that shows Substring exceptions (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string input = "OneTwoThree";

    try

    {

    string sub = input.Substring(-1);

    }

    catch (Exception ex)

    {

    Console.WriteLine(ex);

    }

    try

    {

    string sub = input.Substring(0, 100);

    }

    catch (Exception ex)

    {

    Console.WriteLine(ex);

    }

    }

    }

    === Output of the program ===

    System.ArgumentOutOfRangeException

    System.String.InternalSubStringWithChecks

    System.ArgumentOutOfRangeException

    System.String.InternalSubStringWithChecks
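
    If you would rather test the arguments than catch the exception,
    the same range rules can be checked up front. This is a minimal
    sketch; TrySubstring is a hypothetical helper written for this
    illustration and is not part of the string type.

```csharp
using System;

static class SubstringGuardDemo
{
    // Hypothetical helper: reports failure instead of throwing when
    // the requested range does not fit inside the string.
    public static bool TrySubstring(string input, int start, int length,
        out string result)
    {
        result = null;
        if (input == null || start < 0 || length < 0 ||
            start > input.Length - length)
        {
            return false;
        }
        result = input.Substring(start, length);
        return true;
    }

    static void Main()
    {
        string sub;
        Console.WriteLine(TrySubstring("OneTwoThree", 3, 3, out sub));   // True
        Console.WriteLine(sub);                                          // Two
        Console.WriteLine(TrySubstring("OneTwoThree", 0, 100, out sub)); // False
        Console.WriteLine(TrySubstring("OneTwoThree", -1, 1, out sub));  // False
    }
}
```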

    Benchmark

    Here I wanted to see if taking characters and putting them into a
    char[] array could be faster than calling Substring. My result was
    that Substring is faster. However, if you want to extract only
    certain characters, consider the char[] approach shown. This
    benchmark is based on .NET 3.5 SP1.

    === Data tested ===

    string s = "onetwothree"; // Input

    === Char array method version ===

    char[] c = new char[3];

    c[0] = s[3];

    c[1] = s[4];

    c[2] = s[5];

    string x = new string(c); // "two"

    if (x == null)

    {

    }

    === Substring version ===

    string x = s.Substring(3, 3); // "two"

    if (x == null)

    {

    }

    === Substring benchmark result ===

    Substring was faster.

    See figures at top.

    Benchmark notes. The above code is simply a benchmark you can
    run in Visual Studio to see the performance difference between
    Substring and char[] arrays. It is best to use Substring when it
    has equivalent behavior. This site contains a useful benchmarking
    harness located in the "performance" section.
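
    The harness mentioned above is not reproduced in this document. As
    a stand-in, here is a minimal sketch of how the two versions could
    be timed with System.Diagnostics.Stopwatch. The Measure helper and
    the iteration count are assumptions, and the delegate call adds its
    own overhead, so the numbers will not match the figures at the top.

```csharp
using System;
using System.Diagnostics;

static class SubstringBenchmark
{
    // Char array version from the benchmark above.
    public static string CharArrayVersion(string s)
    {
        char[] c = new char[3];
        c[0] = s[3];
        c[1] = s[4];
        c[2] = s[5];
        return new string(c);
    }

    // Substring version from the benchmark above.
    public static string SubstringVersion(string s)
    {
        return s.Substring(3, 3);
    }

    // Hypothetical helper: runs 'method' repeatedly and returns the
    // elapsed time in milliseconds.
    public static long Measure(Func<string, string> method, string input,
        int iterations)
    {
        method(input); // Warm up so JIT compilation is not timed.
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 10000000; // Assumed count, adjust freely.
        string s = "onetwothree";
        Console.WriteLine("New char[] array: {0} ms",
            Measure(CharArrayVersion, s, iterations));
        Console.WriteLine("Substring: {0} ms",
            Measure(SubstringVersion, s, iterations));
    }
}
```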

    Summary

    Here we saw several examples of the Substring instance method,
    with one or two parameters, on the string type in the C#
    programming language. Additionally, we saw where to research
    Substring on MSDN, information about Slice, Substring exceptions,
    and a benchmark of Substring. Substring is very useful and can help
    simplify your programs, without significant performance problems.
    Combine it with IndexOf and Split for powerful string handling.

    C# Split String Examples

    You want to split strings on different characters with single

    character or string delimiters. For example, split a string that

    contains "\r\n" sequences, which are Windows newlines. Through

    these examples, we learn ways to use the Split method on the

    string type in the C# programming language.

    Use the Split method to separate parts from a string.

    If your input string is A,B,C --

    Split on the comma to get an array of:

    "A" "B" "C"

    Using Split

    To begin, we look at the basic Split method overload. You already
    know the general way to do this, but it is good to see the basic
    syntax before we move on. This example splits on a single
    character.

    === Example program for splitting on spaces (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    string s = "there is a cat";

    //

    // Split string on spaces.

    // ... This will separate all the words.

    //

    string[] words = s.Split(' ');

    foreach (string word in words)

    {

    Console.WriteLine(word);

    }

    }

    }

    === Output of the program ===

    there

    is

    a

    cat

    Description. The input string, which contains four words, is split on

    spaces and the foreach loop then displays each word. The result

    value from Split is a string[] array.

    Multiple characters

    Here we split on a delimiter of more than one character, using
    either the Regex method or the string Split method with the C# new
    array syntax. Note that a new char[] or string[] array is created
    in the array-based usages. Those Split overloads also accept
    StringSplitOptions, which is used to remove empty strings.

    === Program that splits on lines with Regex (C#) ===

    using System;

    using System.Text.RegularExpressions;

    class Program

    {

    static void Main()

    {

    string value = "cat\r\ndog\r\nanimal\r\nperson";

    //

    // Split the string on line breaks.

    // ... The return value from Split is a string[] array.

    //

    string[] lines = Regex.Split(value, "\r\n");

    foreach (string line in lines)

    {

    Console.WriteLine(line);

    }

    }

    }

    === Output of the program ===

    cat

    dog

    animal

    person

    StringSplitOptions

    While the Regex type methods can be used to Split strings
    effectively, the string type Split method is faster in many cases.
    The Regex Split method is static; the string Split method is
    instance-based. The next example shows how you can specify an array
    as the first parameter to string Split.

    === Program that splits on multiple characters (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    //

    // This string is also separated by Windows line breaks.

    //

    string value = "shirt\r\ndress\r\npants\r\njacket";

    //

    // Use a new char[] array of two characters (\r and \n) to break
    // lines into separate strings. Use "RemoveEmptyEntries"
    // to make sure no empty strings get put in the string[] array.

    //

    char[] delimiters = new char[] { '\r', '\n' };

    string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < parts.Length; i++)

    {

    Console.WriteLine(parts[i]);

    }

    //

    // Same as the previous example, but uses a new string of 2 characters.

    //

    parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);

    for (int i = 0; i < parts.Length; i++)

    {

    Console.WriteLine(parts[i]);

    }

    }

    }

    === Output of the program ===

    (Repeated two times)

    shirt

    dress

    pants

    jacket

    Overview. One useful overload of Split receives a char[] array as
    the first parameter. Each char in the array acts as a delimiter, so
    any one of them ends the current block and starts a new one.

    Using string arrays. Another overload of Split receives a string[]
    array, where each string is a complete delimiter. In the example,
    the new string[] array is created inline with the Split call.

    Explanation of StringSplitOptions. The RemoveEmptyEntries enum
    value is specified in the first call. When two delimiters are
    adjacent, as '\r' and '\n' are here, Split produces an empty string
    between them. Passing RemoveEmptyEntries as the second parameter
    avoids this.

    [Screenshot: the Visual Studio debugger.]

    See StringSplitOptions Enumeration:
    http://dotnetperls.com/stringsplitoptions
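
    To make the effect of adjacent delimiters concrete, the short
    sketch below splits the same Windows-style text with and without
    RemoveEmptyEntries and counts the results. The SplitOptionsDemo
    class and its method names are made up for this illustration.

```csharp
using System;

static class SplitOptionsDemo
{
    static readonly char[] LineChars = { '\r', '\n' };

    // Keeps the empty string found between '\r' and '\n'.
    public static string[] SplitKeepingEmpty(string value)
    {
        return value.Split(LineChars, StringSplitOptions.None);
    }

    // Drops empty strings from the result.
    public static string[] SplitRemovingEmpty(string value)
    {
        return value.Split(LineChars, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string value = "shirt\r\ndress";
        Console.WriteLine(SplitKeepingEmpty(value).Length);  // 3
        Console.WriteLine(SplitRemovingEmpty(value).Length); // 2
    }
}
```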

    Separating words

    Here we see how you can separate words with Split. Usually, the
    best way to separate words is to use a Regex that specifies
    non-word chars. This example separates words in a string based on
    non-word characters. It eliminates punctuation and whitespace from
    the return array.

    === Program that separates on non-word pattern (C#) ===

    using System;

    using System.Text.RegularExpressions;

    class Program

    {

    static void Main()

    {

    string[] w = SplitWords("That is a cute cat, man");

    foreach (string s in w)

    {

    Console.WriteLine(s);

    }

    Console.ReadLine();

    }

    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>

    static string[] SplitWords(string s)

    {

    //

    // Split on all non-word characters.

    // ... Returns an array of all the words.

    //

    return Regex.Split(s, @"\W+");

    // @ special verbatim string syntax

    // \W+ one or more non-word characters together

    }

    }

    === Output of the program ===

    That

    is

    a

    cute

    cat

    man

    Word splitting example. Here you can separate parts of your

    input string based on any character set or range with Regex.

    Overall, this provides more power than the string Split methods.

    See Regex.Split Method Examples: http://dotnetperls.com/regex-split
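
    One detail worth knowing: when the input begins or ends with a
    non-word character, Regex.Split includes an empty string at that
    end of the array. The sketch below filters those out. The
    SplitWordsNoEmpty name is hypothetical and the filtering loop is
    just one way to do it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordSplitDemo
{
    // Hypothetical variant of SplitWords that skips empty entries,
    // which appear when the input starts or ends with punctuation.
    public static string[] SplitWordsNoEmpty(string s)
    {
        List<string> words = new List<string>();
        foreach (string part in Regex.Split(s, @"\W+"))
        {
            if (part.Length > 0)
            {
                words.Add(part);
            }
        }
        return words.ToArray();
    }

    static void Main()
    {
        // The trailing period would otherwise produce an empty string.
        foreach (string word in SplitWordsNoEmpty("That is a cute cat, man."))
        {
            Console.WriteLine(word);
        }
    }
}
```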

    Splitting text files

    Here you have a text file containing comma-delimited lines of
    values. This is called a CSV file, and it is easily dealt with in
    the C# language. We use the File.ReadAllLines method here, but you
    may want StreamReader instead. This code reads in both lines of the
    file, parses them, and displays the values of each line after the
    line number. The output listing shows how the file was parsed into
    the strings.

    === Contents of input file (TextFile1.txt) ===

    Dog,Cat,Mouse,Fish,Cow,Horse,Hyena

    Programmer,Wizard,CEO,Rancher,Clerk,Farmer

    === Program that splits lines in file (C#) ===

    using System;

    using System.IO;

    class Program

    {

    static void Main()

    {

    int i = 0;

    foreach (string line in File.ReadAllLines("TextFile1.txt"))

    {

    string[] parts = line.Split(',');

    foreach (string part in parts)

    {

    Console.WriteLine("{0}:{1}",

    i,

    part);

    }

    i++; // For demo only

    }

    }

    }

    === Output of the program ===

    0:Dog

    0:Cat

    0:Mouse

    0:Fish

    0:Cow

    0:Horse

    0:Hyena

    1:Programmer

    1:Wizard

    1:CEO

    1:Rancher

    1:Clerk

    1:Farmer
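
    For the StreamReader alternative mentioned above, a line-by-line
    loop might look like the following sketch. ReadRows is a
    hypothetical helper. It accepts any TextReader, so the demo feeds
    it a StringReader holding the same two lines; passing
    new StreamReader("TextFile1.txt") instead would read the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class CsvLineReader
{
    // Hypothetical helper: reads one line at a time and splits each
    // line on commas, without loading the whole file into an array.
    public static List<string[]> ReadRows(TextReader reader)
    {
        List<string[]> rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split(','));
        }
        return rows;
    }

    static void Main()
    {
        string contents =
            "Dog,Cat,Mouse,Fish,Cow,Horse,Hyena\r\n" +
            "Programmer,Wizard,CEO,Rancher,Clerk,Farmer";
        // Swap in new StreamReader("TextFile1.txt") to read from disk.
        using (TextReader reader = new StringReader(contents))
        {
            List<string[]> rows = ReadRows(reader);
            for (int i = 0; i < rows.Count; i++)
            {
                foreach (string part in rows[i])
                {
                    Console.WriteLine("{0}:{1}", i, part);
                }
            }
        }
    }
}
```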

    Splitting directory paths

    Here we see how you can Split the segments in a Windows local
    directory into separate strings. Note that directory paths are
    complex and this may not handle all cases correctly. It is also
    platform-specific, and you could use
    System.IO.Path.DirectorySeparatorChar for more flexibility.

    See Path Examples: http://dotnetperls.com/path

    === Program that splits Windows directories (C#) ===

    using System;

    class Program

    {

    static void Main()

    {

    // The directory from Windows

    const string dir = @"C:\Users\Sam\Documents\Perls\Main";

    // Split on directory separator

    string[] parts = dir.Split('\\');

    foreach (string part in parts)

    {

    Console.WriteLine(part);

    }

    }

    }

    === Output of the program ===

    C:

    Users

    Sam

    Documents

    Perls

    Main
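
    A less platform-specific variant, sketched here under the
    assumption that splitting on both Path.DirectorySeparatorChar and
    Path.AltDirectorySeparatorChar is acceptable for your paths. The
    SplitPath name is hypothetical, and empty segments are dropped as a
    design choice of this sketch.

```csharp
using System;
using System.IO;

static class PathSplitDemo
{
    // Hypothetical helper: splits on the separators of the current
    // platform rather than a hard-coded backslash.
    public static string[] SplitPath(string path)
    {
        char[] separators =
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        return path.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Path.Combine inserts the separator for the current platform.
        string dir = Path.Combine("Users", Path.Combine("Sam", "Documents"));
        foreach (string part in SplitPath(dir))
        {
            Console.WriteLine(part);
        }
    }
}
```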

    Internal logic

    The logic internal to the .NET Framework for Split is implemented
    in managed code. The methods call into the overload with three
    parameters. The parameters are next checked for validity. Finally,
    it uses unsafe code to create the separator list, and then a for
    loop combined with Substring to return the array.
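
    The two phases described above (build a separator list, then loop
    with Substring) can be pictured with the simplified sketch below.
    It is an illustration only and is not the actual .NET Framework
    code: it handles a single char separator, uses no unsafe code, and
    the SimpleSplit name is made up.

```csharp
using System;
using System.Collections.Generic;

static class SimpleSplit
{
    // Simplified illustration only; not the real implementation.
    public static string[] Split(string input, char separator)
    {
        if (input == null) throw new ArgumentNullException("input");

        // Phase 1: record the index of every separator.
        List<int> positions = new List<int>();
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == separator) positions.Add(i);
        }

        // Phase 2: take the Substring between consecutive separators.
        string[] result = new string[positions.Count + 1];
        int start = 0;
        for (int i = 0; i < positions.Count; i++)
        {
            result[i] = input.Substring(start, positions[i] - start);
            start = positions[i] + 1;
        }
        result[positions.Count] = input.Substring(start);
        return result;
    }

    static void Main()
    {
        foreach (string word in Split("there is a cat", ' '))
        {
            Console.WriteLine(word);
        }
    }
}
```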

    Benchmarks

    I tested a long string and a short string, having 1200 and 40
    chars respectively. String splitting speed varies with the type of
    strings. The length of the blocks, number of delimiters, and total
    size of the string factor into performance. The Regex.Split option
    generally performed the worst. I felt that the second or third
    methods would be the best, after observing performance problems
    with regular expressions in other situations.

    === Strings used in test (C#) ===

    //

    // Build long string.

    //

    _test = string.Empty;

    for (int i = 0; i < 120; i++)

    {

    _test += "01234567\r\n";

    }

    //

    // Build short string.

    //

    _test = string.Empty;

    for (int i = 0; i < 10; i++)

    {

    _test += "ab\r\n";

    }

    === Example methods tested (100000 iterations) ===

    static void Test1()

    {

    string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);

    }

    static void Test2()

    {

    string[] arr = _test.Split(new char[] { '\r', '\n' },

    StringSplitOptions.RemoveEmptyEntries);

    }

    static void Test3()

    {

    string[] arr = _test.Split(new string[] { "\r\n" }, StringSplitOptions.None);

    }

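    Longer strings: 1200 chars. The benchmark for the methods on the
    long strings is more even: Regex.Split takes about 2.8 times as
    long as the char[] Split, compared with about 6.9 times on the
    short strings. It may be that for very long strings, such as entire
    files, the Regex method is equivalent or even faster, but that case
    was not measured here. In these figures Regex is the slowest method
    at both lengths.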

    === Benchmark of Split on long strings ===

    [1] Regex.Split: 3470 ms

    [2] char[] Split: 1255 ms [fastest]

    [3] string[] Split: 1449 ms

    === Benchmark of Split on short strings ===

    [1] Regex.Split: 434 ms

    [2] char[] Split: 63 ms [fastest]

    [3] string[] Split: 83 ms

    Short strings: 40 chars. This shows the three methods compared

    to each other on short strings. Method 1 is the Regex method, and it

    is by far the slowest on the short strings. This may be because of

    the compilation time. Smaller is better. This article was last updated

    for .NET 3.5 SP1.

    Performance recommendation. For programs that use shorter
    strings, the methods that split based on arrays are faster and
    simpler, and they will avoid Regex compilation. For somewhat
    longer strings or files that contain more lines, Regex may be
    appropriate. Also, I show some Split improvements that can improve
    your program.

    See Split String Improvement:
    http://dotnetperls.com/split-improvement

    Escaped characters

    Here we note that you can use Replace on your string input to
    substitute special characters in for any escaped characters. This
    can solve lots of problems on parsing computer-generated code or
    data.

    See Split Method and Escape Characters:
    http://dotnetperls.com/split-escape
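
    One way to read that advice is sketched below. It is not the code
    from the linked article. It assumes an escaped comma is written as
    a backslash followed by a comma, and that the placeholder char
    '\u0001' never occurs in the data; SplitEscaped is a hypothetical
    name.

```csharp
using System;

static class EscapedSplitDemo
{
    // Hypothetical helper: hides escaped commas behind a placeholder,
    // splits on the remaining commas, then restores the commas.
    public static string[] SplitEscaped(string input)
    {
        const string placeholder = "\u0001"; // Assumed absent from data.
        string protectedInput = input.Replace("\\,", placeholder);
        string[] parts = protectedInput.Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(placeholder, ",");
        }
        return parts;
    }

    static void Main()
    {
        // Three fields; the first contains an escaped comma.
        foreach (string part in SplitEscaped(@"Smith\, John,42,London"))
        {
            Console.WriteLine(part);
        }
    }
}
```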

    Delimiter arrays

    In this section, we focus on how you can specify delimiters to the
    Split method in the C# language. My further research into Split and
    its performance shows that it is worthwhile to declare the char[]
    array you are splitting on once, outside the loop, and reuse it.
    This reduces memory pressure and improves runtime performance.
    There is another example of delimiter array allocation on this
    site.

    See Split Delimiter Use: http://dotnetperls.com/split-delimiter
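
    A related variation, sketched here as an assumption rather than the
    author's measured code, is to keep the delimiter array in a static
    readonly field so that every call in the program shares one
    allocation. The DelimiterDemo class and its members are
    hypothetical names.

```csharp
using System;

static class DelimiterDemo
{
    // Allocated once and shared by every call to SplitWords.
    static readonly char[] Delimiters = { ' ', ',' };

    public static string[] SplitWords(string text)
    {
        return text.Split(Delimiters);
    }

    static void Main()
    {
        string[] parts = SplitWords("string to split, ok");
        // Five entries: the comma and the space that follows it are
        // adjacent, which yields one empty string.
        Console.WriteLine(parts.Length); // 5
    }
}
```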

    === Slow version, before (C#) ===

    //

    // Split on multiple characters using new char[] inline.

    //

    string t = "string to split, ok";

    for (int i = 0; i < 10000000; i++)

    {

    string[] s = t.Split(new char[] { ' ', ',' });

    }

    === Fast version, after (C#) ===

    //

    // Split on multiple characters using new char[] already created.

    //

    string t = "string to split, ok";

    char[] c = new char[]{ ' ', ',' }; //