How to trim leading and trailing whitespace in R? - Stack Overflow


Upload: jeiel-franca

Post on 01-Mar-2018


7/26/2019 How to Trim Leading and Trailing Whitespace in R_ - Stack Overflow

http://slidepdf.com/reader/full/how-to-trim-leading-and-trailing-whitespace-in-r-stack-overflow 1/3

27/12/2014 How to trim leading and trailing whitespace in R? - Stack Overflow

http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r


Asked by mropa; edited by Jeromy Anglim

5 Answers

I am having some trouble with leading and trailing whitespace in a data.frame. For example, I would like to look at a specific row in a data.frame based on a certain condition:

> myDummy[myDummy$country == c("Austria"), c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)

I was wondering why I didn't get the expected output, since the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:

> myDummy[myDummy$country == c("Austria "), c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1

All I have changed in the command is an additional whitespace after Austria.

Further annoying problems obviously arise, e.g. when I want to merge two frames based on the country column. One data.frame uses "Austria " while the other frame has "Austria". The matching doesn't work.

1. Is there a nice way to 'show' the whitespace on my screen so that I am aware of the problem?
2. And can I remove the leading and trailing whitespace in R?

So far I have used a simple Perl script to remove the whitespace, but it would be nice if I could somehow do it inside R.

edited Nov 28 '13 at 1:49   asked Feb 14 '10 at 12:44

I just saw that sub() uses the Perl notation as well. Sorry about that. I am going to try to use the function. But for my first question I don't have a solution yet. – mropa Feb 14 '10 at 12:50

As hadley pointed out, this regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so x <- gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Jay Feb 14 '10 at 15:11

Probably the best way is to handle the trailing whitespace when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
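A minimal sketch of the read-time approach, assuming a small made-up CSV (the column names and values are hypothetical, not the asker's data). Note that strip.white only affects unquoted fields:

```r
# Hypothetical CSV text standing in for a data file; " Austria " is padded on both sides.
csv_text <- "code,country\nAUT, Austria \nBEL,Belgium"

# Default behaviour: the padding is kept in the column values.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE drops leading/trailing whitespace of unquoted fields while parsing.
clean <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

raw$country == "Austria"    # the padded value does not match
clean$country == "Austria"  # the stripped value does
```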

If you want to clean strings afterwards you could use one of these functions:

# returns string w/o leading whitespace
trim.leading <- function(x) sub("^\\s+", "", x)

# returns string w/o trailing whitespace
trim.trailing <- function(x) sub("\\s+$", "", x)

# returns string w/o leading or trailing whitespace
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

To use one of these functions on myDummy$country:


(Answers on this page are by f3lix, user56785 and Marek.)

  myDummy$country <- trim(myDummy$country)
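For the merge problem from the question, a rough sketch along the same lines might look like this. The two small data frames are invented for illustration (the real myDummy is not shown in the thread), and trim is redefined so the snippet runs on its own:

```r
# Same trim() as in the answer, repeated so this snippet is self-contained.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical frames: one has a trailing space in "Austria ", the other does not.
a <- data.frame(country = c("Austria ", "Belgium"), x = 1:2, stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), y = c(10, 20), stringsAsFactors = FALSE)

before <- merge(a, b, by = "country")  # only Belgium matches

a$country <- trim(a$country)           # clean the key column first
after <- merge(a, b, by = "country")   # now both countries match

nrow(before)  # 1
nrow(after)   # 2
```

Trimming the key column in both frames before merging is the safer habit, since either side may carry the stray whitespace.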

To 'show' the whitespace you could use:

  paste(myDummy$country)

which will show you the strings surrounded by quotation marks (") making whitespaces easier to spot.

edited Feb 14 '10 at 15:52   answered Feb 14 '10 at 13:13

  @f3lix oh those are some nice tips! thanks! – mropa  Feb 14 '10 at 13:43

3   As hadley pointed out, this regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so x <- gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Jay Feb 14 '10 at 15:10

@Jay: Thanks for the hint. I changed the regexps in my answer to use the shorter "\\s" instead of "[ \t]". –

f3lix Feb 14 '10 at 15:46

8   See also str_trim  in the stringr  package. – Richie Cotton Feb 16 '10 at 15:35

1   Plus one for "Trim function now stored for future use"- thanks! – Chris Beeley Jan 17 '12 at 9:56


To trim whitespace, use str_trim() in the stringr package. The package manual is dated Feb 15, 2013, and the package is on CRAN. The function also handles string vectors.

install.packages("stringr", dependencies=TRUE)
require(stringr)
example(str_trim)
d4$clean2 <- str_trim(d4$V2)

(credit goes to commenter: R. Cotton)
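As a small illustration of str_trim() on a vector, here is a sketch that assumes stringr is installed (the guard skips the demo otherwise) and uses made-up values rather than the d4 data above. The side argument selects which end to trim:

```r
# Assumes the stringr package is available; nothing runs if it is not installed.
if (requireNamespace("stringr", quietly = TRUE)) {
  x <- c("  Austria ", "Belgium\t")              # made-up example values

  both  <- stringr::str_trim(x)                  # default: side = "both"
  left  <- stringr::str_trim(x, side = "left")   # leading whitespace only
  right <- stringr::str_trim(x, side = "right")  # trailing whitespace only

  print(both)
}
```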

answered Feb 21 '13 at 16:30

1   +1 For best practice, most easy, most convenient solution! – petermeissner  Oct 16 at 12:24 

ad1) To see white spaces you could directly call print.data.frame  with modified arguments:

print(head(iris), quote=TRUE)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species

# 1 "5.1" "3.5" "1.4" "0.2" "setosa"

# 2 "4.9" "3.0" "1.4" "0.2" "setosa"

# 3 "4.7" "3.2" "1.3" "0.2" "setosa"

# 4 "4.6" "3.1" "1.5" "0.2" "setosa"

# 5 "5.0" "3.6" "1.4" "0.2" "setosa"

# 6 "5.4" "3.9" "1.7" "0.4" "setosa"

See also ?print.data.frame  for other options.
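The iris data has no stray whitespace, so here is a hedged sketch with an invented one-column data frame showing how the quotes expose a trailing space. The nchar() and grepl() checks are an additional way to flag affected rows, not part of the answer above:

```r
# Hypothetical data: the first value has a trailing space, the second does not.
df <- data.frame(country = c("Austria ", "Austria"), stringsAsFactors = FALSE)

print(df, quote = TRUE)  # values are printed inside quotes, so the padding is visible

lens <- nchar(df$country)                  # lengths differ: 8 vs 7
padded <- grepl("^\\s+|\\s+$", df$country) # TRUE where leading/trailing whitespace exists
which(padded)                              # row numbers that need trimming
```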

answered Feb 15 '10 at 10:00

 A simple function to remove leading and trailing whitespace:

trim <- function(x) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

Usage:

> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"
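Worth noting for readers on newer versions: R 3.2.0 and later ship a base function trimws() that covers the same ground. The sketch below compares it with the hand-written trim() above, assuming such an R version is in use:

```r
# Same helper as in the answer, repeated so the snippet is self-contained.
trim <- function(x) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

text <- " foo bar baz 3 "

both  <- trimws(text)                   # default: which = "both"
left  <- trimws(text, which = "left")   # leading whitespace only
right <- trimws(text, which = "right")  # trailing whitespace only

identical(trim(text), both)  # TRUE for this input
```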


(Answers on this page are by Bernhard Kausler and Jyotirmoy Bhattacharya.)

answered Feb 19 at 13:37

Use grep or grepl to find observations with whitespaces and sub to get rid of them.

names <- c("Ganga Din\t", "Shyam Lal", "Bulbul ")
grep("[[:space:]]+$", names)
[1] 1 3
grepl("[[:space:]]+$", names)
[1]  TRUE FALSE  TRUE
sub("[[:space:]]+$", "", names)
[1] "Ganga Din" "Shyam Lal" "Bulbul"

answered Feb 14 '10 at 14:13

4   Or, a little more succinctly, "^\\s+|\\s+$"  – hadley Feb 14 '10 at 14:45 

1   Just wanted to point out, that one will have to use gsub  instead of sub  with hadley's regexp. With sub  it

will strip trailing whitespace only if there is no leading whitespace... – f3lix Feb 14 '10 at 15:50

  Didn't know you could use \s etc. with perl=FALSE. The docs say that POSIX syntax is used in that case, but

the syntax accepted is actually a superset defined by the TRE regex library

laurikari.net/tre/documentation/regex-syntax  – Jyotirmoy Bhattacharya Feb 14 '10 at 18:37
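The sub versus gsub point from the comments is easy to check with a quick sketch (the test strings are made up). With the alternation pattern, sub() stops after the first match, so a string padded on both sides keeps its trailing whitespace:

```r
pattern <- "^\\s+|\\s+$"   # hadley's combined leading-or-trailing pattern

both_sides    <- "  Austria  "
trailing_only <- "Austria  "

sub_both  <- sub(pattern, "", both_sides)     # "Austria  " (only the leading match is replaced)
gsub_both <- gsub(pattern, "", both_sides)    # "Austria"   (both matches are replaced)
sub_trail <- sub(pattern, "", trailing_only)  # "Austria"   (the only match is the trailing one)
```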
