How to trim leading and trailing whitespace in R? - Stack Overflow
7/26/2019 How to Trim Leading and Trailing Whitespace in R_ - Stack Overflow
http://slidepdf.com/reader/full/how-to-trim-leading-and-trailing-whitespace-in-r-stack-overflow 1/3
27/12/2014 How to trim leading and trailing whitespace in R? - Stack Overflow
http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r
Jeromy Anglim
mropa
5 Answers
I am having some trouble with leading and trailing whitespace in a data.frame. E.g. I would like to look at
a specific row in a data.frame based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output, since the country Austria obviously existed in my
data.frame. After looking through my code history and trying to figure out what went wrong I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1
All I have changed in the command is an additional whitespace after Austria.
Further annoying problems obviously arise, e.g. when I want to merge two frames based on the country
column. One data.frame uses "Austria " while the other frame has "Austria". The matching
doesn't work.
1. Is there a nice way to 'show' the whitespace on my screen so that I am aware of the problem?
2. And can I remove the leading and trailing whitespace in R?
So far I have used a simple Perl script to remove the whitespace, but it would be nice if I could
somehow do it inside R.
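A small reproduction of the situation described above may help. This is only a sketch: the data frames `a` and `b` and their values are made up for illustration and are not the asker's data.

```r
# Hypothetical data: one frame has a trailing space in the key, the other does not.
a <- data.frame(country = c("Austria ", "Belgium"), code = c("AUT", "BEL"),
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), oecd = c(1, 1),
                stringsAsFactors = FALSE)

# The comparison is exact, so the padded value does not match.
a$country == "Austria"            # FALSE FALSE

# merge() drops the row whose key carries the extra space, without a warning.
merged <- merge(a, b, by = "country")
nrow(merged)                      # 1 (only Belgium)

# Comparing nchar() with the visible length is one quick way to notice padding.
nchar(a$country)                  # 8 7
```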
r
edited Nov 28 '13 at 1:49 asked Feb 14 '10 at 12:44
I just saw that sub() uses the Perl notation as well. Sorry about that. I am going to try to use the function.
But for my first question I don't have a solution yet. – mropa Feb 14 '10 at 12:50
As hadley pointed out, the regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so x <-
gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Jay Feb 14 '10
at 15:11
Probably the best way is to handle the trailing whitespace when you read your data file. If you use
read.csv or read.table you can set the parameter strip.white=TRUE.
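As a rough illustration of strip.white, the sketch below reads the same made-up CSV text twice. The variable names and the CSV content are assumptions for the example; note that strip.white only affects unquoted character fields.

```r
# Hypothetical CSV content with padding around the first country value.
csv_text <- "code,country\nAUT, Austria \nBEL,Belgium\n"

# Default behaviour: whitespace inside character fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading and trailing whitespace while reading.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

raw$country    # " Austria " "Belgium"
clean$country  # "Austria"   "Belgium"
```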
If you want to clean strings afterwards you can use one of these functions:
# returns string w/o leading whitespace
trim.leading <- function (x) sub("^\\s+", "", x)
# returns string w/o trailing whitespace
trim.trailing <- function (x) sub("\\s+$", "", x)
# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country:
f3lix
user56785
Marek
myDummy$country <- trim(myDummy$country)
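If more than one column is affected, the same idea can be applied to every character column at once. This is a sketch under assumptions: the data frame `df` is invented for the example, and `trim` is the regex-based function from the answer.

```r
# Regex-based trim: removes leading and trailing whitespace.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical data frame with padded strings in more than one column.
df <- data.frame(code = c(" AUT", "BEL "), country = c("Austria ", " Belgium"),
                 value = c(1, 2), stringsAsFactors = FALSE)

# Trim only the character columns; other columns are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trim)

df$code     # "AUT" "BEL"
df$country  # "Austria" "Belgium"
```

One thing to keep in mind: gsub() returns a character vector, so a factor column passed through trim() comes back as character unless it is wrapped in factor() again.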
To 'show' the whitespace you could use:
paste(myDummy$country)
which will show you the strings surrounded by quotation marks (") making whitespaces easier to spot.
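To make the difference concrete, here is a sketch with a made-up factor (the vector `country` is an assumption, not the asker's column). It also flags padded entries with grepl(), so spotting them does not depend on reading the printout closely.

```r
# Hypothetical factor column with padded values.
country <- factor(c("Austria ", "Austria", " Belgium"))

# A factor prints without quotes, so the padding is hard to see.
print(country)

# paste() converts to character, which prints inside double quotes.
shown <- paste(country)
print(shown)

# Flag entries that start or end with whitespace.
padded <- grepl("^\\s|\\s$", shown)
shown[padded]
```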
edited Feb 14 '10 at 15:52 answered Feb 14 '10 at 13:13
@f3lix oh those are some nice tips! thanks! – mropa Feb 14 '10 at 13:43
3 As hadley pointed out, the regex "^\\s+|\\s+$" will identify leading and trailing whitespace, so x <-
gsub("^\\s+|\\s+$", "", x). Many of R's read functions also have this option: strip.white = FALSE – Jay Feb 14 '10
at 15:10
@Jay: Thanks for the hint. I changed the regexps in my answer to use the shorter "\\s" instead of "[ \t]". –
f3lix Feb 14 '10 at 15:46
8 See also str_trim in the stringr package. – Richie Cotton Feb 16 '10 at 15:35
1 Plus one for "Trim function now stored for future use"- thanks! – Chris Beeley Jan 17 '12 at 9:56
To manipulate the white space, use str_trim() in the stringr package. The package manual is dated Feb
15, 2013, and the package is on CRAN. The function also handles string vectors.
install.packages("stringr", dependencies=TRUE)
require(stringr)
example(str_trim)
d4$clean2 <- str_trim(d4$V2)
(credit goes to commenter: R. Cotton)
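A short sketch of str_trim() on a vector, including its side argument. It assumes the stringr package is installed and is skipped otherwise; the vector `x` is made up for the example.

```r
# Hypothetical input with different kinds of padding.
x <- c("  Austria  ", "Belgium ", " Chile")

# Assumes stringr is installed; nothing runs if it is not.
if (requireNamespace("stringr", quietly = TRUE)) {
  both  <- stringr::str_trim(x)                  # both ends (the default)
  left  <- stringr::str_trim(x, side = "left")   # leading whitespace only
  right <- stringr::str_trim(x, side = "right")  # trailing whitespace only
  print(both)
}
```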
answered Feb 21 '13 at 16:30
1 +1 For best practice, most easy, most convenient solution! – petermeissner Oct 16 at 12:24
ad1) To see white spaces you could directly call print.data.frame with modified arguments:
print(head(iris), quote=TRUE)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
# 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
# 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
# 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
# 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
# 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
# 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"
See also ?print.data.frame for other options.
answered Feb 15 '10 at 10:00
A simple function to remove leading and trailing whitespace:
trim <- function( x ) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}
Usage:
> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"
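For comparison, base R from version 3.2.0 onward ships trimws(), which postdates most of this thread. The sketch below checks it against the hand-written trim() on a made-up string; it assumes a recent enough R.

```r
# The regex-based function from the answer above.
trim <- function(x) gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)

text <- "  foo bar baz 3 "

trimmed_regex <- trim(text)
trimmed_base  <- trimws(text)                   # requires R >= 3.2.0
trailing_only <- trimws(text, which = "right")  # leaves leading spaces alone

# Both approaches agree on this input.
identical(trimmed_regex, trimmed_base)
```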
Bernhard Kausler
Jyotirmoy Bhattacharya
answered Feb 19 at 13:37
Use grep or grepl to find observations with whitespaces and sub to get rid of them.
names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
grep("[[:space:]]+$",names)
[1] 1 3
grepl("[[:space:]]+$",names)
[1] TRUE FALSE TRUE
sub("[[:space:]]+$","",names)
[1] "Ganga Din" "Shyam Lal" "Bulbul"
answered Feb 14 '10 at 14:13
4 Or, a little more succinctly, "^\\s+|\\s+$" – hadley Feb 14 '10 at 14:45
1 Just wanted to point out, that one will have to use gsub instead of sub with hadley's regexp. With sub it
will strip trailing whitespace only if there is no leading whitespace... – f3lix Feb 14 '10 at 15:50
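The sub versus gsub point can be checked with a small sketch; the test strings are made up for the example.

```r
x <- c("  both  ", "trailing  ", "  leading")
pattern <- "^\\s+|\\s+$"

with_sub  <- sub(pattern, "", x)   # replaces only the first match per string
with_gsub <- gsub(pattern, "", x)  # replaces every match per string

with_sub   # "both  " "trailing" "leading"
with_gsub  # "both"   "trailing" "leading"
```

With sub(), the first element keeps its trailing spaces because the leading match is replaced first and the replacement then stops.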
Didn't know you could use \s etc. with perl=FALSE. The docs say that POSIX syntax is used in that case, but
the syntax accepted is actually a superset defined by the TRE regex library
laurikari.net/tre/documentation/regex-syntax – Jyotirmoy Bhattacharya Feb 14 '10 at 18:37