Summing multiple columns of a Data Frame and merging with the first unique row [R Tips & Tricks]

Taking the mtcars dataset as an example, I need to split the data frame by “cyl” and sum by multiple columns, “wt” and “drat”.

Next, I will need to merge the “wt” and “drat” sums to the first unique record by “cyl”.

Let’s start with the raw data, the mtcars data frame:

library("plyr")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Summing over multiple columns by using the ddply function:

df2 <- ddply(mtcars, c("cyl"), function(x) colSums(x[c("wt", "drat")]))
df2
##   cyl     wt  drat
## 1   4 25.143 44.78
## 2   6 21.820 25.10
## 3   8 55.989 45.21

Using duplicated is quick way to find the first unique instance by “cyl”:

df3 <- mtcars[!duplicated(mtcars$cyl),]
df3
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2

Before merging the two data frames (df2 and df3), we remove “wt” and “draft” from df3, to be replaced later by the columns’ sum from df2:

drops <- c("wt","drat")
df3.dropped <- df3[,!(names(df3) %in% drops)]
df3.dropped
##                    mpg cyl disp  hp  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 17.02  0  0    3    2

We will now merge the two data frames to produce the final data frame:

merged.left <- merge(x = df2, y = df3.dropped, by = "cyl", all.x=TRUE)
merged.left
##   cyl     wt  drat  mpg disp  hp  qsec vs am gear carb
## 1   4 25.143 44.78 22.8  108  93 18.61  1  1    4    1
## 2   6 21.820 25.10 21.0  160 110 16.46  0  1    4    4
## 3   8 55.989 45.21 18.7  360 175 17.02  0  0    3    2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s