Taking the mtcars dataset as an example, I need to split the data frame by “cyl” and sum by multiple columns, “wt” and “drat”.
Next, I will need to merge the “wt” and “drat” sums to the first unique record by “cyl”.
Let’s start with the raw data, the mtcars data frame:
library("plyr")
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb ## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 ## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 ## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 ## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 ## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 ## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Summing over multiple columns by using the ddply function:
df2 <- ddply(mtcars, c("cyl"), function(x) colSums(x[c("wt", "drat")]))
df2
## cyl wt drat ## 1 4 25.143 44.78 ## 2 6 21.820 25.10 ## 3 8 55.989 45.21
Using duplicated is quick way to find the first unique instance by “cyl”:
df3 <- mtcars[!duplicated(mtcars$cyl),]
df3
## mpg cyl disp hp drat wt qsec vs am gear carb ## Mazda RX4 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4 ## Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 ## Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
Before merging the two data frames (df2 and df3), we remove “wt” and “draft” from df3, to be replaced later by the columns’ sum from df2:
drops <- c("wt","drat")
df3.dropped <- df3[,!(names(df3) %in% drops)]
df3.dropped
## mpg cyl disp hp qsec vs am gear carb ## Mazda RX4 21.0 6 160 110 16.46 0 1 4 4 ## Datsun 710 22.8 4 108 93 18.61 1 1 4 1 ## Hornet Sportabout 18.7 8 360 175 17.02 0 0 3 2
We will now merge the two data frames to produce the final data frame:
merged.left <- merge(x = df2, y = df3.dropped, by = "cyl", all.x=TRUE)
merged.left
## cyl wt drat mpg disp hp qsec vs am gear carb ## 1 4 25.143 44.78 22.8 108 93 18.61 1 1 4 1 ## 2 6 21.820 25.10 21.0 160 110 16.46 0 1 4 4 ## 3 8 55.989 45.21 18.7 360 175 17.02 0 0 3 2