How To Draw Error Bars On A Bar Graph
- Problem
- Solution
- Sample data
- Line graphs
- Bar graphs
- Error bars for within-subjects variables
- One within-subjects variable
- Understanding within-subjects error bars
- Two within-subjects variables
- Note about normed means
- Helper functions
Problem
You want to plot means and error bars for a dataset.
Solution
To make graphs with ggplot2, the data must be in a data frame, and in "long" (as opposed to wide) format. If your data needs to be restructured, see this page for more information.
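As a rough illustration of the difference between the two layouts, here is a minimal base R sketch. The data frame and the names `wide` and `long` are made up for this example; they are not part of the recipe.

```r
# Hypothetical wide-format data: one row per subject, one column per condition
wide <- data.frame(
  subject  = 1:3,
  pretest  = c(59.4, 46.4, 46.0),
  posttest = c(64.5, 52.4, 49.7)
)

# Long format: one row per observation, with the condition named in its own column
long <- data.frame(
  subject   = rep(wide$subject, times = 2),
  condition = rep(c("pretest", "posttest"), each = nrow(wide)),
  value     = c(wide$pretest, wide$posttest)
)

long
```

ggplot2 expects the second layout, where each aesthetic (x, y, colour, and so on) maps to a single column.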
Sample data
The examples below will use the ToothGrowth dataset. Note that dose is a numeric column here; in some situations it may be useful to convert it to a factor.
    tg <- ToothGrowth
    head(tg)
    #>    len supp dose
    #> 1  4.2   VC  0.5
    #> 2 11.5   VC  0.5
    #> 3  7.3   VC  0.5
    #> 4  5.8   VC  0.5
    #> 5  6.4   VC  0.5
    #> 6 10.0   VC  0.5

    library(ggplot2)
First, it is necessary to summarize the data. This can be done in a number of ways, as described on this page. In this case, we'll use the summarySE() function defined on that page, and also at the bottom of this page. (The code for the summarySE function must be entered before it is called here.)
    # summarySE provides the standard deviation, standard error of the mean,
    # and a (default 95%) confidence interval
    tgc <- summarySE(tg, measurevar = "len", groupvars = c("supp", "dose"))
    tgc
    #>   supp dose  N   len       sd        se       ci
    #> 1   OJ  0.5 10 13.23 4.459709 1.4102837 3.190283
    #> 2   OJ  1.0 10 22.70 3.910953 1.2367520 2.797727
    #> 3   OJ  2.0 10 26.06 2.655058 0.8396031 1.899314
    #> 4   VC  0.5 10  7.98 2.746634 0.8685620 1.964824
    #> 5   VC  1.0 10 16.77 2.515309 0.7954104 1.799343
    #> 6   VC  2.0 10 26.14 4.797731 1.5171757 3.432090
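If you want to check what summarySE is computing without loading plyr, the same quantities can be sketched in base R. This is only an illustrative cross-check, not the helper itself; the names `cell_mean`, `cell_sd`, `cell_n` and `check` are introduced here for the example.

```r
tg <- ToothGrowth

# Mean, SD and N of len for each supp x dose cell
cell_mean <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
cell_sd   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)
cell_n    <- aggregate(len ~ supp + dose, data = tg, FUN = length)

# The three results share the same row order, so they can be combined directly
check <- data.frame(
  supp = cell_mean$supp,
  dose = cell_mean$dose,
  N    = cell_n$len,
  len  = cell_mean$len,
  sd   = cell_sd$len
)

# Standard error of the mean, and half-width of the 95% confidence interval
check$se <- check$sd / sqrt(check$N)
check$ci <- check$se * qt(0.975, df = check$N - 1)

check[order(check$supp, check$dose), ]
```

The resulting columns should agree with the summarySE output above, up to rounding.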
Line graphs
After the data is summarized, we can make the graph. These are basic line and point graphs with error bars representing either the standard error of the mean or the 95% confidence interval.
    # Standard error of the mean
    ggplot(tgc, aes(x = dose, y = len, colour = supp)) +
        geom_errorbar(aes(ymin = len - se, ymax = len + se), width = .1) +
        geom_line() +
        geom_point()

    # The errorbars overlapped, so use position_dodge to move them horizontally
    pd <- position_dodge(0.1) # move them .05 to the left and right

    ggplot(tgc, aes(x = dose, y = len, colour = supp)) +
        geom_errorbar(aes(ymin = len - se, ymax = len + se), width = .1, position = pd) +
        geom_line(position = pd) +
        geom_point(position = pd)

    # Use 95% confidence interval instead of SEM
    ggplot(tgc, aes(x = dose, y = len, colour = supp)) +
        geom_errorbar(aes(ymin = len - ci, ymax = len + ci), width = .1, position = pd) +
        geom_line(position = pd) +
        geom_point(position = pd)

    # Black error bars - notice the mapping of 'group=supp' -- without it, the error
    # bars won't be dodged!
    ggplot(tgc, aes(x = dose, y = len, colour = supp, group = supp)) +
        geom_errorbar(aes(ymin = len - ci, ymax = len + ci), colour = "black", width = .1, position = pd) +
        geom_line(position = pd) +
        geom_point(position = pd, size = 3)
A finished graph with error bars representing the standard error of the mean might look like this. The points are drawn last so that the white fill goes on top of the lines and error bars.
    ggplot(tgc, aes(x = dose, y = len, colour = supp, group = supp)) +
        geom_errorbar(aes(ymin = len - se, ymax = len + se), colour = "black", width = .1, position = pd) +
        geom_line(position = pd) +
        geom_point(position = pd, size = 3, shape = 21, fill = "white") + # 21 is filled circle
        xlab("Dose (mg)") +
        ylab("Tooth length") +
        scale_colour_hue(name = "Supplement type",    # Legend label
                         breaks = c("OJ", "VC"),
                         labels = c("Orange juice", "Ascorbic acid"),
                         l = 40) +                    # Use darker colors, lightness=40
        ggtitle("The Effect of Vitamin C on\nTooth Growth in Guinea Pigs") +
        expand_limits(y = 0) +                        # Expand y range
        scale_y_continuous(breaks = 0:20 * 4) +       # Set tick every 4
        theme_bw() +
        # Position legend in bottom right. ggplot2 3.5.0 and later prefer
        # legend.position = "inside" together with legend.position.inside.
        theme(legend.justification = c(1, 0),
              legend.position = c(1, 0))
Bar graphs
The procedure is similar for bar graphs. Note that dose must be a factor here. If it is a numeric vector, then it will not work.
    # Use dose as a factor rather than numeric
    tgc2 <- tgc
    tgc2$dose <- factor(tgc2$dose)

    # Error bars represent standard error of the mean
    ggplot(tgc2, aes(x = dose, y = len, fill = supp)) +
        geom_bar(position = position_dodge(), stat = "identity") +
        geom_errorbar(aes(ymin = len - se, ymax = len + se),
                      width = .2,                    # Width of the error bars
                      position = position_dodge(.9))

    # Use 95% confidence intervals instead of SEM
    ggplot(tgc2, aes(x = dose, y = len, fill = supp)) +
        geom_bar(position = position_dodge(), stat = "identity") +
        geom_errorbar(aes(ymin = len - ci, ymax = len + ci),
                      width = .2,                    # Width of the error bars
                      position = position_dodge(.9))
A finished graph might look like this.
    # Note: ggplot2 3.4.0 and later prefer linewidth over size for line thickness
    ggplot(tgc2, aes(x = dose, y = len, fill = supp)) +
        geom_bar(position = position_dodge(), stat = "identity",
                 colour = "black", # Use black outlines
                 size = .3) +      # Thinner lines
        geom_errorbar(aes(ymin = len - se, ymax = len + se),
                      size = .3,    # Thinner lines
                      width = .2,
                      position = position_dodge(.9)) +
        xlab("Dose (mg)") +
        ylab("Tooth length") +
        scale_fill_hue(name = "Supplement type", # Legend label
                       breaks = c("OJ", "VC"),
                       labels = c("Orange juice", "Ascorbic acid")) +
        ggtitle("The Effect of Vitamin C on\nTooth Growth in Guinea Pigs") +
        scale_y_continuous(breaks = 0:20 * 4) +
        theme_bw()
Error bars for within-subjects variables
When all variables are between-subjects, it is straightforward to plot standard error or confidence intervals. However, when there are within-subjects variables (repeated measures), plotting the standard error or regular confidence intervals may be misleading for making inferences about differences between conditions.
The method below is from Morey (2008), which is a correction to Cousineau (2005), which in turn is meant to be a simpler version of the method in Loftus and Masson (1994). See these papers for a more detailed treatment of the issues involved in error bars with within-subjects variables.
One within-subjects variable
Here is a data set (from Morey 2008) with one within-subjects variable: pre/post-test.
    dfw <- read.table(header = TRUE, text = '
     subject pretest posttest
           1    59.4     64.5
           2    46.4     52.4
           3    46.0     49.7
           4    49.0     48.7
           5    32.5     37.4
           6    45.2     49.5
           7    60.3     59.9
           8    54.3     54.1
           9    45.4     49.6
          10    38.9     48.5
     ')

    # Treat subject ID as a factor
    dfw$subject <- factor(dfw$subject)
The first step is to convert it to long format. See this page for more information about the conversion.
    # Convert to long format
    library(reshape2)
    dfw_long <- melt(dfw,
                     id.vars = "subject",
                     measure.vars = c("pretest", "posttest"),
                     variable.name = "condition")

    dfw_long
    #>    subject condition value
    #> 1        1   pretest  59.4
    #> 2        2   pretest  46.4
    #> 3        3   pretest  46.0
    #> 4        4   pretest  49.0
    #> 5        5   pretest  32.5
    #> 6        6   pretest  45.2
    #> 7        7   pretest  60.3
    #> 8        8   pretest  54.3
    #> 9        9   pretest  45.4
    #> 10      10   pretest  38.9
    #> 11       1  posttest  64.5
    #> 12       2  posttest  52.4
    #> 13       3  posttest  49.7
    #> 14       4  posttest  48.7
    #> 15       5  posttest  37.4
    #> 16       6  posttest  49.5
    #> 17       7  posttest  59.9
    #> 18       8  posttest  54.1
    #> 19       9  posttest  49.6
    #> 20      10  posttest  48.5
Collapse the data using summarySEwithin (defined at the bottom of this page; both of the helper functions below must be entered before the function is called here).
    dfwc <- summarySEwithin(dfw_long, measurevar = "value", withinvars = "condition",
                            idvar = "subject", na.rm = FALSE, conf.interval = .95)

    dfwc
    #>   condition  N value value_norm       sd        se       ci
    #> 1  posttest 10 51.43      51.43 2.262361 0.7154214 1.618396
    #> 2   pretest 10 47.74      47.74 2.262361 0.7154214 1.618396

    library(ggplot2)
    # Make the graph with the 95% confidence interval
    ggplot(dfwc, aes(x = condition, y = value, group = 1)) +
        geom_line() +
        geom_errorbar(width = .1, aes(ymin = value - ci, ymax = value + ci)) +
        geom_point(shape = 21, size = 3, fill = "white") +
        ylim(40, 60)
The value and value_norm columns represent the un-normed and normed means. See the section below on normed means for more information.
Understanding within-subjects error bars
This section explains how the within-subjects error bar values are calculated. The steps here are for explanation purposes only; they are not necessary for making the error bars.
The graph of individual data shows that there is a consistent trend for the within-subjects variable condition, but this would not necessarily be revealed by taking the regular standard errors (or confidence intervals) for each group. The method in Morey (2008) and Cousineau (2005) essentially normalizes the data to remove the between-subject variability and calculates the variance from this normalized data.
    # Use a consistent y range
    ymax <- max(dfw_long$value)
    ymin <- min(dfw_long$value)

    # Plot the individuals
    ggplot(dfw_long, aes(x = condition, y = value, colour = subject, group = subject)) +
        geom_line() + geom_point(shape = 21, fill = "white") +
        ylim(ymin, ymax)

    # Create the normed version of the data
    dfwNorm.long <- normDataWithin(data = dfw_long, idvar = "subject", measurevar = "value")

    # Plot the normed individuals
    ggplot(dfwNorm.long, aes(x = condition, y = value_norm, colour = subject, group = subject)) +
        geom_line() + geom_point(shape = 21, fill = "white") +
        ylim(ymin, ymax)
The differences in the error bars for the regular (between-subject) method and the within-subject method are shown here. The regular error bars are in red, and the within-subject error bars are in black.
    # Instead of summarySEwithin, use summarySE, which treats condition as though
    # it were a between-subjects variable
    dfwc_between <- summarySE(data = dfw_long, measurevar = "value", groupvars = "condition",
                              na.rm = FALSE, conf.interval = .95)
    dfwc_between
    #>   condition  N value       sd       se       ci
    #> 1   pretest 10 47.74 8.598992 2.719240 6.151348
    #> 2  posttest 10 51.43 7.253972 2.293907 5.189179

    # Show the between-S CI's in red, and the within-S CI's in black
    ggplot(dfwc_between, aes(x = condition, y = value, group = 1)) +
        geom_line() +
        geom_errorbar(width = .1, aes(ymin = value - ci, ymax = value + ci), colour = "red") +
        geom_errorbar(width = .1, aes(ymin = value - ci, ymax = value + ci), data = dfwc) +
        geom_point(shape = 21, size = 3, fill = "white") +
        ylim(ymin, ymax)
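To see where the two sets of numbers come from, the calculation can be retraced by hand in base R. This is a sketch for explanation only, using the ten pre/post scores from this page; the variable names are introduced here and the code does not call the helper functions.

```r
# Pre/post-test scores for the ten subjects in the data set above
pre  <- c(59.4, 46.4, 46.0, 49.0, 32.5, 45.2, 60.3, 54.3, 45.4, 38.9)
post <- c(64.5, 52.4, 49.7, 48.7, 37.4, 49.5, 59.9, 54.1, 49.6, 48.5)
n    <- length(pre)
tcrit <- qt(0.975, df = n - 1)           # multiplier for a 95% interval

# Regular (between-subjects) CI half-width for the pretest condition
ci_between_pre <- sd(pre) / sqrt(n) * tcrit

# Normed value = raw value - subject mean + grand mean
subj_mean  <- (pre + post) / 2
grand_mean <- mean(c(pre, post))
pre_norm   <- pre  - subj_mean + grand_mean
post_norm  <- post - subj_mean + grand_mean

# Morey (2008) correction for the number of within-subject conditions (2 here)
n_cond     <- 2
correction <- sqrt(n_cond / (n_cond - 1))

# With only two conditions, both normed columns have the same SD
sd_within <- sd(pre_norm) * correction
se_within <- sd_within / sqrt(n)
ci_within <- se_within * tcrit

c(between = ci_between_pre, within = ci_within)
```

The normalization leaves the condition means unchanged but removes each subject's overall level, which is why the within-subject interval is so much narrower.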
Two within-subjects variables
If there is more than one within-subjects variable, the same function, summarySEwithin, can be used. This data set is taken from Hays (1994), and used for making this type of within-subject error bar in Rouder and Morey (2005).
    data <- read.table(header = TRUE, text = '
     Subject RoundMono SquareMono RoundColor SquareColor
           1        41         40         41          37
           2        57         56         56          53
           3        52         53         53          50
           4        49         47         47          47
           5        47         48         48          47
           6        37         34         35          36
           7        47         50         47          46
           8        41         40         38          40
           9        48         47         49          45
          10        37         35         36          35
          11        32         31         31          33
          12        47         42         42          42
    ')
The data must first be converted to long format. In this case, the column names indicate two variables, shape (round/square) and color scheme (monochromatic/colored).
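The idea of pulling two variables out of one column name can be sketched on its own with base R's sub(). This is just an illustration of the string handling under the naming pattern used here; the vectors `cond`, `shape` and `scheme` are names made up for the example.

```r
# Column names that each encode a shape and a color scheme
cond <- c("RoundMono", "SquareMono", "RoundColor", "SquareColor")

# Strip the color-scheme suffix to get the shape
shape <- sub("(Mono|Color)$", "", cond)

# Strip the shape prefix to get the color scheme
scheme <- sub("^(Round|Square)", "", cond)

data.frame(cond, shape, scheme)
```

This relies on every name following the same prefix/suffix pattern; names that do not match would pass through unchanged.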
    # Convert it to long format
    library(reshape2)
    data_long <- melt(data = data, id.vars = "Subject",
                      measure.vars = c("RoundMono", "SquareMono", "RoundColor", "SquareColor"),
                      variable.name = "Condition")
    names(data_long)[names(data_long) == "value"] <- "Time"

    # Split Condition column into Shape and ColorScheme
    data_long$Shape <- NA
    data_long$Shape[grepl("^Round",  data_long$Condition)] <- "Round"
    data_long$Shape[grepl("^Square", data_long$Condition)] <- "Square"
    data_long$Shape <- factor(data_long$Shape)

    data_long$ColorScheme <- NA
    data_long$ColorScheme[grepl("Mono$",  data_long$Condition)] <- "Monochromatic"
    data_long$ColorScheme[grepl("Color$", data_long$Condition)] <- "Colored"
    data_long$ColorScheme <- factor(data_long$ColorScheme, levels = c("Monochromatic", "Colored"))

    # Remove the Condition column now
    data_long$Condition <- NULL

    # Look at first few rows
    head(data_long)
    #>   Subject Time Shape   ColorScheme
    #> 1       1   41 Round Monochromatic
    #> 2       2   57 Round Monochromatic
    #> 3       3   52 Round Monochromatic
    #> 4       4   49 Round Monochromatic
    #> 5       5   47 Round Monochromatic
    #> 6       6   37 Round Monochromatic
Now it can be summarized and graphed.
    datac <- summarySEwithin(data_long, measurevar = "Time",
                             withinvars = c("Shape", "ColorScheme"), idvar = "Subject")
    datac
    #>    Shape   ColorScheme  N     Time Time_norm       sd        se        ci
    #> 1  Round       Colored 12 43.58333  43.58333 1.212311 0.3499639 0.7702654
    #> 2  Round Monochromatic 12 44.58333  44.58333 1.331438 0.3843531 0.8459554
    #> 3 Square       Colored 12 42.58333  42.58333 1.461630 0.4219364 0.9286757
    #> 4 Square Monochromatic 12 43.58333  43.58333 1.261312 0.3641095 0.8013997

    library(ggplot2)
    ggplot(datac, aes(x = Shape, y = Time, fill = ColorScheme)) +
        geom_bar(position = position_dodge(.9), colour = "black", stat = "identity") +
        geom_errorbar(position = position_dodge(.9), width = .25,
                      aes(ymin = Time - ci, ymax = Time + ci)) +
        coord_cartesian(ylim = c(40, 46)) +
        scale_fill_manual(values = c("#CCCCCC", "#FFFFFF")) +
        scale_y_continuous(breaks = 1:100) +
        theme_bw() +
        geom_hline(yintercept = 38)
Note about normed means
The summarySEwithin function returns both normed and un-normed means. The un-normed means are simply the mean of each group. The normed means are calculated so that the means of each between-subject group are the same. These values can diverge when there are between-subject variables.
For example:
    dat <- read.table(header = TRUE, text = '
    id trial gender dv
     A     0   male  2
     A     1   male  4
     B     0   male  6
     B     1   male  8
     C     0 female 22
     C     1 female 24
     D     0 female 26
     D     1 female 28
    ')

    # normed and un-normed means are different
    summarySEwithin(dat, measurevar = "dv", withinvars = "trial", betweenvars = "gender",
                    idvar = "id")
    #> Automatically converting the following non-factors to factors: trial
    #>   gender trial N dv dv_norm sd se ci
    #> 1 female     0 2 24      14  0  0  0
    #> 2 female     1 2 26      16  0  0  0
    #> 3   male     0 2  4      14  0  0  0
    #> 4   male     1 2  6      16  0  0  0
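The arithmetic behind these two columns can be followed in base R. The sketch below rebuilds the same eight rows and norms them by subtracting each subject's mean and adding back the grand mean, as normDataWithin does; the names `raw_means`, `norm_means` and `means` are introduced only for this example.

```r
dat <- data.frame(
  id     = rep(c("A", "B", "C", "D"), each = 2),
  trial  = rep(c(0, 1), times = 4),
  gender = rep(c("male", "female"), each = 4),
  dv     = c(2, 4, 6, 8, 22, 24, 26, 28)
)

# Un-normed means: plain cell means for each gender x trial combination
raw_means <- aggregate(dv ~ gender + trial, data = dat, FUN = mean)

# Normed values: subtract each subject's mean, add back the grand mean
dat$dv_norm <- dat$dv - ave(dat$dv, dat$id) + mean(dat$dv)
norm_means  <- aggregate(dv_norm ~ gender + trial, data = dat, FUN = mean)

means <- merge(raw_means, norm_means)
means
```

Because every subject is shifted to the grand mean, the large male/female difference in dv disappears from dv_norm, and only the trial effect remains.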
Helper functions
The summarySE function is also defined on this page. If you are only working with between-subjects variables, that is the only function you will need in your code. If you have within-subjects variables and want to adjust the error bars so that inter-subject variability is removed as in Loftus and Masson (1994), then the other two functions, normDataWithin and summarySEwithin, must also be added to your code; summarySEwithin will then be the function that you call.
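As a compact summary of what the three helpers compute, the quantities can be written as follows. The notation is introduced here and is not from the original page: $c$ is conf.interval, $N$ is the number of observations in a group, $y_{ij}$ is the measurement for subject $i$ in within-subject condition $j$, and $J$ is the product of the numbers of levels of the within-subject variables.

```latex
\begin{align*}
\mathrm{se} &= \frac{\mathrm{sd}}{\sqrt{N}}, &
\mathrm{ci} &= t_{(1+c)/2,\;N-1} \cdot \mathrm{se}, \\
y'_{ij} &= y_{ij} - \bar{y}_{i\cdot} + \bar{y}_{\cdot\cdot}, &
\mathrm{sd}_{\text{within}} &= \mathrm{sd}(y') \, \sqrt{\frac{J}{J-1}}.
\end{align*}
```

The first line is what summarySE does for each group; the second line is the normalization in normDataWithin followed by the Morey (2008) correction that summarySEwithin applies to sd, se and ci alike.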
    ## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
    ##   data: a data frame.
    ##   measurevar: the name of a column that contains the variable to be summarized
    ##   groupvars: a vector containing names of columns that contain grouping variables
    ##   na.rm: a boolean that indicates whether to ignore NA's
    ##   conf.interval: the percent range of the confidence interval (default is 95%)
    summarySE <- function(data = NULL, measurevar, groupvars = NULL, na.rm = FALSE,
                          conf.interval = .95, .drop = TRUE) {
        library(plyr)

        # New version of length which can handle NA's: if na.rm==T, don't count them
        length2 <- function(x, na.rm = FALSE) {
            if (na.rm) sum(!is.na(x))
            else       length(x)
        }

        # This does the summary. For each group's data frame, return a vector with
        # N, mean, and sd
        datac <- ddply(data, groupvars, .drop = .drop,
            .fun = function(xx, col) {
                c(N    = length2(xx[[col]], na.rm = na.rm),
                  mean = mean(xx[[col]], na.rm = na.rm),
                  sd   = sd(xx[[col]], na.rm = na.rm)
                )
            },
            measurevar
        )

        # Rename the "mean" column
        datac <- rename(datac, c("mean" = measurevar))

        datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

        # Confidence interval multiplier for standard error
        # Calculate t-statistic for confidence interval:
        # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
        ciMult <- qt(conf.interval / 2 + .5, datac$N - 1)
        datac$ci <- datac$se * ciMult

        return(datac)
    }
    ## Norms the data within specified groups in a data frame; it normalizes each
    ## subject (identified by idvar) so that they have the same mean, within each group
    ## specified by betweenvars.
    ##   data: a data frame.
    ##   idvar: the name of a column that identifies each subject (or matched subjects)
    ##   measurevar: the name of a column that contains the variable to be summarized
    ##   betweenvars: a vector containing names of columns that are between-subjects variables
    ##   na.rm: a boolean that indicates whether to ignore NA's
    normDataWithin <- function(data = NULL, idvar, measurevar, betweenvars = NULL,
                               na.rm = FALSE, .drop = TRUE) {
        library(plyr)

        # Measure var on left, idvar + between vars on right of formula.
        data.subjMean <- ddply(data, c(idvar, betweenvars), .drop = .drop,
            .fun = function(xx, col, na.rm) {
                c(subjMean = mean(xx[, col], na.rm = na.rm))
            },
            measurevar,
            na.rm
        )

        # Put the subject means with original data
        data <- merge(data, data.subjMean)

        # Get the normalized data in a new column
        measureNormedVar <- paste(measurevar, "_norm", sep = "")
        data[, measureNormedVar] <- data[, measurevar] - data[, "subjMean"] +
                                    mean(data[, measurevar], na.rm = na.rm)

        # Remove this subject mean column
        data$subjMean <- NULL

        return(data)
    }
    ## Summarizes data, handling within-subjects variables by removing inter-subject variability.
    ## It will still work if there are no within-S variables.
    ## Gives count, un-normed mean, normed mean (with same between-group mean),
    ##   standard deviation, standard error of the mean, and confidence interval.
    ## If there are within-subject variables, calculate adjusted values using method from Morey (2008).
    ##   data: a data frame.
    ##   measurevar: the name of a column that contains the variable to be summarized
    ##   betweenvars: a vector containing names of columns that are between-subjects variables
    ##   withinvars: a vector containing names of columns that are within-subjects variables
    ##   idvar: the name of a column that identifies each subject (or matched subjects)
    ##   na.rm: a boolean that indicates whether to ignore NA's
    ##   conf.interval: the percent range of the confidence interval (default is 95%)
    summarySEwithin <- function(data = NULL, measurevar, betweenvars = NULL, withinvars = NULL,
                                idvar = NULL, na.rm = FALSE, conf.interval = .95, .drop = TRUE) {

        # Ensure that the betweenvars and withinvars are factors
        factorvars <- vapply(data[, c(betweenvars, withinvars), drop = FALSE],
                             FUN = is.factor, FUN.VALUE = logical(1))

        if (!all(factorvars)) {
            nonfactorvars <- names(factorvars)[!factorvars]
            message("Automatically converting the following non-factors to factors: ",
                    paste(nonfactorvars, collapse = ", "))
            data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
        }

        # Get the means from the un-normed data
        datac <- summarySE(data, measurevar, groupvars = c(betweenvars, withinvars),
                           na.rm = na.rm, conf.interval = conf.interval, .drop = .drop)

        # Drop all the unused columns (these will be calculated with normed data)
        datac$sd <- NULL
        datac$se <- NULL
        datac$ci <- NULL

        # Norm each subject's data
        ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop = .drop)

        # This is the name of the new column
        measurevar_n <- paste(measurevar, "_norm", sep = "")

        # Collapse the normed data - now we can treat between and within vars the same
        ndatac <- summarySE(ndata, measurevar_n, groupvars = c(betweenvars, withinvars),
                            na.rm = na.rm, conf.interval = conf.interval, .drop = .drop)

        # Apply correction from Morey (2008) to the standard error and confidence interval
        #  Get the product of the number of conditions of within-S variables
        nWithinGroups    <- prod(vapply(ndatac[, withinvars, drop = FALSE], FUN = nlevels,
                                        FUN.VALUE = numeric(1)))
        correctionFactor <- sqrt(nWithinGroups / (nWithinGroups - 1))

        # Apply the correction factor
        ndatac$sd <- ndatac$sd * correctionFactor
        ndatac$se <- ndatac$se * correctionFactor
        ndatac$ci <- ndatac$ci * correctionFactor

        # Combine the un-normed means with the normed results
        merge(datac, ndatac)
    }
Source: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/
Posted by: maringois1977.blogspot.com