## Statistical Transformations

- Before we discuss transformation in graphs, we will take a quick look at a
**bar chart**.
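A minimal sketch of such a bar chart, assuming the ggplot2 package (which ships the `diamonds` dataset) is installed:

```r
library(ggplot2)

# Bar chart of diamonds grouped by cut; only the x aesthetic is supplied.
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p
```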

- The above bar chart displays the total number of diamonds in the `diamonds` dataset, grouped by `cut`. `geom_bar` only requires one aesthetic, that is `x`.
- On the y-axis, it displays count. But count is not a variable in `diamonds`!

- Many plots (like scatterplots) plot the raw values of your dataset.
- Other graphs (like bar charts, histograms, etc.) calculate new values to plot:
  - Bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
  - Smoothers fit a model to your data and then plot predictions from the model.
  - Boxplots compute a robust summary of the distribution and then display a specially formatted box.

- The algorithm used to calculate new values for a graph is called a
**stat**, short for statistical transformation.
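The computed values can be inspected directly. The sketch below is one way to do that, assuming ggplot2's `layer_data()` helper, which returns the data a layer draws after its stat has run. The `mpg` dataset and the linear smoother are choices made here for a small, fast example; they are not part of the original notes.

```r
library(ggplot2)

# `count` is not a column of diamonds; stat_count() computes it per cut.
"count" %in% names(diamonds)
bars <- layer_data(ggplot(diamonds) + geom_bar(aes(x = cut)))
bars[, c("x", "count")]

# Histogram: stat_bin() bins hwy and counts the points in each bin.
bins <- layer_data(ggplot(mpg) + geom_histogram(aes(x = hwy), bins = 10))
bins[, c("xmin", "xmax", "count")]

# Smoother: stat_smooth() fits a model (a linear one here) and returns predictions.
fit <- layer_data(
  ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_smooth(method = "lm", formula = y ~ x)
)
head(fit[, c("x", "y", "ymin", "ymax")])

# Boxplot: stat_boxplot() computes a quartile-based summary per class.
boxes <- layer_data(ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot())
boxes[, c("ymin", "lower", "middle", "upper", "ymax")]
```

In each case the columns printed are produced by the stat, not read from the dataset.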

### `geom` and `stat` functions

- You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`.

- This works because
**every geom has a default stat; and every stat has a default geom**.
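A short sketch of that equivalence, assuming ggplot2 is loaded. The `layer_data()` comparison at the end is an addition made here to show that both layers compute the same counts:

```r
library(ggplot2)

# geom_bar() uses stat_count() by default ...
p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# ... and stat_count() uses the bar geom by default, so the plots match.
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))
p_stat

# Both layers compute the same counts.
all.equal(layer_data(p_geom)$count, layer_data(p_stat)$count)
```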

### Use a specific stat for graphing

There are three reasons you might need to use a stat explicitly:

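- You might want to override the default stat.
  - For example, you can change the stat of `geom_bar()` from count (the default) to identity, which maps the height of the bars to the raw values of a `y` variable: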

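```
library(tidyverse)  # provides tribble() and ggplot2

# Pre-computed frequencies: one row per cut, so no counting is needed.
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

# stat = "identity" plots the freq values as they are instead of counting rows.
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
```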

- You might want to override the default mapping from transformed variables to aesthetics.
  - For example, you might want to display a bar chart of proportion, rather than count.
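A sketch of a proportion bar chart, assuming ggplot2 3.3.0 or later, where `after_stat()` is available (older releases used the `..prop..` notation instead). `prop` is a variable computed by `stat_count()`:

```r
library(ggplot2)

# Map the computed `prop` variable to y instead of the default `count`.
# group = 1 treats all bars as one group; without it every bar is its own
# group and each proportion is 1.
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
p_prop

# The proportions across all cuts add up to 1.
sum(layer_data(p_prop)$prop)
```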

- You might want to draw greater attention to the statistical transformation in your code.
  - For example, you might use `stat_summary()`, which summarises the y values for each unique x value, to draw attention to the summary that you're computing.
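A sketch of such a summary, assuming ggplot2 3.3.0 or later for the `fun`, `fun.min`, and `fun.max` arguments. The choice of `depth` and of min, max, and median as the summary functions is illustrative:

```r
library(ggplot2)

# For each cut, summarise depth by its minimum, median, and maximum.
# stat_summary() draws the result with its default geom (a point and range).
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
p_summary
```

Naming the stat in the call makes it explicit that the plot shows a summary rather than the raw `depth` values.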


## References

*R for Data Science*, by Garrett Grolemund, Hadley Wickham.