How to Perform Descriptive Statistics in R
Descriptive statistics in R is the foundation of every data analysis workflow — and one of the most frequently tested skills in college-level statistics, data science, and research methods courses. R gives you a powerful, flexible toolkit to summarize, describe, and visualize data far beyond what spreadsheets can handle, and mastering it puts you ahead in academia and industry alike.
This guide walks you through everything you need to perform descriptive statistics in R: from installing packages and loading data, to computing mean, median, mode, variance, standard deviation, IQR, skewness, and kurtosis, all the way to grouped summaries and professional-quality visualizations with ggplot2. Every major step includes working R code you can run immediately.
You’ll find two reference tables, step-by-step walkthroughs, annotated code blocks, and practical tips for avoiding the most common R errors that trip up students. We cover base R functions, the psych package, the dplyr package, and the e1071 package — the four pillars of descriptive analysis in R that every serious analyst should know.
Whether you’re working on a university statistics assignment, a data science project, or your first academic research paper, this guide gives you the code, the concepts, and the context to produce accurate, publication-ready descriptive statistics in R.
Introduction
How to Perform Descriptive Statistics in R: Where Every Analysis Begins
Descriptive statistics in R is the first thing you do with any dataset — before models, before hypothesis tests, before predictions. You describe what you have. What’s the center of the data? How spread out is it? Does it lean left or right? Are there outliers lurking in the tails? R handles all of this with remarkable elegance, and the functions involved are genuinely not hard to learn once you understand what each one measures.
R is the language of choice for statistical computing across universities, research institutions, and data-driven industries in the United States and United Kingdom. The Comprehensive R Archive Network (CRAN), maintained by the R Foundation for Statistical Computing in Vienna, hosts over 20,000 packages — and dozens of them are dedicated to descriptive analysis. You don’t need all of them. You need the right few. This guide will teach you exactly which ones, and how to use them. Statistics assignment help often starts with this exact skill set — building a clean, complete descriptive summary of your data.
- 6: statistics returned by summary() for every numeric variable, the fastest overview in R
- 13: statistics generated by describe() from the psych package in a single line
- 4: core R packages every statistics student must know (base R, psych, dplyr, ggplot2)
What Is Descriptive Statistics?
Descriptive statistics describes the basic features of a dataset using quantitative summaries. It does not make predictions or test hypotheses — that is inferential statistics. Descriptive statistics answers the question: what does this data look like? It condenses potentially thousands of observations into a handful of interpretable numbers and charts. The three pillars are central tendency (where is the middle of the data?), variability (how spread out is it?), and distribution shape (is it symmetric, skewed, or heavy-tailed?). Understanding the difference between descriptive and inferential statistics is essential before running any analysis in R.
In R, descriptive statistics is typically the first section of any analysis script. You run it immediately after loading and cleaning your data. It tells you if your data makes sense — if the ranges are plausible, if there are suspicious spikes, if important variables are heavily skewed in ways that might violate the assumptions of your planned statistical tests. For college students working on research papers or lab reports, understanding your descriptive statistics is not optional: it is a required section of nearly every empirical methods assignment. Finding the right dataset for your statistical project is the natural precursor to running these analyses.
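To make the three pillars concrete, here is a minimal base R sketch using the built-in mtcars dataset (introduced formally later in this guide) — one measure from each pillar:

```r
# The three pillars of descriptive statistics on a single variable
data(mtcars)
x <- mtcars$mpg

# 1. Central tendency: where is the middle?
mean(x)     # 20.09
median(x)   # 19.2

# 2. Variability: how spread out is it?
sd(x)       # 6.03
IQR(x)      # 7.375

# 3. Shape: a mean above the median hints at right skew
mean(x) > median(x)   # TRUE
```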
Why R for Descriptive Statistics?
Excel computes means. SPSS generates tables. Python with pandas can do most of what R does. So why use R specifically? Because R was built by statisticians, for statisticians. Its syntax is tightly aligned with statistical thinking. The psych package, developed by William Revelle at Northwestern University, produces a descriptive table in a single line that would take fifteen Excel formulas to replicate. The ggplot2 package, created by Hadley Wickham (now Chief Scientist at Posit PBC, formerly RStudio), produces publication-quality visualization with a grammar of graphics that makes statistical communication genuinely beautiful. R is also the language most commonly used in peer-reviewed statistics and social science journals — learning it now pays dividends throughout your academic career.
Quick Orientation: If you’re brand new to R, the two most important resources are the official documentation at r-project.org and the free online textbook R for Data Science by Hadley Wickham and Garrett Grolemund. For descriptive statistics specifically, the psych package vignette on CRAN is the most thorough technical reference available.
Setting Up R
Installing R, RStudio, and the Packages You Need
Before performing descriptive statistics in R, your environment needs to be ready. This section covers exactly what to install and why. You need two downloads: R itself (the language engine) and RStudio (the interface that makes working with R human-friendly). Both are free and open source.
1
Download and Install R
Go to CRAN and download the latest version of R for your operating system (Windows, macOS, or Linux). At the time of writing, R 4.4.x is the current stable release; check CRAN for the newest version. Install with all defaults.
2
Download and Install RStudio Desktop
Go to Posit’s website and download RStudio Desktop (free version). RStudio gives you a script editor, a console, an environment pane showing your objects, and a plot viewer — all in one window.
3
Install Required Packages
Open RStudio and run the following in the Console. You only need to do this once per machine. The install.packages() function downloads packages from CRAN.
# Install all packages needed for this guide
install.packages(c(
  "psych",    # comprehensive describe() function
  "dplyr",    # grouped summaries with group_by() + summarise()
  "ggplot2",  # visualization
  "e1071",    # skewness() and kurtosis()
  "skimr",    # clean skim() summary output
  "Hmisc"     # additional describe() variant
))
4
Load Packages at the Start of Every Script
Once installed, you load packages with library() at the top of each new R script. Installing is permanent (once); loading is per-session (every time you open R).
# Load packages at the top of your analysis script
library(psych)
library(dplyr)
library(ggplot2)
library(e1071)
library(skimr)
Loading Your Dataset
R comes with built-in datasets that are perfect for practicing descriptive statistics. The mtcars dataset (Motor Trend Car Road Tests, 1974) and the iris dataset (Edgar Anderson’s iris measurements) are the most commonly used. For your own data, the most common import functions are shown below.
# Use a built-in dataset (no import needed)
data(mtcars)
head(mtcars)   # view first 6 rows
str(mtcars)    # structure: variable types, dimensions
dim(mtcars)    # rows x columns: [1] 32 11

# Import a CSV file from your computer
my_data <- read.csv("path/to/your/file.csv", header = TRUE)

# Import an Excel file (requires readxl package)
library(readxl)
my_data <- read_excel("path/to/your/file.xlsx", sheet = 1)
Tip: Always Inspect Your Data Before Running Descriptive Statistics
Run str(data) to check variable types. Run head(data) and tail(data) to spot obvious data entry errors. Run colSums(is.na(data)) to count missing values per column. Skipping this step and going straight to summary() is one of the most common mistakes beginners make — your statistics will be misleading if the wrong variables are coded as numeric or if unexpected NAs exist. Understanding qualitative vs quantitative data is essential here: descriptive statistics functions only make sense on numeric (quantitative) variables.
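The inspection steps named above can be run as one short base R block (shown here on the built-in mtcars data for illustration):

```r
# Standard pre-analysis inspection
data(mtcars)
str(mtcars)              # variable types and dimensions
head(mtcars)             # first 6 rows
tail(mtcars)             # last 6 rows
colSums(is.na(mtcars))   # missing values per column (all 0 in mtcars)
```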
The summary() Function
The summary() Function: Your First Stop in Descriptive Statistics in R
The fastest way to perform descriptive statistics in R on an entire dataset is a single command: summary(). This base R function requires no packages, no setup beyond loading your data, and returns a structured overview of every variable in your data frame in under a second. For any statistics assignment, this is always the first thing you run.
# summary() on the entire mtcars dataset
summary(mtcars)
mpg cyl disp
Min. :10.40 Min. :4.000 Min. : 71.1
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
Median :19.20 Median :6.000 Median :196.3
Mean :20.09 Mean :6.188 Mean :230.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
Max. :33.90 Max. :8.000 Max. :472.0
For each numeric variable, summary() returns six statistics: minimum, first quartile (Q1), median, mean, third quartile (Q3), and maximum. When the mean and median differ substantially, this signals skewness — a key thing to flag in your descriptive analysis write-up. Notice that when mean > median (as in the disp variable above), the distribution is positively skewed. This is the kind of immediate insight summary() gives you.
Running summary() on a Single Variable
You don’t always need the full dataset summary. For a single variable — say, miles per gallon in the mtcars dataset — extract it using the $ operator.
# Summary for a single variable
summary(mtcars$mpg)

# Output:
#  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
# 10.40  15.43  19.20 20.09  22.80  33.90
This is clean and immediately interpretable. The median fuel efficiency is 19.2 mpg while the mean is 20.09 — a modest difference suggesting mild positive skewness. The interquartile range (IQR = 22.80 − 15.43 = 7.37) tells you the spread of the middle 50% of cars in the dataset. When writing up descriptive statistics for an assignment, these are the exact numbers you would report in a results table. For more on how variance and standard deviation complement this picture, see expected values and variance in statistics.
What summary() Does Not Tell You
There is no standard deviation, no skewness, no kurtosis, and no count of non-missing observations in base summary() output. For a college assignment or research paper that requires a complete descriptive statistics table, you need to supplement summary() with additional functions — or use the psych package described in the next section. The official R introduction manual on CRAN provides a detailed reference for the summary function and related base R tools.
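As a base R sketch, here is one way to supplement summary() with the pieces it omits (skewness and kurtosis functions come later in this guide, so this block sticks to what base R provides):

```r
# Fill in what summary() leaves out, using base R only
data(mtcars)
x <- mtcars$mpg

summary(x)       # min, Q1, median, mean, Q3, max
sd(x)            # standard deviation: 6.03
sum(!is.na(x))   # count of non-missing observations: 32
```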
Central Tendency
Measures of Central Tendency in R: Mean, Median, and Mode
Central tendency describes the center of a distribution. It answers: where is the “typical” value in this dataset? In R, you compute the three measures of central tendency — mean, median, and mode — with simple functions. Each tells you something different, and knowing when to use which one is as important as knowing how to compute it.
Mean in R: mean()
The arithmetic mean is the sum of all values divided by the count of values. It is the most widely used measure of central tendency but is sensitive to outliers. In R, mean() computes it directly. The critical argument to remember is na.rm = TRUE, which tells R to ignore missing values (NA) rather than returning NA for the whole result.
# Mean of a single variable
mean(mtcars$mpg)
# [1] 20.09062

# Mean ignoring missing values
mean(mtcars$mpg, na.rm = TRUE)

# Mean for all numeric columns in a data frame
sapply(mtcars, mean)

# Trimmed mean: removes top/bottom 10% before computing
mean(mtcars$mpg, trim = 0.10)
The trimmed mean is worth knowing — it is a compromise between the regular mean and the median, and it is more robust to outliers without completely ignoring magnitude the way the median does. In research papers analyzing income distributions or test scores, a 10% or 20% trimmed mean often better represents the “typical” case than either the raw mean or the median alone.
Median in R: median()
The median is the middle value when data is ordered from smallest to largest. For an even number of observations, R averages the two middle values. The median is the preferred measure of central tendency when your data is skewed or contains outliers — house prices, income data, and reaction times are classic examples where the median is more representative than the mean.
# Median of a single variable
median(mtcars$mpg)
# [1] 19.2

# Median with missing value handling
median(mtcars$mpg, na.rm = TRUE)

# Median for all columns
sapply(mtcars, median)
Notice that the mean of mpg (20.09) is slightly higher than the median (19.2). This difference tells you the distribution is mildly right-skewed — a handful of high-mpg cars (notably the Toyota Corolla at 33.9 mpg and Fiat 128 at 32.4 mpg) are pulling the mean upward. This kind of interpretation is exactly what your professor is looking for in a descriptive statistics write-up.
Mode in R: No Built-in Function — Write Your Own
R’s built-in mode() function returns the storage type of an object (e.g., “numeric”), not the statistical mode. This is a notorious source of confusion for beginners. To find the most frequently occurring value — the statistical mode — you need either a custom function or the DescTools package.
# Custom function for statistical mode
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Apply to cylinder variable (categorical-style numeric)
get_mode(mtcars$cyl)
# [1] 8  (most cars in the dataset have 8 cylinders)

# Using DescTools package (handles ties more explicitly)
install.packages("DescTools")
library(DescTools)
Mode(mtcars$cyl)
# [1] 8
# attr(,"freq")
# [1] 14  (14 out of 32 cars have 8 cylinders)
Common Mistake: Never write mode(mtcars$mpg) expecting the statistical mode. R will return "numeric" — the data type, not the most frequent value. This trips up almost every R beginner. Use the custom get_mode() function above or install DescTools for a proper Mode() function. Avoiding common analytical errors like this is the difference between a solid result and a misleading one.
When to Use Mean vs. Median vs. Mode
Use Mean When…
- Data is roughly normally distributed (symmetric)
- No significant outliers are present
- You need to use the result in further calculations (e.g., variance)
- Examples: height, weight, test scores in large samples
Use Median When…
- Data is skewed (income, house prices, reaction times)
- Outliers are present and you don’t want them to dominate
- Your variable is ordinal rather than truly continuous
- Examples: salary distributions, property values, clinical measurements
The mode is most useful for categorical or discrete data — the most common response category in a survey, the most common number of children per household, or the most common diagnostic code in a clinical dataset. For continuous data, mode is rarely reported in academic descriptive statistics tables. Computing mean, median, and mode in Excel follows similar logic, so if you are transitioning to R from Excel, the conceptual framework is the same — the R syntax is just more powerful and reproducible.
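For categorical-style data, a plain frequency table often communicates more than the mode alone. A minimal base R sketch using the cylinder variable from mtcars:

```r
# Frequency table: counts per category, then the most frequent one
data(mtcars)
freq <- table(mtcars$cyl)
freq
#  4  6  8
# 11  7 14

names(freq)[which.max(freq)]   # the mode as a label: "8"
```

This is usually how the mode is reported for survey or count data: alongside its frequency, not as a bare number.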
Measures of Variability
Measures of Variability in R: Variance, Standard Deviation, Range, and IQR
Knowing where the center of your data is tells you only half the story. Variability tells you how spread out the values are around that center. Two datasets can have identical means and medians but wildly different spreads. In descriptive statistics in R, you report variability using variance, standard deviation, range, and interquartile range — each capturing a different aspect of data spread. Understanding the relationship between expected values and variance deepens your theoretical grasp of why these measures matter.
Variance: var()
Variance measures the average squared deviation from the mean. R’s var() function computes the sample variance (divides by n−1, applying Bessel’s correction), not the population variance (divides by n). Unless you are working with a complete population, always use var() — not a manual n-denominator formula.
# Sample variance
var(mtcars$mpg)
# [1] 36.3241

# Variance for all columns
sapply(mtcars, var)

# Variance with NA handling
var(mtcars$mpg, na.rm = TRUE)
Variance is expressed in squared units — for mpg, that’s squared miles-per-gallon, which is not directly interpretable. That is why standard deviation is usually reported instead. Variance is primarily used as a building block in downstream analyses: ANOVA, regression, and factor analysis all rely on variance decomposition.
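To see Bessel's correction in action, you can reproduce var() by hand; this short base R check confirms the n - 1 denominator:

```r
# Verify that var() divides by n - 1, not n
data(mtcars)
x <- mtcars$mpg
n <- length(x)

manual_sample_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(var(x), manual_sample_var)   # TRUE

# The population formula (divide by n) gives a smaller number
sum((x - mean(x))^2) / n
```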
Standard Deviation: sd()
The standard deviation is the square root of variance — it rescales variance back to the original units of the variable, making it interpretable. A standard deviation of 6.03 mpg (the SD of mpg in mtcars) means that, on average, cars in this dataset deviate from the mean fuel efficiency by about 6 miles per gallon. This is the single most commonly reported measure of variability in academic research papers.
# Standard deviation
sd(mtcars$mpg)
# [1] 6.026948

# Coefficient of Variation (CV): SD as % of mean
# Useful for comparing variability across variables on different scales
cv <- (sd(mtcars$mpg) / mean(mtcars$mpg)) * 100
round(cv, 2)
# [1] 30  (mpg varies by about 30% of its mean)
The Coefficient of Variation (CV) is worth including in your descriptive tables when you need to compare variability across variables measured on different scales. A CV of about 30% for mpg tells you there is moderate relative variability in fuel efficiency: the typical deviation is roughly a third of the mean. For variables like income or reaction time where the scale varies enormously, CV is far more informative than raw standard deviation. The foundational treatment of variance in Lehmann’s statistical estimation theory underpins why sample variance uses n−1 — worth reading if you are in a formal statistics course.
Range: range() and diff(range())
The range is the simplest spread measure: the distance from minimum to maximum. R’s range() returns both the minimum and maximum as a vector; wrapping it in diff() gives you the single range value.
# range() returns c(min, max)
range(mtcars$mpg)
# [1] 10.4 33.9

# diff(range()) returns the single range value
diff(range(mtcars$mpg))
# [1] 23.5

# min() and max() separately
min(mtcars$mpg)   # [1] 10.4
max(mtcars$mpg)   # [1] 33.9
Interquartile Range: IQR()
The IQR (Interquartile Range) measures the spread of the middle 50% of data (Q3 − Q1). It is robust to outliers in a way that range and standard deviation are not. When your data contains extreme values — or when you’re reporting on skewed distributions — the IQR is the preferred measure of spread. It is also the basis for identifying outliers in box plots: any observation more than 1.5 × IQR above Q3 or below Q1 is flagged as a potential outlier.
# IQR
IQR(mtcars$mpg)
# [1] 7.375

# Quartiles using quantile()
quantile(mtcars$mpg)
#    0%   25%   50%   75%  100%
# 10.40 15.43 19.20 22.80 33.90

# Custom percentiles (e.g., 10th, 25th, 75th, 90th)
quantile(mtcars$mpg, probs = c(0.10, 0.25, 0.75, 0.90))
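The 1.5 × IQR outlier rule mentioned above translates directly into code. A minimal base R sketch for mpg:

```r
# 1.5 x IQR fences, as used by box plots to flag outliers
data(mtcars)
x <- mtcars$mpg

q <- quantile(x, c(0.25, 0.75))
fence_low  <- q[[1]] - 1.5 * IQR(x)   # 15.425 - 11.0625
fence_high <- q[[2]] + 1.5 * IQR(x)   # 22.800 + 11.0625

x[x < fence_low | x > fence_high]
# [1] 33.9  (the Toyota Corolla sits just above the upper fence)
```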
Rule of Thumb for Reporting Variability: In most academic assignments, you report standard deviation alongside the mean, and IQR alongside the median. These pairs make sense together: mean + SD assumes a roughly symmetric distribution; median + IQR is robust to skew. Mixing them (e.g., reporting mean + IQR) creates a table that is internally inconsistent and will be flagged by your professor. For more on the theoretical properties that underpin these choices, the classical statistical literature on robust estimation remains the authoritative reference.
The psych Package
The psych Package: describe() and describeBy() for Full Descriptive Tables
If you need a comprehensive descriptive statistics table in a single command — the kind of table that belongs in an academic paper’s methods section — the psych package’s describe() function is the most efficient tool available for descriptive statistics in R. Developed by William Revelle at Northwestern University, psych is the most-downloaded statistics-specific package on CRAN and is widely used across psychology, education, and social sciences research in the United States and United Kingdom.
library(psych)

# Full descriptive table for entire dataset
describe(mtcars)
vars n mean sd median trimmed mad min max range skew kurtosis se
mpg 1 32 20.09 6.03 19.20 19.70 5.41 10.4 33.90 23.50 0.61 -0.37 1.07
cyl 2 32 6.19 1.79 6.00 6.23 2.97 4.0 8.00 4.00 -0.17 -1.76 0.32
disp 3 32 230.72 123.94 196.30 222.52 140.48 71.1 472.00 400.90 0.38 -1.21 21.91
hp 4 32 146.69 68.56 123.00 141.19 77.10 52.0 335.00 283.00 0.73 -0.14 12.12
…
In a single line, describe() returns: n (sample size), mean, sd (standard deviation), median, trimmed (trimmed mean), mad (median absolute deviation), min, max, range, skew, kurtosis, and se (standard error). This is a complete descriptive statistics table. Copy it, format it, and you have the core of a publishable results section. For students producing research papers or lab reports, mastering academic research paper writing includes knowing how to present exactly this kind of table clearly and correctly.
Selecting Specific Variables with describe()
# describe() on a subset of variables
describe(mtcars[, c("mpg", "hp", "wt")])

# Or using dplyr select() for cleaner code
library(dplyr)
mtcars %>%
  select(mpg, hp, wt) %>%
  describe()
Grouped Descriptive Statistics: describeBy()
When your research question involves comparing groups — e.g., how do cars with 4, 6, and 8 cylinders differ in fuel efficiency? — describeBy() generates the full descriptive table separately for each level of a grouping variable. This is one of the most requested features in a statistics assignment and the function that makes psych worth installing for any grouped analysis.
# Descriptive statistics split by number of cylinders
describeBy(mtcars$mpg, group = mtcars$cyl)

# Full dataset split by group
describeBy(mtcars[, c("mpg", "hp", "wt")], group = mtcars$cyl)
Descriptive statistics by group
group: 4
vars n mean sd median trimmed mad min max range skew kurtosis se
mpg 1 11 26.66 4.51 26.0 26.36 6.67 21.4 33.9 12.5 0.26 -1.65 1.36
group: 6
vars n mean sd median trimmed mad min max range skew kurtosis se
mpg 1 7 19.74 1.45 19.7 19.74 1.93 17.8 21.4 3.6 -0.35 -1.46 0.55
group: 8
vars n mean sd median trimmed mad min max range skew kurtosis se
mpg 1 14 15.10 2.56 15.2 15.10 2.52 10.4 19.2 8.8 0.03 -0.79 0.68
The output tells a clear story: as cylinders increase from 4 to 8, mean fuel efficiency drops from 26.7 to 15.1 mpg. The standard deviation is highest for 4-cylinder cars (4.51), suggesting more variability within that group. This kind of grouped descriptive analysis is the precursor to an independent samples t-test or one-way ANOVA — confirming you understand your data before applying inferential tests. See t-test definitions and applications for the natural next step after describing grouped data.
Distribution Shape
Skewness and Kurtosis in R: Measuring Distribution Shape
A complete descriptive analysis in R always addresses distribution shape — not just where the data centers and how spread out it is, but how that spread is structured. Two distributions can share the same mean and standard deviation but have radically different shapes. Skewness captures asymmetry; kurtosis captures tail heaviness. Both are essential for evaluating normality before applying parametric tests like t-tests, ANOVA, or Pearson correlation. Understanding normal distribution, kurtosis, and skewness applications is foundational to this step.
Skewness: e1071 and psych
Skewness quantifies the degree to which a distribution leans left (negative skew) or right (positive skew). A perfectly normal distribution has skewness = 0. As a rule of thumb in the academic literature: skewness between −0.5 and +0.5 is approximately symmetric; between −1 and −0.5 or +0.5 and +1 is moderately skewed; beyond ±1 is highly skewed and warrants attention.
library(e1071)

# Skewness of mpg variable
skewness(mtcars$mpg)
# [1] 0.6106550  (mild positive/right skew)

# skew is also included in psych's describe() output
# For all variables at once:
sapply(mtcars, skewness)

# Interpretation helper
interpret_skew <- function(sk) {
  if (abs(sk) < 0.5) "Approximately symmetric"
  else if (abs(sk) < 1) "Moderately skewed"
  else "Highly skewed — consider transformation"
}
interpret_skew(skewness(mtcars$mpg))
# [1] "Moderately skewed"
Kurtosis: Measuring Tail Heaviness
Kurtosis measures the heaviness of distribution tails relative to a normal distribution. The e1071 package returns excess kurtosis (also called Fisher’s kurtosis), where the normal distribution has excess kurtosis = 0. A value greater than 0 (leptokurtic) means heavier-than-normal tails — more extreme values than expected. A value less than 0 (platykurtic) means lighter tails. The mtcars mpg variable has excess kurtosis of −0.37, indicating slightly lighter tails than normal.
# Kurtosis (excess kurtosis, normal = 0)
kurtosis(mtcars$mpg)
# [1] -0.3718876  (slightly platykurtic)

# All variables
sapply(mtcars, kurtosis)

# Note: psych's describe() also returns skew and kurtosis
# psych uses the same excess kurtosis (Fisher) convention as e1071
Testing for Normality with shapiro.test()
Once you have skewness and kurtosis, the natural next step is a formal normality test. The Shapiro-Wilk test (shapiro.test()) is the most powerful test for normality for small to moderate samples (n < 5,000) and is built into base R — no packages required.
# Shapiro-Wilk normality test
shapiro.test(mtcars$mpg)

# Output:
#   Shapiro-Wilk normality test
# data:  mtcars$mpg
# W = 0.94778, p-value = 0.1229

# p > 0.05: fail to reject normality — mpg is approximately normal

# Test all numeric columns
sapply(mtcars, function(x) shapiro.test(x)$p.value)
A p-value above 0.05 means you cannot reject the null hypothesis of normality — your variable is approximately normally distributed, which supports the use of parametric tests. A p-value below 0.05 means the distribution departs significantly from normality. In that case, consider log-transformation (log(x)), square-root transformation (sqrt(x)), or switching to non-parametric tests.
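As a quick illustration of the transformation advice, take a right-skewed variable such as horsepower (hp) in mtcars and compare the Shapiro-Wilk p-value before and after a log transform:

```r
# Shapiro-Wilk before vs after a log transformation
data(mtcars)
x <- mtcars$hp   # horsepower: right-skewed (skew = 0.73 per describe())

shapiro.test(x)$p.value        # lower p-value: hp departs from normality
shapiro.test(log(x))$p.value   # higher p-value: log(hp) is closer to normal
```

The same comparison works with sqrt(x); report whichever transformation brings the distribution closest to normal, and say so explicitly in your write-up.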
Understanding p-values and significance levels is essential for interpreting the Shapiro-Wilk test correctly — do not simply report the test result without explaining what the p-value means in the context of your analysis decision.
Grouped Statistics with dplyr
Grouped Descriptive Statistics in R Using dplyr
One of the most common analytical tasks in college and research settings is comparing descriptive statistics across groups — by treatment vs. control, by gender, by school year, or by geographic region. The dplyr package from the tidyverse (developed by Posit PBC and the R community) provides the most elegant and readable approach to grouped descriptive statistics in R. Its pipe operator (%>%) makes code read almost like English, making it easier to understand, debug, and explain in a methods section.
library(dplyr)

# Grouped descriptive statistics: mpg by cylinder count
mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_mpg = round(mean(mpg), 2),
    sd_mpg = round(sd(mpg), 2),
    median_mpg = median(mpg),
    iqr_mpg = IQR(mpg),
    min_mpg = min(mpg),
    max_mpg = max(mpg)
  )
# A tibble: 3 × 8
cyl n mean_mpg sd_mpg median_mpg iqr_mpg min_mpg max_mpg
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 11 26.7 4.51 26 7.6 21.4 33.9
2 6 7 19.7 1.45 19.7 2.35 17.8 21.4
3 8 14 15.1 2.56 15.2 1.85 10.4 19.2
This output is already close to a publication-ready table. For a college assignment, you would copy these numbers into your results section and use them to support an argument about how engine size relates to fuel efficiency. Notice how variability differs across groups: 6-cylinder cars are remarkably consistent in fuel efficiency (SD = 1.45), while 4-cylinder cars vary far more widely (SD = 4.51), possibly because that group includes everything from small economy cars to sports models.
Adding Multiple Variables to a Group Summary
# Summarise multiple variables by group using across()
mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(c(mpg, hp, wt),
           list(mean = mean, sd = sd),
           .names = "{.col}_{.fn}")
  )
The across() helper function is one of the most useful modern dplyr additions — it applies a list of functions (here, mean and sd) to a list of columns simultaneously, generating neatly named output columns. This is the most efficient way to produce a multi-variable grouped descriptive table for a research paper or lab report. Social statistics coursework frequently requires exactly this kind of grouped summary, and dplyr’s pipe syntax makes the code both readable and reproducible.
Filtering and Summarising Simultaneously
# Descriptive stats for automatic transmission cars only
mtcars %>%
  filter(am == 0) %>%   # am = 0 is automatic
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg)
  )
Additional Packages
skimr and Hmisc: Alternative Approaches to Descriptive Statistics in R
Beyond summary() and psych’s describe(), two additional packages are worth knowing for descriptive statistics in R: skimr and Hmisc. Both produce rich summary output with a single command, and both handle missing data, factor variables, and character variables in ways that summary() alone does not.
skimr: Clean, Readable Summaries
The skimr package (developed by Elin Waring and colleagues at the rOpenSci community) produces a beautifully organized summary using skim(). It separates numeric and character variable summaries, includes a small histogram for each numeric variable, and clearly reports the count of missing values. For quick data exploration, skimr is arguably the most readable single-command overview available in R.
library(skimr)

# Full skim summary
skim(mtcars)

# skim() integrated with dplyr group_by
mtcars %>%
  group_by(cyl) %>%
  skim(mpg, hp)
Hmisc: The describe() Variant with Extended Detail
The Hmisc package (developed by Frank Harrell at Vanderbilt University) provides its own describe() function that returns — for each variable — the number of observations, missing values, unique values, five lowest and five highest values, and frequency counts for categorical variables. It is particularly strong for variables with many categories or for clinical data where the extreme observed values matter as much as the central tendency.
library(Hmisc)

# Note: Hmisc's describe() may mask psych's describe()
# If using both, call explicitly:
Hmisc::describe(mtcars$mpg)

# Output includes: n, missing, unique values, mean, quantiles,
# plus the 5 lowest and 5 highest observed values
Package Masking Warning: Both psych and Hmisc export a function named describe(). If you load both packages, whichever was loaded last will mask the other. To use a specific version, always call it explicitly: psych::describe(data) or Hmisc::describe(data). This is a common source of “unexpected output” bugs in R scripts that use multiple packages. Avoiding misuse of statistical tools includes being deliberate about which functions and packages you are calling.
Handling Missing Data
Handling Missing Values in Descriptive Statistics in R
Real-world datasets have missing values. It’s just how it is. When you perform descriptive statistics in R without handling NAs, most functions return NA for the entire result — a frustrating experience if you don’t know why it is happening and how to fix it. This section shows you the standard approaches.
```r
# Create a vector with missing values
x <- c(12, 15, NA, 18, 22, NA, 9)

# Without na.rm — returns NA
mean(x)
# [1] NA

# With na.rm = TRUE — ignores NAs
mean(x, na.rm = TRUE)
# [1] 15.2
sd(x, na.rm = TRUE)
# [1] 5.069517
median(x, na.rm = TRUE)
# [1] 15

# Count missing values per column in a data frame
colSums(is.na(mtcars))

# Remove all rows with any NA (complete case analysis)
clean_data <- na.omit(mtcars)

# Proportion of missing values per column
colMeans(is.na(mtcars)) * 100  # expressed as percentage
```
Multiple Imputation for Serious Missing Data Problems
For datasets where missing data is substantial (more than 5% of values in any variable), simply removing rows with na.omit() can introduce bias and reduce your effective sample size significantly. The mice package (Multivariate Imputation by Chained Equations) implements multiple imputation — the gold standard method for handling missing data in academic research. While a full treatment is beyond this guide’s scope, understanding that the option exists is important for any serious statistical project. The mice package documentation published in the Journal of Statistical Software by van Buuren and Groothuis-Oudshoorn is the authoritative reference.
```r
# Quick preview of mice for multiple imputation
install.packages("mice")
library(mice)

# Generate 5 imputed datasets
imputed <- mice(my_data_with_NAs, m = 5, method = "pmm")

# Extract one complete dataset
complete_data <- complete(imputed, 1)
```
Visualization with ggplot2
Visualizing Descriptive Statistics in R with ggplot2
Numbers tell you what; charts tell you why it matters. Visualizing your descriptive statistics in R with ggplot2 is not optional in a professional or academic context — it is the standard. The ggplot2 package uses a grammar of graphics framework where you build plots layer by layer: data, aesthetics (what variables map to x and y), geometry (what type of plot), and optional themes and labels. Once you understand this logic, every type of plot follows the same pattern.
Histogram: Visualizing Distribution Shape
```r
library(ggplot2)

# Histogram of mpg with custom bins, color, and theme
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 3, fill = "#2563EB", color = "white", alpha = 0.85) +
  geom_vline(aes(xintercept = mean(mpg)), color = "#AA4646",
             linewidth = 1.2, linetype = "dashed") +
  labs(title = "Distribution of Fuel Efficiency (MPG)",
       subtitle = "Dashed line = mean (20.09 mpg) | mtcars dataset",
       x = "Miles Per Gallon", y = "Count") +
  theme_minimal(base_size = 13)
```
Adding a vertical line at the mean (geom_vline()) on a histogram is standard practice in academic reports — it allows readers to immediately see where the center falls relative to the distribution shape. If the mean line sits noticeably off-center, you have visual evidence of skewness to discuss. Creating professional charts and graphs for assignments is a skill in itself, and ggplot2 is the tool that makes that skill achievable for any R user.
Box Plot: Five-Number Summary Visualized
```r
# Box plot: mpg by cylinder count (grouped comparison)
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.75, outlier.color = "#AA4646", outlier.size = 2.5) +
  scale_fill_manual(values = c("#93c5fd", "#2563EB", "#1a3480")) +
  labs(title = "Fuel Efficiency by Number of Cylinders",
       x = "Cylinders", y = "Miles Per Gallon", fill = "Cylinders") +
  theme_minimal(base_size = 13)
```
The box plot visualizes the five-number summary (min, Q1, median, Q3, max) as a box with whiskers, and flags outliers as individual points. This is the single most information-dense chart for displaying descriptive statistics visually — a professor reviewing a statistics assignment can immediately see the median, the IQR, the range, and any outliers for each group simultaneously.
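The five-number summary behind a box plot can also be computed directly at the console. The sketch below uses base R only; the 1.5 × IQR fences shown are the standard rule ggplot2 uses to flag outlier points:

```r
# Five-number summary (min, lower hinge, median, upper hinge, max)
fivenum(mtcars$mpg)

# Quartile-based version: 0%, 25%, 50%, 75%, 100%
quantile(mtcars$mpg)

# Outlier fences used by geom_boxplot(): Q1 - 1.5*IQR and Q3 + 1.5*IQR
q <- quantile(mtcars$mpg, c(0.25, 0.75))
fences <- c(lower = q[[1]] - 1.5 * IQR(mtcars$mpg),
            upper = q[[2]] + 1.5 * IQR(mtcars$mpg))
fences
```

Any observation beyond the fences is the kind of point the box plot draws individually, so this code is a quick numeric cross-check of what the chart displays.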
Density Plot: Visualizing Distribution Shape Smoothly
```r
# Overlapping density plots by group
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.45) +
  scale_fill_manual(values = c("#93c5fd", "#2563EB", "#1a3480")) +
  labs(title = "Density of MPG by Cylinder Count",
       x = "Miles Per Gallon", y = "Density", fill = "Cylinders") +
  theme_minimal(base_size = 13)
```
A Complete Descriptive Statistics Visualization Panel
```r
# Combine multiple plots using the patchwork package
install.packages("patchwork")
library(patchwork)

p1 <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(fill = "#2563EB", bins = 12) +
  theme_minimal() +
  labs(title = "Histogram")

p2 <- ggplot(mtcars, aes(x = "", y = mpg)) +
  geom_boxplot(fill = "#93c5fd") +
  theme_minimal() +
  labs(title = "Box Plot")

p3 <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "#AA4646", alpha = 0.6) +
  theme_minimal() +
  labs(title = "Density")

(p1 | p2 | p3) + plot_annotation(title = "Descriptive Statistics Panel: MPG")
```
Reference Tables
Quick Reference: Essential Functions for Descriptive Statistics in R
The following two tables summarize all the key functions and packages covered in this guide for performing descriptive statistics in R. Use these as a cheat sheet when writing your analysis scripts or preparing your assignment submissions.
| Statistic | R Function | Package | Key Argument | Notes |
|---|---|---|---|---|
| Mean | `mean(x)` | base R | `na.rm = TRUE` | Sensitive to outliers; use trimmed mean for robustness |
| Median | `median(x)` | base R | `na.rm = TRUE` | Preferred for skewed distributions |
| Mode | custom / `Mode(x)` | DescTools | — | base R `mode()` returns storage type, not stat mode |
| Variance | `var(x)` | base R | `na.rm = TRUE` | Sample variance (n−1); squared units |
| Standard Deviation | `sd(x)` | base R | `na.rm = TRUE` | Same units as variable; pair with mean in reports |
| Range | `diff(range(x))` | base R | `na.rm = TRUE` | Affected by outliers; use IQR for robustness |
| IQR | `IQR(x)` | base R | `na.rm = TRUE` | Robust to outliers; pair with median in reports |
| Quantiles | `quantile(x)` | base R | `probs = c(...)` | Default returns 0%, 25%, 50%, 75%, 100% |
| Skewness | `skewness(x)` | e1071 / psych | — | 0 = symmetric; >0 right skew; <0 left skew |
| Kurtosis | `kurtosis(x)` | e1071 / psych | — | Excess kurtosis; 0 = normal; >0 heavy tails |
| Full summary | `summary(data)` | base R | — | Min, Q1, median, mean, Q3, max for all numeric variables |
| Complete descriptive table | `describe(data)` | psych | — | 13 statistics per variable including skew, kurtosis, SE |
| Grouped descriptive | `describeBy(data, group)` | psych | `group =` | Full psych table split by grouping variable |
| Grouped summary | `group_by() %>% summarise()` | dplyr | `across()` | Most flexible approach for custom multi-variable tables |
| Normality test | `shapiro.test(x)` | base R | — | p > 0.05: consistent with normality; requires 3 ≤ n ≤ 5,000 |
| Package | Developed By | Key Functions | Best For | Install Command |
|---|---|---|---|---|
| base R | R Foundation / CRAN | `mean()`, `sd()`, `var()`, `summary()`, `quantile()`, `shapiro.test()` | Core descriptive stats; no install required | Pre-installed |
| psych | William Revelle, Northwestern University | `describe()`, `describeBy()`, `pairs.panels()` | Complete academic descriptive tables; grouped stats | `install.packages("psych")` |
| dplyr | Hadley Wickham, Posit PBC | `group_by()`, `summarise()`, `filter()`, `select()`, `across()` | Flexible grouped summaries; data manipulation | `install.packages("dplyr")` |
| ggplot2 | Hadley Wickham, Posit PBC | `geom_histogram()`, `geom_boxplot()`, `geom_density()` | Publication-quality visualization of distributions | `install.packages("ggplot2")` |
| e1071 | TU Wien / CRAN | `skewness()`, `kurtosis()` | Distribution shape measures for normality assessment | `install.packages("e1071")` |
| skimr | rOpenSci community | `skim()` | Quick, clean exploratory data overview with mini-histograms | `install.packages("skimr")` |
| Hmisc | Frank Harrell, Vanderbilt University | `describe()` | Clinical data; extreme value reporting; many categories | `install.packages("Hmisc")` |
| DescTools | Andri Signorell, CRAN | `Mode()`, `Desc()`, `MeanCI()` | Statistical mode; confidence intervals for descriptives | `install.packages("DescTools")` |
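The first table mentions the trimmed mean as a robust alternative to the ordinary mean. Base R supports it through the trim argument of mean(); in the sketch below, the artificial outlier and the 10% trim level are illustrative choices:

```r
# Append one artificial extreme value to mtcars' mpg column
x <- c(mtcars$mpg, 95)

mean(x)               # pulled upward by the outlier
mean(x, trim = 0.1)   # drops the top and bottom 10% of values before averaging
median(x)             # essentially unaffected by a single extreme value
```

Comparing the three results side by side is a quick way to demonstrate, in an assignment, why robust statistics matter when outliers are present.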
Reporting Your Results
How to Report Descriptive Statistics in R for Academic Assignments
Computing the numbers is only half the task. The other half is presenting them clearly in a way your professor can read, understand, and evaluate. For descriptive statistics in R assignments, reporting conventions follow the norms of your discipline — APA style for psychology and social science, AMA for health sciences, and field-specific conventions for econometrics and education research. Here are the key rules that apply across most academic contexts.
Reporting Central Tendency and Variability
Always report central tendency and variability together. The conventional format in APA style is: M = [value], SD = [value] for approximately normal variables, and Mdn = [value], IQR = [value] for skewed variables. Never report just the mean without its standard deviation, or just the median without the IQR — a center without a spread measure is incomplete and will be marked down in any rigorous assignment. Reporting statistical results transparently is a professional and ethical obligation in academic work, not just a formatting convention.
Example write-up: “Fuel efficiency ranged from 10.4 to 33.9 mpg (M = 20.09, SD = 6.03, Mdn = 19.2, IQR = 7.38). The distribution was moderately positively skewed (skewness = 0.61), with a Shapiro-Wilk test indicating approximate normality, W(32) = 0.95, p = .12.”
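A write-up like this can be generated directly from the data with sprintf(), which avoids transcription errors between the console and the report:

```r
mpg <- mtcars$mpg

# Build the APA-style fragment from live values, never by hand
apa_line <- sprintf("M = %.2f, SD = %.2f, Mdn = %.2f, IQR = %.2f",
                    mean(mpg), sd(mpg), median(mpg), IQR(mpg))
apa_line
```

If the underlying data change, re-running the script regenerates the reported values automatically.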
Presenting a Descriptive Statistics Table
For multi-variable analyses, present a table rather than reporting each variable in prose. A typical descriptive statistics table for a college assignment should include, at minimum: variable name, n, mean, SD, and range. For skewed variables, replace mean + SD with median + IQR. For papers requiring APA format, use professional tables and figures formatting with no vertical lines, minimal horizontal lines, and notes below explaining abbreviations.
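One minimal way to assemble such a table is with base R alone; in this sketch the variable selection and two-decimal rounding are illustrative choices, not fixed conventions:

```r
# Illustrative subset of numeric variables from mtcars
vars <- mtcars[, c("mpg", "hp", "wt")]

# One row per variable: n, mean, SD, and range endpoints
desc_table <- data.frame(
  n    = sapply(vars, function(x) sum(!is.na(x))),
  mean = round(sapply(vars, mean, na.rm = TRUE), 2),
  sd   = round(sapply(vars, sd,   na.rm = TRUE), 2),
  min  = sapply(vars, min, na.rm = TRUE),
  max  = sapply(vars, max, na.rm = TRUE)
)
desc_table
```

The resulting data frame can be exported with write.csv() or rendered as a formatted table in an R Markdown report.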
Mentioning the R Package and Version
In a formal academic paper, you must cite the software and packages you used. The standard citation format for R is: “All analyses were conducted in R (version 4.4.x; R Core Team, 2025). Descriptive statistics were computed using the psych package (Revelle, 2024) and visualizations were produced using ggplot2 (Wickham, 2016).” Use citation("psych") in R to get the exact citation format for any package. Skipping this step is a minor but genuine academic integrity issue in methods sections of research papers. Writing a thorough literature review includes citing all methodological tools and software correctly.
Key Tip: Choose the Right Statistic for Your Distribution Shape
Before reporting any descriptive statistics, run skewness() and shapiro.test(). If skewness is between −0.5 and +0.5 and p > 0.05 on Shapiro-Wilk, report mean + SD. If the variable is skewed or fails the normality test, report median + IQR. This decision tree keeps your descriptive statistics internally consistent and shows your professor that you understand what the numbers mean — not just how to compute them. Choosing the right statistical test for your data follows the same logic: distribution shape determines method, not the other way around.
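That decision tree is easy to encode as a small helper. The function below is a hypothetical convenience wrapper, not part of any package; to stay base-R-only it computes skewness with the common moment formula g1 = m3 / m2^(3/2) rather than calling e1071:

```r
# Hypothetical helper: choose the reporting format from distribution shape
report_center <- function(x) {
  x  <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  sk <- m3 / m2^1.5                     # moment-based skewness (g1)
  p  <- shapiro.test(x)$p.value
  if (abs(sk) <= 0.5 && p > 0.05) {
    sprintf("M = %.2f, SD = %.2f", mean(x), sd(x))        # ~normal: mean + SD
  } else {
    sprintf("Mdn = %.2f, IQR = %.2f", median(x), IQR(x))  # skewed: median + IQR
  }
}

report_center(mtcars$mpg)   # mpg is right-skewed, so the median branch fires
```

Wrapping the rule in a function keeps the choice of statistic consistent across every variable in a report.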
Frequently Asked Questions
Frequently Asked Questions: Descriptive Statistics in R
What is descriptive statistics in R?
Descriptive statistics in R refers to using R programming functions and packages to summarize, organize, and describe the main features of a dataset. It includes computing measures of central tendency (mean, median, mode), measures of variability (variance, standard deviation, IQR), and measures of distribution shape (skewness, kurtosis). R provides built-in functions like mean(), median(), sd(), and summary() alongside specialized packages like psych and DescTools for comprehensive descriptive analysis. It is always the first step in any data analysis workflow, before hypothesis testing or modeling.
How do I calculate mean, median, and mode in R?
In R: mean is computed with mean(x, na.rm = TRUE), median with median(x, na.rm = TRUE). There is no built-in statistical mode function — use a custom function: get_mode <- function(x) { ux <- unique(x); ux[which.max(tabulate(match(x, ux)))] }, or install the DescTools package and use Mode(x). Never use R’s built-in mode() function expecting the statistical mode — it returns the storage type (“numeric”), not the most frequent value. For all three simultaneously across a data frame, psych’s describe() function is the most efficient approach.
What does the summary() function return in R?
The summary() function in R returns a six-number summary for each numeric variable: minimum, first quartile (Q1), median, mean, third quartile (Q3), and maximum. For factor variables, it returns frequency counts per level. For character variables, it returns length, class, and mode. It is the fastest way to get a broad overview of your dataset in one command. Its limitation is that it does not return standard deviation, skewness, kurtosis, or sample size — for those, use psych’s describe() or compute them individually with sd(), skewness(), etc.
What is the difference between var() and sd() in R?
Both var() and sd() measure spread, but at different scales. var() returns the sample variance — the average of squared deviations from the mean — expressed in squared units of the original variable (e.g., squared mpg). sd() is the square root of variance, returning a value in the same units as your variable. Standard deviation is almost always reported in academic papers because it is directly interpretable: an SD of 6 mpg means values typically deviate from the mean by 6 mpg. Variance is used in formulas (ANOVA, regression), but rarely reported alone.
How do I compute grouped descriptive statistics in R?
Two main approaches: (1) dplyr: data %>% group_by(group_var) %>% summarise(mean_x = mean(variable), sd_x = sd(variable), n = n()) — this is the most flexible and readable approach. (2) psych: describeBy(data, group = data$group_var) — this generates the full 13-statistic psych table split by group in one line. For quick multi-variable grouped summaries, dplyr’s across() helper lets you apply a list of functions to multiple columns simultaneously. Both methods handle missing values when na.rm = TRUE is included.
How do I handle NA (missing values) in R descriptive statistics?
The key argument is na.rm = TRUE in most base R functions: mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), median(x, na.rm = TRUE). Without it, functions return NA if any missing value exists in the input. Check for missing values first with colSums(is.na(data)). For complete case analysis, use na.omit(data) to remove all rows containing any NA. For substantial missing data (>5%), use multiple imputation via the mice package rather than simple deletion, which can introduce bias and reduce statistical power.
What packages are best for descriptive statistics in R?
The core four are: psych (for describe() and describeBy() — the most comprehensive descriptive table available), dplyr (for grouped summaries with group_by() + summarise()), ggplot2 (for visualizations), and e1071 (for skewness() and kurtosis()). Additionally: skimr for a clean, readable exploratory summary; Hmisc for clinical/extreme value detail; and DescTools for statistical mode and confidence intervals on descriptive statistics. Base R’s summary() and shapiro.test() require no installation. For most college assignments, psych + dplyr + ggplot2 covers everything needed.
What is skewness and kurtosis in R and why do they matter?
Skewness measures distribution asymmetry: 0 = symmetric; positive = right tail (mean > median); negative = left tail (mean < median). Kurtosis measures tail heaviness: excess kurtosis of 0 matches a normal distribution; above 0 (leptokurtic) means heavier tails with more extreme values; below 0 (platykurtic) means lighter tails. They matter because most parametric tests (t-test, ANOVA, Pearson correlation) assume approximate normality. High skewness or kurtosis flags potential violations that require transformation or non-parametric alternatives. Use skewness() and kurtosis() from e1071 or psych, then confirm with shapiro.test().
How do I visualize descriptive statistics in R with ggplot2?
ggplot2 is the standard for descriptive visualization in R. Use geom_histogram() for distribution shape, geom_boxplot() for five-number summary with outliers, geom_density() for smooth distribution curves, and geom_bar() for categorical frequency distributions. Add geom_vline(aes(xintercept = mean(x))) to overlay mean lines on histograms. For grouped comparisons, use fill = factor(group_variable) inside aes() to create colored groups. The patchwork package arranges multiple ggplot2 plots in a grid for a complete descriptive panel — ideal for assignment figures.
What is the difference between descriptive and inferential statistics in R?
Descriptive statistics summarizes and describes the observed dataset — it tells you the shape, center, and spread of data you have collected. It makes no claims beyond the sample. Inferential statistics uses the sample to make inferences about a larger population — it uses t.test(), chisq.test(), aov(), lm(), and other functions to test hypotheses and estimate population parameters. Descriptive statistics always comes first: you must fully understand your data’s distribution before applying any inferential procedure. Running a t-test on data you have not first described is a methodological error in academic work.
How do I perform descriptive statistics on a data frame with multiple variables in R?
The three most efficient approaches: (1) summary(data) applies the six-number summary to every column simultaneously; (2) psych::describe(data) generates 13 statistics per column in a structured table; (3) sapply(data, function_name) applies any function (mean, sd, skewness) to all columns at once. For a subset of columns: psych::describe(data[, c("var1", "var2")]) or data %>% select(var1, var2) %>% describe(). For named output: sapply(data, function(x) round(c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)), 2)) produces a clean named matrix.
Should I cite R and its packages in my academic assignment?
Yes — in any formal academic paper or lab report, you must cite your statistical software and packages. For R itself: R Core Team (current year). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. For any package, run citation("package_name") in R to get the exact formatted reference. Example: citation("psych") returns the Revelle (2024) reference for the psych package. Failing to cite your analytical tools is treated the same way as failing to cite any other methodological source — as an incomplete methods section.
