Date variables can pose a challenge in data management. This is true in any package and different packages handle date values differently. This page aims to provide an overview of dates in R–how to format them, how they are stored, and what functions are available for analyzing them.

## Defining variables as dates

For a date variable stored as a vector of strings, see R FAQ: How can I format a string containing a date in R “Date” object?.

For a date variable stored as a vector of numbers, there is a little bit of
detective work to be done. Look at the few of the numbers and see if there’s a
clear pattern. If the numeric values are actually month, day, and
year values concatenated without separation, like 20011010 for October 10, 2001,
then these values should be converted to character strings (using **
as.character**) and then formatted using the tips in the link above.

If the numeric values are counting the days that have passed since some starting
date, then the **as.Date** function can be used with an origin date
indicated. Excel dates, when converted to integers, are counting from January 1,
1900. However, Excel uses 1 to represent this origin, while R uses 0, so we should first subtract 1 from each value before converting. Furthermore, Excel erroneously treats 1900 as a leap year, and so all numbers representing dates after February 28, 1900 are incorrectly incremented by 1, so we will need to subtract one from dates that fall after February 28, 1900. After cleaning up the numbers, we can indicate Excel’s origin date of January 1, 1900 in **as.Date**.

#These correspond to the following dates in Excel:#1: January 1, 1900#59: February 28, 1900#61: March 1, 1900 (remember that Excel wrongly treats 1900 as a leap year)#1000: September 26, 1902edates <- c(1, 59, 61, 1000)edates <- edates - 1edates[1] 0 58 60 999edates[edates>=60] <- edates[edates>=60] - 1as.Date(edates, origin="1900-01-01")[1] "1900-01-01" "1900-02-28" "1900-03-01" "1902-09-26"

Other packages store dates using different origins. SAS, for example uses 1960 rather than 1900. When R looks at dates as integers, its origin is January 1, 1970.

as.numeric(as.Date(edates, origin = "1900-01-01"))[1] -25567 -25509 -25508 -24569startdate <- "1970-01-01" as.numeric(as.Date(startdate))[1] 0

## Date objects in R

Date objects are stored in R as integer values, allowing for dates to be
compared and manipulated as you would a numeric vector. Logical
comparisons are a simple. When referring to dates, earlier dates are “less than” later dates.
Returning to our example above, we can compare the three dates in **edates** to
January 1, 1970. For dates prior to this, the comparison should return TRUE. For
later dates, the comparison should return FALSE. Here all dates are earlier than January 1, 1970.

as.Date(edates, origin = "1900-01-01") < "1970-01-01"[1] TRUE TRUE TRUE TRUE

Adding a week to dates can be done by simply adding 7. The date format is maintained.

weeklater <- as.Date(edates, origin = "1900-01-01") + 7 weeklater[1] "1900-01-08" "1900-03-07" "1900-03-08" "1902-10-03"class(weeklater)[1] "Date"

## R functions for dates

There are several functions in R specific to Date objects or for creating
Date objects. The **Sys.Date()** function generates the value of the current
date. It is easy to extract the day of the week and the month.

weekdays(weeklater)[1] "Monday" "Wednesday" "Thursday" "Friday"months(weeklater)[1] "January" "March" "March" "October

If you are interested in the distribution of a date variable, there are plotting functions available. Below, we randomly sample 100 dates in a year and then plot a histogram with one bar per month.

rdates <- as.Date("2010/1/1") + floor(365*runif(100))hist(rdates, "months", format = "%d %b")

## Much, much more

This has been a very quick overview. R also has date-time objects and functions specific to date-time data and more far functions than were shown here. The documentation pages for Dates and DateTimeClasses, both in base R, provide more details.