---
title: "Furniture"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Furniture}
  \usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
date: "`r Sys.Date()`"
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This vignette is current as of `furniture` `r packageVersion("furniture")`.

## Using `furniture`

```{r}
library(furniture)
```


We will first make a fictitious data set:
```{r data}
df <- data.frame(a = rnorm(1000, 1.5, 2), 
                 b = seq(1, 1000, 1), 
                 c = c(rep("control", 400), rep("Other", 70), rep("treatment", 500), rep("None", 30)),
                 d = c(sample(1:1000, 900, replace=TRUE), rep(-99, 100)))
```

There are four functions that we'll demonstrate here:

1. `washer`
2. `table1`
3. `tableC`
4. `tableF`

## Washer

`washer` is a great function for quick data cleaning. In situations where there are placeholders, extra levels in a factor, or several values need to be changed to another.
```{r washer, message=FALSE, warning=FALSE}
library(dplyr)

df <- df %>%
  mutate(d = washer(d, -99),  ## changes the placeholder -99 to NA
         c = washer(c, "Other", "None", value = "control")) ## changes "Other" and "None" to "Control"
```


## Table1

Now that the data is "washed" we can start exploring and reporting.
```{r table1}
table1(df, a, b, factor(c), d)
```

The variables must be numeric or factor. Since we use a special type of evaluation (i.e. Non-Standard Evaluation) we can change the variables in the function (e.g., `factor(c)`). This can be extended to making a whole new variable in the function as well.

```{r table1.2}
table1(df, a, b, d, ifelse(a > 1, 1, 0))
```

This is just the beginning though. Two powerful things the function can do are shown below:

```{r table1.3}
table1(df, a, b, d, ifelse(a > 1, 1, 0),
       splitby=~factor(c), 
       test=TRUE)
```

The `splitby = ~factor(c)` stratifies the means and counts by a factor variable (in this case either control or treatment). When we use this we can also automatically compute tests of significance using `test=TRUE`. 

We can also use it intuitively within the pipe (for more about this, see the "Table 1" vignette):
```{r table1.3.2, message=FALSE, warning=FALSE}
df %>%
  group_by(c) %>%
  table1(a, b, d, ifelse(a > 1, 1, 0), 
        test=TRUE)
```
In this case, we used the `group_by()` function from `dplyr` (within the `tidyverse`) and `table1()` knows to use that as the grouping variable in place of the `splitby` argument.

If the parametric tests (default) are not appropriate, you can set `param = FALSE`.
```{r table1.3.3, message=FALSE, warning=FALSE}
df %>%
  group_by(c) %>%
  table1(a, b, d, ifelse(a > 1, 1, 0), 
        test=TRUE,
        param=FALSE)
```


Finally, you can polish it quite a bit using a few other options. For example, you can do the following:
```{r table1.4}
table1(df, a, b, d, ifelse(a > 1, 1, 0),
       splitby=~factor(c), 
       test=TRUE,
       var_names = c("A", "B", "D", "New Var"),
       type = c("simple", "condensed"))
```

Note that `var_names` can be used for more complex naming (e.g., with spaces, brackets) that otherwise cannot be used with data frames. Alternatively, for more simple naming, we can name them directly.
```{r table1.4.2}
table1(df, A = a, B = b, D = d, A2 = ifelse(a > 1, 1, 0),
       splitby=~factor(c), 
       test=TRUE,
       type = c("simple", "condensed"))
```


You can also format the numbers (adding a comma for big numbers such as in 20,000 instead of 20000):
```{r table1.5}
table1(df, a, b, d, ifelse(a > 1, 1, 0),
       splitby=~factor(c), 
       test=TRUE,
       var_names = c("A", "B", "D", "New Var"),
       format_number = TRUE)
```

The table can be exported directly to a folder in the working directory called "Table1". Using `export`, we provide it with a string that will be the name of the CSV containing the formatted table.
```{r table1.6, eval=FALSE}
table1(df, a, b, d, ifelse(a > 1, 1, 0),
       splitby=~factor(c), 
       test=TRUE,
       var_names = c("A", "B", "D", "New Var"),
       format_number = TRUE,
       export = "example_table1")
```

This can also be outputted as a latex, markdown, or pandoc table (matching all the output types of `knitr::kable`). Below shows how to do a latex table (not using `kable` however, but a built-in function that provides the variable name at the top of the table):
```{r table1.7}
table1(df, a, b, d, "new var" = ifelse(a > 1, 1, 0),
       splitby = ~factor(c), 
       test = TRUE,
       output = "latex2")
```

Last item to show you regarding `table1()` is that it can be printed in a simplified and condensed form. This instead of reporting counts and percentages for categorical variables, it reports only percentages and the table has much less white space.
```{r simple_table1.1}
table1(df, a, b, d, "new var" = ifelse(a > 1, 1, 0),
       splitby = ~factor(c), 
       test = TRUE,
       type = c("simple", "condensed"))
```


## Table C

This function is to create simple, beautiful correlation tables. The syntax is just like `table1()` in most respects. Below we include all the numeric variables to see their correlations. Since there are missing values in `d` we will use the natural `na.rm=TRUE`.

```{r tableC.1}
tableC(df, 
       a, b, d,
       na.rm = TRUE)
```

All the adjustments that you can make in `table1()` can be done here as well. For example,

```{r tableC.2}
tableC(df, 
       "A" = a, "B" = b, "D" = d,
       na.rm = TRUE,
       output = "html")
```

## Table F

This function is to create simple frequency tables. The syntax is just like `table1()` and `tableC()` in most respects, except that it uses only one variable instead of many.

```{r tableF.1}
tableF(df, a)
```

Similarly to `table1()` we can use a `splitby` argument (or `group_by()`).

```{r tableF.2}
tableF(df, d, splitby = c)
```

```{r tableF.3}
df %>%
  group_by(c) %>%
  tableF(d)
```


## Table X

Lastly, `tableX()` is a pipe-able two-way version of `table()` with a similar syntax to that of the rest of the `furniture` functions.

```{r tableX.1}
df %>%
  tableX(c, ifelse(d > 500, 1, 0))
```

By default, it provides the total counts for the rows and columns with flexibility as to what is displayed and where.


## Conclusion

The four functions: `table1()`, `tableC()`, `tableF()`, and `washer()` add simplicity to cleaning up and understanding your data. Use these pieces of furniture to make your quantitative life a bit easier.