You can immediately install {soils}, create a new template project, and render example reports, as demonstrated in the last two tutorials. However, you will need to customize and edit the content to fit your project, shown in the next three tutorials: Customize & write, Render reports, and Troubleshoot.
An example dataset and data dictionary are included in every {soils}
project. Access the .csv files in the data
folder, or load the package and call the dataframes by name as shown
below.
| year | sample_id | farm_name | producer_id | field_id | county | crop | longitude | latitude | texture | sand_percent | silt_percent | clay_percent | bd_g_cm3 | pmn_lb_ac | nh4_n_mg_kg | no3_n_mg_kg | poxc_mg_kg | ph | ec_mmhos_cm | k_mg_kg | ca_mg_kg | mg_mg_kg | na_mg_kg | cec_meq_100g | b_mg_kg | cu_mg_kg | fe_mg_kg | mn_mg_kg | s_mg_kg | zn_mg_kg | total_c_percent | total_n_percent | ace_g_protein_kg_soil | min_c_96hr_mg_c_kg_day | p_olsen_mg_kg | wsa_percent | om_percent | toc_percent | whc_in_ft | inorganic_c_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | 23-WUY05-01 | Farm 150 | WUY05 | Field 01 | County 9 | Hay/Silage | -119 | 49 | Clay Loam | 44 | 23 | 33 | 1.30 | 67.13 | 1.6 | 9.2 | 496 | 6.7 | 0.42 | 498 | 1380 | 145.2 | 16.1 | 7.8 | 0.22 | 0.6 | 26 | 1.5 | 4.29 | 1.7 | 1.85 | 0.16 | 6.74 | 35.6 | 15 | 88.5 | 4.5 | 1.85 | 1.01 | NA |
| 2022 | 22-RHM05-02 | Farm 085 | RHM05 | Field 02 | County 18 | Green Manure | -123 | 47 | Sandy Loam | 69 | 21 | 10 | 0.88 | 129.97 | 21.6 | 6.1 | 571 | 5.9 | 0.05 | 198 | 780 | 96.8 | 20.7 | 10.5 | 0.09 | 0.4 | 28 | 2.7 | 9.41 | 0.8 | 2.88 | 0.18 | 21.50 | 30.0 | 37 | 92.6 | 5.8 | 2.88 | 1.08 | NA |
| 2022 | 22-ENR07-02 | Farm 058 | ENR07 | Field 02 | County 11 | Vegetable | -122 | 47 | Silt Loam | 11 | 79 | 10 | 1.21 | 122.17 | 8.1 | 25.3 | 419 | 6.3 | 0.60 | 294 | 1760 | 266.2 | 20.7 | 13.0 | 0.41 | 4.2 | 141 | 4.1 | 26.73 | 4.2 | 1.68 | 0.14 | 10.90 | 15.0 | 73 | 91.3 | 2.4 | 1.68 | 2.77 | NA |
| 2022 | 22-ZTD04-03 | Farm 061 | ZTD04 | Field 03 | County 13 | Herb | -120 | 46 | Silt Loam | 36 | 51 | 13 | 1.37 | 95.24 | 13.8 | 16.9 | 424 | 6.8 | 2.18 | 229 | 3380 | 738.1 | 80.5 | 14.4 | 0.72 | 1.1 | 37 | 11.5 | 51.70 | 2.4 | 1.40 | 0.12 | 5.53 | 67.5 | 30 | 94.3 | 2.9 | 1.40 | 1.93 | NA |
| 2023 | 23-WUY05-03 | Farm 150 | WUY05 | Field 03 | County 9 | Pasture, Seeded | -119 | 49 | Sandy Loam | 64 | 33 | 3 | 1.22 | 111.35 | 3.9 | 6.7 | 547 | 7.6 | 0.60 | 273 | 2820 | 193.6 | 13.8 | 10.1 | 0.25 | 0.7 | 15 | 1.7 | 3.29 | 0.8 | 1.65 | 0.16 | 4.20 | 50.6 | 8 | 84.6 | 6.7 | 1.53 | 1.28 | 0.12 |
| measurement_group | column_name | abbr | unit |
|---|---|---|---|
| Physical | texture | Texture | |
| Physical | sand_percent | Sand | % |
| Physical | silt_percent | Silt | % |
| Physical | clay_percent | Clay | % |
| Physical | bd_g_cm3 | Bulk Density | g/cm³ |
Use washi_data and data_dictionary as
templates when formatting your own data to use in {soils} functions and
reports.
Your data must contain the below required columns and each soil
measurement must be in its own column, as shown in
washi_data.
dplyr::glimpse(washi_data)
#> Rows: 100
#> Columns: 41
#> $ year <int> 2023, 2022, 2022, 2022, 2023, 2022, 2023, 2022,…
#> $ sample_id <chr> "23-WUY05-01", "22-RHM05-02", "22-ENR07-02", "2…
#> $ farm_name <chr> "Farm 150", "Farm 085", "Farm 058", "Farm 061",…
#> $ producer_id <chr> "WUY05", "RHM05", "ENR07", "ZTD04", "WUY05", "B…
#> $ field_id <chr> "Field 01", "Field 02", "Field 02", "Field 03",…
#> $ county <chr> "County 9", "County 18", "County 11", "County 1…
#> $ crop <chr> "Hay/Silage", "Green Manure", "Vegetable", "Her…
#> $ longitude <int> -119, -123, -122, -120, -119, -117, -118, -117,…
#> $ latitude <int> 49, 47, 47, 46, 49, 47, 49, 47, 48, 48, 46, 47,…
#> $ texture <chr> "Clay Loam", "Sandy Loam", "Silt Loam", "Silt L…
#> $ sand_percent <int> 44, 69, 11, 36, 64, 24, 80, 22, 62, 48, 80, 69,…
#> $ silt_percent <int> 23, 21, 79, 51, 33, 62, 16, 57, 26, 45, 14, 27,…
#> $ clay_percent <int> 33, 10, 10, 13, 3, 14, 4, 21, 12, 7, 6, 4, 10, …
#> $ bd_g_cm3 <dbl> 1.30, 0.88, 1.21, 1.37, 1.22, 1.14, 1.44, 1.24,…
#> $ pmn_lb_ac <dbl> 67.13, 129.97, 122.17, 95.24, 111.35, 61.92, -7…
#> $ nh4_n_mg_kg <dbl> 1.6, 21.6, 8.1, 13.8, 3.9, 12.4, 2.4, 12.4, 2.3…
#> $ no3_n_mg_kg <dbl> 9.2, 6.1, 25.3, 16.9, 6.7, 4.3, 21.5, 7.4, 2.3,…
#> $ poxc_mg_kg <int> 496, 571, 419, 424, 547, 235, 501, 480, 965, 10…
#> $ ph <dbl> 6.7, 5.9, 6.3, 6.8, 7.6, 5.5, 5.5, 5.9, 6.3, 6.…
#> $ ec_mmhos_cm <dbl> 0.42, 0.05, 0.60, 2.18, 0.60, 0.81, 0.55, 0.34,…
#> $ k_mg_kg <int> 498, 198, 294, 229, 273, 372, 289, 355, 253, 73…
#> $ ca_mg_kg <int> 1380, 780, 1760, 3380, 2820, 1480, 1140, 2080, …
#> $ mg_mg_kg <dbl> 145.2, 96.8, 266.2, 738.1, 193.6, 229.9, 133.1,…
#> $ na_mg_kg <dbl> 16.1, 20.7, 20.7, 80.5, 13.8, 16.1, 23.0, 16.1,…
#> $ cec_meq_100g <dbl> 7.8, 10.5, 13.0, 14.4, 10.1, 12.4, 12.9, 14.8, …
#> $ b_mg_kg <dbl> 0.22, 0.09, 0.41, 0.72, 0.25, 0.18, 0.12, 0.21,…
#> $ cu_mg_kg <dbl> 0.6, 0.4, 4.2, 1.1, 0.7, 1.0, 0.5, 1.4, 1.1, 0.…
#> $ fe_mg_kg <int> 26, 28, 141, 37, 15, 64, 44, 85, 129, 31, 86, 3…
#> $ mn_mg_kg <dbl> 1.5, 2.7, 4.1, 11.5, 1.7, 9.0, 4.4, 17.1, 9.9, …
#> $ s_mg_kg <dbl> 4.29, 9.41, 26.73, 51.70, 3.29, 4.51, 9.13, 8.2…
#> $ zn_mg_kg <dbl> 1.7, 0.8, 4.2, 2.4, 0.8, 0.5, 34.0, 0.9, 7.8, 0…
#> $ total_c_percent <dbl> 1.85, 2.88, 1.68, 1.40, 1.65, 1.55, 2.25, 2.37,…
#> $ total_n_percent <dbl> 0.16, 0.18, 0.14, 0.12, 0.16, 0.13, 0.15, 0.17,…
#> $ ace_g_protein_kg_soil <dbl> 6.74, 21.50, 10.90, 5.53, 4.20, 10.30, 7.73, 6.…
#> $ min_c_96hr_mg_c_kg_day <dbl> 35.60, 30.00, 15.00, 67.50, 50.60, 25.50, 30.60…
#> $ p_olsen_mg_kg <int> 15, 37, 73, 30, 8, 33, 27, 29, 40, 16, 19, 11, …
#> $ wsa_percent <dbl> 88.5, 92.6, 91.3, 94.3, 84.6, 86.6, 86.9, 82.5,…
#> $ om_percent <dbl> 4.5, 5.8, 2.4, 2.9, 6.7, 3.2, 27.0, 4.2, 7.9, 5…
#> $ toc_percent <dbl> 1.85, 2.88, 1.68, 1.40, 1.53, 1.55, 2.25, 2.37,…
#> $ whc_in_ft <dbl> 1.01, 1.08, 2.77, 1.93, 1.28, 2.25, 0.84, 2.14,…
#> $ inorganic_c_percent <dbl> NA, NA, NA, NA, 0.12, NA, NA, NA, NA, NA, NA, N…year through
longitude)Your data must have the below required columns. However, only the columns in bold are required to have values. Put another way, your data must have these column names, even if every row is blank.
year <int> is used to select
samples to be included in the report.
sample_id <chr> is a unique
identifier for each soil sample. Can be any alphanumeric value or a
combination of year, producer_id, and
field_id.
farm_name <chr> is included at the top of the
report. If blank, it is replaced with “Farm:
producer_id”.
producer_id <chr> is a unique
identifier for each producer. Reports are generated for each
producer_id within a given year. This can be an
alphanumeric value or the producer’s name.
field_id <chr> is to
distinguish fields when a producer has multiple samples in the same
year. Must be unique within a producer_id and
year combination. Can be any alphanumeric value or a
producer-assigned field name.
county <chr> is used to group and summarize
samples from the same county as the producer. Can be blank.
crop <chr> is used to group and summarize
samples from the same crop as the producer. Can be blank.
latitude <int> and
longitude <int>are used to map each sample point. If
blank, the map will not be included. Coordinates must be provided in
decimal degrees. latitude may range from -90 to 90.
longitude may range from -180 to 180.
Each soil measurement must have its own column in the dataset and a corresponding row in the data dictionary, as shown in Dictionary template.
texture through
na_mg_kg)All measurement columns except
texture must be numeric. Non-numeric values (e.g.,
ND, <1) will be coerced to NA
and may be omitted from summaries, tables, and plots. Clean or recode
censored values before uploading.
texture may be left blank. If at least two of
sand_percent, silt_percent, and
clay_percent are provided, texture will be
classified using USDA
NRCS rules.
sand_percent, silt_percent, and
clay_percent must:
Be between 0 and 100
Sum to 100 (± 1)
Include at least two values per sample (the third will be calculated as 100 minus the sum of the other two)
Delete any columns for measurements not analyzed in your project.
Add columns for any additional measurements you wish to include.
Important: Measurement column names in the
data must match exactly match the column_name
values in the data_dictionary. Update
data_dictionary after adding or removing
measurements.
The data dictionary is used to group and order soil measurements, and
to nicely format labels in tables and plots. The example
data_dictionary contains UTF-8 encoded
superscripts, subscripts, and special characters.
To properly encode your data dictionary as UTF-8, save it to the
data folder as
CSV UTF-8 (Comma delimited) (*.csv) in MS Excel.
dplyr::glimpse(data_dictionary)
#> Rows: 32
#> Columns: 4
#> $ measurement_group <chr> "Physical", "Physical", "Physical", "Physical", "Phy…
#> $ column_name <chr> "texture", "sand_percent", "silt_percent", "clay_per…
#> $ abbr <chr> "Texture", "Sand", "Silt", "Clay", "Bulk Density", "…
#> $ unit <chr> "", "%", "%", "%", "g/cm³", "%", "in/ft", "%", "mg/k…Your data dictionary must have the below required columns for every soil measurement included in your data.
measurement_group <chr>
defines how the soil measurements are grouped. Groups appear in the
report in the order listed in the data dictionary.
Only the following values are currently supported. Custom groups are not allowed and will fail validation.
English: Physical, Biological, Chemical, Plant Essential Macro Nutrients, Plant Essential Micro Nutrients
Spanish: Mediciones físicas, Mediciones biológicas, Mediciones químicas, Macronutrientes esenciales para plantas, Micronutrientes esenciales para plantas
column_name <chr> is used to
join the dictionary with your data. Must exactly match the column names
of the soil measurements in your data. Within each measurement group,
measurements appear in the order listed in the data dictionary.
abbr <chr>: Abbreviation used
in tables and plots. Shorter abbreviations improve readability.
unit <chr>: Unit of
measurement displayed in tables and plots.
abbr +
unit: Combination of abbreviation and unit
must be unique for each measurement.
Once your project data and dictionary files match the structure of
the examples and are saved in the data folder, follow along
with the changes in the code chunks in
01_producer-report.qmd. Code you will need to change are
marked with the text “EDIT:”. Find all edit markers in the
RStudio project with Ctrl + Shift +
F to open the Find in Files wizard.
Below changes are for demonstration purposes only, the actual changes should be based on your data and dictionary.
load-data chunkReplace washi-data.csv with the name of your data file
(my-data.csv).
load-dictionary chunkChange data-dictionary.csv to the name of your
dictionary file (my-dictionary.csv). If using subscripts,
superscripts, or special characters, make sure your csv is saved with
UTF-8 encoding (see Dictionary
template for how to do this).
# EDIT: Add your data dictionary to the data folder, using 'data-dictionary.csv'
# as a template.
# Load data dictionary
dictionary <- read.csv(
here::here("data/my-dictionary.csv"),
check.names = FALSE,
# Set encoding for using subscripts, superscripts, special characters
encoding = "UTF-8",
strip.white = TRUE
)tidy-long chunkAs of {soils} version 1.0.1, measurement columns are automatically
detected, converted to numeric, and pivoted to a tidy long format.
All measurement columns must be numeric. Non-numeric
values (e.g., ND, <1, NA,
blanks, or other characters) are coerced to NA and may be
omitted from summaries, tables, and plots. If you have additional
metadata columns that should not be converted or
pivoted, add them to the metadata_cols vector.
# OPTIONAL EDIT: Add any extra metadata columns in `data` that should not be
# treated as measurements.
metadata_cols <- c(
"year",
"sample_id",
"farm_name",
"producer_id",
"field_id",
"county",
"crop",
"longitude",
"latitude",
"texture"
)
# Identify measurement columns
measurement_cols <- data |>
dplyr::select(-dplyr::any_of(metadata_cols)) |>
colnames()
# Coerce measurement columns to numeric; warns if values are converted to NA
data <- soils::coerce_to_numeric(data, measurement_cols)
# Tidy data into long format and join with data dictionary
results_long <- data |>
tidyr::pivot_longer(
cols = dplyr::all_of(measurement_cols),
names_to = "measurement"
) |>
...See the troubleshooting article for more help on debugging errors.