6. Miscellaneous

6.1 Combining flights

It is sometimes useful to have a data.frame that spans a whole project. Individual data.frames can be combined using the R function “rbind”, provided the individual data.frames have the same structure. The argument “F” to “getNetCDF()” can be used to add a variable named “RF” with the value specified by “F”, so that individual flights can be identified and easily separated in the combined data.frame.
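
As a minimal, self-contained sketch of this pattern, the following uses small synthetic data.frames in place of the output of getNetCDF(). The helper makeFlight() and the values it generates are hypothetical stand-ins, and only base R is used. The sketch adds the RF tag, combines the pieces with rbind, shows that rbind refuses data.frames with a different structure, and then separates the combined data.frame again by flight.

```r
## Hypothetical stand-in for getNetCDF(fname, VarList, F=fno):
## returns a small data.frame with an added RF column.
makeFlight <- function(fno, n = 5) {
  data.frame(
    Time = seq(as.POSIXct("2015-07-01 12:00:00", tz = "UTC"),
               by = 1, length.out = n),
    WIC = seq(-0.5, 0.5, length.out = n) + 0.1 * fno, # synthetic values
    RF = fno
  )
}

## Build all pieces first, then combine once; this avoids
## repeatedly copying a growing data.frame inside a loop.
pieces <- lapply(1:3, makeFlight)
Data <- do.call(rbind, pieces)

## rbind requires the same structure; a missing column is an error.
bad <- data.frame(Time = Data$Time[1], WIC = 0)   # no RF column
mismatch <- inherits(try(rbind(Data, bad), silent = TRUE), "try-error")

## Separate the combined data.frame again, one element per flight.
byFlight <- split(Data, Data$RF)
meanWIC <- tapply(Data$WIC, Data$RF, mean)
```

Collecting the pieces in a list and calling rbind once is a design choice made here for efficiency with many flights; the loop form used in the examples that follow gives the same result.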

Here are some examples that illustrate uses of the combined data set:


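library(Ranadu)    ## getNetCDF(), DataDirectory(), theme_WAC(), ...
library(ggplot2)   ## plotting functions, if not already attached
library(magrittr)  ## the %>% pipe, if not already attached
VarList <- c("ADIFR", "PITCH", "QCF", "PSF", "AKRD", "WIC",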
  "TASF", "GGALT", "ROLL", "PSXC", "ATX", "DPXC", "QCXC",
  "EWX", "ACINS","GGLAT")
## add variables needed to recalculate wind
VarList <- c(VarList, "TASX", "ATTACK", "SSLIP",
  "GGVEW", "GGVNS", "VEW", "VNS", "THDG")
Data <- data.frame()
Project <- 'CSET'
Fl <- sort (list.files ( ## get list of available flights
  sprintf ("%s%s/", DataDirectory(), Project),
  sprintf ("%srf...nc$", Project)))
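for (flt in Fl) {
    fname <- sprintf("%s%s/%s", DataDirectory(), Project, flt)
    ## extract the flight number from the file name, e.g., 5 from "...rf05.nc"
    fno <- as.numeric(sub('.*f([0-9]*)\\.nc', '\\1', flt))
    D <- getNetCDF (fname, VarList, F=fno)  ## F adds the variable RF
    Data <- rbind(Data, D)
}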
## impose restrictions where good vertical wind expected
Data <- dplyr::filter(Data, TASX > 90, abs(ROLL) < 2) %>%
        dplyr::select(Time, WIC, ATX, DPXC, EWX, GGALT, RF)
Data %>% ggplot() +
         geom_boxplot(aes(RF, WIC, group=RF),
                                 color='blue', na.rm=TRUE) +
         theme_WAC()

Figure 6.1: Distribution of values of the vertical wind for each research flight number.



Data %>% dplyr::select(ATX, GGALT, RF) %>% dplyr::filter(RF >= 3 & RF <= 6) %>%
  Rmutate(RF = as.character(RF)) %>%
  ggplot() + geom_point(aes(ATX, GGALT, color=RF)) +
  theme_WAC()

Figure 6.2: Measurements of temperature vs. altitude during research flights 3 to 6.



Data %>% dplyr::select(WIC, GGALT, RF) %>%
         dplyr::filter(RF == 4 | RF == 5) %>%
         Rmutate(RF = sprintf('research flight %d', RF)) %>%
         ggplot() + geom_point(aes(WIC, GGALT)) +
         facet_wrap(~ RF, nrow=1) + ## see also facet_grid()
         theme_WAC()

Figure 6.3: Example that uses faceted plots to show results from different research flights.



Data %>% dplyr::select(EWX, ATX, GGALT, RF) %>%
         dplyr::filter(RF == 4 | RF == 5) %>%
         Rmutate(RF = sprintf('research flight %d', RF)) %>%
         Rmutate(RH = 100 * EWX / MurphyKoop(ATX)) %>%  ## new variable
         ggplot() + geom_path(aes(RH, GGALT, color=RF)) +
         ylim(c(0, 7500)) +
         xlab('relative humidity [%]') +
         ylab('geometric altitude [m]') +
         theme_WAC()

Figure 6.4: Relative humidity vs. geometric altitude for research flights 4 and 5, with the two flights distinguished by color.



Data %>% dplyr::group_by(RF) %>% dplyr::summarise(mean = mean(WIC, na.rm=TRUE))
## # A tibble: 16 x 2
##       RF      mean
##    <dbl>     <dbl>
##  1     1  0.353   
##  2     2  0.142   
##  3     3  0.452   
##  4     4  0.328   
##  5     5  0.362   
##  6     6 -0.000691
##  7     7 -0.0391  
##  8     8 -0.0322  
##  9     9  0.0523  
## 10    10  0.0434  
## 11    11  0.0410  
## 12    12  0.00624 
## 13    13  0.0159  
## 14    14 -0.00931 
## 15    15  0.0351  
## 16    16 -0.228

6.2 Comments re “tibbles”

The data.frames used by convention in Ranadu are inconsistent with the “tidy” structure discussed in “R for Data Science” by H. Wickham because, for size-distribution variables such as those produced by the CDP or UHSAS, the column consists of a two-dimensional vector (a matrix) where the first dimension is the row and the second is the concentration or count of particles in each bin. Data.frames that do not contain such variables are “tidy” and can be converted to tibbles using the function as_tibble(). This will fail, however, for data.frames that contain size-distribution variables. The function Ranadu::df2tibble() will convert such data.frames to tibbles by converting the two-dimensional vectors into lists. The resulting tibbles will not work with functions like Ranadu::plotSD(), but they are otherwise consistent with the Ranadu functions, including those for plotting and algorithm calculations.
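
The difference between the two layouts can be illustrated with a small base-R sketch. This is not the implementation of Ranadu::df2tibble(); the variable name CONC, the bin names, and the values are made up for illustration, and an ordinary data.frame is used in place of a tibble so that no extra packages are needed.

```r
## Synthetic data.frame with a matrix-valued column, standing in for a
## size-distribution variable (3 times x 4 bins); names are illustrative.
df <- data.frame(Time = 1:3)
df$CONC <- matrix(1:12, nrow = 3,
                  dimnames = list(NULL, paste0("bin", 1:4)))

## One column in the data.frame, but it is two-dimensional.
is.matrix(df$CONC)

## Replace the matrix by a list with one vector of bin values per row.
tidy <- df["Time"]
tidy$CONC <- lapply(seq_len(nrow(df)), function(i) df$CONC[i, ])

## Each list element holds the bin values for that time.
tidy$CONC[[2]]

## Code written for the matrix form needs the matrix rebuilt first.
rebuilt <- do.call(rbind, tidy$CONC)
```

After the conversion, each row carries its own vector of bin values, so code that indexes the column as a matrix no longer applies directly. That is a plausible reason why some functions need the original form, although it is an inference here and not a statement about how Ranadu::plotSD() is written.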

6.3 Reproducible research

With the tools now available, it is possible to document analysis projects to a degree that others can duplicate them using archived information. Steps toward that goal are the topic of this section. It is suggested that proper documentation of a project should include these components:

  1. The project report, documenting the analysis steps, data used, results and interpretation.
  2. Any code used.
  3. Enough information on the underlying programming language (version number, operating system, etc.) that someone else can use the same code interpreter if necessary.
  4. Locations of data files used, if in maintained archives, or copies of the data sufficient to reproduce the results.
  5. A discussion of the workflow required to reproduce the research. This may include discussion of aspects of the code that may not be evident to an inexperienced reader, documentation of investigations not included in the report, reasons for choices made, and other advice to a person seeking to reproduce the research that might not be appropriate in the project report. The workflow document can be less formal and more wordy or chatty than the project report if that material might be useful to another analyst.

Often, analysis proceeds in stutter-steps that produce scattered material that is hard to assemble, with different steps used to generate plots, manipulate data, perform fits, construct derived data, combine multiple and supplementary data sets, etc. Reproducibility does not necessarily mean following that original path, but a logical path using the successful steps should be documented. Necessary but not sufficient steps toward reproducibility include making the code available in some repository and ensuring that the data as used are archived where they are accessible. The project report should indicate where these components of the analysis are saved. The additional component that will usually be needed by a reproducing analyst is a workflow document, which can be thought of as guidance to a person wishing to verify or extend the results.
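
For the third item in the list above, one possible approach is to have R write its own version and platform information to a file that is archived with the project. The sketch below uses only base-R functions; the file name session-info.txt is a hypothetical choice, and a temporary directory is used so that the example has no side effects.

```r
## Record the interpreter version, platform, and attached packages.
info <- c(
  R.version.string,
  paste("Platform:", R.version$platform),
  paste("OS:", Sys.info()[["sysname"]], Sys.info()[["release"]]),
  "",
  capture.output(sessionInfo())
)

## Hypothetical file name; a temporary directory is used here, but in
## practice the file would be saved with the archived project package.
infoFile <- file.path(tempdir(), "session-info.txt")
writeLines(info, infoFile)
```

Running this at the end of the analysis, after all packages have been loaded, captures the package versions that were actually in use.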

R tools are available that are of great utility in performing reproducible research. The “knitr” package (see references) makes it possible to assemble the text and code in the same file and to use knitr functions to reference results from the code in the text or embed graphics in the document as generated in the code. The “Rnw” format and alternative formats support this approach, and processing such a file with knitr generates the project report while running the specified code. This avoids ad hoc assembly of figures, tables, and text from different sources, which often hinders efforts to reproduce the work. A suggested documentation package can then include the Rnw-format (or equivalent) file, the report in text form, links to archives where the data are available or, alternatively, the data themselves in the archived project package, a workflow discussion, and documentation of the versions of the various programs and computer systems used. Some more information on using knitr is included in the “RSessions” shinyApp tutorial, in the “reproducibility” tab.
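
As a rough sketch of what such a combined file looks like, the R code below writes a very small Rnw document and, if knitr is installed, processes it with knitr::knit(). The file name, the chunk label meanWind, and the numbers are hypothetical; a real report would read archived data instead. The document contains one code chunk, delimited by the `<<...>>=` and `@` lines, and one inline `\Sexpr{}` reference that inserts a value computed in the chunk into the text.

```r
## Minimal Rnw document: LaTeX text with one R chunk and one
## inline \Sexpr{} reference to a value computed in the chunk.
rnwLines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<meanWind, echo=TRUE>>=",
  "w <- c(0.35, 0.14, 0.45)  # illustrative numbers only",
  "m <- mean(w)",
  "@",
  "The mean of the values is \\Sexpr{round(m, 2)}.",
  "\\end{document}"
)
rnwFile <- file.path(tempdir(), "report-sketch.Rnw")
writeLines(rnwLines, rnwFile)

## Knit to LaTeX if knitr is installed; the .tex file can then be
## compiled to PDF with a LaTeX installation.
if (requireNamespace("knitr", quietly = TRUE)) {
  texFile <- knitr::knit(rnwFile,
                         output = sub("\\.Rnw$", ".tex", rnwFile),
                         quiet = TRUE)
}
```

Because the number in the sentence is generated when the file is processed, the text cannot drift out of step with the code that produced it.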

6.4 The Ranadu Shiny app

A shiny app that uses the Ranadu package to examine data files is documented here.