
Research software and services

In 2021, ACEMS members were active in:

  • Producing, publishing, and updating research/open-source software, particularly R packages hosted on platforms including CRAN, GitHub and MapleCloud.
  • Supporting the Australian Research Data Commons (ARDC) in its strategic agenda for research software.
  • Helping to educate end-users about research/open-source software and enabling them to use it, including through podcasts, lectures, guest presentations, briefings, and blog articles.
  • Recognising excellence in research software development with the introduction of the Di Cook award.

Further, significant outcomes and benefits have been realised from ACEMS software outputs, including for Partner Organisations such as CSIRO (via its RiskLab) and AIMS, as well as for other end-users in the research community and industry, such as the Federal Reserve.

New and Updated Open-Source Software Packages

ACEMS members published new software packages, and made significant updates to existing ones, for open access on a range of platforms, including CRAN, GitHub and MapleCloud. Further details, including download metrics, are given in Table 1 below. Note that this list is not exhaustive with respect to either R or other open-source packages created by ACEMS members. For example, R code and packages are often outputs of collaborative research projects, such as the free software program "Predicting seagrass decline due to cumulative stressors", and may be shared in other repositories such as GitHub.

Table 1: Software Packages Created/Updated in 2021
Package | Maintainer | Description | 2021 Downloads; 2021 Monthly Average*
forecast | Rob Hyndman | Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling. | 3368082; 280674
DescTools | Andri Signorell | A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well. | 694032; 57836
hdrcde | Rob Hyndman | Computation of highest density regions in one and two dimensions, kernel estimation of univariate density functions conditional on one covariate, and multimodal regression. | 216906; 18076
hts | Earo Wang | Provides methods for analysing and forecasting hierarchical and grouped time series. The available forecast methods include bottom-up, top-down, optimal combination reconciliation (Hyndman et al. 2011) <doi:10.1016/j.csda.2011.03.006>, and trace minimization reconciliation (Wickramasuriya et al. 2018) <doi:10.1080/01621459.2018.1448825>. | 215430; 17952
tsibble | Earo Wang | Provides a 'tbl_ts' class (the 'tsibble') for temporal data in a data- and model-oriented format. The 'tsibble' provides tools to easily manipulate and analyse temporal data, such as filling in time gaps and aggregating over calendar periods. | 204754; 17063
fabletools | Mitchell O’Hara-Wild | Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis. | 124361; 10363
feasts | Mitchell O’Hara-Wild | Provides a collection of features, decomposition methods, statistical summaries and graphics functions for analysing tidy time series data. The package name 'feasts' is an acronym comprising its key features: Feature Extraction And Statistics for Time Series. | 111506; 9292
fable | Mitchell O’Hara-Wild | Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse. | 107671; 8973
fpp3 | Rob Hyndman | All data sets required for the examples and exercises in the book "Forecasting: principles and practice" by Rob J Hyndman and George Athanasopoulos <https://OTexts.com/fpp3/>. All packages required to run the examples are also loaded. | 60888; 5074
tsibbledata | Mitchell O’Hara-Wild | Provides diverse datasets in the 'tsibble' data structure. These datasets are useful for learning and demonstrating how tidy temporal data can be tidied, visualised, and forecasted. | 56156; 4680
vitae | Mitchell O’Hara-Wild | Provides templates and functions to simplify the production and maintenance of curriculum vitae. | 20847; 1737
bfast | Dainius Masiliunas | Decomposition of time series into trend, seasonal, and remainder components with methods for detecting and characterizing abrupt changes within the trend and seasonal components. 'BFAST' can be used to analyze different types of satellite image time series and can be applied to other disciplines dealing with seasonal or non-seasonal time series, such as hydrology, climatology, and econometrics. The algorithm can be extended to label detected changes with information on the parameters of the fitted piecewise linear models. 'BFAST' monitoring functionality is described in Verbesselt et al. (2010) <doi:10.1016/j.rse.2009.08.014>. 'BFAST monitor' provides functionality to detect disturbance in near real-time based on 'BFAST'-type models, and is described in Verbesselt et al. (2012) <doi:10.1016/j.rse.2012.02.022>. The 'BFAST Lite' approach is a flexible approach that handles missing data without interpolation, and will be described in an upcoming paper. Furthermore, different models can now be used to fit the time series data and detect structural changes (breaks). | 19278; 1606
eechidna | Jeremy Forbes | Data from the seven Australian Federal Elections (House of Representatives) between 2001 and 2019, and from the four Australian Censuses over the same period. Includes tools for visualizing and analysing the data, as well as imputing Census data for years in which a Census does not occur. This package incorporates data that is copyright Commonwealth of Australia (Australian Electoral Commission and Australian Bureau of Statistics) 2019. | 6901; 575
bayesforecast | Asael Alonzo Matamoros | Fit Bayesian time series models using 'Stan' for full Bayesian inference. A wide range of distributions and models are supported, allowing users to fit Seasonal ARIMA, ARIMAX, Dynamic Harmonic Regression, GARCH, t-student innovation GARCH models, asymmetric GARCH, Random Walks, and stochastic volatility models for univariate time series. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with typical visualization methods, information criteria such as loglik, AIC, BIC, WAIC, Bayes factor and leave-one-out cross-validation methods. References: Hyndman (2017) <doi:10.18637/jss.v027.i03>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>. | 6868; 572
seer | Thiyanga Talagala | A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. The 'seer' package is the implementation of the FFORMS algorithm. For more details see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>. | 6734; 561
nortsTest | Asael Alonzo Matamoros | Although several tests for normality in stationary processes have been proposed in the literature, consistent implementations of these tests in programming languages are limited. Four normality tests are implemented: the Lobato and Velasco, Epps, Psaradakis and Vavra, and random projections tests for stationary processes. Some other diagnostics, such as unit root tests for stationarity, seasonal tests for seasonality, and ARCH effect tests for volatility, are also performed. The package also offers residual diagnostics for linear time series models developed in several packages. | 6357; 530
HRW | Matt Wand | The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts is contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included. | 6339; 528
lookout | Sevvandi Kandanaarachchi | Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using the peak-over-threshold method, a generalized Pareto distribution is fitted to the log of leave-one-out kde values to identify outliers. | 3996; 333
spotoroo | Weihao Li | An algorithm to cluster satellite hot spot data spatially and temporally. | 3527; 294
Rsfar | Hossein Haghbin | This is a collection of functions designed for simulating, estimating and forecasting seasonal functional autoregressive time series of order one. These methods are addressed in the manuscript: <https://www.monash.edu/business/ebs/research/publications/ebs/wp16-2019.pdf>. | 3522; 294
ferrn | Huize Sherry Zhang | Diagnostic plots for optimisation, with a focus on projection pursuit. These show paths the optimiser takes in the high-dimensional space in multiple ways: by reducing the dimension using principal component analysis, and also using the tour to show the path on the high-dimensional space. Several botanical colour palettes are included, reflecting the name of the package. | 3509; 292
nestr | Emi Tanaka | Facilitates building a nesting or hierarchical structure as a list or data frame by using a human friendly syntax. | 3385; 282
flipbookr | Evangeline Reynolds | Flipbooks present code step-by-step and side-by-side with its output. 'flipbookr' helps creators build flipbooks efficiently because code pipelines are automatically parsed and prepped for presentation as flipbooks. | 3339; 278
liminal | Stuart Lee | Compose interactive visualisations designed for exploratory high-dimensional data analysis. With 'liminal' you can create linked interactive graphics to diagnose the quality of a dimension reduction technique and explore the global structure of a dataset with a tour. A complete description of the method is discussed in ['Lee' & 'Laa' & 'Cook' (2020) <arXiv:2012.06077>]. | 2863; 239
outlierensembles | Sevvandi Kandanaarachchi | Ensemble functions for outlier/anomaly detection. There is a new ensemble method proposed using Item Response Theory. Existing outlier ensemble methods from Schubert et al (2012) <doi:10.1137/1.9781611972825.90>, Chiang et al (2017) <doi:10.1016/j.jal.2016.12.002> and Aggarwal and Sathe (2015) <doi:10.1145/2830544.2830549> are also included. | 2691; 224
cropgrowdays | Peter Baker | Calculate agrometeorological variables for crops including growing degree days (McMaster, GS & Wilhelm, WW (1997) <doi:10.1016/S0168-1923(97)00027-0>), cumulative rainfall, number of stress days and cumulative or mean radiation and evaporation. Convert dates to day of year and vice versa. Also, download curated and interpolated Australian weather data from the Queensland Government DES longpaddock website <https://www.longpaddock.qld.gov.au/>. This data is freely available under the Creative Commons 4.0 licence. | 2656; 221
learningtower | Kevin Wang | The Programme for International Student Assessment (PISA) is a global study conducted by the Organization for Economic Cooperation and Development (OECD) in member and non-member countries to assess educational systems by assessing 15-year-old school students' academic performance in mathematics, science, and reading. This dataset contains information on their scores and other socioeconomic characteristics, information about their school and its infrastructure, as well as the countries that are taking part in the program. | 1801; 150
adbb | Rob Hyndman | An R package containing state-level demographic data for Australia. To be used in conjunction with the demography package. | -
ccpdmp | Matt Sutton | PDMP-based samplers move a particle with a state and velocity using deterministic dynamics. At random event times the velocity is updated and the particle continues. The trajectories returned by the sampler define a Markov process sampling the distribution of interest. For more information see Joris Bierken’s Overview, the great Julia package ZigZagBoomerang and our arXiv preprint: https://arxiv.org/abs/2112.12897. This package implements the concave-convex PDMP algorithm to facilitate sampling from distributions of interest (particularly Bayesian distributions). | -
conmat | Nicholas Tierney | The goal of conmat is to provide methods for producing contact matrices. | -
cricketdata | Rob Hyndman | Functions for downloading data on international and other major cricket matches from ESPNCricinfo and Cricsheet. This package provides some functions to download the data into tibbles ready for analysis. | -
fasster | Mitchell O'Hara-Wild | An implementation of the FASSTER (Forecasting with Additive Switching of Seasonality, Trend and Exogenous Regressors) model in R. This model is designed to capture patterns of multiple seasonality in a state space framework by using state switching. The fasster package prioritizes flexibility, computational speed and accuracy to provide convenient tools for modelling, predicting and understanding high frequency time-series. | -
Identification-two-phase-recovery | David Warne | This repository contains useful R functions and scripts to process coral cover data obtained from reef surveys to classify patterns in recovery trajectories following major disturbance events. | -
Maple 2020 procedures and a dashboard for interactive testing of uncontrolled linear-time-invariant structures for structural global identifiability | Jason Whyte | This worksheet relates to the testing of a particular type of "model structure" (collection of related models) for the property of structural global identifiability (SGI). The absence of this property makes it very unlikely that we can obtain a unique estimate of the structure's parameters when calibrating the structure to data. This may mean that we cannot use our structure to make predictions, and the time, effort, and resources used in modelling and data collection may be unproductive. However, we may be able to anticipate this problem in advance of data collection. The content below relates to my book chapter "Branching out into Structural Identifiability Analysis with Maple: Interactive Exploration of Uncontrolled Linear Time-Invariant Structures", to appear in "Maple in Mathematics Education and Research" (the Proceedings of the Maple 2020 Conference, Editors Rob Corless, Jürgen Gerhard, and Ilias Kotsireas), a book in Springer's series "Communications in Computer and Information Science". (The pre-print is available at https://arxiv.org/abs/2103.11309) | -
ozbabynames | Rob Hyndman | The ozbabynames package provides the dataset ozbabynames. This contains popular Australian baby names by sex, state and year. | -
PETabc | Clara Grazian | This package performs simulation and Bayesian analysis of kinetic models, in particular of single-tissue and two-tissue compartment models. The Bayesian analysis is performed through rejection ABC, with summary statistics obtained through smooth splines. | -
tidytreatment | Joshua Bon | Functions for extracting tidy data from Bayesian treatment effect models, in particular BART, but extensions are possible. Functionality includes extracting tidy posterior summaries as in 'tidybayes' <https://github.com/mjskay/tidybayes>, estimating (average) treatment effects, common support calculations, and plotting useful summaries of these. | -
yahtsee | Nicholas Tierney | The goal of yahtsee is to provide tools around fitting hierarchical time series models for data commonly found in malaria. | -
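
The monthly figures in Table 1 are consistent with dividing each 2021 download total by 12 and rounding to the nearest whole number. The table's footnote is not reproduced here, so that definition is an assumption rather than a stated method. The short Python sketch below checks it against a few rows copied from the table; the names `downloads_2021` and `monthly_average` are illustrative only.

```python
# 2021 download totals for a few packages, copied from Table 1.
downloads_2021 = {
    "forecast": 3368082,
    "DescTools": 694032,
    "hts": 215430,
    "tsibble": 204754,
}


def monthly_average(total: int, months: int = 12) -> int:
    """Assumed definition: annual total divided by 12, rounded to an integer."""
    return round(total / months)


monthly = {pkg: monthly_average(total) for pkg, total in downloads_2021.items()}

for pkg, avg in monthly.items():
    print(f"{pkg}: {avg}")
```

Python's built-in `round` sends exact halves to the nearest even integer, which matters for rows such as hts (215430 / 12 = 17952.5) and happens to agree with the value printed in the table.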

Current ACEMS Packages Maintained

ACEMS members maintain a wide variety of R packages which remain popular among CRAN’s diverse international users. Since 2016, ACEMS has published and maintained over 100 open-source packages on the CRAN platform. These packages have been downloaded more than 73.6 million times since 2016 by students, researchers, and other end-users. A selection of the most downloaded packages is shown in Table 2 below.

Table 2: Popular ACEMS Software Packages Published to CRAN (>200,000 downloads)
Package | Maintainer | Current Version | Downloads (since 2016)
rmarkdown | Yihui Xie | 2.11 | 32717202
forecast | Rob Hyndman | 8.15 | 11662976
fracdiff | Martin Maechler | 1.5-1 | 7846588
GGally | Barret Schloerke | 2.1.2 | 3545227
KernSmooth | Brian Ripley | 2.23-20 | 2158380
DescTools | Andri Signorell | 0.99.44 | 2136250
expsmooth | Rob Hyndman | 2.3 | 1523874
fma | Rob Hyndman | 2.4 | 1063133
imputeTS | Steffen Moritz | 3.2 | 971113
fpp | Rob Hyndman | 0.5 | 782840
naniar | Nicholas Tierney | 0.6.1 | 744289
visdat | Nicholas Tierney | 0.5.3 | 713627
rticles | Christophe Dervieux | 0.22 | 553193
hts | Earo Wang | 6.0.2 | 526154
tsibble | Earo Wang | 1.1.1 | 506591
fpp2 | Rob Hyndman | 2.4 | 488353
distributional | Mitchell O'Hara-Wild | 0.2.2 | 423910
hdrcde | Rob Hyndman | 3.4 | 383319
rainbow | Han Lin Shang | 3.6 | 340607
tsfeatures | Rob Hyndman | 1.0.2 | 328420
thief | Rob Hyndman | 0.3 | 313749
fds | Han Lin Shang | 1.8 | 287994
fabletools | Mitchell O'Hara-Wild | 0.3.2 | 282413
fable | Mitchell O'Hara-Wild | 0.3.1 | 239963
feasts | Mitchell O'Hara-Wild | 0.2.2 | 231037
xaringan | Yihui Xie | 0.22 | 208020
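
To put Table 2 in context, the sketch below (Python, for illustration only) sums the listed downloads and compares them with the overall figure of "more than 73.6 million". That overall figure is a lower bound, so the computed share is an upper bound on the true share. The variable names are hypothetical.

```python
# Downloads since 2016 for the packages listed in Table 2, in table order.
table2_downloads = {
    "rmarkdown": 32717202, "forecast": 11662976, "fracdiff": 7846588,
    "GGally": 3545227, "KernSmooth": 2158380, "DescTools": 2136250,
    "expsmooth": 1523874, "fma": 1063133, "imputeTS": 971113,
    "fpp": 782840, "naniar": 744289, "visdat": 713627,
    "rticles": 553193, "hts": 526154, "tsibble": 506591,
    "fpp2": 488353, "distributional": 423910, "hdrcde": 383319,
    "rainbow": 340607, "tsfeatures": 328420, "thief": 313749,
    "fds": 287994, "fabletools": 282413, "fable": 239963,
    "feasts": 231037, "xaringan": 208020,
}

# "More than 73.6 million" in the text; used here as a lower bound.
REPORTED_TOTAL = 73_600_000

listed_total = sum(table2_downloads.values())
listed_share = listed_total / REPORTED_TOTAL
top_package = max(table2_downloads, key=table2_downloads.get)

print(f"Packages listed: {len(table2_downloads)}")
print(f"Downloads in Table 2: {listed_total:,}")
print(f"Share of reported total (at most): {listed_share:.1%}")
print(f"Most downloaded: {top_package}")
```

On these figures, the packages in Table 2 account for the large majority of all reported downloads, with rmarkdown contributing the largest single share.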

Supporting ARDC Strategy for Recognising Software as a First-Class Research Output

The Australian Research Data Commons (ARDC) is developing a national research software agenda to establish research software as a first-class research output.

“Developing and implementing a national research software agenda is a strategic activity for the ARDC’s Platforms and Software theme. Over 2020-2023, this initiative will enable the ARDC to establish the agenda for recognition of research software as a first-class research output, alongside journal papers and datasets, and for recognising the importance of software as a research enabler.”

ACEMS and its members engaged with the ARDC and collaborated in co-hosting several events to promote and advocate software as a first-class research output. This suite of events held in the latter part of 2021 included:

  • An ACEMS public lecture by ACEMS CI Matt Wand titled “Statistical Methodology Development and Software Dissemination” with multiple members from the ARDC present.
  • A Random Sample podcast episode titled “Software as a first-class research output”. In this episode ACEMS CI Rob Hyndman describes a recent policy change at the Monash Business School to recognise software outputs in that faculty for the purposes of career progression. The discussion also featured Tom Honeyman from the ARDC and was hosted by ACEMS AI Nick Tierney.
  • An expert panel webinar hosted by ACEMS titled “Open-source research software panel discussion”. The panel was hosted by ACEMS PhD Student Katie Buchhorn and featured Tom Honeyman from the ARDC and ACEMS AIs Di Cook, Sevvandi Kandanaarachchi and Dan Simpson. The panel discussed how open-source software development benefits researchers and others, its effect on career development, research impact via software, what can be done to increase uptake and usability, and how to gain recognition for publishing research software.

Educating and enabling end-users about open-source research software

ACEMS members also featured in or organised multiple public lectures, podcasts, invited seminars and presentations, and workshops for open-source software end-users, all related to research software packages. A selection of these events and presentations from 2021 is listed in Table 3 below.

Table 3: Selected presentations relating to open-source research software in 2021
Forum | Presentation | Speaker(s)
ACEMS/Australian Data Science Network - Focused Workshop: Synthetic Data | GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Rob Hyndman
ACEMS Podcast – The Random Sample | Episode 66: Software as a First-Class Research Output | Tom Honeyman (ARDC), Rob Hyndman, and Nick Tierney
ACEMS Public Lecture | Statistical Methodology Development and Software Dissemination | Matt Wand
ACEMS Public Lecture | Uncertain futures: what can we forecast and when should we give up? | Rob Hyndman
ACEMS Technical Talk | Greta: Simple and Scalable Statistical Modelling in R | Nick Tierney
E.A. Cornish Lecture | Feasts and fables: modern tools for time series analysis | Rob Hyndman
IASSL Workshop | Data Analysis with R Programming for Beginners | Priyanga Dilini Talagala and Thiyanga Talagala
ICSMTR 2021 | Spatial Modelling and Visualization with R | Susanna Cramb
Malaysia R 2021 | The Role of R and Data Visualisation in Understanding Your World | Di Cook
Monash MDFI Meetup | The right to access, open data, open software, diagnostics and statistics as integral components of AI | Di Cook
QUT Centre for Data Science Seminar | Use of Open-Source Tools* for the Australian Environmental Health Atlas (AusEnHealth) [*including visualisation of aggregated spatial data using R, leaflet, and RShiny] | Aiden Price
R Ladies Workshop | Working smarter, faster and more securely in R | Katie Buchhorn
RStudio Blog + RStudio YouTube Channel | An inclusive solution for teaching and learning R during the COVID pandemic | Patricia Menendez
Shiny Dev Podcast @theRcast | Episode 28: The transformative effects of Shiny with the coronaSweeper app | Kate Saunders
Statistical Society of Australia Workshop | An introduction to Bayesian modelling using greta | Nick Golding (Telethon Kids) and Nick Tierney
Statistical Society of Australia Workshop | Data Visualisation with R | Di Cook and Emi Tanaka
United Nations Training Program on Earth Observations for Agricultural Statistics | Virtual Presentation to UN Global Working Group on Big Data, US - Training in the Use of Big Data and Data Science | Amy Stringer
Why R? 2021 Conference | Feasts & fables: modern tools for time series analysis | Rob Hyndman

In addition to these events, many ACEMS members support the development, profile, and adoption of research software by publishing blog articles, running informal mentoring sessions, and briefing industry stakeholders. Some examples include:

  • ACEMS CI Rob Hyndman briefed Australia’s Federal Reserve Board’s UseR group on time series analysis using the tsibble, feasts, and fable packages – some of the most frequently downloaded ACEMS R packages on CRAN.
  • ACEMS AI Patricia Menendez authored a blog post for RStudio, RStudio Cloud: An inclusive solution for learning R, and then presented in a related RStudio YouTube event.
  • ACEMS AI Nick Tierney (a member of the R Journal Editorial Board) co-hosts a social co-working and office hours event with the international group, rOpenSci. The event is held bi-monthly online with developers from around the world to exchange advice, ideas and improvements for existing R packages.

Recognising excellence in research software development

ACEMS AI Di Cook, who is best known for her research in visual inference and high-dimensional data visualisation and as an avid advocate of statistical software development, has funded the “Di Cook Open-Source Statistical Software Award”.

This award was promoted in 2021 in collaboration with the Statistical Society of Australia’s Victorian Branch to acknowledge excellent open-source statistical software developed by students and recent graduates of Victorian and Tasmanian research institutes.

The award was first announced publicly at an SSA Vic event on 22 September 2021 and will be presented annually. For more information about the award, please see: https://statsocaus.github.io/dicook-award/.

Outcomes and Impacts from End-User Use of Software

Downloads of open-source software developed and published by ACEMS members to date are counted in the tens of millions.

In addition to these significant outcomes, important benefits are being realised from the use of such software – including by end users in research, government, industry, and hackathons.

For example, the software package R2VR, developed by ACEMS and CSIRO collaborators, has been used to provide immersive experiences in environments and to support research quantifying the aesthetics of key ecosystems. These include the Great Barrier Reef, to inform reporting to UNESCO, and, most recently, Antarctica as part of the Virtual Antarctica project, a collaboration involving ACEMS PhD student Katie Buchhorn and ACEMS AIs Julie Vercelloni and Aiden Price.

Slides from a presentation by ACEMS PhD student Katie Buchhorn on the Virtual Antarctica project.