Research software and services
In 2021, ACEMS members were active in:
Further, significant outcomes and benefits have been realised from ACEMS software outputs, including for Partner Organisations like CSIRO via their RiskLab and AIMS, as well as other end-users in the research community and industry such as the Federal Reserve.
ACEMS members published, and made significant updates to, open-access software packages on a range of platforms, including CRAN, GitHub and MapleCloud. Further details, including download metrics, are given in Table 1 below. Note that this list is not exhaustive with respect to either R or other open-source packages created by ACEMS members. For example, R code and packages are often outputs of collaborative research projects, such as the free software "Predicting seagrass decline due to cumulative stressors", and may be shared in other repositories such as GitHub.
Table 1: Software Packages Created/Updated in 2021

Package | Maintainer | Description | 2021 Downloads; 2021 Monthly Average* |
---|---|---|---|
forecast | Rob Hyndman | Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling. | 3368082; 280674 |
DescTools | Andri Signorell | A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here, was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well. | 694032; 57836 |
hdrcde | Rob Hyndman | Computation of highest density regions in one and two dimensions, kernel estimation of univariate density functions conditional on one covariate, and multimodal regression. | 216906; 18076 |
hts | Earo Wang | Provides methods for analysing and forecasting hierarchical and grouped time series. The available forecast methods include bottom-up, top-down, optimal combination reconciliation (Hyndman et al. 2011) <doi:10.1016/j.csda.2011.03.006>, and trace minimization reconciliation (Wickramasuriya et al. 2018) <doi:10.1080/01621459.2018.1448825>. | 215430; 17952 |
tsibble | Earo Wang | Provides a 'tbl_ts' class (the 'tsibble') for temporal data in a data- and model-oriented format. The 'tsibble' provides tools to easily manipulate and analyse temporal data, such as filling in time gaps and aggregating over calendar periods. | 204754; 17063 |
fabletools | Mitchell O’Hara-Wild | Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis. | 124361; 10363 |
feasts | Mitchell O’Hara-Wild | Provides a collection of features, decomposition methods, statistical summaries and graphics functions for analysing tidy time series data. The package name 'feasts' is an acronym comprising its key features: Feature Extraction And Statistics for Time Series. | 111506; 9292 |
fable | Mitchell O’Hara-Wild | Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse. | 107671; 8973 |
fpp3 | Rob Hyndman | All data sets required for the examples and exercises in the book "Forecasting: principles and practice" by Rob J Hyndman and George Athanasopoulos <https://OTexts.com/fpp3/>. All packages required to run the examples are also loaded. | 60888; 5074 |
tsibbledata | Mitchell O’Hara-Wild | Provides diverse datasets in the 'tsibble' data structure. These datasets are useful for learning and demonstrating how tidy temporal data can be tidied, visualised, and forecasted. | 56156; 4680 |
vitae | Mitchell O’Hara-Wild | Provides templates and functions to simplify the production and maintenance of curriculum vitae. | 20847; 1737 |
bfast | Dainius Masiliunas | Decomposition of time series into trend, seasonal, and remainder components with methods for detecting and characterizing abrupt changes within the trend and seasonal components. 'BFAST' can be used to analyze different types of satellite image time series and can be applied to other disciplines dealing with seasonal or non-seasonal time series, such as hydrology, climatology, and econometrics. The algorithm can be extended to label detected changes with information on the parameters of the fitted piecewise linear models. 'BFAST' monitoring functionality is described in Verbesselt et al. (2010) <doi:10.1016/j.rse.2009.08.014>. 'BFAST monitor' provides functionality to detect disturbance in near real-time based on 'BFAST'- type models, and is described in Verbesselt et al. (2012) <doi:10.1016/j.rse.2012.02.022>. 'BFAST Lite' approach is a flexible approach that handles missing data without interpolation, and will be described in an upcoming paper. Furthermore, different models can now be used to fit the time series data and detect structural changes (breaks). | 19278; 1606 |
eechidna | Jeremy Forbes | Data from the seven Australian Federal Elections (House of Representatives) between 2001 and 2019, and from the four Australian Censuses over the same period. Includes tools for visualizing and analysing the data, as well as imputing Census data for years in which a Census does not occur. This package incorporates data that is copyright Commonwealth of Australia (Australian Electoral Commission and Australian Bureau of Statistics) 2019. | 6901; 575 |
bayesforecast | Asael Alonzo Matamoros | Fit Bayesian time series models using 'Stan' for full Bayesian inference. A wide range of distributions and models are supported, allowing users to fit Seasonal ARIMA, ARIMAX, Dynamic Harmonic Regression, GARCH, t-student innovation GARCH models, asymmetric GARCH, Random Walks, and stochastic volatility models for univariate time series. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with typical visualization methods, information criteria such as loglik, AIC, BIC, WAIC, Bayes factor and leave-one-out cross-validation methods. References: Hyndman (2017) <doi:10.18637/jss.v027.i03>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>. | 6868; 572 |
seer | Thiyanga Talagala | A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. 'seer' package is the implementation of the FFORMS algorithm. For more details see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>. | 6734; 561 |
nortsTest | Asael Alonzo Matamoros | Although several tests for normality in stationary processes have been proposed in the literature, consistent implementations of these tests in programming languages are limited. Four normality tests are implemented: the Lobato and Velasco, Epps, Psaradakis and Vavra, and random projections tests for stationary processes. Other diagnostics, such as a unit root test for stationarity, seasonal tests for seasonality, and an ARCH effect test for volatility, are also performed. The package also offers residual diagnostics for linear time series models developed in several packages. | 6357; 530 |
HRW | Matt Wand | The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts is contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included. | 6339; 528 |
lookout | Sevvandi Kandanaarachchi | Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using peak-over-threshold method, a generalized Pareto distribution is fitted to the log of leave-one-out kde values to identify outliers. | 3996; 333 |
spotoroo | Weihao Li | An algorithm to cluster satellite hot spot data spatially and temporally. | 3527; 294 |
Rsfar | Hossein Haghbin | This is a collection of functions designed for simulating, estimating and forecasting seasonal functional autoregressive time series of order one. These methods are addressed in the manuscript: <https://www.monash.edu/business/ebs/research/publications/ebs/wp16-2019.pdf>. | 3522; 294 |
ferrn | Huize Sherry Zhang | Diagnostic plots for optimisation, with a focus on projection pursuit. These show paths the optimiser takes in the high-dimensional space in multiple ways: by reducing the dimension using principal component analysis, and also using the tour to show the path on the high-dimensional space. Several botanical colour palettes are included, reflecting the name of the package. | 3509; 292 |
nestr | Emi Tanaka | Facilitates building a nesting or hierarchical structure as a list or data frame by using a human friendly syntax. | 3385; 282 |
flipbookr | Evangeline Reynolds | Flipbooks present code step-by-step and side-by-side with its output. 'flipbookr' helps creators build flipbooks efficiently because code pipelines are automatically parsed and prepped for presentation as flipbooks. | 3339; 278 |
liminal | Stuart Lee | Compose interactive visualisations designed for exploratory high-dimensional data analysis. With 'liminal' you can create linked interactive graphics to diagnose the quality of a dimension reduction technique and explore the global structure of a dataset with a tour. A complete description of the method is discussed in ['Lee' & 'Laa' & 'Cook' (2020) <arXiv:2012.06077>]. | 2863; 239 |
outlierensembles | Sevvandi Kandanaarachchi | Ensemble functions for outlier/anomaly detection. There is a new ensemble method proposed using Item Response Theory. Existing outlier ensemble methods from Schubert et al (2012) <doi:10.1137/1.9781611972825.90>, Chiang et al (2017) <doi:10.1016/j.jal.2016.12.002> and Aggarwal and Sathe (2015) <doi:10.1145/2830544.2830549> are also included. | 2691; 224 |
cropgrowdays | Peter Baker | Calculate agrometeorological variables for crops including growing degree days (McMaster, GS & Wilhelm, WW (1997) <doi:10.1016/S0168-1923(97)00027-0>), cumulative rainfall, number of stress days and cumulative or mean radiation and evaporation. Convert dates to day of year and vice versa. Also, download curated and interpolated Australian weather data from the Queensland Government DES longpaddock website <https://www.longpaddock.qld.gov.au/>. This data is freely available under the Creative Commons 4.0 licence. | 2656; 221 |
learningtower | Kevin Wang | The Programme for International Student Assessment (PISA) is a global study conducted by the Organization for Economic Cooperation and Development (OECD) in member and non-member countries to assess educational systems by assessing 15-year-old school students' academic performance in mathematics, science, and reading. This dataset contains information on their scores and other socioeconomic characteristics, information about their school and its infrastructure, as well as the countries that are taking part in the program. | 1801; 150 |
adbb | Rob Hyndman | An R package containing state-level demographic data for Australia. To be used in conjunction with the demography package. | - |
ccpdmp | Matt Sutton | PDMP-based samplers move a particle with a state and velocity using deterministic dynamics. At random event times the velocity is updated and the particle continues. The trajectories returned by the sampler define a Markov process sampling the distribution of interest. For more information see Joris Bierkens' Overview, the great Julia package ZigZagBoomerang and our arXiv preprint: https://arxiv.org/abs/2112.12897. This package implements the concave-convex PDMP algorithm to facilitate sampling from distributions of interest (particularly Bayesian distributions). | - |
conmat | Nicholas Tierney | The goal of conmat is to provide methods for producing contact matrices. | - |
cricketdata | Rob Hyndman | Functions for downloading data on international and other major cricket matches from ESPNCricinfo and Cricsheet. This package provides some functions to download the data into tibbles ready for analysis. | - |
fasster | Mitchell O'Hara-Wild | An implementation of the FASSTER (Forecasting with Additive Switching of Seasonality, Trend and Exogenous Regressors) model in R. This model is designed to capture patterns of multiple seasonality in a state space framework by using state switching. The fasster package prioritizes flexibility, computational speed and accuracy to provide convenient tools for modelling, predicting and understanding high frequency time-series. | - |
Identification-two-phase-recovery | David Warne | This repository contains useful R functions and scripts to process coral cover data obtained from reef surveys to classify patterns in recovery trajectories following major disturbance events. | - |
Maple 2020 procedures and a dashboard for interactive testing of uncontrolled linear-time-invariant structures for structural global identifiability | Jason Whyte | This worksheet relates to the testing of a particular type of "model structure" (collection of related models) for the property of structural global identifiability (SGI). The absence of this property makes it very unlikely that we can obtain a unique estimate of the structure's parameters when calibrating the structure to data. This may mean that we cannot use our structure to make predictions, and the time, effort, and resources used in modelling and data collection may be unproductive. However, we may be able to anticipate this problem in advance of data collection. The content below relates to my book chapter "Branching out into Structural Identifiability Analysis with Maple: Interactive Exploration of Uncontrolled Linear Time-Invariant Structures", to appear in "Maple in Mathematics Education and Research" (the Proceedings of the Maple 2020 Conference, Editors Rob Corless, Jürgen Gerhard, and Ilias Kotsireas), a book in Springer's series "Communications in Computer and Information Science". (The pre-print is available here https://arxiv.org/abs/2103.11309) | - |
ozbabynames | Rob Hyndman | The ozbabynames package provides the dataset ozbabynames. This contains popular Australian baby names by sex, state and year. | - |
PETabc | Clara Grazian | This package performs simulation and Bayesian analysis of kinetic models, in particular of single-tissue and two-tissue compartment models. The Bayesian analysis is performed through rejection ABC, with summary statistics obtained through smooth splines. | - |
tidytreatment | Joshua Bon | Functions for extracting tidy data from Bayesian treatment effect models, in particular BART, but extensions are possible. Functionality includes extracting tidy posterior summaries as in 'tidybayes' <https://github.com/mjskay/tidybayes>, estimating (average) treatment effects, common support calculations, and plotting useful summaries of these. | - |
yahtsee | Nicholas Tierney | The goal of yahtsee is to provide tools around fitting hierarchical time series models for data commonly found in malaria. | - |
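
The monthly averages in Table 1 appear to be the 2021 download totals divided by 12 and rounded to the nearest whole number. The footnote behind the asterisk is not reproduced here, so this reading is an assumption. A minimal Python sketch that checks a few rows under that assumption (the names `TABLE_1_SAMPLE` and `monthly_average` are illustrative only and are not part of any ACEMS tooling):

```python
# Sketch: check that "2021 Monthly Average" equals the annual total / 12.
# Assumption: the average is the yearly total divided by 12 months, rounded
# to the nearest integer with exact halves going to the even neighbour
# (the behaviour of Python's built-in round()).

# (annual downloads, reported monthly average) copied from Table 1
TABLE_1_SAMPLE = {
    "forecast": (3368082, 280674),
    "DescTools": (694032, 57836),
    "hts": (215430, 17952),
    "tsibble": (204754, 17063),
    "bfast": (19278, 1606),
    "learningtower": (1801, 150),
}


def monthly_average(annual_downloads: int, months: int = 12) -> int:
    """Return the rounded mean number of downloads per month."""
    return round(annual_downloads / months)


if __name__ == "__main__":
    for package, (annual, reported) in TABLE_1_SAMPLE.items():
        computed = monthly_average(annual)
        status = "matches" if computed == reported else "differs"
        print(f"{package}: computed {computed}, reported {reported} ({status})")
```

Rows such as hts (215430 / 12 = 17952.5, reported as 17952) suggest that exact halves were rounded to the even neighbour, which is why the sketch relies on `round()` rather than rounding halves up.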
ACEMS members maintain a wide variety of R packages, which remain popular among CRAN’s diverse international users. Since 2016, ACEMS has published and maintained over 100 open-source packages on the CRAN platform. These packages have been downloaded more than 73.6 million times since 2016 by students, researchers, and other end-users. A selection of the most downloaded packages is shown in Table 2 below.
Table 2: Popular ACEMS Software Packages Published to CRAN (>200,000 downloads)

Package | Maintainer | Current Version | Downloads (since 2016) |
---|---|---|---|
rmarkdown | Yihui Xie | 2.11 | 32717202 |
forecast | Rob Hyndman | 8.15 | 11662976 |
fracdiff | Martin Maechler | 1.5-1 | 7846588 |
GGally | Barret Schloerke | 2.1.2 | 3545227 |
KernSmooth | Brian Ripley | 2.23-20 | 2158380 |
DescTools | Andri Signorell | 0.99.44 | 2136250 |
expsmooth | Rob Hyndman | 2.3 | 1523874 |
fma | Rob Hyndman | 2.4 | 1063133 |
imputeTS | Steffen Moritz | 3.2 | 971113 |
fpp | Rob Hyndman | 0.5 | 782840 |
naniar | Nicholas Tierney | 0.6.1 | 744289 |
visdat | Nicholas Tierney | 0.5.3 | 713627 |
rticles | Christophe Dervieux | 0.22 | 553193 |
hts | Earo Wang | 6.0.2 | 526154 |
tsibble | Earo Wang | 1.1.1 | 506591 |
fpp2 | Rob Hyndman | 2.4 | 488353 |
distributional | Mitchell O'Hara-Wild | 0.2.2 | 423910 |
hdrcde | Rob Hyndman | 3.4 | 383319 |
rainbow | Han Lin Shang | 3.6 | 340607 |
tsfeatures | Rob Hyndman | 1.0.2 | 328420 |
thief | Rob Hyndman | 0.3 | 313749 |
fds | Han Lin Shang | 1.8 | 287994 |
fabletools | Mitchell O'Hara-Wild | 0.3.2 | 282413 |
fable | Mitchell O'Hara-Wild | 0.3.1 | 239963 |
feasts | Mitchell O'Hara-Wild | 0.2.2 | 231037 |
xaringan | Yihui Xie | 0.22 | 208020 |
The Australian Research Data Commons (ARDC) is developing a national research software agenda to establish research software as a first-class research output.
“Developing and implementing a national research software agenda is a strategic activity for the ARDC’s Platforms and Software theme. Over 2020-2023, this initiative will enable the ARDC to establish the agenda for recognition of research software as a first-class research output, alongside journal papers and datasets, and for recognising the importance of software as a research enabler.”
ACEMS and its members engaged with the ARDC and collaborated in co-hosting several events to promote and advocate software as a first-class research output. This suite of events held in the latter part of 2021 included:
ACEMS members were also featured in, or organised, multiple public lectures, podcasts, invited seminars and presentations, and workshops for open-source software end-users, all related to research software packages. A selection of these events and presentations from 2021 is listed in Table 3 below.
Table 3: Selected presentations relating to open-source research software in 2021

Forum | Presentation | Speaker(s) | |
---|---|---|---|
ACEMS/Australian Data Science Network - Focused Workshop: Synthetic Data | GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Rob Hyndman | |
ACEMS Podcast – The Random Sample | Episode 66: Software as a First-Class Research Output | Tom Honeyman (ARDC), Rob Hyndman, and Nick Tierney | |
ACEMS Public Lecture | Statistical Methodology Development and Software Dissemination | Matt Wand | |
ACEMS Public Lecture | Uncertain futures: what can we forecast and when should we give up? | Rob Hyndman | |
ACEMS Technical Talk | Greta: Simple and Scalable Statistical Modelling in R | Nick Tierney | |
E.A. Cornish Lecture | Feasts and fables: modern tools for time series analysis | Rob Hyndman | |
IASSL Workshop | Data Analysis with R Programming for Beginners | Priyanga Dilini Talagala and Thiyanga Talagala | |
ICSMTR 2021 | Spatial Modelling and Visualization with R | Susanna Cramb | |
Malaysia R 2021 | The Role of R and Data Visualisation in Understanding Your World | Di Cook | |
Monash MDFI Meetup | The right to access, open data, open software, diagnostics and statistics as integral components of AI | Di Cook | |
QUT Centre for Data Science Seminar | Use of Open-Source Tools* for the Australian Environmental Health Atlas (AusEnHealth) [*including visualisation of aggregated spatial data using R, leaflet, and RShiny] | Aiden Price | |
R Ladies Workshop | Working smarter, faster and more securely in R | Katie Buchhorn | |
RStudio Blog + RStudio YouTube Channel | An inclusive solution for teaching and learning R during the COVID pandemic | Patricia Menendez | |
Shiny Dev Podcast @theRcast | Episode 28: The transformative effects of Shiny with the coronaSweeper app | Kate Saunders | |
Statistical Society of Australia Workshop | An introduction to Bayesian modelling using greta | Nick Golding (Telethon Kids) and Nick Tierney | |
Statistical Society of Australia Workshop | Data Visualisation with R | Di Cook and Emi Tanaka | |
United Nations Training Program on Earth Observations for Agricultural Statistics | Virtual Presentation to UN Global Working Group on Big Data, US - Training in the Use of Big Data and Data Science | Amy Stringer | |
Why R? 2021 Conference | Feasts & fables: modern tools for time series analysis | Rob Hyndman |
In addition to these events, many ACEMS members support the development, profile, and adoption of research software by publishing articles on blogs, holding informal mentoring sessions, and briefing industry stakeholders. Some examples include:
ACEMS AI Di Cook, who is best known for her research in visual inference and high-dimensional data visualisation and as an avid advocate of statistical software development, has funded the Di Cook Open-Source Statistical Software Award.
This award was promoted in 2021 in collaboration with the Statistical Society of Australia’s Victorian Branch to acknowledge excellent open-source statistical software developed by students and recent graduates of Victorian and Tasmanian research institutes.
The award was first announced publicly at an SSA Vic event on 22 September 2021 and will be presented annually. For more information about the award, please see: https://statsocaus.github.io/dicook-award/.
Downloads of open-source software developed and published by ACEMS members to date are counted in the tens of millions.
In addition to these significant outcomes, important benefits are being realised from the use of such software – including by end users in research, government, industry, and hackathons.
For example, the software package R2VR, developed by ACEMS and CSIRO collaborators, has been used to provide immersive experiences in environments and to support research quantifying the aesthetics of key ecosystems. These include the Great Barrier Reef, to inform reporting to UNESCO, and, most recently, Antarctica as part of the Virtual Antarctica project, a collaboration involving ACEMS PhD student Katie Buchhorn and ACEMS AIs Julie Vercelloni and Aiden Price.