class: center, middle, inverse, title-slide

# Introduction to automating quality assurance
## Official statistics in DfE
### Statistics Development Team
### 2021/06/10

---

# What we'll cover

???

Remember to record it!!

Introduce selves

Follows on from the introduction to RAP that we did around 6 months ago

That's available on our guidance website, and I'd recommend watching it if you haven't already

--

- What do we mean by 'automated QA'

--

- Why we should use automation in our QA

--

- How to get started

--

- Examples of this in action

--

- What support is available

--

- Next steps

--

- Time for questions

---
layout: true

# What is automated QA?

---

--

Writing out our quality assurance processes in code so the computer does the legwork

???

We do QA already

This is all about improving what we already do

Using code and the power of computers to improve our current processes

---

Two main types of QA that we have:

<br>

--

Pass/Fail tests

???

I.e. is a specific variable present?

--

- Computers excel at this, and we should fully automate these checks in code

???

The data screener is a great example of this

Incredibly efficient - around 70 checks that are run in an instant (depending on file size)

Thoroughly documented - everything is stored in code

Reliable - runs the same checks time after time

Accurate - no human error like we had in the early days when Tom or I were manually checking files and missing things

Automation is really powerful here, but this is only half of the story

---

Two main types of QA that we have:

<br>

Sense checking

???

I.e. are the year-on-year changes in the data feasible, or do they imply a quality issue we need to investigate?

--

- Will always require statistician expertise and judgement

--

- We should use code to speed up getting the information we need

--

- We should use code to create visualisations that help us do more with this

---

It's a key part of Reproducible Analytical Pipelines (RAP):

<img src = "images/hex-diagram.svg" />

???

These are the RAP levels we're all working towards

As David laid out in his email to DISD yesterday, we're aiming to meet all of good and great practice before 2022 production cycles

Worth mentioning that this is best prepared before the production cycle starts, so we should be working towards these now

---

It's a key part of Reproducible Analytical Pipelines (RAP):

<img src = "images/hex-diagram-focussed.svg" />

???

Red - RAP levels that this directly hits

Purple - RAP levels that are indirectly worked towards by doing this

Highlights how critical this is to RAP

---
layout: true

# Why automate our QA

---

--

- Reliability and accuracy

???

Manual steps in production and quality assurance introduce human error - everybody makes mistakes

--

- Efficiency

???

Computers can do a lot of the legwork much quicker than we can

Allows us to rerun at the last minute

--

- Reproducibility

???

Automating it documents it

--

- Thoroughness

???

By automating our QA, we expand the possible checks that we can do on the data

--

- Consistency

???

Consistent QA across all our publications

Can make use of template code to save duplication of effort

---

--

Priority in DISD

???

David's email around RAP ambitions - DISD

Next week we're talking with statistics leaders across DfE to support this more widely

--

Office for Statistics Regulation

> We recommend that steps are taken within the Government Statistical Service and Analysis Function to promote and support increased use of RAP. We want to see RAP become the default approach to producing official statistics.

???

Have also said that they'll be including RAP when they assess the badging of publications

---

Code of practice:

--

- Q2: Sound methods

> Producers of statistics and data should use the best available methods and recognised standards, and be open about their decisions.

???

RAP is our international framework for guiding us on best practice in data processing and analysis

--

- Q3: Assured quality

> Producers of statistics and data should explain clearly how they assure themselves that statistics and data are accurate, reliable, coherent and timely.

???

By automating our QA, we document our processes clearly and in a reproducible manner.

---

Be the best we can be:

--

- Keep improving our processes

> V4.5 Statistics producers should keep up to date with developments that might improve methods and quality. ...

--

- Be leaders in government statistics, influence improvements across the GSS

> T4.4 Good business practices should be maintained in the use of resources. Where appropriate, statistics producers should take opportunities to share resources and collaborate to achieve common goals and produce coherent statistics.

???

Sounds like a stretch, but is genuinely possible

It's already happening, with OGDs copying the self-assessment app and RAP levels approach

By using code for our processes we can then share examples and work more collaboratively, making use of Slack/GitHub/cross-government meetups etc

Part of a bigger thing, really exciting to be able to engage with and be a part of

Enough of me talking - pass to Sarah

---
layout: false

# How to get started

--

- Once you have R and RStudio downloaded, you can begin automating QA.

--

- We have some template code on [GitHub](https://github.com/dfe-analytical-services/automated-data-qa) that can be used to run some [suggested basic checks](https://rsconnect/rsc/stats-production-guidance/rap.html#Basic_automated_QA).

--

- These cover basic checks like minimum/maximum/average values for indicators, counts of suppressed numbers and checks for duplicates (a short sketch follows at the end of this slide)

--

- There are also some templates for scatter plots comparing data across time periods

--

- You'll want to include more specific tests for your data too, e.g.

???

Publication-specific tests will cover the QA aspects of "great" RAP practice

--

  + Do filter subtotals add up to totals?
  + Are percentages and averages calculated correctly?
  + How do indicators change when you apply various filters?
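
--

To give a flavour of what these basic checks can look like, here is a minimal sketch (illustrative only, not the template code itself). It assumes a tidy data frame called `data` with `time_period`, `geographic_level` and an indicator column `number_on_roll`, and uses "c" as a stand-in suppression symbol:

```
library(dplyr)

# Min/max/average and suppression count for one indicator, by time period
data %>%
  group_by(time_period, geographic_level) %>%
  summarise(min_value  = min(as.numeric(number_on_roll), na.rm = TRUE),
            max_value  = max(as.numeric(number_on_roll), na.rm = TRUE),
            mean_value = mean(as.numeric(number_on_roll), na.rm = TRUE),
            suppressed = sum(number_on_roll == "c", na.rm = TRUE),
            .groups = "drop")

# Count of fully duplicated rows - should be zero
sum(duplicated(data))
```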
---

# Applying templates

```
check_LA_region_totals <- function(indicator) {
  function_definition_here %>%
    filter(indicator)
}
```

--

- Our templates are written as functions that you can apply to your tidy data

???

Functions simplify and shorten the amount of code you need, and can be applied across different data easily

--

- The arguments the function takes are defined inside the function brackets **function()**

--

- To run the code, you just need to tell R which variable you want it to look at, e.g.:

```
check_LA_region_totals(number_on_roll)
```
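
--

For illustration only, here is one way the skeleton above might look when fully written out - renamed `check_la_region_totals()` and simplified to take the data frame and the indicator name (as a string) as arguments. It assumes the tidy data has `geographic_level`, `time_period`, `region_code` and `region_name` columns; the templates in the repository are more general (for example, handling filter combinations):

```
library(dplyr)

check_la_region_totals <- function(data, indicator) {
  # Sum the local authority rows up to regional level for the chosen indicator
  la_totals <- data %>%
    filter(geographic_level == "Local authority") %>%
    group_by(time_period, region_code) %>%
    summarise(la_sum = sum(as.numeric(.data[[indicator]]), na.rm = TRUE),
              .groups = "drop")

  # Return any regional rows whose published total doesn't match the LA sum
  data %>%
    filter(geographic_level == "Regional") %>%
    select(time_period, region_code, region_name, all_of(indicator)) %>%
    left_join(la_totals, by = c("time_period", "region_code")) %>%
    filter(as.numeric(.data[[indicator]]) != la_sum)
}

# Returns zero rows if every regional total matches the sum of its LAs
check_la_region_totals(data, "number_on_roll")
```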
---

# Examples of basic automated checks

???

Explaining how the code behind the functions works - you can use the code and pull it apart to adapt as you need

---

### Checking for extreme values

???

Manually, how do you check for extreme values? How do you know you've not missed any?

You'd be using filters in Excel, selecting each filter combination and year group and checking the indicators look sensible

Easy to miss an outlier!

This code can be run against all your indicators and produces a table you can use to quickly identify outliers.

--

```
geog_level <- "Regional"
indicator <- "Pupils"
time_compare <- "201819"
current_time <- "201920"

# thresh (the tolerance, e.g. a proportion) and filters (the filter columns)
# are assumed to be defined earlier in the script
data %>%
  # Set geographic level here if you want to change to region/LA
  filter(geographic_level == geog_level) %>%
  group_by(time_period) %>%
  mutate(!!indicator := as.numeric(get(indicator))) %>%
  select(all_of(filters), indicator) %>%
  spread(time_period, indicator) %>%
  mutate(thresh_indicator_big = get(time_compare) * (1 + thresh),
         thresh_indicator_small = get(time_compare) * (1 - thresh),
         flag_big = get(current_time) >= thresh_indicator_big,
         flag_small = get(current_time) <= thresh_indicator_small)
```

--

[Example in practice from SEN2](examples/QA_2020_output.html#Caseload20)

---

### Geography breakdowns add to totals

???

Manually, how do you check for relevant indicators that LAs add to regions, regions add to national etc?

Probably involves creating multiple pivot tables and writing formulas to check that totals match across the different pivots. Very time consuming for multiple indicator/filter combinations!

Automating it in code lets you loop through your chosen indicators and all filters at once, identifying any totals that don't match.

--

```
indicator <- "Pupils"
publication_filters <- c("Gender", "Ethnicity")

data_region <- data %>%
  filter(geographic_level == "Regional") %>%
  select(time_period, region_name, region_code,
         all_of(publication_filters), indicator) %>%
  mutate(!!indicator := as.numeric(get(indicator))) %>%
  arrange(time_period, region_code, across(all_of(publication_filters)))

data_la_aggregate <- data %>%
  filter(geographic_level == "Local authority") %>%
  select(time_period, region_name, region_code,
         all_of(publication_filters), indicator) %>%
  group_by(across(-all_of(indicator))) %>%
  summarise(!!indicator := sum(as.numeric(get(indicator)), na.rm = TRUE)) %>%
  arrange(time_period, region_code, across(all_of(publication_filters)))

setdiff(data_region, data_la_aggregate)
```

--

[Example in practice from SEN2](examples/QA_csvs_2020_output.html#LA_vs_region_tests)

---

### Sense-check year-on-year trends

???

Manually, how do you look at your tidy data file and pick out the stories?

Really hard to do when you have loads of filters and indicators

Would involve creating separate workbooks of charts/graphs to pull out the key stories, which themselves will require QA

Now we can automate the creation of these charts - the code only needs to be written once and can then be applied across the board.

--

```
indicator <- "average_spend"
filter_definition <- "gender == 'Total' & school_type == 'State funded'"
time_x <- "2021"
time_y <- "2020"

data_prep_example <- data %>%
  filter(geographic_level == "Local authority") %>%
  select(time_period, region_name, la_name,
         all_of(publication_filters), indicator) %>%
  mutate(!!indicator := as.numeric(get(indicator))) %>%
  spread(time_period, indicator) %>%
  filter(eval(parse(text = filter_definition)))

plot_example <- data_prep_example %>%
  plot_ly(type = 'scatter',
          x = ~ get(time_x),
          y = ~ get(time_y),
          color = ~ region_name,
          text = ~ paste("Indicator:", indicator,
                         "<br>LA: ", la_name,
                         '<br>', time_x, ":", get(time_x),
                         '<br>', time_y, ":", get(time_y)),
          hoverinfo = 'text',
          mode = 'markers') %>%
  layout(xaxis = list(title = time_x),
         yaxis = list(title = time_y))
```

--

[Example in practice from CiN/CLA](examples/QA-report.html#Number_of_pupils_by_LA,_this_year_compared_to_last_year)

---

# Example of publication-specific checks

Once you have adapted the template code for your QA, you can build on it to include checks that are specific to your own publication
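
--

As a hypothetical illustration (the column name `avg_att8` is made up for this sketch), a publication-specific check for a KS4-style publication might flag any attainment 8 values above the maximum possible score of 90:

```
library(dplyr)

# Flag any rows where attainment 8 exceeds the maximum possible score of 90
data %>%
  filter(as.numeric(avg_att8) > 90) %>%
  select(time_period, geographic_level, avg_att8)
```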
???

e.g.

KS4 run checks to make sure their attainment 8 values aren't higher than 90, the highest possible score

CiN/CLA have additional checks to confirm that expected discrepancies in certain filter totals are in line with what they have been in previous years

--

[Example in practice from CiN/CLA](examples/QA-report.html#CIN_totals_are_sensible)

---
layout: true

# Example of end-to-end automated checks - SEN2

---

### Out with the old...

.pull-left[

- How do you answer the question **"am I happy that all my checks pass?"**

- If your QA is **not** automated, this will involve running a check, reviewing the result and amending if needed

]

.pull-right[

![Flowchart of manual QA](images/Old QA SEN2.jpg)

]

???

Example here with the SEN2 release - previously they had to manually run the same check for age group across all indicators in three different files

---
layout: false

### In with the new!

.center[![Flowchart of automated QA](images/NEW QA SEN2.jpg)]

--

- How do you answer the question **"am I happy that all my checks pass?"** once you've automated QA?

--

- With automated QA, you can write generalisable code and functions, allowing you to check multiple things at once and review the outcomes of checks in one go

???

You can quickly see if any checks fail, make the relevant amendments and run again.

SEN2 example - only need to write the check once, then apply it across everything

Putting the resource into doing it right first time will pay off in years to come

---
layout: true

# Example of end-to-end automated checks - SEN2

---

- Automating QA as part of a larger piece of work to automate the end-to-end pipeline for their release

--

- Quality assurance checks at multiple steps of the pipeline: data entry, calculation of indicators, csv outputs

--

- First team to reach "best practice" in RAP so far

--

- Updates to the "variable_change" file have made it easy to transition across for this year

--

### [Demo of SEN2 QA report](examples/QA_2020_output.html)

---
layout: false

# Template code repository

The template code sits in a repository on [GitHub](https://github.com/dfe-analytical-services/automated-data-qa) that can be used to run the [suggested basic checks for good RAP practice](https://rsconnect/rsc/stats-production-guidance/rap.html#Basic_automated_QA).

???

The idea is that the code is generic and transferable enough to adapt to any publication on EES

Teams can then look at building on the code to get more publication-specific checks in, working towards great practice

Elements of best practice involve looping this all together in the end-to-end file production process, generating shareable reports

--

- You can use this code and adapt it to embed in your own QA

--

- You can then look at building on the code to add more publication-specific checks that add value to your QA

---

# Data screener app

Our [development version of the data screener app](https://rsconnect-pp/rsc/dfe-published-data-qa/) now has additional functionality that allows you to perform some basic QA checks in the app

--

These checks will give you an idea of what is in your file before you upload it to EES, but teams are still expected to carry out their own QA before this stage

???

The QA app checks act as a final check before uploading to the platform

Remember to demo the changes!

Then pass to Cam

---
layout: false

# QA-ing your QA code

???

As code gets more complex, the question of trust in the code comes up

This is more advanced, but hopefully gives a sense of what is possible and the direction we're headed in

--

Get your code peer reviewed

???

Part of the great and best practice RAP levels

--

Version control is key - use Git to track all updates and changes to code

???

Part of our best practice RAP levels

I know some of you are already doing this, which is fantastic

--

Write 'unit tests' on your QA code using the [testthat](https://testthat.r-lib.org/) package

???

Definitely more advanced, and in terms of RAP, this is the next level above our current best practice

--

- Recent [coffee and coding session](https://web.microsoftstream.com/video/6806b4ba-7b13-4c3c-96f1-9b91c0f1d85b?referrer=https:%2F%2Feducationgovuk.sharepoint.com%2F) on testing code

--

- The tests will use dummy data and check that the code still works as intended (a minimal sketch follows below)

--

- Automatically run these tests after any changes to the code
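
--

As a hypothetical sketch of what such a test could look like (using the illustrative `check_la_region_totals()` function from the "Applying templates" slide rather than the real template functions), it builds a small dummy data frame where the local authority figures deliberately don't sum to the regional total, then checks that the mismatch is flagged:

```
library(testthat)
library(dplyr)

# Dummy data: the LA figures sum to 110, but the regional total says 100
dummy_data <- tibble(
  time_period = c("202021", "202021", "202021"),
  geographic_level = c("Regional", "Local authority", "Local authority"),
  region_code = rep("E12000001", 3),
  region_name = rep("North East", 3),
  number_on_roll = c("100", "60", "50")
)

test_that("mismatched regional totals are flagged", {
  result <- check_la_region_totals(dummy_data, "number_on_roll")
  expect_equal(nrow(result), 1)
})
```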
???

Let us know if you have questions or want to learn more about this

Thought I'd mention it to raise awareness of what is possible and where we're moving towards

---

# Support

--

RAP page of the guidance website - https://rsconnect/rsc/stats-production-guidance/rap.html#Basic_automated_QA

--

Template code repository - https://github.com/dfe-analytical-services/automated-data-qa/tree/main/R

--

Expansion of the data screener app - https://rsconnect-pp/rsc/dfe-published-data-qa/

???

This is the pre-production version, where you can test the next features for the next couple of weeks before we fully deploy them

--

Our team - [statistics.development@education.gov.uk](mailto:statistics.development@education.gov.uk)

---

# Partnership programme

The Statistics Development Team (us) working alongside your team to agree goals and deliver them over a set period of time, e.g.

--

- Helping to up-skill your team in using SQL and R

--

- Helping to transition certain processes into R

--

- Working your way up the [RAP levels](https://rsconnect/rsc/publication-self-assessment/)

--

- Developing a dashboard to sit alongside EES publications

--

- Getting started with automating your QA

--

Anything related to statistics publications - we're a flexible resource to make use of!

---
layout: true

# Next steps

---

--

Read the guidance and get started

--

Test out the extra features we're adding to the screener

???

There's a separate link that Sarah showed earlier, which will be in the guidance for testing this

--

Make use of the template code

???

Make sure you're covering the checks we have code for

Use that as a base to expand from, developing your own checks beyond the basic ones to meet the great practice level

---

Continue working towards the goal of good and great practice RAP:

???

Those in DISD will have seen David's email yesterday - this is a key aim for our division over the next few months

--

- Good practice = doing the basic checks we outline in the guidance

--

- Great practice = building on these to automate publication-specific checks

--

- Best practice = integrating with the main data processing so that the QA reports are made as the files are created

???

All happens from a single click

---

For us as a support team:

--

- Helping with lots of questions and partnership programmes

--

- Developing the guidance further

???

Please share any examples you have of what you're doing and we can add them in for other teams to learn from

--

- Expanding and improving the template code

--

- Expanding and improving the extra features in the screener

???

This includes further work around harmonisation and automating checks for that

--

- Packaging up the screener app and template code

???
Moving to a fully automated model

End-to-end R pipelines

The screener and template code will be available as R functions from a package we'll maintain

This is the direction we're aiming for, though don't worry if it sounds scary

It's still a few years away and we're taking it step by step

The pipeline will write the files directly to the EES admin databases

--

- Running further sessions

???

Let us know which parts of RAP you want to know more about

--

- Self-assessment app

???

Targeting meeting all of good and great practice before the 2022 publication cycles

Make use of the app to communicate L&D needs

Use it to guide you on where to focus in terms of RAP

We will also be tracking progress to identify areas of RAP that need more support

---
layout: false

# Final links

If you have any questions or are interested in our partnership programme, please get in touch: [statistics.development@education.gov.uk](mailto:statistics.development@education.gov.uk)

--

Today's slides - https://sarahmwong.github.io/intro-to-automating-QA/

???

You'll be able to find these, and the recording of the session, on the guidance website by the end of today

--

Made in R using the [xaringan](https://bookdown.org/yihui/rmarkdown/xaringan.html) package

--

Guidance website: https://rsconnect/rsc/stats-production-guidance/rap.html#Basic_automated_QA

???

Includes the slides and recording of the introduction to RAP session

Also links to everything that we've mentioned and shown today

--

Any questions?