Reflections on the ONS-LOTI Data Science Bootcamp


Over the past 12 weeks, data scientists from LOTI boroughs have been working with expert mentors from the Office for National Statistics’ (ONS) Data Science Campus to develop their data matching skills.

This blog sets out why we ran the programme, shares some of the insights we gained and includes some reflections from participants.

Why a Bootcamp?


The ONS-LOTI Data Science Bootcamp has been a collaboration aimed at developing data science skills across London boroughs. It builds on the existing ONS-delivered Data Science Accelerator model, which focuses on developing the skills of individuals across the public sector.

The Bootcamp has taken a specific skill set, in this case data matching, as its organising concept and developed a programme of work and resources designed for multiple public sector organisations to learn collaboratively in clusters. The aim has been to develop the capability and skills of officers and provide practical demonstrations of the value of data science within London local government.

Data Matching


Data matching is a foundational skill set in data science, enabling data to be brought together for descriptive, diagnostic and predictive analytics. It’s one of the most important tasks for data scientists and analysts in London boroughs, who are required to bring together data from multiple sources and systems to produce insights. During the pandemic, boroughs were required to match data from Public Health England (concerning individuals who were shielding) with council records in order to make contact and offer the required support. Data matching also played a significant role in the efficiency of borough contact tracing services.
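
To make the idea concrete, here is a minimal sketch of a deterministic (exact-key) match between a shielding list and council records. It is an illustration only and not the method any borough actually used: the field names (surname, dob, postcode, phone), the function names and every record below are invented for the example.

```python
import re


def normalise_postcode(postcode: str) -> str:
    """Upper-case a postcode and strip all whitespace."""
    return re.sub(r"\s+", "", postcode).upper()


def match_key(record: dict) -> tuple:
    """Build the exact-match key: cleaned surname, date of birth, postcode."""
    return (
        record["surname"].strip().lower(),
        record["dob"],
        normalise_postcode(record["postcode"]),
    )


def deterministic_match(left: list, right: list) -> list:
    """Return (left_record, right_record) pairs whose match keys are identical."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)
    return [(l, r) for l in left for r in index.get(match_key(l), [])]


# Synthetic records, invented purely for illustration.
shielding_list = [
    {"surname": "Smith ", "dob": "1950-02-01", "postcode": "tw3 1es"},
    {"surname": "Jones", "dob": "1948-11-30", "postcode": "E8 1EA"},
]
council_records = [
    {"surname": "smith", "dob": "1950-02-01", "postcode": "TW3 1ES", "phone": "020 7946 0000"},
    {"surname": "Brown", "dob": "1948-11-30", "postcode": "E8 1EA", "phone": "020 7946 0001"},
]

matches = deterministic_match(shielding_list, council_records)
for person, council in matches:
    print(person["surname"].strip(), "->", council["phone"])
```

Only the first record matches here, because the key must agree exactly after cleaning. In practice, spelling variations and missing fields are what push teams towards the probabilistic techniques covered later in the course.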

The Bootcamp was designed to develop data matching skills and techniques to tackle some of these common challenges:

  • Matching addresses across multiple data sets
  • Getting a single view of the resident
  • Understanding the makeup of households
  • Providing a consistent user experience across services and between boroughs
  • Identifying resident vulnerabilities
  • Predicting service demand

The 12-week course was split into an 8-week beginner/intermediate module and a 4-week advanced module, covering the following skills and more:

Beginner:

  • Data and string manipulation and cleaning
  • Data Linkage in Python and R
  • Good Practice in Data Linkage
  • Address Matching

Intermediate:

  • Deterministic matching algorithms
  • Probabilistic matching algorithms
  • Regex

Advanced:

  • Machine Learning approaches to address matching
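
As a rough illustration of how some of these skills fit together, the sketch below cleans addresses with a regular expression and then scores candidate matches by string similarity. It is not taken from the course materials. The abbreviation table, the 0.85 threshold and the function names are assumptions, and the similarity score from Python's standard-library difflib.SequenceMatcher is a simple fuzzy-matching stand-in rather than a full probabilistic (Fellegi-Sunter style) model.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be far longer and would
# need care with ambiguous tokens (for example "st" can also mean "saint").
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "ln": "lane"}


def clean_address(address: str) -> str:
    """Lower-case, replace punctuation with spaces and expand abbreviations."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity between two cleaned addresses, from 0.0 to 1.0."""
    return SequenceMatcher(None, clean_address(a), clean_address(b)).ratio()


def best_match(query: str, candidates: list, threshold: float = 0.85):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    score, candidate = max((similarity(query, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Invented addresses for illustration.
gazetteer = ["12 High Street London", "21 High Street London", "12 Hill Road London"]
print(best_match("12 High St., London", gazetteer))
print(best_match("Flat 9 Unknown Close", gazetteer))
```

The threshold controls the trade-off between missed matches and false matches, so it would normally be tuned against a sample of pairs that have been checked by hand.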

Fifteen analysts from nine boroughs and the Greater London Authority (GLA) took part in the course, which was designed by Jonathon Mellor and led by an amazing group of mentors: Clara Standring, Filippo Cavallari, Tom O’Connor and Tom White.

Reflections from the Hounslow Data Team


Anna Trichkine and Ejaz Hussain from Hounslow’s data team both participated in the course.

Ejaz took the opportunity to learn R and its libraries for the first time to supplement his existing knowledge of Python. He worked on a mini project that used ONS-provided data to explore the gender pay gap. He also applied skills learnt on the course to analyse council-held data sets looking at overspend.

Anna worked on building a tool to find and remove personal data from Hounslow’s Open Data Portal. This required web scraping the portal, extracting data sets and running a Named Entity Recognition analysis to draw out areas of the data likely to include personal information. The mentors supported Anna in finding the right Python tools for this process. Anna and Clara reflected that, in learning data science, the most important thing can often be just learning that a tool or approach exists and works in your context. The code that Anna has created can now be used to automate the data extraction and categorisation, and can be used by others on their own open data portals.
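
For readers curious about what the flagging step of such a tool might look like, here is a much-simplified sketch. It is not Anna's code: it swaps Named Entity Recognition for a few regular expressions so that it runs with the Python standard library alone, and the patterns, function name and sample data are all assumptions made for illustration. The patterns are deliberately loose and would produce false positives and misses on real data.

```python
import csv
import io
import re

# Illustrative patterns only; not a complete definition of personal data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE),
    "uk_phone": re.compile(r"(?:\+44\s?|\b0)\d(?:[\s-]?\d){8,9}\b"),
}


def flag_columns(csv_text: str) -> dict:
    """Map each column name to the set of pattern labels found in its values."""
    flagged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if value and pattern.search(value):
                    flagged.setdefault(column, set()).add(label)
    return flagged


# Invented sample standing in for a data set downloaded from a portal.
sample = (
    "ward,contact,notes\n"
    "Heston,jane@example.org,Call 020 7946 0123\n"
    "Feltham,,No issues\n"
)
print(flag_columns(sample))
```

Columns that are flagged would then be reviewed by a person before anything is removed, since pattern matching alone cannot tell a council switchboard number from a resident's mobile.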

Learning Data Science Skills: Reflections on the course


While there are formal accreditations available, many people learn data science skills through online resources and the large, active online communities that share knowledge through Slack, GitHub, Stack Overflow and many other platforms. What can be challenging for those building their skills independently is finding context-specific support and advice.

The Bootcamp has offered direct teaching, one-to-one support and guidance on how to navigate course resources. The mentors also reflected that one of the best aspects of the course has been the peer-to-peer support between participants. While the mentors could provide the theory and help with the code, fellow participants could help each other apply it in the London local government context. While data science capability in each London borough remains relatively small, this cross-borough insight and support is vital for building skills across the city. Continuing this skills development through the LOTI Data Science Network will be a priority for LOTI.

Train the Trainer Model


There was more demand than available places on the course. As a result, ONS designed the course with the intention of enabling participants to cascade knowledge to their colleagues. ONS has provided indefinite licences to the online course materials to support this process. The demand for places and general interest in the course demonstrate a need to build more capacity in training and skills development in London local government. There is a collaborative opportunity to develop skills training and resources for use across London. LOTI will be exploring this in more detail as part of our Data Skills and Capacity work, which has so far identified a shared Data Academy as a potential vehicle for cross-borough data skills development.

Always more to learn


At the September LOTI Data Science meetup, we discussed skills and subject areas where London data scientists are interested in developing their capabilities. Air quality, time series analysis, writing good-quality reusable code, and data engineering and pipeline skills were all mentioned. We would like to hear from you if you have a suggestion for a data science skill or topic that’s relevant across London.

Adapting for the future


Gosia Lachowska (GLA) shared helpful feedback on the structure and delivery of the course that will help shape future iterations and is useful to consider for any capability-building programme. She suggested that instead of the multiple ‘mini projects’ that each participant completed, a longer project could be identified in the first 2 weeks of the course and then completed over the following 6 weeks. This would allow more time to learn, apply and test more complicated techniques. It would also mimic the format of projects that public sector data scientists work on as part of their day-to-day jobs.

Summary Reflections

  • Enable data scientists across London boroughs to support each other.
  • Shared understanding of the London local government context is invaluable! This works best when there is also expert mentoring involved.
  • There is demand for more training and support for data scientists across multiple skill sets and topics.
  • With London data science skills dispersed, collaborative solutions to developing skills are likely to be the most effective model.
  • Adapt future training and skills development to reflect project-based working.

Next Steps


The ONS will be evaluating the success of the course through surveys now and in 6 months’ time, with the aim of understanding the impact of skill development on individuals’ and data teams’ capabilities. LOTI will work with the ONS Data Science Campus to explore options for running a further course in London, building on the success of this initial pilot. To do this, we need to continue to identify topics suitable for multi-borough collaborative learning. Please do share your thoughts on suitable and relevant topics and skill sets!

For data scientists: join us for the next data science meetup on 27 October. Hear from participants who took part in the advanced course as they share their work and experience of the course.

Final Word


A big thank you to all the participants who took part. Your energy, enthusiasm and willingness to support each other and share your knowledge have really made this course a success!

Thank you to the incredible mentors who worked so hard to design and deliver this course. The brilliant feedback we have received from participants is due to your input, and the insights and skills shared will have a big impact on data science capability in London!

List of all participants involved

Name                  Organisation
Chiadi Lionel         Camden
Luke Ballance         Camden
Malgorzata Lachowska  Greater London Authority
Yiran Wei             Greater London Authority
Toby Meller           Kingston upon Thames
Huu Do                Hackney
Lindsey Coulson       Hackney
Lee Latchford         Havering
Anna Trichkine        Hounslow
Ejaz Hussain          Hounslow
Ian Hanson            Kensington and Chelsea
Sean Pedrick-Case     Lambeth
Karen Kemsley         Lewisham
Emmanuel Steadman     Tower Hamlets
David Saxton          Tower Hamlets

by Jay Saggar
22 October 2021