A series of technical and non-technical courses to prepare you for continual success in a digital, data-driven world

In February 2022 Novuna set out its Data Strategy, and in doing so committed to driving improvement in data practice across four foundational themes:
Organisational Roles, Data Architecture, Business Strategy and Data Management.


Since then we've made so much progress. We have established the Data Strategy Committee, assigned data owners across the businesses, and defined and created roles for data stewards.

We have adopted Power BI, built out our lakehouse in Databricks, and implemented Collibra for data cataloguing, data quality and observability. Novuna has invested heavily to make all this happen, generating a lot of momentum, and it is crucial that we continue to capitalise on this. To give you the skills you need to leverage this investment we have created the Novuna Data Dojo.

The Novuna Data Dojo is made up of courses, bite-size learning and bootcamps covering technical skills in Collibra, Power BI and Databricks alongside softer data skills like data storytelling and literacy.

Answer a few simple questions to find out what training is right for you,
or keep scrolling to discover more

Data Dojo Questionnaire

The Technology

Databricks

A unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. The Databricks Data Intelligence Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf.

Power BI

A collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Your data might be an Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses. Power BI lets you easily connect to your data sources, visualize and discover what's important, and share that with anyone or everyone you want.

Collibra

A data catalog platform and tool that helps organizations better understand and manage their data assets. Collibra helps create an inventory of data assets, capture information (metadata) about them, and govern these assets. At its core, the Collibra tool is used for helping stakeholders understand what data assets exist, what they are made of, how they are being used, and if they are in regulatory compliance.

Our Learning Partners

Novuna has partnered with QA, the training company, to create our data literacy bootcamp. This complements training from the Databricks Academy, the Microsoft Enterprise Skills Initiative and Collibra University, as well as many other resources, to give you the best grounding possible in these skills and set you up for success in your role using data.

Whether you want to better understand your data, build your own analyses and dashboards, ask better questions of your data and tell stories with it, or learn about Data Engineering and Data Science, you should find everything you need at the Data Dojo.

Microsoft ESI

The Microsoft Enterprise Skills Initiative (ESI) provides hands-on training for learning and enhancing technical skills and knowledge of Microsoft and Azure technologies. It offers interactive courses, role-based training curriculums, and Microsoft Certifications in a platform that is accessible by participating organizations.

Collibra University

Learn Collibra, data governance and more through an inclusive learning environment that meets you right where you are. Through a variety of self-paced courses on Collibra University, instructor-led trainings and subscriptions for more advanced learning opportunities, you’ll build the skills and knowledge you need to get the most out of your Collibra investment.

Databricks Academy

Get trained through Databricks Academy. Learn how to master data analytics from the team that started the Apache Spark™ research project at UC Berkeley.

QA

QA believe the answer to closing your digital skills gap lies with the people you already have and the talent they can bring for the future. They're experts in reskilling, upskilling, apprenticeships and other talent needs for leading enterprises and public sector organisations in the UK and around the world.


Data Soft Skills

Data Literacy Bootcamp

The Data Literacy bootcamp covers an exciting range of topics, from understanding what data fundamentally is and why it is so critical to Novuna's future success, through to data storytelling and being able to shape and explain business problems using data. It gives a broad but strong foundation in the data skills required in our increasingly data-driven world.

Two courses are available, one for Report Builders and one for Report Consumers, making the Data Literacy Bootcamp suitable for all colleagues. Each course is made up of nine modules and a final assessment.

Learn More
Module 1: Data Literacy and its role
Understand the important role of data in Novuna, and that data is a "language" we need to learn in order to become data citizens and make informed decisions.
Module 2: Data Management
Understand that there are rules, procedures and principles governing how we collect and use data, learn the nature of these rules, and ensure we remain within them.
Module 3: Data Quality Management
Understand the factors which affect the quality of data and how these factors impact data efficacy and use. Learn some of the processes that help us overcome data issues.
Module 4: Data Structures and Data Traits
Gain an understanding of the common types of data and its features, as well as data models and where they can be developed and used.
Module 5: Data Analysis, Process, Terms and Concepts
Understand the basic processes and ideas behind data exploration and analysis and how this fits into the data lifecycle. Be able to form a hypothesis and define a good outcome from it, as well as how to curate the data to support it.
Module 6: Data Analysis and Testing
Understand the principles and techniques that underpin data analysis and that allow you to test your hypothesis.
Module 7: Visuals & Charts
Understand different methods of visualising data and how to draw insight from data using different visualisation techniques. Relay different methods for maximising impact based on the insight generated from the data.
Module 8: Storytelling with data
Understand the difference between data insight and data storytelling. Be able to construct a narrative from data, insight and visualisations, using the techniques learned.
Module 9: Data literacy socialisation & communication
Understand the principles of a feedback loop and the mechanism of how feedback loops can be used to effect change.
Assessment
Demonstrate an improvement in data understanding and data literacy. Be able to articulate how the principles learned on the programme can be applied in a day to day role.

Power BI Learning

Power BI Beginner

Choose from the "Read", "Watch", or "Instructor Led" beginner pathways designed for colleagues looking to take their data analysis skills to a more advanced level and learn how to communicate effectively with data. By the end of this program, you will be able to build interactive data visualizations, define best practices in data visualization, and manipulate data in Power BI.

Power BI Two-Day Champions Course

The CFMS Power BI Two-Day Champions Course was created for our Power BI super users in each of our business units. Learn advanced technical skills to get the most out of Power BI.

Learn More

Power BI Basics Workshop

The CFMS Power BI workshop was created to help you get started with Power BI, regardless of your technical or analytical background. This course is designed for users who are new to Power BI, but who intend to use the program extensively in their business workflows. The CFMS workshop will teach you:

What Power BI is, with use case examples
How to connect to data sources and manipulate data
How to build customisable visuals and dashboards
How to connect to Excel data and import Excel objects into Power BI
How to use key features to effectively analyse your data
How to create interactive reports and share your findings throughout your organisation
How to get, transform and load data with the Power Query Editor

Learn More

PL-300 Power BI Data Analyst Associate

As a candidate for this certification, you should deliver actionable insights by working with available data and applying domain expertise. You should provide meaningful business value through easy-to-comprehend data visualizations, enable others to perform self-service analytics, and deploy and configure solutions for consumption. As a Power BI data analyst, you work closely with business stakeholders to identify business requirements, and you collaborate with enterprise data analysts and data engineers to identify and acquire data. You use Power BI to:

Transform the data
Create data models
Visualize data
Share assets

You should be proficient at using Power Query and writing expressions by using Data Analysis Expressions (DAX). You know how to assess data quality. Plus, you understand data security, including row-level security and data sensitivity.


Databricks Learning Paths

Three tailored learning paths for different Databricks personas: Data Analyst, Data Engineer and Machine Learning Practitioner. Take the learning needs assessment to work out which path best suits your needs and which courses you should complete.

Data Analyst

Learn More
Databricks Lakehouse Fundamentals
Demonstrate your knowledge of Databricks by earning the Databricks Fundamentals accreditation.
Lessons in this learning plan include:

• What is a Data Lakehouse?
• What is the Databricks Data Intelligence Platform (DI Platform)?
• Databricks DI Platform Architecture and Security Fundamentals
• Supported Workloads on the DI Platform
Get Started with Data Analysis on Databricks
This content provides an introduction to Databricks SQL. Participants will learn about ingesting data, producing visualizations and dashboards, and receive a brief introduction to Unity Catalog.
Data Analysis with Databricks SQL
This course provides a comprehensive introduction to Databricks SQL. Learners will ingest data, write queries, produce visualizations and dashboards, and configure alerts. This course is part of the Databricks Data Analyst learning pathway and was designed to help you prepare for the Databricks Certified Data Analyst Associate certification exam.
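The ingest, query, visualize loop that Databricks SQL teaches follows the same shape as any SQL workflow. As a minimal, self-contained sketch, here is that loop using Python's stdlib sqlite3 standing in for Databricks SQL; the table and column names are invented for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ingest: create a table and load a few rows (illustrative schema).
conn.execute("CREATE TABLE loans (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("North", 1200.0), ("North", 800.0), ("South", 500.0)],
)

# Query: the kind of aggregation a dashboard tile would be built on.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM loans GROUP BY region ORDER BY region"
).fetchall()

print(totals)  # [('North', 2000.0), ('South', 500.0)]
```

In Databricks SQL the same GROUP BY query would run against lakehouse tables and feed a visualization or alert rather than a `fetchall()`.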

Data Engineer

Learn More
Databricks Lakehouse Fundamentals
Demonstrate your knowledge of Databricks by earning the Databricks Fundamentals accreditation.
Lessons in this learning plan include:

• What is a Data Lakehouse?
• What is the Databricks Data Intelligence Platform (DI Platform)?
• Databricks DI Platform Architecture and Security Fundamentals
• Supported Workloads on the DI Platform
Get Started with Data Engineering on Databricks
In this course, you will learn basic skills that will allow you to use the Databricks Lakehouse Platform to perform a simple data engineering workflow. You will be given a tour of the workspace, and you will be shown how to work with notebooks. You will create a basic data engineering workflow while you perform tasks like creating and using compute resources, working with repositories, and creating and managing basic workflow jobs. The course will also introduce you to Databricks SQL. Finally, you will see how data is stored, managed, governed, and secured within the lakehouse.
Introduction to Python for Data Science and Data Engineering
This course is intended for complete beginners to Python to provide the basics of programmatically interacting with data. The course begins with a basic introduction to programming expressions, variables, and data types. It then progresses into conditional and control statements followed by an introduction to methods and functions. You will learn the basics of data structures, classes, and various string and utility functions. Lastly, you will gain experience using the Pandas library for data analysis and visualization as well as the fundamentals of cloud computing. Throughout the course, you will gain hands-on practice through lab exercises with additional resources to deepen your knowledge of programming after the class.
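The progression the course describes — expressions, variables, functions, conditionals and data structures — can be previewed in a few lines of plain Python. The loan records below are invented purely for illustration:

```python
# Variables and data structures: a list of dictionaries holding simple records.
loans = [
    {"customer": "A", "amount": 1200},
    {"customer": "B", "amount": 800},
    {"customer": "A", "amount": 500},
]

# A function: total amounts per customer — the kind of grouping
# the course later revisits with the pandas library.
def totals_by_customer(records):
    totals = {}
    for record in records:
        key = record["customer"]
        totals[key] = totals.get(key, 0) + record["amount"]
    return totals

# Conditional logic inside a comprehension: flag customers over a threshold.
flagged = [c for c, total in totals_by_customer(loans).items() if total > 1000]

print(totals_by_customer(loans))  # {'A': 1700, 'B': 800}
print(flagged)                    # ['A']
```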
Apache Spark™ Programming with Databricks
Welcome to the Apache Spark™ Programming with Databricks course. This course is part of the Apache Spark™ Developer learning pathway and was designed to help you prepare for the Apache Spark™ Developer Certification exam. Please note: this is the self-paced version of the Apache Spark™ Programming with Databricks instructor-led course. In this course, you will explore the fundamentals of Apache Spark™ and Delta Lake on Databricks. You will learn the architectural components of Spark, the DataFrame and Structured Streaming APIs, and how Delta Lake can improve your data pipelines. Lastly, you will execute streaming queries to process streaming data and understand the advantages of using Delta Lake.
Data Engineering with Databricks
This course (formerly Data Engineering with Databricks V3) prepares data professionals to leverage the Databricks Lakehouse Platform to productionalize ETL pipelines. Students will use Delta Live Tables to define and schedule pipelines that incrementally process new data from a variety of data sources into the Lakehouse. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.
Optimizing Apache Spark on Databricks
In this course, students will explore five key problems that represent the vast majority of performance problems in an Apache Spark application: Skew, Spill, Shuffle, Storage, and Serialization. For each of these topics, we explore coding examples based on 100 GB to 1+ TB datasets that demonstrate how these problems are introduced, show how to diagnose them with tools like the Spark UI, and conclude by discussing mitigation strategies for each.

We continue the conversation by looking at a series of key ingestion concepts that promote strategies for processing terabytes of data, including managing Spark partition sizes, disk partitioning, bucketing, Z-ordering, and more. For each of these topics, we explore when and how the technique should be implemented, the new challenges that productionalizing these solutions might present, and the corresponding mitigation strategies.

Finally, we introduce a couple of other key topics such as issues with Data Locality, IO-Caching and Spark-Caching, Pitfalls of Broadcast Joins, and new features like Spark 3’s Adaptive Query Execution and Dynamic Partition Pruning. We then conclude the course with discussions and exercises on designing and configuring clusters for optimal performance given specific use cases, personas, the divergent needs of various teams, and cross-team security concerns.
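One common mitigation for the Skew problem above is key salting: a hot join key is split into several sub-keys so its rows spread across more tasks (the other side of the join is expanded with every salt value so matches are preserved). A pure-Python sketch of the idea — all key names, row counts and the salt bucket count are invented for illustration:

```python
from collections import Counter

SALT_BUCKETS = 4  # number of salt suffixes to split hot keys across (illustrative)

# A skewed dataset: one "hot" customer key dominates the join column.
rows = [("cust_1", i) for i in range(1000)] + [("cust_2", i) for i in range(10)]

# Without salting: group sizes per join key. One giant group means one
# straggler task does almost all the work.
plain = Counter(key for key, _ in rows)

# With salting: append a deterministic salt so the hot key becomes four
# sub-keys, each small enough to process in parallel.
salted = Counter(f"{key}#{i % SALT_BUCKETS}" for key, i in rows)

print(plain.most_common(1))  # [('cust_1', 1000)]
print(max(salted.values()))  # 250
```

In Spark the same effect is achieved by adding a salt column before the join (or, on Spark 3+, by letting Adaptive Query Execution split skewed partitions automatically).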
Advanced Data Engineering with Databricks
In this course, participants will build upon their existing knowledge of Apache Spark, Delta Lake, and Delta Live Tables to unlock the full potential of the data lakehouse by utilizing the suite of tools provided by Databricks. This course places a heavy emphasis on designs favoring incremental data processing, enabling systems optimized to continuously ingest and analyze ever-growing data. By designing workloads that leverage built-in platform optimizations, data engineers can reduce the burden of code maintenance and on-call emergencies, and quickly adapt production code to new demands with minimal refactoring or downtime. The topics in this course should be mastered prior to attempting the Databricks Certified Data Engineering Professional exam.

Machine Learning Practitioner

Learn More
Databricks Lakehouse Fundamentals
Demonstrate your knowledge of Databricks by earning the Databricks Fundamentals accreditation.
Lessons in this learning plan include:

• What is a Data Lakehouse?
• What is the Databricks Data Intelligence Platform (DI Platform)?
• Databricks DI Platform Architecture and Security Fundamentals
• Supported Workloads on the DI Platform
Get Started with Databricks for Machine Learning
In this course, you will learn basic skills that will allow you to use the Databricks Lakehouse Platform to perform a simple data science and machine learning workflow. You will be given a tour of the workspace, and you will be shown how to work with notebooks. You will train a baseline model with AutoML and transition the best model to production. Finally, the course will also introduce you to MLflow, feature store and workflows and demonstrate how to train and manage an end-to-end machine learning lifecycle.
Introduction to Python for Data Science and Data Engineering
This course is intended for complete beginners to Python to provide the basics of programmatically interacting with data. The course begins with a basic introduction to programming expressions, variables, and data types. It then progresses into conditional and control statements followed by an introduction to methods and functions. You will learn the basics of data structures, classes, and various string and utility functions. Lastly, you will gain experience using the Pandas library for data analysis and visualization as well as the fundamentals of cloud computing. Throughout the course, you will gain hands-on practice through lab exercises with additional resources to deepen your knowledge of programming after the class.
Apache Spark™ Programming with Databricks
Welcome to the Apache Spark™ Programming with Databricks course. This course is part of the Apache Spark™ Developer learning pathway and was designed to help you prepare for the Apache Spark™ Developer Certification exam. Please note: this is the self-paced version of the Apache Spark™ Programming with Databricks instructor-led course. In this course, you will explore the fundamentals of Apache Spark™ and Delta Lake on Databricks. You will learn the architectural components of Spark, the DataFrame and Structured Streaming APIs, and how Delta Lake can improve your data pipelines. Lastly, you will execute streaming queries to process streaming data and understand the advantages of using Delta Lake.
Scalable Machine Learning with Apache Spark™ (V2)
This course teaches you how to scale ML pipelines with Spark, including distributed training, hyperparameter tuning, and inference. You will build and tune ML models with SparkML while leveraging MLflow to track, version, and manage these models. This course covers the latest ML features in Apache Spark, such as Pandas UDFs, Pandas Functions, and the pandas API on Spark, as well as the latest ML product offerings, such as Feature Store and AutoML.
Machine Learning in Production (V2)
In this course, you will learn MLOps best practices for putting machine learning models into production. The first half of the course uses a feature store to register training data and uses MLflow to track the machine learning lifecycle, package models for deployment, and manage model versions. The second half of the course examines production issues including deployment paradigms, monitoring, and CI/CD. By the end of this course, you will have built an end-to-end pipeline to log, deploy, and monitor machine learning models.
Deep Learning with Databricks
This course begins by covering the basics of neural networks and the Keras API. We will focus on how to leverage Spark to scale our models, including distributed training, hyperparameter tuning, and inference while leveraging MLflow to track, version, and manage these models.

We will deep dive into distributed deep learning, including hands-on examples to compare and contrast various techniques for distributed data preparation, including Petastorm and TFRecord, as well as distributed training techniques such as Horovod and spark-tensorflow-distributor. To better understand the model's predictions, you will apply model interpretability libraries.

Further, you will learn the concepts behind Convolutional Neural Networks (CNNs) and transfer learning, and apply them to solve image classification tasks. We will wrap up the course by covering natural language processing (NLP) and explore text embeddings and the latest transformer-based models for transfer learning and other NLP applications.

Collibra Learning

Collibra Basics

One hour instructor-led workshops to give a grounding in the key concepts. These are best suited to Data Owners and Data Stewards who will be working with Collibra as part of their role.

Data Catalogue Basics
An hour-long introductory session for users who are new to Collibra. Get hands on with Novuna's catalogue and learn about:

• What Collibra is and how it is organised
• How to navigate to key areas of the Collibra platform
• Searching the catalogue and refining search results
• Creating new assets
• Completing workflow tasks
• Finding key information in your user profile
Learn More
Data Quality Basics
An hour-long introductory session for Data Stewards who are new to Collibra Data Quality. Get hands on and learn about:

• Data sets
• Data quality jobs
• The data quality dimensions
• Data quality rules
• Catalogue integration
Learn More

Collibra Quick Videos

Five- to ten-minute videos explaining how to get the most out of Collibra, suitable for anyone.

What is Collibra Data Catalogue?
A ten-minute video for anyone wondering what Collibra is and how they might be able to use it to answer questions about our data and reports.
Learn More
What is Collibra Data Quality?
A five-minute video for anyone wondering what Collibra Data Quality is and why they might be asked to help resolve data quality issues.
Navigating Collibra for Data Assets
A five-minute video for anyone wanting to find their way around Collibra when looking for specific data.
Learn More
Navigating Collibra for Power BI
A five-minute video for anyone wanting to find their way around Collibra when looking for Power BI reports and data models.
Learn More
Creating Domains, Assets & Relationships
A five-minute video for Data Stewards wanting to create their first catalogue assets and create relationships between them.
Learn More
Importing Assets & Metadata
A five-minute video for Data Stewards wanting to bulk import assets like business terms, or metadata like a data dictionary.
Learn More
Data Lineage
A five-minute video for anyone interested in understanding the flow of data from source, through Databricks, into Power BI data models and into reports.
Learn More