How to use pointblank to understand, validate, and document your data
Summary
This workshop will focus on the data quality and data documentation workflows that the pointblank package makes possible. We will use functions that allow us to:
- quickly understand a new dataset
- validate tabular data using rules that are based on our understanding of the data
- fully document a table by describing its variables and other important details
The pointblank package was created to scale from small validation problems (“Let’s make certain this table fits my expectations before moving on”) to very large ones (“Let’s validate these 35 database tables every day and ensure data quality is maintained”). We’ll delve into a range of data quality scenarios so you’ll be comfortable using this package in your organization.
Unfortunately, data documentation seems to be less common in organizations, perhaps even less common than data validation. We’ll learn that documenting data doesn’t have to be a tedious chore. The pointblank package allows you to create informative and beautiful data documentation that will help others understand what’s in all those tables that are so vital to an organization.
Speaker
Richard Iannone, Software Engineer, Posit, PBC
Rich is a software engineer at Posit who enjoys creating useful R and Python packages. He trained and worked as an atmospheric scientist and discovered working with R to be a breath of fresh air compared to the Excel-based analysis workflows common in that field. Since joining Posit, he has focused on developing packages that help organizations with data management and data visualization/publishing. When not working on R and Python packages, Rich enjoys playing and listening to music, watching movies, and getting outside.