Finding treasures among null columns

Ronald Leung
2 min read · May 26, 2020
Photo credit (Daniel Tuttle)

While working with production data, one issue I have seen multiple times is that the dataset has a large number of columns, but only certain ones are meaningful. You can certainly filter explicitly by column, if you know exactly which columns you want. But oftentimes I realize I just want to get the columns that have some value in at least one row. In this article I’ll share a simple trick for that. Here’s a sample data frame, with mostly NaN data:
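A minimal stand-in for such a frame might look like the sketch below. The column names A through J, the row count, and the values are all assumptions, chosen only so that the populated columns match the ones named later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: ten columns A..J, five rows, every cell NaN to start.
df = pd.DataFrame(np.nan, index=range(5), columns=list("ABCDEFGHIJ"))

# Place a handful of made-up values in columns A, B, C and J only.
df.loc[0, "A"] = 1.0
df.loc[1, "A"] = 5.0
df.loc[2, "B"] = 2.0
df.loc[3, "C"] = 3.0
df.loc[4, "J"] = 4.0

print(df)
```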

The notnull() function tells us which values are not null. What we really want is the sum of that result for each column over all the rows. This way, we can see which columns have a value in at least one row.
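As a sketch of that step, assume a hypothetical frame `df` with columns A through J (the names and values are made up for illustration). Because `notnull()` returns booleans, summing it gives a per-column count of non-null entries:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: mostly NaN, with a few values in A, B, C and J.
df = pd.DataFrame(np.nan, index=range(5), columns=list("ABCDEFGHIJ"))
df.loc[0, "A"] = 1.0
df.loc[1, "A"] = 5.0
df.loc[2, "B"] = 2.0
df.loc[3, "C"] = 3.0
df.loc[4, "J"] = 4.0

# notnull() yields True/False per cell; sum() adds them up column by column,
# so each entry is the number of non-null values in that column.
counts = df.notnull().sum()
print(counts)
```

Columns whose count is zero are entirely empty; anything above zero holds at least one value.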

Now we can clearly tell which columns we are interested in: columns A, B, C, and J. We can feed this back into the data frame filter as follows:
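One way to do this, sketched on the same hypothetical frame (column names and values are assumptions, and the exact filter expression used originally may differ), is to turn the counts into a boolean mask and pass it to `loc`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: mostly NaN, with a few values in A, B, C and J.
df = pd.DataFrame(np.nan, index=range(5), columns=list("ABCDEFGHIJ"))
df.loc[0, "A"] = 1.0
df.loc[1, "A"] = 5.0
df.loc[2, "B"] = 2.0
df.loc[3, "C"] = 3.0
df.loc[4, "J"] = 4.0

# Boolean mask over columns: True where the column has at least one value.
has_values = df.notnull().sum() > 0

# Keep all rows, but only the columns where the mask is True.
filtered = df.loc[:, has_values]
print(filtered)
```

If the counts themselves are not needed, `df.dropna(axis=1, how="all")` should select the same columns in a single call.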

That’s it! Now we get only the columns we want.
