Munging

Querying across files with Apache Drill

Globbing, implicit columns, and the power of SQL

April 6, 2020 Edward Visel

16 minute read

When I first used Apache Drill several years ago, it was one of those “holy crap this is amazing” moments. Moreover, every time since that I’ve thought “Oh, Drill could be really useful here” and spun it up, that thought has been quickly followed by “holy crap this is amazing” all over again. It’s just delightful. I keep thinking I should try out alternatives like Presto (which has two branches now) or Apache Impala, but I always start by spinning up Drill for comparison and never quite make it…

Coalescing joins in dplyr

Filling in missing data by joining

July 28, 2018 Edward Visel

4 minute read

When aggregating data, it is not uncommon to need to combine datasets containing identical non-key variables in varying states of completeness. There are various ways to accomplish this task. One possibility is a coalescing join, a join in which missing values in x are filled with matching values from y. Such behavior does not exist in current dplyr joins, though it has been discussed, and so may exist someday. For now, let’s build a coalesce_join function.

Edward Visel

Odds, ends, and R code

About

Hi, I'm Edward! Welcome to my little world of odds, ends, and R.
