<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Munging on E. Visel</title>
    <link>/tags/munging/</link>
    <description>Recent content in Munging on E. Visel</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 06 Apr 2020 00:00:00 +0000</lastBuildDate>
    
	<atom:link href="/tags/munging/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Querying across files with Apache Drill</title>
      <link>/blog/querying-across-files-with-apache-drill/</link>
      <pubDate>Mon, 06 Apr 2020 00:00:00 +0000</pubDate>
      
      <guid>/blog/querying-across-files-with-apache-drill/</guid>
      <description>When I first used Apache Drill several years ago, it was one of those “holy crap this is amazing” moments. Moreover, every time since that I’ve thought “Oh, Drill could be really useful here” and spun it up, that thought has been quickly followed by “holy crap this is amazing” all over again. It’s just delightful. I keep thinking I should try out alternatives like Presto (which has two branches now) or Apache Impala, but I always start by spinning up Drill for comparison and never quite make it to anything else.</description>
    </item>
    
    <item>
      <title>Coalescing joins in dplyr</title>
      <link>/blog/coalescing-joins/</link>
      <pubDate>Sat, 28 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>/blog/coalescing-joins/</guid>
      <description>When aggregating data, it is not uncommon to need to combine datasets containing identical non-key variables in varying states of completeness. There are various ways to accomplish this task. One possibility is a coalescing join, a join in which missing values in x are filled with matching values from y. Such behavior does not exist in current dplyr joins, though it has been discussed and so may be added someday. For now, let’s build a coalesce_join function.</description>
    </item>
    
  </channel>
</rss>