Blog on E. Visel
/blog/
Recent content in Blog on E. Visel
Mon, 06 Apr 2020 00:00:00 +0000
Querying across files with Apache Drill
/blog/querying-across-files-with-apache-drill/
Mon, 06 Apr 2020 00:00:00 +0000
When I first used Apache Drill several years ago, it was one of those “holy crap this is amazing” moments. Moreover, every time since that I’ve thought “Oh, Drill could be really useful here” and spun it up, that thought has been quickly followed by “holy crap this is amazing” all over again. It’s just delightful. I keep thinking I should try out alternatives like Presto (which has two branches now) or Apache Impala, but I always start by spinning up Drill for comparison and never quite make it to anything else.
match.arg
/blog/match.arg/
Thu, 04 Jul 2019 00:00:00 +0000
Lately I’ve been working with a lot of people whose first language is not R, which has given me more of an appreciation for R’s oddities. Some in retrospect were probably ill-advised, like partial matching with $:
mtcars$disp
#> [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
#> [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
#> [23] 304.0 350.0 400.0 79.0 120.
Mapping leaves
/blog/recursion/
Sun, 20 Jan 2019 00:00:00 +0000
at_depth
I love purrr. Aside from its anonymous function notation, one of the functions that made me love the package was at_depth, which iterates across a list at a specified level of nesting. It has since been deprecated in favor of modify_depth, which is more powerful but significantly more finicky.
The additional power comes from the .depth parameter, which can now be passed a negative integer to index up from the bottom of the list.
Fireworks
/blog/fireworks/
Sun, 11 Nov 2018 00:00:00 +0000
Since Thomas Lin Pedersen took over gganimate, I’ve been building animations. Mostly what I’ve built is not for any particular data visualization purpose. My motivations vary, but have included:
I want to try out new features of the package
I’m bored
I’m trying to understand trig functions in polar space
I have an idea
I can’t focus
I want to play with matrix transformations
I want to make pretty things
…but I mostly make them because they make me happy.
Coalescing joins in dplyr
/blog/coalescing-joins/
Sat, 28 Jul 2018 00:00:00 +0000
When aggregating data, it is not uncommon to need to combine datasets containing identical non-key variables in varying states of completeness. There are various ways to accomplish this task. One possibility is a coalescing join, a join in which missing values in x are filled with matching values from y. Such behavior does not exist in current dplyr joins, though it has been discussed and so may be added someday. For now, let’s build a coalesce_join function.
test
/blog/test/
Mon, 07 May 2018 00:00:00 +0000
Bayesian Regression
/blog/bayesian-regression/
Mon, 16 Apr 2018 00:00:00 +0000
I have been working on my Bayesian statistics skills recently. In particular, I have been reading David Robinson’s lovely Introduction to Empirical Bayes: Examples from Baseball Statistics and watching Rasmus Bååth’s delightful three-part Video Introduction to Bayesian Data Analysis, notable amongst other videos, courses, and textbooks. I have much yet to learn, but my past experience with statistics has taught me that I understand concepts most thoroughly by actually implementing them.
Anonymous Functions, Part II: gsubfn
/blog/anonymous-functions-2/
Sat, 07 Apr 2018 00:00:00 +0000
This is a follow-up to Anonymous Functions, Not Variables. For context, read that first.
After my previous post, Brodie Gaslam pointed me in an interesting direction on Twitter:
Have you seen https://t.co/KTChI5NMYb as.function.formula? Very similar concepts.
— BrodieG (@BrodieGaslam) March 26, 2018
He’s right. Gabor Grothendieck’s gsubfn package is centered around its gsubfn function, an extended version of gsub whose replacement parameter can accept functions, e.g. to do arithmetic with numbers in a string.
Anonymous Functions, Not Variables
/blog/anonymous-functions/
Sun, 25 Mar 2018 00:00:00 +0000
I am a very heavy purrr user. The killer feature is clearly map_df (fairly recently rebranded as map_dfr and map_dfc for row and column binding, respectively) to iterate over a list à la lapply and simplify the result to a data frame. Thanks to the power of dplyr::bind_rows, it fixes all the drawbacks of sapply’s simplify2array behavior:
It returns a data frame, not a matrix or array, so multiple types can be kept.
Pythagorean Triples
/blog/pythagorean-triples/
Fri, 23 Mar 2018 00:00:00 +0000
In a quiet moment, I happened across Project Euler’s Question 39:
Integer right triangles
Problem 39
If \(p\) is the perimeter of a right angle triangle with integral length sides, \(\{a,b,c\}\), there are exactly three solutions for \(p = 120\):
\[\{20,48,52\}, \{24,45,51\}, \{30,40,50\}\]
For which value of \(p \le 1000\), is the number of solutions maximised?
Put another way, what integer perimeter less than or equal to 1000 has the most Pythagorean triples?
p5 in R
/blog/p5-in-r/
Fri, 23 Mar 2018 00:00:00 +0000
p5.js is a version of Processing built natively in JavaScript. It’s really, really awesome.
Sean Kross wrote R bindings for p5.js in his p5 package so it can be written in R and published as an htmlwidget. This is a little exploration of how it works.
library(p5)
# runs once at start
setup_ <- setup() %>%
  createCanvas(500, 500) %>%
  noStroke()
# reruns every frame
draw_ <- draw() %>%
  background('#888') %>%
  fill(rgb(1, 1, 1, 0.