
David McGinnis
Jan 19, 2022 · 4 min read
Debugging from the Field: Sudden CI Test Failures
This is one of a series of posts revolving around debugging stories from the field. The goal of this series is to help demonstrate how to...
563 views
1 comment


David McGinnis
Mar 10, 2020 · 4 min read
A Crash Course in Proper Oozie Usage
[...] focus on best practices such as when and why you should use Oozie, and when to use bundles.
352 views
0 comments

David McGinnis
Mar 3, 2020 · 5 min read
Debugging From The Field: When Parallelization Goes Wrong
Blog post chronicling the investigation of a series of errors in a Spark Streaming job reading from Spark.
835 views
0 comments

David McGinnis
Feb 25, 2020 · 7 min read
Debugging From The Field: The Case of the Empty Files
A team at a client was using Spark to read and write to a Kafka topic. [...] files that would be written that were completely empty.
342 views
1 comment

David McGinnis
Feb 18, 2020 · 4 min read
Spark Job Optimization Myth #6: I'm Seeing Out of Memory Exceptions, So I Need to Increase Memory
... we're going to discuss [...] the common excuse I see from developers to immediately try to increase resources, [the OutOfMemory error].
908 views
0 comments

David McGinnis
Feb 11, 2020 · 5 min read
Spark Job Optimization Myth #5: Increasing Executor Cores is Always a Good Idea
This week, we're going to talk about executor cores. First [...] we'll understand how setting executor cores affects our jobs...
12,554 views
1 comment


David McGinnis
Feb 4, 2020 · 5 min read
Spark Job Optimization Myth #4: I Need More Overhead Memory
[This week,] we'll look at the overhead memory parameter, which is available for both driver and executors.
2,506 views
2 comments

David McGinnis
Jan 28, 2020 · 5 min read
Spark Job Optimization Myth #3: I Need More Driver Memory
The most common misconception I see developers fall into with regard to the driver configuration is increasing driver memory.
10,892 views
0 comments


David McGinnis
Jan 21, 2020 · 5 min read
Spark Job Optimization Myth #2: Increasing the Number of Executors Always Improves Performance
Increasing the number of executors certainly works [...], but there are still very important times when it doesn't work the way you'd want.
6,696 views
0 comments

David McGinnis
Jan 6, 2020 · 5 min read
Spark Job Optimization Myth #1: Increasing the Memory Per Executor Always Improves Performance
Not only is [increasing memory] haphazard, leading to inconsistent results, but it also doesn't actually do what they think it does.
7,810 views
0 comments

David McGinnis
Nov 5, 2019 · 6 min read
Spark Job Optimization: Dealing with Data Skew
Optimizing a Spark job can be a daunting task. [...] This series is going to focus on diving into the inner workings of Spark [...]
6,504 views
0 comments


David McGinnis
Oct 29, 2019 · 6 min read
Stop Feeding the Small File Monster!
If you've worked with Hadoop for any amount of time, you've likely run into the dreaded small file problem...
234 views
3 comments


David McGinnis
Oct 16, 2019 · 6 min read
Writing Environment Agnostic Code
[...] we'll discuss some of the ways we can write environment-agnostic code, which can be run on any environment within your enterprise.
1,285 views
0 comments


David McGinnis
Oct 6, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.
892 views
0 comments


David McGinnis
Sep 29, 2019 · 4 min read
YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?
889 views
0 comments


David McGinnis
Sep 22, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.
2,470 views
0 comments


David McGinnis
Sep 15, 2019 · 4 min read
Debugging from the Field: Sudden En Masse Failures in 100s of Spark Streaming Jobs
As this case shows, it is very important to understand where to look and how to filter out noise from the signal you care about.
131 views
0 comments