David McGinnis
Mar 17, 2020 · 6 min read
Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.
301 views · 0 comments
David McGinnis
Mar 10, 2020 · 4 min read
A Crash Course in Proper Oozie Usage
[...] focus on best practices such as when and why you should use Oozie, and when to use bundles.
337 views · 0 comments
David McGinnis
Feb 25, 2020 · 7 min read
Debugging From The Field: The Case of the Empty Files
A team at a client was using Spark to read and write to a Kafka topic. [...] files that would be written that were completely empty.
340 views · 1 comment
David McGinnis
Dec 3, 2019 · 5 min read
Debugging From The Field: The Case of the Ignored Configuration Change
We made the change on a Sunday, but four days later, the number of files had not appreciably changed in the YARN logs directory.
160 views · 0 comments
David McGinnis
Nov 26, 2019 · 4 min read
A Crash Course in YARN Log Aggregation
The system which maintains the application logs in HDFS is called the Log Aggregation system and is flexible [...]
7,597 views · 1 comment
David McGinnis
Nov 12, 2019 · 4 min read
Debugging from the Field: SmartSense Activity Explorer Stops Working
This client didn't use the activity explorer much [...] I tried to run one of the paragraphs and immediately [failed].
302 views · 0 comments
David McGinnis
Oct 22, 2019 · 4 min read
Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...
107 views · 0 comments
David McGinnis
Oct 6, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.
859 views · 0 comments
David McGinnis
Sep 29, 2019 · 4 min read
YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?
839 views · 0 comments
David McGinnis
Sep 22, 2019 · 5 min read
YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.
2,412 views · 0 comments
David McGinnis
Sep 8, 2019 · 4 min read
Debugging From the Field: Sudden Kerberos Failure in HiveServer2 Instance
A client has a medium-sized kerberized HDP 2.6.0 installation. Hive Interactive is disabled and Hive is set up to use HTTP transport mode.
582 views · 0 comments