

Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.

David McGinnis
Mar 17, 2020 · 6 min read


Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...

David McGinnis
Oct 22, 2019 · 4 min read


Writing Environment Agnostic Code
[...] we'll discuss some of the ways we can write environment agnostic code, which can be run on any environment within your enterprise.

David McGinnis
Oct 16, 2019 · 6 min read


YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?

David McGinnis
Sep 29, 2019 · 4 min read


YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.

David McGinnis
Sep 22, 2019 · 5 min read


Machine Learning Solutions: Recommender System Design
With the help of tools like Spark’s MLlib ... [making a recommendation engine] is something that many companies have done and you can too.

David McGinnis
Sep 1, 2019 · 10 min read


