David McGinnis | Oct 29, 2019 | 6 min
Stop Feeding the Small File Monster!
If you've worked with Hadoop for any amount of time, you've likely run into the dreaded small file problem...
David McGinnis | Oct 22, 2019 | 4 min
Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...
David McGinnis | Oct 16, 2019 | 6 min
Writing Environment Agnostic Code
[...] we'll discuss some of the ways we can write environment agnostic code, which can be run on any environment within your enterprise.
David McGinnis | Oct 6, 2019 | 5 min
YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.
David McGinnis | Sep 29, 2019 | 4 min
YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?