To date, deep learning has mainly been conducted in high-performance computing environments, where hardware is specialized, clusters are single-tenant, and software is written on the assumption that failures are rare. Cloud computing, on the other hand, lets us push the boundary of what can be done with commodity hardware in multi-tenant environments, using software that is increasingly modular and designed to expect failure. This research examines the intersection of AI workloads and the cloud software stack. How far can we take AI in the cloud? What doors does it open?