Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems
research-article, Open Access

Authors:
* Hong Liu, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0003-0053-1111)
* Ryohei Urata, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0009-5788-1834)
* Kevin Yasumura, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0008-1306-3445)
* Xiang Zhou, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0003-0121-6527)
* Roy Bannon, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0005-4523-186X)
* Jill Berger, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0005-3880-5167)
* Pedram Dashti, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0008-7058-7093)
* Norm Jouppi, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0003-1765-1929)
* Cedric Lam, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0001-7392-451X)
* Sheng Li, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0003-1068-5261)
* Erji Mao, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0006-1097-5778)
* Daniel Nelson, Google, Sunnyvale, CA, USA (https://orcid.org/0009-0006-0657-324X)
* George Papen, Google, Sunnyvale, CA, USA; Department of ECE, UC San Diego, La Jolla, CA, USA (https://orcid.org/0000-0001-6727-1292)
* Mukarram Tariq, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0003-3164-2551)
* Amin Vahdat, Google, Sunnyvale, CA, USA (https://orcid.org/0000-0002-4866-1698)

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference, September 2023, Pages 499-515
https://doi.org/10.1145/3603269.3604836
Published: 01 September 2023
Total Citations: 0. Total Downloads: 1,896.
ABSTRACT
We describe our experience developing what we believe to be the world's first large-scale production deployments of lightwave fabrics used for both datacenter networking and machine-learning (ML) applications. Using optical circuit switches (OCSes) and optical transceivers developed in-house, we employ hardware and software codesign to integrate the fabrics into our network and computing infrastructure. Key to our design is a high degree of multiplexing enabled by new kinds of wavelength-division-multiplexing (WDM) and optical circulators that support high-bandwidth bidirectional traffic on a single strand of optical fiber. The development of the requisite OCS and optical transceiver technologies leads to a synchronous lightwave fabric that is reconfigurable, low latency, rate agnostic, and highly available. These fabrics have provided substantial benefits for long-lived traffic patterns in our datacenter networks and predictable traffic patterns in tightly-coupled machine learning clusters. We report results for a large-scale ML superpod with 4096 tensor processing unit (TPU) V4 chips that has more than one ExaFLOP of computing power. For this use case, the deployment of a lightwave fabric provides up to 3x better system availability and model-dependent performance improvements of up to 3.3x compared to a static fabric, despite constituting less than 6% of the total system cost.

Index Terms
1. Hardware
   1. Communication hardware, interfaces and storage
      1. Networking hardware
2. Networks
   1. Network architectures
      1. Network design principles
Published in
ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023, 1217 pages
ISBN: 9798400702365
DOI: 10.1145/3603269
Chairs: Henning Schulzrinne, Vishal Misra
Program Chairs: Eddie Kohler, David Maltz

Copyright (c) 2023 Owner/Author(s)
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher
Association for Computing Machinery, New York, NY, United States

Author Tags
* data center networks
* optical circuit switches
* machine learning

Qualifiers
* research-article