https://www.marktechpost.com/2022/04/09/check-out-this-deepminds-new-language-model-chinchilla-70b-parameters-which-significantly-outperforms-gopher-280b-and-gpt-3-175b-on-a-large-range-of-downstream-evaluation-tasks/
Check Out DeepMind's New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks

By G Chaithali - April 9, 2022

Source: https://arxiv.org/pdf/2203.15556.pdf

This research summary is based on the paper 'Training Compute-Optimal Large Language Models'.

Extreme-scale language models have recently shown remarkable performance on natural language processing tasks, driven largely by ever-growing parameter counts that now exceed 500 billion. However, while these models have grown rapidly in recent years, the amount of data used to train them has not kept pace. By this measure, the current generation of large language models is clearly undertrained.
A DeepMind research team has proposed three prediction approaches for optimally choosing both model size and training duration under a fixed compute budget. To estimate the trade-off between model size and the number of training tokens, they use:

* Fixing model sizes and varying the number of training tokens.
* IsoFLOP profiles.
* Fitting a parametric loss function.

The final pretraining loss is modeled as a function of the number of model parameters and the number of training tokens. Because the compute budget is a deterministic function of these two quantities, the researchers minimize the loss function under the constraint that the FLOP count equals the budget.

In the first approach, the researchers vary the number of training steps for a fixed family of models, training each model for four different numbers of training sequences: model sizes are held fixed while the number of training tokens is varied. From these runs they can directly estimate the minimum loss achievable for a given number of training FLOPs. The IsoFLOP profiles method, in contrast, varies the model size for each of nine fixed training FLOP budgets and considers the final training loss at each point. In the third approach, all final losses from the first two sets of experiments are modeled as a parametric function of model parameter count and the number of seen tokens. The researchers propose a functional form that captures the loss of an ideal generative process on the data distribution, and show that a fully trained transformer both underperforms this idealized generative process and is not trained to convergence.

Following these methods, the proposed 70B-parameter Chinchilla consistently and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B).
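The third approach can be sketched numerically. As a hedged illustration (not the authors' code), the snippet below uses a parametric loss of the form L(N, D) = E + A/N^alpha + B/D^beta with constants approximately matching those reported in the paper, plus the standard cost approximation C ≈ 6·N·D, and grid-searches the compute-optimal split of a FLOP budget between parameters and tokens:

```python
# Sketch of the parametric-fit approach: minimise the fitted loss
#   L(N, D) = E + A / N**alpha + B / D**beta
# subject to a FLOP budget C, using the common approximation C ~= 6 * N * D.
# The constants below are (approximately) the values the paper reports;
# treat them as illustrative, not authoritative.

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def fitted_loss(n_params, n_tokens):
    """Fitted final pretraining loss for N parameters and D training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def optimal_allocation(flop_budget, n_grid=4000):
    """Grid-search the compute-optimal (params, tokens) split for a budget."""
    best_loss, best_n, best_d = float("inf"), None, None
    for i in range(n_grid):
        n = 10 ** (6 + 8 * i / n_grid)   # sweep N log-uniformly over 1e6..1e14
        d = flop_budget / (6 * n)        # D implied by the constraint C = 6*N*D
        l = fitted_loss(n, d)
        if l < best_loss:
            best_loss, best_n, best_d = l, n, d
    return best_n, best_d

# Gopher's training budget was roughly 5.76e23 FLOPs; under this fit the
# optimum falls at tens of billions of parameters and trillions of tokens.
n_opt, d_opt = optimal_allocation(5.76e23)
```

The key qualitative behaviour matches the paper's conclusion: at Gopher-scale compute, the fit favours a far smaller model trained on far more data than Gopher's 280B parameters.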
The researchers also found that, despite employing different fitting procedures and trained models, the three approaches produce comparable predictions for how optimal parameter count and token count scale with FLOPs. Overall, this research contributes an effective training paradigm for large auto-regressive language models under limited compute resources. It is standard practice to increase model size without a matching increase in the number of training tokens; the team instead recommends doubling the number of training tokens for every doubling of model size. This means that larger, higher-quality training datasets are key to better results on downstream tasks.

Paper: https://arxiv.org/pdf/2203.15556.pdf

G Chaithali is a technical content writing consultant at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is interested in the field of data analytics and in exploring its applications across domains, and is passionate about content writing and debating.
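The doubling rule above is often summarised as a fixed tokens-per-parameter ratio. A minimal sketch of that heuristic follows; the ~20:1 ratio is an assumed rule of thumb commonly derived from the Chinchilla fit, not a figure quoted in this summary:

```python
# Hedged sketch of the scaling recommendation: training tokens should grow
# in proportion to model parameters. TOKENS_PER_PARAM = 20 is an assumed
# rule of thumb often attributed to the Chinchilla fit.

TOKENS_PER_PARAM = 20

def compute_optimal_tokens(n_params):
    """Training tokens implied by the ~20 tokens/parameter heuristic."""
    return TOKENS_PER_PARAM * n_params

def training_flops(n_params, n_tokens):
    """Standard training-cost approximation: C ~= 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

# Doubling the model size doubles the recommended token count; for a
# 70B-parameter model the heuristic gives about 1.4 trillion tokens,
# which matches Chinchilla's actual training budget.
tokens_70b = compute_optimal_tokens(70e9)
```

Under this heuristic, scaling parameters without scaling data (the pre-Chinchilla norm) leaves compute on the table that would be better spent on more tokens.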