
Meta's new Megabyte system solves one of GPT's biggest hurdles

By Martha Grizzard
Jun 5, 2023

Meta AI recently published preprint research demonstrating a new "Megabyte" framework for building generative pre-trained transformer (GPT) systems. Described as "promising" by OpenAI's Andrej Karpathy, a former director of artificial intelligence at Tesla, the new architecture is designed to process large volumes of data, such as images, novels and video files, without a process known as tokenization.

Tokenization is a lossy process comparable to file compression. To handle large amounts of data, GPT models convert bytes into tokens. The tokens are then processed by the transformer and used to generate output tokens, which are then decoded.

Tokenization allows an AI system to process larger strings of data as numbers. For example, if the sentence "My favorite color is red" is processed by OpenAI's ChatGPT, it is converted into the token string "3666, 4004, 3124, 318, 2266, 13" for processing.

Unfortunately, even with tokenization, there are hard limits on the amount of data that current state-of-the-art systems can process. For GPT-3.5, the limit is slightly over 4,000 tokens, or about 3,000 words, while the maximum for GPT-4 is about 32,000 tokens, or about 24,000 words.
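
The encode, process, decode round trip described above can be illustrated with a toy tokenizer. This is only a sketch: the six IDs are the ones quoted in the article, but the pairing of each ID with a piece of text (including a trailing period for the sixth ID) is an assumption made for illustration, and the greedy longest-match lookup stands in for the real subword algorithm, which it does not reproduce. The limits are the approximate figures given in the article.

```python
# Toy vocabulary. The IDs come from the article; which text piece maps to
# which ID is an assumption for illustration only.
TOY_VOCAB = {
    "My": 3666,
    " favorite": 4004,
    " color": 3124,
    " is": 318,
    " red": 2266,
    ".": 13,
}

# Approximate context limits quoted in the article, in tokens.
GPT35_TOKEN_LIMIT = 4_000
GPT4_TOKEN_LIMIT = 32_000


def toy_encode(text, vocab=TOY_VOCAB):
    """Greedily map text to token IDs using the longest matching piece."""
    ids = []
    pos = 0
    while pos < len(text):
        match = max(
            (piece for piece in vocab if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token covers {text[pos:]!r}")
        ids.append(vocab[match])
        pos += len(match)
    return ids


def toy_decode(ids, vocab=TOY_VOCAB):
    """Map token IDs back to text by inverting the vocabulary."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


def fits_context(ids, limit):
    """Return True if the token sequence is within the given limit."""
    return len(ids) <= limit


ids = toy_encode("My favorite color is red.")
print(ids)                                   # [3666, 4004, 3124, 318, 2266, 13]
print(toy_decode(ids))                       # My favorite color is red.
print(fits_context(ids, GPT35_TOKEN_LIMIT))  # True
```

The model only ever sees the integer IDs, so anything the vocabulary cannot represent is lost or rejected, and the input length is counted in tokens rather than characters.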

Meta's new Megabyte system forgoes tokenization in favor of a novel multi-layer predictive architecture capable of end-to-end modeling over 1 million bytes of data.
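
To make the contrast with tokenization concrete, the sketch below shows what byte-level input looks like: the text is handed over as its raw UTF-8 bytes, each an integer from 0 to 255, with no vocabulary lookup. It illustrates only the input representation, not Megabyte's architecture. The function names are hypothetical, and the budget constant simply restates the article's "1 million bytes" figure.

```python
BYTE_BUDGET = 1_000_000  # the article's "over 1 million bytes", as a round figure


def to_byte_ids(text):
    """Represent text as a list of integers in the range 0..255."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids):
    """Recover the original text from its byte values."""
    return bytes(ids).decode("utf-8")


def fits_byte_budget(text, budget=BYTE_BUDGET):
    """Return True if the UTF-8 encoding of text is within the budget."""
    return len(text.encode("utf-8")) <= budget


sentence = "My favorite color is red"
ids = to_byte_ids(sentence)
print(len(ids))                    # 24
print(ids[:2])                     # [77, 121]
print(from_byte_ids(ids))          # My favorite color is red
print(fits_byte_budget(sentence))  # True
```

Unlike the token round trip, this one is lossless by construction, at the cost of a much longer sequence: 24 byte values for a sentence that the article says ChatGPT represents with six tokens.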

Most standard English-language encoding systems use 8-bit encoding. Under this scheme, each character occupies one byte of data. Thus, an AI system that can process 1 million bytes of data without tokenization could process a text document containing 750,000 words, a 3,025% increase over GPT-4. By comparison, GPT-4 can currently handle about 10 long-form news articles in a single prompt, while Megabyte would be able to parse the entirety of Leo Tolstoy's War and Peace, as well as two other novels of moderate length.

Meta's Megabyte model also performs well on ImageNet tests and on benchmarks related to processing audio files, equaling or exceeding existing byte-based transformer models such as DeepMind's Perceiver AR on both:

"Megabyte matches the state-of-the-art performance of PerceiverAR while using half the computation."

The implications of this research could be far-reaching. Tokenization is considered an obstacle in the field because of its hard data limits and the energy and time required to train systems. Without tokenization, it should be possible to train AI models with stronger foundational support for languages other than English, especially those that cannot easily be encoded in standard 8-bit characters.

This could lead to further democratization of these technologies and enable everything from cryptocurrency trading bots to decentralized autonomous organization technologies to be built in native-language code around the world.

It could also increase the capacity of models like ChatGPT to work with image, video and audio files, generating multimedia clips with roughly the same time and energy consumption as text.
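
The size claims earlier in the article can be checked with a few lines of arithmetic. The sketch below reproduces the 3,025% figure from the article's own word counts and shows why text outside the 8-bit range uses up a byte window faster. The word counts are taken from the article as given. How many words actually fit in 1 million bytes depends on the average number of bytes per word, which the article does not state. The helper names are hypothetical, and UTF-8 is assumed as the encoding.

```python
BYTE_BUDGET = 1_000_000   # Megabyte window, per the article
ARTICLE_WORDS = 750_000   # the article's word estimate for that window
GPT4_WORDS = 24_000       # the article's word estimate for GPT-4


def percent_increase(new, old):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


def utf8_bytes_per_char(sample):
    """Average number of UTF-8 bytes per character in the sample."""
    return len(sample.encode("utf-8")) / len(sample)


def chars_that_fit(sample, budget=BYTE_BUDGET):
    """Estimate how many characters of text like sample fit in the budget."""
    return int(budget / utf8_bytes_per_char(sample))


print(percent_increase(ARTICLE_WORDS, GPT4_WORDS))  # 3025.0
print(utf8_bytes_per_char("red"))                   # 1.0
print(utf8_bytes_per_char("红色"))                   # 3.0
print(chars_that_fit("red"))                        # 1000000
print(chars_that_fit("红色"))                        # 333333
```

A byte-level model needs no language-specific vocabulary, but under UTF-8 a script whose characters take three bytes each fits roughly a third as many characters into the same window.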


@2012-2026 BITKAN.com