MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks
Issue Date
2023-01-01
Metadata
Publisher
Springer
Journal
Multimedia Tools and Applications
DOI
10.1007/s11042-023-15949-y
Additional Links
https://link.springer.com/article/10.1007/s11042-023-15949-y
Abstract
The Moving Picture Experts Group-1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172-3. Currently, there is no general framework for emulating the MPEG-1 psychoacoustic model or any other psychoacoustic model, even though such a model is a core component of many perceptual codecs. This work presents a convolutional neural network that emulates psychoacoustic model 1 from the MPEG-1 standard, termed "MCNN-PM" (Multiscale Convolutional Neural Network – Psychoacoustic Model). The network is then integrated into the MPEG-1 Layer I codec in place of the standard model. Measured by the objective difference grade (ODG), the MCNN-PM MPEG-1 Layer I codec outperforms the original MPEG-1 Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. These results show that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs.
Type
info:eu-repo/semantics/article
Rights
info:eu-repo/semantics/embargoedAccess
Language
eng
ISSN
13807501
EISSN
15737721