
AlexNet

Developer(s): Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton
Initial release: Jun 28, 2011
Repository: code.google.com/archive/p/cuda-convnet/
Written in: CUDA, C++
Type: Convolutional neural network
License: New BSD License
AlexNet architecture and a possible modification. The top shows half of the original AlexNet (which is split into two halves, one per GPU). The bottom shows the same architecture with the last "projection" layer replaced by one that projects to fewer outputs. If one freezes the rest of the model and fine-tunes only the last layer, one can obtain another vision model at a cost much lower than that of training one from scratch.
AlexNet block diagram

AlexNet is a convolutional neural network (CNN) architecture designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor at the University of Toronto. It had 60 million parameters and 650,000 neurons.[1]

The original paper's primary result was that the depth of the model was essential for its high performance. Training a network of that depth was computationally expensive, but was made feasible by the use of graphics processing units (GPUs).[1]

The three formed the team SuperVision and submitted AlexNet to the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012.[2] The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up.
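
The top-5 error counts an image as misclassified only if the correct label is absent from the model's five highest-scoring classes. The following is a minimal sketch of that metric in plain Python; the function name and the toy scores are illustrative assumptions and are not taken from the challenge's evaluation code.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is not among the 5 highest-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Indices of the five classes with the largest scores.
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(true_labels)


# Toy example: 2 images, 6 classes, made-up scores.
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.25, 0.10],  # class 2 has the lowest score
    [0.40, 0.10, 0.10, 0.10, 0.20, 0.10],  # class 0 has the highest score
]
print(top5_error(scores, [2, 0]))  # first image is a miss, second is a hit
```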

The architecture influenced a large body of subsequent work in deep learning, especially in applying neural networks to computer vision.

Architecture

AlexNet contains eight layers: the first five are convolutional layers, some of them followed by max-pooling layers, and the last three are fully connected layers. The network, except the last layer, is split into two copies, each run on one GPU.[1] The entire structure can be written as

(CNN → RN → MP)² → (CNN³ → MP) → (FC → DO)² → Linear → softmax

where

  • CNN = convolutional layer (with ReLU activation)
  • RN = local response normalization
  • MP = maxpooling
  • FC = fully connected layer (with ReLU activation)
  • Linear = fully connected layer (without activation)
  • DO = dropout
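
As a quick check of the notation, the expression above can be expanded into a flat layer list. This is only a bookkeeping sketch in plain Python: the variable names are illustrative, and it counts layers without building a trainable network.

```python
# Expand (CNN -> RN -> MP)^2 -> (CNN^3 -> MP) -> (FC -> DO)^2 -> Linear -> softmax
layers = (
    ["CNN", "RN", "MP"] * 2
    + ["CNN"] * 3 + ["MP"]
    + ["FC", "DO"] * 2
    + ["Linear", "softmax"]
)

# Layer types that carry learned weights.
WEIGHTED = {"CNN", "FC", "Linear"}

num_conv = layers.count("CNN")
num_weighted = sum(1 for name in layers if name in WEIGHTED)

print(" -> ".join(layers))
print(num_conv, num_weighted)
```

The counts come out as five convolutional layers and eight weighted layers in total, matching the "eight layers" description.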

It used the non-saturating ReLU activation function, which trained better than tanh and sigmoid.[1]
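
"Non-saturating" can be illustrated by comparing derivatives: for large positive inputs the gradient of ReLU stays at 1, while the gradients of tanh and sigmoid shrink toward 0. The sketch below uses plain Python and hypothetical helper names. It illustrates the property only and does not reproduce the paper's experiments.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of max(0, x); the value at exactly 0 is set to 0 by convention here.
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


for x in (0.5, 2.0, 10.0):
    print(x, relu_grad(x), tanh_grad(x), sigmoid_grad(x))
```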

Because the network did not fit onto a single Nvidia GTX 580 3GB GPU, it was split into two halves, one on each GPU.[1]: Section 3.2

Training

The ImageNet training set had 1.2 million images. The model was trained for 90 epochs, which took five to six days on two Nvidia GTX 580 3GB GPUs.[1] Each GPU has a theoretical performance of 1.581 TFLOPS in float32 and had a release price of 500 USD.[3]

It was trained using stochastic gradient descent with momentum, with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. The learning rate started at 0.01 and was manually decreased 10-fold whenever the validation error appeared to stop decreasing. It was reduced three times during training, ending at 0.00001.[1]
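
The update can be sketched for a single scalar weight as follows. The form of the rule (weight decay scaled by the learning rate and folded into the velocity) and the initial learning rate of 0.01 follow the original paper as best recalled and should be treated as assumptions. The function names are hypothetical, and the real reductions were triggered manually rather than on a schedule.

```python
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005


def sgd_momentum_step(w, v, grad, lr):
    """One update of weight w with velocity v, given the batch-averaged gradient."""
    v_new = MOMENTUM * v - WEIGHT_DECAY * lr * w - lr * grad
    return w + v_new, v_new


def learning_rates(initial_lr=0.01, num_reductions=3, factor=10.0):
    """Learning-rate values visited when the rate is cut `num_reductions` times."""
    return [initial_lr / factor ** i for i in range(num_reductions + 1)]


w, v = sgd_momentum_step(w=1.0, v=0.0, grad=0.5, lr=0.01)
print(w, v)
print(learning_rates())
```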

It used two forms of data augmentation, both computed on the fly on the CPU, thus "computationally free":

  • Extracting random 224 × 224 patches (and their horizontal reflections) from the original 256 × 256 images. This increases the size of the training set 2048-fold.
  • Randomly shifting the RGB value of each image along the three principal directions of the RGB values of its pixels.
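
The 2048-fold figure for the first augmentation corresponds to 32 crop offsets per axis times two reflections (32 × 32 × 2). The sketch below reproduces that arithmetic and shows a random crop with an optional flip on an image stored as nested lists. The helper name and the list-based image representation are illustrative assumptions, and the PCA-based colour shift is not shown.

```python
import random

IMAGE_SIZE = 256
PATCH_SIZE = 224

# Offsets counted per axis to reproduce the 2048 figure.
offsets_per_axis = IMAGE_SIZE - PATCH_SIZE   # 32
num_variants = offsets_per_axis ** 2 * 2     # translations x reflections


def random_patch(image, rng=random):
    """Return a random PATCH_SIZE x PATCH_SIZE crop of `image`, flipped half the time.

    `image` is a list of IMAGE_SIZE rows, each a list of IMAGE_SIZE pixels.
    """
    top = rng.randrange(offsets_per_axis)
    left = rng.randrange(offsets_per_axis)
    patch = [row[left:left + PATCH_SIZE] for row in image[top:top + PATCH_SIZE]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch


print(num_variants)
```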

It used local response normalization, and dropout regularization with drop probability 0.5.
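
Both operations can be sketched on a plain list of activations. The normalization constants (k = 2, n = 5, alpha = 1e-4, beta = 0.75) and the test-time scaling of outputs by 0.5 are recalled from the original paper and are assumptions here. The function names are hypothetical, and the list holds one activation per channel at a single spatial position.

```python
import random


def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Divide each channel's activation by a power of the summed squares of its n neighbours."""
    half = n // 2
    out = []
    for i, value in enumerate(a):
        lo = max(0, i - half)
        hi = min(len(a) - 1, i + half)
        energy = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(value / (k + alpha * energy) ** beta)
    return out


def dropout_train(activations, p=0.5, rng=random):
    """Zero each activation independently with probability p (training time)."""
    return [0.0 if rng.random() < p else a for a in activations]


def dropout_test(activations, p=0.5):
    """Keep every activation but scale it by the keep probability (test time)."""
    return [a * (1.0 - p) for a in activations]


print(local_response_norm([1.0, 2.0, 3.0]))
print(dropout_test([2.0, 4.0]))
```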

All weights were initialized from a Gaussian distribution with zero mean and a standard deviation of 0.01. Biases in convolutional layers 2, 4, and 5, and in all fully connected layers, were initialized to the constant 1 to avoid the dying ReLU problem.
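
A minimal sketch of this initialization scheme in plain Python follows. The layer names (conv1 to conv5, fc6 to fc8) are hypothetical labels, and the zero bias for the remaining layers is an assumption based on the original paper rather than on the text above.

```python
import random

# Hypothetical layer names; biases of these layers start at 1.
ONE_BIAS_LAYERS = {"conv2", "conv4", "conv5", "fc6", "fc7", "fc8"}


def init_weights(count, rng, std=0.01):
    """Draw `count` weights from a zero-mean Gaussian with the given standard deviation."""
    return [rng.gauss(0.0, std) for _ in range(count)]


def init_bias(layer_name):
    """Constant 1 for the listed layers, 0 otherwise (the latter is an assumption)."""
    return 1.0 if layer_name in ONE_BIAS_LAYERS else 0.0


rng = random.Random(0)
weights = init_weights(10_000, rng)
mean = sum(weights) / len(weights)
std = (sum((w - mean) ** 2 for w in weights) / len(weights)) ** 0.5
print(mean, std)
print({name: init_bias(name) for name in ("conv1", "conv2", "conv3", "fc6")})
```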

History

Previous work

Comparison of the LeNet and AlexNet convolution, pooling, and dense layers
(The AlexNet input size should be 227×227×3 rather than 224×224×3 for the arithmetic to work out. The original paper stated different numbers, but Andrej Karpathy, the former head of computer vision at Tesla, said it should be 227×227×3 and noted that Krizhevsky did not explain why he wrote 224×224×3. The first convolution, 11×11 with stride 4, then yields 55×55×96 instead of 54×54×96. It is calculated as: [(input width 227 - kernel width 11) / stride 4] + 1 = [(227 - 11) / 4] + 1 = 55. Since the output is square, its area is 55×55.)
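
The output-size formula in the note above can be checked mechanically. In the sketch below, only the first convolution (11×11, stride 4) comes from the text. The pooling windows, later kernel sizes, and padding values in the trace are commonly cited AlexNet settings and should be read as assumptions, as should the helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling window; raises if the window does not tile evenly."""
    span = size + 2 * padding - kernel
    if span % stride != 0:
        raise ValueError(f"input {size} does not tile evenly (kernel={kernel}, stride={stride})")
    return span // stride + 1


print(conv_out(227, 11, stride=4))  # first convolution from the note above

# Assumed (kernel, stride, padding) for the stages after the first convolution.
stages = [
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

size = conv_out(227, 11, stride=4)
for name, kernel, stride, padding in stages:
    size = conv_out(size, kernel, stride, padding)
    print(name, size)
```

With a 224-wide input, the same call raises `ValueError`, because (224 - 11) is not divisible by 4.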

AlexNet is a convolutional neural network. In 1980, Kunihiko Fukushima proposed an early CNN named the neocognitron,[4][5] which was trained by an unsupervised learning algorithm. LeNet-5 (Yann LeCun et al., 1989)[6][7] was trained by supervised learning with the backpropagation algorithm and had an architecture that is essentially the same as that of AlexNet on a smaller scale. J. Weng et al. (1993) added max-pooling.[8][9]

During the 2000s, as GPU hardware improved, some researchers adapted GPUs for general-purpose computing, including neural network training. K. Chellapilla et al. (2006) trained a CNN on a GPU that ran 4 times faster than an equivalent CPU implementation.[10] A deep CNN by Dan Cireșan et al. (2011) at IDSIA was 60 times faster than an equivalent CPU implementation.[11] Between May 15, 2011, and September 10, 2012, their CNN won four image competitions and achieved state-of-the-art results on multiple image databases.[12][13][14] According to the AlexNet paper,[1] Cireșan's earlier net is "somewhat similar." Both were written in CUDA to run on GPUs.

Computer vision

During the 1990–2010 period, neural networks were not better than other machine learning methods such as kernel regression, support vector machines, AdaBoost, and structured estimation,[15] among others. For computer vision in particular, much progress came from manual feature engineering, such as SIFT features, SURF features, HoG features, and bags of visual words. The view that features could be learned directly from data was a minority position in computer vision, and it became dominant after AlexNet.[16]

In 2011, Geoffrey Hinton started asking colleagues, "What do I have to do to convince you that neural networks are the future?" Jitendra Malik, a sceptic of neural networks, recommended the PASCAL Visual Object Classes challenge. Hinton said its dataset was too small, so Malik recommended the ImageNet challenge instead.[17]

While AlexNet and LeNet share essentially the same design and algorithm, AlexNet is much larger than LeNet and was trained on a much larger dataset on much faster hardware. Over the period of 20 years, both data and compute became cheaply available.[16]

Subsequent work

AlexNet has been highly influential, leading to much subsequent work on using CNNs for computer vision and on using GPUs to accelerate deep learning. As of mid-2024, the AlexNet paper has been cited over 157,000 times according to Google Scholar.[18]

At the time of publication, there was no framework available for GPU-based neural network training and inference. The codebase for AlexNet was released under a BSD license and was commonly used in neural network research for several years afterwards.[19][16]

In one direction, subsequent work aimed to train increasingly deep CNNs that achieved increasingly high performance on ImageNet. This line of research includes GoogLeNet (2014), VGGNet (2014), Highway network (2015), and ResNet (2015). Another direction aimed to reproduce the performance of AlexNet at a lower cost. This line of research includes SqueezeNet (2016), MobileNet (2017), and EfficientNet (2019).

References

  1. ^ a b c d e f g Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24). "ImageNet classification with deep convolutional neural networks" (PDF). Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782. S2CID 195908774.
  2. ^ "ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC2012)". image-net.org.
  3. ^ "NVIDIA GeForce GTX 580 Specs". TechPowerUp. 2024-11-12. Retrieved 2024-11-12.
  4. ^ Fukushima, K. (2007). "Neocognitron". Scholarpedia. 2 (1): 1717. Bibcode:2007SchpJ...2.1717F. doi:10.4249/scholarpedia.1717.
  5. ^ Fukushima, Kunihiko (1980). "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position" (PDF). Biological Cybernetics. 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364. S2CID 206775608. Retrieved 16 November 2013.
  6. ^ LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition" (PDF). Neural Computation. 1 (4). MIT Press - Journals: 541–551. doi:10.1162/neco.1989.1.4.541. ISSN 0899-7667. OCLC 364746139.
  7. ^ LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition" (PDF). Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791. S2CID 14542261. Retrieved October 7, 2016.
  8. ^ Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". Proc. 4th International Conf. Computer Vision: 121–128.
  9. ^ Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 1527–54. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.
  10. ^ Kumar Chellapilla; Sidd Puri; Patrice Simard (2006). "High Performance Convolutional Neural Networks for Document Processing". In Lorette, Guy (ed.). Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
  11. ^ Cireșan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber (2011). "Flexible, High Performance Convolutional Neural Networks for Image Classification" (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence-Volume Volume Two. 2: 1237–1242. Retrieved 17 November 2013.
  12. ^ "IJCNN 2011 Competition result table". OFFICIAL IJCNN2011 COMPETITION. 2010. Retrieved 2019-01-14.
  13. ^ Schmidhuber, Jürgen (17 March 2017). "History of computer vision contests won by deep CNNs on GPU". Retrieved 14 January 2019.
  14. ^ Cireșan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). "Multi-column deep neural networks for image classification". 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: Institute of Electrical and Electronics Engineers (IEEE). pp. 3642–3649. arXiv:1202.2745. CiteSeerX 10.1.1.300.3283. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1226-4. OCLC 812295155. S2CID 2161592.
  15. ^ Taskar, Ben; Guestrin, Carlos; Koller, Daphne (2003). "Max-Margin Markov Networks". Advances in Neural Information Processing Systems. 16. MIT Press.
  16. ^ a b c Zhang, Aston; Lipton, Zachary; Li, Mu; Smola, Alexander J. (2024). "8.1. Deep Convolutional Neural Networks (AlexNet)". Dive into deep learning. Cambridge New York Port Melbourne New Delhi Singapore: Cambridge University Press. ISBN 978-1-009-38943-3.
  17. ^ Li, Fei Fei (2023). The worlds I see: curiosity, exploration, and discovery at the dawn of AI (First ed.). New York: Moment of Lift Books ; Flatiron Books. ISBN 978-1-250-89793-0.
  18. ^ AlexNet paper on Google Scholar
  19. ^ Krizhevsky, Alex (July 18, 2014). "cuda-convnet: High-performance C++/CUDA implementation of convolutional neural networks". Google Code Archive. Retrieved 2024-10-20.
