Büyük veri araçlarını kullanarak duygu analizi gerçekleştirimi

Özdeş, Merve

Please use this identifier to cite or link to this item: https://hdl.handle.net/11499/2076

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Sezai Tokat	-
dc.contributor.author	Özdeş, Merve	-
dc.date	2017-07-06	en_US
dc.date.accessioned	2017-09-28T12:10:51Z
dc.date.available	2017-09-28T12:10:51Z
dc.date.issued	2017-04	-
dc.identifier.uri	https://hdl.handle.net/11499/2076	-
dc.description.abstract	İnternetin yaygın olarak kullanılmasıyla birlikte veri miktarında da inanılmaz büyüklükte artış meydana gelmiştir. Veri miktarındaki bu artış, bu verilerin yönetimini zorlaştırmakla birlikte, bu veriler arasından anlamlı bilgiler elde etmeyi de gerekli kılmıştır. Geleneksel veri tabanlarıyla verilerin saklanması, işlenmesi ve analiz edilmesi gibi işlemlerin yapılamaması büyük veri kavramını ortaya çıkarmıştır. Büyük veri kavramı verinin oluşturulması, saklanması, işlenmesi ve analiz edilmesi gibi işlemlerin tümüne verilen addır. Basit bir ifadeyle, verinin anlamlı ve işlenebilir hale dönüştürülmüş biçimidir. İnternet ortamında paylaşılan video, blog, resim, web sunucularının log dosyaları, GSM operatörlerinin arama kayıtları ve buna benzer birçok kaynak büyük veri araçlarıyla işlenerek anlamlı hale dönüştürülmektedir. Üretim, pazarlama, telekomünikasyon, hükümet kaynakları, sağlık ve eğitim gibi birçok alanda büyük veri inanılmaz kolaylık sağlamaktadır. Büyük veri analizi için kullanılan pek çok araç mevcuttur. Bu tezde, büyük veri araçlarından olan Spark kullanılarak elde edilen veriler üzerinde duygu analizi işlemi gerçekleştirilmiştir. Duygu analizi, sözlüğe dayalı ve makine öğrenmesine dayalı olmak üzere iki farklı şekilde gerçekleştirilebilmektedir. Bu tezde, makine öğrenmesi yöntemlerinden biri olan denetimli öğrenme metoduyla duygu analizi işlemi gerçekleştirilmiştir. Toplamda 57.650 adet İngilizce şarkı sözü üzerinde veri temizleme işlemleri gerçekleştirildikten sonra, pozitif ya da negatif olacak şekilde etiketleme işlemi gerçekleştirilmiştir. Etiketlenen veri pozitifse 1, negatifse 0 değeri ile skorlanarak duygu analizi işleminde kullanılacak algoritmalara uygun bir formata dönüştürülmüştür. Dönüştürülen bu veri, denetimli öğrenme algoritmalarından Naive Bayes, Logistic Regresyon ve Decision Tree olmak üzere toplamda üç farklı algoritmaya tabi tutularak, algoritmanın çalıştırılması sonucu elde edilen başarım oranları karşılaştırılmıştır. Veri, RStudio üzerinde Naive Bayes algoritmasıyla tekrar çalıştırılmış ve algoritmanın işlemesi için geçen süresi Spark üzerinde geçen süreyle karşılaştırılmıştır. Spark’ın bu karşılaştırma sonucunda çok daha hızlı olduğu görülmüştür. Son olarak da çalışmanın geliştirilmeye açık yönleri belirtilmiş ve gelecek çalışmalar için önerilerde bulunulmuştur.	en_US
dc.description.abstract	With the widespread use of the Internet, the amount of data has grown enormously. This growth has made the data harder to manage and has also made it necessary to extract meaningful information from it. Because traditional databases cannot store, process, and analyze data at this scale, the concept of big data emerged. Big data is the name given to the full set of operations of creating, storing, processing, and analyzing data. In simple terms, it is data that has been transformed into a meaningful and processable form. Videos, blogs, and images shared on the Internet, web server log files, call records of GSM operators, and many similar sources are processed with big data tools and turned into meaningful information. Big data provides great convenience in many fields, such as manufacturing, marketing, telecommunications, government resources, health, and education. Many tools are available for big data analysis. In this thesis, sentiment analysis was performed on the collected data using Spark, one of these big data tools. Sentiment analysis can be carried out in two ways: lexicon-based and machine-learning-based. In this thesis, sentiment analysis was performed with supervised learning, one of the machine learning methods. After data cleaning was applied to a total of 57,650 English song lyrics, each lyric was labeled as positive or negative. The labeled data were scored 1 if positive and 0 if negative, which converted them into a format suitable for the algorithms used in the sentiment analysis. The transformed data were then run through three supervised learning algorithms, namely Naive Bayes, Logistic Regression, and Decision Tree, and the resulting performance rates were compared. The data were also run again in RStudio with the Naive Bayes algorithm, and the running time there was compared with the running time on Spark. In this comparison, Spark was found to be much faster. Finally, the aspects of the study that are open to improvement were identified and suggestions for future work were made.	en_US
dc.language.iso	tr	en_US
dc.publisher	Pamukkale Üniversitesi Fen Bilimleri Enstitüsü	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Duygu Analizi	en_US
dc.subject	Büyük Veri	en_US
dc.subject	Spark	en_US
dc.subject	Öznitelik Seçme	en_US
dc.subject	Naive Bayes	en_US
dc.subject	Logistic Regression	en_US
dc.subject	Decision Tree	en_US
dc.subject	Sentiment Analysis	en_US
dc.subject	Big Data	en_US
dc.subject	Feature Extraction	en_US
dc.title	Büyük veri araçlarını kullanarak duygu analizi gerçekleştirimi	en_US
dc.title.alternative	Sentiment analysis using big data tools	en_US
dc.type	Master Thesis	en_US
dc.authorid	111118	-
dc.authorid	11412	-
dc.relation.publicationcategory	Tez	en_US
dc.identifier.yoktezid	464992	en_US
dc.owner	Pamukkale University	-
item.openairetype	Master Thesis	-
item.cerifentitytype	Publications	-
item.fulltext	With Fulltext	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.grantfulltext	open	-
item.languageiso639-1	tr	-
crisitem.author.dept	10.10. Computer Engineering	-
Appears in Collections:	Tez Koleksiyonu
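
Illustrative note: the abstract describes scoring each lyric 1 (positive) or 0 (negative) and passing the labeled text to supervised classifiers such as Naive Bayes. The sketch below shows that idea in plain Python on four made-up lines of text. It is not the thesis code, which ran on Spark. The toy data, the whitespace tokenizer, the add-one smoothing, and the names tokenize, train_naive_bayes, and predict are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only purely alphabetic tokens (assumed cleaning step)."""
    return [word for word in text.lower().split() if word.isalpha()]


def train_naive_bayes(samples):
    """Count documents and words per label; samples are (text, label) pairs with label 1 or 0."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under multinomial Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 1 = positive, 0 = negative, as in the abstract's scoring scheme.
TOY_LYRICS = [
    ("sunshine love happy dance", 1),
    ("joy smile love forever", 1),
    ("tears pain lonely night", 0),
    ("broken heart cry alone", 0),
]

model = train_naive_bayes(TOY_LYRICS)
print(predict(model, "love and joy"))
print(predict(model, "lonely tears"))
```

On a real corpus the same labeled pairs would normally be split into training and test sets so that the performance rates of different classifiers can be compared, which is the kind of comparison the abstract reports.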