Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Paid traffic or "Paid Search" is not showing in my Google Analytics
Hi, I have two campaigns running in Google Adwords or Google Ads now and I saw in Google Ads account that I had 5 clicks today (09/18/2018) but when I try to search for this clicks in my Google Analytics in ACQUISITION > All Traffic > Channels I don't find nothing about "Paid Search" or something like that. Bellow is a picture of my Google Analytics account to prove it. The accounts are linked and I can find the 2 campaigns in the Analytics. How can I interpret this picture? Where the paid traffic is showing? or not showing there? Thanks Leandro uvAtrsg
Reporting & Analytics | | lmoraes0 -
Conflicting average position data from Google Search Console?
I'm looking at Google Search Console data in Google Analytics, specifically Average Position as given in the Landing Page report, and the same metric broken out by mobile and desktop in the Devices report. In the Landing Page report, I see an aggregated average position that's much higher/worse than an actual average of what is reported for mobile, desktop and tablet traffic under the Device reporting. For example: Mobile: 5 Desktop: 5 Tablet: 5 So the average still should be roughly 5, right? Why would the Landing Page then show an aggregate Average Position of 8? I wouldn't expect to see a precisely same average given that different device types have different proportions that could render differently when the buckets are combined, but this is a huge swing. In fact, the aggregate Average Position as given in the top level Devices report is closer to 5 than to the 8 shown in the Landing Pages report. (These aren't actual numbers, but are illustrative of what I'm seeing, by the way.) Unless I'm missing some vital difference in the way that Average Position is reporting for the Landing Page report versus the Device reports, it doesn't seem like this should be possible. What am I missing?
Reporting & Analytics | | BradsDeals0 -
Query on google analytic benchmarking report
Hi All, First I select My Industry Vertical - ABCD then I select Region - US ( all region) then size by daily session - 10000-99999 finally reports appears "Blank" but when I decrease daily session from 10000-99999 to 1000-4999 or less then report is perfect. So what does it mean? My Avg daily session is 70k to 80k. So how to analysis benchmarking in this case? Thanks!
Reporting & Analytics | | pragnesh96390 -
What is the "UPDATE" indicate in the Google Search Console Query Reports?
We recently noticed an update note in the Google Search Console that happened on April 27th. Does this denote an algorithm update? Any feedback or article would really be helpful. Thanks! gfQ8FG9.jpg
Reporting & Analytics | | RosemaryB0 -
Google Analytics - Organic Search Traffic & Queries -What caused the huge difference?
Our website traffic dropped a little bit during the last month, but it's getting better now, almost the same with previous period. But our conversion rate dropped by 50% for the last three weeks. What could cause this huge drop in conversion rate? In Google Analytics, I compared the Organic Search Traffic with previous period, the result is similar. But the Search Engine Optimization ->Queries shows that the clicks for last month is almost zero. What could be the cause of this huge differnce? e9sJNwD.png k4M8Fa5.png
Reporting & Analytics | | joony0 -
Best way to measure local search keyword rank
I have several clients that have a regional or local presence and want to track their rankings for various keywords, but only in the areas they are located in. What are the best ways to track keyword rank in specific locations or regions of the US?
Reporting & Analytics | | TheURLdr0 -
No Query parameter for site search
Hi Guys, I have enable site search for analytics a number of times. But this time it's the first time I came across a search with no query parameters. example.com/search/item/searchterm What is the most simple way to approach this? thank you!
Reporting & Analytics | | GetApp0 -
Organic search on google
Hi there, pl take a look at this link, there is a section which says shared results and has a star against two agencies and once article. can pl someone let me know what those starts are and how to get them? Thank you 🙂 so sorry - the link is http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=digital+mareketing+agency+los+angeles Vijay
Reporting & Analytics | | vijayvasu0