Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Organic search traffic down 60% since 8/1/18\. What now?
I have a small health & fitness blog, and my Google search traffic suddenly dropped 60% around August 1 (I've attempted to link an image). My rank has dropped for 86 keywords. I have no manual penalty, so I'm guessing I was affected by the algorithm change. My technical skills are VERY limited. I've tried to find answers on my own, but every time I try to "fix" something, I only seem to make it worse. I do seem to have some structural/performance issues with my site (e.g., lots of 404 errors from uninstalled plugins and unwanted permalinks). I asked my server for assistance (I used managed Wordpress hosting), and they said they couldn't help. As you can imagine, this is quite devastating, and I have no clue where to go from here. I don't know if I'm allowed to link to my site here, but it's mommyrunsit dot com. Any assistance is greatly appreciated. Thanks. Sharon 0lPu4wY
Reporting & Analytics | | RoniFaida1 -
Google Search Console
To the Moz Community, Should we be considering the information that Google Search Console is telling us? It is showing a dramatic drop in our SEO and our pages are not being indexed, however it is showing differently in our Moz Analytics section. Any clarification will be greatly appreciated. Many thanks Dawn
Reporting & Analytics | | DawnQ0 -
Lots of missing main keywords from Search Console
Hello Moz Community, A while back I noticed in Google Search Console that the volume of impressions and clicks dropped off a cliff. I also noticed that this was for primary head keywords that generally had a decent volume of impression share. The natural initial reaction was 'oh, I must have lost those rankings' but upon checking I realized these rankings are still in existence. Admittedly most are page 2 or 3, but still within the confines of being captured in GSC. Is there a logical reason why these keywords have just gone from search console? An example keyword would be something like 'online football management game' and the website is https://www.worldelitesoccer.com There is 0 queries in my search console data that includes the word 'football'. Thanks, Ben
Reporting & Analytics | | melaniedsg0 -
Are these Search Console crawl errors a major concern to new client site?
We recently (4/1) went live with a new site for a client of ours. The client site was originally Point2 before they made the switch to a template site with Real Estate Webmasters. Now when I look into the Search Console I am getting the following Crawl Errors: 111 Server Errors (photos) 104 Soft 404s (blogs, archives, tags) 6,229 Not Found (listings) I have a few questions. The server errors I know not a lot about so I generally ignore. My main concerns are the 404s and not found. The 404s are mostly tags and blog archives which I wonder if I should leave alone or do 301s for each to /blog. For not found, these are all the previous listings from the IDX. My assumption is these will naturally fall away after some time, as the new ones have already indexed. But I wonder what I should be doing here and which will be affecting me. When we launched the new site there was a large spike in clicks ( 250% increase) which has now tapered off to an average of ~85 clicks versus ~160 at time of launch. Not sure if the Crawl Errors have any effect, I'm guessing not so much right now. I'd appreciate your insights Mozzers!
Reporting & Analytics | | localwork0 -
Is there a way to find out who the admin of a Google Analytics account is?
We have a client who has been searching high and low to gain access to the GA account that is tied to their website, but previous agencies keep stating they don't have access. Is there a way to find out who the admins are on a GA account with the tracking number so we can reach out to that person and ask them to admin the client?
Reporting & Analytics | | marketingmediamanagement0 -
Massive decrease in search volumes from Keyword planner
The search volumes that I'm getting out of Keyword planner are approximately 10% (some as low as 1%) of the search volumes for the same keywords back in April. Is there something I missed in this time? I'm using the same account, same language and location settings. Anyone have an explanation for this? To be honest these new search volumes sound a lot more plausible to me - I'm just intrigued by what might have happened!
Reporting & Analytics | | a-champ0 -
Search Traffic Drops Before It Improves?
I'm working on a site with tons of great, useful content....the owners of the site implemented a new site layout and design (complete overhaul) and they were lacking basics such as meta descriptions, 301 redirects, and, shockingly, they had the same Title tag for every single page on a site with thousands of unique how-to articles. Unsurprisingly their traffic dropped by about 300%. They generate most of their traffic from people learning how to build stairs, how to install crown molding, and other related matters. Beginning last Thursday I've been performing basic on-site SEO, things like having unique titles for each page and similar tasks. The week from Thursday when I began until yesterday (Wednesday), Google traffic dropped -29.73% - 17,715 vs 25,210 I believe this is a normal part of the "Google Shuffle" -- does anyone have a Matt Cutts link or similar proof that this is a normal part of the process?
Reporting & Analytics | | wattssw0 -
My own brand searches stopped appearing in google
Hi everyone, I've never asked a question here before, so go easy on me 🙂 I've noticed recently that my site is not showing up for my own brand searches anymore, and that my organic google results are way down. For example, it used to be if you would search for "grouvee kings quest 6" you would either see http://www.grouvee.com/news/kings-quest-6-retro-rerun-playthrough-part-1/63/ or http://www.grouvee.com/games/kings-quest-vi-heir-today-gone-tomorrow/69/ show up first in google. Now they're nowhere to be found. Here's another example. If you would search for "grouvee mass effect 3" you'd either see http://www.grouvee.com/games/mass-effect-3/75/ or http://www.grouvee.com/news/mass-effect-3-lets-take-the-earth-back/136/ show up first in google. Once again, nowhere to be found. I could go on with several other examples, but I don't really know what I've done. I 301 redirected 2 pages a few weeks ago because I added a /reviews/ structure to my site and needed to move a couple of pages that used to be under /news/ over to the new section, but those shouldn't have had anything to do with the pages I referenced above. I haven't done any link building really. I submit some of my articles occasionally to places like reddit or n4g, but that's about it. Anyone have any tips or things I should be looking at?
Reporting & Analytics | | petecorsaro0