How long does it take for customized Google Site Search to show results from pdf files?
-
The site in question is http://www.ejmh.eu
I am pretty unsatisfied with the results I am getting from the Site Search provided by Google.
We have over 160 pdf files in this subfolder: http://www.ejmh.eu/mellekletek
The files are the digital versions of articles. When I search for content in those PDF files, Google does not show results. It does show results from older pages, dating back 1-2 years, but it is not showing anything from the PDF files that I put up 3 weeks ago.
My questions:
If I place a Google Search on a site, does it not automatically display results from ALL the content in the root domain?
Is there any correlation between how the Site Search is indexing the files and how Google is indexing the urls in general?
Should I just wait and see whether Site Search performance improves, or should I switch to other search software such as Zoom Search?
It is vital to have a proper, high-quality search functioning on that site in the very near future.
What are your experiences? Any tips are greatly appreciated.
-
Hi, everyone: problem solved.
Here is what I did: I created a separate sitemap XML file and linked to all the new PDFs.
I updated the general sitemap.xml and linked to the new sitemap as well.
I (re)submitted both sitemaps via Webmaster Tools.
Within a few hours, most of the PDFs got indexed and the overall quality of search improved dramatically. Thanks for all your help.
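For anyone who wants to script the sitemap step, here is a minimal sketch in Python. It assumes the standard sitemaps.org XML format. The PDF file names are hypothetical placeholders, and the sitemap index file is the protocol's usual way to reference a second sitemap, which may not be exactly how the two files were linked in this case.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(pdf_urls):
    """Return a <urlset> sitemap (as a string) with one <url> entry per PDF."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in pdf_urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document (as a string) pointing at several sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = url
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names; replace with the real PDF URLs.
    pdfs = [
        "http://www.ejmh.eu/mellekletek/example-1.pdf",
        "http://www.ejmh.eu/mellekletek/example-2.pdf",
    ]
    print(build_pdf_sitemap(pdfs))
    # Hypothetical sitemap locations, for illustration only.
    print(build_sitemap_index([
        "http://www.ejmh.eu/sitemap.xml",
        "http://www.ejmh.eu/sitemap-pdf.xml",
    ]))
```

The generated files still have to be uploaded to the site and submitted through Webmaster Tools by hand, as described above.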
-
It may be a good idea to include all the PDF files in the sitemap, even if it is a troublesome process.
Otherwise it just takes too long for Google to index them.
What still surprises me is that even for a site search, you need to win the 'indexing battle'. I thought that Google indexes everything within the sitemap for the sake of the site search and displays the results when a visitor searches within the site. Less fancy software actually does this. I thought Google Site Search would provide something even better.
-
Last crawl - thanks, great info.
Yes, all new PDFs are linked from the HTML files.
This is the summary page of one article: http://www.ejmh.eu/5archives_ppr_jaggle_061.html
In the middle of the page, you see 'download full text' - this is from where the individual papers (pdf) are linked.
-
Are the new PDFs linked from pages in the same way as the old ones?
Try creating a page that lists all the new PDFs. Google might simply need time to recrawl your site and add them (by the way, the last copy saved in Google Cache is from Feb 11).
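A listing page like that can be generated rather than written by hand. This is only a rough sketch in Python. The file names and the page title are hypothetical, and the output is a bare HTML page that would still need to be linked from the rest of the site so a crawler can reach it.

```python
from html import escape


def build_pdf_listing(pdf_urls, title="Full-text PDFs"):
    """Return a simple HTML page with one link per PDF URL."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
        "<ul>",
    ]
    for url in pdf_urls:
        # Use the last path segment as the visible link text.
        file_name = url.rsplit("/", 1)[-1]
        lines.append(
            f'<li><a href="{escape(url, quote=True)}">{escape(file_name)}</a></li>'
        )
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file name; replace with the real list of PDFs.
    print(build_pdf_listing(["http://www.ejmh.eu/mellekletek/example-1.pdf"]))
```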
-
You are great, thanks for your time. Yes, I did check things out with this Google command: there are PDFs listed, but these are all old PDFs that I put up a long time ago. None of the PDFs I put up recently are among those indexed.
Do you think that only URLs indexed by Google come up in a customized site search? Does Google not crawl the site and build a list of URLs purely for the sake of the site search? (Zoom Search does that, for example.) In theory, there could be two different types of crawl: one for the site search and one for the wider web search.
As for the settings... can you please help me further: what exactly would you change?
-
If you check here, all the PDFs are indexed in Google,
so I would check the settings on the CSE.
Reference here: http://www.google.com/cse/docs/resultsxml.html#wsQueryTerms
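The link behind "check here" did not survive, so the exact check is unknown. One common way to see which PDFs Google has indexed for a site is a query that combines the `site:` and `filetype:` operators. The sketch below, in Python, only builds such a search URL, and using the `mellekletek` folder as the path is an assumption taken from the question.

```python
from urllib.parse import urlencode


def indexed_pdf_query_url(domain, path=""):
    """Build a Google search URL restricted to PDFs under the given domain and path."""
    query = f"site:{domain}{path} filetype:pdf"
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Open the printed URL in a browser to see the indexed PDFs.
    print(indexed_pdf_query_url("www.ejmh.eu", "/mellekletek"))
```

If recently uploaded files are missing from those results, they have probably not been indexed yet, which matches what the original poster saw.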
-
Thanks for the tip, it's a good one. But they are all 100% text.
-
If a search engine cannot read the text because it is a graphic rather than text, it won't be able to fully index the words in the document.
So make sure all your PDFs are 100% text that was converted to PDF, not a scan (image) of the original document that was saved as a PDF.