Are the CSV downloads malformed when a comma appears in a URL?
-
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having, however, is with a CSV I exported from our crawl diagnostics summary.
The CSV contains all the data, but I run into problems when a URL contains a comma. I am making a little tool to work with the CSVs we download, and I can't parse them properly because URLs sometimes contain commas and aren't quoted the way other fields, such as meta_description_tag, are.
Is there something simple I'm missing, or is it something that can be fixed?
Looking forward to learning more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked into it as I have - there are many factors involved if I were to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEO Moz developers who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers."
I should know, I am a programmer and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions, based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code and your stuff isn't going to work unless you fix it manually.
What you need to do to fix this is to add a quote after the 10th comma and also add one at the end of each line.
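For what it's worth, here is a minimal Python sketch of that repair. It assumes each record sits on a single line, the unquoted URL is the final field, and exactly 10 commas come before it, as described above. The function name `quote_last_field` and the `n_commas` parameter are my own inventions for illustration, not anything from the Moz export. It only counts commas that fall outside double quotes, so a quoted field earlier in the line doesn't throw off the count.

```python
import csv


def quote_last_field(line: str, n_commas: int = 10) -> str:
    """Wrap everything after the n-th unquoted comma in double quotes.

    Assumes a single-line record whose last field is the unquoted one.
    """
    line = line.rstrip("\r\n")
    in_quotes = False
    seen = 0
    for i, ch in enumerate(line):
        if ch == '"':
            # A doubled "" inside a quoted field toggles twice,
            # so the in/out state stays correct.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            seen += 1
            if seen == n_commas:
                head, tail = line[: i + 1], line[i + 1 :]
                if tail.startswith('"'):
                    return line  # final field is already quoted
                # Double any embedded quotes, then wrap the field.
                return head + '"' + tail.replace('"', '""') + '"'
    return line  # fewer commas than expected: leave the line alone


# Shortened example: 2 leading fields instead of 10.
fixed = quote_last_field('a,"b,c",http://example.com/x,y', n_commas=2)
print(fixed)
print(next(csv.reader([fixed])))
```

Once the last field is quoted, the standard `csv` module can read the line. Records with line breaks inside a field would need more than this.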
Unfortunately, even that is a problem.
The problem is there are other fields that may not be quoted, some of which can start with http://
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes and other characters are escaped with double quotes.
Titles and link text can also contain commas, so it is very complex.
Some of the fields are a bigger mess because they depend on the link text, and if the link text contains an image, you'll have quotes and equals signs, commas and all kinds of stuff. You can also have upper ASCII characters and multibyte characters.
They did actually quote the first URL, if it contains commas.
They really should have quoted every field.
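To illustrate what "quote every field" would look like, here is a small sketch using Python's standard `csv` module with `csv.QUOTE_ALL`. This is not how the Moz export is actually generated; the sample row and the `url` and `title` column names are made up for the example (only `meta_description_tag` comes from the question above). It shows that commas, quotes and line breaks all survive a round trip when every field is quoted.

```python
import csv
import io

# Hypothetical sample data: a URL with a comma, a title with quotes,
# and a description containing a line break.
rows = [
    ["url", "title", "meta_description_tag"],
    ["http://example.com/a,b", 'He said "hi"', "line one\nline two"],
]


def write_all_quoted(records):
    """Serialize records to CSV text with every field quoted."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(records)
    return buf.getvalue()


def read_back(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


text = write_all_quoted(rows)
print(text)
print(read_back(text) == rows)
```

With output like this, any standard CSV parser can read the file without special handling for commas in URLs.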
Related Questions
-
Moz-Specific 404 Errors Jumped with URLs that don't exist
Hello, I'm going to try and be as specific as possible concerning this weird issue, but I'd rather not say specific info about the site unless you think it's pertinent. So to summarize, we have a website that's owned by a company that is a division of another company. For reference, we'll say that: OURSITE.com is owned by COMPANY1 which is owned by AGENCY1 This morning, we got about 7,000 new errors in MOZ only (these errors are not in Search Console) for URLs with the company name or the agency name at the end of the url. So, let's say one post is: OURSITE.com/the-article/ This morning we have an error in MOZ for URLs OURSITE.com/the-article/COMPANY1 OURSITE.com/the-article/AGENCY1 x 7000+ articles we have created. Every single post ever created is now an error in MOZ because of these two URL additions that seem to come out of nowhere. These URLs are not in our Sitemaps, they are not in Google... They simply don't exist and yet MOZ created an error with them. Unless they exist and I don't see them. Obviously there's a link to each company and agency site on the site in the about us section, but that's it.
Moz Pro | | CJolicoeur0 -
Abnormal crawl issues appearing in my Moz results
I have been asked to look at a site for a friend and was more than surprised to see 16.9k crawl issues appear in the dashboard... of these, 6,238 are duplicate page content and 5,878 are duplicate page titles. What on earth is going on? I have spoken to the web developer as it appears there is a dev site somewhere and this is his response [Can I stress that Google determines which site was in the index first and then removes other sites it sees as having duplicate content. Our dev sites appearing in the search index would not affect your ranking due to duplicate content as Google would see your site as the first site with the content] As I cannot make contact with him, I am scratching my head. Surely a dev site should be no-indexed; it sounds as though he is saying that it's OK because Google will take the main site as the first site with the content... Very confused! Help needed, MOZ community. Many thanks, Sarah
Moz Pro | | Mutatio_Digital0 -
Why am I not getting my allowance of 10,000 inbound links in csv download file? 370 out of 4700??
Hi, I'm desperately trying to audit my backlinks to remove a penguin penalty on my site livefit.co.uk When I do the inbound link report I'm not getting all the links in the download. I know there is a limit of 25 links from each linking site so we get the full picture of links, but: I have 4700 links so why does it need to limit it when we are supposed to see up to 10,000? When you check the link profile on the report it doesn't seem there are many sites with anything close to 25, so surely that rule is invalid as an explanation here? Should I just work off OSE? But there is less useful info than on the csv.. I'd be very grateful for your thoughts. Thanks! James
Moz Pro | | LiveFit0 -
I am trying to find inbound links for one of my site urls. Is SEOMoz able to track all internal links, given that the Open Site Explorer shows 0 internal links?
It shows 0 internal links when I am pretty sure we have multiple internal links. Should we use absolute urls or relative urls for internal links?
Moz Pro | | SulekhaUSLLC0 -
Batch lookup domain authority on list of URL's?
I found this site that describes how to use excel to batch lookup url's using seomoz api. The only problem is the seomoz api times out and returns 1 if I try dragging the formula down the cells which leaves me copying, waiting 5 seconds and copying again. This is basically as slow as manually looking up each url. Does anyone know a workaround?
Moz Pro | | SirSud1 -
Can I specify a url for a keyword in the rank checker tool?
Hello! I'm new to seomoz and excited to learn the system. I created a campaign and added keywords but I'm not clear how the seomoz campaign rankings tool works. As an example, one of my keywords 'cigar cutters' is reporting at position 20 for url http://www.cheaphumidors.com/c_guillotine-cutters.html. However, I think it would be better target to focus that keyword on http://www.cheaphumidors.com/c_cutters.html. as a search for 'cigar cutters' could encompass either a guillotine cutter, punch cutter or cigar scissors. Is there any way to assign http://www.cheaphumidors.com/c_cutters.html to the term 'cigar cutters' in the campaign ranking report? Brian
Moz Pro | | davesabot0 -
Why do pages with canonical urls show in my report as a "Duplicate Page Title"?
eg: Page One
<title>Page one</title>
No canonical url Page Two
<title>Page one</title> Page two is counted as being a page with a duplicate page title.
Shouldn't it be excluded?
Moz Pro | | DPSSeomonkey0 -
How do I delete a url from a keyword campaign
I have a couple of urls that are associated with the keywords in my campaign. They are no longer valid so how do I remove them?
Moz Pro | | PerriCline0