?

Log in

No account? Create an account
Vexen Crabtree 2015

vexen

Vexen Crabtree's Live Journal

Sociology, Theology, Anti-Religion and Exploration: Forcing Humanity Forwards


Previous Entry Share Flag Next Entry
Vexen Crabtree 2015
vexen

Counting URLs (SQL, in Access environment)

I'm trying to do something quite simple in SQL but can't get it right. I suddenly found myself creating 3 queries just to do this one thing, and I'm sure I should be able to do it in one query. I have a list of URLs in a table (T_URLs including the field URL_URL), and I simply want to return a list of URLs, ordered by domain, with a count of how many URLs are in each domain.

So,

1. Extract the Domain part of the URL. For example the domain of http://www.vexen.co.uk/countries/best.html is http://www.vexen.co.uk/:
SELECT Mid(URL_URL,1,InStr(8,URL_URL,"/")) AS URL_Domain FROM T_URLs;
2. I need to Group By and Order By this URL_Domain field, which is where my problems start. So I need extra SQL like:
Count(URL_Domain)
and
GROUP BY Mid(URL_URL,1,InStr(8,URL_URL,"/")) ORDER BY URL_Domain
3. I then want to count how many URLs are in that domain. The result should be a list of URLs, with an extra "Domain_Q" field that counts how many other URLs are also in that domain:

In total:
SELECT Mid(URL_URL,1,InStr(8,URL_URL,"/")) AS Domain_URL, Count(URL_URL) AS Domain_Q

FROM T_URLs

GROUP BY Mid(URL_URL,1,InStr(8,URL_URL,"/")), URL_URL

ORDER BY URL_URL;
This will later let me iterate through every URL that exists in a domain which has more than 10 URLs in it.

But the SQL so far returns a (correct) list of URL_URLs and Domain_URLs, but, the Domain_Q says "1" for every line in the results.

It works if I put the Count_Q in a different SQL query, but then I end up creating a circular reference when I try to bring the Domain_Q back into the URL list. Can anyone work out (a) what I'm talking about, and (b) what nifty little trick I'm missing to get this done sensibly?


  • 1

Re: Something like this?

select url from (select substring(url, 8, locate('/',url,8)-8) as domain, count(url) as count from urls group by domain) as t1 join (select substring(url, 8, locate('/',url,8)-8) as domain, url from urls) as t2 on t1.domain = t2.domain where count > 10 order by t2.domain;

Re: Something like this?

OK I see, you're awesome old friend, I never thought of putting the two complementary queries in as subqueries... I've done with this for now (the thing works - it writes sitemap.xml files correctly for each domain I happen to have URLs on), I'll play with subqueries next time I have a go at editting that part of Ziggy!

  • 1