?

Log in

No account? Create an account
Vexen Crabtree 2015

vexen

Vexen Crabtree's Live Journal

Sociology, Theology, Anti-Religion and Exploration: Forcing Humanity Forwards


Previous Entry Share Next Entry
Vexen Crabtree 2015
vexen

Counting URLs (SQL, in Access environment)

I'm trying to do something quite simple in SQL but can't get it right. I suddenly found myself creating 3 queries just to do this one thing, and I'm sure I should be able to do it in one query. I have a list of URLs in a table (T_URLs including the field URL_URL), and I simply want to return a list of URLs, ordered by domain, with a count of how many URLs are in each domain.

So,

1. Extract the Domain part of the URL. For example the domain of http://www.vexen.co.uk/countries/best.html is http://www.vexen.co.uk/:
SELECT Mid(URL_URL,1,InStr(8,URL_URL,"/")) AS URL_Domain FROM T_URLs;
2. I need to Group By and Order By this URL_Domain field, which is where my problems start. So I need extra SQL like:
Count(URL_Domain)
and
GROUP BY Mid(URL_URL,1,InStr(8,URL_URL,"/")) ORDER BY URL_Domain
3. I then want to count how many URLs are in that domain. The result should be a list of URLs, with an extra "Domain_Q" field that counts how many other URLs are also in that domain:

In total:
SELECT Mid(URL_URL,1,InStr(8,URL_URL,"/")) AS Domain_URL, Count(URL_URL) AS Domain_Q

FROM T_URLs

GROUP BY Mid(URL_URL,1,InStr(8,URL_URL,"/")), URL_URL

ORDER BY URL_URL;
This will later let me iterate through every URL that exists in a domain which has more than 10 URLs in it.

But the SQL so far returns a (correct) list of URL_URLs and Domain_URLs, but, the Domain_Q says "1" for every line in the results.

It works if I put the Count_Q in a different SQL query, but then I end up creating a circular reference when I try to bring the Domain_Q back into the URL list. Can anyone work out (a) what I'm talking about, and (b) what nifty little trick I'm missing to get this done sensibly?


  • 1
Shouldn't you be counting Domain_URL 's , and not URL_URL 's? That line creates a URL entry for a domain, and I'm guessing if the same old domain doesn't come up there'll be no new entry in the database table?

Also a sensible question; the problem is that there is no field called Domain_URLs so it causes an error, it is worked-out from URL_URL. This is why I've had to put that part in a seperate query - although it still seems like there should be a way round that, I can't find it!

  • 1