smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-08-2003 17:16
Is there a way to create a page of my website that automatically scans all directories of my site and then lists links to all the html/php/shtml documents contained within my domain?
I know there are scripts that can spider through links on pages to create a sitemap, but this isn't a viable option in my case as there are no links between pages.
I also know that a php script can be created that scans a directory and pulls out all files of a certain type, so is it possible to scan directories for both html files and other directories? I can't see why this should be so hard, but maybe there is a major reason against it. I dunno, you tell me as you peeps are the brains in this area.
|
Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers
From: Germany Insane since: Sep 2001
|
posted 05-08-2003 17:30
Trivial thing to do... definitely possible.
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-08-2003 18:23
|
bitdamaged
Maniac (V) Mad Scientist
From: 100101010011 <-- right about here Insane since: Mar 2000
|
posted 05-08-2003 18:44
PHP readdir
That's got the basics for reading the contents of a directory. What you need to do is read your base directory and get all the files in it. If there is a directory in there, you need to read that too.
Here's pseudo code (not real code, just the idea):
function getfiles() {
    open directory
    read files one at a time (see link) {
        is this file a directory? {
            then call getfiles on that
        } if it's not a directory and is a php/html/shtml file {
            then create a link to the file
        }
    }
}
This is a recursive function (a function that calls itself) which should do the trick.
One issue you will run into is that this works with your server's file tree, which isn't the same as your web directory tree. You will need to convert file paths on the system to URLs, but that's pretty easy.
.:[ Never resist a perfect moment ]:.
|
Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers
From: Germany Insane since: Sep 2001
|
posted 05-08-2003 18:50
Actually, I'd build up a structure of nested arrays with bitdamaged's method, and either save that to a file with serialize(), or save the generated sitemap to a file and usually just show that. Updating would be done either via a cron job or manually from time to time.
That would save a huge amount of server power.
Hey, and you asked whether there were major reasons against it. There were none, and that's what you got ;-)
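For what it's worth, the caching idea could look roughly like the sketch below. It is only an illustration: `loadSitemapCache`, the cache file name and the one-hour limit are made-up names and values, not anything from this thread, and `$builder` stands for whatever function actually scans the directories. It assumes PHP 5.3 or later, because it uses a closure and file_put_contents().

```php
<?php
// Hypothetical helper (not from the thread): returns the cached file list,
// rebuilding it when the cache is missing, unreadable or older than $maxAge seconds.
function loadSitemapCache($cacheFile, $maxAge, $builder)
{
    // Use the cache only if it exists and is still fresh
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        $data = unserialize(file_get_contents($cacheFile));
        if (is_array($data)) {
            return $data;
        }
    }
    // Otherwise do the expensive scan once and store the result
    $data = call_user_func($builder);
    file_put_contents($cacheFile, serialize($data));
    return $data;
}

// Example use: the closure stands in for the real directory scan.
$cacheFile = sys_get_temp_dir() . '/sitemap-cache-demo.ser';
$files = loadSitemapCache($cacheFile, 3600, function () {
    return array('index.html', 'about/team.php');
});
print_r($files);
```

With something like this the page only rescans the disk when the cache has expired; a cron job that deletes or rewrites the cache file would do the same job on a schedule.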
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-08-2003 19:00
sounds cool, exactly what I need
the cron job idea sounds good too, I really only need to update the file every few days or whatever. I don't know anything about php or cron jobs, but I understand the principle outlined by bitdamaged (thanks). I figure that cron jobs are scheduled events (not sure if event is technically the right term). What is server support for cron jobs like?
OK, if I have a go at cobbling these ideas together and show them to you, you must promise not to laugh at my feeble attempt, which more than likely would cause a server to explode if ever I were to run it. I may require some assistance with syntax and suchlike.
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-09-2003 00:12
** MY FEEBLE ATTEMPT NO. 1 **
code:
<?php
function getfiles() {
    if ($handle = opendir('/path/to/files')) { // what does this path need to be?
        while (false !== ($file = readdir($handle))) {
            if (is_dir($file)) {
                getfiles()
            }
            else echo "$file\n"; // echo is like document.write in js isn't it?
        } // how do I change this to put the results into an array instead?
        closedir($handle);
    }
}
?>
Don't laugh, I cobbled this together using my very limited knowledge of javascript to try to understand my non-existent knowledge of php. It's created from stuff in the links, but they were written for programmers to understand and not people like me so I didn't really get it. If you could break this down and tell me what each bit of the code would do if it worked (I'm sure it wouldn't) then I'd be grateful.
What I think it does is open a somehow specified directory and check its contents. If an entry is a directory, it starts the function again on that directory; if it's not a directory, it writes it to the document. How wrong am I?
I can't test it coz I don't know how to implement it on a page or how to run it or what it should display exactly if it is running correctly or not, I know nothing.
HELP ME PLEASE
[This message has been edited by smonkey (edited 05-09-2003).]
|
bitdamaged
Maniac (V) Mad Scientist
From: 100101010011 <-- right about here Insane since: Mar 2000
|
posted 05-09-2003 02:43
Alright, for testing just upload it to your server and view it in your browser. Before you do, let's fix a couple of things.
First, make it so you pass your function the directory to start with:
function getFiles($dir) {
blah.. blah...
}
And start by calling it like this
getFiles('/path/to/files');
or if you just want to start with the current directory just do this
getfiles(getcwd());
also you need to change the part inside your function to this
getFiles($file)
since you want to pass that directory to your function
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-09-2003 13:39
Hi thanks for that, but I seem to get a parse error on line 9 ?
This is what I have: code:
<?php
function getFiles($dir) {
    if ($handle = opendir($dir)) {
        while (false !== ($file = readdir($handle))) {
            if (is_dir($file)) {
                getFiles($file)
            }
            else echo "$file\n";
        }
        closedir($handle);
    }
getFiles(getcwd())
?>
I have also quizzed a comment poster from the php.net manual. He came back with a whole script that works great, but I'm unsure how to convert the data into a nice sitemap or file tree or something that shows the structure of the site in a tidier way. Also, his script lists all files and not just .htm/.html/.php/.shtml etc. How can I change this? Do you think me a loser for giving up on my own script in favour of one created by someone who knows more than me?
** HIS CODE **
code:
<?php
/**
 * Extraction of files information from a certain directory
 *
 * The function declared, parseDir, will go through all files and
 * sub-directories of the directory used to first call it. This function
 * is recursive, which means it calls itself if a sub-directory is
 * encountered. Its execution ends when there are no more files and/or
 * directories in the directory used to first call it.
 *
 * This script runs on both Windows and Linux ;)
 * PHP 4.1.0 recommended
 *
 * @author Jean-Philippe Léveillé <jpleveille@webgraphe.com>
 * @copyright (c) 2003 released4free.org
 */
// the path we want to display - "." for current working directory
// "." is where this script resides
$root_path = ".";
function parseDir($path)
{
    if (!($dir_handler = opendir($path))) {
        die("Can't opendir() path '". $path ."'");
    }
    // now we have a directory handler in $dir_handler
    $files = array();
    // getting each file of the current directory
    while (($file = readdir($dir_handler)) !== false) {
        // realpath (no mix between \ and /)
        $temp_path = realpath($path."/".$file);
        // we do not parse "." (current directory) and ".." (parent)
        if ($file != "." && $file != "..") {
            // if we have a directory, we call recursively
            if (is_dir($temp_path)) {
                $files = array_merge($files, parseDir($temp_path));
            // otherwise we record the file in the array
            } elseif (is_file($temp_path)) {
                // you could filter the extension of the file if you only
                // need HTML files here - use pathinfo()
                // http://www.php.net/manual/en/function.pathinfo.php
                $files[] = $temp_path;
            }
        }
    }
    // returning all files found in the current directory and its
    // sub-directories
    return $files;
}
// display a debug output of the Array constructed
print_r(parseDir($root_path));
?>
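The comment in the script above points at pathinfo() for filtering by extension. A minimal sketch of that filter follows; `isPageFile` and its default extension list are made-up names for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not part of the script above): true when $path
// ends in one of the wanted extensions, compared case-insensitively.
function isPageFile($path, $extensions = array('htm', 'html', 'php', 'shtml'))
{
    $info = pathinfo($path);
    // pathinfo() leaves out the 'extension' key when the name has no dot
    if (!isset($info['extension'])) {
        return false;
    }
    return in_array(strtolower($info['extension']), $extensions, true);
}
```

In parseDir() the line `$files[] = $temp_path;` would then become `if (isPageFile($temp_path)) { $files[] = $temp_path; }`, so only the page files end up in the array.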
|
bitdamaged
Maniac (V) Mad Scientist
From: 100101010011 <-- right about here Insane since: Mar 2000
|
posted 05-09-2003 15:37
lol his is virtually the same thing, just a few small bells and whistles. It's up to you if you want to learn this or just use his script.
Parse errors in PHP are, much of the time, a missing semi-colon. (And the missing semi-colon will usually be on the line before the one reported in the error, because technically it doesn't break where the semi-colon is missing, it breaks where the next code is.)
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-09-2003 17:35
I want to learn, but it's hard when I don't even know what I'm doing. I tried what you said with my code and the semicolon on the previous line, and it fixed that parse error, but then it throws up another on the last line which I can't seem to figure out. I think part of my problem may well be that I'm trying to assemble php like it's javascript, plus I don't even know javascript beyond the most basic and primitive things, so it's causing problems.
So if someone could explain why my bit of code doesn't work or even run that would be useful in helping me to understand php a little better.
Also, if someone could tell me in simple terms what both pieces of code are actually doing at various points, that would help too. I think I know what my code does, as I built it with the logic supplied by bitdamaged combined with my own linear way of thinking. I know the other guy's code has been annotated by him, but it uses too many terms I'm unfamiliar with (like 'handler' etc.). Please talk to me like I'm really dumb; that way maybe I will start to understand what is happening when.
Thanks guys.
[This message has been edited by smonkey (edited 05-09-2003).]
|
bitdamaged
Maniac (V) Mad Scientist
From: 100101010011 <-- right about here Insane since: Mar 2000
|
posted 05-09-2003 18:50
code:
<?php
// Alright, start the function called getFiles.
// $dir is the function input (a directory name)
function getFiles($dir) {
    // We're going to start this with something to tell us where we
    // are at
    echo "<p><hr> Searching Directory $dir <br>";
    // Alright, first we have to open the directory
    // and get a directory "handle". $handle isn't a
    // normal variable, it's a pointer to the directory
    // stream, but don't worry about that. Just know that
    // when using files or directories, you need
    // to open them first and create a variable
    // that points to that open directory.
    //
    // This line actually does 3 things in one.
    // It opens the directory and assigns the directory
    // handle, and by wrapping it in the if statement we check
    // at the same time to make sure we actually opened
    // the directory. If we couldn't open the directory
    // (say you put a bad path in there that doesn't exist),
    // it would return false.
    if ($handle = opendir($dir)) {
        // Basic while loop, you don't see this in JS much,
        // easy enough though. It keeps looping as
        // long as the test returns true.
        // Breaking it up, you have this: ($file = readdir($handle))
        // What that does is readdir (read directory). It looks in
        // the directory that $handle refers to, returns the first
        // file and increments the directory pointer. The pointer
        // is a mechanism that it uses to know where in the directory we
        // are at. Technically it's not returning the first one, it
        // returns the file that the directory pointer is pointing at,
        // which the first time through will be the first file.
        // The second time through the while loop (since we incremented the pointer the
        // first time) it will point to the second file. When it
        // runs out of files readdir will return false.
        //
        // This bit, false !==, looks at $file and returns true as
        // long as $file is NOT false (it's a double negative).
        // That is, as long as readdir returns a file the test is true
        // (NOT false); when readdir runs out of files it returns
        // false, making (false !== $file) false, which ends the loop. (Make sense?)
        while (false !== ($file = readdir($handle))) {
            // Real easy, if $file is a directory then start this
            // function again because we need to look in that directory.
            // This is why it's "recursive", the function calls itself
            if (is_dir($file)) {
                getFiles($file);
            }
            // If it's not a directory then print the file name.
            // Instead of \n we're going to print a <br> tag
            // because \n means nothing to a browser
            else {
                echo "$file <br>";
            }
        // Close that directory $handle
        closedir($handle);
    }
}
// This just starts this whole cycle with the Current Working Directory
// as a starting point
getFiles(getcwd())
?>
One of the things you may be having problems with is the difference between server paths and the web document path.
This uses server paths, which are the complete paths to the files. On Windows machines this will usually be something like C:\phpdev\public\htdocs\mytest.php, and on Unix something like /var/apache/username/htdocs/mytest.php, whereas the web URL is just
http://www.mysite.com/mytest.php
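Turning a server path into a web URL could be sketched like this. `pathToUrl` is a made-up name, and the sketch assumes you know which server directory the site root is served from (on many setups `$_SERVER['DOCUMENT_ROOT']` holds it, but that depends on the host). It does not URL-encode spaces or other special characters in file names.

```php
<?php
// Hypothetical helper (not from the thread): maps a server file path to a
// URL, given the server directory that the site root is served from.
function pathToUrl($filePath, $docRoot, $baseUrl)
{
    // Normalise Windows backslashes so both paths compare cleanly
    $filePath = str_replace('\\', '/', $filePath);
    $docRoot  = rtrim(str_replace('\\', '/', $docRoot), '/');

    // Only paths inside the document root can be mapped to a URL
    if (strpos($filePath, $docRoot . '/') !== 0) {
        return false;
    }
    // Whatever follows the document root is the path part of the URL
    $relative = substr($filePath, strlen($docRoot));
    return rtrim($baseUrl, '/') . $relative;
}
```

With the Unix example path above, `pathToUrl('/var/apache/username/htdocs/mytest.php', '/var/apache/username/htdocs', 'http://www.mysite.com')` gives `http://www.mysite.com/mytest.php`.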
[This message has been edited by bitdamaged (edited 05-09-2003).]
|
Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers
From: Germany Insane since: Sep 2001
|
posted 05-09-2003 18:55
Parse errors on the last line usually mean you didn't get your {} straight...
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-09-2003 19:58
Thanks BD, you are a great help, keep working on me and maybe I'll get somewhere with this php malarky
TP: I've looked at the {} and as far as I can count they are all paired, so it must have something to do with their positioning. Are there any positioning rules in php that differ from javascript? Why do lines have to end with a semicolon in php, whereas a lot of the time you can end a line in javascript with a line break/carriage return, or in the case of curly braces you just close them and that ends it? What syntax differences should I be aware of?
*edit* d'oh, they weren't all paired, but I'm still not quite getting the thing working properly
[This message has been edited by smonkey (edited 05-09-2003).]
|
bitdamaged
Maniac (V) Mad Scientist
From: 100101010011 <-- right about here Insane since: Mar 2000
|
posted 05-09-2003 20:13
Actually there is a closing brace missing. I didn't close the while loop.
Most languages require the semi-colon. JS is one of the few that lets you get away with that sloppy bit. The reason it's required is that sometimes you want to break up a single line of code on a couple of lines for readability.
Something like this (this is real code I crap you not)
code:
$winkAdmin->update_existing($USER->{user_id}, ${$PARAMETERS->{element_id}}, ${$PARAMETERS->{wink_display}}, ${$PARAMETERS->{wink_headline}});
Is easier to view if it's
code:
$winkAdmin->update_existing(
$USER->{user_id},
${$PARAMETERS->{element_id}},
${$PARAMETERS->{wink_display}},
${$PARAMETERS->{wink_headline}}
);
Get used to the semi-colons
|
smonkey
Paranoid (IV) Inmate
From: Northumberland, England Insane since: Apr 2003
|
posted 05-09-2003 20:41
I've been fiddling, but my if and else statements seem to be the problem; something isn't working right there. I seem to get a massive list of the echo "<p><hr> Searching Directory $dir <br>" output, but it doesn't list the name of the directory except for the starting directory, and it doesn't list any files whatsoever. On fiddling around I have managed to get files listed for the starting directory only, so it obviously isn't 'recursing' properly when it finds directories within the starting directory. This is so damn confusing for my crap brain.
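The behaviour described here (the same "Searching Directory" line repeated, and no files from subdirectories) is consistent with three things in the annotated listing earlier in the thread, though this is a reading of the code rather than something tested on the poster's server. readdir() also returns the "." and ".." entries, so the function keeps calling itself on the same directory. is_dir($file) and getFiles($file) receive a bare name rather than a full path. And closedir() needs to run after the while loop, not inside it. A minimal sketch with those three points changed, collecting the paths into an array instead of echoing them:

```php
<?php
// Sketch of a corrected getFiles(): returns the full paths of all files
// found under $dir, including those in its sub-directories.
function getFiles($dir)
{
    $found = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $found; // directory could not be opened
    }
    while (false !== ($file = readdir($handle))) {
        // "." and ".." point back at this directory and its parent;
        // following them would make the function recurse forever
        if ($file === '.' || $file === '..') {
            continue;
        }
        // readdir() gives a bare name, so build the full path before testing it
        $path = $dir . '/' . $file;
        if (is_dir($path)) {
            $found = array_merge($found, getFiles($path));
        } else {
            $found[] = $path;
        }
    }
    // Close the handle only once the loop has read every entry
    closedir($handle);
    return $found;
}
```

Calling `print_r(getFiles(getcwd()));` from a page would then dump the whole list, in the same way as the print_r() call in the other script.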
|