Background job management & custom FileSystem provider

Topics: Developer Forum
Developer
Feb 9, 2007 at 3:27 AM
Edited Feb 9, 2007 at 3:32 AM
I'm thinking about a custom file system provider. The PowerShell built-in one has several limitations:
- it uses System.IO.FileSystemInfo, which is slow: it calls FindFirstFile/FindNextFile under the hood, but it doesn't use the returned data to construct the FileInfo/DirectoryInfo objects. Instead, it unnecessarily calls GetFileAttributes for each item returned. This does not seem to be fixed in .NET 3.5.
We would not be able to return System.IO.FileInfo, but I believe that is not a big deal if we provide the same interface and perhaps a conversion operator. We might also return PSObjects containing our Pscx.IO.FileInfos, with an additional System.IO.FileInfo typename (sketched after this list). That would make such objects almost indistinguishable from the system variant, at least as far as the format and type extensions are concerned.
- I'd like to see something like Copy-ItemWithProgress, which would allow progress reporting / asynchronous completion. That would require some background-job management facility, perhaps a Jobs:\ provider managing a pool of runspaces and a list of non-runspace pending jobs (like the async copy mentioned above).
- we would be able to support junctions and symlinks natively, which would gain us another performance benefit over the current type extension implementation.
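
A rough sketch of the typename trick mentioned above; Pscx.IO.FileInfo does not exist yet, so plain note properties stand in for it here:

    $item = New-Object PSObject
    $item | Add-Member NoteProperty Name 'readme.txt'
    $item | Add-Member NoteProperty Length 1024
    $item.PSObject.TypeNames.Insert(0, 'System.IO.FileInfo')
    $item | Format-List *    # the format/type extensions now treat it as a FileInfo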

... just a brain dump .. :)
Developer
Feb 9, 2007 at 8:57 PM
A Jobs provider! Brilliant! I love it, Jachym... it's often helpful to try to envisage how you would use the file command metaphors (dir, new-item, del, copy, move, etc.) in a case like this, to see if a provider is the "right thing."

You could create virtual folders in the jobs provider which represent priority:

jobs:\high\
jobs:\normal\
jobs:\low\
jobs:\idle\

This would allow you to move-item jobs between folders, effectively changing their priority. move-item works here (copy doesn't make sense).

remove-item kills the job; a cmdlet (Stop-Job?) could attempt a graceful shutdown by telling the job to finish up, perhaps?

Continuing the folder metaphor, you could also create:

jobs:\running\
jobs:\paused\

A dir in the root would show all jobs in all states, etc.
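
Putting the metaphors together, usage might look something like this (all the paths, job names and cmdlets here are hypothetical, of course):

    move-item jobs:\normal\backup-logs jobs:\low\    # lower the job's priority
    stop-job jobs:\running\backup-logs               # ask it to finish up gracefully
    remove-item jobs:\running\backup-logs            # kill it outright
    dir jobs:\                                       # all jobs, in all states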

thoughts?
Coordinator
Feb 10, 2007 at 8:07 AM
I like the idea of a jobs provider. Makes me think that a scheduled tasks provider (or perhaps just a set of cmdlets) would be handy as well. The filesystem provider sounds pretty ambitious. :-) Would we make it an "option" to use the PSCX filesystem provider instead of the PowerShell-provided, uh, provider? It might also be worth looking into whether the PoSH team is doing performance enhancements to their filesystem provider for the next PowerShell release.
Developer
Feb 10, 2007 at 11:56 PM
Regarding the background job manager: I think a simple flat provider would be better. In the vast majority of cases you are not interested in the thread priority, and exposing a ThreadPriority property seems much better to me. Also, it would be confusing to see the same job in multiple "directories" (in the root, in the priority dir, perhaps in a notstarted/running/completed directory), not to mention the implementation complexity.
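
With a flat provider, priority would just be a property, something like this (again, everything here is hypothetical):

    set-itemproperty jobs:\backup-logs -Name ThreadPriority -Value BelowNormal
    get-item jobs:\backup-logs | format-list Name, ThreadPriority, Status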

There are more important issues to solve: should we do any session state sharing between the runspaces? And should we recycle the runspaces? Creating a new runspace and running all the profile scripts takes some time; on the other hand, you don't know what state the runspace is in after a job completes.
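
The recycling question in code terms, roughly (using the hosting API; whether to Close() or reuse is the open question):

    $rs = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspace()
    $rs.Open()                               # opening it (and any profile scripts we run) takes time
    $pipe = $rs.CreatePipeline('get-date')   # the job's script would go here
    $results = $pipe.Invoke()
    $rs.Close()                              # or keep it around and reuse it, accepting leftover state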

WRT the filesystem provider: I was told they are working with the .NET Framework team to provide an enumerator-based wrapper over FindFirstFile/FindNextFile. But they are not planning to do any progress reporting, so I think our own filesystem provider would still have its value.

A scheduled tasks provider would be great. Have you seen any documentation for the brand-new scheduler in Vista? I remember seeing some COM interfaces, but that was on Windows XP...
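
If the Vista scheduler exposes the scriptable COM object I think it does (Schedule.Service; I haven't verified this), poking at it could be as simple as:

    $svc = New-Object -ComObject Schedule.Service
    $svc.Connect()                                     # local machine
    $svc.GetFolder('\').GetTasks(0) | foreach { $_.Name }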
Developer
Feb 11, 2007 at 3:56 AM
BTW, Oisin, speaking of pausing and gracefully stopping the jobs, how would you implement that? There is no way of "gracefully stopping" an executing pipeline. It won't be a problem for jobs implemented in C# (the tab completion cache, the async file copy), but most of the jobs will be simple scripts...
I've recently written a Windows service hosting a runspace that ran a loop watching for changes to a directory. The service startup code set a $Proxy variable, which had a StopPending property, and the script checked it in the loop. Maybe we could do something similar.
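
The loop looked roughly like this ($Proxy and its StopPending property are just whatever the host chooses to expose; the names are the ones I happened to use):

    while (-not $Proxy.StopPending)
    {
        # do one unit of work, e.g. process the next changed file
        Start-Sleep -Seconds 1
    }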
Developer
Feb 11, 2007 at 6:39 PM
> it would be confusing to see the same job in more "directories", (in root, in
> the priority dir, perhaps in notstarted/running/completed directory).
> not to mention the implementation complexity.

I think it's very limiting to imagine providers in the one-dimensional sense of a traditional filesystem. I prefer to look at them as views: they are much more flexible than simple "directories" or "folders." If you use aliases like "dir" and "ls" a lot with providers, I suppose it's easy to get trapped in that paradigm. I don't think it complicates the implementation either; it all depends on how you approach it mentally. For another -- perhaps more easily imaginable -- example, what about a Gmail provider, or an RSS provider, that supports labels as opposed to hierarchical containers? Items can be in one or more containers simultaneously; it's just an abstraction, after all.

I don't know enough about runspaces to suggest the technical implementation, but the Win32 service model appeals to me, though perhaps that's only fully possible on Vista with its new cancellable async I/O APIs. Loops that poll a stop variable work fine for jobs that can take a break frequently enough to check, but I suppose we should start small ;)

Developer
Feb 11, 2007 at 9:00 PM
OK, I agree, there are cases where it makes sense to have multiple paths to a single item. However, I still think the job provider should be simple and flat. Changing priority is such an edge case, and filtering by status would be nicer done with a "group-by" format control. I don't think you'd ever have more than a few concurrently running jobs, either.
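
For example (dir jobs:\ and the Status property are hypothetical, but Format-Table -GroupBy is real):

    dir jobs:\ | sort Status | format-table -GroupBy Status Name, ThreadPriority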

BTW, the Vista innovation lies in SYNCHRONOUS I/O being cancellable. You have been able to cancel async I/O since the first version of NT, but it was impossible to cancel pending synchronous I/O from another thread. And that's hardly a graceful shutdown anyway; we might just as well kill the runspace thread and the result would be the same. I think the polling model will be best to begin with: we will need to provide some state object to the job to pass parameters from the invoking runspace either way, so why not provide a StopPending boolean as well? It is up to the job author whether she makes her job cancel nicely. If not, we wait a few seconds for the runspace to finish and after that kill it mercilessly.
Developer
Feb 13, 2007 at 5:40 AM
I've changed my mind about the containers. They won't be used for filtering, though: the user will be able to create two types of container jobs, one executing its children sequentially and the other in parallel. The root's children will obviously be executed in parallel.
And a future "robocopy" cmdlet or FTP provider could copy directory structures by incrementally building a job tree...
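
The containers might look something like this (every name and parameter here is hypothetical):

    new-item jobs:\nightly -Type SequentialContainer
    new-item jobs:\nightly\copy-logs  -Value { copy-item c:\logs\* d:\backup }
    new-item jobs:\nightly\clean-temp -Value { remove-item c:\temp\* -Recurse }
    new-item jobs:\adhoc -Type ParallelContainer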
Developer
Feb 13, 2007 at 9:25 PM
Edited Feb 13, 2007 at 9:26 PM
aha! ;)

Sounds pretty cool. Maybe push-job/pop-job/peek-job/insert-job cmdlets for querying/manipulating the sequential queue, and add-job/remove-job for the parallel one?

I love the idea of having the job provider there to use as a general service layer. It would make async cmdlets and providers quite easy to implement. Nice!

Developer
Feb 14, 2007 at 12:10 AM
Another use: a file system watcher job, so you won't need IronPython or the blocking WaitForChanged...
(http://thepowershellguy.com/blogs/posh/archive/2007/02/10/using-ironpython-from-powershell-part-1-watch-folder-for-changes-without-blocking-console.aspx)
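
Such a job's script could be a polling loop around FileSystemWatcher.WaitForChanged, so it stays cancellable ($Proxy/StopPending is just the hypothetical state object from above):

    $fsw = New-Object System.IO.FileSystemWatcher 'c:\drop', '*.txt'
    while (-not $Proxy.StopPending)
    {
        $result = $fsw.WaitForChanged([System.IO.WatcherChangeTypes]::All, 1000)
        if (-not $result.TimedOut) { write-host "changed: $($result.Name)" }
    }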

WRT the cmdlets: I think New-Item will handle the add-, push-, and insert-job cases easily, although we can provide simple wrapper functions. The question is how to order the sequential container. I was thinking the jobs would be identified and sorted by their names, and the sequential container might support an -Index dynamic parameter on Move-Item. What do you think?
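
Roughly (the -Index dynamic parameter is only the proposal, nothing is implemented yet):

    move-item jobs:\nightly\clean-temp -Index 0    # run this step first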
Developer
Feb 14, 2007 at 5:44 PM
Just for fun: in the Start method of the provider we could add a default async job that checks for the latest version of PSCX. We could do this with a WebRequest and a special page on the wiki that contains the latest version number, e.g. 1.1, and perhaps the release date.
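
The check itself is only a few lines; the URL below is made up, and the current-version string would come from the assembly in the real thing:

    $req = [System.Net.WebRequest]::Create('http://www.codeplex.com/PowerShellCX/LatestVersion')
    $resp = $req.GetResponse()
    $reader = New-Object System.IO.StreamReader ($resp.GetResponseStream())
    $latest = $reader.ReadToEnd().Trim()
    $reader.Close(); $resp.Close()
    if ($latest -ne '1.1') { write-host "PSCX $latest is available" }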

thoughts?

Developer
Feb 14, 2007 at 6:35 PM
nice :)