Yes, indeed I had thought about that idea. Perhaps even adding a RAMDISK
to the list of devices (and somehow marking it as always dirty and not
able to store information permanently). That would allow you to
set up a multilevel cache system, again as in virtual memory.
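
To make the multilevel idea a bit more concrete, here is a rough sketch in
Python (chosen only for readability, nothing here is kernel code). The names
Tier, TieredCache, volatile and flush are all made up for illustration, and
the policy shown (reads promote a block to every faster tier, flush pushes
dirty blocks down one tier at a time) is just one assumption about how such
a hierarchy might behave.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One storage level, e.g. RAM disk, local disk, network (hypothetical)."""
    name: str
    volatile: bool = False                       # True for a RAM disk
    blocks: dict = field(default_factory=dict)   # block number -> bytes
    dirty: set = field(default_factory=set)      # blocks not yet pushed lower

    def is_dirty(self, blkno):
        # A volatile tier can never hold the only safe copy, so every
        # block it holds counts as dirty no matter what was recorded.
        return blkno in self.blocks and (self.volatile or blkno in self.dirty)


class TieredCache:
    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest first

    def write(self, blkno, data):
        top = self.tiers[0]
        top.blocks[blkno] = data
        top.dirty.add(blkno)

    def read(self, blkno):
        # Walk from fastest to slowest; on a hit, copy the block into
        # every faster tier so the next read is cheaper.
        for i, tier in enumerate(self.tiers):
            if blkno in tier.blocks:
                data = tier.blocks[blkno]
                for faster in self.tiers[:i]:
                    faster.blocks[blkno] = data
                return data
        raise KeyError(blkno)

    def flush(self):
        # Push dirty blocks down one level at a time. The last tier is
        # treated as permanent storage, so nothing is dirty there.
        last = self.tiers[-1]
        for upper, lower in zip(self.tiers, self.tiers[1:]):
            for blkno in list(upper.blocks):
                if upper.is_dirty(blkno):
                    lower.blocks[blkno] = upper.blocks[blkno]
                    if lower is not last:
                        lower.dirty.add(blkno)
                    upper.dirty.discard(blkno)
```

Note that with this naive policy a block sitting in the RAM disk gets
rewritten on every flush, even if it was only read. A real design would
want something smarter than taking "always dirty" quite that literally.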
> >3. It might be difficult to get every filesystem to support the idea of
> >growing to accommodate more space.
>
> Yes, this is probably a tough thing to do. A better idea is, as you have
> stated below, to use a local disk as cache or first-tier storage and a
> Network, Tape or Optical Library as near-line storage.
Well, this was just sort of an aside... I already knew that wasn't what I
wanted shortly after I thought of the idea.
> >5. This is *not* a filesharing system. It would be implemented at the
> >*block* level. There would be an equivalent of a "page table" for looking
> >up information. Only the original writer is guaranteed to know where
> >their files were. Sharing would have to be done with NFS or something
> >else.
>
> I would suggest making the system have a daemon running that would write to
> whatever mount point (ext2, NFS, smb) or would have the possibility of
> writing to a tape device. This would make implementing HSM using the same
> mechanism much easier. This user space program could also be used to write
> two copies of the same data to two different devices (local or remote) and
> thus we would have a bit of redundancy.
I'm not sure how easy a tape device would be to integrate into this
system. It could be very inefficient. I would like to know more about
your HSM ideas. It sounds like we're working towards similar goals and
we should try to be sure our ideas are compatible...
> >8. Rather than having a dedicated, fixed size cache, the system would be
> >an all-cache system. When I requested a file to my workstation, if it was
> >not already local, it would be copied to my local hard drive over
> >something else, and then whoever I copied it from could mark that block as
> >"able to be written over" since I have a valid copy.
> I would suggest using a slightly more complicated model. When a file is not
> used for a period of time and the system has reached a certain level, files
> would be premigrated (a la Veritas). If more space is needed on the Virtual
> block device, the local copy gets deleted. The remote instance would only
> be marked free if the blocks were modified. If a read occurred then we would
> have to leave the blocks as premigrated; this would lead to reduced network
> traffic.
Well, you're working at the file level here, and I want to stick to the
block level. Actually though, there's nothing to say I can't write blocks
to the tape device, too... I'd have to have a sequential method of access
too... Integrating tape into this system will need some work. Again, I'd
like this to work with minimal FS modification (only the need to notify
the block device of freeing an inode or equivalent is obviously
necessary). Your premigration sounds reasonable enough unless the network
is already heavily in use... It might be possible to mirror and migrate
when detected network activity is low.
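
Here is a rough block-level sketch of the premigration scheme described
above, in Python just for illustration. The state names, the Block class,
the premigrate_when_idle helper and the 0.2 load threshold are all
assumptions made up for this example. They are not taken from Veritas or
any other product.

```python
from enum import Enum


class State(Enum):
    RESIDENT = "resident"        # only copy is local
    PREMIGRATED = "premigrated"  # identical copies, local and remote
    MIGRATED = "migrated"        # only copy is remote


class Block:
    def __init__(self, data):
        self.local = data
        self.remote = None
        self.state = State.RESIDENT

    def premigrate(self):
        # Copy to the remote device but keep the local copy.
        if self.state is State.RESIDENT:
            self.remote = self.local
            self.state = State.PREMIGRATED

    def evict(self):
        # Free the local copy; only safe if a remote copy exists.
        if self.state is not State.PREMIGRATED:
            return False
        self.local = None
        self.state = State.MIGRATED
        return True

    def read(self):
        # A read recalls the block but leaves the remote copy valid,
        # so a later eviction costs no network traffic.
        if self.state is State.MIGRATED:
            self.local = self.remote
            self.state = State.PREMIGRATED
        return self.local

    def write(self, data):
        # A write makes the remote copy stale, so it is marked free.
        self.local = data
        self.remote = None
        self.state = State.RESIDENT


def premigrate_when_idle(blocks, network_load, threshold=0.2):
    """Premigrate resident blocks only while the network looks quiet.

    network_load is assumed to be a fraction between 0 and 1.
    Returns the number of blocks premigrated.
    """
    if network_load >= threshold:
        return 0
    count = 0
    for blk in blocks:
        if blk.state is State.RESIDENT:
            blk.premigrate()
            count += 1
    return count
```

How the network load would actually be measured is left open here. The
point is only that premigration can be deferred while freeing local space
stays cheap for anything already premigrated.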
<OFFTOPIC>As an aside, it would be neat if the kernel would pre-write
pages to disk so that when it came time to swap to disk, there would
already be some pages written. Of course this should only happen when
there aren't a lot of disk accesses going on.</OFFTOPIC>
> >10. A quota system would probably be desirable to keep it from going
> >completely out of hand, but then again, if you pick reasonable partition
> >sizes, it should be okay since you won't expand past what you decide the
> >partition size to be.
>
> Quota would definitely be desirable, and also the ability to mark files as
> non-movable objects, i.e. stuff that has to remain on the local cache
> disk.
Yeah, quotas aren't as critical as I had initially imagined because as
long as you make, say, a 4 gig or 8 gig partition, you'll be fine:
obviously you can't consume more than the size of the partition.
Still, quotas would probably be useful (although possibly quite costly).
The whole system would be very insecure and would require trust between
all the people using it (right?) unless there were some very expensive
security measures put in (maybe sometime later...). So quotas probably
won't need to be airtight just yet, especially at the cost of
performance.
> >Anyway, it's an idea that I would like to pursue if any of you think it's
> >viable. It would be a lot of work to design, but hey, that's what senior
> >design is all about, right? Any comments would be greatly appreciated.
>
> It is a viable method, since people from Veritas, Netstor (CA), etc. are
> doing it for UNIX/Windows NT,
> so why not Linux? Btw, I like your idea of doing HSM over a network in a
> clustered environment; this is something I had not considered. You should
> have a look at the latest BYTE magazine and at the work of the IEEE SSS?G
> (I can't remember what they are called) who are trying to standardize
> storage systems.
Well, I don't know enough about what HSM is (other than what it stands
for) to know whether what I'm doing is that...
> >If you have any interest, please e-mail me, and maybe I can set up a list.
> >I've actually thought through the problem more than I can relate here.
>
> If you don't mind I would like to try to help you with your endeavour. I am
> not a great programmer but I could help set up a web page/mailing list and
> try to help out with the programming/ideas.
I appreciate the offer. If I really want to use this as my senior design
project, I'm not sure just how far I can get ahead, but my professor
assures me that it shouldn't hurt me to go ahead and get some ideas.
First thing (I think) should be to figure out the mechanics, and that
needs no programming knowledge, just a practical sense of computing.
1) How do we arrange information on the disk so as to be efficient?
2) How can we keep the "page table" for this whole system small and
updated?
3) How do we detect and handle network outages?
4) How do we prevent fragmentation?
5) What kinds of optimizations should we keep in mind?
6) How do we keep the protocol open for modification and enhancements like
security some day?
7) What kinds of media could we support? How about sequential things like
tapes (from your suggestion here)? What about WORM drives?
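
As a starting point for question 2, here is a very rough Python sketch of
what the "page table" might track. Everything in it is an assumption for
the sake of discussion: the PageTable class, the (host, physical block)
tuples, and the fetch helper are invented names, and a flat dictionary says
nothing yet about keeping the table small or surviving network outages.

```python
class PageTable:
    """Maps a logical block number to the one place holding a valid copy."""

    def __init__(self):
        self.entries = {}         # logical block -> (host, physical block)
        self.reclaimable = set()  # (host, physical block) slots free to reuse

    def map(self, logical, host, physical):
        # Point the logical block at a new location. The old location,
        # if any, becomes "able to be written over".
        old = self.entries.get(logical)
        if old is not None and old != (host, physical):
            self.reclaimable.add(old)
        self.reclaimable.discard((host, physical))
        self.entries[logical] = (host, physical)

    def lookup(self, logical):
        return self.entries.get(logical)


def fetch(table, logical, me, free_slot):
    """Bring a block to host `me` if it is not already there.

    free_slot is the local physical block that would receive the copy;
    the actual data transfer is left out of this sketch.
    """
    loc = table.lookup(logical)
    if loc is None:
        raise KeyError(logical)
    host, _ = loc
    if host != me:
        table.map(logical, me, free_slot)
    return table.lookup(logical)
```

For example, if logical block 5 lives on "serverA" at physical block 100
and workstation "ws1" fetches it into its local slot 3, the table then
points at ("ws1", 3) and ("serverA", 100) lands in the reclaimable set.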
Anyway, there's a lot to do before we code. If there are enough people
interested, I'd say a list is probably reasonable to do, along with a
basic web site, etc.
Thanks for your interest in my project!
-Mark
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu