File transfer


This document is about how to do simple one-to-one file exchanges. It is a subset of what is needed to do friendcast-based file sharing. See also how we're being pragmatic in the case of PsycZilla:Httpd. As the rest of the world doesn't talk PSYC yet, let's start with HTTP.

See also streamcasting when it comes to multicasting a file to multiple recipients in real-time and software projects that would like to use such an ability.

Built-in webserver approach

Many clients provide a way to implement file transfers using popular protocols. In this case a pointer to the uniform where the file resides is enough to initiate the transfer. Other information is optional, since some of the protocols we exchange files with will not provide it anyway.

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_nick	noonee
:_hash_MD5	b1451afea9f7041a6ef7b36525ce0d36
:_name_file	PSYC.ico
:_size_file	1150
:_uniform	http://www.psyced.org/favicon.ico
:_description	An icon that noonee extracted from the PSYC logo
:_tag           someuniquetag
_notice_available_file
[_nick] points you to [_uniform]: [_description].
.
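
Since everything but _uniform is optional here, a receiving client can only check what it was actually given. A minimal sketch of such a check follows, assuming the packet variables have already been parsed into a dict of strings; the function name verify_download and the sample payload are made up for illustration and are not part of any PSYC implementation.

```python
import hashlib


def verify_download(data: bytes, packet_vars: dict) -> bool:
    """Check downloaded bytes against whichever optional vars the notice carried."""
    size = packet_vars.get("_size_file")
    if size is not None and len(data) != int(size):
        return False
    md5 = packet_vars.get("_hash_MD5")
    if md5 is not None and hashlib.md5(data).hexdigest() != md5.lower():
        return False
    # Nothing in the notice contradicts the download.
    return True


# Made-up payload, not the real favicon from the packet above.
payload = b"hello psyc"
notice = {"_size_file": "10", "_hash_MD5": hashlib.md5(payload).hexdigest()}
print(verify_download(payload, notice))
```

A notice without _size_file and _hash_MD5 passes trivially, which matches the idea that those variables may be missing when the file comes from a protocol that doesn't provide them.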

If the sender started an HTTP server which is only available for a limited amount of time, we could signal them that we have finished the download:

:_source        psyc://iwanna/~leech
:_target        psyc://ihaveeverything/~youneed

:_tag           someuniquetag
_notice_retrieved_file
Thanks for letting me leech this

Jabber out-of-band is compatible with this approach so we have some kind of cross-protocol file transfer. The sender may still choose to shut down the HTTP server after the download of the file is complete though.

Good thing about using HTTP for this: You can let webchat users download and, using the Masinter-FORM-protocol of HTTP/1.1, even upload files to a person running a suitable client. Again, see PsycZilla:Httpd for something in the pragmatic making.

Bad thing about HTTP: When STUN is necessary to get through NATs, we might as well use native file transfers or BitTorrent.

Feature wishlist

<lynX> Right now I do file exchanges with my local throttled thttpd. It's neat, but it isn't integrated and I have no feedback on who is leeching what and when. Of course I could hack that in (then again, it's only an IP number.. who is going to resolve it to the actual person for me?), but a PZ-integrated webserver would do so much better for everyone as thttpd is very geeky to install - and ultimately I still find myself using proprietary IM file transfers even if they aren't throttled. Ouch! So here goes my wishlist:

  1. Drag+drop a file into a chat window generates a temporary hidden http/https URL and sends _notice_available_file to the involved person or place.
  2. Alternatively person can come looking into your public file server. The public could actually be a friends-only file server using some authentication.
  3. A _request_upload_file may cause the potential recipient to either offer an HTTP PUT slot, or an HTML page with a masinter upload form.
  • Consider that all mcs mentioned here are going to be rendered to appropriate stuff in IRC, XMPP and in webchats, so that only one person needs to be running a psyc-integrated httpd for both to share files.
  • Would be nice to have a convenient and easy way to tune the throttling of the transfer, to avoid any network links getting congested.
  • More than that, an up/download manager tab in PsycZilla would be fab, plus notifications. You get to know when somebody downloads or uploads something. Should be as convenient as those transfer status monitors of file sharing apps.
  • Should permission to leech be linked directly to trust? It doesn't take much trust to be granted a leeching license, though.
  • We could generate temporary passwords for HTTP's basic authentication, so we don't have to encode user recognition in the URL. Then again no, that's not practical for manually clicking users.
  • Ability to resume file transfers some other day or after crash
  • Maybe even drop some other file transfer application's unfinished file onto a scheduled or ongoing file transfer listing for it to recycle what has already been downloaded so far and start off from there.

<coyo> i wish for dropbox-style FUSE or similar based drag and drop file sharing, file synchronization, file version control, delta-based diff transfers, with automatic rename on conflict

See crypto sharing for newer thoughts in this area.

<coyo> i wish for inband (over psyc) multicast channel-based file transfer, where you get the file, but you can choose to discard and ignore the incoming data. (this can get spammy, so trust metrics will need to be used to protect against SPIM)

BitTorrent

Nifty about BitTorrent: it's there, and it's quite fault-resistant in everyday use. It's also in AllPeers and already knows how to do STUN. It may not be a forever solution, but it is one that gets off the ground pretty quickly.

Here's a story about yTorrent mUI, a cellphone user interface for a file sharing app running at home. This shows how the lines between client and server keep blurring, how everything can have the other's role sometimes. Also, a remote control for file sharing may find some advantage in using a common protocol with a chat technology...

And finally, didn't we design PSYC's multicast layer to be pluggable? Let's have BitTorrent as one out of several routing solutions. Yes, it is binary, but so are TCP and UDP. And no, it isn't so realtime, but it's there for data-intensive casts.

One improvement PSYC can give to BT is our trust network: My friends can automatically help seed files that I find important to share to the world. I don't depend only on the folks who want to have it.

Native file transfers

Some Discussion, see discussion page for more

We think the best for requesting a file would be over its hash. Comments? (Who is we in this case?)

the mmp vars of the following fictive packets are wrong, i think, they should contain unl's, because it's p2p (when not using onion routing i think..). so we also need a way to request the permission for file transfers. this could be done by the clients directly or by the server, if it should be allowed generally for friends for example.

<lynX> i think it is perfectly fine to send the first _request to the UNI which checks your identity, then forwards the request to the user's client who will reply to it while providing his _identification. so all you have to do is check his _identification and off you go into p2p wonderland..
<coyo> i really hope that p2p wonderland will not always be a spawned thttpd. it just seems.. uncool/inelegant to me..
<Kuchn> what good is his _identification? i get that as _source anyway. i want his ip. so either he or the server would have to transmit it, right? or am i overlooking something?
<Kuchn> or something completely different.. wouldn't it perhaps make sense to unlock _host (or whatever it was called) as an additional profile field for certain unis? boss() can already see it now. hm, that comes pretty close to the plan with the friendship levels.. from a certain level on, things like this become visible too. in any case the client then takes the ip from the variable/packet.
<lynX> shouldn't you get a _location handed to you somewhere? when the client requests a file from another UNI, that UNI should tell its client the _location, after which the locations can talk p2p among themselves
<el> what speaks against such a _request_* that you send to the uni to ask for the location of that person's file share client? you even wanted to introduce a _request_conversation..

requesting a file

A packet for requesting a file could look like this:

:_target	psyc://ihaveeverything/~youneed
:_identification	psyc://iwanna/~leech

:_hash	hsd3egfezhchjz3893hedfs3
:_size_file     3942021
:_size_fragments 8192
:_seek_resume    516256
_request_file
.

_seek_resume is optional. If given, the sending client starts the file at byte 516256 (for resuming transfers).

maybe the fragment size could be optionally specified vs the amount of fragments, or clients should have a way to negotiate it?

<eL> I'm in favor of generally requesting parts of a file, i.e. offset and length as variables. The default values would then simply be _offset=0 and _size=file size. This way we get not only resume but also downloads from multiple sources.
<Kuchn> neat.. hm, then move _size_fragments from _request_file right in here, so it can work dynamically like qos/shaping? a fixed size would then be fine with a pvar. .. in any case _amount_fragments should then go from _status_available_file: why should the sender be told this, he can do the math himself, and then further connections can be added dynamically as well
<eL> what is one supposed to do with the _source here anyway? does it really come from localhost? If someone doesn't know his ip or his host, he leaves out _source.
<Kuchn> i thought that since fork _source and _target/_context are mandatory.
<eL> well, you usually pick that up at _notice_circuit and keep it. or you know your dyndns name.
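
The offset/length idea from the discussion above can be sketched as a small helper that cuts the missing part of a file into one request per source. This is only an illustration: split_ranges is a hypothetical name, and the resulting pairs would go into the _offset and _size variables proposed in the comments, which are not specified anywhere yet.

```python
def split_ranges(file_size: int, sources: int, start: int = 0):
    """Return (offset, length) pairs covering bytes start..file_size-1."""
    remaining = file_size - start
    if sources < 1 or remaining < 0:
        raise ValueError("need at least one source and start <= file_size")
    base, extra = divmod(remaining, sources)
    ranges = []
    offset = start
    for i in range(sources):
        # Spread the remainder over the first few sources.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


# Plain resume from one source, using the numbers of the request above.
print(split_ranges(3942021, 1, start=516256))
# The same remainder spread over three sources.
print(split_ranges(3942021, 3, start=516256))
```

With a single source and start=0 this degenerates to the default request for the whole file, so resume and multi-source download are the same operation with different numbers.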

The other side could then provide such a file or not:

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_hash	hsd3egfezhchjz3893hedfs3
:_name_file	lsd.exe
:_size_file	3942021
:_size_fragments	4096
:_amount_fragments	836
_status_available_file
.

Note that the file sending site doesn't want this "high" fragment size of 8 KB (8192 bytes), so it decreases it to 4 KB (4096 bytes). It picks the optimum value for the site with the lower bandwidth, so that both sites can keep using PSYC without too much lag.
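
The arithmetic behind such a reply can be sketched like this. Both function names are hypothetical, and "take the smaller of the two sizes" is an assumption about how the negotiation would work, since the page leaves that open. With the numbers above, the bytes remaining after the _seek_resume offset need 837 fragments of 4096 bytes if the final partial fragment is counted; the 836 in the fictive packet appears to count full fragments only.

```python
def negotiate_fragment_size(requested: int, preferred: int) -> int:
    """Assumed rule: the side with less bandwidth wins, so take the smaller size."""
    return min(requested, preferred)


def fragment_count(total_bytes: int, fragment_size: int) -> int:
    """Number of fragments needed, counting a final partial fragment."""
    return -(-total_bytes // fragment_size)  # ceiling division


size = negotiate_fragment_size(8192, 4096)
print(size)
print(fragment_count(3942021, size))           # whole file
print(fragment_count(3942021 - 516256, size))  # after resuming at byte 516256
```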

<coyo> sheesh, 4KB/s is really slow.. is this transatlantic or something? i'm really hoping that circuit speeds of ~700 KB/s would be normal, and possibly as high as 2MB/s would be seen on some circuits. just saying.
<Kuchn> hm, and what if you request a file list and, before you start a download, something changes in the sharer's fs? wouldn't it be good if he could check that using the transmitted hash or at least the size?
<eL> _amount_fragments is an mmp variable that occurs in every mmp packet that is a fragment. The information about how many fragments are coming and how large each of them is isn't necessary. In principle it is enough for one side to say how it would like to have it. All this information goes into the individual packets of the fragmented message.

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_hash	hsd3egfezhchjz3893hedfs3
:_time_delta_retry 1800
_failure_unavailable_file_temporarily
I'm on the phone right now, so I have to save my bandwidth.
.

There could optionally be a message with a reason (mainly for _failure_unavailable_file), for example "Currently I'm paranoid".

receiving a file

PSYC packets containing files have the method _data_file. It's up to the client coder to decide if fragmentation is needed.

=_sum_fragments 10
=_sum_length    300000     
:_fragment      1
32801
:_name_file parkk.txt
_data_file
[ 32768 bytes of data ]

In fact this isn't a strategy limited to file transfer, it is the general way the routing layer could provide for message fragmentation (and actually did in the perlpsyc implementation). So here we have one PSYC message - the file transfer - split over ten packets, containing ten fragments of the actual message. Here's the last fragment of it:

      
:_fragment      10
:_sum_fragments 10
:_sum_length    300000
5088
[ last 5088 bytes of data ]

Note how the entity header and method are missing in this case. The specification as it currently stands does not permit this fragmentation technique, though.

Nine fragments of 32768 bytes add up to 294912, so the last fragment only carries the remaining 5088 bytes and is smaller than the ones before (ten full fragments would be 327680, more than the 300000 total). The numbers are purely illustrative of course. My guess for a fragment size would be in the 250k range (262144 = 2^18), but some real world statistics would be useful to optimize that.
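
The numbers in the two example packets can be reproduced with a few lines. This is a sketch under assumptions: fragment_lengths is a made-up helper, and the length line of the first packet is taken to be the size of the entity header plus method plus data, with one byte for each separator and newline. That reading happens to give exactly 32801, but it is not confirmed by any specification.

```python
def fragment_lengths(total: int, size: int) -> list:
    """Payload length of every fragment of a message of `total` bytes."""
    full, rest = divmod(total, size)
    return [size] * full + ([rest] if rest else [])


lengths = fragment_lengths(300000, 32768)
print(len(lengths), lengths[-1])

# First packet: entity header and method travel along with the data.
entity_and_method = ":_name_file parkk.txt\n_data_file\n"
first_length_line = len(entity_and_method.encode("utf-8")) + lengths[0]
print(first_length_line)
```

The last packet carries no entity header or method, so under the same reading its length line is just the 5088 bytes of data.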

This fragmentation technique is applicable to all sorts of large data uses, like streaming. Resuming a file transfer the next day, or after a power outage, is also taken into consideration by providing persistent state. This will have the sender hold all missing fragments in spool until the recipient acknowledges reception.

You can send regular chat messages between one fragment and the next.

sender stops file transfer

<lynX> Shouldn't this be a little more consistent with recipient stop/resume? (see below)

If the sender stops before the sending process has finished, it could send:

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_hash	hsd3egfezhchjz3893hedfs3
_failure_aborted_file
.

So the client knows it shouldn't save received bytes into a (finished) file. It could also save everything into a temporary one to try to resume it later.

<Kuchn> renamed to _failure_aborted_file. what's still missing in your opinion? _failure_deleted_file? but shouldn't something like that rather go further up, right after the request? i hardly think the file can/should be deleted during the transfer. flock.
<lynX> let's say some operating systems make it unlikely that something like this can happen. but i wouldn't be categorical about it. and i might even allow the user to do that. abort & delete deliberately and not by accident..

stop/pause transfer

If a transfer should be stopped/paused, the receiving client could send something like:

:_source	psyc://localhost:4711
:_target	psyc://ihaveeverything/~youneed	
:_identification	psyc://iwanna/~leech

:_hash	hsd3egfezhchjz3893hedfs3
_request_stop_file
.

(_request is not perfect, same situation as with _request_leave, because one can't be forced to receive a file..) The receiving client will then get a _failure_aborted_file.

<lynX> guess what, _request_file_stop is not a special case of request_file, as you are not requesting for the delivery of a stop file. so a _request_stop_file makes more sense to me. but your comparison to _request_leave is quite interesting. if file transfers were generally implemented in form of a context that you enter temporarily for a specific filetransfer, then all the conference control would be applicable and what's best: a file transfer could be multicast to multiple people in one go. so now as i think of it - delivering files should actually work by dropping the file into a sort of chatroom for it. but it's not much extra work to also accept a _data_file by unicast, so we can enable both styles.
<Kuchn> sounds nice; but wasn't it mentioned by el already in some kind of? i remember that he wanted to use this with contexts or so to make his plan with onion routing easy to integrate. but yes, both styles would be the best i think; think into the future: when there are video- or voice-conferencing over psyc (not sip like now), this would be needed in every case, right? but a home user who just wants to share data with a friend would like to do real p2p (i am right when i think that with multicast transfers, which are solved with junctions at the moment, a connection to the server would be needed for this. or do i think wrong? either _target or _context in a _data_file-packet..).
<lynX> Telephony wouldn't be a file transfer, but the issues are certainly related and it's good to see how same things can be done the same way (and especially with shared and inherited method families). No you are wrong in the sense that the client can be its server and open up a context for the file transfer to his friends and yes you are right as the client needs to be free from NAT problems. But that's an issue with p2p file transfers anyway, so doing it as a context in the client app is the answer in this case. It works for one-to-one transfers, and whenever you want multicast file transfers you can either use this syntax together with psyced's to implement them, or the other PSYC implementations like perlpsyc are capable of multicasting too by then.
<Kuchn> no, those wouldn't be files, but _data. _data_stream_voice for example. all very similar, surely that can be combined in advance so it's easier later on. | i didn't think of nat for a second.. i hope you understood at all what i was trying to say x) but that the client can play server in that case is the answer to my question anyway.. xD nice in any case, the strategy offers everything one could wish for. i'll put together updated fictive packets here this weekend.. let's see if we get any further then.
<lynX> yes, if you manage to bring all of that (telephony and files) under one roof, go for it. great.

resume

<Kuchn> _request_file + _offset?
<eL> resume should not be a special case. it is simply a request for the part of the file that you are still missing. ergo nothing different from any other request.

successful completion

When the file has been sent completely, the recipient informs the sender of the successful completion, so that the sender can

  • inform his user that a file was successfully handed out
  • deallocate fragment tables and data structures for this transfer, not wait for some timeout
  • or whatever similar

:_source	psyc://iwanna/~leech
:_target	psyc://ihaveeverything/~youneed
:_reply         <packet-id>

:_hash_file	hsd3egfezhchjz3893hedfs3
:_name_file     my_radio_show_in_estonia.ogg
:_nick          Seppälä
_notice_completed_file
[_nick] has successfully downloaded [_name_file] from you.
.

<lynX> (FAQ) no, this ain't echo. _echo is the acknowledgment that something you typed or initiated has arrived at its destination. A file transfer is too asynchronous for _echo, though. It's more of an event, thus _notice.

browsing/searching files

hm. i think there should be message id's for that, because when requesting a file list maybe the client needs to know for which file request the response packet is.. we can't give our pattern, filesizes etc. into the response list..

<lynX> you don't need to use packet ids for that, mere _tags can suffice. I think they are mentioned in Group Communication. unfortunately we never documented them generally, and also haven't implemented them generically. essentially the client sends a _tag variable containing its own recognition code. the other side merely echoes it back allowing the client to figure out what belongs where. it's like a session id for psyc packets. or a color code sticker, to make it sound friendlier and more harmless.  :)
<Kuchn> (OffTopic) nice one.. sound perfect, not only for this case - is it right that _tag can be used in every message?
<lynX> that's the plan, so if you encounter situations where you want it, we'll make psyced support tags in a generic way.
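
On the client side the tag idea boils down to a lookup table. The following is a minimal sketch, assuming packets arrive as dicts of variables; TagTracker and its methods are invented for this example and do not exist in psyced or any other PSYC implementation.

```python
import secrets


class TagTracker:
    """Remember what each outgoing _tag was for, so replies can be sorted."""

    def __init__(self):
        self._pending = {}

    def new_request(self, description: str) -> str:
        # Any locally unique string would do; random hex is one easy choice.
        tag = secrets.token_hex(8)
        self._pending[tag] = description
        return tag

    def match_reply(self, packet_vars: dict):
        """Return the description of the request this reply belongs to, or None."""
        return self._pending.get(packet_vars.get("_tag"))


tracker = TagTracker()
tag = tracker.new_request("file list for *drugs*.txt")
reply = {"_tag": tag, "_list_files": ["sdkjsdgf.txt"]}
print(tracker.match_reply(reply))
```

The entry is deliberately not removed on the first match, because one request may be answered by several packets carrying the same tag.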

To let somebody know what files one can request, there should be a method like _request_list_files (for files in general; there could be specialized versions for other things, like _request_list_files_media_audio):

:_source	psyc://localhost:4711
:_target	psyc://ihaveeverything/~youneed
:_identification	psyc://iwanna/~leech

:_match_glob	*drugs*.txt
:_limit_maximum_size	1000000
:_limit_minimum_size	10000
_request_list_files

or

:_source	psyc://localhost:4711
:_target	psyc://ihaveeverything/~youneed
:_identification	psyc://iwanna/~leech

:_limit_minimum_size	50000000
:_match_glob	first men on pluto
_request_list_files_media_audio

<lynX> mentioning specific formats in the method doesn't look too appealing to me. also do you really want to ask your user such boring details like "should i really skip ogg files even if they exactly match what you are looking for?" and matching file names or media metadata like ID3 is something the recipient client thinks about. it doesn't belong into the protocol. btw, _glob is the name of *? matching. you could also provide _match_expression_regular or _match_exact.
<Kuchn> first i thought this would be better, because when a mp3 is requested for example, a further class could be inheritet by message class which has some code to handle id3 tags - but you're right.. a client can ignore vars if it wants to. and no, i wouldn't like to ask such stupid questions, because i usually know what i'm searching for- if i would like to find specified mp3's, then i never ever want the same audio in a different format. if format is not important: there's no need to specify it. but this all could be done via psyc vars.. right. and thanks, it was called glob. this word lay on my tongue while writing. ;)
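
How the client that serves the list might apply _match_glob and the two size limits is sketched below. The helper name and the dict representation are assumptions made for this example; the file names and sizes are made up, and only the variable names are taken from the fictive requests above.

```python
from fnmatch import fnmatchcase


def file_matches(name: str, size: int, request_vars: dict) -> bool:
    """Apply the optional glob pattern and size limits of a list request."""
    pattern = request_vars.get("_match_glob")
    if pattern is not None and not fnmatchcase(name, pattern):
        return False
    minimum = request_vars.get("_limit_minimum_size")
    if minimum is not None and size < int(minimum):
        return False
    maximum = request_vars.get("_limit_maximum_size")
    if maximum is not None and size > int(maximum):
        return False
    return True


request = {
    "_match_glob": "*drugs*.txt",
    "_limit_maximum_size": "1000000",
    "_limit_minimum_size": "10000",
}
shared = {"why_drugs_are_bad.txt": 20480, "drugs.exe": 20480, "drugs_tiny.txt": 12}
print([name for name, size in shared.items() if file_matches(name, size, request)])
```

Every variable is optional, so a request without any of them matches everything, which is the browse case.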

or just such to browse everything (or just videos):

:_source	psyc://localhost:4711
:_target	psyc://ihaveeverything/~youneed
:_identification	psyc://iwanna/~leech

:_tag	sdihvj3efnpko
_request_list_files(_media_video)

The client could send a _tag within the packet to be able to manage more than one file list.

Now the other side could answer with at least one list:

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_tag	sdihvj3efnpko
:_amount        2
:_list_files	sdkjsdgf.txt
:	oirgsdfsgd.exe
:_list_hashes	ihjf3029jffw3ihj32f32f
:	oiejf932rf92ßkfwioefj
:_list_sizes	341
:	326216
_list_files

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_tag	sdihvj3efnpko
:_amount        1
:_list_files	dsisdgsdg.mp3
:_list_hashes	oiejf932rf92ßkfwioefj
:_list_sizes	4516910
:_list_bitrate	192
_list_files_media_audio

:_source	psyc://ihaveeverything/~youneed
:_target	psyc://iwanna/~leech

:_tag	sdihvj3efnpko
:_amount        1
:_list_files	iojesfdipjsdf.mpeg
:_list_hashes	oiejf932rf92ßkfwioefj
:_list_sizes	51254421
:_list_bitrate	4931
_list_files_media_video

Note that you can get more than one message of the _list_files class; files that carry more information than just the basics get extra lists in their message.
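
The reply packets use parallel lists, where the n-th value of every _list_ variable describes the same file. A receiving client could turn them back into one record per file roughly like this. It is a sketch only: it assumes the parser has already collected the continuation lines into Python lists, and rows_from_lists is a made-up name.

```python
def rows_from_lists(packet_vars: dict) -> list:
    """Turn parallel _list_* variables into one dict per file."""
    columns = {k: v for k, v in packet_vars.items() if k.startswith("_list_")}
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("parallel lists differ in length")
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


reply = {
    "_tag": "sdihvj3efnpko",
    "_list_files": ["sdkjsdgf.txt", "oirgsdfsgd.exe"],
    "_list_sizes": ["341", "326216"],
}
for row in rows_from_lists(reply):
    print(row["_list_files"], row["_list_sizes"])
```

The length check is also why _amount adds little: the number of files can be read off the lists themselves.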

<lynX> Yes okay, that works too.. so we can decide to send one mega packet full of lists, or several packets. Somehow I like several better, but then the variables should no longer contain _list, only the mc should.
<coyo> i'd be really excited to see file browsing capability.

i think these are the main attributes that are important. further details, like those extracted from id3-tags, could be given by further specialised vars.

<lynX> the example of _bitrate however already shows that leaving empty fields in a table dump like this isn't optimal. the alternative would be to dump some form of structured data which unfortunately spells XML to me. otherwise, if we had a way to index the variables, we could make variables that only apply to a certain element.. hmm, too complex too complicated. this is a good way to start and we'll see.
<Kuchn> el said, the order of _list should stay. so there's no real problem.. but it does not look nice and is not efficiently as it could be maybe. you mean something like serialized data in a var? thats... what? as i mentioned in the first revision of this site, the lists in _list_files could be no vars of type _list but instead every file gets its own _list_file like the answers on a _request_description (_list_person_description)- this could be combined.. every special type of file gets its own _list_files with vars of type _list. so after requesting a list of some shared files - which are for example 3 textfiles and 2 mp3s - the user gets 2 _list_files, one _list_files with some basic vars (mentioned above: filename, size, hash) of type _list and _list_files_media_audio with the basic lists plus some additional ones containing bitrates (i forgot it! very important..), id3-tags etc.
<Kuchn> changed to my suggestion. so ok? did you put _amount into the packet? what sense does it make? can't you see it with the amount of values in the lists?
<lynX> _amount probably isn't necessary, yes
<eL> What do you think of computing hashes on mp3s only over the part following the id3? And then the same for other media files with meta information.. ;)
<lynX> yeah yeah, go ahead
<Kuchn> what do you mean, a hash only over the id3 string instead of over the file, or what? hmm...

notes

Availability

yarec asks: would the file be available after the user logs off?

With simple file transfer, no. Not unless you have a server that implements file transfer and you can leave the file there. But if we look into File Sharing using multicast (see also Packet Ids), elaborate implementations would allow you to re-request fragments of a file from another recipient, which might even let you download a formerly multicast file from a recipient nearby. But that's just theory.

<Monkey> See also BitTorrent.... ;)
<lynX> ... mentioned in Packet Ids.